Our client is building advanced AI systems with real physical capability. Their work spans experimentation, engineering and automated manufacturing, and they have already delivered large-scale projects in the public and private sectors. This is a team that invents from first principles and builds end-to-end systems that push the frontier of physical AI.
They are now searching for a Senior ML Infrastructure / MLOps Engineer to design, operate and scale the backbone that powers large model development. Your work will shape the training, fine-tuning and deployment infrastructure across LLMs, RL agents and physics-driven surrogate models.

The role

You will own the systems that enable large-scale training, RLHF and DPO workflows, dataset governance, experimentation, reproducibility and model deployment. This includes distributed training design, containerized model runners, data and versioning pipelines, and evaluation automation that keeps model development reliable and fast.

Responsibilities

  • Build and maintain scalable infrastructure for training, fine-tuning and distributed ML workflows.
  • Develop dataset pipelines, versioning systems, experiment tracking and reproducibility frameworks.
  • Operate containerized training and inference environments, including CI/CD for models and evaluation tooling.
  • Partner closely with researchers, RL teams, data engineering and systems engineers to support rapid iteration and robust deployment.

What they’re looking for

  • Strong experience in ML infrastructure, distributed training, experiment management or production ML systems.
  • Comfort with containerization, orchestration, dataset governance and model evaluation pipelines.
  • Ability to design reliable, high-throughput training and deployment workflows.
  • Someone who enjoys working across ML, infra and data systems in a fast-moving research environment.