Senior MLOps / ML Infrastructure Engineer

About the Company

Our client is a Series B, venture-backed deep-tech company building a Physics AI platform that helps engineering teams bring products to market faster, reduce development risk, and explore better designs with greater confidence. The platform combines large-scale simulation data with modern machine learning to generate high-fidelity predictions of physical behavior in near real time. Customers include leading organizations across aerospace, automotive, and advanced manufacturing, working on some of the most demanding real-world engineering problems.

The Role

This role focuses on building and operating the infrastructure that powers physics-based AI systems at scale. The position enables ML engineers and scientists to train, track, deploy, and monitor models reliably without managing low-level infrastructure. The work sits at the intersection of ML systems, cloud infrastructure, and large-scale simulation data, with a strong emphasis on performance, reliability, and developer productivity. It is a hands-on engineering role in a fast-moving, in-office environment, working closely with ML researchers, platform engineers, and product teams.

What You’ll Do

- Design, build, and maintain robust MLOps infrastructure supporting the full ML lifecycle, from experimentation and training through to production deployment and monitoring
- Implement automated training pipelines, experiment tracking, and model lifecycle management using tools such as Kubeflow, MLflow, and Argo Workflows
- Develop scalable data pipelines capable of handling large volumes of unstructured data, particularly 3D geometric data and physics simulation outputs
- Deploy machine learning models into production inference systems with strong standards for performance, reliability, and observability
- Manage model registries and integrate them with CI/CD workflows to support consistent and reliable model releases
- Implement monitoring systems that continuously track model health and performance in production
- Collaborate closely with ML researchers, platform engineers, and product teams to evolve the infrastructure platform for physics-based AI applications
- Write production-grade code and optimize cloud infrastructure, primarily on Google Cloud Platform with Docker and Kubernetes, while making thoughtful trade-offs around scalability, cost, and operational simplicity

What We’re Looking For

- Bachelor’s degree or higher in Computer Science, Data Science, Applied Mathematics, or a closely related field
- 5 years of industry experience building MLOps platforms or ML systems in production environments
- Strong proficiency in Python, with working knowledge of Bash and SQL
- Hands-on experience with cloud infrastructure such as GCP, AWS, or Azure
- Experience with containerization and orchestration tools including Docker and Kubernetes
- Familiarity with modern MLOps frameworks such as Kubeflow, MLflow, and Argo Workflows
- Experience building and maintaining scalable data pipelines, ideally working with unstructured or high-dimensional data
- Ability to independently deploy models and implement monitored inference systems in production
- Comfortable troubleshooting complex distributed systems and building reliable infrastructure that other teams depend on

Nice to Have

- Interest in physics simulation, scientific computing, or HPC environments
- Experience building production MLOps platforms in deep-tech or simulation-heavy environments
- Familiarity with additional programming languages such as Go or C

Working Style and Culture

This role suits someone who enjoys startup environments, learns quickly, and communicates clearly across disciplines. The team works on-site five days a week and values close collaboration, fast feedback loops, and hands-on problem solving. There is a strong belief that great infrastructure should be largely invisible, enabling engineers and scientists to move faster without friction.

Sam Warwick