About the Company
Our client is a Series B, venture-backed deep-tech company building a Physics AI platform that helps engineering teams bring products to market faster, reduce development risk, and explore better designs with greater confidence. The platform combines large-scale simulation data with modern machine learning to generate high-fidelity predictions of physical behavior in near real time.
Customers include leading organizations across aerospace, automotive, and advanced manufacturing, working on some of the most demanding real-world engineering problems.
The Role
This role focuses on building and operating the infrastructure that powers physics-based AI systems at scale. The position enables ML engineers and scientists to train, track, deploy, and monitor models reliably without managing low-level infrastructure. The work sits at the intersection of ML systems, cloud infrastructure, and large-scale simulation data, with a strong emphasis on performance, reliability, and developer productivity. It is a hands-on engineering role in a fast-moving, in-office environment, working closely with ML researchers, platform engineers, and product teams.
What You’ll Do
- Design, build, and maintain robust MLOps infrastructure supporting the full ML lifecycle, from experimentation and training through to production deployment and monitoring
- Implement automated training pipelines, experiment tracking, and model lifecycle management using tools such as Kubeflow, MLflow, and Argo Workflows
- Develop scalable data pipelines capable of handling large volumes of unstructured data, particularly 3D geometric data and physics simulation outputs
- Deploy machine learning models into production inference systems with strong standards for performance, reliability, and observability
- Manage model registries and integrate them with CI/CD workflows to support consistent and reliable model releases
- Implement monitoring systems that continuously track model health and performance in production
- Collaborate closely with ML researchers, platform engineers, and product teams to evolve the infrastructure platform for physics-based AI applications
- Write production-grade code and optimize cloud infrastructure, primarily on Google Cloud Platform with Docker and Kubernetes, while making thoughtful trade-offs around scalability, cost, and operational simplicity
What You’ll Bring
- Bachelor’s degree or higher in Computer Science, Data Science, Applied Mathematics, or a closely related field
- 5 years of industry experience building MLOps platforms or ML systems in production environments
- Strong proficiency in Python, with working knowledge of Bash and SQL
- Hands-on experience with cloud infrastructure such as GCP, AWS, or Azure
- Experience with containerization and orchestration tools including Docker and Kubernetes
- Familiarity with modern MLOps frameworks such as Kubeflow, MLflow, and Argo Workflows
- Experience building and maintaining scalable data pipelines, ideally working with unstructured or high-dimensional data
- Ability to independently deploy models and implement monitored inference systems in production
- Comfortable troubleshooting complex distributed systems and building reliable infrastructure that other teams depend on
Nice to Have
- Interest in physics simulation, scientific computing, or HPC environments
- Experience building production MLOps platforms in deep-tech or simulation-heavy environments
- Familiarity with additional programming languages such as Go or C
This role suits someone who enjoys startup environments, learns quickly, and communicates clearly across disciplines. The team works on-site five days a week and values close collaboration, fast feedback loops, and hands-on problem solving. There is a strong belief that great infrastructure should be largely invisible, enabling engineers and scientists to move faster without friction.