Senior Machine Learning Infra Engineer | San Francisco | Competitive Salary + Equity

Our client is an early-stage AI company building foundation models for physics to enable end-to-end industrial automation, from simulation and design through optimization, validation, and production. They are assembling a small, elite, founder-led team focused on shipping real systems into production, backed by world-class investors and technical advisors.

They are hiring a Machine Learning Cloud Infrastructure Engineer to own the full ML infrastructure stack behind physics-based foundation models. Working directly with the CEO and founding team, you will build, scale, and operate production-grade ML systems used by real customers.

What you will do
- Own distributed training and fine-tuning infrastructure across multi-GPU and multi-node clusters
- Design and operate low-latency, highly reliable inference and model serving systems
- Build secure fine-tuning pipelines that allow customers to adapt models to their data and workflows
- Deliver deployments across cloud and on-prem environments, including enterprise and air-gapped setups
- Design data pipelines for large-scale simulation and CFD datasets
- Implement observability, monitoring, and debugging across training, serving, and data pipelines
- Work directly with customers on deployment, integration, and scaling challenges
- Move quickly from prototype to production infrastructure

What our client is looking for
- 3 years building and scaling ML infrastructure for training, fine-tuning, serving, or deployment
- Strong experience with AWS, GCP, or Azure
- Hands-on expertise with Kubernetes, Docker, and infrastructure-as-code
- Experience with distributed training frameworks such as PyTorch Distributed, DeepSpeed, or Ray
- Proven experience building production-grade inference systems
- Strong Python skills and a deep understanding of the end-to-end ML lifecycle
- High execution velocity, strong debugging instincts, and comfort operating in ambiguity

Nice to have
- Background in physics, simulation, or computer-aided engineering software
- Experience deploying ML systems into enterprise or regulated environments
- Foundation model fine-tuning infrastructure experience
- GPU performance optimization experience (CUDA, Triton, etc.)
- Large-scale ML data engineering and validation pipelines
- Experience at high-growth AI startups or leading AI research labs
- Customer-facing or forward-deployed engineering experience
- Open-source contributions to ML infrastructure

This role suits someone who earns respect through hands-on technical contribution, thrives in intense, execution-driven environments, values deep, focused work, and takes full ownership of outcomes.

The company offers ownership of core infrastructure, direct collaboration with the CEO and founding team, work on high-impact AI and physics problems, competitive compensation with meaningful equity, an in-person-first culture five days a week, strong benefits, daily meals, stipends, and immigration support.
Sam Warwick