I’m working with a well-funded AI research company building the technical foundations for a new class of embodied agents and digital humans - systems designed with genuine, human-like qualities that can interact, collaborate, and form real connections with people. Their long-term aim is to scale this work into multi-agent simulations and entire societies of autonomous AI entities.
As their Member of Technical Staff (ML Infrastructure), you’d design and scale the platforms that make this possible - from high-performance inference engines to distributed training pipelines and large-scale compute clusters that power intelligent, interactive AI systems. You’d work closely with researchers and product engineers to push the limits of inference performance, strengthen the foundations for agentic AI, and evolve the next generation of training and post-training pipelines.
Responsibilities:
- Accelerate research velocity by enabling SOTA experimentation from day one.
- Build and optimize the full model training pipeline, including data collection, data loading, SFT, and RL.
- Design and optimize a high-performance inference platform leveraging both open-source and proprietary engines.
- Develop and scale technologies for large-scale cluster scheduling, distributed training, and high-performance AI networking.
- Drive engineering excellence across observability, reliability, and infrastructure performance.
- Partner with research and product teams to turn cutting-edge ideas into robust, production-ready systems.
Qualifications:
- Expertise in one or more of: inference engines, GPU optimization, cluster scheduling, or cloud-native infrastructure.
- Proficiency with modern ML frameworks and tooling such as PyTorch, vLLM, verl, or similar.
- Experience building scalable, high-performance systems used in production.
- Start-up mindset - adaptable, fast-moving, and high-ownership.
Why This Opportunity Stands Out:
- Elite founding team: Engineers and researchers from MIT, Stanford, Google X, Citadel, and top AI labs.
- Strong funding and backing: Over $40M raised from Prosus, First Spark Ventures, Patron, and notable investors including Patrick Collison and Eric Schmidt.
- Serious traction: Their flagship AI companion product has already achieved significant user growth and is generating real revenue.
- Impact and autonomy: A flat, fast-moving environment where you’ll own critical systems and ship meaningful work within weeks.
- Long-term vision: This company is not chasing a quick exit. They're deliberately building what they believe will be a historic company, with lasting influence on how humans and AI interact.