Machine Learning Engineer – Speech Model Training
$250,000 - $300,000
San Francisco, CA
Hybrid, 3 days per week in office
Full time / Permanent
 
In this role you won’t be wrapping APIs or fine-tuning existing models. You’ll be building models from raw acoustic signal processing all the way through to production inference on edge devices, at a company that actually ships to 1.5M live users.
 
A profitable, fast-growing AI company ($250M ARR in under three years, no VC dependency) is standing up a SpeechLLM lab from scratch. This is a founding seat on that team.
 
They build a hardware-software AI companion used daily by over 1.5 million professionals worldwide. The next chapter is a world-class speech intelligence core and they need the engineers to architect it.
 
What you'd own:
  • Design and train large-scale speech models end-to-end: unified SpeechLLMs, ASR, expressive TTS, and generative audio
  • Own the full stack from acoustic feature engineering to GPU cluster optimisation
  • Run and optimise distributed training at scale in PyTorch or JAX, using FSDP, DeepSpeed, and similar tooling
  • Drive real-time inference performance with vLLM, TensorRT-LLM, or SGLang
  • Apply RL alignment techniques to improve conversational quality
  • Debug the hard problems in distributed infrastructure and ship solutions
 
You likely have:
  • Proven experience training large-scale audio or speech models from the ground up
  • Deep PyTorch or JAX expertise with real distributed training experience
  • Genuine comfort traversing the entire ML stack from signal processing to production
  • A bias toward shipping: you take ownership, you iterate fast
Strong bonus: neural audio codecs, diffusion/flow-matching architectures, or LLM pretraining experience.
 
Why join:
  • Profitable company at ~$250M run rate - you'll see the impact of your work immediately in a product used daily by professionals worldwide
  • Direct ownership of the live speech quality stack, not a supporting role in a large org
  • Hybrid San Francisco team with real access to large, diverse, multilingual audio datasets
  • Short feedback loops - improvements ship fast and metrics are visible
  • Clear path toward senior technical leadership as the audio team grows