$250,000 - $300,000
San Francisco, CA
Hybrid, 3x per week in office
Full time / Permanent
In this role you won’t be wrapping APIs or fine-tuning existing models. You’ll be building models end to end, from raw acoustic signal processing all the way through to production inference on edge devices, at a company that actually ships to 1.5M live users.
A profitable, fast-growing AI company ($250M ARR in under three years, no VC dependency) is standing up a SpeechLLM lab from scratch. This is a founding seat on that team.
They build a hardware-software AI companion used daily by over 1.5 million professionals worldwide. The next chapter is a world-class speech intelligence core, and they need engineers to architect it.
What you'd own:
- Design and train large-scale speech models end-to-end: unified SpeechLLMs, ASR, expressive TTS, and generative audio
- Own the full stack from acoustic feature engineering to GPU cluster optimisation
- Run and optimise distributed training at scale with PyTorch or JAX, FSDP, DeepSpeed, etc.
- Drive real-time inference performance with vLLM, TensorRT-LLM, or SGLang
- Apply RL alignment techniques to improve conversational quality
- Debug the hard problems in distributed infrastructure and ship solutions
You likely have:
- Proven experience training large-scale audio or speech models from the ground up
- Deep PyTorch or JAX expertise with real distributed training experience
- Genuine comfort traversing the entire ML stack from signal processing to production
- A bias toward shipping: you take ownership, you iterate fast
Why join:
- Profitable company at a ~$250M run rate: you'll see the impact of your work immediately in a product used daily by professionals worldwide
- Direct ownership of the live speech quality stack, not a supporting role in a large org
- Hybrid San Francisco team with real access to large, diverse, multilingual audio datasets
- Short feedback loops: improvements ship fast and metrics are visible
- Clear path toward senior technical leadership as the audio team grows
