$200,000 - $300,000
San Francisco, hybrid (3 days per week on-site)
Full time / Permanent
This company builds AI-powered tools that help professionals capture and use what's said in the real world of work: meetings, conversations, and voice notes. It's profitable, bootstrapped, and growing fast: $250M revenue run rate in under three years, with over 1.5 million users globally.
The product is working. The next step is a significantly better speech engine: smaller, faster, and more accurate across every device and language it runs on.
What you'll do
- Design and train lightweight on-device ASR models (e.g. Streaming Transducer, CTC) that run efficiently on mobile and embedded hardware
- Compress and optimize models using quantization, pruning, and knowledge distillation
- Clean, align, and augment multilingual speech data; handle low-resource languages and noisy real-world conditions
- Work closely with engineering teams to convert and deploy models into production
What "great" looks like
- You've trained or fine-tuned ASR models at production scale, not just in research settings
- You know at least one major ASR framework deeply (WeNet, ESPnet, icefall/k2, or Zipformer) and understand how it actually works at a structural level
- You've deployed on-device or offline ASR models and solved the messy problems that come with real hardware constraints
- You've done hands-on post-training quantization and know how to recover accuracy when it degrades
- Master's or PhD in Computer Science, Signal Processing, or a similar field, and 3–5 years of experience in speech algorithms
Bonus: published research at ICASSP or Interspeech, experience with Zipformer / Paraformer / SenseVoice, or knowledge distillation from large speech models to compact ones.
Why join
- Profitable, fast-moving company. Your work ships and gets used by over a million people
- Real ownership of the on-device speech stack, not one task on a large team's backlog
- Hybrid San Francisco team building both hardware and AI systems in parallel
- Meaningful datasets and global product scale to test and prove your work
- Clear growth toward senior technical leadership as the audio function expands
