$200,000 - $300,000
San Francisco, hybrid (3x per week)
Full time / Permanent
This company builds AI tools and devices that help professionals capture and use what's said in real conversations across meetings, calls, and voice notes. It's profitable, bootstrapped, and scaling fast: a $250M revenue run rate in under three years, with over 1.5 million users globally.
The product works. Now they need someone to make the live speech experience feel polished and seamless, fixing the small things that frustrate users at scale.
What you'll do
- Build and maintain test suites and automated evaluation platforms for multilingual, multi-model live systems, covering hallucinations, casing, punctuation, number formatting, and segmentation
- Set up benchmarks for live agent systems: voice activity detection (VAD) false triggers, interruption latency, and turn-taking transitions
- Fix the friction points that hurt user experience: poor segmentation, inconsistent casing, hallucinated words
- Optimize VAD, barge-in models, and turn-taking logic to reduce end-to-end latency and false interruption rates
What "great" looks like
- 1–3 years of hands-on experience in speech algorithm training, with a focus on speech pre- or post-processing or full-duplex voice system optimization
- You've worked on ASR pre-processing or post-processing in a real product
- You understand how live voice systems break and know how to fix them
- You've published research at Interspeech or ICASSP, or you hold speech-related patents
Why join
- Profitable company at a ~$250M run rate: you'll see the impact of your work immediately in a product used daily by professionals worldwide
- Direct ownership of the live speech quality stack, not a supporting role in a large org
- Hybrid San Francisco team with real access to large, diverse, multilingual audio datasets
- Short feedback loops: improvements ship fast and metrics are visible
- Clear path toward senior technical leadership as the audio team grows
