Multimodal LLM Researcher
USD 300,000–400,000
Remote, Palo Alto
Full-time / Permanent

DeepRec has partnered with a high-growth generative AI company (Series B, $130M raised). They're building multimodal, multi-agent systems that combine language, vision, audio, and video. If you've been looking for a role where your research reaches production and shapes how millions interact with creative AI, this is worth a closer look.

You'll help define the next generation of multimodal AI systems. Your work will span research, experimentation, and deployment, with a focus on real-time performance, multimodal reasoning, and agent-based workflows. You'll have the freedom to explore ambitious ideas while working alongside engineers who can bring them into production.

What You'll Do
- Lead research across LLMs, VLMs, and audio language models
- Design novel multimodal model architectures and training approaches
- Improve real-time inference across text, image, audio, and video
- Train and fine-tune autoregressive and diffusion models
- Build and curate high-quality multimodal datasets
- Collaborate with engineering teams to deploy research outcomes
- Publish findings at leading AI conferences and journals

What You'll Bring

Essential
- Strong research track record in multimodal AI or foundation models
- First-author publications at recognised ML, vision, or audio conferences
- Deep expertise in LLMs, VLMs, Audio LMs, or related fields
- Strong Python skills and hands-on experience with modern deep learning frameworks

Desirable
- Experience with diffusion models or world models
- Background in real-time AI systems and model serving
- Experience building large-scale multimodal datasets

We encourage you to apply even if you don't meet every requirement. The right mindset matters as much as the right CV.

What's In It For You

- USD 300,000–400,000 salary
- Fully remote working arrangement
- Ownership of research that shapes production systems
- Opportunity to publish and contribute to the field
- Direct collaboration with product and engineering leadership

This role sits at the intersection of multimodal AI research and real-world deployment. If you want to advance the field and see your work reach users, we'd love to hear from you.