We are seeking a senior leader to define and deliver the architecture and research direction for large-scale multimodal AI systems. This role combines scientific leadership with hands-on system ownership, spanning model innovation, training, inference, and production deployment.
You will lead the design of multimodal architectures across LLMs, VLMs, video models, and multimodal agents, while driving cutting-edge research in multimodal understanding and generation. You will own the full lifecycle, from novel algorithms and publications to scalable, optimized production systems (autotuning, quantization, inference efficiency).
Requirements
- Deep expertise in multimodal learning with hands-on experience training large-scale vision-language, video, or multimodal models.
- Strong understanding of transformers, diffusion models, and large multimodal model inference.
- Proven research impact (publications at top-tier conferences preferred) and/or significant open-source contributions.
- Ability to translate frontier research into production-grade AI systems.