They recently closed a $41 million Seed round co-led by two top-tier US venture firms, with participation from a leading global investor, and are rapidly expanding across Product, Engineering, Go-to-Market, and Growth.
About the Role
You'll focus on the full training stack: profiling GPU behavior, debugging training pipelines, improving throughput, choosing the right parallelism strategies, and designing the infrastructure that lets the team train models efficiently at scale. The work spans cluster management, model training, efficient data pipelines for video and audio, inference, and optimizing PyTorch code. Your contribution will shape the foundation on which all of their generative models are built and iterated.
Key Responsibilities
- Identify ideal training strategies (parallelism approaches, precision trade-offs) for a variety of model sizes and compute loads
- Profile, debug, and optimize single- and multi-GPU operations using tools like Nsight and stack trace viewers to understand what's actually happening at the hardware level
- Analyze and improve the entire training pipeline end to end, including efficient data storage, data loading, distributed training, checkpoint and artifact saving, and logging
- Set up scalable systems for experiment tracking, data and model versioning, and experiment insights
- Design, deploy, and maintain large-scale ML training clusters running SLURM for distributed workload orchestration
Requirements
- Familiarity with the latest and most effective techniques for optimizing training and inference workloads, gained by implementing them rather than just reading papers
- Deep understanding of GPU memory hierarchy and compute capabilities, knowing what the hardware can do in theory and what prevents you from achieving it in practice
- Experience optimizing for both memory-bound and compute-bound operations, with a clear sense of when each constraint matters
- Expertise with efficient attention algorithms and their performance characteristics at different scales
- Experience implementing custom GPU kernels and integrating them into PyTorch
- Experience with diffusion and autoregressive models and an understanding of their specific optimization challenges
- Familiarity with high-performance storage solutions (VAST, blob storage) and their performance characteristics for ML workloads
- Experience managing SLURM clusters at scale
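The memory-bound vs compute-bound distinction above can be made concrete with a quick arithmetic-intensity estimate. This is a minimal sketch, not part of the role description; the peak-throughput numbers are illustrative (roughly an A100 in fp16), and the helper name is hypothetical.

```python
def arithmetic_intensity_matmul(m, n, k, bytes_per_el=2):
    """FLOPs per byte moved for an (m, k) @ (k, n) matmul in half precision."""
    flops = 2 * m * n * k                                  # one multiply + one add per MAC
    bytes_moved = bytes_per_el * (m * k + k * n + m * n)   # read A and B, write C
    return flops / bytes_moved

# Illustrative roofline numbers (roughly an A100, fp16 tensor cores):
PEAK_FLOPS = 312e12            # ~312 TFLOP/s
PEAK_BW = 2.0e12               # ~2.0 TB/s HBM bandwidth
RIDGE = PEAK_FLOPS / PEAK_BW   # ~156 FLOP/byte: below this, the op is memory-bound

for shape in [(4096, 4096, 4096), (4096, 4096, 64)]:
    ai = arithmetic_intensity_matmul(*shape)
    bound = "compute-bound" if ai > RIDGE else "memory-bound"
    print(f"{shape}: intensity {ai:.0f} FLOP/byte -> {bound}")
```

A large square matmul lands well above the ridge point and is compute-bound, while the same matmul with a thin inner dimension falls below it and is limited by HBM bandwidth instead, which is when kernel fusion and data-layout work pay off.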
Why Join
- Pivotal moment. Fresh funding is secured and traction is building; this is the point where your contributions can make a real difference to the company's trajectory.
- True ownership from day one. Genuine autonomy and responsibility, with ideas and work that directly shape both product and company direction.
- Competitive compensation and equity. Strong packages that ensure you share in the success you help create.
- Build for the next generation of creators. Be part of the innovation that will transform how creators work and thrive.
