Research Engineer – Training Optimisation and Infrastructure
Location: Berlin - Remote within Europe (within ±2 hours of CET)
Level: Mid to Staff
Package: Competitive salary plus equity

The Opportunity
A Series A generative AI company is hiring a Research Engineer to drive optimisation across training strategy and ML infrastructure. The business builds state-of-the-art audio and music generation models and is backed by a leading generative AI fund. The team includes researchers and engineers from Google Brain, Meta FAIR, Amazon, ETH Zürich, and Max Planck.

Role Summary
You will focus on optimising end-to-end training pipelines for large generative models. This includes GPU-level performance tuning, distributed systems work, and efficiency improvements across data, storage, orchestration, and experimentation systems.

Key Responsibilities
  • Develop and refine training strategies, including parallelism approaches and precision choices, for varied model scales and compute profiles
  • Profile, debug, and optimise single- and multi-GPU workloads using tools such as Nsight
  • Improve training pipelines covering data storage, data loading, distributed training, checkpointing, and logging
  • Build scalable systems for experiment tracking, model and data versioning, and experiment insights
  • Design, deploy, and maintain large-scale training clusters using SLURM

Ideal Experience
  • Strong hands-on experience optimising training and inference workloads
  • Deep understanding of GPU memory hierarchy and hardware performance limits
  • Experience tuning both memory-bound and compute-bound operations
  • Knowledge of efficient attention algorithms and their performance implications at different scales

Nice-to-Have
  • Experience writing custom GPU kernels and integrating them into PyTorch
  • Familiarity with diffusion or autoregressive models
  • Understanding of high-performance storage solutions such as VAST
  • Experience running SLURM clusters at scale

Why Apply
  • Work on frontier audio and music generation models
  • Influence training strategy and infrastructure at scale
  • Join a high-calibre research and engineering team