Training Infrastructure Engineer
Salary: €80,000 to €150,000 plus equity
Location: Fully remote within Europe (CET ±2 hours)
Stage: Recently funded Series A AI startup

We are partnering with a fast-growing generative AI company building the next generation of creative tooling. Their platform generates hyper-realistic sound, speech, and music directly from video, effectively bringing silent content to life. The technology is already being used across gaming, video platforms, and creator ecosystems, with a clear ambition to become foundational infrastructure for audio-visual storytelling.

Backed by top-tier venture capital and fresh Series A funding, the company is now scaling its core engineering group. This is a chance to join at a point where the technical challenges are deep, the scope is wide, and individual impact is unmistakable.

The Role:

As a Training Infrastructure Engineer, you will own and evolve the full model training stack. This is a hands-on, systems-level role focused on making large-scale training fast, reliable, and efficient. You will work close to the hardware and close to the models, shaping how cutting-edge generative systems are trained and iterated on.

What You Will Do:
  • Design and evaluate optimal training strategies, including parallelism approaches and precision trade-offs, across different model sizes and workloads
  • Profile, debug, and optimise GPU workloads at the single- and multi-GPU level, using low-level tooling to understand real hardware behaviour
  • Improve the entire training pipeline end to end, from data storage and loading through distributed training, checkpointing, and logging
  • Build scalable systems for experiment tracking, model and data versioning, and training insights
  • Design, deploy, and maintain large-scale training clusters orchestrated with SLURM

What We Are Looking For:
  • Proven experience optimising training and inference workloads through hands-on implementation, not just theory
  • Deep understanding of GPU memory hierarchy and compute constraints, including the gap between theoretical and practical performance
  • Strong intuition for memory-bound vs compute-bound workloads and how to optimise for each
  • Expertise in efficient attention mechanisms and how their performance characteristics change at scale

Nice to Have:
  • Experience writing custom GPU kernels and integrating them into PyTorch
  • Background working with diffusion or autoregressive models
  • Familiarity with high-performance storage systems such as VAST or large-scale object storage
  • Experience managing SLURM clusters in production environments

Why This Role:
  • Join at a pivotal growth stage with fresh funding and strong momentum
  • Genuine ownership and autonomy from day one, with direct influence over technical direction
  • Competitive salary and equity, so you share in the upside you help create
  • Work on technology that is redefining how creators produce and experience content

If you want to operate at the intersection of deep systems engineering and frontier generative AI, this is one of the strongest opportunities in the European market right now.