Salary: €80,000 to €150,000 plus equity
Location: Fully remote within Europe (CET ±2 hours)
Stage: Recently funded Series A AI startup
We are partnering with a fast-growing generative AI company building the next generation of creative tooling. Their platform generates hyper-realistic sound, speech, and music directly from video, effectively bringing silent content to life. The technology is already being used across gaming, video platforms, and creator ecosystems, with a clear ambition to become foundational infrastructure for audio-visual storytelling.
Backed by top-tier venture capital and fresh Series A funding, the company is now scaling its core engineering group. This is a chance to join at a point where the technical challenges are deep, the scope is wide, and individual impact is unmistakable.
The Role:
As a Training Infrastructure Engineer, you will own and evolve the full model training stack. This is a hands-on, systems-level role focused on making large-scale training fast, reliable, and efficient. You will work close to the hardware and close to the models, shaping how cutting-edge generative systems are trained and iterated on.
What You Will Do:
- Design and evaluate optimal training strategies including parallelism approaches and precision trade-offs across different model sizes and workloads
- Profile, debug, and optimise GPU workloads at the single- and multi-GPU level, using low-level tooling to understand real hardware behaviour
- Improve the entire training pipeline end to end, from data storage and loading through distributed training, checkpointing, and logging
- Build scalable systems for experiment tracking, model and data versioning, and training insights
- Design, deploy, and maintain large-scale training clusters orchestrated with SLURM
What You Will Bring:
- Proven experience optimising training and inference workloads through hands-on implementation, not just theory
- Deep understanding of GPU memory hierarchy and compute constraints, including the gap between theoretical and practical performance
- Strong intuition for memory-bound vs compute-bound workloads and how to optimise for each
- Expertise in efficient attention mechanisms and how their performance characteristics change at scale
- Experience writing custom GPU kernels and integrating them into PyTorch
- Background working with diffusion or autoregressive models
- Familiarity with high-performance storage systems such as VAST or large-scale object storage
- Experience managing SLURM clusters in production environments
Why Join:
- Join at a pivotal growth stage with fresh funding and strong momentum
- Genuine ownership and autonomy from day one, with direct influence over technical direction
- Competitive salary and equity so you share in the upside you help create
- Work on technology that is redefining how creators produce and experience content