Training Infrastructure Engineer

Salary: €80,000 to €150,000 plus equity
Location: Fully remote within Europe (CET ±2 hours)
Stage: Recently funded Series A AI startup

We are partnering with a fast-growing generative AI company building the next generation of creative tooling. Their platform generates hyper-realistic sound, speech, and music directly from video, effectively bringing silent content to life. The technology is already being used across gaming, video platforms, and creator ecosystems, with a clear ambition to become foundational infrastructure for audio-visual storytelling.

Backed by top-tier venture capital and fresh Series A funding, the company is now scaling its core engineering group. This is a chance to join at a point where the technical challenges are deep, the scope is wide, and individual impact is unmistakable.

The Role:
As a Training Infrastructure Engineer, you will own and evolve the full model training stack. This is a hands-on, systems-level role focused on making large-scale training fast, reliable, and efficient. You will work close to the hardware and close to the models, shaping how cutting-edge generative systems are trained and iterated.

What You Will Do:
- Design and evaluate optimal training strategies including parallelism approaches and precision trade-offs across different model sizes and workloads
- Profile, debug, and optimise GPU workloads at single and multi-GPU level, using low-level tooling to understand real hardware behaviour
- Improve the entire training pipeline end to end, from data storage and loading through distributed training, checkpointing, and logging
- Build scalable systems for experiment tracking, model and data versioning, and training insights
- Design, deploy, and maintain large-scale training clusters orchestrated with SLURM

What We Are Looking For:
- Proven experience optimising training and inference workloads through hands-on implementation, not just theory
- Deep understanding of GPU memory hierarchy and compute constraints, including the gap between theoretical and practical performance
- Strong intuition for memory-bound vs compute-bound workloads and how to optimise for each
- Expertise in efficient attention mechanisms and how their performance characteristics change at scale

Nice to Have:
- Experience writing custom GPU kernels and integrating them into PyTorch
- Background working with diffusion or autoregressive models
- Familiarity with high-performance storage systems such as VAST or large-scale object storage
- Experience managing SLURM clusters in production environments

Why This Role:
- Join at a pivotal growth stage with fresh funding and strong momentum
- Genuine ownership and autonomy from day one, with direct influence over technical direction
- Competitive salary and equity so you share in the upside you help create
- Work on technology that is redefining how creators produce and experience content

If you want to operate at the intersection of deep systems engineering and frontier generative AI, this is one of the strongest opportunities in the European market right now.

Anthony Kelly