We are fully licensed across the UK, Ireland, Switzerland, Germany and the USA, enabling us to support customers with compliant cross-border talent acquisition.

Introducing DeepRec.ai's Hertfordshire HQ, our basecamp for connecting incredible candidates with Deep Tech opportunities across the globe.
Hayley Killengrey
Hi, I'm Hayley
Co-Founder & MD USA

CUSTOMERS SUPPORTED IN BISHOP'S STORTFORD

MEET THE TEAM

Anthony Kelly

Co-Founder & MD EU/UK

Hayley Killengrey

Co-Founder & MD USA

Nathan Wills

Team Lead | Switzerland

Paddy Hobson

Team Lead | DACH

Sam Oliver

Principal AI Consultant | DACH Contract

Jonathan Harrold

Principal Consultant | DACH

Harry Crick

Principal Consultant | USA

Sam Warwick

Senior Consultant - ML Systems + AI Infra

Benjamin Reavill

Consultant - US

George Templeman

Senior Consultant

Edward Killin

Principal Recruitment Consultant

Andrew Brophy

Recruitment Consultant

Luke Weekes

Senior Consultant

Viki Dowthwaite

Commercial Director

Marita Harper

HR Partner

Micha Swallow

Head of Talent, People, & Performance

Aaron Gonsalves

Head of Talent

Sabrina Jones

Commercial Payroll Lead

Matthew Goddard

Head of Legal & Compliance

David Rodwell

Senior Recruitment Consultant

Oliver Perry

COO

SALARY GUIDE

Built with fresh insights from our talent network, we developed this guide for anyone hoping to benchmark salaries, align remuneration with the wider market, or learn more about the trends and opportunities across the German Deep Tech space. Download your copy here:  

DOWNLOAD

LATEST JOBS

Berlin, Germany
Senior Inference Optimization Engineer
About our client:
Our client is a fast-scaling automation platform that operates cloud-native and AI infrastructure at scale. By embedding autonomous decision-making directly into Kubernetes and cloud environments, the platform continuously optimizes performance, reliability, and efficiency in production, replacing tickets, alerts, and manual tuning with continuous automation that adapts infrastructure as conditions change. The company is trusted by over two thousand organizations, including a number of globally recognized enterprises across technology, automotive, media, and financial services. It operates as a distributed, international team spanning more than thirty countries across Europe, North America, Latin America, and APAC. The business recently reached unicorn status following a strategic investment from a major corporate venture arm, with a valuation now in excess of one billion dollars and strong momentum behind its next phase of growth.

About the role:
Throughput. Latency. KV cache utilization. Move those three numbers in the right direction, and two things happen. Customers get faster, cheaper inference, and our client's margins improve. That is the entire thesis of this role. Every kernel you tune, every quantization scheme you ship, and every scheduler tweak you land shows up directly in a customer's p99 and on the P&L.

This is a high-impact seat, and a high-autonomy one. You will be given the room to lead the technical direction of inference optimization rather than execute someone else's roadmap.

The problem is that running LLMs in production is a moving target. The right model and serving configuration for a workload depend on traffic shape, sequence-length distribution, batch dynamics, GPU SKU, memory bandwidth, quantization tolerance, and a dozen other variables that shift week to week. Most teams pick a model once, over-provision GPUs, and absorb the cost. Our client's system makes that decision automatically, continuously matching workloads to the most cost-efficient, best-performing LLM and serving configuration on a customer's infrastructure. The team is building the optimization layer between the model and the hardware, and needs engineers who understand both sides deeply.

Stack
Python; vLLM; SGLang; TensorRT-LLM; PyTorch; CUDA-adjacent tooling; Kubernetes; gRPC; ClickHouse; PostgreSQL; GCP Pub/Sub; AWS, GCP, and Azure; GitLab CI; ArgoCD; Prometheus; Grafana; Loki; Tempo.

Requirements
Five or more years building real ML systems, with a portfolio that shows depth in inference or training infrastructure, not just model training notebooks.
Strong Python, with experience building production services rather than scripts.
Hands-on experience with at least one of vLLM, SGLang, or TensorRT-LLM, and a working mental model of why an inference engine performs the way it does on a given GPU.
Fluency with quantization tradeoffs. You have measured quality regressions, not just compression ratios.
Comfort with distributed systems, including collective communication, sharding strategies, and the practical failure modes of multi-GPU and multi-node setups.
A bias toward measurement. You instrument before you optimize, and you can tell the difference between a real win and a benchmark artifact.
Self-direction. This role comes with a wide mandate, and you should be excited by that rather than unsettled by it.

Responsibilities
Push throughput. Continuous batching, speculative decoding, chunked prefill, and kernel-level tuning across vLLM, SGLang, and TensorRT-LLM. Find the ceiling on each GPU SKU, then raise it.
Cut latency. Attack TTFT and TPOT separately. Profile, identify the actual bottleneck whether compute, memory bandwidth, scheduling, or networking, and fix it rather than the bottleneck you assumed.
Get more out of the KV cache. Paged attention, prefix caching, eviction policies, cache reuse across requests, and quantized KV. This is where a lot of the unrealized throughput lives, and it is an area you will own.
Quantize without regressing quality. INT8, INT4, and FP8 across weights, activations, and KV. Empirical work that measures quality on real workloads, not just perplexity benchmarks.
Shrink cold starts and memory footprint. Faster init, smarter weight loading, and tighter memory accounting, which is the difference between a model that scales and one that does not.
Scale across nodes. Distributed inference topologies, network-aware placement, and checkpointing strategies that do not bottleneck on storage or interconnect.
Set the technical direction. Decide what to benchmark, what to adopt, and what to build in-house. Bring the team along with strong writeups and reproducible experiments.
Sam Warwick
Palo Alto, California, United States
Senior Inference Engineer
Senior Inference Engineer
AI Video Generation Company (Stealth) | Palo Alto, CA | Hybrid

About the Role
We are seeking a Senior Inference Engineer to accelerate the performance of our AI-driven video generation products. In this highly technical role, you will operate at the intersection of cutting-edge inference acceleration, GPU parallelism, advanced model deployment, and video generation technologies. Your expertise will drive significant improvements to model speed and efficiency, ensuring our creative AI systems deliver industry-leading user experiences at scale.

You will design and optimize inference pipelines, implement state-of-the-art acceleration techniques, and work closely with researchers and engineers across the team to push the boundaries of what's possible in real-time AI deployment. Your efforts will play a foundational role in powering the next generation of our video and language models.

What You'll Do
Accelerate Inference: Lead and implement advanced inference acceleration techniques, including attention optimization and quantization for efficient model serving.
Maximize GPU Parallelism: Engineer and optimize GPU strategies across tensor, sequence, and pipeline parallelism (TP, SP, PP) for maximal efficiency and scalability.
Programming for Performance: Develop and optimize high-performance computing kernels and distributed workloads using CUDA and NCCL.
Advance AI Deployment: Collaborate with research and engineering teams to bring state-of-the-art video generation and large language models into production.
Improve Training Efficiency: Contribute to improvements in model training speed, stability, and resource utilization as part of our deployment lifecycle. (Bonus)
Technical Excellence: Drive rigorous code reviews, participate in technical discussions, and mentor fellow engineers on best practices in inference and GPU programming.

What We're Looking For
Experience: 5 years of engineering experience, with a strong track record in inference acceleration and model deployment at scale.
Inference Mastery: Proven expertise in inference optimization, including quantization, attention acceleration, and deep learning compiler stacks.
GPU and Parallelism: Deep knowledge of GPU programming (CUDA, NCCL) and experience with SP, TP, PP, and other forms of parallelism for distributed inference.
AI Domain Knowledge: Familiarity with video generation models and large language models (LLMs).
Collaboration: Strong cross-discipline communication skills; able to drive shared goals across research and engineering functions.
Ownership Mindset: Self-driven, solutions-oriented, and capable of managing ambiguity in a fast-paced startup environment.

Nice to Have
Experience with high-throughput video or real-time streaming model deployment.
Familiarity with distributed training and optimization toolkits.
Contributions to open source projects in AI infrastructure or deep learning compilers.
Startup or rapid prototyping experience.

What We Offer
Competitive salary commensurate with AI industry benchmarks.
Equity in a fast-growing company shaping the future of generative AI.
Comprehensive health benefits, monthly stipends, and company retreats.
A collaborative, in-office culture focused on building and shipping together.

About the Company
A well-funded, early-stage AI video generation startup headquartered in Palo Alto, CA. The team is building technology to make video creation seamless, intuitive, and universally accessible through the transformative power of AI. Tight-knit and highly energetic, the company values efficiency, intellectual curiosity, and the ambition to make a meaningful impact on the world.
Sam Warwick
Palo Alto, California, United States
Staff Software Engineer (AI Infrastructure)
Staff/Lead Software Engineer, AI Infrastructure

About the Company
A well-funded Bay Area AI startup operating at the frontier of generative media, with a product shipping to users at scale. The company is building the core infrastructure that powers its AI capabilities, and this is a senior, high-ownership hire on that team.

About the Role
This is a critical hire to build and scale the infrastructure behind the company's AI capabilities. You'll lead the design and implementation of GPU infrastructure, AI model serving APIs, and general AI infrastructure execution, enabling the machine learning features that drive the product.

You'll architect robust, distributed systems optimized for high-performance AI workloads, large-scale GPU orchestration, and low-latency, reliable API serving. Your work will directly shape how users experience generative AI at scale. As a senior technical leader, you'll also mentor engineers, drive best practices, and set the technical vision for AI infrastructure.

What You'll Do
Design, develop, and maintain scalable GPU infrastructure for training and serving state-of-the-art AI models.
Architect and optimize high-throughput, low-latency APIs for AI model serving and inference.
Lead the orchestration, scheduling, and efficient utilization of heterogeneous GPU resources across clusters.
Build and support robust systems for model deployment, monitoring, scaling, and reliability in production.
Collaborate with ML, backend, and platform engineering teams to deliver seamless AI-powered product features.
Drive technical direction, code reviews, and mentorship across the AI Infrastructure team.

What We're Looking For
5 years as a software engineer working on systems infrastructure, including hands-on ML serving and GPU orchestration.
Deep knowledge of distributed systems, Kubernetes (or similar orchestration frameworks), and cloud-native infrastructure (AWS/GCP/Azure).
Proven expertise building and optimizing APIs for large-scale AI model serving (TensorFlow Serving, Triton, TorchServe, or similar).
Familiarity with the challenges of high-throughput, scalable GPU fleet management, scheduling, and efficient model execution.
Proficiency in backend languages such as Python, Go, or C++, with experience optimizing for performance and reliability.
Ownership mentality and the drive to solve complex problems independently in ambiguous, high-growth environments.
Excellent communication, collaboration, and mentorship skills.

Nice to Have
Experience with multi-modal AI model infrastructure (LLMs, generative models, video/image/speech models).
Background building infra for multi-tenant SaaS, enterprise AI/ML platforms, or operational automation at scale.
Previous startup experience, or a track record leading high-impact projects through ambiguity and rapid iteration.
Experience with competitive coding or large-scale distributed computing environments.
Sam Warwick
Philadelphia, Pennsylvania, United States
Machine Learning Engineer (Inference Optimization)
Machine Learning Engineer – Inference Optimization

Overview
We are looking for a Machine Learning Engineer focused on low-latency inference optimization to help build, tune, and productionize high-performance model serving systems. This role sits at the intersection of machine learning, systems engineering, and GPU performance. You will work on inference workloads where latency, throughput, reliability, and hardware efficiency all matter, and where a deep understanding of modern inference runtimes can meaningfully improve production outcomes.

You will work closely with researchers and engineers to understand model structure, identify inference bottlenecks, and turn research ideas into efficient production systems. The work may involve other types of models, but focuses on transformer-style architectures and structured inference workloads. You will evaluate and tune frameworks and related serving or compilation systems, while also reasoning about GPU execution, memory layout, batching strategies, precision tradeoffs, and end-to-end latency.

What you'll do:
Design, build, and optimize low-latency inference systems for production machine learning workloads.
Profile model inference pipelines across model execution, runtime configuration, batching, memory movement, serialization, networking, and I/O.
Evaluate, integrate, and tune inference runtime systems.
Improve latency, throughput, and GPU utilization for production inference workloads.
Build and support benchmarking and profiling tools to compare model variants, hardware targets, runtime configurations, and deployment strategies.
Debug performance issues involving GPU memory, compute saturation, kernel behavior, CPU/GPU coordination, data movement, and serving-layer overhead.
Help shape model and system design choices so that research models are efficient to deploy under real latency constraints.
Where necessary, collaborate with lower-level systems or GPU specialists on custom operators, kernel-level optimization, or hardware-specific performance work.

What we're looking for:
Experience deploying, optimizing, or operating machine learning inference workloads in production or production-like environments.
Programming experience in Python, Java, C#, etc., and at least one systems language such as C, C++, Rust, or Go.
Solid understanding of modern ML frameworks such as PyTorch, including model execution, export, tracing, compilation, and performance profiling.
Ability to reason about latency, throughput, batching, memory use, GPU utilization, and reliability under real workloads.
Strong practical judgment around tradeoffs between model quality, latency, throughput, implementation complexity, and maintainability.

Preferred qualifications:
Experience optimizing inference for latency-sensitive or high-throughput applications.
Experience with model optimization techniques such as quantization, pruning, distillation, operator fusion, graph lowering, custom operators, or model compilation.
Exposure to CUDA, Triton language, ROCm, PTX, CuTe, CUTLASS, FlashInfer, or similar low-level GPU programming tools.
Experience running inference workloads on Kubernetes or GPU clusters, including scheduling, autoscaling, observability, and resource management.
Background in mathematics, physics, computer science, engineering, statistics, or another technical field.
Demonstrated ability to improve real-world inference performance beyond a baseline framework implementation.
Sam Warwick
London, Greater London, South East, England
Machine Learning Research Engineer
Machine Learning Research Engineer
Location: London (Hybrid)
Job Type: Full-Time
Salary: £80,000 - £100,000

We're partnering with an innovative deep-tech organisation developing next-generation computing technology and advanced AI solutions. As the company continues to expand its research capabilities, they're looking to appoint a Machine Learning Research Engineer to help drive the development of novel machine learning algorithms and applications.

This is a rare opportunity to work on cutting-edge technology at the intersection of machine learning, advanced computing, and emerging hardware platforms, helping to solve complex real-world challenges for customers and partners worldwide.

The Opportunity
As a Machine Learning Research Engineer, you'll develop and benchmark new machine learning approaches, with a particular focus on generative AI and advanced neural network architectures. You'll work alongside a multidisciplinary team of researchers and engineers, translating innovative ideas into practical solutions and customer-facing applications.

Key Responsibilities
Develop, test and benchmark advanced machine learning algorithms.
Work with generative AI models including diffusion models, flow models and GANs.
Design and evaluate novel hybrid machine learning architectures.
Collaborate with customers and strategic partners to identify and solve complex technical challenges.
Contribute to software products through the development of new algorithms, tools and examples.
Support research activities through experimentation, analysis and publication of results.
Contribute to the creation and protection of intellectual property arising from your work.

Essential Requirements
Proven experience developing and benchmarking machine learning algorithms.
Hands-on experience with diffusion models, flow models and/or GANs.
Strong programming skills in Python and PyTorch.
Experience using version control systems such as Git.
MSc or PhD in Computer Science, Machine Learning, Physics, Mathematics or a related discipline.
Excellent communication skills and the ability to work effectively within multidisciplinary teams.

Desirable Experience
Experience working with advanced computing infrastructure including HPC, GPUs, NPUs, ASICs or other specialist hardware.
Experience training and optimising multi-GPU models.
Published research in machine learning or a related field.
Experience working directly with customers or external stakeholders.
An interest in or understanding of quantum computing technologies.

What's On Offer
The opportunity to work on genuinely innovative technology at the forefront of AI and advanced computing.
A collaborative environment alongside highly experienced researchers and engineers.
Exposure to cutting-edge machine learning research and commercial applications.
Competitive salary and benefits package.
Hybrid working based in London.
Edward Killin
Michigan, United States
Senior Quantum Error Correction Theorist
Summary
Seeking an experienced Technical Lead in Quantum Error Correction to drive the development of fault-tolerant quantum computing architectures. This role combines deep technical expertise with scientific leadership, guiding research direction, influencing system architecture, and collaborating across multidisciplinary teams. It is ideal for a senior researcher who wants to be a recognised technical leader without extensive people management responsibilities.

Responsibilities
Lead the development of quantum error correction strategies for fault-tolerant quantum computing.
Design and optimise hardware-aware error correction protocols under realistic noise and system constraints.
Develop simulation and modelling tools to support system design and technology decisions.
Provide technical leadership and mentorship across multidisciplinary research teams.
Collaborate with physicists, engineers, and computer scientists to translate theory into practical solutions.
Influence technical roadmaps by evaluating emerging approaches and identifying new research opportunities.
Contribute to scientific publications, technical strategy, and intellectual property.

Requirements
PhD in Physics, Applied Physics, Computer Science, Electrical Engineering, or a related field.
Extensive experience in quantum error correction, ideally within an industrial research environment.
Proven track record of technical leadership in quantum computing research.
Strong scientific programming and simulation skills.
Experience leading cross-functional research projects.
Excellent communication skills and the ability to influence technical direction.

Desirable Experience
Experience translating research into commercial technologies or product roadmaps.
Hands-on experience implementing quantum error correction on quantum hardware.
Expertise in quantum decoding and hardware-aware error correction protocols.

Benefits
Competitive salary and equity package.
Visa sponsorship where applicable.
Health and wellbeing benefits.
Flexible paid time off.
Opportunity to shape the future of fault-tolerant quantum computing within a collaborative, innovation-driven environment.
George Templeman
Netherlands
Customer Success Team Lead
Summary
Seeking an experienced Team Lead to manage and scale a technical customer success team supporting advanced hardware and software technologies. The role combines leadership, operational excellence, customer success strategy, and cross-functional collaboration to deliver exceptional customer experiences while driving continuous process improvement and service scalability.

Responsibilities
Lead, mentor, and develop a high-performing team of technical customer success engineers.
Design, implement, and improve scalable customer success processes and service models.
Develop and maintain standard operating procedures, playbooks, and operational best practices.
Oversee the customer lifecycle, including onboarding, training, ongoing support, and customer engagement.
Ensure timely resolution of complex technical issues involving hardware and software products.
Act as the voice of the customer by gathering feedback and communicating product improvements to engineering and product teams.
Collaborate with sales, engineering, and product teams to align customer priorities and support product development.
Support operational growth across multiple regions while promoting consistent global standards.

Requirements
Significant experience leading customer success, technical support, or customer engineering teams.
Proven ability to build, manage, and scale high-performing technical teams.
Experience supporting complex hardware and software products.
Background in designing and implementing customer service models, service packages, or service level agreements (SLAs).
Experience working in fast-paced start-up or scale-up environments, with an understanding of organisational growth.
Strong leadership, coaching, communication, and stakeholder management skills.
Ability to develop structured processes and drive operational excellence.
Experience with high-performance computing (HPC), data centre technologies, or quantum technologies is advantageous.

Benefits
Opportunity to lead and shape a growing technical customer success function.
Work with cutting-edge deep technology in a collaborative environment.
Career development within a rapidly growing, innovation-focused organisation.
Exposure to international customers and cross-functional teams.
Competitive salary and benefits package.
George Templeman
Paris, Ile De France, France
Quantum Information Scientist
Researcher – Theoretical Quantum Optics and Quantum Information

Seeking a theoretical researcher to develop quantum optics and quantum information approaches for advanced spectroscopy applications. The role focuses on designing novel quantum protocols, performing theoretical analysis, and collaborating with experimental and interdisciplinary research teams to advance next-generation quantum-enabled sensing technologies.

Responsibilities
Develop theoretical quantum protocols using discrete-variable and continuous-variable quantum systems, including non-Gaussian states.
Perform theoretical modelling, calculations, and literature reviews to support research objectives.
Collaborate closely with experimental researchers to align theoretical developments with practical implementation.
Work with external academic collaborators on cutting-edge quantum research.
Contribute to the development of new theoretical approaches for quantum-enhanced spectroscopy.
Publish high-quality scientific research and support knowledge dissemination.

Requirements
PhD in Quantum Information, Quantum Optics, or a closely related field with a strong theoretical focus.
Postdoctoral research experience is desirable but not essential.
Strong background in theoretical quantum optics and quantum information.
Experience with quantum state modelling, including discrete-variable and continuous-variable systems.
Excellent analytical, problem-solving, and scientific communication skills.
Ability to work independently while collaborating effectively within multidisciplinary research teams.
Curiosity, creativity, and a proactive approach to tackling open-ended research challenges.

Benefits
Competitive salary based on experience.
Equity or share option opportunities.
Opportunity to contribute to pioneering research with real-world applications.
Collaboration with leading researchers across academia and industry.
Professional development through publication and interdisciplinary research.
Early-stage environment offering significant ownership and long-term growth opportunities.
George Templeman