AI Inference, Serving & Model Efficiency

DeepRec.ai recruits top engineers specialising in inference performance, scalable serving architectures, and efficiency-critical AI infrastructure.

We identify and place engineers who focus on running AI systems in the real world. Whether they're optimising how models are served and scaled, maintaining inference pipelines, or building real-time AI workloads, these essential roles sit at the intersection of machine learning and infrastructure.

This is a high-value area where adaptability and practical experience matter most: traits that are hard to hire for in deep tech's shallow talent pool, and exactly where the DeepRec.ai team is uniquely positioned to add value. Through our global AI engineering community and our proven delivery experience, we've developed a granular understanding of how strong production engineers operate.

When you need to move beyond standard job titles, narrow your search, and invest real time into identifying the ideal candidate for your roles, DeepRec.ai takes care of it. 

Find incredible candidates:

Talk to a Consultant

Explore the latest jobs in AI Inference and Serving:

Live jobs

Where DeepRec.ai Specialises

Inference, Serving, and Model Efficiency recruitment goals vary widely between organisations, but they share a common focus: running large models reliably and efficiently in production.

DeepRec.ai supports hiring engineers working across areas such as model serving, inference optimisation, and performance-critical AI infrastructure. This often includes engineers building and maintaining serving stacks using frameworks such as vLLM, TensorRT-LLM, TGI, SGLang, Ray Serve, KServe, and BentoML, depending on scale, latency requirements, and deployment environments.

Rather than relying on job titles alone, we focus on engineers with ownership of production systems and measurable real-world impact.

Engineers in this space are responsible for improving performance, reducing cost, and ensuring reliability under real-world workloads.

This commonly involves:

  • Applying optimisation techniques such as quantization (INT8 / FP8), pruning, and distillation

  • Implementing model compression strategies to reduce memory footprint and inference cost

  • Leveraging low-level performance improvements such as FlashAttention and FlashDecoding

  • Tuning inference pipelines for throughput, latency, and hardware efficiency

These decisions are often highly context-dependent, requiring engineers to balance accuracy, speed, cost, and operational complexity.
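To make the first of those techniques concrete, here is a toy sketch of post-training INT8 quantization: floating-point weights are mapped onto 8-bit integers via a single symmetric per-tensor scale, trading a small, bounded rounding error for a much smaller memory footprint. This is plain illustrative Python under simplified assumptions, not the implementation used by any particular serving framework.

```python
# Toy post-training INT8 quantization sketch (symmetric, per-tensor).
# Illustrative only -- production stacks rely on framework kernels
# rather than hand-rolled code like this.

def quantize_int8(weights):
    """Map floats to INT8 values in [-127, 127] with one symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the INT8 values."""
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.635, 0.9, -0.45]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)

# Worst-case round-trip error is bounded by half the scale,
# since each weight is rounded to the nearest quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, approx))
print(q)        # integers in [-127, 127]
print(max_err)  # bounded by scale / 2
```

The engineering judgement lies in choosing where such a scheme is safe: per-tensor scales are simple and fast, while per-channel or FP8 variants trade complexity for accuracy on harder layers.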

Why Choose DeepRec.ai as Your Talent Partner? 

The AI industry often approaches hiring for roles across AI Infrastructure & Distributed Systems as an extension of traditional ML or backend recruitment. Generic job titles are reused, CVs are screened for surface-level familiarity, and critical production experience is assumed rather than validated.

In reality, these roles demand engineers who can operate under real-world constraints, which means balancing latency, throughput, reliability, and cost in production AI systems. Hiring successfully requires context, judgement, and a deep understanding of how these systems behave at scale.

This is where DeepRec.ai adds value. We specialise in identifying engineers who have built, operated, and optimised production AI systems, not just experimented with them. Our experience delivering complex hiring mandates in performance-critical AI environments allows us to assess beyond titles and tooling, focusing instead on real-world capability and impact.

When you partner with DeepRec.ai, you get: 

  • A dedicated delivery team who specialise purely in Inference, Serving & Efficiency across AI infrastructure and distributed systems, meaning faster shortlists and higher-confidence hiring decisions.

  • The niche expertise of a boutique agency, but the resilience and resources of a global brand. We're part of Trinnovo Group, an international staffing business that provides the operational scale, governance, and delivery capability required to support business-critical hiring initiatives.

  • Adaptable recruitment models to suit your unique business goals, from embedded solutions for high-volume hiring through to executive search for critical leadership hires.

  • Access to a global AI engineering community of engaged, qualified, and production-ready engineers. 

  • A consultative, delivery-first approach to recruitment.

Check out our case studies

Roles We Recruit For

We support hiring across a range of production-focused AI engineering roles, including:

  • AI Inference Engineers

  • Model Serving Engineers

  • AI Infrastructure Engineers

  • Backend Engineers supporting AI workloads

  • Distributed Systems Engineers working on AI platforms

  • Performance and optimisation-focused AI engineers

Common Use Cases We Support

Inference, serving, and model efficiency hiring is most critical for teams:

  • Scaling LLM-powered products into production

  • Operating real-time or low-latency AI systems

  • Managing high-throughput inference workloads

  • Optimising infrastructure cost as AI usage grows

  • Building internal AI platforms or developer tooling

We work with teams where inference performance and system reliability directly affect product quality and commercial outcomes.

FAQ

What makes inference and serving roles difficult to hire for?

These roles require hybrid skill sets across ML, backend engineering, and infrastructure, combined with real-world production experience that is difficult to validate through CVs alone.

Do you recruit for MLOps roles?

Yes. We specialise in MLOps hiring alongside inference, serving, and AI infrastructure roles, supporting teams responsible for deploying, operating, and maintaining production AI systems.

Do you support startup and enterprise hiring?

Yes. We work with startups, scale-ups, and established organisations where production AI systems are business-critical.

Can you support confidential or business-critical hires?

Absolutely. We regularly deliver complex and sensitive hiring mandates where discretion and precision are essential.

Which locations do you service? 

We primarily deliver recruitment services across the UK, Ireland, the DACH region, and the United States, where we have deep market knowledge and an established presence. Alongside this, we regularly deliver AI infrastructure, inference, serving, and MLOps hiring mandates on a global basis.

Ready to Build Production-Grade AI Teams?

If you’re building or scaling AI systems where performance and reliability matter, DeepRec.ai can help.

Speak with a specialist


AI INFERENCE, SERVING & MODEL EFFICIENCY CONSULTANTS

Anthony Kelly

Co-Founder & MD EU/UK

Sam Warwick

Senior Consultant - ML Systems + AI Infra

Jacob Graham

Senior Consultant

LATEST JOBS

Massachusetts, United States
BMS & AI Edge Software Engineer
BMS & AI Edge Software Engineer
Battery Systems | AI for Science | Energy Storage

Our client is a publicly listed, AI-driven energy technology company operating at the intersection of advanced materials science, battery engineering, and machine learning. Their mission is simple but ambitious: accelerate the global energy transition by using AI to fundamentally change how batteries are designed, validated, and operated. They are pioneers in applying AI directly to battery chemistry, materials discovery, and battery management systems, enabling next-generation Li-ion and Li-metal batteries across transportation, energy storage, robotics, aviation, and defense-adjacent applications.

The Opportunity

Our client's Energy Storage Systems R&D group is seeking a BMS & AI Edge Software Engineer to design and deploy AI-centric State of X (SoX) algorithms that run on edge devices. This role sits squarely between battery physics, embedded software, and applied machine learning. You will own algorithm development from concept through edge deployment, working closely with battery scientists, hardware engineers, and customer-facing teams to bring production-ready software into real-world environments.

Key Responsibilities

Algorithm R&D for SoX

  • Design and implement SoX architectures covering charge, health, power, safety, degradation, and related metrics
  • Translate models and logic into production-grade code running on edge devices
  • Collaborate with battery physicists and engineers on model selection and validation

Model Design & Optimization

  • Research and evaluate alternative algorithms to improve accuracy, robustness, and performance
  • Optimize models and software for real-world operating constraints
  • Present results internally and demonstrate measurable improvements

Verification & Delivery

  • Test and validate software as a production-ready product using defined methodologies
  • Support validation at customer sites or manufacturing plants as required
  • Engage directly with customers to support deployment and technical approval

Requirements

Education

  • PhD or Master's in Electrical Engineering, Computer Science, AI, or a closely related field
  • Equivalent hands-on industry experience will be considered

Experience

  • 5 to 9 years of experience in Li-ion batteries, BMS, or ESS software engineering (10 years for Senior level)
  • Strong background in BMS sensing and control software, including voltage, temperature, current, and diagnostics
  • Solid understanding of battery chemistries and characteristics such as OCV, C-rate behavior, and impedance
  • Experience developing data-driven or AI-based algorithms for battery systems, ideally deployed on edge or cloud
  • Proven experience coding, integrating, validating, and delivering production software
  • Exposure to customer-facing delivery or deployment projects

Preferred Background

  • Battery characterization methods such as GITT, dQ/dV, or similar
  • Power electronics knowledge, including DC/DC or DC/AC conversion
  • Familiarity with power delivery architectures such as UPS or battery backup systems for data centers

What's On Offer

  • Highly competitive base salary and strong benefits
  • Meaningful equity participation in a publicly listed business
  • Direct impact on globally relevant energy and sustainability challenges
  • Work alongside leading experts in AI, battery science, and engineering
  • Long-term growth opportunities in a technically serious R&D environment
Sam Warwick