Location: Europe (strong preference for Spain, ideally Madrid)
Type: Full-time

About the Company
We're working with a high-growth startup developing AI systems that allow industrial robots to perform tasks they currently cannot, starting with complex warehouse operations such as mixed palletizing. Their technology combines deep reinforcement learning (DRL) with modern sequence modeling to tackle control and combinatorial optimization problems where classical approaches fail. They are a small, highly skilled team. Joining them means direct impact, minimal bureaucracy, and ownership of core technology that will be deployed in real-world, high-throughput environments.

Role Overview
As the second hire on the DRL team, you will own the end-to-end reinforcement learning stack: problem formulation, algorithm design, large-scale training, evaluation, and deployment. You will work closely with the technical leadership to turn cutting-edge DRL research into practical gains in production throughput at operational sites. This role is highly autonomous and calls for a hands-on expert who can lead experiments, troubleshoot complex issues, and establish best practices for algorithm development and deployment.
Key Responsibilities
Design, implement, and ship DRL algorithms (e.g., PPO, SAC, DDQN, and variants) incorporating advanced architectures such as encoders, cross-attention, and pointer networks
Optimize stability and sample efficiency using techniques such as GAE, reward shaping, normalization, entropy/KL control, curriculum learning, and distributional/value-loss tuning
Set up and manage large-scale training pipelines: multi-GPU training, parallel rollouts, efficient replay/storage, and reproducible experiments
Productionize algorithms with clean, maintainable PyTorch code, profiling, Dockerized services, cloud deployments (AWS), experiment tracking, and dashboards
Collaborate with leadership to align technology with business goals and customer needs
Mentor and grow future team members, fostering a culture of technical excellence and innovation

Required Qualifications
Proven track record of delivering DRL systems beyond academic demos: having led at least one end-to-end DRL system from concept to production, or having achieved a state-of-the-art benchmark result, within the last 3–5 years
Deep expertise in reinforcement learning and deep learning, with strong PyTorch skills
Solid understanding of DRL theory: MDPs, Bellman operators, policy gradients, trust-region/KL methods, λ-returns, and stability and regularization in on-policy and off-policy regimes
Systems experience: Python, Linux, multi-GPU training, Docker, cloud deployments (AWS preferred)
Comfortable taking ownership of experiments, code quality, and results in a small, high-impact team
A PhD or equivalent experience in DRL is acceptable; strong academic-only candidates will be considered if they demonstrate deep expertise

Nice to Have
Robotics experience (not required)
Production system deployment experience (beneficial but not mandatory)

Location & Travel
EU-based (CET ±1), with occasional travel to customer sites
Preference for candidates in Spain; otherwise, elsewhere in Europe

Competitive compensation and real equity offered.

Interview Process
Deep Technical Session: with the CTO, focused on past DRL work (no coding tests, no homework)
Traits & Skills Interviews: two 1-hour sessions with the co-founders to assess problem-solving, communication, and startup fit
Team Meet & Offer: final discussion and reference check

Why This Role is Exciting
Work at the frontier of DRL robotics in real-world, high-throughput industrial applications
High autonomy, technical ownership, and direct impact on deployed AI systems
A small, experienced founding team and strong early customer traction reduce commercial risk while maximizing the technical challenge
Opportunity to join a founding-stage team with equity and influence over the core product and technology
Paddy Hobson