Location: Europe (strong preference for Spain, ideally Madrid)
Type: Full-time

About the Company

We're working with a high-growth startup developing AI systems that allow industrial robots to perform tasks they currently cannot, starting with complex warehouse operations like mixed palletizing. Their technology combines deep reinforcement learning (DRL) with modern sequence modeling to tackle control and combinatorial optimization problems where classical approaches fail.

They are a small, highly skilled team. Joining them means having direct impact, minimal bureaucracy, and ownership of core technology that will be deployed in real-world, high-throughput environments.

Role Overview

As the second hire in the DRL team, you will own the end-to-end reinforcement learning stack: problem formulation, algorithm design, large-scale training, evaluation, and deployment. You will work closely with the technical leadership to turn cutting-edge DRL research into production systems that deliver measurable throughput at operational sites.

This role is highly autonomous, requiring a hands-on expert capable of leading experiments, troubleshooting complex issues, and establishing best practices for algorithm development and deployment.

Key Responsibilities

Design, implement, and ship DRL algorithms (e.g., PPO, SAC, DDQN and variants) incorporating advanced architectures such as encoders, cross-attention, and pointer networks
Optimize stability and sample efficiency using techniques such as GAE, reward shaping, normalization, entropy/KL control, curriculum learning, and distributional/value-loss tuning
Set up and manage large-scale training pipelines: multi-GPU training, parallel rollouts, efficient replay/storage, reproducible experiments
Productionize algorithms with clean, maintainable PyTorch code, profiling, Dockerized services, cloud deployments (AWS), experiment tracking, and dashboards
Collaborate with leadership to align technology with business goals and customer needs
Mentor and grow future team members, fostering a culture of technical excellence and innovation

Required Qualifications

Proven track record delivering DRL systems beyond academic demos: led at least one end-to-end DRL system from concept to production or achieved a state-of-the-art benchmark in the last 3–5 years
Deep expertise in reinforcement learning and deep learning, with strong PyTorch skills
Solid understanding of DRL theory: MDPs, Bellman operators, policy gradients, trust-region/KL methods, λ-returns, stability and regularization in on-policy/off-policy regimes
Systems experience: Python, Linux, multi-GPU training, Docker, cloud deployments (AWS preferred)
Comfortable taking ownership of experiments, code quality, and results in a small, high-impact team
PhD in DRL or equivalent experience; strong academic-only candidates are considered if they demonstrate deep expertise

Nice to Have

Robotics experience (a plus, not required)
Production system deployment experience (beneficial, not mandatory)

Location & Travel

EU-based (CET ±1) with occasional travel to customer sites
Preference for candidates in Spain; otherwise, Europe

Competitive Compensation & Real Equity Offered.

Interview Process

Deep Technical Session – with CTO, focused on past DRL work (no coding tests, no homework)
Traits & Skills Interviews – two 1-hour sessions with co-founders to assess problem-solving, communication, and startup fit
Team Meet & Offer – final discussion and reference check

Why This Role is Exciting

Work at the frontier of DRL for robotics in real-world, high-throughput industrial applications
High autonomy, technical ownership, and direct impact on deployed AI systems
Small, experienced founding team and strong early customer traction reduce commercial risk while maximizing technical challenge
Opportunity to join a founding-stage team with equity and influence over core product and technology

Deep Reinforcement Learning Engineer
