Type: Full-time
About the Company
We're working with a high-growth startup developing AI systems that allow industrial robots to perform tasks they currently cannot, starting with complex warehouse operations like mixed palletizing. Their technology combines deep reinforcement learning (DRL) with modern sequence modeling to tackle control and combinatorial optimization problems where classical approaches fail.
They are a small, highly skilled team. Joining them means having direct impact, minimal bureaucracy, and ownership over core technology that will be deployed in real-world, high-throughput environments.
Role Overview
As the second hire in the DRL team, you will own the end-to-end reinforcement learning stack: from problem formulation to algorithm design, large-scale training, evaluation, and deployment. You will work closely with technical leadership to translate cutting-edge DRL research into production throughput at operational sites.
This role is highly autonomous, requiring a hands-on expert capable of leading experiments, troubleshooting complex issues, and establishing best practices for algorithm development and deployment.
Key Responsibilities
- Design, implement, and ship DRL algorithms (e.g., PPO, SAC, DDQN and variants) incorporating advanced architectures such as encoders, cross-attention, and pointer networks
- Optimize stability and sample efficiency using techniques such as GAE, reward shaping, normalization, entropy/KL control, curriculum learning, and distributional/value-loss tuning
- Set up and manage large-scale training pipelines: multi-GPU training, parallel rollouts, efficient replay/storage, reproducible experiments
- Productionize algorithms with clean, maintainable PyTorch code, profiling, Dockerized services, cloud deployments (AWS), experiment tracking, and dashboards
- Collaborate with leadership to align technology with business goals and customer needs
- Mentor and grow future team members, fostering a culture of technical excellence and innovation
Requirements
- Proven track record delivering DRL systems beyond academic demos: led at least one end-to-end DRL system from concept to production or achieved a state-of-the-art benchmark in the last 3–5 years
- Deep expertise in reinforcement learning and deep learning, with strong PyTorch skills
- Solid understanding of DRL theory: MDPs, Bellman operators, policy gradients, trust-region/KL methods, λ-returns, stability and regularization in on-policy/off-policy regimes
- Systems experience: Python, Linux, multi-GPU training, Docker, cloud deployments (AWS preferred)
- Comfortable taking ownership of experiments, code quality, and results in a small, high-impact team
- PhD in DRL or equivalent experience; candidates with an academic-only background are considered if they demonstrate deep expertise
- Robotics experience is not required
- Production system deployment experience is beneficial but not mandatory
Location
- EU-based (CET ±1) with occasional travel to customer sites
- Preference for candidates in Spain; otherwise, Europe
Interview Process
- Deep Technical Session – with CTO, focused on past DRL work (no coding tests, no homework)
- Traits & Skills Interviews – two 1-hour sessions with co-founders to assess problem-solving, communication, and startup fit
- Team Meet & Offer – final discussion and reference check
Why Join
- Work at the frontier of DRL robotics in real-world, high-throughput industrial applications
- High autonomy, technical ownership, and direct impact on deployed AI systems
- A small, experienced founding team and strong early customer traction reduce commercial risk while keeping the technical challenge high
- Opportunity to join a founding-stage team with equity and influence over core product and technology