This is an opportunity to join one of the smartest, most ambitious teams in the AI space. Founded in 2023, this fast-growing research and product company is already being talked about alongside some of the biggest names in foundation model development. They’re building powerful, intelligent agent systems and frontier-scale models - and they believe software engineering is the most direct path toward achieving AGI.
With major backing from industry leaders, significant compute infrastructure, and a focus on mission-critical enterprise and public-sector environments, they’re tackling some of the hardest AI challenges out there.
The Role
As a Member of Technical Staff (Pre-Training / Data), you’ll join a high-performing Data team within the Applied Research group that powers the company’s pre-training and reinforcement learning breakthroughs. Your goal: build the datasets that make better models possible. This is a hands-on, deeply technical role at the intersection of data engineering, research, and large-scale systems.
What You’ll Do
- Build, scale, and refine massive datasets of natural language and source code to train next-generation language models
- Work closely with pre-training, RL, and infrastructure teams to validate your work through fast feedback loops
- Stay ahead of the curve on data generation, curation, and pre-training strategies
- Develop systems to ingest, filter, and structure billions of tokens across diverse sources
- Design controlled experiments that help uncover what works and what doesn’t
- Be a core voice in shaping how the team approaches data for model training - a vital part of their long-term AGI mission
What You Bring
- Solid hands-on experience with large language models or large-scale ML systems
- Strong track record building or working with massive datasets - from raw extraction through to filtering and packaging
- Exposure to training models from scratch - ideally using distributed GPU clusters
- Proficiency in Python and ML frameworks like PyTorch or JAX, plus confidence working in Linux, Git, Docker, and cloud/HPC environments
- Great if you also have some background in C++/CUDA, Triton kernels, or GPU debugging
- You’re a thinker and a builder - someone who can read the latest paper and turn it into something real, quickly
What’s In It for You
- Fully remote (US)
- 37 days of paid time off annually
- Comprehensive health cover for you and your dependents
- Monthly team meetups - travel, accommodation, and even family attendance covered
- Home office and wellbeing budget
- A competitive salary plus meaningful equity
- The chance to work with some of the brightest minds in AGI and do genuinely original work
What the Process Looks Like
- Recruiter intro call
- First technical interview focused on LLMs, performance, or core engineering skills
- Second technical deep dive into your domain (pre-training, data, scaling, etc.)
- Culture conversation with the founding engineers
- Final discussion on compensation and alignment
If you’re driven by building systems that could reshape how intelligence works - and you want to be surrounded by people who share that fire - this team is where you belong.