This is a hands-on, builder-focused role. You will be designing and training models that solve real clinical and operational problems, integrating structured and unstructured data, and shaping the long-term ML roadmap as the company scales its US product.

What you will be doing

Data preprocessing
- Clean, transform, and prepare large, complex healthcare datasets for ML model development.
- Handle missing values, outlier detection, feature engineering, and normalization at scale.
- Identify, collect, and curate relevant industry-specific datasets for retraining and fine-tuning.
- Format data appropriately for the chosen LLM and training pipeline.
- Design, train, and fine-tune LLMs on extensive healthcare data to solve specific clinical or operational problems.
- Set up and manage the training environment, including GPU instances and supporting tooling.
- Fine-tune pre-trained LLMs on custom datasets to hit specific objectives.
- Run hyperparameter experiments (learning rate, batch size, training epochs) to optimize performance.
- Integrate structured and unstructured data into multimodal and multi-input models.
- Evaluate model performance using appropriate metrics, identify gaps, and implement targeted optimizations.
- Build and maintain robust, scalable data and ML pipelines spanning training, inference, and deployment.
- Collaborate closely with data scientists, clinicians, and software engineers to integrate models into production.
- Maintain clear documentation of models, pipelines, and experimental results.
- 5 years of experience in Machine Learning Engineering or a comparable role.
- Proven experience with large-scale data preprocessing, LLM and model training, and fine-tuning.
- Distributed training experience with PyTorch Distributed, DeepSpeed, Ray, or Hugging Face Accelerate.
- GPU/TPU optimization and memory management for large language models.
- Strong Python and core ML stack: PyTorch, TensorFlow, Scikit-learn, Pandas, NumPy.
- Solid grasp of ML algorithms, large language models, and deep learning architectures.
- Hands-on healthcare data experience.
- Experience with cloud platforms (GCP strongly preferred; AWS considered) and distributed compute frameworks like Spark.
- Familiarity with MLOps practices and tooling.
- Bachelor's or Master's in Computer Science, Machine Learning, AI, or a related quantitative field.
Most of the cancer industry focuses on treatment. This team is focused on detection and prevention, where the impact on survival rates is greatest. The founders are practising doctors who have lived in the problem space first-hand, and the company is tech-first, with the majority of headcount sitting in engineering, data, and ML.

Why join
- Real-world impact: AI that directly contributes to earlier cancer detection and improved patient outcomes.
- Greenfield US build at a critical inflection point, with high ownership from day one.
- Series A backing from a top-tier global VC.
- Builder culture: production-grade work, not research or prototypes.
- Direct exposure to the CTO and senior AI leadership in a flat, fast-moving environment.
- Continuous learning, with access to the latest tools and methods in AI and healthcare.
- Competitive base salary plus meaningful equity.
- Fully remote across the United States.
- Flexible working arrangements.
- Continuous learning opportunities and access to leading AI tooling.
- The chance to do work that genuinely matters: building AI that helps save lives.
