Staff/Lead Software Engineer, AI Infrastructure

About the Company

A well-funded Bay Area AI startup operating at the frontier of generative media, with a product shipping to users at scale. The company is building the core infrastructure that powers its AI capabilities, and this is a senior, high-ownership hire on that team.

About the Role

This is a critical hire to build and scale the infrastructure behind the company's AI capabilities. You'll lead the design and implementation of GPU infrastructure, AI model serving APIs, and the broader AI infrastructure stack, enabling the machine learning features that drive the product.

You'll architect robust, distributed systems optimized for high-performance AI workloads, large-scale GPU orchestration, and low-latency, reliable API serving. Your work will directly shape how users experience generative AI at scale. As a senior technical leader, you'll also mentor engineers, drive best practices, and set the technical vision for AI infrastructure.

What You'll Do

  • Design, develop, and maintain scalable GPU infrastructure for training and serving state-of-the-art AI models.
  • Architect and optimize high-throughput, low-latency APIs for AI model serving and inference.
  • Lead the orchestration, scheduling, and efficient utilization of heterogeneous GPU resources across clusters.
  • Build and support robust systems for model deployment, monitoring, scaling, and reliability in production.
  • Collaborate with ML, backend, and platform engineering teams to deliver seamless AI-powered product features.
  • Drive technical direction, code reviews, and mentorship across the AI Infrastructure team.

What We're Looking For

  • 5+ years as a software engineer working on systems infrastructure, including hands-on ML serving and GPU orchestration.
  • Deep knowledge of distributed systems, Kubernetes (or similar orchestration frameworks), and cloud-native infrastructure (AWS/GCP/Azure).
  • Proven expertise building and optimizing APIs for large-scale AI model serving (TensorFlow Serving, Triton, TorchServe, or similar).
  • Familiarity with the challenges of high-throughput, scalable GPU fleet management, scheduling, and efficient model execution.
  • Proficiency in backend languages such as Python, Go, or C++, with experience optimizing for performance and reliability.
  • Ownership mentality and the drive to solve complex problems independently in ambiguous, high-growth environments.
  • Excellent communication, collaboration, and mentorship skills.

Nice to Have

  • Experience with multi-modal AI model infrastructure (LLMs, generative models, video/image/speech models).
  • Background building infrastructure for multi-tenant SaaS, enterprise AI/ML platforms, or operational automation at scale.
  • Previous startup experience, or a track record leading high-impact projects through ambiguity and rapid iteration.
  • Experience with competitive coding or large-scale distributed computing environments.