About the Role
As an IT Infrastructure Engineer, you'll be responsible for establishing infrastructure from the ground up, including capacity planning, disaster recovery, and day-to-day operations. You'll manage, configure, and monitor the company's IT infrastructure (including automated backups), ensure the security and availability of resources, and work closely with engineering and operations teams to provide a robust, scalable IT environment that supports AI and robotics development workflows.
Your Responsibilities
- Infrastructure architecture and operations. Design, implement, and maintain on-premises IT infrastructure across compute, storage, and networking. Perform capacity planning, develop and execute backup and disaster recovery strategies, and maintain comprehensive infrastructure documentation.
- Physical data center and cloud infrastructure. Manage and monitor on-premises IT facilities (servers, cooling, power) and hardware. Design and provision storage and compute/GPU infrastructure for high-performance ML and AI workloads.
- Enterprise networking. Design and implement WAN/LAN/Wi-Fi network topologies with proper segmentation and security controls (firewalls, IDS/IPS). Configure and manage enterprise networking equipment, including switches, routers, and load balancers.
- System administration and support. Deploy and manage Linux server infrastructure. Configure and deploy employee workstations across Linux, macOS, and Windows, and manage IT equipment procurement. Provide technical troubleshooting and support, and manage user accounts with SSO.
- Vendor management. Establish and manage relationships with technology vendors, negotiate contracts, and coordinate with service providers including ISPs and colocation partners.
Your Qualifications
- Proven track record of building or transforming IT infrastructure
- Deep expertise in enterprise networking (WAN/LAN, VLANs, routing, switching, firewalls, VPNs)
- Strong hands-on experience with server hardware assembly, configuration, and maintenance
- Expert knowledge of storage (RAID, SAN/NAS) and backup and recovery solutions
- Experience with Linux server administration and troubleshooting
- Solid understanding of data center operations (power, cooling, security)
- Hands-on experience provisioning and managing GPU infrastructure
- Scripting skills in Python and Bash for automation
- Experience with Infrastructure-as-Code tools such as Terraform and Ansible
- Strong problem-solving and troubleshooting skills for complex hardware and network issues
- Excellent documentation and communication skills
- Self-motivated and able to work independently in a fast-paced environment
