AI/ML Infrastructure Specialist
NeuralScale AI
San Jose, CA
Full-time
Posted Dec 27, 2025
Emerging Technologies & Specialty Roles | AI | GPU | Machine Learning | NVIDIA | HPC | InfiniBand
Job Description
NeuralScale AI is seeking an Infrastructure Specialist to design and operate GPU clusters for AI/ML workloads.
Key Focus Areas:
• Deploy and maintain NVIDIA DGX and HGX systems
• Optimize GPU cluster performance and utilization
• Implement high-speed InfiniBand/RoCE networks
• Manage storage systems for ML datasets (100+ PB)
• Troubleshoot GPU hardware and driver issues
• Work with data scientists on infrastructure needs
Requirements
• 3+ years of GPU/HPC infrastructure experience
• Deep knowledge of NVIDIA hardware and CUDA
• Experience with InfiniBand, NVLink, and GPUDirect
• Strong Linux administration skills
• Understanding of ML frameworks (PyTorch, TensorFlow)
• Python scripting for automation
• Bachelor's degree in Computer Science or a related field
