Work In Data Center

AI/ML Infrastructure Specialist

NeuralScale AI
San Jose, CA
Full-time
Posted Dec 27, 2025
Emerging Technologies & Specialty Roles, AI, GPU, Machine Learning, NVIDIA, HPC, InfiniBand

Job Description

NeuralScale AI is seeking an Infrastructure Specialist to design and operate GPU clusters for AI/ML workloads.

Key Focus Areas:
• Deploy and maintain NVIDIA DGX and HGX systems
• Optimize GPU cluster performance and utilization
• Implement high-speed InfiniBand/RoCE networks
• Manage storage systems for ML datasets (100+ PB)
• Troubleshoot GPU hardware and driver issues
• Work with data scientists on infrastructure needs

Requirements

• 3+ years of GPU/HPC infrastructure experience
• Deep knowledge of NVIDIA hardware and CUDA
• Experience with InfiniBand, NVLink, and GPUDirect
• Strong Linux administration skills
• Understanding of ML frameworks (PyTorch, TensorFlow)
• Python scripting for automation
• Bachelor's degree in Computer Science or a related field

About the Company

NeuralScale AI

Posted by: DataCenter Solutions Inc

Job Details

Job Type: Full-time
Category: Emerging Technologies & Specialty Roles
Salary Range: $130k - $180k
Posted: Dec 27, 2025
Expires: Jan 26, 2026