NVIDIA GPU Engineer WFH

Qubrid AI

📍 India 🇮🇳

full-time

junior

remote

Key Skills

GPU, CUDA, NCCL, TensorRT, Kubernetes

Industry

Consumer Electronics, AI

Job Description

Read everything carefully. The requirements and screening questions are critical: incorrect or unsatisfactory answers will result in auto-rejection and waste your time.

Work from home.
This is a full-time role. If you plan to hold two or more jobs at the same time, or want to do this part-time, that won't work for us. In that case, please do not apply, as your application will be auto-rejected.
Note: this job requires working late at night, until 4 AM India time, to overlap with US working hours. Do not apply if this schedule doesn't work for you.
Salary depends on experience and current verifiable compensation (paychecks).
Junior candidates with 2 years of experience are suitable.

About Qubrid AI

Qubrid AI is building a full-stack AI infrastructure platform that combines GPU cloud, inference APIs, AI orchestration software, and enterprise AI infrastructure. Our platform powers AI workloads across cloud, hybrid, and on-prem environments using state-of-the-art NVIDIA technologies and open-source AI frameworks.

We are looking for a hands-on GPU Infrastructure Engineer with deep expertise in NVIDIA GPU systems, inference platforms, clustering, and high-performance networking. This role requires someone who has built and operated production GPU environments and understands the entire stack from hardware and drivers to inference serving and performance optimization.

Role Overview

As a GPU Infrastructure Engineer, you will be responsible for deploying, managing, and optimizing NVIDIA GPU infrastructure used for AI training and inference. You will work on GPU servers, cluster orchestration, partitioning technologies, networking, monitoring, and inference frameworks such as NVIDIA Triton Inference Server. You should be comfortable troubleshooting issues at the hardware, OS, networking, and application layers.

Responsibilities

Deploy, configure, and maintain NVIDIA GPU servers and clusters.
Install and manage NVIDIA drivers, CUDA, cuDNN, NCCL, TensorRT, and related software stacks.
Build and operate GPU clusters supporting AI training and inference workloads.
Configure and manage GPU partitioning technologies including MIG (Multi-Instance GPU) and time-slicing.
Deploy and optimize NVIDIA Triton Inference Server environments.
Monitor GPU utilization, temperatures, memory usage, power consumption, and overall cluster health.
Implement high-availability and resource-sharing mechanisms for multi-tenant environments.
Configure and troubleshoot RDMA, RoCEv2, InfiniBand, NVLink, and high-speed Ethernet networking.
Optimize inference performance using TensorRT, batching, model parallelism, and quantization techniques.
Deploy GPU workloads on Kubernetes using NVIDIA GPU Operator and container technologies.
Develop automation for provisioning and monitoring using Python, Bash, Ansible, or APIs.
Troubleshoot hardware, driver, CUDA, NCCL, and networking issues.
Support customer deployments and production environments.
Maintain operational documentation and best practices.

Required Qualifications

2-3+ years of experience managing GPU servers and AI infrastructure.
Deep expertise with NVIDIA GPUs including H100, H200, B200, A100, L40S, and related platforms.
Strong understanding of CUDA, cuDNN, NCCL, TensorRT, NVLink and NVSwitch, MIG (Multi-Instance GPU), and GPU virtualization and sharing.
Hands-on experience with NVIDIA Triton Inference Server.
Experience deploying and managing AI clusters and distributed environments.
Strong Linux administration skills (Ubuntu, RHEL, Rocky Linux).
Experience with Docker and Kubernetes.
Knowledge of GPU scheduling and resource management.
Understanding of high-performance networking concepts including RoCEv2, RDMA, InfiniBand, 100G/200G/400G Ethernet, and Mellanox/NVIDIA ConnectX adapters.
Strong troubleshooting skills across hardware, drivers, CUDA libraries, and networking.
Experience with monitoring and observability platforms.

Preferred Qualifications

Experience with NVIDIA GPU Operator, Kubernetes, and Helm.
Experience with Slurm or Kubernetes-based scheduling.
Familiarity with vLLM, TensorRT-LLM, SGLang, Ollama, and other inference frameworks.
Experience serving large language models such as Llama, DeepSeek, Qwen, Mistral, and Gemma.
Knowledge of quantization techniques including FP8, INT8, and AWQ.
Experience with Prometheus, Grafana, DCGM, and NVIDIA NIM.
Understanding of SONiC, Cumulus, or Arista data center networking.
Familiarity with Ceph, BeeGFS, Lustre, or high-performance storage systems.
Experience supporting AI factories, NeoClouds, or HPC environments.

Nice to Have

NVIDIA certifications or equivalent experience.
Experience with DGX, HGX, and Supermicro GPU platforms.
Familiarity with NVIDIA Base Command Manager and Mission Control.
Python scripting and infrastructure automation experience.
Contributions to open-source AI infrastructure projects.

What You'll Work On

Multi-node NVIDIA GPU clusters.
Enterprise AI infrastructure and AI factories.
High-performance inference platforms.
GPU sharing and multi-tenant environments.
NVIDIA Triton and TensorRT optimization.
Kubernetes-based GPU orchestration.
Hybrid cloud and on-prem AI deployments.
Next-generation AI infrastructure powering large-scale inference.

Why Join Qubrid AI?

At Qubrid AI, you'll help build the infrastructure powering the next wave of AI. You'll work with cutting-edge NVIDIA technologies, large-scale GPU clusters, and modern inference platforms to deliver AI infrastructure for enterprises and developers worldwide.

If you are passionate about GPUs, performance optimization, and building world-class AI infrastructure, we'd love to hear from you.