NVIDIA

GPU Kernel Compiler Engineer, AI Inference

📍 Santa Clara, United States 🇺🇸

full-time
mid-level
148000
Key Skills
C++ Python CUDA MLIR LLVM
Industry
Semiconductor AI

Job Description

NVIDIA’s AI and GPU software is at the forefront of computing, fueling breakthroughs across deep learning, LLMs, and intelligent applications. Our team is building solutions for rapid development and deployment of GPU kernels for AI systems. We take the latest AI models, rigorously analyze them, develop and deploy the high-performance GPU kernels that define model performance, and integrate the derived techniques and methodologies into the tools that automate this process.

This role is a unique opportunity to shape the next generation of AI performance and efficiency. You will work hands-on with emerging AI models, collaborating across compiler, AI inference, and model performance teams. The focus is on building programming solutions that can be applied to concrete AI inference use cases to deliver real-world performance and development efficiency wins.

What You Will Be Doing

  • Analyze state-of-the-art AI models, identifying key performance bottlenecks and opportunities at the kernel level.
  • Develop, optimize, and evaluate both hand-tuned and compiler-generated kernels for inference workloads, balancing speed and flexibility.
  • Design and build high-level DSLs and innovative compiler infrastructure to increase kernel developer productivity while achieving near peak performance.
  • Collaborate with model, AI inference, and compiler teams to iterate on kernel fusion, autotuning, and sophisticated GPU programming techniques.
  • Benchmark performance across real workloads, diagnose root causes, and rapidly deploy optimizations that maximize hardware utilization on NVIDIA platforms.

What We Need To See

  • Bachelor’s, master’s or PhD degree in Computer Science, Computer Engineering or related field, or equivalent experience.
  • At least 3 years of experience, with strong C++ and/or Python programming skills for system and performance engineering.
  • Understanding of GPU architecture and proficiency in CUDA programming.
  • Intellectual curiosity and an interest in solving exciting problems and delivering practical results in production environments.

Ways To Stand Out From The Crowd

  • Experience designing, developing and optimizing high-efficiency GPU kernels for modern AI workloads.
  • Experience building compilers, domain-specific languages, or automatic optimization systems.
  • Familiarity with popular compiler, GPU programming, and AI frameworks such as MLIR, LLVM, PyTorch, XLA, Triton, or CUTLASS.
  • Experience with AI/ML inference workloads and model performance analysis.
  • Strong communication skills and ability to collaborate in a cross-team environment.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until October 27, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

JR2006516