GPU Engineer Team Leader

Moreh Vietnam 

📍 Vietnam 🇻🇳

full-time
lead

Key Skills

CUDA, HIP, OpenCL, GPU, Python

Industry

Consumer Electronics, Robotics

Job Description

ABOUT MOREH VIETNAM

Moreh Vietnam is developing Moreh-vLLM, a highly optimized inference framework for LLMs and generative models across diverse GPU and NPU architectures. Our work spans GPU kernel development and optimization for single-GPU and multi-GPU systems, AI compiler optimization, and R&D at the intersection of HPC and AI framework development.


ROLE OVERVIEW

We are looking for a GPU Engineer Team Leader to lead a focused team of 4–5 engineers in building and optimizing high-performance GPU software. You will own technical delivery, guide the team's day-to-day work, conduct code reviews, and grow each engineer's skills, while staying hands-on with the most critical and complex GPU challenges.


This role sits at the intersection of technical excellence and team leadership. You are expected to be a strong individual contributor as well as a people-first leader who takes responsibility for the team's output, culture, and growth.


KEY RESPONSIBILITIES

Team Leadership & Delivery

  • Lead, mentor, and develop a team of 4–5 GPU/HPC engineers, fostering a culture of technical excellence and continuous improvement.
  • Plan and coordinate sprint tasks, manage workload distribution, and ensure timely, high-quality delivery of GPU software components.
  • Conduct regular 1:1s, provide constructive feedback, and support engineers' career development.
  • Translate high-level technical goals from senior engineers or engineering management into actionable team tasks.
  • Track team progress, surface blockers early, and communicate status clearly to stakeholders.

Hands-on Technical Contribution

  • Write production-quality GPU kernel code in CUDA, HIP, or OpenCL for AI training and inference workloads.
  • Lead code reviews and enforce coding standards, GPU optimization patterns, and software quality across the team.
  • Profile and optimize GPU kernels and their use of the GPU memory hierarchy.
  • Contribute directly to the most technically challenging features, debugging complex issues that require deep GPU systems knowledge.


MINIMUM QUALIFICATIONS

  • Bachelor's degree in Computer Science, Computer Engineering, or a related field.
  • Strong proficiency in C++ and Python.
  • Experience with CUDA, HIP, or OpenCL.
  • Experience with GPU memory hierarchy optimization (shared memory, registers, coalescing, occupancy).
  • Familiarity with deep learning frameworks (PyTorch or TensorFlow) and how they interact with GPU hardware.
  • Demonstrated experience leading or mentoring a small engineering team (2+ people) in a technical setting.
  • Strong analytical and problem-solving skills; ability to diagnose and resolve complex GPU software issues.
  • Good written and verbal communication skills for team coordination and documentation.


PREFERRED QUALIFICATIONS

  • Master's degree or Ph.D. in Computer Science, Computer Engineering, AI, or a related field.
  • 2+ years of experience writing system software for GPUs in a professional setting.
  • Experience with distributed GPU computing, multi-GPU coordination, or parallel runtime systems.
  • Knowledge of AI model architectures and their impact on GPU workload design (e.g., attention mechanisms, matrix operations).
  • Track record of successfully owning delivery for a team or module end-to-end.
  • Experience with profiling tools such as Nsight Compute, Nsight Systems, or the AMD ROCm profiling tools.
  • Contributions to open-source GPU/HPC projects or publications at relevant conferences (PPoPP, HPDC, SC, MICRO, etc.).
