GPU Engineer Team Leader

Moreh Vietnam

📍 Vietnam 🇻🇳

full-time

lead

Key Skills

CUDA, HIP, OpenCL, GPU, Python

Industry

Consumer Electronics, Robotics

Job Description

ABOUT MOREH VIETNAM

Moreh Vietnam is developing Moreh-vLLM, a highly optimized inference framework for LLMs and generative models across diverse GPU and NPU architectures. Our work spans GPU kernel development and optimization for single-GPU and multi-GPU systems, AI compiler optimization, and R&D at the intersection of HPC and AI framework development.

ROLE OVERVIEW

We are looking for a GPU Engineer Team Leader to lead a focused team of 4–5 engineers in building and optimizing high-performance GPU software. You will own technical delivery, guide the team's day-to-day work, conduct code reviews, and grow each engineer's skills, while staying hands-on with the most critical and complex GPU challenges.

This role sits at the intersection of technical excellence and team leadership. You are expected to be a strong individual contributor as well as a people-first leader who takes responsibility for the team's output, culture, and growth.

KEY RESPONSIBILITIES

Team Leadership & Delivery

Lead, mentor, and develop a team of 4–5 GPU/HPC engineers, fostering a culture of technical excellence and continuous improvement.
Plan and coordinate sprint tasks, manage workload distribution, and ensure timely, high-quality delivery of GPU software components.
Conduct regular 1:1s, provide constructive feedback, and support engineers' career development.
Translate high-level technical goals from senior engineers or engineering management into actionable team tasks.
Track team progress, surface blockers early, and communicate status clearly to stakeholders.

Hands-on Technical Contribution

Write production-quality GPU kernel code in CUDA, HIP, or OpenCL for AI training and inference workloads.
Lead code reviews and enforce coding standards, GPU optimization patterns, and software quality across the team.
Conduct performance profiling and optimization of GPU kernels and memory hierarchies.
Contribute directly to the most technically challenging features, debugging complex issues that require deep GPU systems knowledge.

MINIMUM QUALIFICATIONS

Bachelor's degree in Computer Science, Computer Engineering, or a related field.
Strong proficiency in C++ and Python.
Experience with CUDA, HIP, or OpenCL.
Experience with GPU memory hierarchy optimization (shared memory, registers, coalescing, occupancy).
Familiarity with deep learning frameworks (PyTorch or TensorFlow) and how they interact with GPU compute.
Demonstrated experience leading or mentoring a small engineering team (2+ people) in a technical setting.
Strong analytical and problem-solving skills; ability to diagnose and resolve complex GPU software issues.
Good written and verbal communication skills for team coordination and documentation.

PREFERRED QUALIFICATIONS

Master's degree or Ph.D. in Computer Science, Computer Engineering, AI, or a related field.
2+ years of experience writing system software for GPUs in a professional setting.
Experience with distributed GPU computing, multi-GPU coordination, or parallel runtime systems.
Knowledge of AI model architecture and its impact on GPU workload design (e.g., attention mechanisms, matrix operations).
Track record of successfully owning delivery for a team or module end-to-end.
Experience with profiling tools such as Nsight Compute, Nsight Systems, or AMD ROCm profiler.
Contributions to open-source GPU/HPC projects or publications at relevant conferences (PPoPP, HPDC, SC, MICRO, etc.).