ABOUT MOREH VIETNAM
Moreh Vietnam is developing Moreh-vLLM, a highly optimized inference framework for LLMs and generative models across diverse GPU and NPU architectures. Our innovations span GPU kernel development and optimization for single-GPU and multi-GPU systems, AI compiler optimization, and R&D at the intersection of HPC and AI framework development.
ROLE OVERVIEW
We are looking for a GPU Engineer Team Leader to lead a focused team of 4–5 engineers in building and optimizing high-performance GPU software. You will own technical delivery, guide the team's day-to-day work, conduct code reviews, and grow each engineer's skills, while staying hands-on with the most critical and complex GPU challenges.
This role sits at the intersection of technical excellence and team leadership. You are expected to be a strong individual contributor as well as a people-first leader who takes responsibility for the team's output, culture, and growth.
KEY RESPONSIBILITIES
Team Leadership & Delivery
- Lead, mentor, and develop a team of 4–5 GPU/HPC engineers, fostering a culture of technical excellence and continuous improvement.
- Plan and coordinate sprint tasks, manage workload distribution, and ensure timely, high-quality delivery of GPU software components.
- Conduct regular 1:1s, provide constructive feedback, and support engineers' career development.
- Translate high-level technical goals from senior engineers or engineering management into actionable team tasks.
- Track team progress, surface blockers early, and communicate status clearly to stakeholders.
Hands-on Technical Contribution
- Write production-quality GPU kernel code in CUDA, HIP, or OpenCL for AI training and inference workloads.
- Lead code reviews and enforce coding standards, GPU optimization patterns, and software quality across the team.
- Conduct performance profiling and optimization of GPU kernels and memory hierarchies.
- Contribute directly to the most technically challenging features, debugging complex issues that require deep GPU systems knowledge.
MINIMUM QUALIFICATIONS
- Bachelor's degree in Computer Science, Computer Engineering, or a related field.
- Strong proficiency in C++ and Python.
- Experience with CUDA, HIP, or OpenCL.
- Experience with GPU memory hierarchy optimization (shared memory, registers, coalescing, occupancy).
- Familiarity with deep learning frameworks (PyTorch or TensorFlow) and how they interact with GPUs.
- Demonstrated experience leading or mentoring a small engineering team (2+ people) in a technical setting.
- Strong analytical and problem-solving skills; ability to diagnose and resolve complex GPU software issues.
- Good written and verbal communication skills for team coordination and documentation.
PREFERRED QUALIFICATIONS
- Master's degree or Ph.D. in Computer Science, Computer Engineering, AI, or a related field.
- 2+ years of experience writing system software for GPUs in a professional setting.
- Experience with distributed GPU computing, multi-GPU coordination, or parallel runtime systems.
- Knowledge of AI model architectures and their impact on GPU workload design (e.g., attention mechanisms, matrix operations).
- Track record of successfully owning delivery for a team or module end-to-end.
- Experience with profiling tools such as Nsight Compute, Nsight Systems, or the AMD ROCm profiler.
- Contributions to open-source GPU/HPC projects or publications at relevant conferences (PPoPP, HPDC, SC, MICRO, etc.).