We’re looking for a Senior Robotics Perception Engineer to build end-to-end spatial perception systems that combine multi-camera vision, IMU data, and learning-based models into a unified 3D understanding of the world.
You’ll work on problems spanning SLAM, 6DoF pose estimation, multi-device sensor fusion, and calibration, while also applying modern computer vision and ML techniques such as monocular depth estimation, action/skill understanding, and vision-language models (VLMs).
This role sits at the intersection of robotics, 3D computer vision, and applied ML.
What You’ll Work On
- Build real-time perception pipelines combining:
  - Multi-camera systems (head-mounted + wrist-mounted cameras)
  - IMU + RGB fusion for accurate camera pose estimation
- Develop and optimize SLAM / visual-inertial odometry (VIO) systems
- Design multi-device sensor fusion to align multiple viewpoints into a single scene
- Implement 3D / 6DoF hand and object pose estimation from RGB / RGB-D inputs
- Implement object detection models
- Work on stereo + multi-view geometry pipelines
- Build robust camera calibration systems:
  - Intrinsics / extrinsics
  - Cross-device calibration
- Integrate or research ML models for:
  - Monocular depth estimation
  - Action / skill labeling
  - VLM systems
- Optimize pipelines for real-time performance and robustness
What We’re Looking For
Core Skills (Must-Have)
- Strong foundation in 3D Computer Vision & Geometry
  - Multi-view geometry, epipolar geometry, transformations
- Experience with SLAM / VIO / sensor fusion
  - Visual SLAM, visual-inertial fusion, state estimation
- Hands-on experience with camera calibration
  - Intrinsics, extrinsics, stereo calibration
- Experience working with multi-camera systems
- Strong programming skills in C++ and/or Python
Good to Have (High Impact)
- Experience with hand pose / human pose estimation (2D/3D/6DoF)
- Familiarity with RGB-D / depth sensors
- Experience with learning-based vision models
  - Monocular depth
  - Pose estimation
  - Action recognition
- Exposure to VLM or embodied AI systems
- Experience optimizing for real-time systems (latency, memory, throughput)
- Familiarity with frameworks like:
  - OpenCV, PyTorch, ROS, COLMAP, ORB-SLAM, OpenVINS, etc.
Nice to Have (Bonus)
- Experience with multi-device synchronization (time alignment, sensor clocks)
- Background in robotics, AR/VR, or embodied AI systems
- Experience deploying models on edge devices / mobile systems
What Makes This Role Unique
- You’ll work on complex multi-sensor setups (not just single-camera CV)
- Ownership of the end-to-end perception stack (not just modeling or infra)
- Blend of classical geometry + modern ML
- Opportunity to shape next-gen embodied / spatial AI systems
Ideal Candidate Profile
Someone who:
- Can move fluidly between math, systems, and ML
- Is comfortable debugging real-world sensor noise and calibration issues
- Has built or worked on real-time perception systems, not just offline models