As a member of the Dojo team, you will be responsible for enabling Tesla's neural networks to train efficiently on our upcoming in-house custom-silicon supercomputer systems. Join a small team of experienced developers in building the drivers and control plane for the Dojo distributed training system.
What You’ll Do
- Work on the Dojo distributed system to improve reliability across all parts of the control plane stack, from drivers to cluster monitoring and repair routines
- Work with researchers and Dojo software engineers to profile applications and improve driver performance
- Collaborate with the Dojo hardware (HW) team to understand the current HW architecture and propose future improvements
- Degree in Engineering, Computer Science, or equivalent experience and evidence of exceptional ability
- Comfortable with C++ and C
- Kernel or user-space PCIe device development experience
- Good communication skills