About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role:

As a Research Engineer on the Scaling team, you will directly train the models we launch to the public via Claude.AI and our API. In this role, you will design, implement, and optimize large-scale distributed systems that interact with state-of-the-art hardware accelerators. You will drive operational overhead towards zero through automation, and system uptime towards 99.9%. You will be at the nexus of systems, infrastructure, and deep learning. Your work on the Scaling team will have a direct and massive impact on the company’s success.

Responsibilities:

Implementing and optimizing distributed training algorithms on accelerators, along with the surrounding distributed systems, to train frontier models.
Optimizing and debugging workloads on the latest generation of hardware accelerators and data center networks.
Collaborating with ML researchers, accelerator performance optimization teams, network teams, cluster management teams, and beyond.
Delivering innovation at the high-leverage intersection of systems and machine learning.

You may be a good fit if you:

Have experience with Python and one additional high-performance language (Rust, C, C++, D, Java, C#, Fortran, Go, Swift, etc.)
Are results-oriented, with a bias towards flexibility and impact
Enjoy being empowered to fix problems wherever they show up
Enjoy pair programming (we love to pair!)
Want to learn more about machine learning research
Care about the societal impacts of your work
Have clear written and verbal communication skills

Strong candidates may also:

Have experience working with hardware accelerators, HPC, machine learning, networking, or distributed systems
Have experience or a strong interest in machine learning
Have experience with complex shared codebases
Have experience optimizing the performance of programs
Have experience running highly available systems

Deadline to apply: None. Applications will be reviewed on a rolling basis.

Research Engineer, Scaling
