As the leading data and evaluation partner for frontier AI companies, Scale plays an integral role in understanding the capabilities of large language models (LLMs) and safeguarding them. The Safety, Evaluations and Analysis Lab (SEAL) is Scale’s new frontier research effort dedicated to building robust evaluation products and tackling challenging research problems in evaluation and red teaming. At SEAL, we are passionate about ensuring the transparency, trustworthiness, and reliability of language models while simultaneously advancing model capabilities and pioneering novel skills. We are setting the north star for the AI community, where safety and innovation illuminate the path forward.
We are seeking talented research interns to join us in shaping the landscape of safety and transparency for the entire AI industry. We support collaborations across the industry and the publication of our research findings. This year, we are seeking top-tier candidates for multiple projects focusing on frontier agent data, evaluation, and safety; scalable oversight and alignment of LLMs; the science of evaluation for LLMs; and exploring the frontier and potentially dangerous capabilities of LLMs with effective guardrails. Below is a list of SEAL’s representative projects:
Example Projects:
- Adversarial robustness, jailbreaks and safety red teaming
- Measuring the dangerous capabilities of frontier models and conducting preparedness research
- Research on the science and creation of new benchmarks for frontier models
- Building frontier evaluations for LLMs and agents such as AI R&D
- Developing scalable oversight protocols and red teaming oversight methods
- Developing, evaluating and improving the agentic use of frontier models, including tool-use, SWE coding, browser-use, OS-related, computer-use/GUI and other related agents
Required to have:
- Currently enrolled in a BS/MS/PhD Program with a focus on Machine Learning, Deep Learning, Natural Language Processing or Computer Vision with a graduation date in Fall 2025 or Spring 2026
- Prior experience or a track record of research publications in agents, safety, evaluation, alignment or a related field
- Experience with one or more general-purpose programming languages, including Python, JavaScript, or similar
- Ability to speak and write in English fluently
- Availability for a Summer 2025 (May/June start) internship
Ideally you’d have:
- A previous internship in Machine Learning, Deep Learning, Natural Language Processing, Adversarial Robustness, Alignment, Evaluation or Agents
- Experience as a researcher, including internships, full-time roles, or positions at a lab
- Publications at top-tier ML conferences such as NeurIPS, ICLR, CVPR, ICML, or COLM, or contributions to open-source projects