Full-time | On-site | San Francisco | Founded in 2024
Compensation: $165K - $250K
About The Role
As a Research Scientist at a seed-stage AI startup, you'll be at the forefront of developing benchmarks and evaluation methodologies for large language models. Your work will directly impact how cutting-edge AI systems are tested, validated, and deployed in enterprise environments.
Key Responsibilities
Evaluate newly released AI models (e.g., DeepSeek, Gemini).
Design and build new benchmarks from scratch, including dataset construction, hiring labelers, and authoring white papers.
Enhance methodologies for automated evaluation of generated text.
Work closely with engineering teams to implement and scale evaluation frameworks.
Collaborate with leading AI labs and enterprise customers to refine evaluation strategies.
Who We're Looking For
2+ years of experience in applied AI, with a focus on benchmarking, evaluation methodologies, or language models.
Experience designing and developing new evaluation methodologies is highly valued.
Tech Stack
Backend: Django
Infrastructure: AWS
Frontend: React with TypeScript (TSX)
Perks & Benefits
Equity 0.3% - 5% (flexibility for the right candidate)
Visa sponsorship is available.
Excellence is well rewarded.
Relocation and transportation support.
Health/dental insurance coverage.
Lunch and dinner provided, free snacks/coffee/drinks.
Unlimited PTO.
Friday happy hours with friends and community members.
Occasional team outings like rock climbing, hiking, and bowling.
About The Company
Current AI model benchmarks are inadequate for real-world applications. The company provides industry-specific performance evaluations for language models, ensuring that models are tested on the exact tasks where they will be deployed.
They have built a proprietary evaluation infrastructure that enables large-scale assessment of any LLM. Their platform collects expert review criteria and applies them to general and task-specific LLM applications, delivering actionable insights into model performance.
Seniority level
Entry level
Employment type
Full-time
Job function
Other
Industries
Software Development