Machine Learning Engineer

Lumino • Full-time • San Mateo, CA, US • 2d ago

About Lumino

At Lumino, our mission is to unlock the power of AI for every human, and we can’t do this without having the best people in the world on the team. AI is among the next technologies that will unlock the vast potential of human innovation, empowering us to solve problems once thought unsolvable. Lumino is a technology company that builds infrastructure enabling anyone to create AI models. We are backed by prominent VCs such as Longhash Ventures, OP Crypto, Protocol Labs, Quaker Capital, Escape Velocity, and OrangeDAO.

We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability status, gender identity, or veteran status.

About the role:

We’re looking for a Machine Learning Engineer to join our team and help set the foundations of the company. You will be responsible for designing and building decentralized and distributed AI training pipelines, optimizing training for speed and low cost, and conducting research on cutting-edge techniques for training in heterogeneous environments.

You will:

Design, develop, and optimize machine learning models and algorithms for various applications, including computer vision, NLP, and audio/video processing. Implement basic to advanced model architectures, starting with minimalist implementations.
Collect, clean, and preprocess data to create robust training datasets, working with complex datasets to ensure high-quality inputs for model training.
Train various deep learning models on different GPUs, including multi-GPU setups, and improve model performance and resource utilization by tuning hyperparameters and using advanced techniques such as LoRA and QLoRA.
Evaluate model performance using appropriate metrics such as F1 scores, and conduct experiments to improve model accuracy and robustness.
Deploy machine learning models into production environments, ensuring scalability, efficiency, and reliability while managing the deployment of models on both cloud and bare-metal infrastructure.
Work closely with cross-functional teams to understand business requirements and translate them into technical solutions, collaborating with data scientists and software engineers to integrate models into production systems.
Monitor and maintain deployed models to ensure they continue to perform as expected, and implement processes for model retraining and updates as needed.
Optimize training performance on various GPUs (e.g., V100, T4, RTX3090, A100) and assess trade-offs to minimize training time and improve infrastructure efficiency.
Capture and analyze benchmarks on heterogeneous infrastructure, making performance improvements based on benchmark results.
Own and manage MLOps processes, train custom Lumino models including fraud detection, build and improve internal inference systems for model evaluation, and enhance existing evaluation processes in the ML pipeline.

Requirements:

You have 2+ years of experience as a Data Scientist or Machine Learning Engineer
You have 1+ years of experience with Python and ML frameworks such as PyTorch, TensorFlow, or JAX
You have experience in building, serving, and fine-tuning machine learning and large language models
You have strong analytical skills with the ability to navigate ML system trade-offs
You have a high degree of initiative and end-to-end project ownership
You have strong communication and collaboration abilities
You have excellent problem-solving skills and ability to learn quickly

Nice to haves:

Contributions to open source projects
Previous experience in a startup environment
Experience with the latest large language models (LLMs) and retrieval-augmented generation (RAG) techniques

What we offer: