About The Company
TensorOpera provides a generative AI platform and foundation models that enable developers and enterprises to build and commercialize their own generative AI applications easily, scalably, and economically. Its flagship product, TensorOpera AI, offers unique features spanning the enterprise AI platform, model deployment, model serving, AI agent APIs, launching training/inference jobs on a serverless/decentralized GPU cloud, experiment tracking for distributed training, federated learning, security, and privacy.
About The Position
We are looking for a skilled and innovative Machine Learning Engineer in Model Deployment and Inference to join our AI team, focusing on the development and deployment of LLM/generative AI models. In this role, you will design and implement cutting-edge AI model deployment workflows and inference engines, driving research from prototype to production. You will play a key part in enhancing the scalability, stability, and security of the TensorOpera® Deploy system, which operates across a globally distributed GPU infrastructure.
You will manage the life cycle of model serving workloads, addressing complex challenges in geo-distributed orchestration, including network topologies, consistency, and observability. The ideal candidate has a strong foundation in deep learning frameworks, model deployment, and optimization for AI systems, with a focus on real-world production environments.
Responsibilities
- Research and develop cutting-edge LLM/generative AI model deployment workflows and inference engines, and ship research prototypes into production
- Improve the scalability, stability, observability, and security/privacy of the TensorOpera® Deploy (https://TensorOpera.ai) system, which runs on globally distributed GPU cluster infrastructure
- Manage the life cycle of model serving workloads by writing stateful controllers and schedulers; tackle the challenges of geo-distributed orchestration, including network topologies, consistency, and observability
Minimum Qualifications
- BS or MS in Computer Science or Computer Engineering
- At least 1 year of full-time experience designing and implementing model deployment and inference platforms
- Familiar with popular LLM and generative AI models such as Llama 3, Mistral, Qwen, Stable Diffusion, 3D generation, etc.
- Familiar with PyTorch/TensorFlow deep learning frameworks, CUDA, Docker, Kubernetes, FastAPI, Hugging Face TGI, vLLM, TensorRT, etc.
- Experience developing machine learning systems that improve latency, throughput, scalability, stability, observability, and security/privacy
Preferred Qualifications
- At least 1 year of model deployment and inference work at a top-tier tech company or AI startup
- Experience with popular serving libraries such as Hugging Face TGI, vLLM, and TensorRT is a plus
- Experienced in optimizing the latency, throughput, and success rate of AI inference systems in production environments
- Familiar with the lifecycle management of model serving jobs; experience writing Kubernetes Operators is preferred but not required. Proficient in low-level network management and simulation techniques across VPCs
- Cutting-edge research toward faster and cheaper model inference systems for LLMs and generative AI, with publications at machine learning and distributed systems conferences such as ICML, NeurIPS, ICLR, MLSys, SOSP, OSDI, VLDB, SIGMOD, etc.
Compensation
We offer competitive compensation, startup equity, health insurance, and additional benefits. The US base salary range for this full-time position is $150,000 - $250,000 + equity + benefits. Salary ranges are determined by location, level, and role, with individual compensation based on experience, skills, and job-related knowledge.
Equal Opportunity
TensorOpera is committed to fostering an inclusive and diverse work environment. We are an Equal Opportunity Employer, offering equal employment opportunities to all individuals regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, or other protected characteristics.