Senior Systems Engineer, SRE, Data and AI

Expedia Group

Job Overview

If you’re the right person for the job, you will join the growing Machine Learning Data Processing (MLDP) platform team within Expedia Group’s Hybrid Cloud AI and Data Engineering Team.

The MLDP platform services team enables data engineer and data scientist productivity and velocity through an automated multi-cloud and hybrid-cloud data processing platform. This platform is rooted in the principle of infrastructure as code and is constantly evolving and integrating new and interesting technologies.

You’ll be expected to continually learn and utilize the latest open-source tools available to enable a seamless experience between our bare metal, public, and private cloud environments.

What you’ll do:

  • Provision infrastructure, manage configurations, implement CI/CD pipelines for applications and infrastructure, implement testing and monitoring tools, and help manage the lifecycle of models and algorithms for the Data Science team

  • Utilize industry standards and best practices to fully automate all environments and deployments

  • Use an industry-standard toolset (version-control system, continuous build, continuous delivery/deployment, containers, secrets engine, artifact service) in a fully automated pipeline

  • Deploy and maintain data science and data engineering development environments

  • Design and implement the distributed compute clusters and services that form the core of our micro-services platform

  • Work with our internal business partners to gather requirements

  • Develop enterprise platform services utilizing object-oriented methodologies

  • Develop unit tests, functional tests, and integration test frameworks for distributed systems

  • Perform peer reviews, code walkthroughs, and weekly demos

  • Manage CI/CD pipelines for infrastructure components

  • Must be a self-starter who can work independently on technical projects and also collaborate with project team members through an agile development process that promotes constant team communication

  • Must have excellent communication skills to assist in conducting user interview sessions, requirements gathering, and design reviews

Who you are:

  • Distributed compute – You have real-world experience defending a modern distributed compute platform at scale (Mesos/Nomad/Kubernetes)

  • Multi-cloud – You have experience leveraging multiple flavors of public and private cloud (AWS, GCP, Azure, VMware, OpenStack)

  • Programming skills – You are comfortable writing code in multiple languages and confident in choosing the right strongly or dynamically typed language for the job. Preferred language familiarity: Python, Go, Ruby, Rust, Scala

  • Database skills – You understand the use cases for relational and non-relational data, and you’ve implemented code against several different database platforms

  • Development experience – Service Oriented Architecture and micro-services

  • Knowledge of configuration management tools, monitoring tools, cloud platforms, and software delivery tools

  • Experience with Consul, Datadog, Kafka, Splunk, Vault, or equivalents preferred

  • Excellent troubleshooting and problem-solving skills

  • Experience working in an agile team environment, conducting code walk-throughs, peer reviews, and producing technical documentation

  • Committed to open-source projects. Please provide GitHub links if appropriate.
