AI/ML – Search Data Engineer, Siri Data


Job Overview

Key Qualifications

  • You have excellent written and verbal communication skills
  • You are curious and have excellent analytical and problem-solving skills
  • You are excited about digging into massive petabyte-scale semi-structured datasets
  • 1+ years of industry experience working with distributed data technologies (e.g. Hadoop, MapReduce, Spark)
  • Proficiency in at least one high-level programming language (Python, Go, Java, Scala, or equivalent)
  • Experience with large, complex, highly dimensional data sets; hands-on experience with SQL
  • You are pragmatic, not letting “the perfect” be the enemy of “the good”
  • You are self-directed and capable of operating amidst ambiguity
  • You are humble, continually growing in self-awareness and possessing a growth mindset

Extras we’d be excited about:

  • Experience building stream-processing applications using Apache Flink, Spark Streaming, Apache Storm, Kafka Streams, or similar frameworks
  • Experience with data engineering in support of ML: anomaly detection in time series data, engineering work to productionize models developed by data scientists, etc.

Description

  • Developing data pipelines and/or software libraries to process, transform, and analyze data to identify signals from the billions of events we collect every day
  • Designing and building abstractions that hide the complexity of the underlying big data stack (HDFS, Hadoop, Hive, Impala, Spark, Kafka, Parquet, etc.) and that allow partners to focus on their strengths: product, data modeling, data analysis, search, information retrieval, and machine learning
  • Defining and implementing the “source of truth” for our most fundamental data (such as search activity and content), as well as our core metrics across a variety of products
  • Optimizing end-to-end workflows of data users (crafting libraries, providing abstractions to define jobs, scheduling data pipelines, managing access to datasets, etc.)
  • Building internal services and tools to help in-house partners implement, deploy, and analyze datasets with a high level of autonomy and limited friction
  • Surfacing datasets in near-real-time to mission-critical products and business applications throughout the company, providing the signal that feeds our machine learning algorithms as well as our daily product-defining decisions
  • Automating and handling the lifecycle of datasets (schema evolution, metadata store, backfill management, deprecation, migration)
  • Improving the quality and reliability of our pipelines (monitoring, retry, failure detection)

    Education & Experience

    Surprise us! Many will have an MS or BS in CS, Engineering, Math, Statistics, or a related field or equivalent practical experience in data engineering.
