Robotics Data Infrastructure Engineer
Engineering · Full-time · San Francisco · On-site
Physical experience arrives as raw robot telemetry, teleoperation traces, video, sensor streams, and deployment logs, all flowing from the edge to the cloud. You'll build the ingestion and processing pipelines that turn all of it into clean, versioned, queryable episodes that researchers can actually train on. The customer for this work is our ML team, not a BI dashboard.
What you'll do
- Build ingestion and ETL pipelines that move robot telemetry and sensor data from edge devices to the cloud.
- Manage large-scale multimodal datasets — video, images, robot state, force/torque, point clouds — with versioning, metadata, and retention.
- Build data-quality, validation, and observability tooling: schema checks, deduplication, lineage tracking, and freshness alerting.
- Automate the path from raw field data to training-ready datasets with orchestrated batch and streaming jobs.
- Build internal tooling to browse, visualize, and debug robot data.
What we're looking for
- Strong Python and experience building and operating production data pipelines.
- Solid cloud experience (AWS or GCP) and systems fundamentals.
- Experience with large-scale ETL and data-processing systems.
- Strong SQL and experience working with structured and multimodal data at scale.
- A CS or engineering background, or equivalent experience.
Nice to have
- Distributed compute (Spark, Ray, Beam) and orchestration (Airflow, Prefect, Kubernetes).
- Streaming systems (Kafka, Flink) and warehouse/transform tooling (Databricks, dbt).
- Robotics data formats (ROS bags, MCAP, HDF5, Parquet) and multi-sensor time alignment.
- Data-quality tooling (Great Expectations) and infrastructure as code (Terraform).
Email us at eldaniz@episodeint.com.