August 13
🔄 Hybrid – Manhattan
Airflow
Apache
AWS
Azure
Cassandra
Cloud
Distributed Systems
Docker
Google Cloud Platform
Hadoop
Python
Scala
Spark
Go
Responsibilities
• Build large-scale batch and real-time data pipelines with frameworks like Scio and Spark on Google Cloud Platform (a minimal Scio sketch follows this list)
• Apply best practices in continuous integration and delivery
• Drive optimization, testing, and tooling to improve data quality
• Collaborate with other Software Engineers, ML Engineers, Data Scientists, and stakeholders
• Create and maintain metrics datasets and dashboards that support data-driven decisions
• Work on machine learning projects that personalize the user experience
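For a sense of the stack: Scio is Spotify's Scala API on top of Apache Beam, so the same job can run locally or on Dataflow in Google Cloud Platform depending on runner options. Below is a minimal, hypothetical word-count batch pipeline; the `WordCount` object name and the `input`/`output` argument keys are illustrative, not taken from this posting.

```scala
import com.spotify.scio._

// Minimal Scio (Scala API for Apache Beam) batch pipeline.
// The runner (local DirectRunner vs. Dataflow on GCP) is chosen via
// command-line options such as --runner=DataflowRunner.
object WordCount {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    sc.textFile(args("input"))                      // read input text lines
      .flatMap(_.split("\\W+").filter(_.nonEmpty))  // tokenize into words
      .countByValue                                 // (word, count) pairs
      .map { case (word, count) => s"$word\t$count" }
      .saveAsTextFile(args("output"))               // write results
    sc.run().waitUntilFinish()                      // block until the job completes
  }
}
```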
Qualifications
• Professional data-engineering experience with high-volume, heterogeneous data, preferably on distributed systems such as Hadoop, Bigtable, Cassandra, GCP, AWS, or Azure
• Proficiency in Scala and a willingness to share knowledge
• Experience with higher-level JVM-based data-processing frameworks such as Beam, Dataflow, Crunch, Scalding, Storm, Spark, or Flink (a short Spark sketch follows this list)
• Familiarity with Docker, Luigi, Airflow, or similar tools
• Passion for clean code and experience building data pipelines
• Commitment to agile software processes, data-driven development, reliability, and responsible experimentation
• A collaborative, partnership-oriented approach to working within and across teams
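For comparison with the Beam-style pipeline above, a batch job in Spark's Scala API might look like the hypothetical sketch below; the `EventCounts` name, the `eventType` column, and the input/output paths are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

// Hypothetical Spark batch job: count events per type from JSON input
// and write the summary as Parquet.
object EventCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("EventCounts")
      .getOrCreate()

    spark.read
      .json(args(0))            // e.g. a GCS or HDFS path to raw events
      .groupBy("eventType")     // assumed column name
      .agg(count("*").as("n"))
      .write
      .mode("overwrite")
      .parquet(args(1))         // output location

    spark.stop()
  }
}
```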