August 14
🏢 In-office - Manhattan
• Develop and implement strategies to improve system reliability, including monitoring, alerting, and incident response
• Design and build automated infrastructure solutions using tools such as Terraform, Ansible, or other IaC tools
• Analyze system performance and implement improvements to ensure our services are fast and efficient
• Work closely with software engineers to ensure new features and services are designed with reliability in mind
• Identify areas for improvement in our systems and processes and drive initiatives to address them
• 8+ years of experience in a Site Reliability Engineering or similar role
• Proficiency in at least one programming language (e.g., Python) and experience with infrastructure-as-code tools (e.g., Terraform, Ansible)
• Experience leading SRE teams and federating DevOps work out to Software Engineering teams
• Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack)
• Strong experience with cloud platforms, especially AWS
• Strong analytical and problem-solving skills, with a focus on root cause analysis
• Excellent communication skills, with the ability to explain complex technical concepts to both technical and non-technical audiences
• Proven ability to work collaboratively in a cross-functional team environment
• Experience designing and implementing scalable systems
• Knowledge of security best practices and experience implementing security measures
• Base salary range of $190,000 – $230,000 plus equity, depending on experience
• Group medical and dental insurance
• Flexible vacation/PTO policy
• 401(k)
• Flexible office culture, with team members working both remotely and from our offices in New York City