Arize AI is a machine learning observability platform for ML practitioners to detect and troubleshoot model issues
3 days ago
🏡 Remote – New York
• Work hands-on with the infrastructure that supports our distributed & highly scalable services in both SaaS and on-prem offerings
• Gather requirements from customers and adapt manifests and software to support new environments
• Use and augment monitoring tools to observe platform health, ensure performance and reliability
• Interact with the product team to test new features and package new on-prem releases
• Automate and optimize the release pipeline to make it as frictionless as possible
• Exhibit continuous curiosity for emerging technology that could solve our challenges
• 1-2+ years experience in site reliability engineering, DevOps, and system administration
• CS (preferred) or other technical degree, or equivalent practical experience
• Experience working with DevOps tools such as Kubernetes, Terraform, Ansible, Puppet and Chef
• Proficiency with scripting languages such as Python and bash
• Experience managing cloud infrastructure in AWS, GCP, and/or Azure
• Expertise in Linux administration, configuration, and networking protocols
• Experience with on-prem deployment architectures (Bonus)
• Experience running a 24x7 SaaS platform with defined SLI, SLO, SLA (Bonus)
• Familiarity with operating machine learning & AI applications (Bonus)
• medical
• dental
• vision
• 401(k) plan
• unlimited paid time off
• generous parental leave plan
• mental and wellness support