Site Reliability Engineer jobs in Pune at Blueshift

About Us

Blueshift is a venture-funded startup headquartered in San Francisco. Our AI-powered marketing platform empowers cutting-edge B2C brands to drive 1:1 marketing on every channel. With Blueshift, marketers are in full control of automating various forms of personalized messaging across every engagement channel.

Blueshift is trusted by leading digital brands like Udacity, LendingTree, BBC, and PayPal to automate their customer engagement marketing, and is recognized by Gartner as a 'Cool Vendor for AI in Marketing'.

Blueshift was founded by repeat entrepreneurs who previously built Mertado.com (acquired by Groupon to become Groupon Goods), and were part of the early team behind Kosmix (acquired by Walmart to become @WalmartLabs). We are backed by top-tier VCs including Nexus Venture Partners, Storm Ventures, Luma Partners, and SoftBank Venture Asia.

Blueshift has now started staffing a new development center in Pune, India. As part of Blueshift, you will get to work on cutting-edge technologies including machine learning, artificial intelligence, big data, and large-scale distributed data systems. This is an exciting opportunity for motivated individuals to build a great career.

Site Reliability Engineer / Cloud Operations Engineer

We are looking for an SRE / CloudOps Engineer who will be responsible for managing, building, and scaling our 1000+ node setup, which processes millions of real-time events and personalizations daily. We are looking for candidates with prior infrastructure experience, ideally in a startup or other fast-paced environment.

Responsibilities

On-call duties to provide application support, incident management, and troubleshooting
Improve reliability and drive down the burden of toil with tooling and automation
Analyze complex systems from a reliability, resilience, and performance perspective
Identify sources of instability in large-scale distributed systems and drive operational excellence
Hands-on implementation and management of complex virtualized environments
Implement scale-up / scale-down strategies based on various utilization metrics
Author incident reports by coordinating with multiple engineering teams
Identify and fill gaps in the monitoring & alerting system
Periodic reporting of system status to the organization

Requirements

5+ years of relevant industry experience
Prior hands-on experience with managing AWS and cloud infrastructure scaling to hundreds of nodes
Experience with managing a container orchestration system
Deep understanding of large-scale data systems and data pipelines, including managing NoSQL, SQL, and HDFS/Hadoop clusters
Experience with modern SRE practices & tools
Hands-on experience with active incident management
Willingness & ability to work night shifts

Perks and Benefits

Opportunity to be part of the early team in India
Competitive salary along with stock option grants
Excellent hospitalisation, personal accident, and term insurance coverage
Located in a top-notch facility in Baner - one of the best neighbourhoods for tech startups
Daily catered breakfast, lunch, and snacks along with a well-stocked pantry