Site Reliability Engineer - Associate
Req #: 190082056
Location: Hyderabad, TG, IN
Job Category: Technology
Responsibilities:
- Develop, test and debug automated tasks (Apps, Systems, Infrastructure)
- Troubleshoot priority incidents and facilitate blameless post-mortems
- Work with development teams throughout the software life cycle to ensure sustainable software releases
- Perform analytics on previous incidents and usage patterns to better predict issues and take proactive actions
- Build self-healing and resiliency patterns and drive their adoption
- Lead and participate in performance tests; identify bottlenecks, opportunities for optimization, and capacity demands
- Participate in 24x7 support coverage as needed
- Implement CI/CD solutions using tools such as Ansible, Kubernetes, Docker, and Jenkins
- Identify and implement tools across monitoring, telemetry, and operations/process automation
- Implement site reliability metrics (SLIs, SLOs, error budgets, etc.) for products
This role requires a wide variety of strengths and capabilities, including:
- Bachelor’s degree or equivalent experience in a software engineering discipline
- Mastery of at least two programming languages (e.g. Python, Java, Go) with respect to design, coding, testing, and software delivery
- Adept in the development of automated tools, systems, and services in multiple technology domains
- Advanced knowledge of one or more infrastructure components (e.g. networking, cloud services, orchestration tools, containerization, compute and storage systems)
- Proficiency in making service-level changes to a system and troubleshooting its components
- Knowledge of administering application servers, web servers, and databases would be desirable (Tomcat, WebSphere, Nginx, Microsoft IIS, message queues, etc.)
- Experience with security testing and tools would be an added advantage
- Good working knowledge of cloud engineering would be desirable