Production Engineer
Position Summary:
We are currently seeking a Site Reliability Engineer to join our team. In this role you will contribute to the reliability and enhancement of the technology engine that powers multiple Pivotree solutions.
This role carries direct responsibility for platform solutions across several key areas: availability, performance, change management, monitoring, and emergency response. You will work with other members of the platform, solutions, operations, and application teams to understand and ultimately address changing and evolving requirements by extending and exposing capabilities in a simple and consistent fashion. You will be a member of a team that maintains expertise in Utility Computing services and advises management and the organization as a whole on this mode of computing.
You will…
- Contribute to ensuring pooled and independent utility services are highly available
- Actively take part in and initiate continuous improvement: measure and reduce manual tasks and overhead
- Be a subject matter expert for Utility Computing providers and their respective services, both existing and emerging, with a particular focus on AWS
- Complete systems development, administration, and engineering tasks including integration, documentation and testing
- Develop and maintain tools, processes, and workflows for automated infrastructure resource(s) and application deployment, configuration management & maintenance
- Own the responsibility for platform management, supporting services, and all related tooling and automation
- Investigate and troubleshoot relevant platform-based issues and incidents (high availability, performance, security, etc.)
- Participate in recurring stand-ups with other team members based in different locations and time zones
- Participate in on-call rotation, escalations, and shift work (generally Monday to Friday or Wednesday to Sunday)
- Work with other team members to improve processes and advance relevant and related competencies
You are…
- Super comfortable with Linux (RHEL-based / Debian-based)
- Experienced with supporting software development teams and workflows
- A team player who recognizes the power of Agile and team-based delivery
- Well versed in infrastructure & application monitoring, logging, and tracing
- Able to effectively decompose problems into workable chunks
- Experienced at working on large projects with deadlines
- Committed to high quality and attention to detail
- Focused on and committed to delivering high quality services
- A strategic thinker who is able to link business and technical objectives
- Someone who can go wide and deep: who works with several disparate systems and services, ultimately acquires expert knowledge of them, and can navigate them accordingly
Qualifications (Must have)
- At least one Associate-level AWS certification, or a commitment to achieve one within 3 months
- A mature understanding of and lots of experience with infrastructure-as-code concepts and practices
- 2+ years - working with tools to support version control, build automation and automated testing (e.g. the usual suspects... Git, Jenkins, TravisCI, Selenium, etc.)
- 1+ years - production experience operating container and container orchestration technologies (ideally Docker and Kubernetes / managed Kubernetes service)
- 2+ years - infrastructure lifecycle management with tooling such as AWS CloudFormation, HashiCorp Terraform, or similar
- 2+ years - monitoring system performance
- 2+ years - implementing and maintaining security and compliance for all aspects of systems and components where possible
- 2+ years - implementation and operating experience in respectable-scale, API-driven production environments on AWS
- 3+ years - system administration experience (OS, network, storage, virtualization management, etc.) in challenging production environments, with the associated war stories
- 3+ years - Debian-based and RHEL-based Linux
- 3+ years - web service, application, middleware, and database support
- Exceptional communication skills and the ability to convey decisions and ideas in a clear and concise manner
- The ability to work independently as well as collaboratively
- The ability to learn and adapt to new and overlapping technologies quickly and independently, and to formulate and implement standards, procedures and best practices
- The ability to think in systems
- Working experience with the likes of Python, Bash, or similar to extend and increase efficiencies
Qualifications (Nice to have)
- Experience and/or exposure to the Serverless Framework
- Experience with APM tools such as AppDynamics, New Relic, Dynatrace, or AWS X-Ray
- Experience with the following AWS services in a production environment: API Gateway, Cognito, DynamoDB, ECS, EMR, Lambda
- AWS Certified Developer
- AWS Certified SysOps Administrator
Other details
- Job family: Engineering and Architecture
- Job function: Professional
- Pay type: Salary
- Bangalore, Karnataka, India
- Mumbai, Maharashtra, India
- Mysore, Karnataka, India