WHAT DOES A CLOUD OPERATIONS ENGINEER DO?

Published: July 23, 2024 - The Cloud Operations Engineer collaborates with developers within an Agile Scrum framework and manages AWS solution architectures. This role focuses on optimizing cloud services for performance, security, and cost-effectiveness, while also providing advanced monitoring dashboards. Responsibilities also include supporting product lifecycle management and validating new releases with the QA team.

A Review of Professional Skills and Functions for a Cloud Operations Engineer

1. Cloud Operations Engineer Duties

  • Cloud Services Participation: Be an active participant in all Cloud Services Operations activities.
  • Incident Management: Handle incident resolution, alert response, event management, support requests, proactive maintenance, provisioning, and deployment of new systems.
  • Operational Workflow Implementation: Implement defined workflows, escalations, and communications about operational status.
  • Incident Troubleshooting: Troubleshoot incident tickets with staff and customers to maintain satisfaction levels.
  • Policy Implementation: Implement policies and procedures as defined by the team lead.
  • Project Resource Assistance: Act as a project resource and assist with project communications.
  • Proactive Monitoring: Proactively monitor and respond to alerts on Dialpad's telephony and infrastructure platforms by executing standard resolution measures from the Incident Playbook.
  • Incident Ownership: Identify when monitoring and alerts indicate incidents and take ownership of closing those incidents.
  • Incident Communication: Communicate the status of incidents effectively to stakeholders such as Business and Customer Support.
  • Engineering Collaboration: Work with Engineering teams to resolve incidents quickly.
  • Ticket Creation: Create tickets when incidents uncover a systemic problem in the platform and assign the tickets to the right stakeholder.

2. Cloud Operations Engineer Details

  • Agile Scrum Participation: Work with the development team within an Agile Scrum process and participate in stand-ups.
  • AWS Architecture: Architect and maintain AWS solutions.
  • AWS Expertise: Maintain excellent knowledge of AWS best practices and service offerings.
  • Cloud Service Optimization: Identify and implement cloud services following best practices for reliability, availability, performance, scalability, security, and cost-effectiveness.
  • Continuous Integration Assistance: Assist with continuous integration, using Git or other repositories to manage the automation and templates used to create and manage the environment.
  • Dashboard Solutions Provision: Provide dashboard solutions with alerting and escalations for monitoring the health, availability, and quality of the service.
  • Product Lifecycle Management: Manage all lifecycle stages for a product from ideation through sunset, with responsibility for ensuring that deliverables are understood and releases are delivered on time.
  • Feedback Loop Engagement: Take part in the feedback loop that iteratively improves and optimizes the monitors, alerts, and standard procedures in the Incident Playbook.
  • QA Support: Support the QA team as needed to validate new releases.

3. Cloud Operations Engineer Responsibilities

  • Hybrid Cloud Development: Design and develop mobile networking platforms that run in a hybrid cloud model (public cloud and local appliances).
  • Software Integration: Integrate various software components to form a complete solution.
  • Cloud-native Automation: Integrate and automate applications with cloud-native tools such as AWS CloudFormation.
  • Cloud Technology Adaptation: Learn and use the latest cloud technologies to achieve efficient automated deployments that follow best practices.
  • Project Leadership: Lead implementation projects from application design to application development.
  • Team Contribution: Work within and contribute to a team that has won multiple top awards in the telco industry.
  • Technical Coordination: Attend and coordinate regular meetings to discuss technical solutions, and agree on the timeline and project milestones with the rest of the team.
  • Solution Presentation: Propose solutions and present work to other teams and stakeholders.
  • Cloud Infrastructure Leadership: Lead the design, maintenance, and implementation of a secure, scalable cloud infrastructure platform across multiple accounts.
  • Performance Optimization: Identify bottlenecks, issues, and areas for improvement in the current cloud structure.

4. Cloud Operations Engineer Accountabilities

  • Automation Tool Utilization: Work with modern automation and configuration management tools to keep expanding the existing library of constructs that support the automation of application deployment and infrastructure provisioning tasks across multiple environments, from development and feature branches to production.
  • Lifecycle Participation: Participate in client and internal project lifecycles, ensuring that the chosen solutions enable easy, reliable, and observable infrastructure management and that proper security and compliance are always maintained.
  • Observability Enhancement: Continuously improve the observability of applications and infrastructure by integrating logging, testing, metrics, dashboards, and alerts into the growing library of infrastructure constructs.
  • Infrastructure Troubleshooting: Work with development teams to troubleshoot and resolve infrastructure issues.
  • Solution Implementation: Learn from discovered gaps and implement long-term solutions to them.
  • Self-Service Improvement: Continue to improve the established self-service model by creating and maintaining supporting documentation, operational runbooks, and recipes.
  • Cloud Infrastructure Oversight: Guide the development, implementation, maintenance, and monitoring of EquipmentShare’s cloud infrastructure.
  • Cross-Functional Collaboration: Collaborate across functions and departments to develop an end product that fulfills end-user needs.
  • Technology Trend Monitoring: Stay up to date on evolving technology and share trends, new ideas, and possible system improvements with the team.

5. Cloud Operations Engineer Functions

  • Cloud Deployment Management: Design, automate, and manage a highly available and scalable cloud deployment that allows development teams to deploy and run services.
  • AWS Automation: Perform extensively automated deployments and manage applications on the AWS platform.
  • Systems Development: Work closely with Engineering and Architecture teams to develop systems that focus on scalability, high performance, and security.
  • Documentation Development: Build end-to-end documentation to ensure visibility and resiliency throughout the system.
  • Multi-cloud Support: Provide additional support for other cloud providers such as IBM and Azure.
  • On-call Support: Take part in a 24x7 on-call rotation and escalation processes.
  • Linux Server Management: Install new Linux servers, and rebuild and support existing ones.
  • Azure Security Implementation: Implement and maintain Azure platform and security controls, including Azure policy management, alerting, and reporting.
  • Azure Cost Management: Contribute to Azure cost management, including automated cost reporting, cross-charging, cost analysis, and recommendations.
  • Azure Best Practices: Contribute to Azure management best practices, including introducing, maintaining, and troubleshooting Azure policy, access management controls, and management automation alongside the Cloud Operations team.