DevOps Engineer: Beginner to Expert

A comprehensive roadmap to master DevOps engineering from Linux fundamentals to advanced cloud-native and platform engineering concepts.

32 Stages
All Levels

This roadmap guides you from Linux fundamentals through to advanced platform engineering and MLOps. Each stage builds on the last, so work through them sequentially to develop a deep, well-rounded DevOps skill set. Mark topics as you complete them, and revisit earlier stages to reinforce your foundations as you progress.

01

Linux Fundamentals

3 topics · 3 required
Master the Linux command line and system basics: the foundation of every DevOps workflow.

File System & Navigation

Required

Understand the Linux directory tree, navigate with cd/ls/pwd, and manage files with cp/mv/rm.

Users, Groups & Permissions

Required

Manage users with useradd/usermod, set file permissions and ownership with chmod/chown, and understand sudo.

Package Management

Required

Install and update software with apt, yum/dnf, and snap. Understand package repositories.

02

Shell Scripting

3 topics · 2 required · 1 recommended
Automate repetitive tasks and glue tools together with Bash scripting.

Bash Basics

Required

Variables, conditionals (if/else), loops (for/while), and functions in Bash.

Text Processing

Required

Use grep, awk, sed, cut, and sort to process and transform text streams.

Cron Jobs & Scheduling

Recommended

Schedule recurring tasks using crontab and understand cron syntax.
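A crontab line starts with five schedule fields (minute, hour, day of month, month, day of week) followed by the command. As a minimal sketch of how a single field is read, the hypothetical expand_field helper below turns one field into the values it matches. It assumes only the common forms (`*`, plain numbers, `a-b` ranges, comma lists, and `*/n` or `a-b/n` steps) and ignores names such as MON, special strings such as @daily, and the way real cron combines day of month with day of week.

```python
def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field (e.g. '*/15' or '1-5') into the sorted values it matches."""
    values: set[int] = set()
    for part in field.split(","):
        # Split an optional step suffix: '0-30/5' -> ('0-30', '5')
        range_part, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"step must be positive in {field!r}")
        if range_part == "*":
            start, end = low, high
        elif "-" in range_part:
            start_text, end_text = range_part.split("-", 1)
            start, end = int(start_text), int(end_text)
        elif step_text:
            # This sketch requires a step to follow '*' or a range, as in '*/5' or '0-30/5'
            raise ValueError(f"unsupported step form in {field!r}")
        else:
            start = end = int(range_part)
        if not low <= start <= end <= high:
            raise ValueError(f"value out of range in {field!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)


# Minute field of '*/15 9-17/4 * * 1-5': every 15 minutes
print(expand_field("*/15", 0, 59))
# Hour field: 09:00, 13:00 and 17:00
print(expand_field("9-17/4", 0, 23))
```

For example, `*/15` in the minute field matches minutes 0, 15, 30 and 45, and `9-17/4` in the hour field matches hours 9, 13 and 17.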

03

Networking Basics

4 topics · 2 required · 2 recommended
Understand the network concepts that underpin modern infrastructure and cloud environments.

TCP/IP & DNS

Required

How IP addressing, subnets, DNS resolution, and routing work.
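Subnetting is easy to explore with Python's standard-library ipaddress module. The sketch below splits an example 10.0.0.0/24 block (an assumed range chosen for illustration) into four /26 subnets and checks which subnet an address belongs to, the basic containment test behind a routing decision. Note that hosts() excludes only the network and broadcast addresses; cloud providers may reserve more (AWS, for instance, reserves five addresses in every VPC subnet).

```python
import ipaddress

# Example address block; any private range works the same way
network = ipaddress.ip_network("10.0.0.0/24")

# Split the /24 into four equal /26 subnets (e.g. one per tier or availability zone)
subnets = list(network.subnets(new_prefix=26))

for subnet in subnets:
    hosts = list(subnet.hosts())  # excludes the network and broadcast addresses
    print(f"{subnet}: {len(hosts)} usable hosts, {hosts[0]} to {hosts[-1]}")

# Which subnet does this address fall into?
address = ipaddress.ip_address("10.0.0.70")
matching = [str(subnet) for subnet in subnets if address in subnet]
print(f"{address} belongs to {matching}")
```

Running it prints four lines such as `10.0.0.0/26: 62 usable hosts, 10.0.0.1 to 10.0.0.62` and reports that 10.0.0.70 belongs to 10.0.0.64/26.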

HTTP/HTTPS & TLS

Required

Request/response lifecycle, status codes, headers, and TLS certificate fundamentals.

Firewalls & Ports

Recommended

Configure iptables/ufw, understand inbound/outbound rules, and common service ports.

Load Balancing Concepts

Recommended

Distribution algorithms such as round-robin and least-connections, plus health checks that take failing backends out of rotation.
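Round-robin and least-connections are simple enough to sketch in a few lines. The Python below is a minimal, single-threaded illustration that assumes a hypothetical Backend record with a healthy flag (as a health checker would set it) and a count of active connections; production load balancers add weighting, concurrency control and connection draining on top of this.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True          # set by a (hypothetical) health checker
    active_connections: int = 0


class RoundRobin:
    """Cycle through backends in order, skipping any that failed their health check."""

    def __init__(self, backends: list[Backend]) -> None:
        self.backends = backends
        self._cycle = itertools.cycle(backends)

    def pick(self) -> Backend:
        # Try each backend at most once per call before giving up
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend.healthy:
                return backend
        raise RuntimeError("no healthy backends")


def least_connections(backends: list[Backend]) -> Backend:
    """Pick the healthy backend currently serving the fewest connections."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backends")
    return min(healthy, key=lambda b: b.active_connections)


pool = [Backend("a"), Backend("b", healthy=False), Backend("c")]
balancer = RoundRobin(pool)
picks = [balancer.pick().name for _ in range(4)]

pool[0].active_connections = 5
pool[2].active_connections = 2
least_busy = least_connections(pool).name
print(picks, least_busy)
```

With backends a, b (unhealthy) and c, four round-robin picks return a, c, a, c. Round-robin works well when requests are short and similar; least-connections adapts better when request durations vary widely.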

04

Version Control with Git

4 topics · 2 required · 2 recommended
Track code changes, collaborate with teams, and manage releases using Git.

Core Git Workflow

Required

init, clone, add, commit, push, pull, and status commands.

Branching & Merging

Required

Feature branches, merge vs rebase, resolving merge conflicts.

Git Workflows

Recommended

Gitflow, trunk-based development, and pull request best practices.

Tags & Releases

Recommended

Semantic versioning with git tags and using GitHub/GitLab releases.
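Semantic version tags have to be compared numerically, not as strings, or v1.10.0 sorts before v1.9.7. The sketch below is a minimal illustration that assumes tags of the plain form vMAJOR.MINOR.PATCH; it deliberately rejects pre-release and build suffixes (such as -rc.1 or +build.5), whose precedence rules in the SemVer specification are more involved. The helper names parse_tag and bump are hypothetical.

```python
import re

# vMAJOR.MINOR.PATCH without leading zeros; pre-release/build suffixes are not accepted
SEMVER_TAG = re.compile(r"v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_tag(tag: str) -> tuple[int, int, int]:
    """Turn 'v1.2.3' into (1, 2, 3) so tags compare numerically."""
    match = SEMVER_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a plain semantic version tag: {tag!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


def bump(tag: str, part: str) -> str:
    """Next tag: 'major' for breaking changes, 'minor' for features, 'patch' for fixes."""
    major, minor, patch = parse_tag(tag)
    if part == "major":
        return f"v{major + 1}.0.0"
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


tags = ["v1.10.0", "v1.2.3", "v2.0.0", "v1.9.7"]
ordered = sorted(tags, key=parse_tag)
print(ordered)                      # numeric order, unlike sorted(tags)
print(bump(ordered[-1], "minor"))   # next feature release after the latest tag
```

In practice the tag list would come from `git tag --list 'v*'`, and the bumped value would be applied with `git tag -a` before creating the GitHub or GitLab release.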

05

Containerization with Docker

4 topics · 3 required · 1 recommended
Package applications and their dependencies into portable, reproducible containers.

Docker Architecture

Required

Images, containers, the Docker daemon, and the Docker Hub registry.

Writing Dockerfiles

Required

FROM, RUN, COPY, ENV, EXPOSE, CMD, and ENTRYPOINT instructions.

Docker Compose

Required

Define and run multi-container applications with a Compose file (compose.yaml, or the older docker-compose.yml).

Image Optimization

Recommended

Multi-stage builds, layer caching, .dockerignore, and minimizing image size.

06

Container Registries

3 topics · 1 required · 2 recommended
Store, version, and distribute container images securely.

Docker Hub

Required

Push and pull public/private images from Docker Hub.

Private Registries

Recommended

Use AWS ECR, GitHub Container Registry, or self-hosted Harbor.

Image Scanning

Recommended

Scan for vulnerabilities using Trivy or Docker Scout before deploying.

07

CI/CD Fundamentals

3 topics · 2 required · 1 recommended
Understand the principles of Continuous Integration and Continuous Delivery.

CI/CD Concepts

Required

The pipeline stages: build, test, lint, package, deploy, and release.

Pipeline as Code

Required

Define pipelines in YAML (GitHub Actions, GitLab CI) rather than through a UI.

Artifacts & Caching

Recommended

Cache dependencies and pass build artifacts between pipeline stages.

08

GitHub Actions

3 topics · 2 required · 1 recommended
Build automated workflows directly in your GitHub repository.

Workflow Syntax

Required

on, jobs, steps, uses, and run: the building blocks of a GitHub Actions workflow.

Reusable Workflows & Composite Actions

Recommended

Keep pipelines DRY by sharing logic across workflows and repositories.

Secrets & Environments

Required

Manage sensitive values, approval gates, and environment-specific variables.

09

GitLab CI/CD

3 topics · 1 required · 1 recommended · 1 optional
Use GitLab's built-in CI/CD system for powerful, integrated pipelines.

.gitlab-ci.yml Basics

Required

Stages, jobs, scripts, image, and before_script in GitLab CI.

GitLab Runners

Recommended

Register and configure shared and self-hosted runners for job execution.

Environments & Deployments

Optional

Track deployments per environment and use dynamic child pipelines.

10

Cloud Computing Fundamentals

3 topics · 3 required
Understand the core concepts shared across all major cloud providers.

Cloud Service Models

Required

IaaS vs PaaS vs SaaS: understand which layers you manage and which the provider manages in each model.

Regions, Availability Zones & Edge

Required

How cloud providers distribute infrastructure globally for availability.

Shared Responsibility Model

Required

What the cloud provider secures vs what you are responsible for.

11

AWS Core Services

5 topics · 4 required · 1 recommended
Get hands-on with the foundational AWS services used in most production environments.

IAM (Identity & Access Management)

Required

Users, groups, roles, policies, and the principle of least privilege.

EC2 & Auto Scaling

Required

Launch instances, configure AMIs, security groups, and Auto Scaling Groups.

S3 (Simple Storage Service)

Required

Buckets, objects, versioning, lifecycle policies, and static website hosting.

VPC (Virtual Private Cloud)

Required

Subnets, route tables, internet gateways, NAT gateways, and VPC peering.

RDS & Databases

Recommended

Managed relational databases, Multi-AZ deployments, and read replicas.

12

Infrastructure as Code: Terraform

4 topics · 2 required · 2 recommended
Provision and manage cloud infrastructure declaratively using Terraform.

Terraform Core Concepts

Required

Providers, resources, data sources, variables, and outputs.

State Management

Required

Local vs remote state, the terraform.tfstate file, and state locking with S3 + DynamoDB (recent Terraform releases can also lock state natively in S3).

Modules

Recommended

Write reusable, composable Terraform modules and use the public registry.

Workspaces & Environments

Recommended

Manage dev/staging/prod environments using workspaces or directory isolation.

13

Configuration Management

3 topics · 2 required · 1 recommended
Automate server configuration and application deployment at scale.

Ansible Fundamentals

Required

Inventories, playbooks, tasks, handlers, roles, and ad-hoc commands.

Ansible Roles & Galaxy

Recommended

Structure playbooks with roles and reuse community roles from Ansible Galaxy.

Idempotency

Required

Understand why idempotent tasks are critical for reliable automation.
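An idempotent task describes a desired state and acts only when the system is not already in it, so running it twice has the same effect as running it once. The Python sketch below illustrates the idea with a hypothetical ensure_line helper, loosely in the spirit of Ansible's lineinfile module (it is not that module's implementation). A naive `echo ... >> file` in a shell script, by contrast, appends a duplicate line on every run.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Ensure `line` is present in the file at `path`; return True only if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state: nothing to do
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True  # the file was modified


with tempfile.TemporaryDirectory() as tmp:
    hosts_file = Path(tmp) / "hosts"
    first_run = ensure_line(hosts_file, "10.0.0.5 db.internal")   # creates the entry
    second_run = ensure_line(hosts_file, "10.0.0.5 db.internal")  # no-op on re-run
    final_text = hosts_file.read_text()

print(first_run, second_run, repr(final_text))
```

The second call returns False, the equivalent of Ansible reporting "ok" rather than "changed", and the file still contains a single copy of the line.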

14

Kubernetes Core Concepts

4 topics · 4 required
Orchestrate containerized workloads at scale with Kubernetes.

Cluster Architecture

Required

Control plane (API server, etcd, scheduler, controller-manager) and worker nodes (kubelet, kube-proxy).

Pods, Deployments & ReplicaSets

Required

Pods (the smallest deployable unit), Deployments (declarative rollouts), and ReplicaSets (replica management).

Services & Networking

Required

ClusterIP, NodePort, LoadBalancer service types, and DNS within the cluster.

ConfigMaps & Secrets

Required

Decouple configuration from container images and manage sensitive data.

15

Kubernetes Workloads & Storage

4 topics · 1 required · 3 recommended
Go beyond basic Deployments to run stateful apps, batch jobs, and persistent storage.

StatefulSets & DaemonSets

Recommended

Run stateful applications with stable network identities (StatefulSets) and a copy of a pod on every node (DaemonSets).

Persistent Volumes & Claims

Recommended

PV, PVC, StorageClasses, and dynamic provisioning for stateful data.

Jobs & CronJobs

Recommended

Run batch tasks and scheduled workloads inside a cluster.

Resource Requests & Limits

Required

Set CPU and memory requests/limits to ensure fair scheduling and stability.
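Requests and limits are written as Kubernetes quantities: CPU in cores or millicores (500m is half a core) and memory in bytes with suffixes such as Mi (2^20) or M (10^6). The scheduler places a pod only on a node whose allocatable capacity covers the requests of the pods already there plus the new pod's requests; limits do not affect placement. The sketch below parses a subset of the quantity syntax (it assumes no exponent forms such as 1e3 and no P/E suffixes) and performs that fit check for CPU; parse_quantity and fits_on_node are hypothetical names.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL_SUFFIXES = {"m": Decimal("0.001"), "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity such as '500m', '2', or '128Mi' into a plain number."""
    for suffixes in (BINARY_SUFFIXES, DECIMAL_SUFFIXES):
        for suffix, factor in suffixes.items():
            if text.endswith(suffix):
                return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)


def fits_on_node(allocatable_cpu: str, requested_cpu: list[str]) -> bool:
    """True if the summed CPU requests fit within the node's allocatable CPU."""
    total = sum((parse_quantity(q) for q in requested_cpu), Decimal(0))
    return total <= parse_quantity(allocatable_cpu)


print(parse_quantity("500m"), parse_quantity("128Mi"))
print(fits_on_node("2", ["500m", "750m", "500m"]))
```

For example, pods requesting 500m, 750m and 500m (1.75 cores in total) fit on a node with 2 allocatable CPUs, while 500m plus 600m does not fit on a 1-CPU node.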

16

Kubernetes Advanced Operations

4 topics · 2 required · 2 recommended
Operate and extend Kubernetes clusters at production scale.

Ingress & Ingress Controllers

Required

Expose HTTP/S routes with NGINX or Traefik ingress controllers and TLS termination.

Horizontal & Vertical Pod Autoscaling

Recommended

HPA based on CPU/custom metrics and VPA for right-sizing resource requests.
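The Kubernetes documentation describes the core HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no scaling while the ratio stays within a tolerance (0.1 by default) of 1.0. The sketch below implements only that formula plus clamping to assumed min_replicas/max_replicas bounds; the real controller also handles missing metrics, not-yet-ready pods and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,    # assumed bounds, as set in the HPA spec
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Core HPA formula: scale replicas in proportion to metric / target, then clamp."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: leave the replica count alone
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, current_metric=90, target_metric=60))  # scale out
print(desired_replicas(4, current_metric=30, target_metric=60))  # scale in
```

For example, 4 replicas averaging 90% CPU against a 60% target scale to ceil(4 × 1.5) = 6, while 63% against 60% is within tolerance and leaves the count unchanged.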

RBAC

Required

Role-Based Access Control: ClusterRoles, Roles, RoleBindings, and ServiceAccounts.

Network Policies

Recommended

Restrict pod-to-pod communication using Kubernetes Network Policies.

17

Helm: Kubernetes Package Manager

4 topics · 2 required · 1 recommended · 1 optional
Template and package Kubernetes manifests for repeatable, versioned deployments.

Helm Chart Structure

Required

Chart.yaml, values.yaml, the templates/ directory, and named helper templates in _helpers.tpl.

Templating with Go Templates

Required

Use {{ .Values }}, conditionals, loops, and named templates in Helm.

Chart Repositories & OCI Registries

Recommended

Host charts on GitHub Pages or push them to OCI-compatible registries, and list them on Artifact Hub for discovery.

Helm Hooks & Tests

Optional

Run pre/post-install jobs and validate deployments with helm test.

18

GitOps

3 topics · 1 required · 1 recommended · 1 optional
Use Git as the single source of truth for declarative infrastructure and application state.

GitOps Principles

Required

Declarative config, versioned history, automated reconciliation, and self-healing.

ArgoCD

Recommended

Deploy and sync Kubernetes manifests automatically from a Git repository with ArgoCD.

Flux CD

Optional

CNCF-graduated GitOps toolkit for continuous delivery to Kubernetes.

19

Observability: Logging

3 topics · 1 required · 2 recommended
Collect, aggregate, and search logs from applications and infrastructure.

Structured Logging

Required

JSON log formats, log levels, correlation IDs, and log context best practices.
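As a sketch of structured logging in application code, the Python below attaches a custom logging.Formatter (a hypothetical JsonFormatter, not a standard-library class) that emits one JSON object per line and carries a per-request correlation ID passed through the standard `extra` argument. Field names such as correlation_id are assumptions; use whatever schema your log pipeline expects.

```python
import json
import logging
import sys
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set per request through logging's `extra` argument; None if absent
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# In a web service this ID would usually come from an incoming request header
correlation_id = str(uuid.uuid4())
logger.info("payment authorized", extra={"correlation_id": correlation_id})
logger.warning("retrying card network call", extra={"correlation_id": correlation_id})
```

Every line written for the same request shares one correlation_id, so a log backend can filter on that field to reconstruct the request, and shippers can parse each line as fields without custom regular expressions.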

ELK / EFK Stack

Recommended

Ship logs with Filebeat and Logstash (ELK) or Fluentd (EFK), store them in Elasticsearch, and visualize them in Kibana.

Loki & Grafana

Recommended

Lightweight log aggregation with Loki, queried via LogQL in Grafana.

20

Observability: Metrics

4 topics · 2 required · 2 recommended
Collect time-series metrics to understand system health and performance trends.

Prometheus

Required

Scrape metrics with Prometheus, write PromQL queries, and configure alerting rules.

Grafana Dashboards

Required

Build dashboards, panels, and variables to visualize Prometheus metrics.

Exporters

Recommended

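Use node_exporter for host metrics, kube-state-metrics for the state of Kubernetes objects, and blackbox_exporter for probing endpoints over protocols such as HTTP, TCP, and ICMP.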

Alertmanager

Recommended

Route, deduplicate, and silence alerts; integrate with PagerDuty, Slack, and email.

21

Observability: Tracing

3 topics · 1 required · 2 recommended
Track requests as they flow through distributed microservices.

Distributed Tracing Concepts

Required

Traces, spans, context propagation, and the OpenTelemetry data model.
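Context propagation is what ties spans from different services into one trace. With OpenTelemetry's default W3C Trace Context propagator, the context travels in a traceparent HTTP header of the form version-traceid-parentid-flags. The sketch below parses that header and builds the one to send downstream; it accepts version 00 only and ignores the tracestate header, and in real services the OpenTelemetry SDK does this for you. SpanContext and the function names here are hypothetical.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex), lowercase
TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Parse an incoming traceparent header; return None if it is missing or invalid."""
    match = TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # This sketch accepts only version 00; all-zero IDs are invalid
    if version != "00" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return SpanContext(trace_id, parent_id, sampled=bool(int(flags, 16) & 0x01))


def outgoing_traceparent(ctx: SpanContext) -> str:
    """Header for a downstream call: same trace ID, fresh span ID for this service's span."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
if ctx is not None:
    print(ctx.trace_id, ctx.sampled)
    print(outgoing_traceparent(ctx))
```

The trace ID stays constant across every hop while each service contributes a new span ID, which is how a backend such as Jaeger or Tempo stitches the spans back into a single trace.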

Jaeger & Tempo

Recommended

Deploy Jaeger or Grafana Tempo as a tracing backend and query trace data.

Instrumenting Applications

Recommended

Add OpenTelemetry SDKs to Node.js, Python, and Go services.

22

Security: DevSecOps

4 topics · 4 required
Shift security left and integrate it into every stage of the delivery pipeline.

Static Application Security Testing (SAST)

Required

Scan source code for vulnerabilities using Semgrep, SonarQube, or Bandit.

Dependency Scanning (SCA)

Required

Detect vulnerable third-party packages using Dependabot, Snyk, or OWASP Dependency-Check.

Container Image Scanning

Required

Scan Docker images for CVEs with Trivy or Grype in your CI pipeline.

Secrets Detection

Required

Prevent credentials from reaching Git with detect-secrets, GitGuardian, or gitleaks.
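Secret scanners work largely by pattern matching, often combined with entropy checks, on the lines a commit adds. The sketch below is a hypothetical pre-commit check with a tiny assumed rule set; it scans only added lines of a unified diff and reports which rule matched. Tools such as gitleaks ship far more rules and handle allowlists and false positives.

```python
import re

# Hypothetical rule set; real scanners ship many more rules plus entropy checks
RULES = {
    # AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) plus 16 uppercase letters/digits
    "aws-access-key-id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private-key-block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_diff(diff_text: str) -> list[tuple[int, str]]:
    """Return (diff line number, rule name) for every added line that matches a rule."""
    findings = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        # Inspect only added lines, skipping the "+++ b/<file>" header
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation
staged_diff = """+++ b/settings.py
+AWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"
-DEBUG = True
+DEBUG = False
"""
findings = scan_diff(staged_diff)
if findings:
    print("possible secrets found:", findings)
```

In a real pre-commit hook the diff would come from `git diff --cached`, and a non-empty result would block the commit by exiting with a non-zero status.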

23

Security: Cloud & Kubernetes Hardening

4 topics · 1 required · 3 recommended
Apply security best practices to cloud workloads and Kubernetes clusters.

CIS Benchmarks

Recommended

Apply CIS benchmarks for Linux, Docker, and Kubernetes to harden configurations.

Pod Security Standards

Required

Enforce the privileged, baseline, and restricted profiles with Pod Security Admission (PSA); the older PodSecurityPolicy (PSP) API was removed in Kubernetes 1.25.

OPA / Kyverno

Recommended

Define and enforce admission policies in Kubernetes with Open Policy Agent or Kyverno.

Secrets Management

Recommended

Store and inject secrets using HashiCorp Vault or AWS Secrets Manager.

24

Service Mesh

3 topics · 1 recommended · 2 optional
Manage service-to-service communication, security, and observability with a mesh layer.

Service Mesh Concepts

Recommended

Sidecar proxy pattern, control plane vs data plane, and mTLS between services.

Istio

Optional

Traffic management, circuit breaking, retries, and observability with Istio.

Linkerd

Optional

Lightweight CNCF service mesh focused on simplicity and low resource overhead.

25

Infrastructure Testing

3 topics · 3 recommended
Validate your infrastructure code with automated tests before it reaches production.

Terraform Testing

Recommended

Use Terratest or terraform test to write unit and integration tests for modules.

Kitchen-Terraform / Checkov

Recommended

Integration-test modules with Kitchen-Terraform, and run policy-as-code compliance scans for Terraform with Checkov or tfsec.

Kubernetes Manifest Testing

Recommended

Lint and validate manifests with kubeconform (the maintained successor to kubeval) and Polaris.

26

Cost Optimization

4 topics · 1 required · 2 recommended · 1 optional
Understand and reduce cloud spending without sacrificing reliability or performance.

Cloud Cost Visibility

Required

AWS Cost Explorer, tagging strategies, and budget alerts.

Right-sizing & Reserved Instances

Recommended

Match instance types to workload needs and use Savings Plans or Reserved Instances.

Spot Instances & Preemptible VMs

Recommended

Run fault-tolerant workloads on Spot/Preemptible instances at discounts of up to 90% compared with On-Demand pricing.

FinOps Practices

Optional

Cross-team cost accountability, showback/chargeback models, and FinOps Foundation principles.

27

Disaster Recovery & Backup

3 topics · 2 required · 1 recommended
Design systems to survive failures and recover from data loss or outages.

RTO & RPO

Required

Define Recovery Time Objective and Recovery Point Objective for each service.

Backup Strategies

Required

Automated backups for databases, volumes, and object storage with retention policies.
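Retention policies decide which backups to delete. As a minimal sketch of a common rotation scheme, the hypothetical function below keeps every backup from the last `daily` days plus the most recent `weekly` Sunday backups; the default counts and the choice of Sunday are assumptions, and managed backup services usually let you express similar rules declaratively.

```python
from datetime import date, timedelta


def backups_to_keep(backups: list[date], today: date, daily: int = 7, weekly: int = 4) -> set[date]:
    """Return the backup dates to retain; everything else can be pruned."""
    # Keep everything from the most recent `daily` days (today counts as day 0)
    keep = {b for b in backups if 0 <= (today - b).days < daily}
    # Also keep the newest `weekly` Sunday backups (date.weekday() == 6 is Sunday)
    sundays = sorted((b for b in backups if b.weekday() == 6), reverse=True)
    keep.update(sundays[:weekly])
    return keep


today = date(2024, 6, 30)
backups = [today - timedelta(days=offset) for offset in range(60)]  # one backup per day
keep = backups_to_keep(backups, today)
prune = sorted(set(backups) - keep)
print(f"keeping {len(keep)} backups, pruning {len(prune)}")
```

With 60 daily backups ending on Sunday 30 June 2024, this keeps the seven most recent days plus the Sundays of 23, 16 and 9 June (30 June is already covered), ten backups in total.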

Multi-Region & Cross-Zone Architecture

Recommended

Design for availability zone and region failure using active-active or active-passive patterns.

28

Site Reliability Engineering (SRE)

4 topics · 2 required · 1 recommended · 1 optional
Apply SRE principles to balance reliability with the speed of software delivery.

SLIs, SLOs & SLAs

Required

Define measurable reliability indicators (SLIs), set targets for them (SLOs), understand how both relate to contractual SLAs, and track the targets with error budgets.

Error Budgets

Recommended

Use error budgets to make data-driven decisions about feature releases vs reliability work.
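An availability SLO translates directly into an error budget: a 99.9% target over 30 days allows 0.1% of that window, or 43.2 minutes, of unavailability. The sketch below shows that arithmetic for a time-based budget and for a request-based one; the function names and the 30-day window are illustrative assumptions.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Allowed unavailability in `window` for an availability SLO such as 0.999."""
    if not 0 < slo < 1:
        raise ValueError("SLO must be between 0 and 1 (exclusive)")
    return window * (1 - slo)


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative once exhausted)."""
    if not 0 < slo < 1 or total_requests <= 0:
        raise ValueError("need 0 < slo < 1 and a positive request count")
    allowed_failures = total_requests * (1 - slo)
    return 1 - failed_requests / allowed_failures


print(error_budget(0.999, timedelta(days=30)))
print(f"{budget_remaining(0.999, 1_000_000, 250):.0%} of the budget left")
```

With 250 failures out of 1,000,000 requests against a 99.9% SLO, a quarter of the 1,000 allowed failures is spent and 75% remains. A team might keep shipping features while the remainder is positive and shift to reliability work as it approaches zero.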

Incident Management

Required

Runbooks, on-call rotations, post-mortems, and blameless culture.

Chaos Engineering

Optional

Proactively test system resilience with tools like Chaos Monkey, LitmusChaos, or AWS FIS.

29

Platform Engineering

3 topics · 1 recommended · 2 optional
Build Internal Developer Platforms (IDPs) that improve developer experience and productivity.

Internal Developer Platforms

Recommended

Concepts behind IDPs: golden paths, self-service infrastructure, and platform teams.

Backstage

Optional

Deploy Spotify Backstage as a developer portal with a software catalog and TechDocs.

Crossplane

Optional

Provision cloud resources from Kubernetes using Crossplane Compositions and XRDs.

30

Advanced Kubernetes Patterns

4 topics · 2 recommended · 2 optional
Extend and customize Kubernetes with operators, webhooks, and advanced scheduling.

Custom Resource Definitions (CRDs)

Recommended

Extend the Kubernetes API with your own resource types.

Operators

Optional

Encode operational knowledge as Kubernetes controllers using Operator SDK or Kubebuilder.

Admission Webhooks

Optional

Mutating and validating admission webhooks for policy enforcement and injection.

Advanced Scheduling

Recommended

Node affinity, taints/tolerations, topology spread constraints, and priority classes.

31

Multi-Cloud & Hybrid Cloud

3 topics · 2 recommended · 1 optional
Design and operate workloads that span multiple cloud providers or on-premises environments.

Multi-Cloud Strategy

Recommended

Evaluate use cases for multi-cloud vs single-cloud: portability, vendor lock-in, and compliance.

Terraform Multi-Provider

Recommended

Manage AWS, GCP, and Azure resources within a single Terraform configuration.

Federated Kubernetes

Optional

Manage the lifecycle of many clusters with Cluster API, and run workloads across them with multi-cluster platforms such as Google Anthos.

32

AI & MLOps Foundations

3 topics · 3 optional
Understand how DevOps principles extend into the machine learning lifecycle.

MLOps Concepts

Optional

CI/CD for ML models, experiment tracking, model registries, and feature stores.

Kubeflow Pipelines

Optional

Orchestrate ML workflows on Kubernetes using Kubeflow.

GPU Workloads on Kubernetes

Optional

Configure the NVIDIA device plugin and nvidia.com/gpu resource limits for GPU-accelerated pods.
