Cloud & DevOps Engineer, backed by 3+ years of full-stack engineering.


Cloud Engineer who ships infrastructure, not just code.
I'm a Cloud & DevOps Engineer focused on AWS infrastructure — the kind that gets provisioned from scratch, monitored under real production load, and maintained when things go wrong at 2am. What sets me apart is 3+ years of backend engineering that came before the cloud work: building REST APIs, real-time Socket.io services, and microservices-based HRM platforms in Node.js. I don't just provision the infrastructure — I understand what's actually running on it.
That combination — infrastructure depth plus application context — is rare, and it shows up in how I work: VPC designs that fit the workload, container configurations that don't create security debt, CI/CD pipelines that the dev team actually trusts. I've provisioned 5 AWS environments from scratch, kept 8+ production applications running, and handled real incidents — including tracing and eradicating persistent crypto-mining malware from a live production system over two days of forensic work.
Production Incident
DeveloperTag · December 2025
This incident shaped how I think about container security. The build cache is not just a performance tool — it's a persistent surface that outlives containers and images, and demands the same scrutiny as running workloads.
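As a rough illustration of why the build cache needs its own cleanup step, the sketch below lists the Docker CLI calls that discard cached layers and force a clean rebuild. The helper names, the image tag, and the choice of steps are assumptions made for this example; they are not the commands used during the incident.

```python
import subprocess
from typing import List


def cache_purge_commands(image_tag: str, context_dir: str = ".") -> List[List[str]]:
    """Return Docker CLI calls that drop cached layers and rebuild cleanly.

    Hypothetical helper for illustration only.
    """
    return [
        # Remove all build cache entries, not only dangling ones.
        ["docker", "builder", "prune", "--all", "--force"],
        # Rebuild without reusing cached layers and re-pull the base image.
        ["docker", "build", "--no-cache", "--pull", "-t", image_tag, context_dir],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order and stop on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for cmd in cache_purge_commands("myapp:clean"):
        print(" ".join(cmd))
```

Deleting containers and images alone leaves the cache in place, which is why the prune step comes first here.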
Experience
Cloud & DevOps Engineer (AWS)
- ›Designed and maintained CI/CD pipelines using GitHub Actions integrated with AWS, fully automating build, test, and production deployments — eliminating manual SSH/SCP workflows and saving ~2–3 hours per release cycle.
- ›Deployed and managed containerised workloads on Amazon EC2 and ECS, standardising Node.js application deployments using Docker multi-stage builds with non-root execution for improved security posture.
- ›Led an active security incident response — traced persistent crypto-mining malware that survived container restarts, image deletions, and npm overrides; identified root cause as malicious layers in Docker build cache and fully eradicated the threat.
- ›Diagnosed and resolved critical CPU spikes (110% utilisation) and container OOM crashes on production; implemented rollback procedures and zero-downtime deployment strategies.
- ›Right-sized AWS infrastructure using Cost Explorer & Compute Optimizer, downgrading an over-provisioned EC2 instance post-incident and reducing ongoing cloud spend.
- ›Managed infrastructure using Terraform, defining EC2, VPC, and IAM resources as code for repeatable and auditable provisioning.
- ›Set up AWS CodePipeline alongside GitHub Actions for select workflows, integrating CodeBuild and CodeDeploy for fully managed build and deployment automation.
Cloud & DevOps Engineer
- ›Containerised Node.js applications using Docker and contributed to CI/CD workflows, improving deployment consistency and reducing manual release effort across multiple environments.
- ›Implemented secure AWS configurations including IAM roles, VPC isolation, and security group rules; integrated CloudWatch + SNS for real-time monitoring and alerting.
- ›Set up and managed Nginx reverse proxy configurations on EC2 instances to route traffic across multiple services, improving reliability and enabling zero-downtime deployments.
- ›Supported application deployments using Docker Compose multi-service stacks on AWS EC2, coordinating backend, frontend, and database containers.
- ›Used Terraform to manage infrastructure as code for AWS environments, keeping infrastructure changes version-controlled and consistent.
Software Engineer
- ›Developed and integrated RESTful APIs and a Socket.io real-time server for a multi-branch HRM platform used by thousands of users across enterprise clients.
- ›Built core HRM microservice modules: Invoice Management, Multi-Branch Access Control, Help Desk, and Bulk Import.
- ›Designed and implemented RBAC and multi-branch permission logic for complex organisational hierarchies.
- ›Collaborated with a cross-functional team using Git workflows and pull-request code reviews to maintain code quality.
Full Stack & DevOps Engineer
- ›Provisioned 5 AWS environments from scratch using Terraform (IaC) across EC2, EBS, S3, ECS, ECR, and Lambda — ensuring consistent, repeatable, and secure infrastructure.
- ›Containerised and deployed applications on Amazon ECS and ECR; managed IAM users, roles, and KMS encryption policies enforcing least-privilege access control.
- ›Designed VPC architectures including subnets, Internet Gateways, route tables, security groups, and NACLs for secure, isolated multi-tier application environments.
- ›Configured CloudWatch dashboards and SNS alerts for proactive monitoring of application health and performance.
Projects
Public projects and Cloud & DevOps architecture labs — ordered by complexity.
ChatHub
Production-grade real-time chat application built with the MERN stack and Socket.io, deployed end-to-end on AWS with a CI/CD pipeline, infrastructure as code, and observability baked in from day one.
- ›Real-time messaging with Socket.io — typing indicators, online presence, read receipts
- ›JWT authentication with refresh token rotation
- ›Group chats, file uploads, message reactions
- ›Nginx reverse proxy on AWS EC2 handling all traffic on port 80
- ›GitHub Actions pipeline deploying to ECS on every push
- ›Terraform managing the entire AWS infrastructure as code
- ›Prometheus metrics for backend observability
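Refresh token rotation, mentioned in the list above, can be sketched roughly as follows. This is a minimal in-memory model under stated assumptions: it uses opaque random tokens rather than JWTs, the class and method names are hypothetical, and the reuse-detection step (revoking the whole token family on replay) is a common hardening pattern that ChatHub may or may not implement.

```python
import hashlib
import secrets
from typing import Dict, Optional, Set, Tuple


class RefreshTokenStore:
    """In-memory refresh-token rotation with reuse detection (illustrative)."""

    def __init__(self) -> None:
        # token hash -> (user_id, family_id, already_used)
        self._tokens: Dict[str, Tuple[str, str, bool]] = {}
        self._revoked_families: Set[str] = set()

    @staticmethod
    def _hash(token: str) -> str:
        # Store only a hash so a leaked store does not leak usable tokens.
        return hashlib.sha256(token.encode()).hexdigest()

    def issue(self, user_id: str, family_id: Optional[str] = None) -> str:
        """Create a new refresh token, starting a new family if none is given."""
        family = family_id or secrets.token_hex(8)
        token = secrets.token_urlsafe(32)
        self._tokens[self._hash(token)] = (user_id, family, False)
        return token

    def rotate(self, token: str) -> str:
        """Exchange a refresh token for a fresh one; each token works once."""
        key = self._hash(token)
        record = self._tokens.get(key)
        if record is None:
            raise PermissionError("unknown refresh token")
        user_id, family, used = record
        if family in self._revoked_families:
            raise PermissionError("token family revoked")
        if used:
            # A replayed token suggests theft: revoke every token in the family.
            self._revoked_families.add(family)
            raise PermissionError("refresh token reuse detected")
        self._tokens[key] = (user_id, family, True)
        return self.issue(user_id, family)
```

A client would call `rotate` with its current refresh token each time the access token expires and keep only the returned replacement.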
Serverless REST API — API Gateway, Lambda, DynamoDB
Serverless backend with persistent NoSQL storage
A fully serverless REST API built on AWS — API Gateway routing to Lambda with DynamoDB as the persistent store. IAM scoped to the minimum required action on a single table ARN.
- ›Least-privilege IAM — inline policy scoped to a single table ARN and single action
- ›Lambda proxy integration with API Gateway for full request/response passthrough
- ›CORS handled at both API Gateway and Lambda level
- ›DynamoDB as the NoSQL persistent layer — no server to manage
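The points above can be sketched as a small handler. This is an illustration under assumptions, not the project's code: the allowed origin, the routes, the DynamoDB action, and the function names are hypothetical, and the DynamoDB write is replaced by an echo so the example runs without AWS credentials. The event and response shapes follow the API Gateway REST proxy integration format.

```python
import json

# Assumed values for illustration; a real API would restrict the origin.
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "GET,POST,OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type,Authorization",
}


def _response(status: int, body: dict) -> dict:
    """Build a proxy-integration response; the body must be a string."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json", **CORS_HEADERS},
        "body": json.dumps(body),
    }


def handler(event: dict, context: object) -> dict:
    """Lambda entry point receiving the full request from API Gateway."""
    method = event.get("httpMethod", "")
    if method == "OPTIONS":
        # CORS preflight answered by Lambda as well as API Gateway.
        return _response(200, {})
    if method == "POST":
        try:
            item = json.loads(event.get("body") or "{}")
        except json.JSONDecodeError:
            return _response(400, {"error": "body must be valid JSON"})
        # A real handler would write `item` to DynamoDB here.
        return _response(201, {"item": item})
    return _response(405, {"error": "method not allowed"})


def least_privilege_policy(table_arn: str, action: str) -> dict:
    """Inline IAM policy allowing one action on one table ARN."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": action, "Resource": table_arn}],
    }
```

For example, `least_privilege_policy(table_arn, "dynamodb:PutItem")` yields a single-statement policy; which action the project actually grants is not stated in the description.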
Custom VPC — Public/Private Subnets, NAT Gateway & Bastion Host
Network isolation + secure admin access
A production-grade VPC from scratch — public and private subnets across AZs, NAT Gateway for outbound-only private traffic, and a Bastion Host as the sole SSH entry point.
- ›Security group referencing (SG-to-SG, not CIDR) — no IP whitelisting drift
- ›SSH agent forwarding through the Bastion, so private keys stay on the local machine and are never copied to the Bastion or the private instances
- ›NAT Gateway verified: outbound internet from private subnet confirmed via curl
- ›Route tables configured per subnet — public IGW, private NAT
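One way to picture the subnet layout is to carve the VPC CIDR into equal blocks, one per tier per availability zone. The sketch below does this with the standard `ipaddress` module. The CIDR range, the AZ names, the /24 size, and the function name are assumptions for illustration, since the actual addressing plan is not given.

```python
import ipaddress
from typing import Dict, List

# Default route target per tier, matching the public IGW / private NAT split.
ROUTE_TARGETS = {"public": "internet-gateway", "private": "nat-gateway"}


def plan_subnets(vpc_cidr: str, azs: List[str], new_prefix: int = 24) -> Dict[str, Dict[str, str]]:
    """Assign one public and one private subnet block to each AZ."""
    network = ipaddress.ip_network(vpc_cidr)
    # subnets() yields consecutive, non-overlapping blocks in address order.
    blocks = network.subnets(new_prefix=new_prefix)
    plan: Dict[str, Dict[str, str]] = {"public": {}, "private": {}}
    for tier in ("public", "private"):
        for az in azs:
            plan[tier][az] = str(next(blocks))
    return plan


if __name__ == "__main__":
    layout = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"])
    for tier, subnets in layout.items():
        for az, cidr in subnets.items():
            print(f"{tier:8} {az} {cidr} default route via {ROUTE_TARGETS[tier]}")
```

Because the blocks come from a single generator, no two subnets can overlap, which is the property a hand-written plan most often gets wrong.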
Terraform EC2 + Nginx with cloud-init
Infrastructure as Code — zero manual steps after terraform apply
Full IaC — EC2 provisioned, Nginx installed and running, SSH locked to your IP, all from a single terraform apply. No manual steps after the command completes.
- ›SSH access restricted to var.my_ip, which has no default value, so Terraform prompts for it interactively or fails under -input=false if it is unset
- ›cloud-init handles Nginx installation and startup at first boot
- ›Security group allows only port 80 (public) and port 22 (your IP only)
- ›Full teardown with terraform destroy — no orphaned resources
Application Load Balancer + EC2 Target Groups
High availability + multi-AZ traffic distribution
ALB distributing traffic across EC2 instances in multiple AZs, with security group chaining ensuring EC2 instances are never directly reachable from the internet.
- ›EC2 security group only accepts traffic from the ALB security group — never from 0.0.0.0/0
- ›Target group health checks gate traffic — unhealthy instances pulled automatically
- ›Nginx installed on EC2 via user-data at launch — no manual SSH required
- ›Multi-AZ target group for resilience across availability zones
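The health-check gating described above can be modelled as a small state machine: a target changes state only after a run of consecutive check results that contradict its current state. This is a simplified sketch of that idea, not AWS's implementation, and the threshold values and class name are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TargetHealth:
    """Simplified model of threshold-based health checks for one target."""

    healthy_threshold: int = 3    # consecutive passes needed to go healthy
    unhealthy_threshold: int = 2  # consecutive failures needed to go unhealthy
    healthy: bool = False         # new targets start out of rotation
    _streak: int = field(default=0, repr=False)

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return whether the target gets traffic."""
        if check_passed == self.healthy:
            # The result agrees with the current state, so any streak resets.
            self._streak = 0
        else:
            self._streak += 1
            needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy
```

With these assumed thresholds, a single failed check does not remove an instance, while two in a row do.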
E-Commerce Platform
Full-featured e-commerce backend with complete product lifecycle management, deployed on AWS ECS via AWS CodePipeline for automated deployments.
- ›Product catalog, cart management, order processing
- ›JWT-based authentication with refresh token rotation
- ›MongoDB backend deployed on ECS via AWS CodePipeline
- ›RESTful API architecture with full CRUD coverage
Docker Flask + Redis Counter
Multi-container application with persistent state
Multi-container Docker Compose setup with a Flask app and Redis backend. Redis uses AOF persistence so the counter survives container restarts.
- ›Health checks using condition: service_healthy — Flask waits for Redis to be ready
- ›Redis AOF persistence — counter state survives docker-compose down and restart
- ›Named volumes for data persistence across container lifecycle
- ›python:3.12-slim base for minimal image footprint
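The counter logic itself is small. The sketch below keeps it independent of Flask and Redis so it runs anywhere: the backend is anything with an `incr` method, which matches the signature of redis-py's `Redis.incr(name, amount=1)`. The key name, the message text, and the in-memory stand-in are assumptions for illustration.

```python
from typing import Dict, Protocol


class CounterBackend(Protocol):
    """Anything exposing incr(); redis-py's Redis client fits this shape."""

    def incr(self, name: str, amount: int = 1) -> int: ...


def record_visit(backend: CounterBackend, key: str = "hits") -> str:
    """Increment the counter and return the text a route handler would serve."""
    count = backend.incr(key)
    return f"This page has been viewed {count} times."


class InMemoryBackend:
    """Stand-in for Redis so the example runs without a server."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def incr(self, name: str, amount: int = 1) -> int:
        self._data[name] = self._data.get(name, 0) + amount
        return self._data[name]


if __name__ == "__main__":
    backend = InMemoryBackend()
    print(record_visit(backend))
    print(record_visit(backend))
```

In the Compose setup, the backend would instead be a Redis client pointed at the Redis service, and the count would persist through the AOF file on the named volume rather than in process memory.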
NGINX on EC2 with Custom Domain (Route 53)
DNS management + web serving
EC2 instance running Nginx, served at a custom domain via Route 53. Elastic IP ensures the DNS record stays valid across EC2 stop/start cycles.
- ›Elastic IP attached — DNS record never breaks on EC2 stop/start
- ›Route 53 A record pointing apex domain to the Elastic IP
- ›Nginx configured as the web server — ready for reverse proxy extension
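For reference, the apex A record can be expressed as a Route 53 change batch. The sketch below only builds the payload, in the shape accepted by the `ChangeResourceRecordSets` API, and makes no AWS calls. The domain, IP address, TTL, and function name are placeholder assumptions.

```python
import ipaddress


def apex_a_record_change(domain: str, elastic_ip: str, ttl: int = 300) -> dict:
    """Build a change batch that upserts an A record for the apex domain."""
    # Raises ValueError if the address is not valid IPv4.
    ipaddress.IPv4Address(elastic_ip)
    return {
        "Comment": "Point apex domain at the Elastic IP",
        "Changes": [
            {
                # UPSERT creates the record or updates it if it already exists.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    # Route 53 record names are fully qualified.
                    "Name": domain if domain.endswith(".") else domain + ".",
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": elastic_ip}],
                },
            }
        ],
    }


if __name__ == "__main__":
    # Documentation-range address and example domain, not real values.
    print(apex_a_record_change("example.com", "203.0.113.10"))
```

Because the Elastic IP stays attached across stop/start cycles, this record would only need to be applied once.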
Internal Projects
Enterprise production systems — under NDA. Architecture details available on request.
Max HRM
Internal · Production
Monolithic multi-tenant HRM system serving thousands of enterprise users across multiple branches. Built Invoice Management, Multi-Branch Access Control, Help Desk, and Bulk Import modules with RBAC.
Max Invoice
Internal · Production
Microservice multi-tenant invoicing system handling financial workflows across enterprise client branches.
Max Inventory
Internal · Production
Microservice multi-tenant inventory management system for real-time stock tracking and control.
Max Payroll
Internal · Production
Microservice multi-tenant payroll processing system for complex organisational pay structures.
Key Achievements
Eradicated Persistent Crypto-Mining Malware
Traced malware surviving container restarts, image deletions, and npm overrides. Root-caused to malicious layers embedded in Docker build cache. Fully eradicated from live production without downtime at DeveloperTag.
Automated End-to-End Deployments
Replaced manual SSH/SCP → docker-compose workflows with GitHub Actions CI/CD pipelines, eliminating human error from production deployments across all release cycles.
AWS Cost Reduction via Right-Sizing
Used Cost Explorer & Compute Optimizer to identify over-provisioned infrastructure post-incident. Downgraded EC2 instance from medium to small after confirming CPU had stabilised under 1%.
Deployed & Maintained Production Infrastructure
Provisioned 5 AWS environments from scratch and maintained 8+ production systems — handling everything from VPC design to containerisation, monitoring, and rollback procedures.
Get in Touch
I build and operate cloud infrastructure for production systems. If you have an infrastructure challenge or want to talk DevOps, reach out.
Location
Lahore, Pakistan
GitHub
github.com/zainjafri4