Scaling Systems, Shaping Teams: Inside the DevOps Journey of Ashish Upadhyay
We regularly spotlight emerging technologists whose work is shaping the next wave of enterprise innovation. In that search, Ashish Upadhyay stood out for his unique blend of hands-on engineering experience and the impact he has created early in his career across cloud, DevOps, and automation. His work modernizing legacy systems, optimizing large-scale ML workloads, and driving infrastructure efficiency reflects the kind of real-world transformation our readers care about. We invited Ashish to share his journey, insights, and lessons learned from the frontlines of DevOps and cloud engineering.
1. You’ve worked across a variety of companies and projects. What first drew you to DevOps and cloud engineering?
My interest in DevOps and cloud engineering was sparked right after my graduation and shaped further during the 2020 pandemic. In particular, I was curious to know how companies like Netflix and Meta not only survived but thrived throughout the pandemic. They were scaling to serve millions of users and were able to do so without any noticeable downtime.
It quickly became apparent that their resilience and ability to scale were supported by solid DevOps practices and cloud-native architectures. These structures were not mere afterthoughts, but the backbone of global software delivery. I wanted to learn more about the CI/CD pipelines, automated infrastructure, fault-tolerant systems, and real-time agile monitoring. That curiosity shaped my career and set me on the DevOps path.
2. You’ve worked on automating deployments using tools like Terraform and Ansible. What were some of the biggest wins and pitfalls during that process?
The wins were immediate and significant, each addressing a critical aspect of delivery and operations:
- Consistency: With automation in place, manual process variability was eliminated. Staging, QA, and production environments received identical provisioning, and “it works on my machine” issues disappeared.
- Velocity: Setting up fresh environments, which previously took days, now takes mere minutes. This proved to be a significant boost in development speed.
- Cost Efficiency: Automated resource cleanup meant paying only for resources actually in use, which is essential in cloud environments. Idle resources left unattended cause a massive surge in cloud billing (a minimal cleanup sketch follows this list).
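As an illustration only, and not the exact tooling from any particular project, here is a minimal Python sketch of automated cleanup. It assumes resources carry a hypothetical `expires-at` tag holding an ISO 8601 timestamp with a timezone offset, and uses boto3 to stop EC2 instances whose tag has lapsed:

```python
# Illustrative sketch only: stops EC2 instances whose hypothetical
# "expires-at" tag (ISO 8601 with timezone offset) is in the past.
# Assumes AWS credentials and a region are configured in the environment.
from datetime import datetime, timezone

import boto3

def cleanup_expired_instances() -> None:
    ec2 = boto3.client("ec2")
    now = datetime.now(timezone.utc)
    expired = []

    # Only look at running instances that carry the expiry tag.
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "instance-state-name", "Values": ["running"]},
            {"Name": "tag-key", "Values": ["expires-at"]},
        ]
    )["Reservations"]

    for reservation in reservations:
        for instance in reservation["Instances"]:
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            expires_at = datetime.fromisoformat(tags["expires-at"])
            if expires_at <= now:
                expired.append(instance["InstanceId"])

    if expired:
        # Stopping rather than terminating keeps the cleanup reversible.
        ec2.stop_instances(InstanceIds=expired)

if __name__ == "__main__":
    cleanup_expired_instances()
```

A scheduled job running a script like this keeps forgotten development resources from quietly accumulating on the bill.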
Despite all benefits, the following issues arose early on:
- State Management: Running Terraform concurrently proved to be a double-edged sword. Different team members working in parallel sometimes ended up corrupting the state file, which forced us to adopt state locking and stricter workflows.
- Idempotency: Ansible playbooks that avoid needless changes on repeated runs were surprisingly difficult to craft, and poor idempotency made unpredictable states a real threat (the sketch after this list illustrates the pattern).
- Scripting, not Engineering: Automating flawed workflows only accelerates the flaws. It is critical to first streamline the foundational structure before attempting to automate.
Automation requires strong architectural practices; without them, you are only automating and amplifying poor processes.
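To make the idempotency point concrete, here is a small Python sketch of the check-before-change pattern, in the spirit of what idempotent Ansible modules do. It is a simplified illustration, not an excerpt from real playbooks: read the current state, and only write when it differs from the desired state.

```python
# Minimal illustration of idempotency: applying the same change twice
# should act the first time and be a no-op afterwards.
from pathlib import Path

def ensure_line(path: Path, line: str) -> bool:
    """Ensure `line` is present in the file; return True only if a change was made."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # Desired state already met: do nothing.
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True

if __name__ == "__main__":
    config = Path("/tmp/example.conf")  # hypothetical target file
    print(ensure_line(config, "max_connections=100"))  # True on the first run
    print(ensure_line(config, "max_connections=100"))  # False on repeat runs
```

The second call reports that nothing changed, which is exactly the behavior that keeps repeated playbook runs from drifting the environment.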
3. What’s your thought process when deciding between containerized solutions and serverless architectures?
I start with the workload characteristics and the operational needs of the business.
Containers have distinct advantages for certain workloads. They are a good fit for long-running, predictable workloads such as web servers, backing databases, and microservices. They provide portability across cloud providers, relatively low operational overhead, and fine-grained control over deployment strategies, but their cost model is less granular: you pay for compute even when containers sit idle.
Serverless is best for short-lived, event-driven tasks with a defined endpoint, such as image processing or scheduled jobs. With minimal operational overhead and a pay-per-execution pricing model, it is often very cost-effective. There is, however, a real risk of vendor lock-in, and for workloads that run consistently, serverless can end up being more expensive than containers.
In practice, I often use a hybrid approach: serverless handles auxiliary, event-driven tasks while the core application runs on containers. This maximizes control while keeping cost and scaling concerns in check.
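To make the cost trade-off concrete, here is a rough back-of-the-envelope sketch. The prices and workload numbers are purely hypothetical assumptions for illustration, not figures from any real deployment or provider price list:

```python
# Hypothetical comparison of an always-on container versus pay-per-execution
# serverless. All prices and workload numbers below are illustrative
# assumptions, not real provider rates.

CONTAINER_MONTHLY_COST = 35.00               # assumed flat cost of one small always-on container
SERVERLESS_COST_PER_INVOCATION = 0.0000002   # assumed per-request charge
SERVERLESS_COST_PER_GB_SECOND = 0.0000166    # assumed compute charge
MEMORY_GB = 0.5                              # assumed function memory
AVG_DURATION_SECONDS = 0.3                   # assumed average execution time

def serverless_monthly_cost(invocations_per_month: int) -> float:
    request_cost = invocations_per_month * SERVERLESS_COST_PER_INVOCATION
    compute_cost = (
        invocations_per_month * AVG_DURATION_SECONDS * MEMORY_GB
        * SERVERLESS_COST_PER_GB_SECOND
    )
    return request_cost + compute_cost

if __name__ == "__main__":
    for invocations in (100_000, 1_000_000, 50_000_000):
        cost = serverless_monthly_cost(invocations)
        cheaper = "serverless" if cost < CONTAINER_MONTHLY_COST else "container"
        print(f"{invocations:>11,} invocations/month: "
              f"serverless ~${cost:,.2f} vs container ${CONTAINER_MONTHLY_COST:.2f} "
              f"-> {cheaper} wins")
```

At low, bursty volumes the pay-per-execution model wins easily; at sustained high volumes the always-on container becomes the cheaper option, which is why utilization drives the decision.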
4. You've led technical sessions and mentored engineers. Based on that experience, what’s your approach to helping others grow in cloud and DevOps roles?
Active mentorship is a highly fulfilling experience for me, and it benefits the mentees as well. My approach is built on three pillars:
- Empowerment: Giving engineers a challenge, rather than a guided, step-by-step walkthrough, is much more effective. It promotes the independent problem-solving that DevOps demands, where no two problems are identical.
- Learning by Doing: Hands-on practice in controlled, isolated environments builds confidence and sharpens skills far more than any theoretical teaching.
- Safe Space for Failure: While many companies punish failure, I encourage teams to take risks and treat failure as a learning opportunity. The lessons taught by a failed deployment or a misconfigured pipeline are far more durable than those from a clean, flawless run.
Giving engineers room to take risks builds their confidence, which accelerates their growth and their ability to make sound, independent decisions.
5. Outside of your technical work, you’ve got a curiosity for space and the universe. Has that influenced how you think about scale or systems design?
Yes, for sure. Thinking about space and the universe changes one’s perspective on scale. You start designing systems that can grow a thousandfold, which helps avoid the scaling walls a system would otherwise hit during unexpected growth.
Space exploration also underlines the importance of resilience. In space missions, redundancy and fault tolerance are absolutely critical, because a single unhandled failure can mean the mission is lost. I apply the same principle to systems design: assume failures will happen and build disaster recovery, high availability, and failover in from the get-go.
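As a toy illustration of that "assume failure" mindset, and not code from any production system, here is a minimal sketch of retry-with-backoff plus failover. The service functions are simulated stand-ins rather than real endpoints, and the tuning values are made-up assumptions:

```python
# Toy sketch of "assume failures will happen": call a primary service with
# retries and exponential backoff, then fail over to a secondary. The
# service functions here are simulated stand-ins, not real endpoints.
import random
import time

def call_primary() -> str:
    # Simulated flaky dependency: fails most of the time for demonstration.
    if random.random() < 0.8:
        raise ConnectionError("primary unavailable")
    return "response from primary"

def call_secondary() -> str:
    # Simulated standby replica assumed to be healthy.
    return "response from secondary"

def call_with_failover(retries: int = 3, base_delay: float = 0.1) -> str:
    for attempt in range(retries):
        try:
            return call_primary()
        except ConnectionError:
            # Back off exponentially before retrying the primary.
            time.sleep(base_delay * (2 ** attempt))
    # Primary is treated as down after exhausting retries: fail over.
    return call_secondary()

if __name__ == "__main__":
    print(call_with_failover())
```

The caller always gets an answer, whether the primary recovers on a retry or the secondary takes over, which is the behavior you want baked in from day one.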
It is about over-designing for resilience, adaptive design, recovery, and continuing to operate under extreme, unpredictable conditions.