I recently listened to a podcast from the folks over at Packetpushers/Datanauts titled “SRE Vs. Cloud Native Vs. DevOps” and it was a great listen. The show featured Rob Hirschfeld (of RackN / Digital Rebar), who lives and works in this space, sharing his thoughts on these terms and how they relate to one another (or do not). He talked about the differences and similarities between them, and the point that stood out most for me was that ops is only now starting to catch up to developers in terms of workflow and tooling. He also described how Google brought parity between developers and ops by creating Site Reliability Engineering (SRE), which allowed ops and development to work more closely in creating, shipping, and maintaining code. I have often said that servers and network devices are just a meaningless jumble of electronics without software to run on top of them, so it makes sense for ops and software developers to work together to make the end product better.
One of the reasons I have always enjoyed working “in tech” is the ability to work and play with new technologies. The period we are in now, I believe, is one of the more transformative times since the web took off in the late 1990s. For years we cobbled our systems together with shell scripts and the like without achieving any truly substantive gains. We still spent way too much time at the console, hand-configuring knobs and dials, which resulted in snowflake infrastructure and too much time spent putting out fires. Meanwhile our counterparts, the developers and software engineers, were constantly improving their processes and building new tooling around them: version control (Subversion, Git, etc.), “lean” workflows, Scrum, Kanban, Agile, all ways to improve the “Software Development Lifecycle (SDLC)”. On the ops side of the yard, we languished with the console and our scripts to manage our processes and systems. Some tooling did exist, but it required a lot of wrangling to get working and was often not very reliable at scale.
That is changing, albeit slowly. With the rise of “cloud computing” we had to find better ways of managing this menagerie of systems spread among on-premises datacenters and infrastructure as a service (IaaS) offerings such as AWS, Azure, and GCP. It was also in the interest of the web-scale companies (Google, Amazon, Facebook) to find ways to bring up infrastructure faster and automate as much as possible. While certain tooling has been around for quite some time (think CFEngine), I believe we are starting to see maturity in the marketplace, as well as a plethora of choices, as some of these web-scale companies have released their internal tooling to the open-source community. This ultimately leads to innovation and refinement of the tooling that ops has at its disposal.
For years ops was often one of the major bottlenecks in the technical realm, taking weeks (or months – blech) to bring up servers or networks. While this was often unavoidable because we lacked the resources and tooling, the situation was not getting much better. Sure, some companies hired more ops engineers, but most did not, and even with more engineers the usual result was only to spread the firefighting duties around. A new paradigm was needed, and whatever you call it – DevOps, SRE – this paradigm shift had to become a culture, not just a job role. Ops needs to work more closely with development, and developers need to work more closely with ops. Many call this silo busting; it makes sense, and it allows companies to move faster with the limited staffing and tooling available. But everyone at the company has to buy in for it to really work.
This is why I am excited. I am completely on board with the DevOps/SRE mindset and culture. As the adage says, “none of us is as smart as all of us,” and I believe that. No more “throwing tasks over the cubicle walls”. Sometimes SSH’ing into a server to troubleshoot or tweak some knobs can be fun, but it does not scale and certainly does not make anyone’s job easier. Tooling such as Ansible, Puppet, Terraform, and containers gives us immense abilities to improve our workflows and processes so that we can deliver a better product in a more timely fashion – and not be the bottleneck we once were.
So join me for the evolution!