April 14, 2020 · devops site reliability engineering

My DevOps Engineer vs SRE aha moment!

It was the hot and sweltering month of June 2018. The weather in Singapore is consistently warm and humid except for a few cooler months. I was at the Grand Copthorne hotel to attend the second year of the SRECon Singapore event. It was an expensive event, with tickets being sold at US$850. I badly wanted to attend; I had a couple of trainings to deliver in Singapore and planned to align them with the event. Being an individual (self-sponsored), I requested the organisers, USENIX, to offer a discounted price, and they obliged. And here I was, in the queue to get my entry ticket at the event. I later realised I was the only one who had spent money out of my own pocket, everyone else being sponsored by their respective companies :)

The reason I wanted to spend that money and be there was to really understand what this Site Reliability Engineering (SRE) fuss was all about. Until then I was under the impression that DevOps Engineer and SRE were just two different names for the same position. Even today a lot of people think that way. That was my moment of realisation: no, it's not the same. Here are some key differences I noticed, based on my experience with DevOps and what I saw at SRECon.

1. SRE as a practice is dominated by only a few companies in the world. The speakers at SRECon came from only a handful of companies such as Facebook, Google, LinkedIn, Microsoft and, yes, Baidu. That's when I realised the importance of the word SITE in SRE. To be an SRE, you must be managing a high-traffic site. The problems that you see at scale are in an entirely different league from just writing automation code and building pipelines.

2. The number one feature of a high-performance site is RELIABILITY. That's the second word in SRE. There was a lot of talk about how to use certain practices, principles and technologies to make your site reliable. The talented folks from these top companies even presented a bunch of use cases, and there is nothing better than hearing it straight from the horse's mouth.

3. I had only heard of the term "taking an engineering approach to site reliability". Listening to these topics and interacting with all these super awesome folks showed me what it really means. Failures do happen and will happen in future too. How you approach them is what makes the difference. SREs look at the failures, analyse them and, if something is expected to happen again, see what could be done to avoid it in future. That could be automating the task, setting up the right monitoring metric so you are alerted in advance, ensuring the systems heal automatically, adding redundancy, etc.
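
To make the "alerted in advance" idea from point 3 a little more concrete, here is a minimal Python sketch. It is my own illustration, not something presented at SRECon: the function names, the disk-usage metric and the 24-hour lead time are all assumptions. It extrapolates a resource's recent growth in a straight line and raises a flag while there is still time to act.

```python
from typing import Optional, Sequence


def hours_until_full(samples: Sequence[float],
                     interval_hours: float = 1.0) -> Optional[float]:
    """Estimate hours until usage hits 100%.

    `samples` are usage percentages taken at a fixed interval, oldest first.
    Hypothetical helper: a naive straight-line extrapolation, not a real
    monitoring system's forecasting logic.
    """
    if len(samples) < 2:
        return None  # not enough data to see a trend
    elapsed_hours = (len(samples) - 1) * interval_hours
    growth_per_hour = (samples[-1] - samples[0]) / elapsed_hours
    if growth_per_hour <= 0:
        return None  # flat or shrinking usage: no predicted exhaustion
    return (100.0 - samples[-1]) / growth_per_hour


def should_alert(samples: Sequence[float],
                 interval_hours: float = 1.0,
                 lead_time_hours: float = 24.0) -> bool:
    """Alert if the resource is predicted to fill up within the lead time."""
    eta = hours_until_full(samples, interval_hours)
    return eta is not None and eta <= lead_time_hours


if __name__ == "__main__":
    disk_usage = [70.0, 72.0, 74.0, 76.0]  # hourly readings, in percent
    print(hours_until_full(disk_usage))
    print(should_alert(disk_usage))
```

The point is the shift in mindset rather than the arithmetic: instead of paging someone when the disk is already full, you alert on the trend, so the fix can happen before users notice anything.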

So even though DevOps Engineers and SREs share a lot of common practices, the real difference comes from managing production infrastructure at a fairly large scale. That is what makes SREs an elite breed of operations personnel. And to be capable of managing things at that scale, you need both expertise and experience.

And that's the reason why, even though there is an increasing trend to call everyone who implements DevOps an SRE, I would categorise SREs as a select, elite breed of DevOps ninjas who have both the high level of expertise and the experience to qualify. And if you are planning to make a career in DevOps, do aim to be that SRE.

If you liked this story and are not yet part of my tribe of DevOps ninjas, do join in here: https://www.facebook.com/groups/devopstribe. This is my private community where I post DevOps articles, courses and useful resources before anywhere else.
