Let’s Talk About Infrastructure Teams

Raymond B. Matovu
Apr 3, 2018

A couple of weeks ago I had breakfast with a colleague working on an API development project. He asked about my experience working as a product analyst on a platform infrastructure team, in an attempt to understand what value a product person could bring to an Infrastructure (“Infra”) team. Our conversation set out to draw parallels between our experiences but quickly turned into a rant from both sides about the general lack of empathy for product folks working on technical projects. In the end, we agreed that our experiences should be shared in the hope of starting a wider conversation about what it takes for a product manager to succeed on an Infra team, let alone any technical project. In this article, I list some of the challenges I encountered during the product development phase and share some solutions that we adopted as a team to get through these murky waters. Note, however, that not all technical projects are the same: different products and projects have different needs and contexts, both of which affect the solutions, so what worked for us may not necessarily work for your team.

Up until three months ago, I worked as a product analyst on a co-sourced team of my company colleagues and client developers for a large retail client. We were developing a tool that would enable their IT Operations staff to “spin” up (or “tear” down) infrastructure resources such as servers, networking resources, tech stacks (e.g. ECS), Jenkins, etc., on demand on Amazon Web Services (AWS). In addition, the same tool would be used by application developers to provision development environments and deploy applications that would eventually be accessible on the client’s public website. Shortly before I rolled off the team, we released a beta version of the tool in the form of a command line interface (CLI) that wraps AWS APIs. The idea was to create a tool that would abstract away the orchestration that application and IT Operations teams would otherwise have to go through in order to provision infrastructure or deploy applications in a cloud environment.

You might be curious: why in the world would a product analyst sign up to work on an Infra team? What does a product person do in such an environment? How do you even write stories in the Infrastructure world? Well, unlike the articles I’ve read in the past, I won’t attempt to paint a rosy picture here. At first, I thought that my software engineering background (more than 6 years as a developer and more than a year on an API development project) would be sufficient to navigate any technical complexity on this project. But I learned that this wasn’t the case. It felt like a foray initially, but it quickly turned out to be a decent learning experience, since I had long been curious about the “downstream” end of software delivery. I spent most of the time learning about the infrastructure domain and some of the challenges that engineers on such teams face. Before we dive into the details, let’s first talk about what happens on infra teams.

What happens on infra teams?

Coming off the heels of a successful API development project, I imagined infrastructure automation would be similar to API development, since a lot of the conversations I had with developers and subject matter experts seemed to suggest that it’s exactly the same. I was also eager to learn about the “dark side” of infrastructure and what it took to get the software that we normally build into the hands of customers. Infrastructure as Code (IaC), for those not too familiar with the phrase, simply means treating infrastructure as a software system, i.e. programmable, on-demand, self-service. With a large number of organizations migrating their infrastructure to the cloud, IaC has become the de facto practice for managing this transition. What this means is that infrastructure has reached a level of maturity where the work that our dear old systems administrators did in the past, i.e. server provisioning, networking, monitoring, etc., can now largely be automated by adopting practices similar to those used in building software, e.g. Continuous Delivery, Continuous Integration, Test Driven Development, Release Management, etc. In case you haven’t read Kief Morris’ book of the same title, he lists some of the benefits of infrastructure automation, including lowering the barriers to making changes to infrastructure, minimizing snowflakes, enabling self-service and recovering from failure in a timely manner, among other things.
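To make the “infrastructure as a software system” idea a little more concrete, here is a minimal sketch in Python. It is not how our tool or any particular product works; it only assumes that infrastructure can be described as plain data, so that a program can compare what you want against what exists and work out the changes. The resource names, the `size` attribute and the `plan` function are all made up for illustration.

```python
def plan(desired: dict, current: dict) -> dict:
    """Compare desired and current infrastructure and list the changes needed.

    Both arguments map a (hypothetical) resource name to its settings.
    """
    create = sorted(name for name in desired if name not in current)
    delete = sorted(name for name in current if name not in desired)
    update = sorted(
        name for name in desired
        if name in current and desired[name] != current[name]
    )
    return {"create": create, "update": update, "delete": delete}


# Hypothetical example data: what we want versus what is running today.
desired = {"web-server": {"size": "small"}, "build-server": {"size": "medium"}}
current = {"web-server": {"size": "large"}, "old-db": {"size": "small"}}

print(plan(desired, current))
```

Because the description lives in code, it can be reviewed, tested and versioned like any other software, which is roughly what makes practices such as Continuous Integration applicable to infrastructure.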

What are some of the challenges a product person will find on an infra team?

There are a myriad of challenges any product analyst will encounter working on an infra team. Below is a list of five challenges I could relate to from my own experience and a few others I spoke to while working on the infra team.

The technical jargon

The first challenge you’re most likely to face as a new analyst will be coming to terms with the jargon that’s used on infra teams. You’ll need to quickly get up to speed with terms such as DevOps, cloud, VPC, Routes, CIDR Block, Docker, Container, ECS, etc. Whereas some of these terms may be specific to AWS, other cloud computing platforms have similar jargon for the same concepts. Note that many of these terms are analogous to components and services that exist within a physical data center.

In the first few days of trying to craft relevant user stories, it was a challenge trying to understand what the definition of done was for each story, given that the stories were filled with all this unfamiliar jargon. I quickly invested in a personal account on AWS, watched all manner of videos, tutorials and internet resources that attempted to explain what these terms meant. It all started making sense as it helped me recall my time as a software engineer years ago when we had to deploy applications to environments in physical data centers.

Designing the user experience

Coming straight from an API development project, I had learned that when designing user interactions on API projects, conversations typically center on the URL structure, which verb to use for your endpoint (e.g. GET, POST, DELETE), whether the endpoint should return JSON or XML, whether to make the service RESTful, and so on. In the infra world, Command Line Interfaces (CLIs) are the most common means of interacting with applications or systems. For example, the command below asks AWS to list the instances that exist in the user’s account.

$ aws ec2 describe-instances

The CLI in our case was the proverbial user interface that our users would invoke to execute commands that would provision the required applications or infrastructure components. Of course once the command is executed, you’d expect an output in a format that is readable by end users, which in our case was JSON. In designing a CLI, you’ll need to think about the keywords and any input formats required. In our case, the required configuration was always stored in files which would be specified as command line input. Sometimes the keywords would include flags/switches that would enable or disable extra features. It is worth conducting some user testing on the CLI design to get some feedback from actual users of your tool.
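To illustrate the design questions above (keywords, a configuration file as input, optional flags and JSON output), here is a rough sketch using Python’s standard `argparse` module. It is not our actual tool: the program name `infra`, the `provision` and `teardown` keywords, the `--dry-run` flag and the `stack` setting are all hypothetical, and nothing here talks to AWS.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    # "infra" and every keyword and flag below are hypothetical names.
    parser = argparse.ArgumentParser(prog="infra")
    subcommands = parser.add_subparsers(dest="action", required=True)
    for action in ("provision", "teardown"):
        sub = subcommands.add_parser(action)
        # The configuration lives in a file that is passed on the command line.
        sub.add_argument("config", help="path to a JSON configuration file")
        # A flag (switch) that turns an extra feature on.
        sub.add_argument("--dry-run", action="store_true",
                         help="report what would happen without doing it")
    return parser


def run(argv: list) -> str:
    """Parse the command line and return the result as a JSON string."""
    args = build_parser().parse_args(argv)
    with open(args.config) as handle:
        config = json.load(handle)
    result = {
        "action": args.action,
        "stack": config.get("stack"),
        "dry_run": args.dry_run,
        "status": "planned" if args.dry_run else "requested",
    }
    return json.dumps(result, indent=2)
```

With a file `ecs.json` containing `{"stack": "ecs"}`, calling `run(["provision", "ecs.json", "--dry-run"])` returns a small JSON document describing the request. Even a throwaway prototype like this gives you something to put in front of real users when testing keyword names and output formats.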

Story slicing / breakdown

At the start of the project, we had a huge debate about how user stories should be sliced (the old vertical vs horizontal slicing debate). In my world of analysis, and I guess I speak for most analysts out there, customer journeys have a way of tangibly showcasing the value a customer derives from a process / tool / application. I still believe that they should be one of the starting points of any conversation about building anything, including infrastructure automation. Fortunately, at the beginning of the project, we had a requirement to “showcase working software” to business users who had no clue what we meant by a “platform team”. This meant that we had to tailor our user stories so that each addressed a given part of a customer journey, making them easier for business users to understand despite the heavy use of jargon. To give an example of how our stories and respective acceptance criteria were structured, we’d have a narrative of a large story such as:

“As a product team developer, I want to provision an ECS environment so that I can deploy my application.”

The above example still seems vague to most business users, but it was relatively hard to argue about the actual value being realized from it (to a certain extent). The challenge we encountered was that hidden behind this narrative was a series of miniature technical tasks (or smaller stories) and orchestration that the developer had to string together in order to deliver this business value. This led to a series of long-running stories that couldn’t be delivered in a single iteration. I spoke to a few colleagues who often got theoretical about how simple this was; for example, they’d say “make every task a story and have it estimated, then roll these up into delivering the same business value”. This made a lot of (theoretical) sense, but in practice it turned out to be a nightmare trying to tie these tasks back to the actual value business customers expected while also having to showcase working software at every stage. I’m still not sure whether I was being obstinate or whether this really was as simple as they claimed. The same applies to defining the required acceptance criteria for each of the stories. Below is a sample snippet of the acceptance criteria we initially drafted on the project:

[Image: Sample acceptance criteria]

Breadth of analysis

When automating infrastructure, I discovered that the landscape wasn’t the same as a business landscape, where it’s almost obvious what the customer’s journey should be (for example, a journey through a product details page on any retail website). This was mostly because of how unfamiliar I was at the time with cloud infrastructure in general. For example, we had requirements for users to set up or tear down route tables (and Route 53) on AWS. With little knowledge of what route tables are, let alone the business value they bring, it became quite hard to write the related user stories. It had been years since I had pushed code to a server or connected a network cable, let alone worked with a cloud platform. I never imagined that it would be a nightmare trying to understand how cloud computing platforms work. There is a plethora of technologies to choose from to solve any problem you have: Continuous Integration, Automated Testing, Logging, Searching, Queuing, Service Discovery, you name it! I spent time trying to get familiar with what each technology in our stack was used for, mostly learning the names of the tools and the vendors behind them e.g. ElasticSearch, HashiCorp, Jenkins, Splunk, Palo Alto, etc. You will most definitely get to work with one or two and spend the rest of the project hearing developers complain about how inadequate the chosen tool is. Things got even more complicated when I learned that the end-to-end tests required for each stack we were building had nothing in common (remember snowflakes?).

Some argue that product folk aren’t required to know the details of how the code should work, and many times developers avoid answering these questions (to my exasperation), instead referring to them as “implementation detail”. Most argue that the product analyst should focus on the business side and leave the technical details of the story to more technical folks. Whereas I do agree with this mindset, it gets unusually complicated when 90% of the user stories in the backlog are technical stories (implementation detail) and the rest are input / output focused. If you’re the kind of product enthusiast that I am and enjoy having a rich understanding of every “detail” of a story and how it may affect the product, then you’ll quickly feel left out of conversations with developers, product owners, architects and your end users, since you won’t understand the “guts” of your tool well enough to engage in critical decisions. I reckon that this comes down to one’s analysis style; you could get by without worrying about the details, but it will certainly complicate any facilitation you need to conduct. I got around this by getting more and more comfortable with the fact that I wasn’t an expert, and by tasking the developers to always “make me understand” what problem we were trying to solve.

Conclusion

One of the mistakes we made at the start of the project was to believe that we could get going on an infrastructure project without going through the same rituals (e.g. inceptions, scoping, etc.) that we normally do before kicking off a project. To construct the backlog, we relied on a bunch of user stories that were extracted from a project Gantt chart created by the client architects. It also didn’t help that most of the target end users were new to the whole cloud concept and weren’t of much help when it came to defining their needs. A lot of the challenges we encountered were a result of this. As a product analyst, it’s important to go back to the beginning, to ensure that you understand the vision of the product you’re trying to build and to work towards it.

The title and some of the feedback incorporated in this article were inspired by a friend and colleague, Asif Choudhoury (RIP), with whom I spent over 6 months ranting and debating the same challenges we both experienced on this project.

Raymond B. Matovu

Technical Product Manager | Software Engineer | Senior Program Manager