NetOps: Getting Started

If you're reading this, you may be a network engineer like me, and you probably find the term ‘NetOps’ to be nebulous, as many trendy terms in the industry are. Or maybe you're a software developer with experience in DevOps who has just been exposed to some networking problems and wonders whether NetOps might provide some answers. Or perhaps you're just someone needing a real-world perspective on yet another buzzword you’ve heard. In any case, you’ve come to the right place. Here I'll provide some perspective on what NetOps can do, and what you can do to get started.

First, let’s characterize what we mean by NetOps. It is a purposefully broad term, so you will find there are aspects that overlap with other trends. These include DevOps, automation, SDN, infrastructure as code, etc. I think about these things in terms of the objective or the outcome we are aiming for. NetOps is about improving the agility that network engineering teams, together with their operations counterparts, can achieve in deploying network functionality, making changes and generally operating a network. This encompasses people, how they work and the tools they use, i.e., skills, practices, and tools. Thinking about the outcome gives you the flexibility to leverage many technologies and focus on any one of a large number of areas to move toward an agile NetOps environment. Remembering the objective will also help you avoid a hot technology or tool that doesn’t fit your case. In reality, though, NetOps, like any of these other trends, cannot be adopted overnight. As much as you might like to, you can’t take a NetOps pill and wake up the next morning NetOps-ified. You need to evolve in small steps, as we’ll discuss here.

The practical experience in this blog has come from the recent work my team has been doing. Over the last few years I have put together a team that works in this area of NetOps. We’ve assembled a group of people with network engineering, software development, and professional services skills. Dubbed Network Development Services or NetDev, we focus on helping clients around the world with their NetOps and other similar projects focused on disaggregated networking. The types of projects we work on are quite varied, from providing adjunct development capabilities for technically sophisticated companies to providing customization and integration, often with third-party components and inclusive of our competitors’ equipment. We have chosen to deliver these services via joint teams formed with our clients and partners to best ensure client success in these complex and cutting-edge areas. This also allows us to deliver additional value by transferring the skills, tools and practices our clients need to become more self-sufficient.

Our work across organizations has provided us a realistic perspective of where companies are in this transformation of networking. Even more, we’ve developed a practical approach to one of the trickiest aspects of NetOps and the related trends—getting started. That’s what I'll focus on in this blog.

You hear it all the time, but getting started is key. Once you begin working on a NetOps project, you're already learning, and that's progress. Yet while this sounds simple, for some, it’s actually the hardest part. Common issues include: reaching agreement on what project to do; research and planning to understand how to do it; and articulating the reasons why it is beneficial to the organization. We’ve seen these issues derail projects before they even get started. To avoid this, don’t aim for perfection. It doesn't matter if you don't pick the best thing, or even end up picking the wrong thing (if that is even possible). Pick something your small team sees as a good starting point and run with it. Most of all, value the learning experience. Some ideas for projects can be based on network abstraction that removes vendor and device specifics from application calls; or automation of troubleshooting and remediation, which links design and operations; or version-controlled config repositories of applications, systems, and networks; or development and deployment of networking software using Agile methodologies.

Culture has come to be considered a blocker to NetOps. A belief has emerged in the industry that DevOps and NetOps are dependent on cultural change. Cultural change can be difficult, and it’s unrealistic to expect that you can make wholesale changes within an organization in a short time. Just like improving your golf swing, or any other skill, trying to change too much can send you backward. If you think about what culture is, you'll see it is merely a collection of behaviors—the many behaviors a group of people has come to adopt in order to work together. Now, these have been adopted for good reason. Therefore, you need to anticipate the group will try to protect norms and impede change by shunning or attacking new elements that act in a way that goes against the group-accepted behaviors. The only solution is to start small. You’ll need to look for one or two things that can be changed and then prove there is a better way than the norms would allow. Once you have these examples, use your supporting leaders to shine a spotlight on them. Here you drive change by leading by example.

Consider typical IT departments today—efforts to control costs and ensure security and reliability often restrict individuals, reducing their flexibility and their ability to adopt new approaches. At a recent AWS Summit, I heard the story of an individual at an energy retailer who had an idea to develop a more sophisticated analytics system to better predict energy requirements, and as such, optimize purchases in the generation market. But the IT department was unable to provide him with the compute resources to experiment with the analytics. Further, IT required a fully developed business case (a most potent cultural control weapon), which just isn’t possible when you’re experimenting.

As a workaround, he thought to use AWS to minimize the upfront costs. However, he was again blocked, this time by procurement: AWS was not an approved supplier. Getting AWS approved had many hurdles and once again required a business case. Deciding to go renegade, he opened an AWS account with his credit card and began expensing the cost quietly. However, he and his team were still not in the clear: IT had locked down the network, blocking access to AWS. Still not deterred, they began working on the first floor of their building, where they were able to piggyback on Wi-Fi access from the hotel across the road.

After some time, the team proved they could build an analytics system with scalable compute power that leveraged many more inputs than previously possible. Exceeding their own expectations, they started delivering cost savings after only a short period. To the company's credit, they took the opportunity to promote cultural change by extolling the divergent behavior of the team. Now, this isn't a recipe for anarchy: they could have failed at any step for a relatively low cost; they could have made a mistake (I'm sure they did), but they had the opportunity to learn; and they could have delivered no results, yet I'd say the individuals would have still learned some things that would help them in their next endeavour. So all up, culture needs to change sometimes to allow you to work in new ways, and the only way to work through this is to experiment in small teams, making a few changes at a time.

Now, once you’ve overcome these organizational hurdles of getting started, what are you to do in your NetOps project? Well, you'll no doubt have heard Braveheart-style calls for automating-everything, software-define-all, complete-vendor-abstraction, block-all-CLI-access, etc., but what practical approach can you take? Here are five simple steps we've honed through our projects that are applicable to small and large problems alike.

One: Pick something manageable.
This is very important. Go too big, and it’s likely the project can become overwhelming. Success begets success. Small wins will lead to bigger wins. Now, this is harder than it sounds, as your organization is probably going to pull you in the opposite direction. That double-edged sword—the business case—will lead you to make large extrapolations. People above and around you will be looking for winners to back; this will lead to their asking you to take on additional requests. But you must resist, so as to succeed with your current project before taking on a larger one. Additionally, it’s better to get started with a small team of those critical to your project, rather than a larger team that encompasses every resource you’ll eventually need. Doing a bit of horse-trading down the track to get help is OK, but the more chefs in the kitchen early on, the harder it will be to develop a shared vision. You will definitely run into unforeseen issues and these are much easier to resolve with a (usually small) team of like-minded people.

Two: Map the whole workflow.
Workflow is used here to mean the set of tasks that need to get done repeatedly to achieve each outcome. Examples could be provisioning a new service in the network (triggered by a user portal or a virtual machine change) or remediation following a network failure, based on some operational data that is beyond the protocols in use. The operative words here are: (a) map, i.e., understand and write down, and (b) whole, i.e., all tasks to achieve the outcome. Include all those tasks that may start or finish outside of the network or engineering team. You may not go on to touch all items of the workflow (in line with step one), but if you do not know about them at all, you will not know if your work will deliver any value. For example, if deployment and operations teams are left out, they may block a project from going live, soaking up all the efficiencies gained in the early development phases. Failing to map the whole workflow will greatly limit step three and your potential outcomes. Note that workflows are not meant to be like processes, rigid and inflexible; use this step to learn the problem space and share it to generate ideas on how to improve. You can and will always make the workflows more sophisticated over time.
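
As a minimal sketch of what "map the whole workflow" can look like once it is written down, the Python below records each task with its owner, whether it is manual, and a rough elapsed time. The task names, teams, and hours are all hypothetical, chosen only to illustrate a provisioning workflow; your own map will look different.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    owner: str            # team that performs the task
    manual: bool          # True if a person has to act
    typical_hours: float  # rough elapsed time, including waiting


# Hypothetical provisioning workflow; every value here is illustrative.
WORKFLOW = [
    Task("Request raised in user portal", "service desk", True, 1.0),
    Task("Design review", "network engineering", True, 16.0),
    Task("Change approval", "operations", True, 24.0),
    Task("Push configuration", "network engineering", False, 0.5),
    Task("Post-change testing", "operations", True, 4.0),
]


def owners_outside(workflow, home_team):
    """Return the teams other than home_team that own at least one task."""
    return sorted({task.owner for task in workflow if task.owner != home_team})


def total_hours(workflow):
    """Sum the rough elapsed time across every task in the workflow."""
    return sum(task.typical_hours for task in workflow)
```

Calling `owners_outside(WORKFLOW, "network engineering")` lists the teams you would need to talk to before claiming the map is whole, which is the point of this step.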

Three: Identify bottlenecks and optimize.
Once you have a complete view of your workflow, you have a great opportunity to improve it. Rather than focusing your efforts on very simple, narrow workflows, such as merely inputting commands into the command line of a network device (scripts optimized this long ago), focus on those workflows that include manual steps involving human interactions. Think reviews, approvals, and testing. These are often the bottlenecks in your workflow. As these involve changing behaviors, i.e., cultural change, you will not be able to tackle many of them in one go. But just knowing what is there, and getting some addressed, can deliver value and can build the case to change others in a future project. Bottlenecks can be tackled with a tool, e.g., Robot Framework, StackStorm or Ansible. You may also need to split out items that can be improved now from those that will need more careful management, e.g., a person with scarce knowledge who needs to be involved in many aspects of the workflow.
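
One simple way to make the bottlenecks visible, sketched here under the assumption that you have rough elapsed times for each step, is to rank the manual steps by their share of the total. The step names and hours are hypothetical, and ranking by elapsed time is just one possible heuristic.

```python
# Hypothetical steps as (name, is_manual, elapsed_hours); values are illustrative.
STEPS = [
    ("Design review", True, 16.0),
    ("Change approval", True, 24.0),
    ("Push configuration", False, 0.5),
    ("Post-change testing", True, 4.0),
]


def rank_bottlenecks(steps, top=2):
    """Return the `top` manual steps with the largest share of total elapsed time."""
    total = sum(hours for _, _, hours in steps)
    manual = [(name, hours / total) for name, is_manual, hours in steps if is_manual]
    # Largest share first, so the most likely bottleneck leads the list.
    manual.sort(key=lambda item: item[1], reverse=True)
    return manual[:top]
```

With these sample numbers, the approval and review steps dominate and the automated configuration push barely registers, which matches the observation that typing commands is rarely the slow part.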

Four: Automate all the steps possible.
You'll have heard the mantra “automate everything.” While removing as many humans from the loop as possible is good, it can be somewhat difficult. However, automating some aspects of the workflow is definitely achievable. Good, low-hanging fruit can include keeping configs under version control and applying them with a configuration management tool like Puppet or Ansible. This could allow you to standardize your build process and therefore your environments across dev, test, and production, which in turn removes common issues that result from new updates moving through the deployment process. (We often hear, "Well, it worked in my dev environment!") There are numerous examples like this in networking, such as tests, where tools like the Robot Framework help immensely. Use the 80/20 rule: don’t try to automate for every possible scenario. Get the main cases covered and allow the automation to be proven and made more sophisticated over time.
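
To illustrate the idea of standardizing environments, here is a small sketch that renders one template per environment using only Python's standard library. It is not how Puppet or Ansible work internally; the template syntax, interface names, and VLAN numbers are invented for illustration and do not correspond to any vendor's CLI.

```python
from string import Template

# Hypothetical, vendor-neutral template; the config syntax is illustrative only.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " vlan $vlan\n"
)

# Only these values are allowed to differ between environments.
ENVIRONMENTS = {
    "dev": {"name": "eth1", "description": "app-uplink-dev", "vlan": 110},
    "test": {"name": "eth1", "description": "app-uplink-test", "vlan": 120},
    "production": {"name": "eth1", "description": "app-uplink-prod", "vlan": 130},
}


def render_all(template, environments):
    """Render one config per environment.

    Template.substitute raises KeyError when a value is missing, so an
    incomplete environment fails loudly instead of producing a partial config.
    """
    return {env: template.substitute(values) for env, values in environments.items()}
```

Because every environment goes through the same template, a difference between dev and production can only come from the small dictionary of values, which is easy to keep in version control and review.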

Five: Continually improve, reliably.
Recognize that you must continue to look for improvements. You will find many things can be improved using the experience you developed through these steps. Merely list them now for future prioritization. And plan to start the next project as soon as you finish this one. Finally, with regard to reliability, take care not to overcommit. Having completed steps two, three and four, you will be able to understand the capacity of your workflow, whether that be the number of provisioning changes per week, day or hour; the number of releases and their size per quarter or year; or something else. Determine the capacity of your workflows, with all their bottlenecks and the automation you’ve achieved. Knowing this, it's important to minimize work-in-progress, or WIP in DevOps parlance, as WIP is 100% wasted effort until it exits the workflow complete. So you will need to learn to say “No” and be strong about it. Simply put, you don't want to start things you can't finish.
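
One way to put numbers on capacity and WIP, offered here as an assumption rather than something from our projects, is Little's Law: in a stable workflow, average lead time equals WIP divided by throughput. The sketch below applies it and adds a simple WIP limit check; the function names and the example figures are hypothetical.

```python
def average_lead_time(wip, throughput_per_week):
    """Little's Law: average lead time (weeks) = WIP / throughput per week.

    Holds as a long-run average for a stable workflow; treat the result as
    a rough guide, not a promise.
    """
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_week


def can_accept(wip, wip_limit):
    """Accept new work only while in-progress items are below the agreed limit."""
    return wip < wip_limit
```

For example, with 12 changes in progress and a measured capacity of 4 completed changes per week, `average_lead_time(12, 4)` suggests a new request would wait about three weeks, which is a concrete basis for saying “No” or “not yet.”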

This provides a high-level, generic overview. In future blogs we'll share our experience in specific areas, such as network testing with the Robot Framework, SDN and NFV orchestration and other areas in which we find ourselves working. This will, over time, grow to provide you with a practical set of examples that will help you get started and deliver.

Originally Published on the Brocade Community on 4/27/2017.