For the longest time, automating networks, especially multi-vendor, multi-domain networks, has been extraordinarily hard. While there is no shortage of tools and vendors stepping up to the task, the end goal of closed-loop automation in networks has remained elusive.
Because of this, large network operators haven’t been able to move at the speed of innovation they anticipated. Services still take weeks to set up and customer satisfaction still lags as faults linger on far longer than they should. Thanks to COVID-19 and the shift to 5G, there is an explosion of new network equipment being deployed as I write this, all of it without the necessary automation. So, when the deployment party is over, the ensuing hangover will likely be severe.
I would argue that many of these challenges stem from a combination of poor network automation strategies, locked-in architectures, and the application of false doctrines. Allow me to unpack what may be perceived as harsh criticism of our industry.
Let’s start with strategy
Many network automation projects start with a single goal: orchestrating a new service, automating top-of-rack switch deployments, or automatically creating trouble tickets, to name just a few from the long list of automation tasks facing most operators. Each of these is a laudable goal, but none is, by itself, an automation strategy.
For each automation project, four key areas must be addressed, preferably in a framework that works across the network, and not just for one element or vendor:
- How will I provision network elements collectively, or as a service, and know when the service has been successfully deployed and is forwarding traffic correctly?
- Where will the telemetry from these systems go, and how will I deliver this information to all the analytics and security applications I might choose to deploy? What happens when these tools make a decision? How is this fed back to the network?
- How will I deal with task-oriented workflows such as fault management, resource overload, or changing a security posture?
- And how do I apply new applications to the network, such as a path compute engine, flow management, or even the original vendor’s EMS?
For more information on automation strategies, what works and what doesn’t, download our white paper.
It may seem obvious to look at the automation problem holistically, but many network automation vendors are doing the exact opposite and positioning quick fixes for one piece of this puzzle. As a result, the abandonment rate of automation projects among network operators is very high, because you can’t touch one piece of the puzzle without affecting another.
I want to look at three myths propagated by vendors that are the direct cause of much of this angst, and how their strategies can be realigned to fit a much stronger outcome:
You can deploy Orchestration without SDN
Yes, you can. You can also drive from San Francisco to New York. It’s going to take ten times longer; you might break down or have an accident along the way, and you’ll be a bundle of nerves at the end of it. Let’s not forget your car is now stuck in New York and you’ll need to drive the reverse journey all over again just to get home.
When orchestration vendors first started, they embraced the idea of plugging their products into SDN controllers. The keyword here is “abstraction”. The purpose of the SDN controller is to take generic instructions from the orchestrator and turn them into instructions for specific vendors. In this way, the orchestrator doesn’t really care if the optical network is Ciena, ADVA, or Infinera, or if the router network is Cisco, Juniper, or Arista; the SDN controller takes care of it, because the devices are abstracted through a YANG model. Because the OpenDaylight SDN controller is widely accepted by both network operators and many vendors, these YANG models are already worked out. As a result, vendors are expected to show up with products that support OpenDaylight, which is no different from expecting a device to show up with support for Ethernet or BGP.
This is the model laid out by both ETSI and TIP, but you wouldn’t know this from talking to orchestration vendors. Rather than take this easier path (which admittedly provides them with less account lock-in), they have built their own proprietary methods for controlling different vendor devices. They chose to drive instead of fly. This means that they must perform this integration effort for each customer and for each service. While they may be able to reuse these adaptations across different customers, this is a Sisyphean task, because the onus is on the orchestration vendor to integrate with each new operating system version and each new product created by each vendor. Worse, many of these orchestrators are supplied by vendors that sell their own optical or IP equipment, so they obviously work best when orchestrating that vendor’s own devices. All bets are off when one vendor demands help from a competitor to “take control” of that competitor’s equipment.
These proprietary orchestrators usually perform their orchestration tasks very well, but the remaining three areas of network automation are neglected. The act of provisioning a new customer is taken care of, but many of the lifecycle management, workflow, analytics, security, and closed-loop automation tasks are not included, and they are now locked out of automation because there isn’t an SDN controller to deliver this information. What should be an open ecosystem that allows many different vendors to bring solutions to your network is closed, and you have a long drive home.
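To make the abstraction argument in this section concrete, here is a minimal Python sketch of the idea: the orchestrator hands the controller one vendor-neutral request, and a per-vendor plugin turns it into device-specific instructions. Everything in it is an assumption made for illustration. The class names (`InterfaceIntent`, `DeviceAdapter`, `Controller`), the two fictional vendors, and the command strings are invented; they are not OpenDaylight APIs, real YANG models, or any real device syntax.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InterfaceIntent:
    """Vendor-neutral request (hypothetical), playing the role of model-shaped data."""
    name: str
    description: str
    enabled: bool


class DeviceAdapter(ABC):
    """Hypothetical per-vendor plugin that lives in the controller, not the orchestrator."""

    @abstractmethod
    def render(self, intent: InterfaceIntent) -> List[str]:
        """Translate a generic intent into device-specific instructions."""


class VendorAAdapter(DeviceAdapter):
    # Invented block-style syntax, for illustration only.
    def render(self, intent: InterfaceIntent) -> List[str]:
        state = "enable" if intent.enabled else "disable"
        return [
            f"interface {intent.name}",
            f"  label {intent.description}",
            f"  {state}",
        ]


class VendorBAdapter(DeviceAdapter):
    # Invented one-line-per-setting syntax, for illustration only.
    def render(self, intent: InterfaceIntent) -> List[str]:
        admin = "up" if intent.enabled else "down"
        return [
            f"set port {intent.name} label {intent.description}",
            f"set port {intent.name} admin {admin}",
        ]


class Controller:
    """Maps each device to its adapter so callers never see vendor details."""

    def __init__(self) -> None:
        self._inventory: Dict[str, DeviceAdapter] = {}

    def register(self, device_id: str, adapter: DeviceAdapter) -> None:
        self._inventory[device_id] = adapter

    def apply(self, device_id: str, intent: InterfaceIntent) -> List[str]:
        # The caller passes the same intent regardless of the vendor behind device_id.
        return self._inventory[device_id].render(intent)


if __name__ == "__main__":
    controller = Controller()
    controller.register("edge-1", VendorAAdapter())
    controller.register("edge-2", VendorBAdapter())
    intent = InterfaceIntent(name="port1", description="uplink", enabled=True)
    for device in ("edge-1", "edge-2"):
        print(device, controller.apply(device, intent))
```

The point of the sketch is where the vendor knowledge sits. Supporting a new vendor or software version means adding or updating one adapter behind the controller, while the orchestrator's request stays unchanged.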
Overlays simplify network automation
No, they don’t. An overlay is a brilliant idea: it creates dynamic links across the network that can be brought up and down directly from the endpoints. I use one every day when I “VPN in” to my company’s network from home. Overlays are widely deployed in data centers and in SD-WAN solutions too. The problems are that they assume the network is already working perfectly, that they assume their path selection choices will have zero impact on congestion, and that they are invisible to the day-to-day management of the underlying network. While they absolutely have their place in the network automation framework, they represent a divergent strategy for holistic network management, not a convergent one (which is the destination here). Overlays should therefore be treated as an additional domain to provision, manage, and monitor, alongside the IP, MPLS, transport, backhaul, and radio networks. In short, overlays are a traffic separation strategy, not an automation strategy.
Workflow tools make network upgrades easy
Well, they should, but here is the problem. Workflow engines are a key part of the network automation framework. Acting as decision engines, they follow an “if this, then that” model, so that a link failure can be addressed or a sequence of mind-numbing tasks can be executed flawlessly. When workflow engines are applied to networks directly, however, their instructions tend to be hard-coded to a specific product and, worse, to a specific version of that product. For example, they may pull interface statistics from a router and, based on this information, reconfigure a path or open a trouble ticket. These use cases are often built directly by the operations team to take shortcuts and save time, without thought for an overall automation strategy. The problem comes when the device or network changes: the interfaces are reconfigured, or the device is upgraded or swapped out. Suddenly the script stops working, and your upgrade cycle comes to a halt.
To address these problems, the workflow engine must interact with abstractions too, and as you might have guessed by now, the SDN controller takes care of this. With the SDN controller translating device logic, the workflow engine doesn’t have to worry about different versions of vendor software, or even different vendors.
When it comes to upgrade cycles, the workflows still need to be tested, but again, the onus for delivering a consistent NETCONF and YANG model stays with the equipment vendor. It should not be the responsibility of the network operations team to resolve in the middle of the night.
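As a rough illustration of an “if this, then that” workflow written against an abstraction rather than a device, consider the Python sketch below. It assumes the controller has already normalized telemetry into a vendor-neutral record. The `LinkStats` fields, the `Rule` type, the 80% utilization threshold, and the action strings are all hypothetical choices made for this example, not part of any real workflow engine or controller API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class LinkStats:
    """Normalized telemetry record; field names are hypothetical."""
    device: str
    interface: str
    utilization: float  # fraction of link capacity, 0.0 to 1.0
    oper_up: bool


@dataclass(frozen=True)
class Rule:
    """One 'if this, then that' pair: a condition and the action it triggers."""
    name: str
    condition: Callable[[LinkStats], bool]
    action: Callable[[LinkStats], str]


def evaluate(rules: List[Rule], stats: LinkStats) -> List[str]:
    """Return the actions triggered by one telemetry record, in rule order."""
    return [rule.action(stats) for rule in rules if rule.condition(stats)]


# Assumed policy: open a ticket when a link is down, reroute when a live
# link is more than 80% utilized. Both the threshold and the action strings
# are placeholders for whatever the operator's process actually requires.
RULES = [
    Rule(
        name="link-down",
        condition=lambda s: not s.oper_up,
        action=lambda s: f"open-ticket {s.device}/{s.interface}",
    ),
    Rule(
        name="congestion",
        condition=lambda s: s.oper_up and s.utilization > 0.8,
        action=lambda s: f"reroute-around {s.device}/{s.interface}",
    ),
]


if __name__ == "__main__":
    sample = LinkStats(device="edge-1", interface="port1", utilization=0.95, oper_up=True)
    print(evaluate(RULES, sample))
```

Because the rules only read normalized fields, nothing in them refers to a vendor, a command syntax, or a software version. A device swap changes what sits behind the controller, not the rules themselves, although the workflows still need testing after an upgrade.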
It is perhaps inevitable that the siren song of vendors promising quick fixes for point network automation problems has lured so many onto the rocks. While these three myths still circulate, their allure will remain enchanting.
It is important for network operators to now focus on a complete strategy and follow what has so clearly been laid out through the standards bodies and by companies that have already been through this journey. At Lumina Networks, we’ve spent the last six years focused on delivering OpenDaylight at scale to the world’s largest network operators. This has given us a unique insight into delivering SDN in a complete network automation framework, which has helped our customers save literally billions of dollars in operational costs and given them the ability and agility to deliver 5G networks at scale.
For more information on automation strategies, what works and what doesn’t, download our Strategies for Automation and AI in Large Heterogenous Networks White Paper.