Communication Service Providers and Network Operators struggle to keep up with the pace of change in their networks. Current operational models no longer work when the industry is moving to cloud-based architectures. AI-powered automation is required to ensure that the business can deliver the applications and services that the consumers expect. Continuously!
Reduce time-to-market for new services and features
Improve network resiliency
Eliminate IQ drain due to staff
for user impacting issues
Machines are more intelligent than humans; is this accurate? It depends. Researchers are currently developing algorithms and machine learning (ML) models within the narrow artificial intelligence (NAI) framework, which can outperform humans on specific, well-defined tasks (technically called targets).
An NAI candidate is a collection of ML models created and trained on historical behavioral data to learn patterns and causalities, with the aim of solving a specific business problem. Once trained and implemented in an automated pipeline, those models predict patterns, potentially influence decision-making, and become an AI process.
Babies see events and objects and process them to build history and connections in their minds. Our memory stores events, which allows logical connections to be learned between neurons. For machines, historical data takes the place of events and objects. This gives machines a substantial advantage, given the processing power they can now use to read and analyze years of data, whereas humans need far more time to pass through the same events and objects. Moreover, a machine learning model can remember historical data patterns and make informed decisions based on what it has learned, which helps cover the touch points of business processes needed for robust statistical learning.
Challenges & Developments
Data availability is one of the biggest challenges for ML model training. Good data governance initiatives for CSPs have a strategic added value for the adoption of AI initiatives.
With a valid dataset, data scientists, with the help of subject matter experts (SMEs) in the relevant industry, develop and implement the appropriate model to solve the business problem at hand. System developers later integrate that model into business decision-making.
How can we distinguish the "appropriate model" and its characteristics? Start by defining a model: it is a mix of mathematical and statistical formulas defined by types and parameters. Data scientists define the types, while the parameters are learned from the data. A model becomes appropriate when it solves a business problem using the right data and can be implemented in practice, given its complexity and availability.
Most models in use fall under one of the following three types: supervised, unsupervised, and reinforcement learning. All types are designed to predict unseen events using seen/captured data. Thus, the second challenge for data scientists is to decide which model to use for each business problem.
Without zooming in on the statistical and mathematical assumptions, supervised models learn and memorize the historical patterns and co-occurrence of the outcome (also called a label) and derive the optimal statistical function that links the outcome to those patterns. Unsupervised learning is the approach of learning data variations and grouping patterns in a measurable space. For example, an unsupervised approach would help to group network failures that share a similar root cause. In contrast, a supervised approach learns the behavior of an outcome through the patterns and variations of the captured data, such as predicting the type of a network failure event based on what happened before that event. Reinforcement learning, a newer approach in the data science space that is used in several industries, including telecom, works differently. In its textbook form, a learner called an "agent" acts on its environment and adjusts its behavior to maximize a reward defined by a gain and loss utility function. In the variant described here, a human is integrated into model tuning and adaptation as the data flowing into the model changes, and plays the agent's role by directing the learning towards maximizing profit.
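The contrast between the supervised and unsupervised approaches can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not how any production model works: the two features (packet loss percentage and latency in milliseconds), the labels, the sample values, and the helper names `train_centroids`, `predict`, and `group_events` are all hypothetical. The supervised part learns one centroid per labeled failure type and predicts the label of a new event; the unsupervised part ignores the labels and simply groups events that lie close together.

```python
from math import dist

# Hypothetical failure events: (packet_loss_pct, latency_ms). Values are illustrative only.
labeled_events = [
    ((9.0, 40.0), "link_failure"),
    ((8.0, 50.0), "link_failure"),
    ((1.0, 300.0), "congestion"),
    ((2.0, 280.0), "congestion"),
]

def train_centroids(events):
    """Supervised: average the features of each labeled class."""
    sums = {}
    for features, label in events:
        total, count = sums.get(label, ((0.0,) * len(features), 0))
        sums[label] = (tuple(t + f for t, f in zip(total, features)), count + 1)
    return {label: tuple(t / n for t in total) for label, (total, n) in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the new event."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

def group_events(points, radius):
    """Unsupervised: join the first group whose first member is within radius."""
    groups = []
    for p in points:
        for g in groups:
            if dist(g[0], p) <= radius:
                g.append(p)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append([p])
    return groups

centroids = train_centroids(labeled_events)
predicted = predict(centroids, (7.5, 45.0))
groups = group_events([features for features, _ in labeled_events], radius=30.0)
print(predicted, len(groups))
```

The supervised path needs the labels up front, which is why data availability and SME input matter so much; the unsupervised path needs only a notion of distance, but leaves it to a human to interpret what each group means.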
The optimal model is the one that offers the highest statistical accuracy (what we called "the appropriate model") while minimizing training effort and the time needed to implement and integrate it with the environment generating the data. This tradeoff is essential and should be discussed jointly by data scientists and SMEs from the early stages of project design and roadmap planning. We have seen a high percentage of AI initiatives fail within organizations because of a lack of communication from the start. Data scientists tend to favor challenging mathematical and statistical approaches, while CSPs are looking for a final output with clear objectives that solves a specific business problem.
A practical example is B-Yond's Agility products, which are based on advanced supervised learning on top of a reinforcement approach that keeps an SME (agent) in the loop to continuously feed relevant information from the network to the model and maximize the accuracy of root cause detection. B-Yond's configurable black boxes of pre-trained models are also already optimized on both dimensions: they are fully automated, with highly intelligent models, yet keep the necessary model configuration parameters available so that SMEs can control the models' operational aspects.
In the next series of blogs, we will go into more technical detail on the Agility products' configurable black-box AI models. We will also discuss pre-trained anomaly detection robots that sit in CSPs' environments 24/7, learning from data, reporting any issue, and sometimes fixing it automatically.
We often discuss similarities between Machine Learning (ML) and Human Intelligence. After all, it is “Artificial” intelligence, right? So, why wouldn’t you? To better understand the value of applied ML, you could instead take the opposite approach and look at how ML differs from us humans.
The nature of ML is that it is always learning, always sharing, never forgets, and never leaves for another job. Contrast this with an emotional human. We are not always in the mood to learn. We keep our knowledge close to the vest for fear of becoming obsolete, we are forgetful, and because the grass is always greener on the other side of the fence, we leave for another job. Eventually. Think about it. Just the cost of poor knowledge sharing is huge. It slows things down, reduces the quality of output, and makes the process hard to replicate and repeat. Then the person leaves and takes all that knowledge with them. This churn adds costs in the form of recruiting efforts and even slower execution (after all, you just lost a team member). Fortunately for us, the current state of AI cannot even begin to compete with human intelligence. There are still suitable applications for AI/ML though. Take technology testing for example. Here we deal with repetitive (some would say boring) tasks, which accelerates churn further. It is better to delegate them to a machine and free us humans up to work on the fun stuff, like research and development!
The “always learning” part of ML can create a perception that the ML-based solution is not fully baked. But the solution is fully baked. It is the world around us that is “baking” (changing). Everything evolves, changes, transforms. Even a system under test. To allow ML models to adapt, we allow users to provide feedback to the system. In AI-lingo, it is called reinforcement training. In the past, with rules-based testing software, you would have to modify or add a rule. If you are lucky, that is. Often, you would have to develop and deploy a software upgrade. With an ML-powered solution like B-Yond Agility, the user can provide feedback, which further improves the fidelity of the system. In other words, the tribal knowledge that normally would reside with one single engineer (the user in this example) is now part of the system and shared with everyone. It becomes part of the collective IQ of the ML-supported processes.
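A feedback loop of this kind can be sketched as follows. This is a toy illustration, not B-Yond Agility's implementation: the `FeedbackMemory` class, its methods, and the sample signatures are hypothetical, and a real system would retrain or fine-tune a statistical model rather than store exact matches. What it shows is that a correction supplied by one user becomes available to every later query, without a rule change or a software upgrade.

```python
class FeedbackMemory:
    """Toy classifier that remembers user-confirmed root causes (hypothetical design)."""

    def __init__(self, initial_knowledge):
        # Maps a failure signature (tuple of observed symptoms) to a root cause.
        self.knowledge = dict(initial_knowledge)

    def predict(self, signature):
        """Return the known root cause, or 'unknown' if the signature was never seen."""
        return self.knowledge.get(tuple(signature), "unknown")

    def give_feedback(self, signature, root_cause):
        """Record an engineer's correction so that every later user benefits from it."""
        self.knowledge[tuple(signature)] = root_cause


model = FeedbackMemory({("timeout", "no_response"): "unreachable_node"})

# Before feedback, the system has no answer for this signature.
before = model.predict(["auth_reject", "retry"])

# One engineer supplies what would otherwise stay tribal knowledge.
model.give_feedback(["auth_reject", "retry"], "expired_credentials")

# After feedback, anyone asking the same question gets that answer.
after = model.predict(["auth_reject", "retry"])
print(before, "->", after)
```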
Early on in the B-Yond journey, when we applied Artificial Intelligence (AI) to the analysis and root cause phases of testing, we had a hunch that machine learning (ML) was a suitable approach because of two distinct properties: (1) ML can process massive amounts of data, and (2) ML provides predictions based on data patterns instead of binary conclusions. While both properties are important, the latter is distinctive to ML and the driver behind why we have been able to solve the “Two-Third Dilemma” of the technology life cycle. This dilemma is that, while test execution can be automated (the first third), the results analysis and root cause determination (the other two thirds) require a lot of human-like processing. Things are not black-and-white: one test run may not be identical to the next, what we concluded last time applies only partially to the next, and so on. ML deals with patterns and offers a prediction together with an assessment of how closely a pattern resembles something that has been seen before. In that regard, ML resembles human thinking.
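The difference between a binary conclusion and a pattern-based prediction can be illustrated with a small sketch. It is a stand-in built on assumptions, not the method used in Agility: it uses Python's standard `difflib.SequenceMatcher` as a simple similarity measure in place of a trained model, and the message names, the known patterns, and the `classify_flow` helper are hypothetical. Instead of a pass/fail check against one exact expected sequence, it returns the closest known pattern together with a resemblance score.

```python
from difflib import SequenceMatcher

# Hypothetical call-flow patterns seen in earlier test runs (message names are illustrative).
KNOWN_PATTERNS = {
    "successful_call": ["INVITE", "100", "180", "200", "ACK", "BYE", "200"],
    "callee_busy": ["INVITE", "100", "486", "ACK"],
}

def exact_match(flow, expected):
    """Rules-style check: a binary conclusion."""
    return flow == expected

def classify_flow(flow):
    """Return (label, score) for the known pattern that the flow most resembles."""
    scores = {
        label: SequenceMatcher(None, pattern, flow).ratio()
        for label, pattern in KNOWN_PATTERNS.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]

# A run that differs slightly from the stored pattern (the provisional 180 is missing).
observed = ["INVITE", "100", "200", "ACK", "BYE", "200"]

is_exact = exact_match(observed, KNOWN_PATTERNS["successful_call"])
label, score = classify_flow(observed)
print(is_exact, label, round(score, 2))
```

The exact comparison rejects the run outright, while the similarity-based view still recognizes it as close to a successful call and quantifies how close, which is the kind of graded answer an engineer would give.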
We believe that solutions, like B-Yond Agility, will ultimately make life better for people by providing a complementary role that was never done well by humans in the first place. Let’s look at this in more detail. Let’s look at how Agility applies ML to Continuous Integration, deployment, testing, and validation (CI/CD, CT/CV – Check out the white paper here).
Agility reduces test result validation, failure analysis, and root cause determination from days to minutes. This reduction in the technology life cycle has profound implications on its own. It eliminates the long pole in going to market for new services and applications. The business benefits are many: reduced test costs, improved quality, first-mover advantage, accelerated time to revenue, and faster feedback on the success of a new feature. The list goes on. Ok, so an ML-powered solution like Agility is faster than a human because of the ways AI differs from humans. What else?
With ML applied to CT/CV in production environments, there is tremendous value beyond just accelerating issue triage. You can also reduce your reliance on support from equipment vendors because the ML is now handling the issue triage. Once you gain confidence in the predictions, you can begin to automate the operation towards a closed-loop, self-healing system. It sounds like a pipedream, perhaps? It is closer than you think.
If you are interested in learning more, one great way is to schedule a demo. It is easy. Contact us here!
Over the last two decades, I have co-founded several companies serving the telecoms industry, and in the process, have met with a majority of the executives from the top telcos around the world.
When you ask how a telco perceives itself, you never get the same answer. It varies from one telco to another, sometimes even from one country’s operating business to another within the same group (depending on the CEO’s leadership style).
Take the Spanish market, for example. Earlier this year I met with the CEO of Orange Spain who told me: “We are not a telco. We are a software company.” Orange Spain claims to have a large team of engineers and data scientists. It also has data science use cases developed in-house. Orange as a group has its own private cloud. It even has its own video conferencing system.
Contrast that to MasMovil, the fourth largest carrier in Spain, growing steadily through financial engineering and marketing as a beachhead. MasMovil started as an MVNO (with the acquisition of Yoigo) and grew through further acquisitions. MasMovil has a 35-year roaming agreement with Orange. Based on my multiple meetings with CEO Meinrad Spenger, I am not surprised. He is a lawyer, ex-McKinsey, IE MBA graduate and a strong negotiator.
Let’s look at Telefonica, whose CEO is a firm believer in artificial intelligence (AI) and continues to educate himself on the topic. Telefonica is the incumbent operator in Spain. It has invested in its own telco cloud architecture, known as Unica, has not given up on edge cloud against the webscale companies, and claims to have developed over 400 AI applications to support its business (with no confirmation of how many are operational). Its chief network officer chairs an industry-wide edge cloud forum called GSMA Telco Edge Cloud, with representation from 22 international telcos. Its group CTIO recently signed an open RAN collaboration with Rakuten.
Take two of the US Tier 1 carriers as other examples:
AT&T has a long track record of innovation, starting with AT&T Labs (not everyone is old enough to remember Daytona). A few years ago, AT&T evolved its innovation strategy and adopted an open source strategy, launching initiatives such as ONAP, Airship and Akraino, which take a more collaborative approach to innovation by leveraging contributions from open source communities like the Linux Foundation and OpenStack. Today, AT&T is also leading initiatives to open up the network and, for instance, is heavily committed to O-RAN, along with other key carriers like Deutsche Telekom, Telefonica and Rakuten.
Sprint outsourced its entire network to Ericsson at some point in time, probably the largest managed services contract ever awarded in the US. A few years later, Sprint reversed that decision and I was one of some 40 telco execs invited to discuss plans forward after the Softbank acquisition. There were hints of massive investments in the network. These plans were later scrapped, and we all know the rest of the story: the T-Mobile acquisition.
Finally, there are many examples around the world of telcos innovating to create new revenue sources. For example, Telus stepped into data centers and healthcare. Such businesses have contributed more than a billion dollars in new revenue streams to the top line while transforming operational models. I am especially impressed by Deutsche Telekom’s recent push to transform its next-generation IMS with telco cloud automation, an effort recognized by the World Communication Awards as “Best Network Transformation Initiative”. There are also carriers such as the Finnish company Elisa, whose ambition goes way beyond its home market and which acquired the Swedish company Polystar, a specialist in network analytics.
So, different telco executives have varying views about their company’s identity, and the role of innovation within their organization. That view evolves, too, with time.
Then there are outsiders trying to bring innovation to the telco space.
My company was one of seven co-founders of the Facebook-led Telecom Infra Project (TIP). When we attended the first meeting, I wondered if Facebook was planning to take over the telco business in a few years, based on the ambitions laid out. Later, it became clear that Facebook was trying to create a forum to advance innovation within telcos, which would benefit not only the telcos but also, indirectly, its own business (i.e., connecting the unconnected and hence selling more ads). Today, TIP is highly focused on O-RAN (disaggregated RAN) as an alternative for carriers looking to move away from an extremely consolidated market of traditional RAN vendors in their networks (especially in light of the Huawei rip-out).
The Telecom industry has touted the level of mobile product enablement that will come with 5G. Low latency, fast data speeds, and highly reliable mobile connectivity will open the door to applications ranging from enterprise-grade fixed wireless connectivity and augmented reality to autonomous cars and even remote surgery. More importantly, 5G is about the cloudification of the telco infrastructure and opening the Edge to innovation and developer communities. All these applications have one key hurdle in common: Will wireless connectivity provide the level of reliability needed to support these performance-sensitive applications?
We go about our daily routines using products whose reliability we trust so much that we bet our lives on them: planes, cars, surgical equipment, and so on. They have arrived at this level of trust through rigorous testing. Producers of these products and services have perfected testing in two ways: first, the ability to simulate the real-life application of the product; second, the use of extensive testing platforms that allow every scenario to be replicated and tested.
Telecom testing processes today are, unfortunately, not at that “bet-my-life” level. Labs are a limited replica of production. Test validation processes are highly manual, costly, and slow. Network slicing promises to address the former to some degree. However, the processes that achieve full automation of network functions and service orchestration while maintaining true network resource isolation have yet to attain enough maturity to address the challenge in the short term.
But there is hope. The technology already exists to solve for much of this. The first step is the creation of production service replication in test-beds automatically, on-demand, including decommissioning once the service is verified. This, first and foremost, is dependent on CI/CD pipelines that not only automate DevOps processes but integrate detailed testing pipelines through, what we call, Continuous Testing and Continuous Validation (CT/CV) pipelines. The CT/CV pipeline is an end-to-end test automation framework that includes: classifying the service call flow patterns through implementing machine learning, closing the loop on test execution, and allowing for an exponential increase in the level of detectable call flow variations that is virtually independent of any human intervention.
What we know from the millions of call flows we have processed is that lab and production traffic can be similar, meaning that the resulting insights are highly portable. By creating the right test batteries and combining them with the automated network replication processes mentioned earlier, including CT/CV, our pre-production certification process not only produced much more reliable network functions and services but also yielded Machine Learning (ML) models, trained to recognize call flows in pre-production, that were reused in production.
The results for the business are impressive. Applying preventive measures before production rollout means that the quality of the software is at a much higher level than before, reducing the number of consumer-impacting events down the road.
If we plan to open up our networks to innovations from third-party developers and performance-sensitive applications, we would have to begin our journey towards uncompromised quality before services hit the consumers. Consumers cannot be guinea pigs; we have machines for that.