B-Yond
December 18, 2024

How Can You Use AI While Conforming to Stringent Data Protection Rules?

How AGILITY Handles Customer Data

Implementing AI for telco network operations presents several challenges:

  1. There is sensitive personal customer information in the data.
  2. The data includes business information that the telco wants to protect.
  3. There are government regulations to adhere to.
  4. Corporate governance rules restrict where the data can reside.

We have seen some progress in customer support, where support tickets have been combined with Large Language Models (LLMs) using Retrieval-Augmented Generation (RAG) to help the support team handle customer calls better. It can be argued that this type of data is the least sensitive and therefore the natural place to start. The nature of Generative AI also aligns well with the support use case, because it works on text (the support tickets) and text generation is core to LLMs. The benefits to the business are promising as well:

  1. Improved support interactions.
  2. Faster issue resolution.
  3. Accelerated learning curve for junior support engineers.
  4. Better retention of knowledge.

How do we bring AI into the network operations side, where we deal with the network's control, user, and data planes, plus a plethora of other signals that inform us of network health?

While this is a lot of data, the data we can train on is limited by the challenges listed earlier. Furthermore, any AI would benefit from being trained on a data set larger than what any one telco can provide, yet telcos don't want to share data. We can't bring a large LLM to the telco and train it there either: it is too costly, and GPUs are in high demand and hard to come by. With no appetite to bring the data to the LLM, even just to augment the output with RAG, it seems we are stuck.

So, what if we can create a solution that:

  1. Allows the data to remain on premises.
  2. Keeps the trained AI model from leaving the premises.
  3. Continues to learn and adapt to each customer's unique environment without requiring retraining.

How do we solve this? In AGILITY, we take the following steps:

  1. Create our own synthetic data set based on network standards.
  2. Train an AI base model using synthetic data.
  3. Implement a Continual Learning process (more about this later).
  4. Deploy the models so that each instance is dedicated to one customer (most often on premises).

Let's focus on the Continual Learning process since this solves the challenges mentioned earlier.

Continual Learning requires two elements:

  1. An Active Learning mechanism.
  2. Context for the entity that will do the teaching.

Active Learning allows an AI model to be updated without requiring any retraining.

The entity that does the teaching (or, updating) can be another system or a skilled person. Ideally, it is a combination of both and that is the case with AGILITY. A skilled person is an expert in the domain, such as a network engineer. A system can be an algorithm or software that provides important context that can help in determining what is happening and how to adapt the model.

We call the decision maker the “Oracle”. In our case, the Oracle is a human domain expert and the system we developed provides important context.

When the Oracle is given good context, it makes better suggestions, and those suggestions improve the AI model. One could argue that context is mission-critical in the Active Learning process.

We decided to create the context by using a Similarity Analysis based on our own implementation of the Levenshtein algorithm.

This works well for us since we convert PCAP data into a form of text and Levenshtein is all about text comparison.

The similarity analysis detects previously unseen data flows and compares them with the flows the AI model was trained on.

It identifies the closest match and proposes resolutions to what is going on in the data. In other words, it provides context.

This context speeds up the analysis for the Oracle and improves the resulting classification.
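
To make the idea concrete, here is a minimal Python sketch of a Levenshtein-based similarity lookup. It is not the AGILITY implementation. It assumes, purely for illustration, that each flow has already been converted to a list of text tokens (the SIP-style message names below are made up), and the names KnownFlow, propose_context and record_oracle_decision are hypothetical. The sketch finds the known flow closest to an unseen one, hands that match to the Oracle as context, and stores the Oracle's decision so that later lookups benefit from it without any retraining.

```python
from dataclasses import dataclass
from typing import List, Tuple


def levenshtein(a: List[str], b: List[str]) -> int:
    """Edit distance between two token sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(
                prev[j] + 1,         # delete tok_a
                curr[j - 1] + 1,     # insert tok_b
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: List[str], b: List[str]) -> float:
    """Normalize the distance to a score in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


@dataclass
class KnownFlow:
    """Hypothetical record of a flow that has already been classified."""
    tokens: List[str]
    label: str
    resolution: str


def propose_context(new_flow: List[str],
                    known: List[KnownFlow]) -> Tuple[KnownFlow, float]:
    """Return the closest known flow and its score for the Oracle to review."""
    best = max(known, key=lambda k: similarity(new_flow, k.tokens))
    return best, similarity(new_flow, best.tokens)


def record_oracle_decision(known: List[KnownFlow], new_flow: List[str],
                           label: str, resolution: str) -> None:
    """Store the Oracle's decision so future lookups can match it."""
    known.append(KnownFlow(list(new_flow), label, resolution))


# Illustrative data only: token sequences standing in for flows turned to text.
known_flows = [
    KnownFlow(["INVITE", "100", "180", "200", "ACK", "BYE", "200"],
              "normal_call", "No action needed"),
    KnownFlow(["INVITE", "100", "503"],
              "service_unavailable", "Check downstream server capacity"),
]
unseen = ["INVITE", "100", "180", "503"]

match, score = propose_context(unseen, known_flows)
print(f"closest: {match.label} ({score:.2f}) -> {match.resolution}")
# prints "closest: service_unavailable (0.75) -> Check downstream server capacity"
```

Comparing token sequences rather than raw characters is a design choice made for this sketch: it means one differing message counts as one edit regardless of how long its name is. A character-level comparison would use the same function with strings split into characters.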

In summary, this approach…

  1. Retains tribal knowledge that all engineers can benefit from going forward.
  2. Accelerates the learning curve for new engineers.
  3. Reduces business impact from employee churn.
  4. Conforms to all data policies and regulations.