Zohaib Aftab, Senior Data Scientist
July 18, 2024

Transforming Telco Troubleshooting: Our Journey Building TelcoGPT with RAG

Large Language Models (LLMs) have made significant strides in the past year, advancing Natural Language Processing (NLP) and attracting global business interest. However, because these models are trained on publicly available datasets, they often lack knowledge of current events and private enterprise data, so they require business-specific adaptation.

Our team at B-Yond addressed this need for the telco industry by building a chatbot with custom knowledge. Telco engineers frequently search through extensive public and private documentation, which can be time-consuming. Current LLMs, while incorporating some public knowledge like 3GPP standards, fall short with specific vendor documents or newer information. We developed TelcoGPT using a Retrieval-Augmented Generation (RAG) approach. This article shares our journey and lessons learned.

The Project

The Pain Point

In the telecom industry, fixing network issues quickly is crucial. Troubleshooting requires access to, and integration of, various information sources. Engineers must often consult numerous technical manuals, troubleshooting guides, and network analysis reports. They may also need to use system logs or historical data to identify the root cause of issues, and tribal knowledge is often undocumented, so it is not shared efficiently across a company or team of engineers.

The Solution

To address this challenge, we developed TelcoGPT, an interactive chat engine designed to analyze 4G and 5G networks. Based on Large Language Models, it can understand natural language queries, quickly search through standards documentation, manuals, relational databases, tickets, and logs, and provide comprehensive answers. TelcoGPT interacts with various data types, including structured (like PostgreSQL), semi-structured (such as CSVs), and unstructured (textual) data.

Figure 1 depicts the overall architecture of TelcoGPT. The flow begins when the user asks a question:

Figure 1: High-level overview of TelcoGPT

    1. The query is first processed through a classification chain to determine if external sources are needed.
    2. If no external sources are needed, the query proceeds to the inception flow for a response.
      TelcoGPT answers internal queries about the chatbot’s identity and purpose, which are embedded through prompt engineering, and filters out queries unrelated to telecommunications.
    3. If the query requires external sources, TelcoGPT uses vector search and SQL query generation and execution via SQLAlchemy.
      This step accesses data from sources such as vector databases, technical documentation, Jira tickets, and system logs.
    4. The consolidated information is processed through the LLM to generate a response for the user.

    This architecture ensures that the information provided is always up-to-date and based on retrieved information.
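
    As a minimal sketch of step 1, a classification chain in LCEL might look like the following; the prompt wording, labels, and model choice are illustrative assumptions rather than our production configuration.

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Hypothetical routing prompt: decide whether the query needs external sources.
classifier_prompt = ChatPromptTemplate.from_template(
    "You are TelcoGPT's router. Classify the user query as:\n"
    "  'internal' - questions about the chatbot itself, or queries unrelated to telecom\n"
    "  'external' - telecom questions that need documents, tickets, logs, or SQL data\n"
    "Answer with a single word.\n\nQuery: {question}"
)

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# LCEL: prompt -> model -> plain-string label
classification_chain = classifier_prompt | llm | StrOutputParser()

label = classification_chain.invoke({"question": "Why did the 5G registration procedure fail?"})
print(label)  # e.g. "external"
```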

    Architecture Choices

    A framework should be simple to implement and update, ensuring a smooth and efficient development process. Adequate community support is also important, as it helps in troubleshooting issues and improving the implementation. Additionally, the framework should offer consistent and reliable answers to maintain the accuracy of our results.

    Choice of Framework

    LangChain is a popular framework for developing applications powered by language models. It offers abstractions to interact with LLMs and connectors with diverse data sources for context-aware applications like TelcoGPT.

    LangChain has faced criticism for its inconsistent abstractions and unnecessary complexity, but we found the LangChain Expression Language (LCEL) implementation effective and consistent. It provides a standard ‘Runnable’ interface, invokes most methods in the same way, and makes it easy to link together the components of a chain, or multiple chains. Additionally, it offers easy integration with LangServe, a framework for deploying LangChain runnables as REST APIs.
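
    To make the ‘Runnable’ idea concrete, here is a hypothetical sketch: the same chain object exposes invoke, batch, and stream, and can be mounted as a REST endpoint with LangServe. The prompt, model, and route path are illustrative only.

```python
from fastapi import FastAPI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langserve import add_routes

# Any LCEL composition is itself a Runnable with invoke/batch/stream.
chain = (
    ChatPromptTemplate.from_template("Summarize this log line for a telco engineer: {line}")
    | ChatOpenAI(model="gpt-3.5-turbo")
    | StrOutputParser()
)

chain.invoke({"line": "S1AP: UEContextReleaseRequest cause=radioNetwork"})
chain.batch([{"line": "Attach reject, EMM cause #15"}, {"line": "PDN connectivity timeout"}])
for chunk in chain.stream({"line": "Handover failure on cell 42"}):
    print(chunk, end="")

# The same Runnable can be exposed as a REST API with LangServe.
app = FastAPI()
add_routes(app, chain, path="/summarize")
```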

    Our experience migrating our legacy codebase to LCEL has been positive. It has simplified our workflow and enhanced our ability to add more data sources. In our opinion, LangChain remains the go-to framework for ease of use and is excellent for quickly developing and testing our solution.

    Choice of Workflow

    The winner: Query Routing

    We used query routing to guide the model toward relevant sources. TelcoGPT combines LangChain’s RunnableBranch and RunnableParallel interfaces to manage the flow between data sources. Compared to agents, query routing delivers answers much faster while maintaining reasonable accuracy levels.

    We predefine the routing actions based on the user input and a description of each target source. Unlike agents, query routing is unidirectional and cannot correct mistakes by revising a previous step. However, with a high-quality indexing/retrieval system and carefully crafted prompts, we achieved reasonable accuracy without the need for a looping process.
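
    Below is a hedged sketch of how such routing can be wired with RunnableBranch; the branch conditions and sub-chains are placeholders rather than our actual implementation.

```python
from langchain_core.runnables import RunnableBranch, RunnableLambda

# Placeholder sub-chains; in practice these are full retrieval, text-to-SQL, and identity chains.
docs_chain = RunnableLambda(lambda x: f"[vector search] {x['question']}")
sql_chain = RunnableLambda(lambda x: f"[text-to-SQL] {x['question']}")
inception_chain = RunnableLambda(lambda x: "I am TelcoGPT, a 4G/5G troubleshooting assistant.")

# Route on the label produced by the classification step (see Figure 1).
router = RunnableBranch(
    (lambda x: x["route"] == "docs", docs_chain),
    (lambda x: x["route"] == "sql", sql_chain),
    inception_chain,  # default branch
)

print(router.invoke({"route": "docs", "question": "What triggers an S1 handover?"}))
```

    RunnableParallel can be composed in the same way to query several sources concurrently and merge their results before the final generation step.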

    The runner-up: Agents

    Agents are a popular workflow in LLM-based AI applications. Agents are like guides that break down tasks for language models step by step. The ReAct framework, well-known in this field, runs a loop in which the model reasons about the task and calls external tools or data sources as required. This method is effective for complex tasks that need refining over multiple attempts to minimize mistakes.

    While useful for complex tasks needing several iterations to reduce errors, agents slowed our process due to multiple reasoning and feedback loops, resulting in many API calls to the LLM.
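
    For comparison, a minimal ReAct-style agent in LangChain looks roughly like the sketch below; the tool is a placeholder. Each thought/action/observation cycle is a separate LLM call, which is where the extra latency comes from.

```python
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def search_docs(query: str) -> str:
    """Search the telco documentation index (placeholder implementation)."""
    return "3GPP TS 23.502 describes the 5G registration procedure."

llm = ChatOpenAI(model="gpt-4", temperature=0)
prompt = hub.pull("hwchase17/react")  # standard ReAct prompt from the LangChain hub

agent = create_react_agent(llm, [search_docs], prompt)
executor = AgentExecutor(agent=agent, tools=[search_docs], verbose=True)

# Each thought/action/observation cycle triggers another LLM call.
executor.invoke({"input": "Summarize the 5G registration procedure."})
```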

    Data extraction

    Telecommunications documents are vast and come in many different formats. Our first challenge was to create a system that can handle all these documents and make them easy to find when needed. Instead of changing all the documents into one format, we kept them in their original forms. This made our system more flexible and allowed us to process documents separately and more efficiently.

    Data extraction from PDF files

    PDFs are a popular format for technical and enterprise documentation due to their small size and portability, but extracting data from PDFs is challenging because they are visually structured documents that are not designed for data manipulation.

    Text in PDFs is often stored as individual characters or words, not as coherent sentences or paragraphs. Several libraries in the LangChain environment can help extract data from PDFs. PyPDF and PyMuPDF convert a PDF into an array of documents, where each document contains page content and metadata, and the Unstructured package is a good choice for more complex or structured data extraction.
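
    As an illustration, loading a PDF with one of these loaders might look like the following sketch (the file path is hypothetical):

```python
from langchain_community.document_loaders import PyPDFLoader

# Each page becomes a Document with page_content plus metadata (source file, page number).
loader = PyPDFLoader("specs/ts_23_502.pdf")  # hypothetical path
pages = loader.load()

print(len(pages))                  # number of pages extracted
print(pages[0].metadata)           # e.g. {'source': 'specs/ts_23_502.pdf', 'page': 0}
print(pages[0].page_content[:200])
```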

    Extracting tables is more complicated. A table in a PDF is a collection of lines and text positioned to appear as a table, and tables are often interspersed with non-tabular text. This issue is a focal point of research, with numerous solutions ranging from open-source tools to proprietary software.

    Camelot, a Python library powered by OpenCV, identifies the lines and borders that form a table, retrieves the text from each cell, and organizes it. It also saves the page number of the table as metadata, which can be used to cross-reference the table with other text.
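
    A minimal Camelot sketch, assuming a PDF with ruled tables (the path and page range are placeholders):

```python
import camelot

# 'lattice' mode detects tables from their ruling lines; 'stream' infers columns from whitespace.
tables = camelot.read_pdf("manuals/kpi_reference.pdf", pages="1-5", flavor="lattice")

for table in tables:
    print(table.page)       # page number, useful as metadata for cross-referencing
    print(table.df.head())  # extracted cells as a pandas DataFrame
```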

    The data extraction pipeline is critical to our solution. We tested many frameworks, but none are perfect. Extracting meaningful descriptions of diagrams remains an unsolved challenge. Recent developments suggest using an image-to-text approach, but this would significantly increase the cost of the solution.

    Dealing with Structured Data in Telco Engineering

    In telco engineering, structured data consists of relational databases (SQL) and simpler formats like CSV or Excel files. Efficiently accessing and using data from these sources is vital for AI projects, and LangChain's SQL agent toolkit plays a crucial role, bridging natural language processing and structured data querying.

    SQL Databases: Harnessing Text-to-SQL

    SQL databases store structured data ideal for complex querying. LangChain's SQL agent toolkit simplifies interaction by converting natural language queries into SQL commands. This capability is important for efficient data retrieval, ensuring clarity and specificity in queries to enhance accuracy.

    When to Use the SQL Agent

    The SQL agent excels at letting users interact with data without extensive training in SQL syntax. This approach not only speeds up data handling but also safeguards against destructive SQL commands (DML statements), ensuring database integrity.
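
    A hedged sketch of spinning up the SQL agent toolkit over an existing database (the connection string and question are placeholders):

```python
from langchain_community.agent_toolkits import create_sql_agent
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

# Placeholder read-only connection string; restricting permissions at the database level
# is an additional safeguard on top of the agent's own guardrails.
db = SQLDatabase.from_uri("postgresql+psycopg2://readonly@localhost:5432/network_kpis")
llm = ChatOpenAI(model="gpt-4", temperature=0)

agent_executor = create_sql_agent(llm, db=db, agent_type="openai-tools", verbose=True)
agent_executor.invoke({"input": "How many handover failures were logged last week?"})
```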

    To optimize workflow:

    • Flatten the table: Ensure your prompt includes comprehensive schema details (field names, types) to reduce query complexity and improve accuracy.
    • Execute the SQL chain: LangChain’s toolkit utilizes a straightforward three-step SQL chain (see the sketch after this list):
      • Text to SQL: Converts your natural language query into a SQL command.
      • Execution: Runs the SQL command on the database.
      • SQL Output to Final Generation: Translates the SQL output back into human-readable form or into the format required by subsequent application layers.
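
    A rough sketch of that three-step chain built from LCEL components, following the pattern in LangChain’s documentation (the database and question are hypothetical):

```python
from operator import itemgetter

from langchain.chains import create_sql_query_chain
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool
from langchain_community.utilities import SQLDatabase
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

db = SQLDatabase.from_uri("sqlite:///network_events.db")  # placeholder database
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Step 1: text -> SQL.  Step 2: execute the SQL.  Step 3: SQL output -> final answer.
write_query = create_sql_query_chain(llm, db)
execute_query = QuerySQLDataBaseTool(db=db)
answer_prompt = ChatPromptTemplate.from_template(
    "Question: {question}\nSQL query: {query}\nSQL result: {result}\n"
    "Answer the question using the result."
)

chain = (
    RunnablePassthrough.assign(query=write_query)
    .assign(result=itemgetter("query") | execute_query)
    | answer_prompt
    | llm
    | StrOutputParser()
)

chain.invoke({"question": "Which cell had the most dropped calls yesterday?"})
```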

    Working with CSV Files: Integration Approach

    For CSV files, loading data into an SQL database and using the SQL agent toolkit is more efficient than direct manipulation or vectorization. The SQL agent toolkit’s keyword search and data aggregation are particularly effective for categorical or numeric data fields.
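
    As a sketch of this approach (file and table names are hypothetical), the CSV can be loaded once into a lightweight SQLite database and the same SQL tooling pointed at it:

```python
import sqlite3

import pandas as pd
from langchain_community.utilities import SQLDatabase

# Load the CSV once into SQLite; the SQL agent or chain can then query it like any other table.
df = pd.read_csv("exports/cell_kpis.csv")  # hypothetical export
conn = sqlite3.connect("exports/cell_kpis.db")
df.to_sql("cell_kpis", conn, if_exists="replace", index=False)
conn.close()

db = SQLDatabase.from_uri("sqlite:///exports/cell_kpis.db")
print(db.get_usable_table_names())  # ['cell_kpis']
```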

    Vector Databases

    Expanding the RAG project to handle larger volumes requires moving from in-memory vector libraries like FAISS and Chroma to hosted vector databases like Pinecone, Weaviate, or PGvector. These databases support extensive CRUD operations and can store data objects alongside their vectors, which is essential for scaling our operations. When evaluating vector databases, we considered several factors: self-hosting capabilities, cloud management, role-based access control (RBAC), CRUD operations, PDF storage, and hybrid search capabilities. We selected Weaviate after thorough analysis due to its strong performance across key criteria.
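
    A minimal sketch of indexing document chunks into Weaviate through LangChain; the connection details, index name, and sample content are placeholders, and the exact integration package may differ by version.

```python
import weaviate
from langchain_community.vectorstores import Weaviate
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

client = weaviate.Client("http://localhost:8080")  # self-hosted instance (placeholder URL)

# Chunks would normally come from the extraction pipeline described above.
chunks = [
    Document(
        page_content="The UE initiates registration by sending a Registration Request to the AMF.",
        metadata={"source": "ts_23_502.pdf", "page": 42},
    )
]

vectorstore = Weaviate.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(),
    client=client,
    index_name="TelcoDocs",
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
retriever.invoke("How does 5G registration start?")
```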

    Evaluation

    An evaluation framework is essential for several reasons. First, just because a system seems to work doesn't mean it's accurate. An evaluation framework allows us to assess the precision and relevance of the retrieved and generated information. Second, although many articles suggest the best methods, we must test these on our specific use case to see how they perform. Finally, an evaluation framework is crucial for fast iteration, letting us quickly test and refine our approach to get the best results.

    Testing Framework

    "If you cannot measure it, you cannot improve it." — Lord Kelvin

    The effectiveness of a RAG system hinges on continuous evaluation and refinement. Evaluation quantifies current effectiveness and identifies areas for improvement across components such as data ingestion, vector storage, and the large language model. It typically combines human assessment, which is thorough but time-intensive and not scalable, with automated frameworks like RAGAS and ARES, which evaluate:

    Retrieval Effectiveness: How well the system fetches semantically meaningful information from the vector database in response to user queries.

    Generation Quality: Coherence and contextual alignment of responses generated by the LLM with the query and retrieved information.

    Hallucinations: Potential generation of false or irrelevant data by the AI system.

    These frameworks rely on benchmark QA datasets derived from documentation, which may not cover all real-world queries and are limited to text data, excluding evaluation of SQL databases.

    For TelcoGPT, we employed a dual approach: automated evaluation using RAGAS for rapid feedback and human feedback from subject matter experts (SMEs). Fast iterations with RAGAS identified immediate impacts, followed by SME testing for comprehensive evaluation. Integrating both approaches ensured robust performance enhancements before implementation, as relying solely on automated metrics occasionally overlooked critical nuances captured by human testers.
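
    A hedged sketch of the automated side with RAGAS; the sample record is fabricated for illustration, and the metric set and column names may vary across RAGAS versions.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

# Tiny illustrative benchmark; in practice the records come from SME-curated QA pairs.
eval_data = Dataset.from_dict({
    "question": ["What does the AMF do during registration?"],
    "answer": ["The AMF handles registration, connection, and mobility management."],
    "contexts": [["The AMF is responsible for registration and mobility management in 5G."]],
    "ground_truth": ["The AMF manages registration and mobility for the UE."],
})

result = evaluate(eval_data, metrics=[faithfulness, answer_relevancy, context_precision])
print(result)  # per-metric scores used to drive fast iterations
```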

    Deployment

    Deploying a Retrieval-Augmented Generation (RAG) system comes with several challenges. Deciding between a SaaS LLM service and hosting a local model mainly depends on performance needs, data control, and cost. Some clients might have their own local LLMs or be hesitant to use third-party services. These factors need to be carefully considered to make sure the deployment meets the specific needs and limitations of each use case.

    SaaS LLMs

    SaaS LLMs, such as those offered by OpenAI, are accessed via API calls and operate on a pay-as-you-go model. They provide scalability and the flexibility to adjust to varying demand levels without additional hardware costs. However, using SaaS involves potential privacy risks due to data transmission to and from cloud servers.

    Self-hosted LLMs

    Self-hosted LLMs run locally on on-premises servers, offering the highest level of data control and privacy. Models can be deployed using open-source libraries like ollama, ensuring all data remains within the company's environment. This setup, combined with self-hosted vector databases like Weaviate, minimizes the risk of data leakage. Challenges include high upfront costs for hardware and ongoing maintenance expenses. Achieving the accuracy of models like GPT-4 can also be challenging with local deployments.
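
    For the self-hosted path, a sketch of calling a locally served model through Ollama from LangChain (the model name is just an example, and a running Ollama server is assumed):

```python
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Assumes an Ollama server is running locally and the model has been pulled, e.g. `ollama pull mistral`.
llm = ChatOllama(model="mistral", temperature=0)

chain = (
    ChatPromptTemplate.from_template("Explain the likely cause of attach failure {cause}.")
    | llm
    | StrOutputParser()
)

print(chain.invoke({"cause": "EMM cause #15 (no suitable cells in tracking area)"}))
```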

    Choosing the right model

    Speed Considerations

    In TelcoGPT, user interaction speed is critical for maintaining a seamless and engaging user experience. The retrieval task, which involves querying and retrieving data from vector and SQL databases, is quick. Generation tasks, which produce coherent responses based on retrieved data, are more computationally intensive. Different models offer varying trade-offs between speed and capability; for example, GPT-3.5 Turbo offers quick inference times, while GPT-4 excels in overall capability. This presents an accuracy-versus-speed trade-off, as not all models operate with the same efficiency. Streaming is an effective strategy to mitigate perceived delays during slower generation, ensuring a seamless user experience.
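
    Streaming with LCEL is a one-line change from a blocking call, as in this sketch (prompt and model are illustrative):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

chain = (
    ChatPromptTemplate.from_template("Explain the likely root cause of: {symptom}")
    | ChatOpenAI(model="gpt-4")
    | StrOutputParser()
)

# Tokens reach the UI as they are generated instead of after the full response is complete.
for token in chain.stream({"symptom": "repeated RRC connection re-establishment"}):
    print(token, end="", flush=True)
```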

    Prompt Engineering and Model Selection

    Prompt engineering is crucial, as different LLM models require tailored prompts for better performance. Our approach uses smaller models like Mistral 7B or GPT-3.5 for faster inference on simpler tasks, while larger models are reserved for more complex processes. This strategy balances accuracy and speed, essential for applications like TelcoGPT.

    The Power of Collaboration

    Our team’s interdisciplinary approach, combining data scientists, subject matter experts, and software developers, proved both challenging and immensely rewarding. Despite initial language barriers, this diversity enriched our system's design beyond what isolated efforts could achieve. Collaboration fostered innovations like real-time document updates and adaptive user interfaces, directly benefiting from varied perspectives.

    Conclusion

    In this post, we focused on our journey and the outcomes we achieved. Although it was tempting to get distracted by the constant stream of news about the latest and greatest approaches, we found that simple approaches allowed for faster iterations, better modularization, and detailed evaluation of the solution. For instance, extensive testing revealed that reranking methods did not enhance our outcomes. Additionally, minimizing the number of components aided debugging and operational speed.

    Our greatest challenge lay in balancing speed and accuracy. We prioritized rapid response times—setting a benchmark of 5 seconds for accurate answers—and subsequently refined result accuracy. This approach steered us away from using agents, prioritizing direct user engagement.

    We hope you found this article useful. Please share your experiences and insights with us.