Leveraging machine learning models for predictive maintenance of network services

According to Light Reading’s service assurance and analytics research study, conducted with more than 100 network operators and service providers, nearly 40% reported issues around service assurance as a massive challenge.

Service assurance is a major focus area for digital service providers (DSPs), who are looking for ways to address common problems. Service outages and network performance issues can seriously damage brand reputation, and any such incident could easily cost a DSP millions of dollars. One way to avoid unexpected outages is to analyze data at a detailed level to track the underlying quality of network performance.

Key challenges faced by DSPs in the service assurance domain

  • Lower QoS & performance issues
  • Accumulated network faults
  • Network traffic congestion
  • Multivendor-based network device upgrades
  • High downtime of network devices
  • Exhausted capacity
  • Reactive network event handling
  • More open incidents

Given the overwhelming volume and complexity of data coming from the service assurance domain, AI/ML techniques are proving valuable. As shown above, reactive network event handling is among the top three issues. Many DSPs are looking for innovative predictive analytics solutions based on machine learning techniques for the network event prediction use case.

Machine learning (ML) in network event prediction

This article explores various AI/ML classification techniques that can be used to build an effective network event prediction model. It also discusses factors to consider when improving prediction accuracy.

Leveraging insights from operational data

DSPs are moving towards improving service assurance by leveraging insights from operational data using AI/ML techniques. Data streams in from multiple sources, such as:

  • Physical and logical inventory systems
  • Fault-related data from NMS/EMS applications
  • Scheduled jobs & case ID related data from workforce management systems
  • Service transactional data from performance systems

These data streams are fed to the AI/ML engine, which processes the data, visualizes it, and predicts actionable results.

Applying event prediction model on operational data

The event prediction model predicts a network event, along with its severity, ahead of time from the performance data of a network node. Whenever new performance metrics data arrives, the ML algorithm classifies it as either an event or a non-event based on what it has learned.
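To train a classifier of this kind, each historical performance sample first needs an event or non-event label. The sketch below shows one possible way to derive such labels, assuming a sample counts as an "event" when a fault starts within a chosen forecast window after it. The function name, the timeline values, and the 30-minute window are illustrative assumptions, not part of any specific DSP system.

```python
from bisect import bisect_right

def label_samples(sample_times, event_times, window_minutes):
    """Label each sample 1 if an event starts within the next
    `window_minutes` after the sample time, otherwise 0."""
    events = sorted(event_times)
    labels = []
    for t in sample_times:
        # Index of the first event strictly after the sample time.
        i = bisect_right(events, t)
        has_event = i < len(events) and events[i] <= t + window_minutes
        labels.append(1 if has_event else 0)
    return labels

# Hypothetical timeline in minutes: samples every 15 minutes,
# faults recorded at minute 40 and minute 100.
samples = [0, 15, 30, 45, 60, 75, 90]
events = [40, 100]
print(label_samples(samples, events, window_minutes=30))
# [0, 1, 1, 0, 0, 1, 1]
```

The labeled samples can then be paired with their performance metrics and passed to any of the classifiers discussed in this article.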

Various machine learning models used in the network event prediction use case

DSPs are exploring different ways to find the optimal solution for demanding network event prediction. The algorithms below are recommended for prediction.

  1. Random Forest Classifier

Random forest is one of the most commonly used algorithms for classification problems. It can predict accurately from a relatively small quantity of good-quality data and is easy to train with the required inputs.

  2. Gradient Boosting Classifier

Gradient boosting is more advanced than the random forest classifier. Its prediction accuracy is better in most cases, including when less data is available, and it is also easy to train to achieve the best results.
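To illustrate the idea behind gradient boosting, here is a minimal from-scratch sketch for binary event/non-event classification. Each round fits a one-split decision stump to the negative gradient of the log loss and adds it to the running score. This is a teaching sketch, not the implementation used in any deployment; in practice a library implementation such as scikit-learn's `GradientBoostingClassifier` would normally be used. The feature values (CPU utilization %, packet loss %) and labels are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(rows, residuals):
    """Find the single (feature, threshold) split whose two leaf means
    fit the residuals best in the least-squares sense."""
    best = None
    for f in range(len(rows[0])):
        for thr in sorted({r[f] for r in rows}):
            left = [res for r, res in zip(rows, residuals) if r[f] <= thr]
            right = [res for r, res in zip(rows, residuals) if r[f] > thr]
            if not left or not right:
                continue
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = (sum((x - lm) ** 2 for x in left)
                   + sum((x - rm) ** 2 for x in right))
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:
        raise ValueError("no valid split: all rows are identical")
    return best[1:]

def stump_predict(stump, row):
    f, thr, left_value, right_value = stump
    return left_value if row[f] <= thr else right_value

def fit_gradient_boosting(rows, labels, n_rounds=20, learning_rate=0.5):
    """Assumes both classes (0 and 1) are present in `labels`."""
    pos = sum(labels) / len(labels)
    f0 = math.log(pos / (1.0 - pos))  # initial score: log-odds of the base rate
    scores = [f0] * len(rows)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the score: y - p.
        residuals = [y - sigmoid(s) for y, s in zip(labels, scores)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, r)
                  for s, r in zip(scores, rows)]
    return f0, learning_rate, stumps

def predict_proba(model, row):
    f0, learning_rate, stumps = model
    score = f0 + learning_rate * sum(stump_predict(s, row) for s in stumps)
    return sigmoid(score)

# Hypothetical training data: (CPU utilization %, packet loss %) per sample,
# label 1 = a network event followed, 0 = no event.
rows = [(35, 0.1), (40, 0.2), (55, 0.3), (60, 0.2),
        (85, 1.5), (90, 2.0), (95, 1.8), (88, 2.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_gradient_boosting(rows, labels)
print(predict_proba(model, (92, 2.2)))  # stressed node: probability above 0.5
print(predict_proba(model, (38, 0.1)))  # healthy node: probability below 0.5
```

Because every round corrects the errors left by the previous rounds, the ensemble can reach good accuracy with shallow learners, which is one reason the technique tends to work well on modest amounts of data.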

  3. Neural Networks

Neural networks require considerable tuning by SMEs, more memory, and high-end hardware (a larger VM or a GPU) for training. They scale well to larger datasets.

  4. Advanced Machine Learning Algorithms

The following algorithms work well with high-quality, scalable data:

  • Perceptron (P)
  • Boltzmann Machines (BM)
  • Restricted Boltzmann Machines (RBM)
  • Learning Vector Quantization (LVQ)
  • Recurrent Neural Networks (RNN)
  • Temporal Convolutional Nets (TCNs)
  • Support Vector Machine (SVM)

Deeper tests, validations, and implementations show the gradient boosting model to be more effective and to provide higher prediction accuracy for the network event prediction use case.

Recommendations on improving prediction accuracy

DSPs are already leveraging AI/ML techniques to address service assurance issues. However, they face major challenges in achieving higher precision levels. The following recommended techniques help DSPs improve network event prediction accuracy.

  • Obtain SME input to tag the contributing parameters by marking the respective fields in the performance data during the training or modeling phase
  • Consider low-severity network event data before the events are promoted to alarms
  • Ignore demoted event data when calculating the misclassification error percentage
  • Account properly for seasonal network event data, such as data collected during a soccer or cricket match
  • Collect performance data in real time or near real time
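As one example of these recommendations, the misclassification error can be computed with demoted events left out. The sketch below assumes each evaluation record carries the actual label, the predicted label, and a flag marking whether the event was later demoted. The record layout and field names are hypothetical.

```python
def misclassification_error_pct(records):
    """Percentage of wrong predictions, skipping records flagged as demoted."""
    kept = [r for r in records if not r["demoted"]]
    if not kept:
        return 0.0
    wrong = sum(1 for r in kept if r["predicted"] != r["actual"])
    return 100.0 * wrong / len(kept)

# Hypothetical evaluation records (1 = event, 0 = non-event).
records = [
    {"actual": 1, "predicted": 1, "demoted": False},
    {"actual": 0, "predicted": 0, "demoted": False},
    {"actual": 1, "predicted": 0, "demoted": False},
    {"actual": 0, "predicted": 1, "demoted": True},   # excluded from the error
    {"actual": 0, "predicted": 0, "demoted": False},
]
print(misclassification_error_pct(records))  # 25.0
```

Without the exclusion, the same five records would give an error of 40%, so the demoted record would make the model look worse than it is on the events that matter.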

A trade-off between prediction accuracy and forecast window

DSPs need to make a trade-off between higher accuracy with a shorter forecast window and lower accuracy with a longer forecast window, so that they can take preventive or corrective measures appropriate to the business use case. The following graph represents the variation of prediction accuracy across different forecast time windows using a negative exponential curve. Prediction accuracy is highest with a short forecast window and decreases gradually as the forecast window lengthens.
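The negative exponential relationship can be sketched as accuracy(t) = a0 * exp(-k * t), where t is the forecast window, a0 is the accuracy at a zero-length window, and k is a decay rate. The code below is a rough illustration of how a DSP might use such a curve to pick the longest window that still meets a required accuracy. The parameter values are placeholders chosen for illustration, not measured results; in practice a0 and k would be fitted to validation data.

```python
import math

def predicted_accuracy(window_minutes, a0, decay_rate):
    """Accuracy modeled as a negative exponential of the forecast window."""
    return a0 * math.exp(-decay_rate * window_minutes)

def max_window_for_accuracy(target, a0, decay_rate):
    """Longest forecast window (minutes) that still meets `target` accuracy.
    Solves a0 * exp(-k * t) = target for t; returns 0.0 if unreachable."""
    if target > a0:
        return 0.0
    return math.log(a0 / target) / decay_rate

# Placeholder parameters (assumptions, not measurements).
A0 = 0.9            # accuracy at a zero-length forecast window
DECAY_RATE = 0.01   # accuracy decay per minute of forecast window

for window in (15, 30, 60, 120):
    print(window, round(predicted_accuracy(window, A0, DECAY_RATE), 3))

print(max_window_for_accuracy(0.6, A0, DECAY_RATE))
```

A use case that needs time to dispatch field staff might accept a lower target accuracy in exchange for a longer window, while an automated remediation flow could favor a short window with higher accuracy.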

Using these recommended AI/ML techniques, one of the leading DSPs in the US was able to predict 75% of network faults 30 minutes ahead of time.

By Avaiarasi S
