CloudTweaks | Data Science and Machine Learning

Data Science and Machine Learning

Security breaches have risen steadily over the past few years. In 2015 alone, companies detected 38 percent more security breaches than in the previous year, according to PwC’s Global State of Information Security Survey 2016.

Those breaches are a major expense: an average of $3.79 million per company, according to the Ponemon Institute. And Juniper Research forecasts that by 2019, data breaches will cost $2.1 trillion globally, almost four times the estimated cost in 2015.

Global spending on cybersecurity grows every year; Gartner estimates it at $75.4 billion this year. Despite those mounting expenses, attackers remain on the offensive and keep breaching defenses. Operating in a sophisticated, flourishing underground economy where they specialize and freely sell their goods and services, hackers are evolving faster than defenses can adapt.

Daunting Tasks

One major struggle for security teams stems from a lack of visibility and from information silos, especially as they try to sift through a growing number of IT security data sources. In most publicized breaches, the companies’ monitoring systems worked as they should, generating intrusion alerts. But the sheer number of daily alerts, along with the large share of false positives, makes the security analysts’ job daunting and leads to ignored alerts.

More organizations are turning to data science to detect breaches rapidly and respond more efficiently. Called user and entity behavior analytics, or UEBA (a term coined by Gartner), this approach can bridge a gap that no human can, helping to prioritize alerts and reduce their number.

UEBA works by creating behavior-based models for cloud services (both for users and entities) and then analyzing activity against those models. Because they’re based on machine learning, the models continuously adapt to user behavior, and do so without the computers needing to be explicitly programmed. For example, peer groups can be created based on data sets such as company databases and directories, user profiles and common user activities. Through UEBA, statistical models are applied to detect, in the right context, anomalies beyond the login, using factors such as a user’s login volume compared with historic logs.
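As a rough illustration of the login-volume check described above, the sketch below scores today's login count against a user's own history using a z-score. The function name `login_volume_zscore`, the sample counts and the choice of a z-score are assumptions made for this example; the article does not say which statistical model a given UEBA product uses.

```python
from statistics import mean, stdev

def login_volume_zscore(history, today):
    """Return how many sample standard deviations today's login count
    lies from the mean of the user's historic daily counts."""
    mu = mean(history)
    sigma = stdev(history)  # needs at least two historic data points
    if sigma == 0:
        # A perfectly flat history: any change at all is treated as extreme.
        return 0.0 if today == mu else float("inf")
    return (today - mu) / sigma

# Hypothetical daily login counts for one user over the past week.
history = [4, 5, 6, 5, 4, 6, 5]

print(login_volume_zscore(history, 5))   # a typical day
print(login_volume_zscore(history, 40))  # a sudden burst of logins
```

A score near zero means the day looks like the user's past; a large score is a candidate anomaly. In a real system the baseline would be refreshed as new logs arrive, which is the "continuously adapt" part of the description.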

The process can quickly identify patterns that deviate from “normal” behavior. Through machine learning, computers can solve complex problems that require rich-data exploration, in environments where software engineering and humans alone cannot succeed. As a result, the number of security alerts can be cut from millions a day to hundreds, and those that remain can be further prioritized into a short list of top alerts.
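The prioritization step can be pictured as ranking alerts by an anomaly score and keeping only the top few. The sketch below assumes each alert already carries a numeric `score`; the field names, the sample alerts and the helper `top_alerts` are hypothetical and only illustrate the idea.

```python
import heapq

def top_alerts(alerts, k=3):
    """Return the k highest-scoring alerts, highest score first."""
    return heapq.nlargest(k, alerts, key=lambda alert: alert["score"])

# Hypothetical alerts with anomaly scores already assigned by some model.
alerts = [
    {"id": "a1", "score": 0.2},
    {"id": "a2", "score": 9.7},
    {"id": "a3", "score": 1.1},
    {"id": "a4", "score": 6.4},
    {"id": "a5", "score": 0.05},
]

shortlist = top_alerts(alerts, k=2)
print([alert["id"] for alert in shortlist])
```

`heapq.nlargest` avoids sorting the full list, which matters when the input is millions of alerts and the analyst only needs a handful.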

Fintech Detection

An analogy for how UEBA can help companies is the system that credit card institutions have in place for detecting fraudulent transactions. Rather than automatically flagging every single transaction over, say, $10,000, or every user with a large number of transactions, or any other static threshold, the credit card companies use behavioral analytics to spot unusual activities among the billions of daily transactions. In the same way, UEBA can sift through massive datasets to flag potential breaches.
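To make the contrast between a static threshold and a behavioral check concrete, here is a minimal sketch. The behavioral rule compares a transaction with the cardholder's own past amounts using the median and the median absolute deviation (MAD); that choice, the cutoff of 5 and the sample amounts are assumptions for illustration, not how any particular card issuer works.

```python
from statistics import median

STATIC_LIMIT = 10_000  # the fixed threshold from the example in the text

def static_flag(amount):
    """Flag every transaction above one fixed limit, whoever makes it."""
    return amount > STATIC_LIMIT

def behavioral_flag(past_amounts, amount, cutoff=5.0):
    """Flag a transaction that is far from this user's own typical amount,
    measured in units of the median absolute deviation."""
    med = median(past_amounts)
    mad = median(abs(x - med) for x in past_amounts)
    if mad == 0:
        # All past amounts identical: flag anything different.
        return amount != med
    return abs(amount - med) / mad > cutoff

# Hypothetical histories: a corporate buyer and a small everyday spender.
corporate = [11_500, 12_000, 12_500, 11_800, 12_200]
everyday = [25, 40, 30, 35, 20]

print(static_flag(11_900), behavioral_flag(corporate, 11_900))
print(static_flag(900), behavioral_flag(everyday, 900))
```

The static rule flags the corporate buyer's routine purchase and misses the everyday spender's unusual one; the behavioral rule does the opposite in both cases.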

For organizations, this ability is especially important as cloud use is exploding and security practices remain a question mark. The average enterprise sees 2 billion cloud-based transactions daily, and the traditional breach-detection methods are not keeping up in this data-rich environment. UEBA can be applied to factors such as service action, service action category, number of bytes uploaded or downloaded, and rate or time of access, across a single service action or even an entire cloud service provider, to identify behavioral anomalies. UEBA can be customized for each enterprise based on time, rate, level of use and so on.
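When several factors are tracked at once, one simple way to combine them is to score each factor against its own baseline and report the one that deviates most. The sketch below does that for two of the factors named above; the feature names `bytes_uploaded` and `requests_per_hour`, the sample values and the helper functions are all hypothetical.

```python
from statistics import mean, pstdev

def feature_scores(baseline, observed):
    """Score each feature by its absolute distance from the baseline mean,
    in units of the baseline's population standard deviation."""
    scores = {}
    for name, values in baseline.items():
        mu, sigma = mean(values), pstdev(values)
        # A flat baseline gives no scale to measure against,
        # so this sketch simply scores it 0.
        scores[name] = 0.0 if sigma == 0 else abs(observed[name] - mu) / sigma
    return scores

def most_anomalous(baseline, observed):
    """Return the (feature name, score) pair with the highest score."""
    scores = feature_scores(baseline, observed)
    name = max(scores, key=scores.get)
    return name, scores[name]

# Hypothetical history for one user on one cloud service.
baseline = {
    "bytes_uploaded": [1000, 1200, 900, 1100],
    "requests_per_hour": [10, 12, 11, 9],
}
observed = {"bytes_uploaded": 50_000, "requests_per_hour": 11}

print(most_anomalous(baseline, observed))
```

Here the access rate looks ordinary while the upload volume is far outside the baseline, so the upload is what an analyst would be pointed to.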

One challenge in detecting anomalies in cloud use is the noticeable pattern variation that results from corporate policies as well as users’ personal preferences. If the actual usage is the only piece that can be observed, the user-behavior model is incomplete because it lacks unobservable components such as variations in use at different times of day or on different days of the week, or the evolution of a user’s preferences and patterns over time. UEBA connects the dots because it can predict future usage, leading to a more precise anomaly detection process.
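One simple way to account for time-of-day and day-of-week variation is to keep a separate baseline per (weekday, hour) slot and treat the slot average as the predicted usage. The sketch below is an assumption-laden illustration of that idea: the function names, the sample events and the use of a plain average as the "prediction" are not taken from the article.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def build_seasonal_baseline(events):
    """Group (timestamp, usage_count) pairs by (weekday, hour) slot and
    return the average usage seen in each slot."""
    buckets = defaultdict(list)
    for ts, count in events:
        buckets[(ts.weekday(), ts.hour)].append(count)
    return {slot: mean(values) for slot, values in buckets.items()}

def expected_usage(baseline, ts, default=0.0):
    """Predict usage at a timestamp as the historic average for its slot."""
    return baseline.get((ts.weekday(), ts.hour), default)

def deviation_ratio(baseline, ts, observed):
    """Return observed usage divided by the predicted usage for that slot."""
    expected = expected_usage(baseline, ts)
    if expected > 0:
        return observed / expected
    # Nothing was expected in this slot: any activity is maximally unusual.
    return float("inf") if observed > 0 else 0.0

# Hypothetical history: busy Monday mornings, near-silent Sunday nights.
events = [
    (datetime(2016, 1, 4, 9), 100),
    (datetime(2016, 1, 11, 9), 120),
    (datetime(2016, 1, 18, 9), 110),
    (datetime(2016, 1, 3, 3), 2),
    (datetime(2016, 1, 10, 3), 4),
]
baseline = build_seasonal_baseline(events)

# The same volume is normal on Monday at 9:00 but unusual on Sunday at 3:00.
print(deviation_ratio(baseline, datetime(2016, 1, 25, 9), 110))
print(deviation_ratio(baseline, datetime(2016, 1, 24, 3), 110))
```

Capturing how a user's habits drift over time would need something more, for example weighting recent weeks more heavily, which this sketch leaves out.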

Empowering Security

Security vendors are seeing the advantages of UEBA and beginning to integrate it into their products and services. Gartner forecasts that by 2017, at least 25 percent of the major DLP and SIEM providers will add UEBA capabilities either natively or through partnerships and acquisitions. That means companies will be able to bolster their security defenses and gain better visibility into their data, empowering their security teams with more robust, context-aware tools.

Although the human factor will likely never be eliminated in the fight against attackers, providing better failsafe mechanisms against human error through machine learning is a vital next step.

By Sekhar Sarukkai