March 23, 2015

Big Data and Financial Services – Volume Variety Velocity Veracity


Big Data: Volume Variety Velocity Veracity

This Cloud Banking Insights Series focuses on big data in the financial services industry and on whether it is a security threat or actually a massive opportunity. How does big data fit into an overall cloud strategy? Most financial institutions (FIs) have a positive mindset towards cloud IT consumption, as it not only enables savings on IT investment but also frees up precious resources and surplus that FIs can invest in customer-centric activity. Financial institutions have long, and in many cases lifelong, relationships with their customers, and the convergence of cloud and data analytics helps FIs work towards the common goal of building relationships and increasing product uptake by tailoring and appropriately segmenting their approach. There is a great deal of “noise” around this topic, both about its real meaning and about how it can create disruption in relation to security and privacy. Throughout this article we will share some insight on both, to help reduce the “noise levels” in the industry.

What is Big Data?

Big data has been around for a number of years now, and everyone has an opinion on it. It is not unusual to see two individuals having a conversation about big data where neither is talking about the same thing: one says it is all about volume, the other that it is all about variety. Some even suggest that small, fast data shouldn’t be classified as big data at all. It is just like two individuals discussing Ruby and Java: both are programming languages, but neither has much to do with the other, and only when you apply some context does the conversation become productive. So let us establish some context on the topic and then see how it applies to financial services.


Big data is really all about 4 things, known as the 4 V’s of Big Data:

  • Volume – relates to the size of the data: new ways to process huge amounts of data by cutting it into small “bite-sized” pieces and running the algorithms with massive parallelism (a minimal sketch appears below).
  • Variety – relates to the structure of the data: new ways of ingesting, handling and processing data that can originate from multiple structure types, from relational to non-relational sources, including social platforms such as Twitter, Facebook and other service-type data aggregators.
  • Veracity – relates to the accuracy of big data. The focus is on the uncertainty introduced by imprecise and inaccurate data.
  • Velocity – relates to the speed at which the data is ingested or processed. This matters because big data brings different ways to treat data depending on the ingestion or processing speed required: for example, data from a sensor that emits small status updates very quickly, or data that must be processed in “near real-time” because of its criticality, as in fraud detection and compliance mechanisms.

(Image Source: IBM)
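
To make the Volume point concrete, here is a minimal sketch of the “divide and process in parallel” idea in Python. The dataset and the per-chunk work are invented placeholders; a real system would distribute the chunks across a cluster rather than across local CPU cores.

```python
# Split a large dataset into bite-sized chunks and process them in
# parallel; the per-chunk "work" here is just a placeholder sum.
from multiprocessing import Pool

def process_chunk(chunk):
    # Stand-in for real work, e.g. scoring or aggregating records.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))   # pretend this is "big"
    size = 10_000
    chunks = [data[i:i + size] for i in range(0, len(data), size)]

    with Pool() as pool:            # one worker per CPU core
        partial_results = pool.map(process_chunk, chunks)

    print(sum(partial_results))     # combine the partial results
```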

Now that we understand the 4 V’s, you might be asking why this matters and what generated these massive amounts of data. This is a great question, and the answer is quite simple.

Data Explosion

Data has been growing roughly 10x every five years, and around 85% of that growth comes from new data types: clickstream, time-series, columnar and spatial data types, among others. The potential is sky-high when this data explosion is combined with the consumerization of IT and the huge number of people connected through social channels.
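
As a back-of-the-envelope check on that growth figure, 10x every five years corresponds to a compound rate of roughly 58% per year:

```python
# 10x growth every five years implies an annual compound growth
# rate of 10**(1/5) - 1, i.e. about 58.5% per year.
annual_growth = 10 ** (1 / 5) - 1
print(f"{annual_growth:.1%}")  # -> 58.5%
```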

The potential is huge, but the “noise” levels right now are so high that it is difficult to achieve real value. This happens because, as with many other innovations, companies already have some kind of big data solution which, in their opinion, will revolutionize the way they work; but when you dig a bit deeper, do they really know how to capitalise on it? Another misconception is the belief that big data is all about Hadoop and about what you can achieve if you leverage it. This is definitely not the case. Big data is an approach, a strategy for generating better insights and intelligence for the business.

The technologies and products we use will always depend on what the enterprise’s goals really are. If we are talking about storing the data (yes, big data is also about how you store it) we might focus on MongoDB, Cassandra, RavenDB and others; if we are talking about processing it, we might look at Hadoop, Spark, Kinesis, streaming analytics and others. Every choice depends on the context, and the Hadoop ecosystem is not the only one that solves everything. There is no “one size fits all” and there are no “silver bullets”; that type of thinking leads to disastrous consequences.
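
To illustrate the storage-versus-processing distinction, here is a small sketch using two of the products named above. It assumes a local MongoDB instance and a local Spark installation; the database, collection and field names are invented for the example.

```python
# Storage and processing are separate choices: here the same records
# are persisted in a document store and aggregated by a processing engine.
from pymongo import MongoClient
from pyspark.sql import SparkSession

records = [
    {"customer_id": 1, "amount": 42.0},
    {"customer_id": 1, "amount": 13.5},
    {"customer_id": 2, "amount": 99.9},
]

# Storage choice: persist the raw records in MongoDB.
# (Copies are inserted so insert_many's added _id fields
# don't leak into the Spark step below.)
client = MongoClient("mongodb://localhost:27017")
client.bank.transactions.insert_many([dict(r) for r in records])

# Processing choice: aggregate the same records with Spark.
spark = SparkSession.builder.appName("spend-per-customer").getOrCreate()
df = spark.createDataFrame(records)
df.groupBy("customer_id").sum("amount").show()
```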

When looking at the market, it would be fantastic to see a common and unified approach to the products that enable big data analytics, but unfortunately we are still far from that. This is very similar to what is happening with the cloud: just having it does not make you an expert in it, and it by no means guarantees that you are optimising its potential. Let’s not delude ourselves; we are all still trying to figure out the real power of this for our businesses, and success will depend on how we leverage it. To succeed in this transformation we first need to be humble enough to stop, learn without preconceptions, and understand that this is really another tool in our tool belt to help our business succeed. Most importantly, we need to learn to get the right data, at the right time, to the right person. Only then will we make this successful.

Level-setting our Big Data Conversations

To reduce the noise levels that accompany anything new, we need to start using some level-setting questions. The following should help you achieve your commercial and customer-engagement objectives:

  • What is the frequency of the data being processed? Is it on-demand, a continuous feed or real-time?
  • What type of analysis should be used? Batch or streaming/real-time?
  • What processing methodology should we use? Predictive analytics, analytical processing (like social network analysis) or pre-emptive analytics?
  • What is the structure of the data we are receiving? Is it unstructured, semi-structured or highly structured?
  • What types of data sources do we need to work with? Web and social, machine-generated, human-generated, biometric, transactional systems or others?
  • What is the volume of data received? Does it arrive in massive chunks or in small, fast chunks?
  • Who will consume this data? Humans, business processes, other enterprise applications or other repositories?
  • Are we talking about how to store this data or how to process it?

After answering these questions you should have a clear notion of how much “noise” you have been dealing with, and why it sometimes seems that you and others are discussing big data in different languages. This context should enable a genuinely productive discussion, because everyone will be singing from the same hymn sheet.

Now that we understand the context, we also need to understand how the big data approach works and why. The approach is based on four phases (a toy end-to-end example follows the list):

  1. Aggregate – This is where we focus on understanding the different data sources and how the data is ingested; the goal is to aggregate all the data into a unified view across the different sources, regardless of where they came from.
  2. Enrich – This is the phase where we refine the data, transform it and perform data cleansing, to make sure we have all the data we need and that it is as accurate as possible.
  3. Analyse – This is the phase where we implement our algorithms, whether analytical, predictive or pre-emptive.
  4. Visualize/Expose – Finally, this is where we focus on presenting our data and the results of our processing, both to people and to other systems.
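
Here is a toy pass over the four phases using pandas, purely for illustration; the sources, column names and the trivial “analysis” rule are invented.

```python
# A toy end-to-end run through Aggregate -> Enrich -> Analyse -> Expose.
import pandas as pd

# 1. Aggregate: unify records from two hypothetical sources.
branch = pd.DataFrame({"customer": ["ann", "bob"], "balance": [1200, -50]})
online = pd.DataFrame({"customer": ["ann", "cat"], "balance": [300, 700]})
data = pd.concat([branch, online], ignore_index=True)

# 2. Enrich: cleanse and transform (drop impossible values, normalise names).
data = data[data["balance"] >= 0]
data["customer"] = data["customer"].str.title()

# 3. Analyse: apply a (here trivial) analytical rule.
summary = data.groupby("customer")["balance"].sum().rename("total_balance")

# 4. Visualize/Expose: surface the result to people or other systems.
print(summary.to_string())
```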

Big Data in Financial Services

Financial services has everything to gain from big data when the strategy in place is sound. Consider retail banking: with more data available, it becomes much easier to determine which products suit a specific customer, because when social data is merged with the customer’s internal financial data, it forms a near-accurate picture of the customer’s preferences, likelihood of investment, personality and, most importantly, any associated risk. Applications of big data in the financial services industry are only scratching the surface; many more will appear over time, for example the ability for FIs to build a massive database of customers’ social and professional information and run near real-time risk analysis on it to price insurance risk, and so on.

Fraud Detection Example

Let’s use an example to check whether credit card fraud detection, as we perform it today, is fundamentally a big data problem or not.

Let us start by setting the context, using the level-setting questions from above.

  • What is the frequency of the data being processed? Is it on-demand, a continuous feed or real-time?
    • In this case we have two different speeds and processes running. First, we have a real-time feed provided by the consumer transaction systems, which supplies the transactions happening right now, and which we need to respond to. Second, we have a feed from the Anti-Money Laundering (AML) system, which can be either continuous or on-demand depending on the system we are using or connecting to.
  • What is the structure of the data we are receiving? Is it unstructured, semi-structured or highly structured?
    • This data is received in multiple ways and is really poly-structured: sometimes structured, but more often unstructured or semi-structured. On one side we have unstructured information provided by our card payment machines (“our connected things”); on the other we have massive data-warehouse processes containing the historical data that enables us to rate each transaction’s trust level.
  • What type of analysis should be used? Batch or streaming/real-time?
    • Decisions need to happen in near real-time, because we need to increase the security of our customers by pre-empting issues before they happen. But that is not all: to score a transaction’s fraud level we also need batch processing, because historical analysis is needed to train our processes and avoid false positives.
    • So in reality, this example needs a combination of both; only then will we be successful.
  • What processing methodology should we use? Predictive analytics, analytical processing (like social network analysis) or pre-emptive analytics?
    • In this example we need both. We need analytical processing, to understand what happened in the past and what patterns we should look for to find fraud; and we need predictive analytics, to predict whether a customer transaction happening right now is actually fraudulent. This is done by using the historical data to train an artificial intelligence process, using techniques such as LDA (Latent Dirichlet Allocation), Bayesian learning neural networks and peer group analysis (a toy sketch follows this list).
  • What type of data sources do we need to work with? Web & Social, Machine generated, human generated, biometric, transactional system or other?
    • In fraud detection, multiple data sources generate the data used by the system. Some of it comes from a device, like the card machine used for the payment; some is generated by the AML system; and some comes from other systems and even from humans, as in peer group analysis.
  • What is the volume of data received? Are these massive chunks or small and fast chunks?
    • The overall volume of data received is huge. Typically, the consumer transaction systems deliver small, fast chunks of data, while the AML system delivers more massive chunks.
  • What will be the consumers of this data? Will it be human, business process, other enterprise applications or other repositories?
    • The main consumers of this data will be other enterprise applications and other repositories, since we need to act on it in real time, store it for audit purposes, and use it to train our AI algorithms.
  • Are we talking about how to store this data or how to process it?
    • We are not talking only about storing the data. This is mainly about processing it, although for audit and training purposes we also need to store it; some of it will be time-series based, some document based and some relational.
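
The article names LDA, Bayesian learning neural networks and peer group analysis as candidate techniques. As a much simpler stand-in, the sketch below trains a logistic regression on invented historical transactions in batch and then scores a single “live” transaction in near real-time, mirroring the batch-plus-streaming combination described above. The features, data and decision threshold are all made up.

```python
# Batch phase: fit a simple fraud model on (invented) historical data.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features: [amount_eur, distance_from_home_km]; label 1 = fraud.
X_history = np.array([[12.0, 1], [30.0, 3], [25.0, 2],
                      [900.0, 4000], [750.0, 3500]])
y_history = np.array([0, 0, 0, 1, 1])

model = LogisticRegression().fit(X_history, y_history)

# Streaming phase: score an incoming transaction as it happens.
live_txn = np.array([[820.0, 3800]])
fraud_probability = model.predict_proba(live_txn)[0, 1]
print(f"fraud probability: {fraud_probability:.2f}")

if fraud_probability > 0.5:   # invented decision threshold
    print("flag transaction for review")
```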

Pulling these answers together shows the real complexity involved, and why setting the context is so important. By now you should have a fairly good idea of the context of this example, how big data can help, and how to structure your conversation around it.

Summary

So, in summary, big data is an over-hyped buzzword that by itself doesn’t mean much. We need to dig much deeper to understand what it is really all about, and always remember to:

  1. Understand the business goals so we know what we need to achieve.
  2. Level-set which part of the big data problem we are talking about, so we can focus the discussion.
  3. Understand what matters in each stage of the approach: Aggregate, Enrich, Analyse and Visualize/Expose.
  4. Provide the right data, to the right person at the right time.
  5. Focus on the visualization of the data as well, because without it the data may not be understood.

We hope this article helps you achieve your goals and better understand how big data can change financial services. With all the data we have, it is only a matter of getting the right people (data scientists, data stewards, data engineers) working on it, and we will be much better at fulfilling our customers’ needs.

By Diaz Ayub
