Cybersecurity: Big Opportunities for Big Data

What has Big Data been up to?

Big Data has been a major topic in the IT world for years. The terms Big Data and Big Data analytics generally describe the potential for new insights into our environment, gained by making sense of the rapidly growing volume of information produced by an increasing number of connected devices. But what potential does Big Data hold for cybersecurity?

Today, many industries use Big Data analytics to address security challenges; the banking sector, for example, uses it to detect fraudulent transactions. In technical terms, this requires processing large, unstructured data sets that previously could not be handled because of technical, financial or time constraints.

The underlying challenge behind Big Data

When the first multi-core CPUs for the consumer market were introduced in 2005, many expected them to solve all performance problems, and Moore’s Law promised a bright future. On a small scale this was partly true, and multi-threaded programming showed very promising results. In practice, however, a problem already known from parallel computing came to the fore: scalability.

The resources and processing power needed to coordinate the cores and prevent race conditions and deadlocks often outweighed the benefit of adding another core. Running a program on a server with 10 cores did not make it run 10 times faster; in practice, performance simply did not scale with the number of cores.
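One common way to quantify this effect is Amdahl's law, which the article does not name but which captures the same limit: if a fraction p of a program's work can run in parallel and the rest must run serially, the best-case speedup on n cores is 1 / ((1 - p) + p / n). The model ignores the synchronization overhead described above, so real programs usually do worse. The sketch below assumes an illustrative 90% parallel fraction.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup predicted by Amdahl's law.

    parallel_fraction: share of the work that can run in parallel (0..1).
    cores: number of cores the parallel part is spread across.
    Synchronization costs (locking, avoiding races and deadlocks) are
    ignored, so real programs typically fall short of this bound.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative assumption: 90% of the program parallelizes perfectly.
    for n in (1, 2, 10, 100):
        print(f"{n:>3} cores -> {amdahl_speedup(0.9, n):.2f}x")
```

Under this assumption, 10 cores yield roughly a 5.3x speedup rather than 10x, and even 100 cores stay at about 9.2x, because the serial 10% is never sped up.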

Even today, only very few applications can run fully parallelized and use all available cores on a CPU. Data analytics faced a similar challenge when the volume and velocity of available data began to explode. The stage was set for Big Data tools.

Saving the Day: Hadoop

One of the most popular tools for Big Data analytics is Hadoop, developed by the Apache Software Foundation. Hadoop is an open source framework for storing and processing large data sets. It addresses the scalability problem with a programming model called MapReduce, designed for large-scale distributed computing. Programs written in the MapReduce model are parallelized automatically and can scale nearly linearly, even across clusters with thousands of cores. In addition, Hadoop provides a distributed file system, HDFS (Hadoop Distributed File System), for storing and accessing large data sets.
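To make the MapReduce model concrete, here is a minimal single-process sketch in plain Python. It does not use Hadoop's actual API, and the function names are hypothetical: a map function emits key/value pairs, a shuffle step groups the values by key, and a reduce function aggregates each group. Hadoop would run the map and reduce functions on many machines in parallel and perform the shuffle over the network; here everything runs locally.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

KV = Tuple[str, int]


def map_words(line: str) -> Iterator[KV]:
    # Map phase: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def reduce_counts(key: str, values: List[int]) -> KV:
    # Reduce phase: combine all values emitted for the same key.
    return key, sum(values)


def run_mapreduce(records: Iterable[str],
                  mapper: Callable[[str], Iterator[KV]],
                  reducer: Callable[[str, List[int]], KV]) -> Dict[str, int]:
    # Shuffle phase: group mapper output by key (Hadoop does this across nodes).
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Each key is reduced independently, which is what lets the job scale out.
    return dict(reducer(k, v) for k, v in groups.items())


if __name__ == "__main__":
    logs = ["login failed", "login ok", "login failed"]
    print(run_mapreduce(logs, map_words, reduce_counts))
    # {'login': 3, 'failed': 2, 'ok': 1}
```

Because neither the mapper nor the reducer shares state with other invocations, a framework can distribute them freely; that independence is what avoids the coordination overhead described earlier.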

Cybersecurity Projects Using Hadoop & MapReduce

Equipped with these new capabilities, security researchers identified various opportunities to apply new methods and technologies. The following list gives an overview of noteworthy cybersecurity projects using Big Data technology (especially Hadoop and MapReduce).

– DOFUR: DDoS forensics using MapReduce

A Distributed Denial of Service (DDoS) attack aims to make a system unavailable by flooding it with a large number of requests. During such an attack, the volume of log files grows rapidly, and a forensic investigator can take a long time to analyze them, find the source of the attack, contain it and restore system availability.

The DOFUR project proposes a technique that uses Hadoop and MapReduce to detect packets belonging to a DDoS attack, a task that would otherwise take a long time.
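The general idea of spreading log analysis over map and reduce steps can be illustrated with a deliberately simple sketch. This is not the DOFUR algorithm: it assumes a hypothetical log format whose first field is the source IP address and a hypothetical fixed request threshold, counts requests per source in a map/reduce style, and reports sources above the threshold.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical threshold: requests per source within the analyzed log window.
REQUEST_THRESHOLD = 3


def map_request(line: str) -> Iterator[Tuple[str, int]]:
    # Assumed log format: "<source-ip> <timestamp> <request...>".
    fields = line.split()
    if fields:  # skip empty or malformed lines
        yield fields[0], 1


def reduce_requests(ip: str, counts: List[int]) -> Tuple[str, int]:
    # Total number of requests seen from one source.
    return ip, sum(counts)


def suspected_sources(lines: Iterable[str],
                      threshold: int = REQUEST_THRESHOLD) -> Dict[str, int]:
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:  # map + shuffle
        for ip, one in map_request(line):
            grouped[ip].append(one)
    totals = dict(reduce_requests(ip, c) for ip, c in grouped.items())
    # Keep only sources whose request volume exceeds the threshold.
    return {ip: n for ip, n in totals.items() if n > threshold}


if __name__ == "__main__":
    sample_log = [
        "203.0.113.7 10:00:01 GET /",
        "203.0.113.7 10:00:01 GET /",
        "198.51.100.2 10:00:02 GET /index.html",
        "203.0.113.7 10:00:02 GET /",
        "203.0.113.7 10:00:02 GET /",
        "",
        "203.0.113.7 10:00:03 GET /",
    ]
    print(suspected_sources(sample_log))  # {'203.0.113.7': 5}
```

A real DDoS involves many sources, so a practical detector would look at rates over time windows and aggregate patterns rather than a single fixed count; the sketch only shows how per-source counting maps onto MapReduce.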

– BotCloud: Detecting botnets using MapReduce

Modern botnets have moved away from a centralized architecture, in which communication between the attacker and the infected computers passes through a central server, toward a decentralized architecture based on peer-to-peer (P2P) networks that need no central server to operate.

To keep up with this evolution, cybersecurity analysis has to move from the edges of the network to its central nodes, such as Internet Service Providers (ISPs). At such a hub, far more traffic is available for analysis, which makes it possible to detect P2P botnets effectively.

The BotCloud project builds on this insight by using Hadoop and MapReduce to correlate log files from many different ISPs, detect patterns and trace the communication paths of a botnet back to its participants.
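Why correlating logs from several ISPs helps can be shown with a small sketch. It does not reproduce BotCloud's actual method; it assumes hypothetical flow records of the form (isp, source host, destination host), counts how many distinct peers each host talks to across all ISPs, and flags hosts with many peers as a crude indicator of P2P activity.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Flow = Tuple[str, str, str]  # (isp, src_host, dst_host): assumed record layout


def map_flow(flow: Flow) -> Iterator[Tuple[str, str]]:
    # Emit the link in both directions so every host sees all of its peers,
    # regardless of which ISP's log recorded the flow.
    _isp, src, dst = flow
    yield src, dst
    yield dst, src


def reduce_peers(host: str, peers: List[str]) -> Tuple[str, int]:
    # Number of distinct hosts this host communicated with across all ISPs.
    return host, len(set(peers))


def p2p_candidates(flows: Iterable[Flow], min_peers: int) -> Dict[str, int]:
    grouped: Dict[str, List[str]] = defaultdict(list)
    for flow in flows:  # map + shuffle
        for host, peer in map_flow(flow):
            grouped[host].append(peer)
    degrees = dict(reduce_peers(h, p) for h, p in grouped.items())
    return {h: d for h, d in degrees.items() if d >= min_peers}


if __name__ == "__main__":
    flows = [
        ("isp-a", "10.0.0.5", "10.1.0.1"),
        ("isp-a", "10.0.0.5", "10.1.0.2"),
        ("isp-b", "10.1.0.3", "10.0.0.5"),
        ("isp-b", "10.2.0.9", "10.1.0.1"),
    ]
    print(p2p_candidates(flows, min_peers=3))  # {'10.0.0.5': 3}
```

With only the logs from "isp-a", host 10.0.0.5 has two peers and stays below the threshold; it only stands out once both ISPs' records are combined, which is the kind of cross-ISP correlation the project relies on.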

– APT detection frameworks

Ever since the first reports about “Stuxnet”, the malware created to sabotage Iran’s uranium enrichment infrastructure, Advanced Persistent Threats (APTs) have become one of the most intensely debated topics in cybersecurity.

Traditional network defense systems such as firewalls, IDS/IPS or SIEM solutions are rather ineffective at detecting APTs because of the high level of sophistication of these attacks.

These traditional defenses look for attacks within a relatively short time frame. APTs, in contrast, try to remain undetected by taking a “low and slow” approach when infiltrating and attacking a system.

Several research projects aim to establish a framework for detecting APTs with the help of Hadoop. The researchers propose gathering samples from as many sources as possible, with the ultimate goal of being able to detect and react to any kind of security-related incident.

Because APT attacks can unfold over months or even years, detecting them without Big Data technologies would require a significant investment or might not be feasible at all.
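The “low and slow” pattern can be made concrete with a simple heuristic, sketched here under loudly stated assumptions: it is not taken from the research frameworks mentioned above, the event format (day, internal host, external destination) is hypothetical, and so are the thresholds. It flags host/destination pairs that appear on many distinct days but only a few times per day, a pattern a short-window detector would never see.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple

Event = Tuple[date, str, str]  # (day, internal_host, external_destination): assumed


def low_and_slow(events: Iterable[Event], min_days: int,
                 max_per_day: int) -> List[Tuple[str, str]]:
    # Count contacts per (host, destination) pair per day over the whole history.
    per_day: Dict[Tuple[str, str], Dict[date, int]] = defaultdict(
        lambda: defaultdict(int))
    for day, host, dest in events:
        per_day[(host, dest)][day] += 1
    flagged = []
    for pair, counts in per_day.items():
        # Persistent (seen on many distinct days) but quiet (few contacts per day).
        if len(counts) >= min_days and max(counts.values()) <= max_per_day:
            flagged.append(pair)
    return sorted(flagged)


if __name__ == "__main__":
    start = date(2017, 1, 2)
    # ws-17 contacts one external host once a week for about six months.
    slow = [(start + timedelta(weeks=i), "ws-17", "update.example.net")
            for i in range(26)]
    # ws-02 produces a large, short burst of traffic on a single day.
    burst = [(date(2017, 3, 1), "ws-02", "cdn.example.com")] * 500
    print(low_and_slow(slow + burst, min_days=20, max_per_day=2))
    # [('ws-17', 'update.example.net')]
```

Looking at any single day, ws-17 produces just one connection; the pattern only emerges when months of logs are kept and analyzed together, which is where distributed storage and processing become necessary.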

A number of commercial products (e.g. Fortscale, LogRhythm, Blue Coat) pursue a similar approach, applying Big Data security analytics to detect APTs. These products offer features such as:

  • Real-time analytics of access to sensitive data
  • Network anomaly detection
  • Identifying compromised hosts
  • Detection of malicious employee behavior
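How these products implement network anomaly detection is not described here. As a generic illustration only, the sketch below uses one of the simplest techniques, a rolling z-score: each new measurement is compared with the mean and standard deviation of the preceding window. The hourly traffic volumes and the threshold of 3 standard deviations are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_anomalies(volumes: Sequence[float], baseline: int,
                     threshold: float = 3.0) -> List[int]:
    """Return indices whose value deviates more than `threshold` standard
    deviations from the mean of the preceding `baseline` observations."""
    anomalies = []
    for i in range(baseline, len(volumes)):
        window = volumes[i - baseline:i]
        mu = mean(window)
        sigma = pstdev(window)
        if sigma == 0:
            # Perfectly flat baseline: treat any change as a deviation.
            if volumes[i] != mu:
                anomalies.append(i)
            continue
        if abs(volumes[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical outbound traffic per hour, in MB.
    hourly_mb = [100, 104, 98, 101, 99, 102, 97, 103, 100, 950, 101]
    print(zscore_anomalies(hourly_mb, baseline=8))  # [9]
```

Note that the spike at index 9 enters the baseline window for later hours and inflates the standard deviation; production systems typically use more robust baselines, but the principle of comparing current behavior against learned normal behavior is the same.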
Big (Data) Opportunities in Cybersecurity?

The examples above give a first idea of the potential of Big Data analytics in cybersecurity services and should help professionals from all sectors better understand the challenges involved. As indicated, Big Data security analytics offers great opportunities to create new services or enhance existing ones, increasing both the security and the value delivered to clients.

Stefan Benischek

Stefan Benischek is a Cybersecurity Consultant based in Austria. His expertise is in mobile security, secure software development and conducting security assessments.

One Response to “Cybersecurity: Big Opportunities for Big Data”
  1. Angel Healy says:

    It seems that Hadoop is the most commonly used tool these days when dealing with Big Data. I heard Apache released new software that will complement its previous system. Anyway, it’s good to see the list of cybersecurity steps that were previously taken; this way it would be easy to spot whether there are any more loopholes to be solved to secure the system. And you’re right, Stefan. If consumers have a better understanding of the challenges, they would know how to react and what to do when faced with these situations.


Copyright 2017 21st CENTURY IT