Cybersecurity: Big Opportunities for Big Data
What has Big Data been up to?
Big Data has been a major topic in the IT realm for years. The terms Big Data and Big Data analytics are generally used to describe the potential for new insights into our environment by making sense of the rapidly increasing amount of information produced by the growing number of connected devices. But what potential does Big Data hold for cybersecurity?
Today, many industries use Big Data analytics to cope with security challenges; the banking sector, for example, uses it to detect fraudulent transactions. In technical terms, this requires processing large, unstructured data sets that previously could not be handled because of technical, financial or time constraints.
The underlying challenge behind Big Data
When the first multicore CPUs for the consumer market were introduced in 2005, many expected them to solve all performance problems, and Moore’s Law promised a bright future. On a small scale this was partially true, and multi-threaded programming showed very promising results. In practical trials, however, a problem already known from parallel computing came to the fore: scalability.
The resources and processing power needed to handle the race conditions and deadlocks that occur when using multiple cores were greater than the benefit of adding another core. Running a program on a server with 10 cores did not make it run 10 times faster. Scaling just did not follow that logic in practice.
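The gap between core count and speedup is commonly estimated with Amdahl's law, which the text above does not name. The sketch below is only an illustration: the function name `amdahl_speedup` and the 90% parallel share are assumptions, and the formula ignores synchronization overhead, so real results tend to be worse.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of a program can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always takes the same time; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed workload: 90% of the run time can be parallelized (illustrative value).
for cores in (1, 2, 10, 100):
    print(f"{cores:>3} cores: {amdahl_speedup(0.9, cores):.2f}x")
```

Under these assumed numbers, 10 cores yield a speedup of roughly 5.3x rather than 10x, and even 100 cores stay below 10x.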
Even today the problem remains that only very few applications can run fully parallelized and utilize all available cores on a CPU. Data analytics faced a similar challenge when the volume and velocity of the available data began to explode. The stage was set for Big Data instruments.
Saving the Day: Hadoop
One of the most popular tools for Big Data analytics is Hadoop by the Apache Software Foundation. Hadoop is an open source framework to store and process large data sets. It solves the scalability problem by providing a programming model called MapReduce designed for large scale distributed computing. Programs written with the MapReduce programming model are automatically parallelized and can scale nearly linearly even with thousands of cores within a cluster network. In addition to that, Hadoop provides a distributed file system called HDFS (Hadoop Distributed File System) to store and manipulate large data sets.
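To make the programming model more concrete, here is a minimal in-process sketch of the map, shuffle and reduce steps in plain Python. It does not use Hadoop's API; the function names, the log format and the sample lines are all hypothetical. In a real cluster, the framework would distribute the map and reduce calls across many machines.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def status_mapper(line):
    # Assumed log format: "<client_ip> <path> <status>"
    parts = line.split()
    if len(parts) == 3:
        yield parts[2], 1


def sum_reducer(key, values):
    return sum(values)


# Hypothetical sample data
LOG_LINES = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /login 401",
    "10.0.0.1 /login 401",
    "10.0.0.3 /index.html 200",
    "10.0.0.2 /admin 403",
]

counts = run_job(LOG_LINES, status_mapper, sum_reducer)
print(counts)
```

Because each mapper call only sees one record and each reducer call only sees one key, the work can be split across machines without shared state, which is what lets this model scale.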
Cybersecurity Projects Using Hadoop & MapReduce
Equipped with these new capabilities, security researchers identified various opportunities to apply new methods and technologies. The following list gives an overview of noteworthy cybersecurity projects using Big Data technology (especially Hadoop and MapReduce).
– DOFUR: DDoS forensics using MapReduce
A Distributed Denial of Service (DDoS) attack aims at making a system unavailable by flooding the target with a large number of requests. During these attacks, the volume of the produced log files grows rapidly. A forensic investigator will take a long time to analyze these files to find the source of the attack for containment and to reestablish system availability.
The DOFUR project proposes a technique that uses Hadoop and MapReduce to identify packets belonging to a DDoS attack, a task that would otherwise take a long time.
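The general idea of finding attack traffic with map and reduce steps can be sketched as counting packets per source address and time window, then flagging sources above a threshold. This is an assumed, simplified heuristic and not the algorithm published by the DOFUR authors; the record layout, `WINDOW_SECONDS` and the threshold are hypothetical.

```python
from collections import Counter

WINDOW_SECONDS = 60  # assumed window size


def map_packet(record):
    """Emit ((source_ip, window_start), 1) for one (timestamp, source_ip) record."""
    timestamp, source_ip = record
    window_start = int(timestamp) // WINDOW_SECONDS * WINDOW_SECONDS
    return (source_ip, window_start), 1


def reduce_counts(pairs):
    """Sum the emitted values per (source_ip, window_start) key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def flag_sources(records, threshold):
    """Return sources that exceed the threshold in at least one window."""
    totals = reduce_counts(map_packet(r) for r in records)
    return sorted({ip for (ip, _), count in totals.items() if count > threshold})


# Hypothetical traffic: one noisy source and one ordinary client
packets = [(t * 0.1, "203.0.113.9") for t in range(50)]
packets += [(1.0, "198.51.100.7"), (20.0, "198.51.100.7"), (70.0, "198.51.100.7")]

flagged = flag_sources(packets, threshold=10)
print(flagged)
```

A real attack with spoofed or widely distributed sources would need richer features than a per-source count; the sketch only shows how the work splits into independent map and reduce steps.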
– BotCloud: Detecting botnets using MapReduce
Modern botnets have moved from a centralized architecture (where communication between the attacker and the botnet computers goes through a central server) to a more decentralized one that uses peer-to-peer (P2P) networks and needs no central server to operate.
In order to deal with this evolution, cybersecurity analysis has to be moved from the edges of the network to its nodes, such as the Internet Service Provider (ISP). At such a hub, there is significantly more traffic that can be analyzed to effectively detect P2P botnets.
The BotCloud project leverages this insight by using Hadoop and MapReduce to correlate log files from many different ISPs to detect patterns and trace the communication path of a Botnet back to its participants.
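As a rough illustration of correlating records from several providers, the sketch below merges flow records and flags hosts that both contact and are contacted by several distinct peers, a pattern typical of P2P communication. This is an assumed heuristic for illustration only, not the BotCloud project's published method; the host names, record layout and `min_peers` value are hypothetical.

```python
from collections import defaultdict


def merge_flows(*isp_logs):
    """Combine (source, destination) flow records from several providers, dropping duplicates."""
    merged = set()
    for log in isp_logs:
        merged.update(log)
    return merged


def peer_sets(flows):
    """Build, per host, the set of distinct hosts it contacts and is contacted by."""
    outgoing = defaultdict(set)
    incoming = defaultdict(set)
    for src, dst in flows:
        outgoing[src].add(dst)
        incoming[dst].add(src)
    return outgoing, incoming


def p2p_candidates(flows, min_peers):
    """Return hosts with at least min_peers distinct peers in both directions."""
    outgoing, incoming = peer_sets(flows)
    hosts = set(outgoing) | set(incoming)
    return sorted(
        host
        for host in hosts
        if len(outgoing.get(host, ())) >= min_peers
        and len(incoming.get(host, ())) >= min_peers
    )


# Hypothetical flow records seen by two different providers
isp_a = [("host-a", "host-b"), ("host-a", "host-c"), ("host-b", "host-a")]
isp_b = [("host-c", "host-a"), ("host-b", "host-c"), ("host-c", "host-b"),
         ("host-d", "web-server")]

candidates = p2p_candidates(merge_flows(isp_a, isp_b), min_peers=2)
print(candidates)
```

Neither provider's log alone shows the full mesh between the three hosts; it only becomes visible once the records are merged, which is the point of analyzing traffic across providers.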
– APT detection frameworks
Ever since the first appearance of reports about “Stuxnet”, the malware that was created to sabotage the uranium enrichment infrastructure in Iran, the topic of Advanced Persistent Threats (APT) has become one of the most intensely debated in cybersecurity.
Traditional network defense systems like firewalls, IDS/IPS or SIEM solutions are rather ineffective in detecting APTs due to the latter’s high level of sophistication.
These traditional network defenses operate within a relatively short time frame for detecting an attack. In contrast, APTs try to remain undetected by using a “low and slow” approach when infiltrating and attacking a system.
There are several research projects that aim at establishing a framework for the detection of APTs with the help of Hadoop. Researchers propose gathering samples from as many sources as possible, with the final objective of being able to detect and react to any kind of security-related incident.
Considering that APT attacks can run for months or even years, detecting them without Big Data technologies would require a significant investment or might even be impossible.
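The difference between a short detection window and a long one can be shown with a small sketch. It assumes hypothetical events of the form (day, host), such as failed logins, and two illustrative thresholds; none of these values come from the research projects mentioned above.

```python
from collections import defaultdict


def alerts_per_day(events, daily_threshold):
    """Flag hosts that exceed the threshold within any single day (short window)."""
    per_day = defaultdict(int)
    for day, host in events:
        per_day[(day, host)] += 1
    return sorted({host for (_, host), n in per_day.items() if n > daily_threshold})


def alerts_over_period(events, total_threshold):
    """Flag hosts whose activity summed over the whole period exceeds the threshold."""
    totals = defaultdict(int)
    for _, host in events:
        totals[host] += 1
    return sorted(host for host, n in totals.items() if n > total_threshold)


# Hypothetical data: one host is active at a low rate for 90 days,
# another produces a single noisy burst on day 3.
events = [(day, "10.0.0.5") for day in range(90) for _ in range(2)]
events += [(3, "10.0.0.9")] * 20

short_window = alerts_per_day(events, daily_threshold=10)
long_window = alerts_over_period(events, total_threshold=100)
print(short_window, long_window)
```

The daily check only catches the noisy burst, while the slow host stays under the daily threshold every day and only shows up once the full 90 days are aggregated. Keeping and processing that much history is where the storage and processing capacity of Big Data tooling becomes relevant.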
A number of commercial products (e.g. Fortscale, LogRhythm, Blue Coat) pursue a similar approach by implementing Big Data security analytics to detect APTs. These products offer features such as:
- Real-time analytics of access to sensitive data
- Network anomaly detection
- Identifying compromised hosts
- Detection of malicious employee behavior
Big (Data) Opportunities in Cybersecurity?
The above-mentioned examples provide a first idea of the potential of Big Data analytics in cybersecurity services and should help professionals from all sectors to gain a better understanding of the challenges involved. As indicated, the use of Big Data security analytics offers great possibilities to create new services or enhance existing ones – multiplying the security and value created for clients.
 Big Data Analytics for Security Intelligence – https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Big_Data_Analytics_for_Security_Intelligence.pdf
 Towards a Framework to Detect Multi-stage Advanced Persistent Threats Attacks – http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6830935
 Security Analytics: Big Data Analytics for cybersecurity: A review of trends, techniques and tools – http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6725337
 BotCloud: Detecting botnets using MapReduce – http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6123125
 DOFUR: DDoS Forensics Using MapReduce – http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=613713