In our previous blog on large PCAP analysis, we demonstrated how Gigasheet's data analysis and enrichment capabilities can help security practitioners analyze large packet capture (PCAP) files quickly and painlessly, without learning difficult programming languages or complicated command-line syntax. While PCAP files are an excellent source of data for cyber incident analysis and root cause identification, their storage and processing requirements can prove cost-prohibitive. According to FireEye's M-Trends 2020 Report, the median dwell time for incidents that organizations detect internally stands at thirty days. Consequently, to get the highest return on their network traffic capture investments, organizations need to store at least thirty days' worth of PCAPs to help establish the complete attack timeline, from initial access to actions on objectives. The cost of storage has decreased significantly in recent years with the advent of cloud computing and cloud object storage, but having enough storage to avoid overwriting files is not the only challenge: we still need a processing layer powerful enough to parse multiple large PCAP files so that security practitioners can filter, sort, extract, and analyze the relevant information.
To address this challenge, some organizations rely on data reduction tools designed to extract useful metadata from raw PCAPs for network forensics. One such tool is Zeek (formerly known as Bro), which ingests PCAPs and generates a series of compact, tab-delimited files containing connection logs, file content, DNS information, and other metadata useful for analysis. While the power of Zeek’s network analysis capabilities is undeniable, visualizing, filtering, interpreting, and aggregating Zeek’s output with traditional command-line tools, Microsoft Excel, or similar spreadsheet applications can be difficult for security practitioners, particularly when hundreds of millions of rows of data need to be analyzed.
In this blog, we will show you how to use Gigasheet to conduct forensic analysis of a large set of Zeek log files generated from a 7.9 GB PCAP, all in a few easy, reproducible steps. If you are not familiar with Zeek, I encourage you to visit https://zeek.org to learn more about the tool. All the files used in this analysis are available to the public at https://mcfp.felk.cvut.cz/publicDatasets, courtesy of the Stratosphere Research Laboratory.
Step 1: Upload Zeek Logs to Gigasheet
The first step in the process is simply to generate the Zeek logs from a PCAP and upload them to Gigasheet. The Stratosphere Research Laboratory already provides Zeek logs of some of their PCAP files which we used in our analysis, but if you are interested in generating the logs yourself, you will first need to install Zeek on your machine and then use the following syntax to perform all the default analysis on an already captured PCAP file:
$ zeek -C -r mypcap.pcap
# -r reads the pcap file named right after it; -C ignores invalid IP checksums
The command illustrated above will generate various ‘.log’ files, which you can upload directly into Gigasheet. The connection log (conn.log) is one of the most important files that Zeek generates, as it tracks the state and state history of all the connections within a PCAP, including connections that use stateful (TCP) and stateless (e.g., UDP, ICMP) protocols.
In our demonstration, the connection log file (conn.log) is 7.9 GB, containing 73.5 million rows.
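If you want to inspect a Zeek log outside of Gigasheet before uploading it, the following is a minimal Python sketch of a reader for Zeek's default tab-delimited format. It assumes the default layout, in which metadata lines start with '#' and a '#fields' line names the columns. The helper name read_zeek_log and the two sample rows are made up for illustration and are not taken from the dataset.

```python
import io

def read_zeek_log(stream):
    """Yield one dict per data row of a Zeek TSV log, keyed by the #fields header."""
    fields = []
    for line in stream:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # Header line looks like: "#fields<TAB>ts<TAB>uid<TAB>..."
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            # Every other non-comment line is a data row.
            yield dict(zip(fields, line.split("\t")))

# Invented two-row sample shaped like a conn.log (only a few columns shown).
SAMPLE = (
    "#separator \\x09\n"
    "#fields\tts\tuid\tid.orig_h\tid.resp_p\thistory\n"
    "1547065514.852869\tCabc1\t192.168.1.194\t6667\tS\n"
    "1547065515.100000\tCabc2\t192.168.1.194\t22\tShAD\n"
    "#close\t2019-01-10-00-00-00\n"
)

rows = list(read_zeek_log(io.StringIO(SAMPLE)))
print(len(rows), rows[0]["history"])
```

For a real file, you would pass an open file handle instead of the in-memory sample. Because the reader is a generator, it does not need to hold a multi-gigabyte conn.log in memory.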
Step 2: Declutter the Screen
We will start our network forensics analysis by focusing on the conn.log file. To make it easier to view all the data on a single screen, we can declutter the view by hiding columns that are irrelevant or contain no data at all. You can hide as many columns as you want by simply unchecking the column name in the Column management panel.
Step 3: Time Conversion and Standardization
Next, we need to use a standard, human-readable time format and sort the data in chronological order so that we can establish an accurate attack timeline. Zeek prepends a Unix-format timestamp to each log entry, which is recorded in the ‘ts’ field (e.g., 1547065514.852869). Gigasheet can automatically convert Unix time to Coordinated Universal Time (UTC), U.S. (US), and European Union (EU) time zones in just two clicks. In our demonstration, we converted the ‘ts’ column in the Zeek conn.log file to UTC.
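For readers who want to sanity-check a converted value by hand, here is a small Python sketch of the same Unix-to-UTC conversion using only the standard library. The function name zeek_ts_to_utc is our own, and this is not how Gigasheet performs the conversion internally.

```python
from datetime import datetime, timezone

def zeek_ts_to_utc(ts):
    """Convert a Zeek 'ts' value (Unix epoch seconds, as text or float) to an aware UTC datetime."""
    return datetime.fromtimestamp(float(ts), tz=timezone.utc)

# The example timestamp quoted in the text above.
stamp = zeek_ts_to_utc("1547065514.852869")
print(stamp.strftime("%Y-%m-%d %H:%M:%S UTC"))  # 2019-01-09 20:25:14 UTC
```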
Step 4: Discover Anomalous or Potentially Malicious Data Patterns
The next step in the process is to try to identify anomalous or potentially malicious traffic patterns. There are many paths you can take from here, but I want us to focus on the ‘history’ column and group the connections by their unique connection state history values. The ‘history’ field contains a string of letters (e.g., ShwAadDfr) that represents the state history of a connection. An uppercase letter indicates that the event came from the connection originator, and a lowercase letter indicates that it came from the responder. For instance, a history value of ‘ShAD’ indicates that the originator sent a TCP SYN (S), the responder replied with a TCP SYN-ACK (h), and the originator followed with a TCP ACK (A) and a packet with payload (D).
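To make the letter convention concrete, the sketch below decodes a history string into (sender, event) pairs. It covers only a subset of the letters Zeek defines; the lookup table and the function name decode_history are our own simplification, so consult the Zeek documentation for the complete list.

```python
# Subset of conn.log history letters, keyed by lowercase form.
HISTORY_EVENTS = {
    "s": "SYN",
    "h": "SYN-ACK",
    "a": "ACK",
    "d": "payload",
    "f": "FIN",
    "r": "RST",
}

def decode_history(history):
    """Translate a history string such as 'ShAD' into (sender, event) pairs."""
    decoded = []
    for letter in history:
        if not letter.isalpha():
            # Symbols such as '^' are not originator/responder events.
            decoded.append(("n/a", "other (" + letter + ")"))
            continue
        sender = "originator" if letter.isupper() else "responder"
        event = HISTORY_EVENTS.get(letter.lower(), "other (" + letter + ")")
        decoded.append((sender, event))
    return decoded

for sender, event in decode_history("ShAD"):
    print(sender, event)
```

Letters that are missing from the table are reported as "other" rather than guessed.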
There are 82 unique connection state history values. Interestingly enough, ~99% of the connections have a connection state history value of ‘S’, indicating that the originator sent an initial TCP SYN packet but did not receive a response back from the responder. This represents a traffic pattern that warrants further investigation.
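The grouping itself is a simple count per distinct value. A rough Python equivalent, using collections.Counter on an invented list of history values (not the real 73.5 million rows), might look like this:

```python
from collections import Counter

def history_shares(histories):
    """Return (value, count, share of total) tuples, most common first."""
    counts = Counter(histories)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]

# Invented sample: 198 SYN-only connections and 2 that progressed further.
sample = ["S"] * 198 + ["ShAD", "ShADadFf"]

for value, n, share in history_shares(sample):
    print(f"{value:10s} {n:5d} {share:7.2%}")
```

A single value that dominates the distribution, as ‘S’ does here, is the kind of pattern the grouping is meant to surface.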
Next, we can identify the number of unique source hosts involved in this traffic pattern by simply re-grouping the connections by the values in the ‘ID.ORIG_H’ or origin host column:
The results provide evidence that a single internal host (192.168.1.194) attempted to establish ~99% of the connections, which could be indicative of a malware infection. Grouping the connections further by the values in the SERVICE column reveals that the host in question attempted to establish connections to multiple external destination IP addresses over TCP ports 22 and 6667.
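The same drill-down can be sketched in Python: keep only the SYN-only connections, then count them per origin host and per destination port. The column names id.orig_h and id.resp_p follow Zeek's conn.log naming. The function name and the sample rows are invented, and the destination addresses come from a documentation-only IP range.

```python
from collections import Counter

def syn_only_breakdown(rows):
    """Count SYN-only connections by origin host and by destination port."""
    by_host = Counter()
    by_port = Counter()
    for row in rows:
        if row["history"] == "S":
            by_host[row["id.orig_h"]] += 1
            by_port[row["id.resp_p"]] += 1
    return by_host, by_port

# Invented rows; 203.0.113.0/24 is reserved for documentation.
sample_rows = [
    {"id.orig_h": "192.168.1.194", "id.resp_h": "203.0.113.10", "id.resp_p": "22", "history": "S"},
    {"id.orig_h": "192.168.1.194", "id.resp_h": "203.0.113.11", "id.resp_p": "22", "history": "S"},
    {"id.orig_h": "192.168.1.194", "id.resp_h": "203.0.113.12", "id.resp_p": "6667", "history": "S"},
    {"id.orig_h": "192.168.1.50", "id.resp_h": "203.0.113.13", "id.resp_p": "443", "history": "ShAD"},
]

by_host, by_port = syn_only_breakdown(sample_rows)
print(by_host.most_common(1))
print(sorted(by_port.items()))
```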
By convention, TCP port 6667 is the default port for plaintext Internet Relay Chat (IRC); RFC 7194 designates port 6697 for IRC via TLS/SSL. IRC traffic of this kind is often associated with an IRC botnet, a collection of systems infected with malware and controlled remotely via an IRC channel.
Next, we can try to establish the initial time of access or system compromise by ungrouping all the columns, sorting the values in the ‘UTC0’ column in ascending order, and identifying the very first IRC connection:
The first IRC connection was established on January 9, 2019 at 20:25:14 UTC. Unfortunately, there are no connections involving the potentially compromised host prior to this date. It is possible that the network packet capture was started after the malware was installed on the host.
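As a rough cross-check of the sort-and-scan approach, the sketch below finds the earliest row for a given destination port and prints its time in UTC. The function name and sample rows are invented; only the first timestamp is the example value quoted earlier in this post.

```python
from datetime import datetime, timezone

def first_connection_to_port(rows, port):
    """Return the earliest row whose destination port matches, or None if there is none."""
    matches = [row for row in rows if row["id.resp_p"] == str(port)]
    if not matches:
        return None
    return min(matches, key=lambda row: float(row["ts"]))

# Invented rows, deliberately out of chronological order.
sample_rows = [
    {"ts": "1547070000.000000", "id.orig_h": "192.168.1.194", "id.resp_p": "6667"},
    {"ts": "1547065514.852869", "id.orig_h": "192.168.1.194", "id.resp_p": "6667"},
    {"ts": "1547065000.000000", "id.orig_h": "192.168.1.194", "id.resp_p": "22"},
]

first = first_connection_to_port(sample_rows, 6667)
when = datetime.fromtimestamp(float(first["ts"]), tz=timezone.utc)
print(when.strftime("%Y-%m-%d %H:%M:%S UTC"))  # 2019-01-09 20:25:14 UTC
```

Comparing timestamps as numbers rather than as text avoids ordering mistakes when values have different numbers of digits.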
The Zeek logs obtained from the Stratosphere Research Laboratory included an irc.log file. The Zeek irc.log records the contents of an IRC session, including details about IRC servers, channels, and users.
By grouping the connections by the values in the ‘command’ column, we can identify all the IRC channels that the host joined:
We can also see all the usernames and nicknames that were used to connect to the different IRC channels:
In this illustration, host 192.168.1.194 used the nickname ‘a88aa6c2c’ and joined the #biret IRC channel as user ‘user’.
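A comparable summary can be sketched in Python from irc.log rows. This assumes the Zeek irc.log columns nick, user, command, and value, that a JOIN row carries the channel name in value, and that unset fields are written as "-". The rows below are invented to mirror the values described above and are not copied from the actual log.

```python
def irc_summary(rows):
    """Return (channels joined, (nick, user) pairs seen) from irc.log-style rows."""
    channels = set()
    identities = set()
    for row in rows:
        if row["command"] == "JOIN":
            # Assumption: for JOIN, the 'value' column holds the channel name.
            channels.add(row["value"])
        if row["nick"] != "-" and row["user"] != "-":
            identities.add((row["nick"], row["user"]))
    return sorted(channels), sorted(identities)

# Invented rows shaped like irc.log entries.
sample_rows = [
    {"id.orig_h": "192.168.1.194", "nick": "-", "user": "-", "command": "NICK", "value": "a88aa6c2c"},
    {"id.orig_h": "192.168.1.194", "nick": "a88aa6c2c", "user": "-", "command": "USER", "value": "user"},
    {"id.orig_h": "192.168.1.194", "nick": "a88aa6c2c", "user": "user", "command": "JOIN", "value": "#biret"},
]

channels, identities = irc_summary(sample_rows)
print(channels, identities)
```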
Another interesting Zeek log is the dns.log file which captures details about domain name resolution activity. The dns.log file reveals that host 192.168.1.194 sent several DNS queries to its default gateway (192.168.1.1) for various undernet.org subdomains.
Undernet.org is associated with the Undernet IRC Network, and the undernet.org subdomains queried by the host resolve to various IRC command and control servers.
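Filtering dns.log for a domain and its subdomains is easy to get subtly wrong, because a plain substring match would also catch unrelated names. The sketch below matches on the exact domain or a dot-prefixed suffix instead. It assumes the Zeek dns.log columns id.orig_h and query; the function name and the hostnames in the sample are hypothetical placeholders, not the names the host actually queried.

```python
def queries_under(rows, domain, client=None):
    """Return the sorted, distinct queries for a domain or any of its subdomains."""
    domain = domain.lower()
    suffix = "." + domain
    hits = set()
    for row in rows:
        if client is not None and row["id.orig_h"] != client:
            continue
        query = row["query"].lower().rstrip(".")
        # Exact match or true subdomain; "notundernet.org" must not match.
        if query == domain or query.endswith(suffix):
            hits.add(query)
    return sorted(hits)

# Hypothetical rows shaped like dns.log entries.
sample_rows = [
    {"id.orig_h": "192.168.1.194", "id.resp_h": "192.168.1.1", "query": "placeholder-a.undernet.org"},
    {"id.orig_h": "192.168.1.194", "id.resp_h": "192.168.1.1", "query": "Placeholder-B.Undernet.org"},
    {"id.orig_h": "192.168.1.194", "id.resp_h": "192.168.1.1", "query": "notundernet.org"},
    {"id.orig_h": "192.168.1.50", "id.resp_h": "192.168.1.1", "query": "example.com"},
]

print(queries_under(sample_rows, "undernet.org", client="192.168.1.194"))
```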
A Google search for “IRC #biret channel” suggests that the IRCBot malware published on Pastebin in 2018 may have been used to infect the host and establish command and control.