In her blog, Luciana demonstrated a large PCAP file analysis with Gigasheet. The PCAP (an abbreviation of packet capture) file was originally from the Stratosphere Lab and contained network traffic associated with malware. In this blog, we’ll employ the Gigasheet API to process the same large PCAP data and integrate it with the VirusTotal app and an email alert utility to develop a SOAR (Security Orchestration, Automation and Response) model. We are aiming to kill two birds with one stone here by showing how convenient the Gigasheet API is for building programmatic interactions while putting together a fully automated workflow in a SOAR platform.
First things first: we need to load the data, so let’s convert the PCAP file to CSV and upload it to Gigasheet. If you're not already familiar with Gigasheet, it's a free online CSV viewer with several easy-to-use tools for data analysis, such as grouping and filtering. Creating a free Gigasheet account is a good place to start, and highly recommended!
Once the PCAP file is uploaded, Gigasheet extracts standard fields including the Packet Number, Timestamp, Delta, Source IP Address, Destination IP Address, Protocol, Packet Length, Server name, Cnamestring, and Information. Please visit the tutorial series offered by Palo Alto Networks Unit 42 if you need some tips on how to gather packet captures using Wireshark.
Gigasheet enables you to work with huge datasets without crashing or hitting row limits. It takes only a few seconds to upload the PCAP file, which contains over 257,000 rows of data. The figure below shows only the first few rows of the PCAP content, but no worries: Gigasheet allows files to be shared publicly, so feel free to explore the rest of the PCAP data.
Screenshot of the PCAP data
The figure below shows all of the data grouped by the Source IP and Protocol fields. Grouping the PCAP data lets us quickly see that the majority of the traffic took place between two IP addresses, 10.0.2.107 and 10.0.0.2. Both are private addresses in the 10.0.0.0/8 range (not loopback addresses, which fall in 127.0.0.0/8). NBNS is a name resolution protocol similar to DNS and is used mainly by legacy systems.
Grouping of the PCAP data with respect to the Source IP and Protocol
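For readers who want to see what this grouping amounts to outside the UI, here is a minimal local sketch in Python. It does not call Gigasheet; the rows are made-up examples, and only the column names (Source IP Address, Protocol) come from the fields listed earlier.

```python
from collections import Counter

# Made-up rows for illustration; the real file has over 257,000 of them.
packets = [
    {"Source IP Address": "10.0.2.107", "Protocol": "DNS"},
    {"Source IP Address": "10.0.2.107", "Protocol": "DNS"},
    {"Source IP Address": "10.0.0.2", "Protocol": "DNS"},
    {"Source IP Address": "10.0.2.107", "Protocol": "NBNS"},
]


def group_counts(rows, keys):
    """Count rows for each combination of the given column names."""
    return Counter(tuple(row[key] for key in keys) for row in rows)


groups = group_counts(packets, ["Source IP Address", "Protocol"])

# Print the groups from largest to smallest, as a grouped view would.
for (source_ip, protocol), count in groups.most_common():
    print(source_ip, protocol, count)
```

Sorting the groups by size is what makes the dominant source and protocol pair stand out at a glance.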
Now it is time for the fun stuff: the Gigasheet API! Gigasheet Enterprise users can leverage APIs to scale beyond spreadsheets and automate workflows. Today, we’ll focus only on the data processing API endpoints. Everything you can do in the Gigasheet UI, you can do with the Gigasheet data processing API. This includes functions such as sheet and column statistics, arithmetic operations, column splitting, and cross-file lookup. The full list of Gigasheet’s data processing API endpoints is summarized in the figure below. In this blog, we’ll employ the “column statistics” and “split column” API endpoints. Please note that the Gigasheet API is in beta testing as of this writing, and the development team plans to roll it out across the entire platform shortly.
Gigasheet data processing APIs
We can start with the “column statistics” API endpoint to examine the content of the Information field. The figure below reveals some interesting statistics about the malicious DNS traffic. Here, we observe 153 occurrences of randomly generated seven-letter subdomains for the same domain, in the "xxxxxxx.info" format.
The Gigasheet data processing API for column statistics
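To make the pattern concrete, the following is a small local sketch of the same kind of count in Python. It is not the Gigasheet endpoint: the sample Information values are made up in the style of Wireshark's DNS summaries, and the regular expression is an assumption about what "seven letters followed by .info" looks like in the data.

```python
import re
from collections import Counter

# Made-up Information values, for illustration only.
info_values = [
    "Standard query 0x1a2b A abcdefg.info",
    "Standard query 0x1a2c A qwertyu.info",
    "Standard query 0x1a2d A abcdefg.info",
    "Standard query 0x1a2e A example.com",
]

# Assumed pattern: exactly seven lowercase letters followed by ".info".
PATTERN = re.compile(r"\b[a-z]{7}\.info\b")


def count_info_domains(values):
    """Count every seven-letter .info name found in the given strings."""
    counts = Counter()
    for value in values:
        counts.update(PATTERN.findall(value))
    return counts


domain_counts = count_info_domains(info_values)
total_matches = sum(domain_counts.values())
```

On the real sheet, a total like `total_matches` is the figure to compare against the count reported by the column statistics.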
Let’s now split the Information field using the “split column” API so that all the pieces are ready for the VirusTotal integration in the SOAR platform. The figure below illustrates the parameters we need to provide.
The Gigasheet data processing API for splitting a column
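Conceptually, splitting a column turns one delimited string into several new columns. The sketch below shows that idea locally in Python. The function name, the space delimiter, and the numbered output column names are all hypothetical choices for illustration and are not taken from the Gigasheet API.

```python
def split_column(rows, column, delimiter=" "):
    """Return copies of the rows with `column` split into numbered columns."""
    result = []
    for row in rows:
        new_row = dict(row)  # keep the original columns untouched
        parts = row[column].split(delimiter)
        for index, part in enumerate(parts, start=1):
            # Hypothetical naming scheme: "<column> 1", "<column> 2", ...
            new_row[f"{column} {index}"] = part
        result.append(new_row)
    return result


# Made-up row in the style of a Wireshark DNS summary.
rows = [{"Protocol": "DNS", "Information": "Standard query 0x1a2b A abcdefg.info"}]
split_rows = split_column(rows, "Information")

# With a space delimiter, the queried name ends up in the last new column.
queried_name = split_rows[0]["Information 5"]
```

Once the queried name sits in its own column, it can be passed to later workflow steps without any further string parsing.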
Shuffle is an open source SOAR platform with a user-friendly interface. It allows users to develop workflows and pass values between nodes/apps programmatically. As illustrated in the figure below, we first process the PCAP data to identify a list of suspicious subdomains by making the aforementioned Gigasheet API calls.
This start node essentially connects the PCAP data to the Shuffle SOAR platform through a curl command that makes the network requests. The second step in our workflow is the filter list action, which separates the suspicious subdomains ending in “.info” from the rest of the data and feeds them into the VirusTotal app for domain report analysis. In other words, we are adding the “get a domain report” action to our workflow.
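The filtering logic in this step is simple enough to sketch in a few lines of Python. This is not Shuffle's filter list action itself, only an illustration of the rule it applies. The normalization and de-duplication are assumptions added here so that each domain would be looked up only once.

```python
def filter_info_domains(values):
    """Keep unique names ending in ".info", preserving first-seen order."""
    seen = set()
    result = []
    for value in values:
        # Normalize case and drop a trailing dot from fully qualified names.
        domain = value.strip().lower().rstrip(".")
        if domain.endswith(".info") and domain not in seen:
            seen.add(domain)
            result.append(domain)
    return result


# Made-up input values for illustration.
candidates = ["abcdefg.info", "example.com", "ABCDEFG.info.", "qwertyu.info"]
suspicious = filter_info_domains(candidates)
```

Removing duplicates before the lookup step keeps the number of VirusTotal requests down, which matters when the same name appears many times in the capture.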
In the third step, we use the “last analysis stats” attribute in VirusTotal’s domain report output to decide whether the activity is malicious. Here, a subdomain is flagged if at least one security vendor reported it as malicious. As the final step, we create an email alert to inform the security team about the malicious activity. The email content includes the malicious domain, the number of security vendors that flagged it as malicious, and the category of the domain content based on the Comodo Valkyrie Verdict.
We conclude the SOAR model with the email alert step in this demo, but we could also add an SMS alerting step and/or integrate the workflow with a security incident response platform such as TheHive.
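The decision rule and the alert content can be sketched in Python as follows. The sketch assumes the report has already been fetched and parsed into a dictionary shaped like a VirusTotal API v3 domain report, with `last_analysis_stats` and `categories` under `data.attributes`. The sample values, the helper names, and the email addresses are made up for illustration, and nothing is actually sent.

```python
from email.message import EmailMessage


def is_malicious(report, threshold=1):
    """Flag the domain if at least `threshold` vendors call it malicious."""
    stats = report["data"]["attributes"].get("last_analysis_stats", {})
    return stats.get("malicious", 0) >= threshold


def build_alert(domain, report, sender, recipient):
    """Build (but do not send) the alert email described in the text."""
    attributes = report["data"]["attributes"]
    malicious = attributes.get("last_analysis_stats", {}).get("malicious", 0)
    category = attributes.get("categories", {}).get(
        "Comodo Valkyrie Verdict", "unknown"
    )
    message = EmailMessage()
    message["Subject"] = f"Malicious domain detected: {domain}"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(
        f"Domain: {domain}\n"
        f"Vendors flagging it as malicious: {malicious}\n"
        f"Category (Comodo Valkyrie Verdict): {category}\n"
    )
    return message


# Made-up report values for illustration only.
sample_report = {
    "data": {
        "attributes": {
            "last_analysis_stats": {"harmless": 60, "malicious": 3, "undetected": 20},
            "categories": {"Comodo Valkyrie Verdict": "media sharing"},
        }
    }
}

alert = None
if is_malicious(sample_report):
    alert = build_alert(
        "abcdefg.info", sample_report, "soar@example.com", "secteam@example.com"
    )
```

Keeping the threshold as a parameter makes it easy to require agreement from more vendors if a single detection turns out to be too noisy.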
Gigasheet and VirusTotal integration in Shuffle SOAR platform
Well, congrats: we did a great job creating a fully automated SOAR model! We could have optimized the workflow by using Gigasheet's "filter data" APIs to replace Step 2 (filtering inside the SOAR platform), but I wanted to stay focused on Gigasheet's data processing APIs this time.
The Gigasheet API provides great flexibility to parse domains, email addresses, IP addresses, and file hashes from PCAPs, JSON dumps, and forensic artifacts with hundreds of millions of rows or billions of data points. The SOAR model developed in this blog is a good example of how seamlessly the Gigasheet API connects with other SaaS platforms.