As more of the world moves online, web services and applications have become a critical part of nearly every aspect of modern life, and securing our web infrastructure matters more and more. One of the most important parts of that work is detecting web attacks, but the sheer quantity of noise from bots and other scanners can make the task exceedingly difficult.
In this post we’re going to look at using Gigasheet to identify two common kinds of web attack: directory busting and injection-based attacks. Our target system is a simple WordPress site, from which we’ve extracted the Apache logs. The logs were then saved as a CSV file and cleaned up slightly by deleting some empty columns.
A common way of detecting web attacks is to look for outliers in traffic volume. Grouping by IP address gives us a count of the total number of hits from each address. Of course, as this is all occurring within my home lab, these are private IP addresses, but the principle is the same. We can see hundreds of hits from two addresses, enough to pique our interest.
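For readers who want to reproduce the grouping step outside Gigasheet, here is a minimal Python sketch of the same idea. The column name `client_ip` and the sample rows are assumptions made for illustration; they are not taken from the actual log export.

```python
from collections import Counter

# Hypothetical rows standing in for the exported CSV. The "client_ip",
# "request" and "status" column names are assumptions for this example.
sample_rows = [
    {"client_ip": "192.168.1.50", "request": "GET /admin HTTP/1.1", "status": "404"},
    {"client_ip": "192.168.1.50", "request": "GET /backup HTTP/1.1", "status": "404"},
    {"client_ip": "192.168.1.50", "request": "GET /wp-login.php HTTP/1.1", "status": "200"},
    {"client_ip": "192.168.1.77", "request": "GET /?s=test HTTP/1.1", "status": "200"},
    {"client_ip": "192.168.1.77", "request": "GET /?s=hello HTTP/1.1", "status": "200"},
    {"client_ip": "192.168.1.20", "request": "GET / HTTP/1.1", "status": "200"},
]


def hits_per_ip(rows, ip_field="client_ip"):
    """Return (ip, hit_count) pairs, busiest address first."""
    return Counter(row[ip_field] for row in rows).most_common()


# The addresses at the top of this list are the volume outliers.
for ip, hits in hits_per_ip(sample_rows):
    print(ip, hits)
```

With a real export, the rows could come from `csv.DictReader` instead of a hard-coded list, as long as the field name matches the header in the file.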
Let’s focus on the first and biggest of the two. How do we determine if this traffic is malicious or benign? We can group on the response code, and we immediately see another outlier: the vast majority of the response codes are 404s. When we drill down we can see the pages that were requested, and it becomes clear that a directory busting attack has taken place: the client worked through a long list of guessed paths, most of which don’t exist on the server.
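The same drill-down can be sketched in Python: break one address's traffic down by response code, then list the paths behind the dominant code. The field names (`client_ip`, `path`, `status`) and the sample data are hypothetical.

```python
from collections import Counter

# Hypothetical rows; field names and values are assumptions for illustration.
sample_rows = [
    {"client_ip": "192.168.1.50", "path": "/admin", "status": "404"},
    {"client_ip": "192.168.1.50", "path": "/backup", "status": "404"},
    {"client_ip": "192.168.1.50", "path": "/.git", "status": "404"},
    {"client_ip": "192.168.1.50", "path": "/wp-login.php", "status": "200"},
    {"client_ip": "192.168.1.20", "path": "/", "status": "200"},
]


def status_breakdown(rows, ip):
    """Count response codes for the requests made by a single IP address."""
    return Counter(row["status"] for row in rows if row["client_ip"] == ip)


def paths_with_status(rows, ip, status="404"):
    """List the paths one IP requested that came back with the given status."""
    return [
        row["path"]
        for row in rows
        if row["client_ip"] == ip and row["status"] == status
    ]


suspect = "192.168.1.50"
print(status_breakdown(sample_rows, suspect))
print(paths_with_status(sample_rows, suspect))
```

A long run of 404s against paths such as `/admin` or `/backup` from a single address is the pattern that suggests directory busting; a handful of 404s on its own is usually just noise.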
A directory busting attack is relatively easy to detect by looking at the pages requested, but how do we evaluate data being sent to our web application to determine what is and isn’t malicious? Response codes alone don’t tell us much here: when we look at the IP address with the second highest number of hits, the majority of its responses are 200s. What we can do instead is split the request column to look at what data was sent. We take the column holding the GET request and apply the split column function, using the “?” character as our delimiter to separate the URI from the parameters being passed. We end up with two new columns: one with the URI and one with the parameters.
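As a rough Python equivalent of that split, `str.partition` separates a request URI at the first “?”. This is only a sketch: the sample URIs are made up, and Gigasheet's split column function may treat a value containing more than one “?” differently.

```python
def split_request(uri, delimiter="?"):
    """Split a request URI into (path, parameters) at the first delimiter.

    If the delimiter is absent, the parameters part is an empty string.
    """
    path, _, params = uri.partition(delimiter)
    return path, params


# Hypothetical request URIs for illustration.
sample_uris = [
    "/?s=test",
    "/wp-login.php",
    "/index.php?page_id=2&preview=true",
]

for uri in sample_uris:
    print(split_request(uri))
```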
As we start going through the parameters, we see many of them are seemingly benign. We can filter these out by including only values that start with “s=”, the search parameter. Now as we scroll through the data we can clearly see malicious input within the search field of the site.
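A small Python sketch of that filter follows. It keeps only parameter strings beginning with `s=` and URL-decodes them so the payloads are easier to read. The sample values are invented examples of the kind of input described above, not entries from the actual logs.

```python
from urllib.parse import unquote_plus

# Hypothetical parameter strings, as they might appear after the split step.
sample_params = [
    "page_id=2",
    "s=hello+world",
    "s=%3Cscript%3Ealert(1)%3C%2Fscript%3E",
    "s=%27+OR+1%3D1--",
    "",
]


def search_queries(params, prefix="s="):
    """Keep parameters that start with the prefix and URL-decode their values."""
    return [
        unquote_plus(param[len(prefix):])
        for param in params
        if param.startswith(prefix)
    ]


for query in search_queries(sample_params):
    print(repr(query))
```

Decoding is optional, but percent-encoded payloads are much easier to recognize as script tags or SQL fragments once they are readable.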
This is, in essence, the core of threat hunting for malicious web activity. Any web server will constantly be bombarded by traffic from bots and web scrapers. Detecting the outliers and determining what traffic represents a legitimate threat and what traffic is just noise is critical to threat hunting and incident response.
Let’s think about both of these examples. If we suspect that our website has been breached, and we can determine that one IP address was looking for directories it shouldn’t be accessing while a second was probing for cross-site scripting or SQL injection vulnerabilities, then we now have two prime suspects. We can start to comb through other logs for those specific IP addresses to determine what else they’ve done.
Gigasheet isn't a live monitoring tool for detecting attacks in real time, and not every business has a SIEM solution for live threat hunting. For incident response after the fact, though, these techniques can be used to determine where an attack came from and give you clues to work with for further investigation.