Buckle up, everyone! We are about to dig into the must-know statistics and trends of worldwide cyber attacks from 2004 through 2022. Together, we’ll explore a comprehensive collection of large-scale cyber attacks openly accessible at Wikipedia.
Let’s take this opportunity to practice with Gigasheet's easy-to-use data statistics and graphing features. If you're not already familiar with Gigasheet, it's a free online CSV viewer with several easy-to-use tools for data analysis, such as grouping and filtering. You can start by creating a free Gigasheet account, which I highly recommend!
Once you have signed up and logged in, you’ll see numerous File Upload and Data Connectors options on the dashboard. Here, you can upload data either directly from your local machine or remotely from cloud storage such as AWS and GCP buckets, in various file formats including CSV, LOG, JSON, TSV, XLSX, and ZIP/GZIP. You can then massage the data using the data filter and data type change features.
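For readers who want to see what this kind of cleanup amounts to outside the UI, here is a minimal sketch in plain Python. The sample rows and the column names (`entity`, `year`, `records`, `industry`) are assumptions made up for illustration; they are not the actual export, and this is not how Gigasheet works internally.

```python
import csv
import io

# Hypothetical sample rows; the column names are assumptions, not the real export.
SAMPLE_CSV = """entity,year,records,industry
Example Health,2015,"80,000,000",Healthcare
Example Social,2019,540000000,Social Media
Example Agency,2017,unknown,Government
"""

def parse_records(raw):
    """Convert a free-text record count to int, or None when it is not a number."""
    cleaned = raw.replace(",", "").strip()
    return int(cleaned) if cleaned.isdigit() else None

def load_rows(text):
    """Read CSV text and coerce the year and records columns to numeric types."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "entity": row["entity"],
            "year": int(row["year"]),
            "records": parse_records(row["records"]),
            "industry": row["industry"].strip().lower(),
        })
    return rows

rows = load_rows(SAMPLE_CSV)
# Filter out rows whose record count could not be parsed.
usable = [r for r in rows if r["records"] is not None]
print(len(rows), "rows loaded,", len(usable), "with a usable record count")
```

Rows with a non-numeric count are kept but flagged with `None`, so they still count as incidents even though they cannot contribute to record totals.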
The processed database reveals more than 300 cyber attacks affecting a total of more than 13 billion Personally Identifiable Information (PII) and Protected Health Information (PHI) records. Interestingly, some of us have been victimized more than once! Gigasheet conveniently displays the descriptive statistics (e.g., mean, range, count, etc.) on the fly.
On the fly descriptive statistics for quick analysis
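If you are curious what goes into those summary numbers, the sketch below computes the same kind of descriptive statistics with Python's standard library. The record counts are made-up values for illustration only, and the `describe` helper is a hypothetical name, not a Gigasheet function.

```python
import statistics

# Made-up record counts for illustration only; not values from the dataset.
records = [1_000_000, 5_000_000, 12_000_000, 150_000_000]

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }

summary = describe(records)
for name, value in summary.items():
    print(f"{name}: {value:,}")
```

Note how far the mean sits above the median here: a single very large incident pulls the average up, which is worth keeping in mind when reading breach statistics.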
In this blog, we are going to explore cyber attack statistics through 2022 by industry, frequency, and impact. Let's get started!
A couple of things to remember here. First of all, the dataset mostly contains cyber crimes in North America, because strict data breach disclosure regulations mean more incidents are reported there. Also, this content is generated by English-speaking Wikipedia communities, causing the trends to be relatively biased towards regional resources.
Having said that, let's start with the elephant in the room. Out of the significant cyber attacks between 2004 and 2022, which industries do you think have been the most vulnerable to cyber attacks?
The brief answer for the majority of cyber attacks, including the ones performed by script kiddies, is that every industry is at risk from the malicious acts of cyber criminals. Unlike Advanced Persistent Threat (APT) groups, most unsophisticated threat actors don’t have “brand loyalty”. They just exploit vulnerabilities in systems and attack wherever they find them, regardless of the industry or organization.
However, the histogram of the overall data grouped by industry type suggests healthcare, government (at all levels), and social media as the top 3 target industries for cyber attacks. These three industries account for roughly 32% of all major cyber attacks since 2004. I could have used a pie chart for this visualization instead. The settings tab is user-friendly and provides a wide range of chart types and formatting options.
Industry breakdown of major cyber attacks
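The grouping behind a chart like this is simple to reproduce. The following is a minimal sketch that counts incidents per industry and computes the combined share of the top 3. The labels are hypothetical and will not reproduce the 32% figure from the real data, and `top_share` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical industry labels, one per incident; not the real dataset.
industries = [
    "healthcare", "healthcare", "healthcare",
    "government", "government",
    "social media", "social media",
    "retail", "financial", "telecom",
]

def top_share(labels, n=3):
    """Return the n most common labels and their combined share of all rows."""
    counts = Counter(labels)
    top = counts.most_common(n)
    share = sum(count for _, count in top) / len(labels)
    return top, share

top, share = top_share(industries)
for name, count in top:
    print(f"{name}: {count}")
print(f"Top 3 share: {share:.0%}")
```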
We briefly reviewed the top targeted industries. Let’s now look at the frequency of cyber attacks. It has been reported that cyber attacks are expected to double every 2 to 3 years and cost trillions of dollars per year.
I regrouped the database by the year column to illustrate the historical trends of the impactful cyber attacks. The new histogram displays a non-uniform trend for the number of sophisticated cyber attacks, while the total number of cyber attacks including minor incidents is on the rise, following an exponential growth.
At a quick glance, we can observe that nearly 65% of all major cyber attacks took place during the last ten years. I don’t know how to interpret the 2017 data point, but the data points for the last two years coincide clearly with the era of the COVID-19 pandemic. It can be speculated that COVID-19 had a major impact on cybersecurity practices. This is beyond the scope of this blog, but you are more than welcome to review the cybersecurity resources for COVID-19 published by the Cybersecurity and Infrastructure Security Agency (CISA).
Yearly breakdown of significant cyber attacks
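The yearly regrouping and the "share in the last ten years" figure can be sketched the same way. The years below are hypothetical, so the printed share will not match the roughly 65% seen in the real data, and both helper names are illustrative.

```python
from collections import Counter

# Hypothetical incident years for illustration; not the real dataset.
years = [2005, 2008, 2011, 2013, 2013, 2015, 2017, 2017, 2019, 2019, 2020, 2021]

def yearly_counts(values):
    """Count incidents per year, sorted chronologically."""
    return dict(sorted(Counter(values).items()))

def share_since(values, first_year):
    """Fraction of incidents that happened in first_year or later."""
    recent = sum(1 for year in values if year >= first_year)
    return recent / len(values)

counts = yearly_counts(years)
for year, count in counts.items():
    print(year, "#" * count)  # crude text histogram
print(f"Share since 2013: {share_since(years, 2013):.0%}")
```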
So far, we have focused on the industry and yearly breakdowns of major cyber attacks, but one thing is for sure: these incidents do not necessarily have similar consequences and impacts. In other words, some of the incidents were “devastating” while others were “more devastating”.
Hmmm, why don’t we then categorize the cyber attacks by the Records Affected column to assess the impact of the incidents in a more systematic fashion? The total cases histogram to the rescue!
Breakdown of the records affected following major cyber attacks
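Summing the records affected per year is what turns the incident list into this kind of impact chart. Here is a minimal sketch under the assumption that each incident is a (year, records affected) pair and that unknown counts are stored as `None`. The numbers are invented and `records_by_year` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical (year, records affected) pairs; not values from the dataset.
incidents = [
    (2012, 20_000_000),
    (2013, 500_000_000),
    (2013, 70_000_000),
    (2019, 300_000_000),
    (2019, 90_000_000),
    (2021, None),  # record count unknown, so it is skipped
]

def records_by_year(rows):
    """Sum the records affected per year, ignoring rows with no count."""
    totals = defaultdict(int)
    for year, records in rows:
        if records is not None:
            totals[year] += records
    return dict(sorted(totals.items()))

totals = records_by_year(incidents)
# The two years with the largest totals are the "peaks" in the chart.
peak_years = sorted(totals, key=totals.get, reverse=True)[:2]
print(totals)
print("Largest totals:", peak_years)
```

A year with a single huge incident can dominate this view even when its incident count is unremarkable, which is why the chart looks so different from the yearly count histogram.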
Let’s investigate the distinct peaks in 2019 and 2013. In 2019, Facebook experienced a series of major data leaks. A Facebook app developer exposed 540 million users’ nonfinancial PII to the public. Facebook deleted the data and updated company policies to prohibit apps from storing data in public databases. In another incident, 309 million Facebook users’ nonfinancial PII was exposed by malicious actors who either exploited Facebook’s API or used web-scraping tools.
In 2013, Yahoo was in the headlines. Sensitive and nonfinancial PII of almost all Yahoo! users (around 3 billion accounts), including names, email addresses, telephone numbers, dates of birth, hashed passwords, and, in some cases, encrypted and unencrypted security questions and answers, was compromised.
Roughly speaking, 3.7 billion and 3.4 billion PII and PHI records were leaked in 2019 and 2013, respectively. These giant spikes were also due to the ‘domino effect’ of the so-called password reuse attack.
Basically, a major data breach enables malicious actors to hack victims’ other accounts, if the same credentials are reused. Raise your hand if you reuse your passwords!
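To make the domino effect concrete, here is a small sketch that checks which of one person's other accounts would be exposed if a single service were breached. The service names and passwords are made up, and `exposed_by_breach` is a hypothetical helper for illustration, not a real tool.

```python
from collections import defaultdict

# Hypothetical credential list for one person; service names are made up.
accounts = [
    ("mail.example", "correct-horse"),
    ("shop.example", "correct-horse"),
    ("bank.example", "unique-passphrase-1"),
    ("forum.example", "correct-horse"),
]

def exposed_by_breach(creds, breached_service):
    """List the other services that share a password with the breached one."""
    by_password = defaultdict(list)
    for service, password in creds:
        by_password[password].append(service)
    for services in by_password.values():
        if breached_service in services:
            return [s for s in services if s != breached_service]
    return []

print(exposed_by_breach(accounts, "shop.example"))
print(exposed_by_breach(accounts, "bank.example"))
```

In this made-up example, a breach at the shop also opens the mail and forum accounts, while the bank account with its unique password exposes nothing else.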
In this blog, we first obtained an extensive database covering major cyber attacks from 2004 to 2022. The dataset has been compiled by Wikipedia communities from reliable sources.
Next, we cleaned up the data using Gigasheet’s data processing features. Finally, we put together histograms illustrating the industry and yearly distributions as well as the historical breakdown of compromised PII and PHI data following impactful cyber attacks.
I hope you agree with me that Gigasheet made it very easy to generate descriptive statistics and histograms and to quantitatively describe the basic features of the data. I highly recommend you give Gigasheet a try today. Please remember, it is Free Forever!