Data Analysis: Facts About Soccer

Every four years, a flurry of activity and joyful fanaticism descends upon a designated host nation to witness what is arguably the biggest sporting event in the world. Yes, we’re talking about the FIFA World Cup! And with it in full swing, what better opportunity to showcase the power of Gigasheet than such a world-renowned competition?

As with any mass spectator sport, numbers rule; that is, data and more data. From player and match analyses to the number of committed fans filling entire stadiums with unrelenting allegiance to their favorite team, football, a.k.a. soccer, naturally invites plenty of statistical inference and forecasting.

However, for all its gilded touches, football is also beset by the sort of predictability that comes with monetization. For example, in a recently published paper by The Royal Society, the four major European leagues (England, Germany, Portugal and Spain) exhibited a clear evolution towards inequality based on financial indicators, a gradual process that the study describes as the “gentrification” of football.

But, on to the fun part: the paper also highlights the importance of home advantage, and quantifies it using match scores for several leagues across the world (from 2000 through 2016). Let’s use Gigasheet to get some clarity and confirm whether home teams still exhibit an advantage despite any other biases.

Exploring the Football (Soccer) Data With Gigasheet

Loading and parsing datasets in Gigasheet takes mere seconds. The tool can handle billions of records at a time, so roughly 200K rows is a walk in the park. Soon enough, our table looks like this:

Soccer Data loaded into Gigasheet


Data in this table are laid out as follows:

  • Season (SEA).
  • League (LGE).
  • Match date (DATE).
  • Home team (HT).
  • Away team (AT).
  • Home team score (HS).
  • Away team score (AS).
  • Goals difference (GD = HS - AS).
  • A single variable indicating whether the final score ended in a win, draw, or loss (WDL) for the home team.

For instance, in the table above we can observe a match that took place on December 31st, 2016, between Chelsea and Stoke City of the English Premier League (ENG1), which ended in a win for the home team, 4 to 2.
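The two derived columns, GD and WDL, follow directly from the scores. Here is a minimal sketch in plain Python, assuming WDL holds the single-letter codes "W", "D" and "L" (the function names are hypothetical and are not part of Gigasheet):

```python
def goal_difference(home_score: int, away_score: int) -> int:
    """GD as defined in the column list: home score minus away score."""
    return home_score - away_score


def home_result(home_score: int, away_score: int) -> str:
    """Return the result from the home team's point of view.

    The codes "W", "D" and "L" are an assumption about the WDL column.
    """
    gd = goal_difference(home_score, away_score)
    if gd > 0:
        return "W"
    if gd < 0:
        return "L"
    return "D"


# Chelsea 4, Stoke City 2: the match described above.
print(goal_difference(4, 2), home_result(4, 2))  # prints: 2 W
```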

Now, let’s start drawing some conclusions from our data, beginning with a basic one: scoreless matches. For that, we create a filter, setting both HS and AS to zero:

Using Gigasheet to Filter Soccer Data
Soccer Data Analysis Results
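For readers who prefer code, the same filter can be approximated in plain Python. This is only a sketch on made-up rows: apart from the Chelsea match, the team names and the ENG2 league code are hypothetical.

```python
# Hypothetical sample rows using the table's column names.
# Only the Chelsea match comes from the article; the rest are invented.
matches = [
    {"LGE": "ENG1", "HT": "Chelsea", "AT": "Stoke City", "HS": 4, "AS": 2},
    {"LGE": "ENG1", "HT": "Team A", "AT": "Team B", "HS": 0, "AS": 0},
    {"LGE": "ENG2", "HT": "Team C", "AT": "Team D", "HS": 0, "AS": 1},
    {"LGE": "ENG2", "HT": "Team E", "AT": "Team F", "HS": 0, "AS": 0},
]

# Keep only rows where both the home and away scores are zero.
scoreless = [m for m in matches if m["HS"] == 0 and m["AS"] == 0]

print(len(scoreless))  # prints: 2
```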

The filter leaves 18,707 matches that ended in a scoreless draw. From there, we turn to the Group feature (on the LGE column) to see how these matches are distributed across individual leagues:

Using Gigasheet to Group Soccer Stats Data

Immediately, we can see the five leagues with the most matches ending in a 0-0 draw, along with the number of matches per league that meet this criterion:

The Top 5 Soccer Leagues with matches ending in a draw

From the image above, we can see that 2nd and 4th division leagues have the highest propensity for scoreless matches.
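Grouping by league amounts to counting rows per LGE value. A rough equivalent with `collections.Counter`, again on invented rows (the ENG2 and ENG4 codes are assumptions made for illustration):

```python
from collections import Counter

# Hypothetical (league, home score, away score) rows, not the real data.
matches = [
    ("ENG1", 0, 0),
    ("ENG2", 0, 0),
    ("ENG2", 0, 0),
    ("ENG2", 2, 1),
    ("ENG4", 0, 0),
    ("ENG4", 0, 0),
    ("ENG4", 0, 0),
    ("ENG1", 1, 1),
]

# Count scoreless draws per league, then list the leagues with the most.
scoreless_by_league = Counter(
    lge for lge, hs, as_ in matches if hs == 0 and as_ == 0
)
top_five = scoreless_by_league.most_common(5)

print(top_five)  # prints: [('ENG4', 3), ('ENG2', 2), ('ENG1', 1)]
```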

Score Differentials in Soccer

Next, let’s see if we can determine what the most frequent score, or match result, is. This time, we’ll begin by combining home and away scores into a single (additional) column named SC (score) using the hyphen character as a separator.

Combine Columns easily using Gigasheet

The new column named "SC":

The new Combined Column of soccer data is inserted into the sheet
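Conceptually, combining the two columns is a string join per row. A small sketch on hypothetical rows (this mirrors the result of the step, not how Gigasheet implements it):

```python
# Hypothetical rows; HS and AS follow the table's column names.
matches = [
    {"HS": 4, "AS": 2},
    {"HS": 0, "AS": 0},
    {"HS": 1, "AS": 3},
]

# Add an SC column by joining home and away scores with a hyphen.
for match in matches:
    match["SC"] = f"{match['HS']}-{match['AS']}"

print([m["SC"] for m in matches])  # prints: ['4-2', '0-0', '1-3']
```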

With this new column available, all we need to do is use the Group feature once more to obtain the desired frequencies:

Grouping Soccer Analytics Data from the Columns Menu

Shown below are the five most frequent scores. They point to a glaring aspect of professional football: goals are rare events, because a typical game is shaped by countless chance variables and similar random events. As a result, out of the 76 distinct scores found in the data, our table shows that most matches (86.71% to be precise) are decided by a margin of ≤ 2 goals:

Final Soccer Match Results grouped
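Both the score frequencies and the share of close matches can be reproduced with a few lines of plain Python. The sketch below uses invented scores and assumes that draws count as a margin of zero; the 86.71% figure comes from the real table, not from this toy data.

```python
from collections import Counter

# Hypothetical (home score, away score) pairs, not the real data.
scores = [(1, 0), (1, 1), (1, 0), (2, 1), (0, 0), (5, 1), (1, 0), (0, 3)]

# Frequency of each combined score; most_common(1) gives the top one.
score_counts = Counter(f"{hs}-{as_}" for hs, as_ in scores)
print(score_counts.most_common(1))  # prints: [('1-0', 3)]

# Share of matches decided by a margin of two goals or fewer
# (assumption: draws are included, with a margin of zero).
close = sum(1 for hs, as_ in scores if abs(hs - as_) <= 2)
share = close / len(scores) * 100
print(f"{share:.2f}%")  # prints: 75.00%
```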

Similarly, the least frequent scores, each happening only once in almost 219K matches, have a combined probability of about 0.0018 percent (four rare events indeed):

Blowout scores are clear outliers in Soccer
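The 0.0018 percent figure is simple arithmetic: four one-off scores divided by the total number of matches. A quick check, assuming a round total of 219,000 since the text only says "almost 219K":

```python
rare_scores = 4          # scores that occur exactly once
total_matches = 219_000  # assumption: rounded from "almost 219K"

share_percent = rare_scores / total_matches * 100
print(f"{share_percent:.4f}%")  # prints: 0.0018%
```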

Home vs. Away Results

Finally, while game scores such as 1-0 don’t tell us anything about any (perceived) home advantage, we can always go back to our table in search of answers.

For example, Gigasheet comes prebuilt with a series of aggregate functions that can easily manipulate numeric data with just a few clicks—no formulas, no code. One of these, Average, can give us exactly what we’re looking for; head down to the bottom of the HS column, and, using the dropdown control, select “Average”:

Aggregate Functions To Calculate Soccer Statistics

Doing the same for the AS column produces the following results:

Average Goals Home vs Away in Soccer
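The Average aggregate is a column mean. As a sketch on hypothetical scores (the values below are invented and are not the averages from the real table):

```python
from statistics import mean

# Hypothetical (home score, away score) pairs, not the real data.
scores = [(4, 2), (0, 0), (1, 3), (2, 1), (1, 0)]

# Column averages, one for HS and one for AS.
home_avg = mean([hs for hs, _ in scores])
away_avg = mean([as_ for _, as_ in scores])

print(home_avg, away_avg)  # prints: 1.6 1.2
```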

As you can see, home advantage does seem to be a factor going into the game. In absolute numbers, these averages correspond to 324,785 home goals vs. 243,193 away goals. Similarly, the top five teams with the most home wins (W) can be obtained by using a Pivot table, a powerful feature that allows you to quickly summarize data by grouping it into rows and columns:

Pivot Mode To Analyze Soccer Data in Gigasheet

The columns now contain individual W-D-L counts. Notice how Gigasheet lets you sort them to get the kind of insight you’re looking for; in this case, the top five teams with the largest number of home wins:

Pivot Table containing Soccer Match Results by League
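A pivot of this kind is a count of WDL values per home team, sorted by the W column. A minimal sketch with invented teams, assuming the "W", "D" and "L" codes:

```python
from collections import Counter, defaultdict

# Hypothetical (home team, WDL) rows; the "W"/"D"/"L" codes are assumed.
rows = [
    ("Team A", "W"), ("Team A", "W"), ("Team A", "D"),
    ("Team B", "W"), ("Team B", "L"),
    ("Team C", "D"), ("Team C", "L"),
]

# Rows are home teams, columns are W/D/L counts.
pivot = defaultdict(Counter)
for home_team, wdl in rows:
    pivot[home_team][wdl] += 1

# Sort by home wins, largest first, and keep the top five.
top_home_winners = sorted(
    pivot, key=lambda team: pivot[team]["W"], reverse=True
)[:5]

for team in top_home_winners:
    counts = pivot[team]
    print(team, counts["W"], counts["D"], counts["L"])
```

A missing code (for example a team with no home wins) simply counts as zero, because `Counter` returns 0 for keys it has not seen.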

Final thoughts

From this data, it seems that competitive advantage in football remains difficult to obtain given the scorelines, although some of the results are consistent with our initial premise: outcome inequality increases as richer teams secure the best players that they can afford.

However, goal scoring is still a rarity. During a single match, an untold series of both endogenous and exogenous circumstances can impact the game in more ways than one—a testimony to the role of the not-so-subtle complexities involving twenty-two players and countless play possibilities.

But with tools like Gigasheet in your arsenal, you can certainly leave behind the old ways that entailed long and tedious manual processes and calculations just to arrive at basic results. Gigasheet automatically takes care of the tedious details so that you can focus on gaining valuable insight from your data.
