Every four years, a flurry of activity and joyful fanaticism descends upon a designated host nation to witness what is arguably the biggest sporting event in the world. Yes, we’re talking about the FIFA World Cup! With the tournament in full swing, what better opportunity than such a world-renowned competition to showcase the power of Gigasheet?
As with any mass spectator sport, numbers rule; that is, data and more data. From player and match analyses to the crowds of committed fans filling entire stadiums and displaying unrelenting allegiance to their favorite team, football, a.k.a. soccer, naturally invites plenty of statistical inference and forecasting.
However, for all its gilded touches, football is also beset by the sort of predictability that comes with monetization. For example, in a recently published paper by The Royal Society, the four major European leagues (England, Germany, Portugal and Spain) exhibited a clear evolution towards inequality based on financial indicators, a gradual process that the study describes as the “gentrification” of football.
But, to the fun part: the paper seemingly highlights the importance of home advantage, and quantifies match scores for several leagues across the world (from 2000 through 2016) according to it. Let’s use Gigasheet to help us get some clarity and confirm whether home teams still exhibit a measurable advantage despite any other biases.
Loading and parsing datasets in Gigasheet takes mere seconds. The tool is able to handle billions of records at a time, so 200K rows is basically a walk in the park. Soon enough, our table looks like this:
Data in this table are laid out as follows:
For instance, in the table above we can observe a match that took place on December 31st, 2016, between Chelsea and Stoke City of the English Premier League (ENG1), which ended in a 4-2 win for the home team.
Now, let’s start drawing some conclusions from our data, beginning with some basic ones such as scoreless matches—for that, we create a filter, setting both HS and AS to zero:
With these 18,707 scoreless draws isolated, we quickly turn to the Group feature (on the LGE column) to see how they are distributed across individual leagues:
Immediately, we can observe the five leagues with the most matches ending in a 0-0 draw, as well as the total number of matches (per league) meeting this criterion:
From the image above, we can see that 2nd and 4th division leagues have the highest propensity for scoreless matches.
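For readers who like to cross-check results outside the UI, the filter-then-group step above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the sample rows, the league codes other than ENG1, and the `Home`/`Away` field names are made up for illustration, and only the LGE, HS and AS column names come from the table described earlier.

```python
from collections import Counter

# Hypothetical sample rows. Only the LGE, HS and AS names come from the article;
# the "Home"/"Away" fields and the ENG2/ENG4 codes are illustrative assumptions.
matches = [
    {"LGE": "ENG1", "Home": "Chelsea", "Away": "Stoke City", "HS": 4, "AS": 2},
    {"LGE": "ENG2", "Home": "Team A", "Away": "Team B", "HS": 0, "AS": 0},
    {"LGE": "ENG4", "Home": "Team C", "Away": "Team D", "HS": 0, "AS": 0},
    {"LGE": "ENG2", "Home": "Team E", "Away": "Team F", "HS": 0, "AS": 0},
    {"LGE": "ENG1", "Home": "Team G", "Away": "Team H", "HS": 1, "AS": 0},
]


def scoreless_by_league(rows):
    """Keep only 0-0 matches, then count them per league (largest count first)."""
    scoreless = [r for r in rows if r["HS"] == 0 and r["AS"] == 0]
    return Counter(r["LGE"] for r in scoreless).most_common()


print(scoreless_by_league(matches))
```

Slicing the returned list with `[:5]` would mirror the top-five view.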
Next, let’s see if we can determine what the most frequent score, or match result, is. This time, we’ll begin by combining home and away scores into a single (additional) column named SC (score) using the hyphen character as a separator.
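Outside Gigasheet, the same combined column could be sketched in Python as follows. The sample rows are hypothetical; only the HS, AS and SC names are taken from the walkthrough.

```python
# Hypothetical sample rows using the article's HS (home score) and AS (away score) names.
matches = [
    {"HS": 4, "AS": 2},
    {"HS": 0, "AS": 0},
    {"HS": 1, "AS": 0},
]


def add_score_column(rows, separator="-"):
    """Return copies of the rows with an extra SC field such as '4-2'."""
    return [{**r, "SC": f"{r['HS']}{separator}{r['AS']}"} for r in rows]


for row in add_score_column(matches):
    print(row["SC"])
```

Building new dictionaries rather than mutating the originals keeps the source rows intact, much like adding a column leaves the existing ones untouched.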
The new column named "SC":
With this new column available, all we need to do is use the Group feature once more to obtain the desired frequencies:
Below, we can see the five most frequent match outcomes, which highlight a glaring aspect of professional football: goals are rare events, and a typical game is subject to countless chance variables and similar random events. As a result, out of 76 distinct scores, our table shows that most matches (86.71% to be precise) are decided by a margin of ≤ 2 goals:
Similarly, the least frequent scores—happening only once in almost 219K matches—constitute a combined probability of 0.0018 percent (four rare events indeed):
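The grouping and the percentages quoted above can be reproduced with simple counting. The sketch below uses a short, hypothetical list of SC values, and it treats "almost 219K matches" as exactly 219,000, which is an approximation on our part rather than the dataset's true row count.

```python
from collections import Counter

# Hypothetical SC values; the real column holds one entry per match.
scores = ["1-0", "1-1", "1-0", "2-1", "0-0", "4-2", "1-1", "1-0", "5-1", "0-3"]


def score_frequencies(sc_values):
    """Count how often each score string occurs, most frequent first."""
    return Counter(sc_values).most_common()


def share_within_margin(sc_values, max_margin=2):
    """Percentage of matches decided by at most max_margin goals."""
    def margin(sc):
        home, away = sc.split("-")
        return abs(int(home) - int(away))

    close = sum(1 for sc in sc_values if margin(sc) <= max_margin)
    return 100 * close / len(sc_values)


# Four scores that each occur once, out of roughly 219,000 matches (assumed total).
rare_share = 100 * 4 / 219_000

print(score_frequencies(scores)[:5])
print(share_within_margin(scores))
print(round(rare_share, 4))
```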
Finally, while game scores such as 1-0 don’t tell us anything about any (perceived) home advantage, we can always go back to our table in search of answers.
For example, Gigasheet comes prebuilt with a series of aggregate functions that can easily manipulate numeric data with just a few clicks—no formulas, no code. One of these, Average, can give us exactly what we’re looking for; head down to the bottom of the HS column, and, using the dropdown control, select “Average”:
Doing the same for the AS column produces the following results:
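For comparison, the Average aggregate amounts to a plain arithmetic mean over each column. Here is a minimal Python sketch with hypothetical sample rows; its numbers are illustrative only and say nothing about the real dataset.

```python
from statistics import mean

# Hypothetical sample rows using the article's HS and AS column names.
matches = [
    {"HS": 4, "AS": 2},
    {"HS": 0, "AS": 0},
    {"HS": 1, "AS": 1},
    {"HS": 2, "AS": 0},
    {"HS": 1, "AS": 2},
]


def column_average(rows, column):
    """Arithmetic mean of one numeric column."""
    return mean(r[column] for r in rows)


print(column_average(matches, "HS"), column_average(matches, "AS"))
```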
Comparing the two averages, home advantage does seem to be a factor in the game. In total, these averages correspond to 324,785 home goals vs. 243,193 away goals. Similarly, the top five teams with the most home wins (W) can be obtained by using a Pivot table, a powerful feature that allows you to quickly summarize data by grouping it into rows and columns:
Columns now contain individual W-D-L counts. Notice how Gigasheet lets you sort these to surface the kind of insight that you’re looking for, in this case the top five teams with the largest number of home wins:
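Conceptually, this pivot classifies each match from the home side's point of view and then counts outcomes per team. The sketch below shows one way to do that in Python; it is not Gigasheet's implementation, and the `Home` field name and sample rows are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows. "Home" is an assumed field name for the home team;
# HS and AS follow the article's column names.
matches = [
    {"Home": "Chelsea", "HS": 4, "AS": 2},
    {"Home": "Team A", "HS": 0, "AS": 0},
    {"Home": "Chelsea", "HS": 1, "AS": 0},
    {"Home": "Team A", "HS": 2, "AS": 1},
    {"Home": "Team B", "HS": 0, "AS": 1},
]


def home_record(rows):
    """Build a W-D-L table per home team, sorted by home wins (descending)."""
    table = defaultdict(lambda: {"W": 0, "D": 0, "L": 0})
    for r in rows:
        if r["HS"] > r["AS"]:
            outcome = "W"
        elif r["HS"] == r["AS"]:
            outcome = "D"
        else:
            outcome = "L"
        table[r["Home"]][outcome] += 1
    return sorted(table.items(), key=lambda item: item[1]["W"], reverse=True)


for team, record in home_record(matches)[:5]:
    print(team, record)
```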
From this data, it seems that competitive advantage in football remains difficult to obtain given the scorelines, although some of the results are consistent with our initial premise: outcome inequality increases as richer teams secure the best players that they can afford.
However, goal scoring is still a rarity. During a single match, an untold series of both endogenous and exogenous circumstances can impact the game in more ways than one—a testimony to the role of the not-so-subtle complexities involving twenty-two players and countless play possibilities.
But with tools like Gigasheet in your arsenal, you can certainly leave behind the old ways that entailed long and tedious manual processes and calculations just to arrive at basic results. Gigasheet automatically takes care of the nitty-gritty details so that you can focus on gaining valuable insight from your data.