Back to articles list

What Do the Olympic Games, UEFA Euro 2016 Football Matches, and Databases Have In Common?

On hearing what I do, people tend to ask me the same question: Can you develop a system that predicts football match results? Or Olympic medal outcomes? Personally, I don’t put much faith in predictions. Still, if we had a large amount of historical data and relevant indicators, we could certainly design a system to help us come up with more accurate assumptions. In this article, we’ll consider a model that can store the results of matches and tournaments.

This model is primarily focused on European football (soccer) matches, statistics, and results, but it could easily be tweaked to accommodate many other sports. My main motivation for this article was this year’s two big football events: the UEFA Euro 2016 Championship that just happened, and the 2016 Summer Olympic Games that are happening right now.

What Do We Know Before the Tournament Starts?

Before the tournament starts, we know almost everything about it — except the most important thing: who will win. Let’s briefly state exactly what we already know:

• The dates the tournament starts and ends
• The locations where the matches will take place
• The exact times the matches will start
• Which teams have qualified for the tournament
• The players on each of these teams
• Each player’s past performance and their current form

What Match Details Do We Want to Store?

Tournaments consist of multiple matches. Before we store any match details we need to:

• Relate each match with the tournament
• Record the tournament stage when the match was played (e.g. group stage, semi-finals)

We also need to store details for single matches, including:

• The teams involved in the match
• Starting lineups and substitutions
• Match events (in football these are: goal, penalty, foul, yellow card, etc.)
• Final score
• Players’ actions during the match

We’ll use this data to capture all important match events. Comparing a player’s performance before and during the match could lead to certain conclusions. Maybe we wouldn’t be able to predict the final results of their performance (i.e. a win or a loss), but statistics certainly could help us make assumptions with a degree of reliability.

Introducing the Model

The model is divided into four main areas:

• `Tournament details`
• `Match details`
• `Events`
• `Indicators and Performance`

The tables outside these areas are dictionaries (`sport`, `phase`, `position`), catalogues (`sport_event`, `team`, `player`) and a single many-to-many relation (`plays`).

We’ll describe the uncategorized tables first, and then take a close look at each area.

The Uncategorized Tables

These tables are important because tables from all four areas use them as dictionaries or catalogues.

The `sport` table lists all the sports we’ll store in our database. We’ll probably have only one sport here, men’s football, but this table gives us the flexibility to add similar sports (e.g. women’s football) if needed.

In the `sport_event` table, we’ll store the events connected with our sport(s). One example would be the “2016 Olympic Games”.

The `phase` table is a dictionary that holds all possible tournament stages. It contains values like “group stage”, “round of 16”, “quarter-finals”, “semi-finals”, “final”.

The `team` table is, as you’d guess, a simple list of all teams. Possible values are “Croatia”, “Poland”, “USA” etc. If we use the database to store information about club or league competition, we would also have values like “Barcelona”, “Real Madrid”, “Bayern”, “Manchester United” etc.

In the `player` table, we’ll store records for all players belonging to the relevant teams.

The `plays` table is our sole many-to-many relation, and it relates players and teams. A player can belong to more than one team at the same time ( e.g. the national team and a club), but during a tournament they will obviously play for only one team.

Finally, we have the `position` table. This simple dictionary will store a list of all the required positions. In football, these include goalkeeper, center-half, striker, etc.

Tournament Details

Note: If you just want to store the results of single matches, you don’t need to use this section.

A tournament consists of more than one match; both UEFA Euro 2016 and the football events in the 2016 Summer Olympics are tournaments. As we said before, we can store a single match in our database, but we can also relate matches to their relevant tournaments. The tables in the Tournament section are:

• `tournament` – This contains all the basic tournament data: the sport, start date, end date, etc. We also need to store the tournament name and a description of where it’s taking place. The `sport_event_id` attribute is optional because a tournament doesn’t have to be associated with a larger event (such as the Olympics).
• `group` – This lists all the groups in that tournament. UEFA Euro 2016 had six groups, A to F.
• `participant` – These are the teams playing in the tournament; each participant can be assigned to a group. Most tournaments begin with a group stage and then continue to a knockout stage (e.g. UEFA Euro, UEFA World Cup, Olympic football). Some tournaments will have only a group stage (e.g. national leagues), while others will have only a knockout stage (e.g. national cups).
• `in_team` – This table provides a many-to-many relation that stores information about the players registered for that tournament and their expected positions.
• `tournament_schedule` – In my opinion, this is the most interesting table in this section. The list of all games played during this tournament is stored here. The `tournament_id` attribute denotes which tournament each match belongs to, and the `phase_id` attribute defines the phase during which the match will take place. We’ll also store the match location and the time when it begins. Both participants will be described by text fields. When the group stage finishes, we’ll know all the matchups for the elimination round. For example, at the beginning of UEFA Euro 2016, we knew that winner of Group E (1E) will play against the Group D runner-up (2D). After all three rounds in the group phase were played, this pair was Italy vs. Spain.

Match Details

The `Match details` area is used to store data for single matches. We’ll use two tables:

• `match` – This contains all the details about a single match; this match can be related to a tournament, but it could also be a single game. So the `tournament_schedule_id` attribute is optional, and we’ll store the `sport_id`, `start_time` and `location` attributes again here. If the match is part of a tournament, then `tournament_schedule_id` will be assigned a value. The `team_1_id` and `team_2_id` attributes are references to the teams involved in the match. The `goals_team_1` and `goals_team_2` attributes contain the result of the match. They are mandatory and should have “0” as the default value for both.
• `in_match` – This table is a list of all the players who are registered for that match; players who don’t participate will have a NULL in the `started_at` attribute, while players that came in as substitutions will have `started_at` > 0. If a player was replaced, they’ll have an `ended_at` attribute that matches the `started_at` attribute of the player who replaced them. If the player stayed in for the entire match, their `ended_at` attribute will have the same value as the `end_time` attribute.

Match Events

This section is intended to store all the details or events that happened during the game. And the tables are:

• `event` – This is a dictionary that lists all the events we want to store. In football, these are values like “foul committed”, “foul suffered”, “yellow card”, “red card”, “free kick”, “penalty”, “goal”, “offside”, “substitution”, “player ejected from match”.
• `match_event` – This relates events with the match. We’ll store the `event_time` as well as player information related to that event (`in_match_id`).
• `related_event` – This is what brings event information together. To explain, let’s look at an example when Player A fouls Player B. We’ll insert a record in the `match_event` table that indicates that Player A committed a foul and another one that indicates that Player B suffered a foul. We’ll also add a record to the `related_event` table, where the ‘committed foul’ will be the parent and the ‘suffered foul’ will be the child. We’ll also record the results of the foul: a yellow card, a free kick or a penalty kick, and maybe a goal.

Indicators and Performance

This section should help us analyze players and teams before and after the match.

The `indicator` table is a dictionary with a predefined set of indicators for each player before each match. These indicators should describe the player’s current form. This list could contain values like: “number of goals in last 10 matches”, “average distance covered in last 10 matches”, “number of saves for GK in last 10 matches”.

The `performance` dictionary is very similar to `indicator`, but we’ll use it to store only values that are related to the single match: “distance covered”, “accurate passes”, etc.

The `player_indicator` and `performance_indicator` tables share an almost identical structure:

• `in_match_id` – refers to the player taking part in a certain match
• `indicator_id` / `performance_id` – references the `indicator` or ”
performance dictionaries
• `value` – stores the value for that indicator (e.g. a player covered 10.72 km distance)
• `description` – holds an additional description, if needed
• What Happened During the Match?

With all this data entered in, we could easily get match details, events, and statistics for every match in our database.

This simple query would return basic details for an upcoming match:

```SELECT team_1.`team_name`, team_2.`team_name`, `match`.`start_time`, `match`.`location`
FROM `match`, `team` AS team_1, `team` AS team_2
WHERE `match`.`team_1_id` = team_1.`id`
AND `match`.`team_2_id` = team_2.`id`
```

To get a list of all the in-play events during a certain match, we would use the query below:

```SELECT `event`.`event_name`, `match_event`.`event_time`, `player`.`first_name`, `player`.`last_name`
FROM `match`, `match_event`, `event`, `in_match`, `player`
WHERE `match_event`.`match_id` = `match`.`id`
AND `event`.`id` = `match_event`.`event_id`
AND `in_match`.`id` = `match_event`.`in_match_id`
AND `player`.`id` = `in_match`.`player_id`
AND `match`.`id` = @match
ORDER BY `match_event`.`event_time` ASC
```

There are numerous additional queries that I can think of; it’s easy to do an analysis when you have the data. If you’ve measured and stored a large number of indicators and player performance data, you might be able to relate these parameters with a final result. I personally don’t believe in such predictions; there’s the luck factor during matches, plus numerous other factors you can’t know until the game starts. Still, if you have a large dataset and a lot of parameters, your chance of making more accurate predictions increases.

The model presented in this article allows us to store matches, match details, and a history of each player’s performance. We can also set form indicators for each player before the match. Storing enough details should provide us with more parameters on which to base our assumptions. I’m not saying that we could predict the result of the game, but we could have some fun with it.

We could also easily tweak this model to store data for other sports. These changes shouldn’t be too complex. Adding a `sport_id` attribute to the dictionaries should do the trick. Still, I think it would be wise to have a new instance for each different sport.