What Do the Olympic Games, UEFA Euro 2016 Football Matches, and Databases Have In Common?

Emil Drkušić, Database designer and developer, financial analyst

by
Emil Drkušić
Database designer and developer, financial analyst

Posted: August 11, 2016

On hearing what I do, people tend to ask me the same question: Can you develop a system that predicts football match results? Or Olympic medal outcomes? Personally, I don’t put much faith in predictions. Still, if we had a large amount of historical data and relevant indicators, we could certainly design a system to help us come up with more accurate assumptions. In this article, we’ll consider a model that can store the results of matches and tournaments.

This model is primarily focused on European football (soccer) matches, statistics, and results, but it could easily be tweaked to accommodate many other sports. My main motivation for this article was this year’s two big football events: the UEFA Euro 2016 Championship that just happened, and the 2016 Summer Olympic Games that are happening right now.

What Do We Know Before the Tournament Starts?

Before the tournament starts, we know almost everything about it — except the most important thing: who will win. Let’s briefly state exactly what we already know:

  • The dates the tournament starts and ends
  • The locations where the matches will take place
  • The exact times the matches will start
  • Which teams have qualified for the tournament
  • The players on each of these teams
  • Each player’s past performance and their current form

What Match Details Do We Want to Store?

Tournaments consist of multiple matches. Before we store any match details we need to:

  • Relate each match with the tournament
  • Record the tournament stage when the match was played (e.g. group stage, semi-finals)

We also need to store details for single matches, including:

  • The teams involved in the match
  • Starting lineups and substitutions
  • Match events (in football these are: goal, penalty, foul, yellow card, etc.)
  • Final score
  • Players’ actions during the match

We’ll use this data to capture all important match events. Comparing a player’s performance before and during the match could lead to certain conclusions. Maybe we wouldn’t be able to predict the final results of their performance (i.e. a win or a loss), but statistics certainly could help us make assumptions with a degree of reliability.

Introducing the Model




The model is divided into four main areas:

  • Tournament details
  • Match details
  • Events
  • Indicators and Performance

The tables outside these areas are dictionaries (sport, phase, position), catalogues (sport_event, team, player) and a single many-to-many relation (plays).

We’ll describe the uncategorized tables first, and then take a close look at each area.

The Uncategorized Tables

These tables are important because tables from all four areas use them as dictionaries or catalogues.

sport table

The sport table lists all the sports we’ll store in our database. We’ll probably have only one sport here, men’s football, but this table gives us the flexibility to add similar sports (e.g. women’s football) if needed.

sport_event table

In the sport_event table, we’ll store the events connected with our sport(s). One example would be the “2016 Olympic Games”.

phase table

The phase table is a dictionary that holds all possible tournament stages. It contains values like “group stage”, “round of 16”, “quarter-finals”, “semi-finals”, “final”.

team table

The team table is, as you’d guess, a simple list of all teams. Possible values are “Croatia”, “Poland”, “USA” etc. If we use the database to store information about club or league competition, we would also have values like “Barcelona”, “Real Madrid”, “Bayern”, “Manchester United” etc.

player table

In the player table, we’ll store records for all players belonging to the relevant teams.

plays table

The plays table is our sole many-to-many relation, and it relates players and teams. A player can belong to more than one team at the same time ( e.g. the national team and a club), but during a tournament they will obviously play for only one team.

position table

Finally, we have the position table. This simple dictionary will store a list of all the required positions. In football, these include goalkeeper, center-half, striker, etc.

Tournament Details

Tournament Details

Note: If you just want to store the results of single matches, you don’t need to use this section.

A tournament consists of more than one match; both UEFA Euro 2016 and the football events in the 2016 Summer Olympics are tournaments. As we said before, we can store a single match in our database, but we can also relate matches to their relevant tournaments. The tables in the Tournament section are:

  • tournament – This contains all the basic tournament data: the sport, start date, end date, etc. We also need to store the tournament name and a description of where it’s taking place. The sport_event_id attribute is optional because a tournament doesn’t have to be associated with a larger event (such as the Olympics).
  • group – This lists all the groups in that tournament. UEFA Euro 2016 had six groups, A to F.
  • participant – These are the teams playing in the tournament; each participant can be assigned to a group. Most tournaments begin with a group stage and then continue to a knockout stage (e.g. UEFA Euro, UEFA World Cup, Olympic football). Some tournaments will have only a group stage (e.g. national leagues), while others will have only a knockout stage (e.g. national cups).
  • in_team – This table provides a many-to-many relation that stores information about the players registered for that tournament and their expected positions.
  • tournament_schedule – In my opinion, this is the most interesting table in this section. The list of all games played during this tournament is stored here. The tournament_id attribute denotes which tournament each match belongs to, and the phase_id attribute defines the phase during which the match will take place. We’ll also store the match location and the time when it begins. Both participants will be described by text fields. When the group stage finishes, we’ll know all the matchups for the elimination round. For example, at the beginning of UEFA Euro 2016, we knew that winner of Group E (1E) will play against the Group D runner-up (2D). After all three rounds in the group phase were played, this pair was Italy vs. Spain.

Match Details

Match Details

The Match details area is used to store data for single matches. We’ll use two tables:

  • match – This contains all the details about a single match; this match can be related to a tournament, but it could also be a single game. So the tournament_schedule_id attribute is optional, and we’ll store the sport_id, start_time and location attributes again here. If the match is part of a tournament, then tournament_schedule_id will be assigned a value. The team_1_id and team_2_id attributes are references to the teams involved in the match. The goals_team_1 and goals_team_2 attributes contain the result of the match. They are mandatory and should have “0” as the default value for both.
  • in_match – This table is a list of all the players who are registered for that match; players who don’t participate will have a NULL in the started_at attribute, while players that came in as substitutions will have started_at > 0. If a player was replaced, they’ll have an ended_at attribute that matches the started_at attribute of the player who replaced them. If the player stayed in for the entire match, their ended_at attribute will have the same value as the end_time attribute.

Match Events

Match Events

This section is intended to store all the details or events that happened during the game. And the tables are:

  • event – This is a dictionary that lists all the events we want to store. In football, these are values like “foul committed”, “foul suffered”, “yellow card”, “red card”, “free kick”, “penalty”, “goal”, “offside”, “substitution”, “player ejected from match”.
  • match_event – This relates events with the match. We’ll store the event_time as well as player information related to that event (in_match_id).
  • related_event – This is what brings event information together. To explain, let’s look at an example when Player A fouls Player B. We’ll insert a record in the match_event table that indicates that Player A committed a foul and another one that indicates that Player B suffered a foul. We’ll also add a record to the related_event table, where the ‘committed foul’ will be the parent and the ‘suffered foul’ will be the child. We’ll also record the results of the foul: a yellow card, a free kick or a penalty kick, and maybe a goal.

Indicators and Performance

Indicators and Performance

This section should help us analyze players and teams before and after the match.

The indicator table is a dictionary with a predefined set of indicators for each player before each match. These indicators should describe the player’s current form. This list could contain values like: “number of goals in last 10 matches”, “average distance covered in last 10 matches”, “number of saves for GK in last 10 matches”.

The performance dictionary is very similar to indicator, but we’ll use it to store only values that are related to the single match: “distance covered”, “accurate passes”, etc.

The player_indicator and performance_indicator tables share an almost identical structure:

  • in_match_id – refers to the player taking part in a certain match
  • indicator_id / performance_id – references the indicator or ”
    performance dictionaries
  • value – stores the value for that indicator (e.g. a player covered 10.72 km distance)
  • description – holds an additional description, if needed

What Happened During the Match?

With all this data entered in, we could easily get match details, events, and statistics for every match in our database.

This simple query would return basic details for an upcoming match:

SELECT team_1.`team_name`, team_2.`team_name`, `match`.`start_time`, `match`.`location`
FROM `match`, `team` AS team_1, `team` AS team_2
WHERE `match`.`team_1_id` = team_1.`id`
AND `match`.`team_2_id` = team_2.`id`

To get a list of all the in-play events during a certain match, we would use the query below:

SELECT `event`.`event_name`, `match_event`.`event_time`, `player`.`first_name`, `player`.`last_name`
FROM `match`, `match_event`, `event`, `in_match`, `player`
WHERE `match_event`.`match_id` = `match`.`id`
AND `event`.`id` = `match_event`.`event_id`
AND `in_match`.`id` = `match_event`.`in_match_id`
AND `player`.`id` = `in_match`.`player_id`
AND `match`.`id` = @match
ORDER BY `match_event`.`event_time` ASC

There are numerous additional queries that I can think of; it’s easy to do an analysis when you have the data. If you’ve measured and stored a large number of indicators and player performance data, you might be able to relate these parameters with a final result. I personally don’t believe in such predictions; there’s the luck factor during matches, plus numerous other factors you can’t know until the game starts. Still, if you have a large dataset and a lot of parameters, your chance of making more accurate predictions increases.

The model presented in this article allows us to store matches, match details, and a history of each player’s performance. We can also set form indicators for each player before the match. Storing enough details should provide us with more parameters on which to base our assumptions. I’m not saying that we could predict the result of the game, but we could have some fun with it.

We could also easily tweak this model to store data for other sports. These changes shouldn’t be too complex. Adding a sport_id attribute to the dictionaries should do the trick. Still, I think it would be wise to have a new instance for each different sport.

 
 
 

Try our online database modeler. No registration. No commitments.

 
 
Tags
 
Subscribe to our newsletter

If you find this article useful, join our weekly newsletter to be notified about the latest posts.

 
 
 
New SQL Course! Find out how to identify all the factors of a successful visualization: think like your audience, find the right chart type, and take out the meaning of the data. View course Discover our other courses: