Back to articles list
- 9 minutes read

A Data Model to Track Tennis Tournaments

Looking for a way to keep tabs on tennis? Shantanu Kher demonstrates a data model that manages information about players, tournaments, and results.

Roger Federer recently won his fifth Australian Open title, picking up a record 18th Grand Slam. Like many people, I’ve followed his play and tennis tournaments in general for a long time. What does it take to organize such events? Let’s dive into a data model designed to keep track of tennis tournaments and players.

Fast Facts About Tennis

Before we start the data model, let’s get some background on tennis. Tennis originated in England in the 19th century and is now played around the world. Some of this sport’s major tournaments include Wimbledon, the U.S. Open, the French Open, and the Australian Open.

The International Tennis Federation (ITF) is the governing body of world tennis. It was founded in 1913 by twelve national associations and is now affiliated with 211 national tennis associations and six regional associations. The ITF partners with the Women’s Tennis Association (WTA) and the Association of Tennis Professionals (ATP) to govern professional tennis tournaments throughout the year.

The Rules of the Game

Here are some basic things you need to know about tennis:

  • A tennis match can be played by either one player on each side (a singles match) or two players on each side (a doubles match).
  • Game vs. Set vs. Match
  • A match consists of sets; the player who wins the most sets wins the match. There are restrictions on the maximum number of sets: best of three for women’s matches and best of five for men’s matches.
  • A set consists of games. To win a set, a player must win six games but they must also be ahead of their opponent by at least two games. If your opponent wins five games, you must win seven to win the set. If the set goes to 6-6, then a tie-breaker is played and whoever gets seven points first wins the set.
  • To win a game, a player needs to score four points. The points are known as 15 (first point), 30 (second point), 40 (third point) and the winning or fourth point. If the game score goes to 40-40, this is called a deuce. When a game reaches deuce, a player must win by two points.
  • Each tournament maintains a player ranking and builds their fixtures in a way that the best or top-ranked players do not meet until later in the competition. This ranking is called a ‘seed’.
  • Most tournaments are organized by the ATP (for men) or WTA (for women). Once players register with either organization, their registration numbers remain constant throughout their career.

Now we have some background on the way a match is scored and how tournaments are organized. We need to build a data model to capture both the results (who won the match, how many sets were played) and the more granular details (the score of each set, how many tiebreakers were played).

I propose the following data model for the job. I have divided it into four subject areas:

  • Player details
  • Tournament details
  • Connecting players with tournaments
  • Fixtures and their results

Below, you can see the model in its entirety.

The Data Model




Player Details

Player Details

This subject area is self-explanatory and has only two tables. The most important is the player table, which holds basic details about players. All the attributes are easily understood, so the only one I’ll go into is the id attribute. This column assigns a unique number to each player. In subsequent subject areas, the id column will signify a player.

Most professional tennis tournaments do not have players representing specific countries (unlike the Olympics, which do). Still, organizing bodies often prefer to store players’ native countries. Therefore, we have the country table, which holds country names and their three-character codes – i.e. USA for the United States of America or SWE for Switzerland.

Tournament Details

Tournament Details

We can now move on to the Tournament Details subject area. As you might expect, the tournament table is its core. It holds details for each tournament. The columns are:

  • id – A unique number for each occurrence of a tournament.
  • tournament_name – The tournament name, such as Wimbledon 2017 or US Open 2016.
  • location – The city where the tournament is being held.
  • start_date – The date when the first match of the tournament is held.
  • end_date – The date when the tournament’s final (last round) is held.
  • number_of_rounds – The total number of rounds in the tournament. This is a number-type column.
  • tournament_type_id – The type of tournament – e.g. grand slam, team cup, Senior or Junior ITF Grade A, etc. This column references the tournament_type table.
  • surface_type_id – The tennis court surface; matches can be played on clay, grass, carpet, etc. This column references the surface_type table.

The playing_category table stores details about different match categories: men’s singles or doubles, women’s singles or doubles, or mixed doubles. The columns in this table are:

  • id –The primary key of the table. It signifies the tournament’s category.
  • category_name – The category name, such as “men’s singles” or “mixed doubles”.

The tournament_playing_category handles the many-to-many relationship between the tournament and playing_category tables. The columns in this table are:

  1. id – The primary key of this table. In subsequent subject areas, references to tournament IDs will actually be referred to this column.
  2. tournament_id – References the tournament table and indicates the related tournament.
  3. playing_category_id – References the playing_category table and indicates the type of tournament (women’s singles, etc).

Connecting Players with Tournaments

Connecting Players with Tournaments

Although this subject area only has three tables, I think it is the most interesting part of the entire data model: it explains how players take part in tournaments.

Players need to register for tournaments. When they do, they are assigned unique numbers (registration numbers). The registration table holds these details. The same registration number is assigned to one (for singles matches) or two players (for doubles matches). The registration_player table stores which player is assigned to which registration number.

The playing_in table holds the details about player participation in the tournament’s category of play. The columns in this table are:

  • id – The primary key of this table.
  • tournament_playing_category_id – The tournament category.
  • registration_id – The registration number of a player or pair of players.
  • seed – The players’ or player’s ranking, as defined by the tournament’s governing body.

Why do I need a registration table when I can directly refer to the “player_id” column in the “playing_in” table?

Because there are times when a player may have more than one registration number.

When two players pair up and play a doubles game, the tournament organizers assign the same registration number to both players. Both players are also seeded (or ranked) as a pair. However, a player can participate in both singles and doubles matches. In this case, the player has two registration numbers: one as an individual and another as part of a pair.

These are the reasons I did not store the player_id directly in the playing_in table. Also, it is not wise to have registration_id as a column in the player table itself.

Fixtures and Their Results

Fixtures and Their Results

This subject area stores match, set, and game details.

The fixture table handles match details. The columns in this table are:

  • id – The primary key of this table. It assigns a unique number to each match.
  • tournament_playing_category_id – The match’s type, i.e. singles or doubles .
  • first_registration_id – The first player or team competing in the match.
  • second_registration_id – The second player or team in the match.
  • round – The tournament round for this match, i.e. first round, second round, quarterfinal, semifinal, etc. This is a number-type column, so quarterfinal, semifinal, and final rounds must be stored in numerical form.

Note: The rows in this table are inserted when the tournament’s roadmap is finalized.

The fixture_result table stores match results. The columns in this table are:

  • fixture_id – References the fixture table.
  • winner_registration_id – The winner’s registration ID number (can be an individual player or a pair).
  • number_of_sets_played – The number of sets played in the match.
  • is_opponent_retired – There are cases when a player is retired from a match, usually due to injury. In such cases, the other player or pair is declared the winner. A ‘Y’ in this column signifies the opposing player was retired. Otherwise, the column remains NULL.

The set_score table keeps the results of each set played in a match. The columns in this table are:

  • fixture_id – The ID of the related match.
  • set_number –The set number of the match.
  • first_registration_games – The number of games won by the player whose registration number is stored in the first_registration_id column in the fixture table.
  • second_registration_games – On similar lines, this column stores the number of games won by the player represented by the second_registration_id.

In this table, the fixture_id and set_number columns form a composite primary key.

The game_score table stores the results of each game played in a match. The columns in this table are:

  • fixture_id – The ID of the related match.
  • set_number – The set number of the match.
  • game_number – An incremental number that signifies the match order of the game.
  • first_registration_points – The points scored by the first player in the game.
  • second_registration_points – The points scored by the second player.

In this table, the fixture_id, set_number, and game_number columns form a composite primary key.

The tie_breaker table handles tied sets. Players need at least two additional points to win a set; if scores are too close, a tie-breaking game is played to decide the winner. The columns in this table are:

  • fixture_id – The relevant match.
  • set_number – The set number of the match.
  • first_registration_tie_breaker – The number of points scored in the tiebreaker by the player represented in the first_registration_id column in the fixture table.
  • sec_registration_tie_breaker – The number of points scored by the player represented in the second_registration_id column in the fixture table.

What Would You Do?

Can we build a similar data model for other sports tournaments? What kinds of sports – football, hockey, cricket? What changes would we need to make for this model?

Would you like to suggest any further tennis-related requirements for the current model?

Please share your views and ideas in the comments section below.

go to top