Back to articles list
- 9 minutes read

A Database Model for a Taxi Service

Call them taxis or cabs, these convenient rides for rent have been around for centuries. Nowadays, it’s a lot more complicated to run a taxi service. In this article, we’ll look at a database model designed to meet the needs of a cab company.

The history of “calling a cab” began in 17th century London. In most places, cabs are more affordable than ever. They are also becoming a lot more accessible: we can order a taxi by phone, via mobile applications, or on the Web. Or we can use the same techniques that have been working for hundreds of years – line up at a cab station or flag one down on the street.

The taxi business model isn’t stagnant by any means. Newer concepts like Uber and Lyft are gaining popularity and will certainly have an impact on the future of taxi services. Of course, all of this complicates the lives of those who run cab companies. Let’s take a look at the pertinent parts of a data model that a cab company could use to stay organized.

What Do We Want to Achieve with This Cab Database Model?

The model presented in this article will be able to:

  • Create driver schedules
  • Track driver availability in real time
  • Provide drivers with a list of potential rides
  • Allow drivers to select a ride from the list
  • Calculate drivers’ working hours and earnings
  • Store various statistics (e.g. percentage of rides that cancel, payments per driver per month, etc.)

What Do We Need to Know About Taxi Companies?

Before we design a data model for a cab company, we should be able to answer the following questions:

  1. When do drivers work?
    In most cab companies, drivers have predefined schedules. We’ll know the exact times when the driver starts and ends a shift. In some cases, the shift’s start and end times are not defined in advance. For example, members of a cab association can log in and start working whenever they want. They can also log out and end their shift when they choose. Our model should be able to support both options.

  2. Can a driver work multiple shifts on the same day?
    If the driver is a member of a cab association, they may be able to work in the morning, go home, and then work again that same evening. However, in some areas (like New York City) there are legal restrictions on shift length and/or how many hours cab drivers can work each day.

  3. 3. Who initiates a ride?
    In most cases, a customer will contact the taxi call center and the dispatcher will enter their request into the system. Another scenario occurs when the customer orders a cab directly (by way of mobile app, for instance) and there is no call center employee involved in the process. A third option – which also bypasses the call center – occurs when a customer gets a cab at the station or hails one on the street.

The Data Model




Our model has two main sections and three uncategorized tables. We’ll take a closer look at each of them.

Section 1: Cabs, Drivers, and Shifts

cabs and drivers section

Everything starts with cabs and drivers: we need cars in a taxi company and we need drivers. (In the future, we’ll probably need to adjust our model for self-driving cars. But let’s stay in the present for now.)

We’ll begin our exploration of the data model with the driver table. In this table, we’ll include the usual attributes like name, surname, and date of birth. We’ll also have some attributes that are quite specific to this model:

  • driving_licence_number – This is an ID number or alphanumeric code usually found on a government-issued license. Since this ID is unique in the real world, it will also be unique in our database. (In some areas, cab drivers must have a special type of operating license to work; we’ll assume they have it.)
  • expiry_date – This is the date when a driver’s license is no longer legally valid. Notice that we won’t store the license history, so we’ll simply overwrite driving_licence_number and expiry_date with new data as needed. If we want to store license histories, we would need to add another table to our model.
  • working – This is a Boolean value that simply switches on and off as drivers log into the system. We’ll set this attribute by default to “Yes” (value 1) and change it to “No” only when the driver isn’t allowed to log into the system anymore.

There are many other driver-related details we could store in the database: usernames and passwords, the date each driver started working, and each driver’s current employment type. We won’t go into these details now because they aren’t specifically related to this model. I’d class them under user administration, which is the same in most systems.

Now, let’s move on to the cab and the car_model tables. This is where we’ll store a list of the cars that our company operates. We’ll also store certain details about each vehicle. The most important attributes in these two tables are:

  • model_description – This is a text attribute that keeps a company-specified description of a certain type of car. For example, cab companies may want to store the number of passengers a car can carry, trunk (boot) space, and other facts.
  • license_plate – The number on a license plate (vehicle registration plate or number plate) uniquely defines a car, both for a company and for government purposes. Of course, we may need to change license plate information if a car is bought or sold. In this table, we’ll only store the company’s current vehicles; if we need to keep a history, we can add one more table to our model.
  • owner_id – This attribute is a reference to the driver table. It is optional because we’ll only use it when the driver owns their cab. (This usually is the case in cab associations).
  • active – This is a Boolean value that denotes whether the company is still using a car.

Finally, let’s take a look at the most important table in this section: the shift table. The idea behind this table is to store the actual working hours and the schedule shifts for cars and drivers. Each shift will have at least one cab (cab_id) and one driver (driver_id). Aside from the primary key, which is a unique shift ID number, these are the only mandatory attributes in this table.

The shift_start_time and shift_end_time attributes are the actual moments when a shift starts and ends. Often, these are preset. In case there is no schedule, as in a cab association, these times would be the same as the login_time and logout_time attributes, respectively. The login_time and the logout_time are the actual times the drivers log in (via a mobile device in their car or whatever method the taxi company uses). When the driver logs into the system, a list of available rides will appear, and the driver will choose one. Of course, the dispatcher will also be notified that the driver is now working.

Section 2: Rides

Rides section

This section is composed of only two tables, but they are the true heart of this model.

In the cab_ride table, we’ll store one record for every potential ride. We’ll update this record according to what happens. Let’s have a quick overview of the attributes:

  • shift_id – This is a reference to the shift table; it provides us with driver and cab details for a given ride.
  • ride_start_time and ride_end_time – These are updated when drivers signal that they are currently busy (ride start) and subsequently available again (ride end).
  • address_starting_point and address_destination – These are the locations where a ride starts and ends. The driver will probably search for both locations to get the navigation on his tablet or GPS, so that’s likely when we’ll update these two attributes.
  • GPS_starting_point and GPS_destination – These are the GPS coordinates of the starting and ending locations described above. These values are optional because we’ll update them only when we have data.
  • canceled – This is a simple Yes/No value that tells us if a ride has been canceled. We’ll already have a record for that ride in our table, so we can just mark the ride as canceled.
  • payment_type_id and price – These provide information about the amount paid for a ride and the payment method used by the customer.

Most of the attributes in the cab_ride table can contain a NULL value. There are two reasons for this:

  • Rides do get canceled, which means no data will be entered in these fields;
  • Even if we will have all the data eventually, we won’t have it all when the record is initially inserted. We’ll update each record several times – at the very least, we’ll update it when the ride starts and ends (or is canceled).

The second table in this section is the cab_ride_status table. Here, a record is added for every change related to a single ride. When the dispatcher enters a new ride in the system, they’ll add a record to the cab_ride table, but a related record will also be created in the cab_ride_status table (along with the corresponding status). After each change related to that ride, one more record will be added to this table. The attributes are:

  • cab_ride_id and status_id– These are foreign keys that are related to the id attribute in the ride table and the id attribute in the status table (which we will cover below). Both are mandatory.
  • status_time – This stores the actual time when the record was inserted. It is also mandatory.
  • cc_agent_id and shift_id – These are references to the employee who inserted a status update. It can be either a dispatcher or a driver. Probably the dispatcher will insert the initial status and the driver will insert all the following statuses.
  • status_details – This is a text attribute that can be used to store additional information. For example, a dispatcher could use this attribute to record special instructions about a ride.

Uncategorized Tables

Finally, let’s quickly go over our uncategorized tables.

cc_agent table

The cc_agent table contains a list of call center agents or dispatchers that can add new records in the cab_ride and cab_ride_status tables.

status table

In the status dictionary, we’ll store a list of all possible statuses that we could assign to a ride. Some values include “new ride”, “ride assigned to driver”, “ride started”, “ride ended”, or “ride canceled”.

payment_type table

Payment_type is another dictionary table. Its contents are used to update the payment_type_id attribute in the cab_ride table. Common values are “cash” and “credit card”.

How Would You Improve the Taxi Data Model?

The cab/taxi database model presented in this article is focused only on the most important functionalities. There are numerous improvements we could make. Routes are just one that I can think of.

In the future, we probably will have driverless cabs. In that case, we would drop the driver from the model and substitute things like recharge and repair times.

Feel free to comment and add your ideas.

go to top