Querying MySQL’s INFORMATION SCHEMA: Why? How?

Emil Drkušić, Database designer and developer, financial analyst

by
Emil Drkušić
Database designer and developer, financial analyst

Posted: October 9, 2018

data model, model design, database design, create database from scratch, MySQL

Databases need to run optimally, but that’s not such an easy task. The INFORMATION SCHEMA database can be your secret weapon in the war of database optimization.

We’re used to creating databases using a graphical interface or a series of SQL commands. That’s completely fine, but it’s also good to understand a bit about what is going on in the background. This is important for the creation, maintenance, and optimization of a database, and it’s also a good way to track changes that happen ‘behind the scenes’.

In this article, we’ll look at a handful of SQL queries that can help you peer into the workings of a MySQL database.

The INFORMATION_SCHEMA Database

We already discussed the INFORMATION_SCHEMA database in this article. If you haven’t already read it, I would definitely suggest you do that before continuing.

If you need a refresher on the INFORMATION_SCHEMA database – or if you decide not to read the first article – here are some basic facts you need to know:

  • The INFORMATION_SCHEMA database is part of the ANSI standard. We’ll be working with MySQL, but other RDBMSs have their variants. You can find versions for H2 Database, HSQLDB, MariaDB, Microsoft SQL Server, and PostgreSQL.
  • This is the database that keeps track of all other databases on the server; we’ll find descriptions of all objects here.
  • Like any other database, the INFORMATION_SCHEMA database contains a number of related tables and information about different objects.
  • You can query this database using SQL and use the results to:
    • Monitor database status and performance, and
    • Automatically generate code based on query results.

Now let’s move on to querying the INFORMATION_SCHEMA database. We’ll start by looking at the data model we’re going to use.

The Data Model

The model we’ll use in this article is shown below.




This is a simplified model that allows us to store information about classes, instructors, students, and other related details. Let’s briefly go over the tables.

We’ll store the list of instructors in the lecturer table. For each lecturer, we’ll record a first_name and a last_name.

The class table lists all the classes we have in our school. For each record in this table, we’ll store the class_name, the lecturer’s ID, a planned start_date and end_date, and any additional class_details. For the sake of simplicity, I’ll assume that we have only one lecturer per class.

Classes are usually organized as a series of lectures. They generally require one or more exams. We’ll store lists of related lectures and exams in the lecture and exam tables. Both will have the ID of the related class and the expected start_time and end_time.

Now we need students for our classes. A list of all students is stored in the student table. Once again, we’ll store only the first_name and the last_name of each student.

The final thing we need to do is to track students’ activities. We’ll store a list of every class a student registered for, the student’s attendance record, and their exam results. Each of the remaining three tables – on_class, on_lecture and on_exam – will have a reference to the student and a reference to the appropriate table. Only the on_exam table will have an additional value: grade.

Yes, this model is very simple. We could add many other details about students, lecturers, and classes. We could store historical values when records are updated or deleted. Still, this model will be enough for the purposes of this article.

Creating a Database

We’re ready to create a database on our local server and examine what is happening inside it. We’ll export the model (in Vertabelo) using the “Generate SQL script” button.

Then we’ll create a database on the MySQL Server instance. I called my database “classes_and_students”.

The next thing we need to do is to run a previously-generated SQL script.

Now we have the database with all its objects (tables, primary and foreign keys, alternate keys).

Database Size

After the script runs, data about the “classes and students” database is stored in the INFORMATION_SCHEMA database. This data is in many different tables. I won’t list them all again here; we did that in the previous article.

Let’s see how we can use standard SQL on this database. I’ll start with one very important query:

SET @table_schema = "classes_and_students";

SELECT 

    ROUND(SUM( INFORMATION_SCHEMA.TABLES.DATA_LENGTH + INFORMATION_SCHEMA.TABLES.INDEX_LENGTH ) / 1024 / 1024, 2) AS "DB Size (in MB)",
    ROUND(SUM( INFORMATION_SCHEMA.TABLES.DATA_FREE )/ 1024 / 1024, 2) AS "Free Space (in MB)"
    
FROM INFORMATION_SCHEMA.TABLES
WHERE INFORMATION_SCHEMA.TABLES.TABLE_SCHEMA = @table_schema;

We’re only querying the INFORMATION_SCHEMA.TABLES table here. This table should give us more than enough details about all tables on the server. Please note that I’ve filtered only tables from the "classes_and_students" database using the SET variable in the first line and later using this value in the query. Most tables contain the columns TABLE_NAME and TABLE_SCHEMA, which denote the table and schema/database this data belongs to.

This query will return the current size of our database and the free space reserved for our database. Here is the actual result:

Size of database

As expected, the size of our empty database is less than 1 MB, and the reserved free space is much greater.

Table Sizes and Properties

The next interesting thing to do would be to look at the sizes of the tables in our database. To do so, we’ll use the following query:

SET @table_schema = "classes_and_students";

SELECT 

	INFORMATION_SCHEMA.TABLES.TABLE_NAME,
    ROUND(SUM( INFORMATION_SCHEMA.TABLES.DATA_LENGTH + INFORMATION_SCHEMA.TABLES.INDEX_LENGTH ) / 1024 / 1024, 2) "Table Size (in MB)",
	ROUND(SUM( INFORMATION_SCHEMA.TABLES.DATA_FREE )/ 1024 / 1024, 2) AS "Free Space (in MB)",
	MAX( INFORMATION_SCHEMA.TABLES.TABLE_ROWS) AS table_rows_number,
	MAX( INFORMATION_SCHEMA.TABLES.AUTO_INCREMENT) AS auto_increment_value

FROM INFORMATION_SCHEMA.TABLES
WHERE INFORMATION_SCHEMA.TABLES.TABLE_SCHEMA = @table_schema
GROUP BY INFORMATION_SCHEMA.TABLES.TABLE_NAME
ORDER BY 2 DESC;

The query is almost identical to the previous one, with one exception: the result is grouped on the table level.

Here is a picture of the result returned by this query:

Result returned by query

First, we can notice that all eight tables have a minimal “Table Size” reserved for table definition, which includes the columns, primary key, and index. The “Free Space” is equally distributed between all tables.

We can also see the number of rows currently in each table and the current value of the auto_increment property for each table. Since all tables are completely empty, we have no data and auto_increment is set to 1 (a value that will be assigned to the next inserted row).

Primary Keys

Each table should have a primary key value defined, so it is wise to check if this is true for our database. One way to do this is by joining a list of all tables with a list of constraints. This should give us the info we need.

SET @table_schema = "classes_and_students";

SELECT 

	tab.TABLE_NAME,
    COUNT(*) AS PRI_number

FROM INFORMATION_SCHEMA.TABLES tab
LEFT JOIN (
    SELECT

        INFORMATION_SCHEMA.COLUMNS.TABLE_SCHEMA,
        INFORMATION_SCHEMA.COLUMNS.TABLE_NAME

    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE INFORMATION_SCHEMA.COLUMNS.TABLE_SCHEMA  = @table_schema
    AND INFORMATION_SCHEMA.COLUMNS.COLUMN_KEY = 'PRI'
) col 
	ON tab.TABLE_SCHEMA = col.TABLE_SCHEMA
    AND tab.TABLE_NAME = col.TABLE_NAME
WHERE tab.TABLE_SCHEMA = @table_schema
GROUP BY 
	tab.TABLE_NAME;

We’ve also used the INFORMATION_SCHEMA.COLUMNS table in this query. While the first part of the query will simply return all tables in the database, the second part (after LEFT JOIN) will count the number of PRIs in these tables. We used LEFT JOIN because we want to see if a table has 0 PRI in the COLUMNS table.

As expected, each table in our database contains exactly one primary key (PRI) column.

PRI

“Islands”?

“Islands” are tables that are completely separated from the rest of the model. They happen when a table contains no foreign keys and isn’t referenced in any other table. This really shouldn’t occur unless there is a really good reason, e.g. when tables contain parameters or store results or reports inside the model.

SET @table_schema = "classes_and_students";

SELECT 

	tab.TABLE_NAME,
    (CASE WHEN f1.number_referenced IS NULL THEN 0 ELSE f1.number_referenced END) AS number_referenced,
    (CASE WHEN f2.number_referencing IS NULL THEN 0 ELSE f2.number_referencing END) AS number_referencing

FROM INFORMATION_SCHEMA.TABLES tab
LEFT JOIN 

-- # table was used as a reference
(
    SELECT 
        INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_TABLE_SCHEMA,
        INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_TABLE_NAME,
        COUNT(*) AS number_referenced
    FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE 
    WHERE INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_TABLE_SCHEMA = @table_schema
    GROUP BY
        INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_TABLE_SCHEMA,
        INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_TABLE_NAME
) f1 
	ON tab.TABLE_SCHEMA = f1.REFERENCED_TABLE_SCHEMA
    AND tab.TABLE_NAME = f1.REFERENCED_TABLE_NAME

LEFT JOIN

-- # of references in the table
(
    SELECT 
        INFORMATION_SCHEMA.KEY_COLUMN_USAGE.TABLE_SCHEMA,
        INFORMATION_SCHEMA.KEY_COLUMN_USAGE.TABLE_NAME,
        COUNT(*) AS number_referencing
    FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE 
    WHERE INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_TABLE_SCHEMA = @table_schema
    AND INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_TABLE_NAME IS NOT NULL
    GROUP BY
        INFORMATION_SCHEMA.KEY_COLUMN_USAGE.TABLE_SCHEMA,
        INFORMATION_SCHEMA.KEY_COLUMN_USAGE.TABLE_NAME
) f2 
	ON tab.TABLE_SCHEMA = f2.TABLE_SCHEMA
    AND tab.TABLE_NAME = f2.TABLE_NAME    
    
WHERE tab.TABLE_SCHEMA = @table_schema;

What’s the idea behind this query? Well, we’re using the INFORMATION_SCHEMA.KEY_COLUMN_USAGE table to test if any column in the table is a reference to another table or if any column is used as a reference in any other table. The first part of the query selects all the tables. After the first LEFT JOIN, we count the number of times any column from this table was used as a reference. After the second LEFT JOIN, we count the number of times any column from this table referenced any other table.

The result returned is:

Table

In the row for the class table, the numbers 3 and 1 indicate that this table was referenced three times (in the lecture, exam, and on_class tables) and that it contains one attribute referencing another table (lecturer_id). The other tables follow a similar pattern, although the actual numbers will of course be different. The rule here is that no row should have a 0 in both columns.

Adding Rows

So far, everything has gone as expected. We have successfully imported our data model from Vertabelo to the local MySQL Server. All tables contain keys, just as we want them to, and all tables are related to each other – there are no “islands” in our model.

Now, we’ll insert some rows in our tables and use the previously-demonstrated queries to track the changes in our database.

After adding 1,000 rows in the lecturer table, we’ll again run the query from the “Table Sizes and Properties” section. It will return the following result:

Table Sizes and Properties

We can easily notice that the number of rows and auto_increment values have changed as expected, but there was no significant change in table size.

This was just a test example; in real-life situations, we would notice significant changes. The number of rows will change drastically in tables populated by users or automated processes (i.e. tables that are not dictionaries). Checking the size of and values in such tables is a very good way to quickly find and correct unwanted behavior.

Care to Share?

Working with databases is a constant pursuit for optimal performance. To be more successful in that pursuit, you should use any tool available. Today we’ve seen a few queries that are useful in our fight for better performance. Have you found anything else useful? Have you played with the INFORMATION_SCHEMA database before? Share your experience in the comments below.

 
 

Try our online database modeler. No registration. No commitments.

 
 
Tags
 
 
Subscribe to our newsletter

If you find this article useful, join our weekly newsletter to be notified about the latest posts.

 
 
 
New SQL Course! Learn the basics of T-SQL for SQL Server, retrieve data from an SQL Server database and build simple reports. View course Discover our other courses: