Cloud databases are becoming more and more popular. That’s why Vertabelo has added support for Google BigQuery, one of the most popular cloud databases. BigQuery is an enterprise data warehouse that enables the storing and very fast processing of huge data sets. It can be useful in data analytics, machine learning, and other areas.
In this article, we’ll explain how to create a physical ER diagram for a Google BigQuery database in Vertabelo. Remember that you can also choose to create a logical ER diagram first and then use that to generate a physical ER diagram. You can read more about moving from logical to physical ERDs here.
Create a Model
First, let’s see how to create a Google BigQuery model in Vertabelo. Log into Vertabelo and click on the Create new document icon.
A new window will appear. Click the Create button next to Physical data model.
In the Database engine section, choose Google BigQuery. In the Content section, choose the model you want to begin with. You have the following options:
- Empty – Starts with an empty diagram with no tables and views.
- Example – Starts with a simple shop diagram.
- From SQL – Imports a DDL file, which will be processed to extract tables and views. This is a useful option if you’re migrating an existing schema to Vertabelo.
- From XML – Imports a Vertabelo XML file. This can be useful for restoring old versions of models created in our Vertabelo.
After you’ve made your choice, click Start modeling.
Note: For more details, see the full documentation on working with tables in Vertabelo.
To create a table, click the Add a new table icon and type the table name. To add columns, click Add column in the Columns section in the right pane. To see column features, click on the arrow to the right of the column’s name and type.
Column features are very limited in Google BigQuery: you can only set a data type and add a comment.
To set a column data type, click on the gear icon .
You should see all the available data types:
Data types are grouped into five categories:
- Numeric (INT64, NUMERIC, FLOAT64).
- String (STRING).
- Date and time (DATE, TIME, DATETIME, TIMESTAMP).
- Large (BYTES).
- Other (BOOL, STRUCT, ARRAY, GEOGRAPHY).
Note that complex data types, such as ARRAY and STRUCT, can be nested:
ARRAY<INT64> STRUCT<INT64, STRING> ARRAY<STRUCT<INT64, INT64>> ARRAY<STRUCT<ARRAY<INT64>>>
To use complex data types, manually enter them in the Type field:
Vertabelo understands nested data types for BigQuery.
To see additional table properties, select the table. In the right pane, scroll down to the Additional properties section.
You should see all the table properties that are available:
Click Set to add a property.
Click Unset to remove it.
Let’s briefly discuss the role of each table property in BigQuery.
- Project name – The name of the project in which the table will be placed.
- Dataset name – The name of the dataset in which the table will be placed. This is the equivalent of a relational database schema. Together with Project name, this creates a full path that identifies the table in the database.
- Partition by – Determines how to partition the table. A partitioned table is split into separate segments, which are called partitions. Dividing a large table into smaller partitions makes it easier to improve query performance and control costs, as optimized queries read a smaller amount of data.
- Cluster by – Determines how to cluster the table. This property is a comma-separated list that can contain up to four clustering columns. Note that the order of the columns is important. When the table is clustered, the data inserted is sorted based on the clustering columns. Later on, when the data is queried, this can improve performance, as not all the rows will be scanned.
- Expiration timestamp – The timestamp when the table will expire and will be deleted. If this is not set, the table will exist indefinitely.
- Partition expiration days – The number of days until the partitions created in this table will expire and be removed. This property can only be set for partitioned tables.
- Partition filter – If this property is set to yes, all queries to this table need to include a partition filter. The role of the partition filter is to eliminate partitions that should not be queried. This property can only be set for partitioned tables.
- KMS key name – The name of the Cloud KMS (key management service) key that is used for table encryption.
- Friendly name – A descriptive name for the table that will be easy for humans to read and remember.
- Labels – Key-value pairs associated with the table. They are useful for organizing and grouping resources; you can search the resources based on label values.
Note: As there are no primary keys in BigQuery, you cannot set a primary key in the table options.
Note: You can read the full documentation on working with views in Vertabelo for more details.
To add a view, click the Add new view icon.
In the right pane, fill in the view name and SQL query.
Scroll down and click Update columns.
A new window will appear. Verify the columns that will be generated based on the SQL query you provided. Then, click Update columns.
For additional view properties, select the view. In the right pane, scroll down to the Additional properties section.
You should see all the view properties that are available:
Click Set to add a property.
Click Unset to remove it.
Let’s briefly discuss the role of each view property in BigQuery.
- Project name – The name of the project where the view will be used.
- Dataset name – The name of the dataset where the view will be placed. (In BigQuery, a dataset is the equivalent of a relational database’s schema.) Together with Project name, it creates a full path that identifies the view in the database.
- Expiration timestamp – The timestamp when the view will expire and be deleted. If this is not set, the view will exist indefinitely.
- Friendly name - A descriptive name for the view that’s easy for humans to read and remember.
- Labels - Key-value pairs associated with the view, which are useful for organizing and grouping resources. You can search the resources based on label values.
To generate documentation for a Google BigQuery database, click on the Generate model documentation icon in the top menu bar.
Next, choose one of the three available formats (PDF, HTML or DOCX) and click Generate.
After the file is generated, click Download.
BigQuery documentation contains lists of the following database objects:
- Tables and their columns
Upload a DDL File into Vertabelo
It can be useful to generate a model based on an existing DDL file. Google BigQuery does not generate a DDL file, but you can upload a Vertabelo-generated DDL file into our tool.
To import a DDL file into Vertabelo, click on the model name, then click Import, followed by Import from SQL.
Generate a Google BigQuery Migration Script
Vertabelo can automatically generate migration scripts for Google BigQuery databases. They can be useful when the database model has more than one version and you want to update your database from one version to another. Migration scripts contain SQL statements that update the database structure. You can learn more about migrations in Vertabelo in our documentation and on our blog.
In our example, we have added a new view. To generate a migration script, open Drive, right-click on the model name, and click Model migration.
Alternatively, click the Generate migration SQL icon.
A new browser tab will open. On the left, select the model and version from which you will migrate. On the right, select the model and version to which you will migrate. Note that you can select versions from different models. (In that case, make sure that the Show all models option is checked.)
Scroll down, select additional options and click Generate.
Verify the SQL statements generated and click Download.
Note that if you modify the table or view, the migration script generated by Vertabelo will drop the old table/view and create a new one.
Design Your Own BigQuery Schema with Vertabelo
In this article, we have explained all the BigQuery features available in Vertabelo. Equipped with this knowledge, you can now make the most of our tool when creating an ER diagram for BigQuery.