Tag: ETL

Using Python and MySQL in the ETL Process: Using Python and SQLAlchemy

In the previous two articles of this series, we discussed how to use Python and SQLAlchemy to perform the ETL process. Today we’ll do the same, but this time we’ll use SQLAlchemy without writing SQL commands as text. This lets us use SQLAlchemy regardless of the database engine we’re connected to. So, let’s start.
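Here is a minimal sketch of what an ETL step without textual SQL can look like, assuming SQLAlchemy 1.4 or 2.0 is installed. The tables `sales_raw` and `sales_by_city` and the function `run_etl` are hypothetical, made up for this illustration. The query is built from Python objects rather than a SQL string, so SQLAlchemy renders it in the dialect of whichever engine it's connected to; we use in-memory SQLite so the example runs without a MySQL server.

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, func, insert, select,
)

# Hypothetical source and target tables, used only for illustration.
metadata = MetaData()

sales_raw = Table(
    "sales_raw", metadata,
    Column("id", Integer, primary_key=True),
    Column("city", String(50), nullable=False),
    Column("amount", Integer, nullable=False),
)

sales_by_city = Table(
    "sales_by_city", metadata,
    Column("city", String(50), primary_key=True),
    Column("total_amount", Integer, nullable=False),
)


def run_etl(engine):
    """Extract raw rows, aggregate them per city, and load the summary."""
    with engine.begin() as conn:  # one transaction; commits on success
        # Extract + transform: the GROUP BY query is built from Python objects
        # and rendered in the connected engine's SQL dialect.
        query = (
            select(sales_raw.c.city, func.sum(sales_raw.c.amount).label("total"))
            .group_by(sales_raw.c.city)
        )
        rows = conn.execute(query).all()

        # Load: replace the previous summary with the fresh one,
        # so running the job twice gives the same result.
        conn.execute(sales_by_city.delete())
        if rows:
            conn.execute(
                insert(sales_by_city),
                [{"city": r.city, "total_amount": r.total} for r in rows],
            )


if __name__ == "__main__":
    # In-memory SQLite for the demo; a MySQL URL such as
    # "mysql+pymysql://user:password@localhost/dbname" could be used instead.
    engine = create_engine("sqlite://")
    metadata.create_all(engine)
    with engine.begin() as conn:
        conn.execute(insert(sales_raw), [
            {"city": "Boston", "amount": 10},
            {"city": "Boston", "amount": 5},
            {"city": "Denver", "amount": 7},
        ])
    run_etl(engine)
    with engine.connect() as conn:
        print(conn.execute(select(sales_by_city).order_by(sales_by_city.c.city)).all())
```

Pointing `create_engine()` at a MySQL URL (with a driver such as PyMySQL installed) should leave `run_etl` unchanged, although dialect differences in column types and functions can still surface.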

Using Python and MySQL in the ETL Process: SQLAlchemy

SQLAlchemy helps you work with databases in Python. In this post, we cover what you need to know to get started with this library. In the previous article, we talked about how to use Python in the ETL process. We focused on getting the job done by executing stored procedures and SQL queries. In this article and the next, we’ll take a different approach: instead of writing SQL code, we’ll use the SQLAlchemy toolkit.
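As a first taste, here is a minimal sketch of SQLAlchemy Core, assuming SQLAlchemy 1.4 or later. The `customer` table and the `customers_in` helper are hypothetical. The table is described with Python objects, `create_all()` emits the `CREATE TABLE` statement for us, and the `select()` construct produces the query, so no SQL text is written by hand. In-memory SQLite keeps the example self-contained.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, insert, select

metadata = MetaData()

# Hypothetical table, described with Python objects instead of CREATE TABLE text.
customer = Table(
    "customer",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(100), nullable=False),
    Column("city", String(50)),
)

# In-memory SQLite so the sketch runs without a MySQL server.
engine = create_engine("sqlite://")
metadata.create_all(engine)  # SQLAlchemy generates and runs the DDL

with engine.begin() as conn:  # commits when the block exits without error
    conn.execute(
        insert(customer),
        [{"name": "Ana", "city": "Boston"}, {"name": "Ben", "city": "Denver"}],
    )


def customers_in(city):
    """Return customer names for one city, using a query built from Python objects."""
    stmt = (
        select(customer.c.name)
        .where(customer.c.city == city)
        .order_by(customer.c.name)
    )
    with engine.connect() as conn:
        return list(conn.execute(stmt).scalars())


# Printing a statement shows the SQL that SQLAlchemy generates for it.
print(select(customer.c.name).where(customer.c.city == "Boston"))
print(customers_in("Boston"))
```

The printed statement uses a bound parameter for the city value rather than pasting it into the SQL string, which is the same protection you get from placeholders in hand-written queries.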

Using Python and MySQL in the ETL Process

Python is very popular these days. Since it is a general-purpose programming language, it can also be used to perform the Extract, Transform, Load (ETL) process. Various Python ETL modules are available, but today we’ll stick with the combination of Python and MySQL. We’ll use Python to invoke stored procedures and to prepare and execute SQL statements. We’ll take two similar but different approaches. First, we’ll invoke stored procedures that do the whole job; after that, we’ll look at how to do the same process without stored procedures, by executing MySQL code from Python.
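The second approach boils down to building SQL statements in Python and executing them through a database driver. Because a MySQL server can't be assumed in a self-contained snippet, this minimal sketch swaps in Python's built-in `sqlite3` module; the tables `orders` and `orders_report` and the functions `create_demo_db` and `etl` are hypothetical. The DB-API pattern (cursor, placeholders, `executemany`, `commit`) carries over to MySQL drivers, with two differences worth knowing: MySQL Connector/Python uses `%s` placeholders instead of `?`, and the stored-procedure approach would go through the driver's `cursor.callproc()`, which has no SQLite counterpart.

```python
import sqlite3


def create_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a hypothetical 'orders' source table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT, amount REAL)")
    conn.execute("CREATE TABLE orders_report (id INTEGER PRIMARY KEY, city TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (id, city, amount) VALUES (?, ?, ?)",
        [(1, "  boston ", 10.0), (2, "DENVER", 7.5), (3, "boston", 0.0)],
    )
    conn.commit()
    return conn


def etl(conn: sqlite3.Connection) -> int:
    """Copy non-zero orders into the report table with cleaned city names."""
    cur = conn.cursor()

    # Extract: a parameterized query (values are passed separately, never
    # formatted into the SQL string).
    cur.execute("SELECT id, city, amount FROM orders WHERE amount > ?", (0,))
    rows = cur.fetchall()

    # Transform in Python: trim whitespace and normalize capitalization.
    cleaned = [(order_id, city.strip().title(), amount) for order_id, city, amount in rows]

    # Load: clear the target so the job can be rerun, then bulk-insert.
    cur.execute("DELETE FROM orders_report")
    cur.executemany("INSERT INTO orders_report (id, city, amount) VALUES (?, ?, ?)", cleaned)
    conn.commit()
    return len(cleaned)


if __name__ == "__main__":
    db = create_demo_db()
    print(etl(db), "rows loaded")
    print(db.execute("SELECT * FROM orders_report ORDER BY id").fetchall())
```

In the stored-procedure approach, the body of `etl` would instead live in the database, and Python would only make a single call to run it.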

The Star Schema

Today, reports and analytics are almost as important as core business operations. Reports can be built from your live data; this approach often does the trick for small- and medium-sized companies without lots of data. But when things get bigger – or the amount of data starts increasing dramatically – it’s time to think about separating your operational and reporting systems.

Before we tackle basic data modeling, we need some background on the systems involved. We can roughly divide systems into two categories: operational and reporting systems. Operational systems are often called Online Transaction Processing (OLTP) systems. Reporting and analytical systems are referred to as Online Analytical Processing (OLAP) systems. OLTP systems support business processes. They work with “live” operational data, are highly normalized, and react very quickly to user actions. The primary purpose of OLAP systems, on the other hand, is analytics. These systems use summarized data, which is usually placed in a denormalized data warehousing structure like the star schema. (What is denormalization? Simply put, it’s storing redundant data for the sake of better performance.)
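To make the star schema concrete, here is a small sketch using Python's built-in `sqlite3` module, standing in for MySQL so the example is self-contained. The table and column names (`fact_sales`, `dim_date`, `dim_product`) and the helper functions are hypothetical. A central fact table holds the measures (quantity and amount) plus foreign keys to the dimension tables; the product dimension stores its category name directly instead of referencing a separate category table, which is the kind of deliberate redundancy denormalization describes.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by dimension tables.
STAR_SCHEMA = """
CREATE TABLE dim_date (
    date_id    INTEGER PRIMARY KEY,
    full_date  TEXT NOT NULL,
    year       INTEGER NOT NULL,
    month      INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    category   TEXT NOT NULL  -- denormalized: repeated on every product row
);
CREATE TABLE fact_sales (
    date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    quantity   INTEGER NOT NULL,
    amount     REAL NOT NULL
);
"""


def build_demo() -> sqlite3.Connection:
    """Create the schema in memory and load a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
        (1, "2024-01-15", 2024, 1),
        (2, "2024-02-03", 2024, 2),
    ])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
        (1, "Laptop", "Electronics"),
        (2, "Desk", "Furniture"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        (1, 1, 2, 2000.0),
        (1, 2, 1, 300.0),
        (2, 1, 1, 1000.0),
    ])
    conn.commit()
    return conn


def sales_by_month_and_category(conn: sqlite3.Connection) -> list:
    """Typical OLAP query: join the fact table to its dimensions, then aggregate."""
    return conn.execute("""
        SELECT d.year, d.month, p.category, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_id = f.date_id
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY d.year, d.month, p.category
        ORDER BY d.year, d.month, p.category
    """).fetchall()


if __name__ == "__main__":
    for row in sales_by_month_and_category(build_demo()):
        print(row)
```

Most reporting queries against a star schema follow this same shape: one join from the fact table to each dimension the report needs, followed by an aggregate, with no long chains of joins through normalized lookup tables.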