Storing sales data properly and later combining it can lead to creating a predictive model with a high rate of accuracy. In this and the next few articles we’ll analyze a database design for recording sales.
Everyone lives by selling something.
Robert Louis Stevenson
In today’s world, selling products is ubiquitous. And salespeople who have access to robust tools that leverage historical data to analyze trends and enable an enterprise to adjust business strategies accordingly have an advantage over their competitors. There are lots of parameters that can affect company results: the current global economic situation, clients’ location, age, material and marital status, and history of previous contacts or sales to clients.
We’ll start with a very simple example: a database model for sales in a coffee shop. In subsequent articles, we'll extend the model to selling products in other branches.
In this article we’ll analyze just a part of model that contains sales data with other parts missing.
We still have connections to missing tables and we’ll look at the model as a black-box assuming that the following is correct for table
user_has_role_idrefer to id in
user_has_role(as presented in my previous article in the section “Time component added”) and stores information about the user that created sale record
This model enables us to create sales records with multiple items. Each item is related to a product from our catalog. The moment when we generate a sale can be different from the moment when the sale is paid for. For example, for a cup of coffee these moments will differ in a matter of minutes or hours. If our shop sold telecommunication devices, the difference can be a few days, maybe even months.
Let’s take a look at table definition and explain the purpose and usage of attributes.
The most important table in the model is
product. It’s used to store details about products we’ll offer to our clients. Products are usually delivered to a client and paid for once, usually at delivery time. Also, products are usually physical objects like cars, phones, packages of sugar, or cups of coffee.
We’ll talk about selling non-physical things (services) in the next articles.
Attributes in the
product table are:
name– the name of the product in system
price_per_unit– cost of product per unit (e.g., 1 cup of coffee costs 1.8 Euro, 1 car costs 17,500 Euro, 1 kg of rice costs 2 Euros)
basic_unit– base unit when we’re selling a product (e.g., piece, kg, liter)
tax_percentage– percent of the price_per_unit to be charged as tax. We must assume that tax percentage wouldn’t be the same for all products
limited– this field is set to True if we have a limited quantity on stock and False otherwise (e.g., we can order any quantity we need for our store from a distributer)
in_stock– if limited=True this attribute shows how many we have available to sell
active_for_sale– if this attribute is False than we’re currently not offering that product for sale, otherwise we can offer it to clients
We can get a list of products we can offer to clients with the following query:
SELECT product.id, product.price_per_unit, product.basic_unit, product.limited, product.in_stock FROM product WHERE product.active_for_sale = True AND (product.limited = False OR (product.limited = True and product.in_stock > 0))
sale_status is just a simple dictionary that contains all statuses that a sale can have in the system (e.g., “receipt issued”, “receipt paid”).
saleis the second most important table in this model. So far, this table has no connection with clients to whom we sold products because, in our coffee shop example, we don’t need to know such information. In part 2, the model will be extended to cover such cases.
Attributes in the table and their meanings are:
time_created– time when a sale record was generated in the system (e.g., automatic time that the record was created when we generated a sale for coffee in our coffee shop or a manually added time if we want it so)
time_paid– generally we can expect that some sales will be paid within a few days or even a month after creation (e.g., if we deliver software and create a receipt we can wait up to 90 days to get paid in some countries, if everything goes by the law)
sale_amount– original amount intended to be charged to the client
sale_amount_paid– amount that the client actually paid. It can be null because at the moment we create a receipt we don’t always have this information
tax_amount– sum of all tax amounts for items on that receipt
sale_status_id– reference to
user_has_role_id– reference to user and his role at the moment he or she entered the receipt into system
We can get the issued and paid amount (according to time_created) within a period of time with query like this:
SELECT SUM(sale.sale_amount) AS amount_issued, SUM(sale.sale_amount_paid) AS amount_paid FROM sale WHERE sale.time_created >= @start_time AND sale.time_created <= @end_time;
To get the exact amount paid within a period of time we must use a query like this:
SELECT SUM(sale.sale_amount_paid) AS amount_paid FROM sale WHERE sale.time_paid >= @start_time AND sale.time_paid <= @end_time;
The query below will calculate the issued and paid amount within a period of time with the issue date and payment date checked separately:
SELECT SUM(CASE WHEN sale.time_created >= @start_time AND sale.time_created <= @end_time THEN sale.sale_amount END) AS amount_issued, SUM(CASE WHEN sale.time_paid >= @start_time AND sale.time_paid <= @end_time THEN sale.sale_amount_paid END) AS amount_paid FROM sale
In all examples
@end_time are variables containing the start time and end time of period for which we want to check issued and paid SUM.
sale_item connects products and sales. Of course, we must assume that we’ll have multiple items on one receipt so we need this table to have a many-to-many relationship.
Attributes and theirs meanings are:
quantity_sold– quantity of product that was sold and is charged on that sale/receipt (e.g., 3 coffees)
price_per_unit– same value as
product.price_per_unitat the moment when the sale was created. We have to save it because
producttable can change over time
price– product of
price_per_unit; a small redundancy that helps us to avoid this calculation in queries. Generally, the sum of all item prices belonging to the same sale should be equal to the
tax_amount– tax amount for that item on receipt
sale_id– id of sale that this item belongs to
product_id– product id related to this item
We could now easily make a simple report, how many products/services we sold in period and at what price.
SELECT product.name, SUM(sale_item.quantity_sold) AS quantity, SUM(sale_item.price) AS price FROM sale, sale_item, product WHERE sale.id = sale_item.sale_id AND sale_item.product_id = product.id AND sale.time_created >= @start_time AND sale.time_created <= @end_time GROUP BY product.id