Very few database authors mention the challenges of globalization and localization in any meaningful way. There’s a similar lack of foresight from database architects. The fact is that many authors and designers are frequently very ‘self-centric’: they create (or write about) data models that only properly handle their local time zones, addresses, etc.
A self-centric approach has a big problem: the resulting model will only support local data. In today’s Internet-fueled world, applications are often unexpectedly accessed by users around the globe. We need to support as much flexibility as possible for this international audience. Therefore, we need to design our data models with a globalized approach.
I’m fortunate enough to work in a very multi-national and multi-lingual environment, so I’ve learned how globalization issues can be handled at the project’s start. With that in mind, I’ve put together seven important points for creating a data model that will support international use.
1. Number Formatting
There are two things to consider when you look at number formatting: the output that users see (i.e. the format), and the underlying data type.
You shouldn’t need to worry about how numbers will be displayed in your data model – the database system will handle the storage of decimal numbers and your application should adjust to how decimal numbers are displayed (“.” or “,” as the decimal point, for instance). Similarly, you should not have to worry about which thousands separator (such as a period or comma) your data model will use.
Here’s the point: Choose your data types correctly when modelling. Your application should handle the output formatting.
For example, in this simple model of a weather station application, the weather measurements (temperature, humidity, rainfall) are stored as floating-point numbers. But the price information is in decimal, similar to the GPS coordinates of each weather station.
2. Currencies and Exchange Rates
If you are storing information related to currencies, then you must store the correct number of decimal places for each currency. Most currencies have two decimal places, but some have none (i.e. the Chilean peso), one (the Malagasy ariary), three (the Tunisian dinar) or even four decimal places (Chile’s Unidad de Fomento, a unit of account used to express a range of price values.)
So, be sure that your “amount” fields in the data model support more than two decimal places – while four decimal places is very rare, it does happen. Three decimal places is more common. For example, dinars in six different countries (Bahrain, Iraq, Jordan, Kuwait, Libya, Tunisia) and rial in one country (Oman) require three decimal places.
Point Number 1: Choose your data type correctly when modelling.
Another important point related to currencies is exchange rates. These demand even more precision. Many systems only provide 4-6 significant digits in exchange rates; sometimes there is a scaling factor between currencies which have vastly different values. However, four or six significant digits does not necessarily mean there will be six decimal places. Check the exchange rate between Indonesian Rupiah and Euros: 0.0000668755. That’s a lot of decimal places to store in your exchange-rate field.
Point Number 2: Your model may need to handle a high level of precision – a lot of decimal places – when it comes to exchange rates.
Below we’ve created an example of an online store with prices. We’ve also added a simple table (highlighted in aqua) that stores currency exchange rates, including historical exchange rates. Each row in the
exchange_rate table is linked to a currency (
currency_id, which is the ISO 4217 currency code). We allow one exchange rate to be stored for each currency on a particular day (
rate_date), and have one active exchange rate for each currency.
Obviously, you’ll need some additional information to use this rate table. For example, against what base currency are these exchange rates defined? Euros or US dollars might be typical, but your application would need exact information here.
Alternatively, a more complex model could store currency-pairs, the mid-rate (or bank rate) and the ‘buy’ and ‘sell’ rates between those currency pairs.
Point Number 3: Your model needs to have enough information so that the application can properly use the data.
3. Phone Numbers
I have often seen systems that only support a North American ten-digit phone number with a three-digit area code, three-digit exchange, and four-digit subscriber number (i.e. 012-345-6789). This bias is understandable to some extent; people create systems that support their local users. However, data modeling should not ignore the possibility that global users might access your system. (Note: The ten-digit length can also be used for other numbers, such as French mobile phones, but the format is different (i.e. 06 12 34 56 78).)
Let’s take this as an example: Suppose I live just outside of the French border, but I work in France. Therefore, while I might need to use French applications and service providers, my mobile phone number is not a French number. Systems that require a ten-digit mobile number starting with 06 or 07 are not going to work for me. In order to get French railway tickets, buy tickets to a concert in France (etc, etc), I would be forced to get a French phone number. Bothersome, to say the least.
Here’s the point: When phone number constraints are built into the data model, data model modifications will be required to support non-local users. Ideally, enough flexibility should be incorporated into the model to handle all eventualities.
A more logical data model would support phone numbers of different lengths (up to 16 digits in some areas) and non-numeric characters (like the universal “+” symbol for an international phone number). Certainly, some applications can perform more validation by implementing “local rules” that local developers would be more familiar with. Other apps might use local phone numbers to access other data sources, such as verifying or retrieving an address based on a phone number.
The data model should support flexibility in storing information. The application or its UI can be more restrictive or perform additional validation.
As an American living abroad, I often find data model examples and patterns that are too Amero-centric. For example, a non-American might not understand what a Zip+4 is and therefore would have no understanding of why an author states that this domain must have a NOT NULL characteristic.
This Amero-centric view is even present in books. For example, take the quite extensive book “Data Model Patterns. Conventions of Thought” by David C. Hay. Mr. Hay’s very complex explanation of addresses, sites, geographic locations, land parcels, and geographic structure elements included examples from Canada, but even so, this information may not be readily grasped by everyone.
Mr. Hay’s patterns say the address attributes will include:
The "text" of the address, plus at least "city", "state", and "postal (ZIP) code".
Now, Mr. Hay is quick to explain in a footnote that:
The context of the model will determine whether this attribute is "ZIP code" or "postal code". If the client organization will operate entirely within the United States for the foreseeable future, the assumption of a nine-digit, two-part numeric "ZIP code" can be made. If not, "ZIP code" must become "postal code" and no formatting assumptions are possible.
However, he fails to mention that “state” might be a state in the US, a province in Canada, or a nullable attribute for almost any other country, as “states” within countries rarely exist outside of the US, Canada (where they are called provinces but function similarly) and Australia. Certainly, other countries have provinces and regions, but these are rarely used as part of an address.
To illustrate how serious this address issue can be, I made a data model for both American and non-American addresses. (Note: This is not the complete model.)
PrimaryPhone of the
US_Customer table only stores phone numbers with ten or fewer characters. The international design of the
PrimaryPhone attribute allows a phone number of 15 digits plus “+”, which is the maximum specified by E.164.
TimeOffset attribute in the
US_Customer table only allows four time zones: Eastern Time, Central Time, Mountain Time and Pacific Time (stored in the data model as: 0 = Eastern, 1 = Central, 2 = Mountain, 3 = Pacific). Incidentally, this doesn’t even cover all the time zones in the US and its territories. In contrast, the
Timezone attribute in the
Customer table stores the international code for the customer’s time zone regardless of where it is.
Next, let’s look at the postal/Zip code. The US requires a 5-digit ZIP code (
Zip of the
US_Address table) plus the optional ZIP+4 (
Zip4 of the
US_Address table). The
Address table has a more flexible
PostCode field. Aside from length, there is no constraint on the information that may be stored in
PostCode; of course, the application could implement checks on postal codes.
Notice also that the US and the non-US designs both have a field for regions within a country (
State in the
US_Address table and
Region in the
Address table), but the US design requires that a 2-character state abbreviation is included. Also, notice that the US design does not accept international addresses, while the international address version does (hence the 2-character ISO country code Country of the
Here’s the point: Don’t limit your data model of addresses to one locality; leave enough room for different styles.
5. Date and Time Formatting
Data models shouldn’t be concerned with multiple date and time formats; the application handles the actual display. This can be done in various ways:
- The month-first style, common in North America and elsewhere: mm-dd-yyyy
- The day-first style, which is more common in Europe: dd-mm-yyyy
- The year-first style, used as the ISO 8601 date format: yyyy-mm-dd
Here’s the point: This might be repetitive, but we’ll say it again: choose your data types correctly when modelling. This will make it easier for the application code to interpret and display stored values.
Another item in this category might be a bit unexpected: what day the week starts on. For some, this is Sunday; for others, it is Monday (and then there’s the Persian calendar, which starts the week on Saturday).
Times must also be displayed in a user-friendly way. Remember that while your data model doesn’t need multiple time formats, you might store the user’s time preference, i.e. the 12- or 24-hour format.
This leads us to time zones.
6. Time Zones
It’s not unusual to find apps that only allow users a few time zone choices: Eastern Time, Central Time, Mountain Time and Pacific Time. When I see that, I know that I am dealing with an Amero-centric application designer. Some designers allow a time zone to be expressed as an offset from (usually) GMT or UTC. However, many make the mistake of allowing only whole-number offsets, not realizing that some countries (India, Iran, Pakistan, Afghanistan, and others) are not an integer offset. They are fractionally different: India Standard Time is UTC+5:30. Some locations even have a smaller fractional offset, such as Nepal Standard Time – it’s UTC+05:45.
Some time ago, I wrote about a model for an online survey database. Here I have added a time zone to the user table so that we can display dates/times in the users’ local times.
For more information about dates, times, time zones, you can refer to the ISO 8601 standard representation of dates and times or this Wikipedia article.
Here’s the point: Learn to think globally, not just locally.
7. Multi-Lingual Support
There are numerous times when your application may need to provide multi-lingual support. From a data model perspective, you might need to have information stored in multiple languages; however, most of the linguistic handling should be covered in your application. The implementation of multi-lingual support is beyond the scope of this article, but we have already discussed it in this blog.
Localization is very important and needs to be handled properly. As we’ve already pointed out, this means more than just supporting diverse languages; it’s also about the preferred formats for dates, times, currencies, and even decimal indicators.
Data Modeling? Think Globally
While creating your data model, consider the potential international usage of your application and its database. Think globally during the design phase and you’ll avoid some problems later on – a phone number field that only accepts 3-digits + 3-digits + 4-digits works fine in the USA, but not so well in China or India.
Are your databases prepared to go global? Your goal should be to enable flexibility without creating overwhelming complexity.