In my last post, I wrote about ensuring that your data model properly handles global information: numbers, currencies, phone numbers, addresses, dates, and time zones, among other things. However, I’ve realized that many example data models have exactly the “self-centric” or “Amero-centric” approach that I cautioned against.
As an American living abroad (for almost 30 years now), I often find that people make too many assumptions about the universality of what they know. For example, some Americans assume that others automatically understand their country’s ZIP code system, and its supplementary ZIP+4 version. In my experience, most of the world has no idea what a ZIP+4 is.
So I thought that it would be good to write about a more global approach to modelling addresses.
Addresses Known Around the World
Let’s take some well-known and some less well-known addresses around the world and create a data model that supports their storage.
I’ll start with the address of the President of the United States: the White House.
1600 Pennsylvania Ave NW
Washington, DC 20500
Now, 1600 is what is generally referred to as the street number or house number. That’s the number of the house (building) on the street. Pennsylvania Avenue is the name of the street. “NW” is the abbreviation of Northwest and refers to the section of Washington where this particular address is located. Interestingly, 1600 Pennsylvania Avenue SE is a different address altogether; it’s an apartment building.
Washington, then, is the city name. And DC, or District of Columbia, is the “state”; however, DC is special among the American states. Washington D.C. is the capital of the United States, but it is not properly termed a state, and it does not count as one of the 50 American states. (There is a Washington state, but it is located on the completely opposite coast from D.C. You might know it as the home of Starbucks, Boeing, and other famous companies.) Our point is that in the US postal system, “DC” functions as any other state abbreviation.
Now, that number: 20500. In most countries, this would be referred to as a post code or postal code; Americans call it the ZIP code. ZIP (Zone Improvement Plan) codes were introduced in the US in 1963. We see that for the White House, the ZIP code is 20500.
In 1983, the US Postal Service wanted to be more precise so they created an extended ZIP+4 code. This ZIP+4 includes the five digits of the ZIP code, a hyphen, and four additional digits that point to a specific location within the ZIP code. The ZIP+4 of the U.S. President is 20500-0001. The ZIP+4 of the First Lady (the President’s wife) is 20500-0002. And the ZIP+4 of the White House is 20500-0003.
While the ZIP+4 can be quite precise, you will rarely find a ZIP+4 that is used for a single person. They do often refer to a post office box (for a large company), a range of post office boxes, or a neighborhood within a city.
Beverly Hills 90210
Depending on your age, you might know the US TV series 90210 or Beverly Hills 90210. This popular teen soap opera is another good example of the U.S. postal system at work. Some people think that it is the precise address of a neighborhood, a street, or even a particular house.
First, let’s start with some background. Beverly Hills is a city in California’s Los Angeles County. It is home to many Hollywood stars and the upscale shopping street Rodeo Drive (made famous in the 1990 movie Pretty Woman).
One of the ZIP codes in Beverly Hills is 90210, but the city of Beverly Hills also uses other ZIP codes (90209, 90211, 90212, and 90213). The 90210 code refers to one of the posher neighborhoods of Beverly Hills, and therefore to an exclusive neighborhood in an upscale part of Los Angeles.
But the 90210 ZIP code extends beyond Beverly Hills. It includes some of the adjoining Hollywood Hills. You know the hills where the “HOLLYWOOD” sign is located? That’s in the 90210 ZIP code.
The remaining section of this ZIP, stretching north to Mulholland Drive (immortalized in David Lynch’s 2001 psychological thriller), was left as part of the hills of Los Angeles, where it remained anonymous for decades.
Buckingham Palace has a simple address, so if you’d like to write to Queen Elizabeth, just send a note to:
London SW1A 1AA, United Kingdom
In this address, we find an odd type of postal code. In the UK (plus the countries that make up the British Commonwealth and most former British colonies), the postal code is alphanumeric. Contrast this with the all-numeric postal code found in other many other countries, including the ZIP codes of the U.S.
In the UK, the post code usually contains a one or two-letter “city code” named for a local town or an area of London. In our example, this is the “SW”, which refers to the south-western part of London. It is followed by one or two digits signifying a district in that region – the “1A”. Then there is an arbitrary code consisting of one number and two letters, which in this case is “1AA”.
You can find the British Prime Minister at the address known as “Number 10”, which takes its name from the building’s street number:
10 Downing St
London SW1A 2AA, United Kingdom
Like Buckingham Palace, “Number 10” is located in the south-western part of London. Post codes in some UK cities can even help you recognize in what part of the city an address is located.
On the other hand, you can generally find the French President at this address:
55 Rue du Faubourg Saint-Honoré 75008
This example follows a familiar pattern: street number (55) and street name (Rue du Faubourg Saint-Honoré) come first.
Next, we find a 5-digit postal code. In French post codes, the first two digits indicate in which “département” (region) the address is located. For Parisian postal codes, the last three digits indicate the “arrondissement” (neighborhood) of the address. So, in this instance, the postal code indicates the neighborhood of the address.
Less Well-Known (But Complicated) Addresses
Enough simple addresses. In some countries, it is rare to use street numbers. Instead, addresses might specify that a location is next to a well-known site. Or they might just state the name of the building and the neighborhood in which it is located.
Here is the address of a Hyatt hotel in the Indian city of Pune, in the state of Maharahashtra.
Adjacent to Aga Khan Palace, Nagar Road, Kalyani Nagar
Pune, India, 411006
Here we find another postal code, but in this case it is not 4 digits, nor 5 digits, nor 5+4 digits, nor a mix of digits and letters. It’s 6 digits. The Indian postal code starts with a region code, then a sub-region code and a code representing the sorting district. That is not very useful information if you are trying to find this location in person!
In this case, there is no mention of the state in which Pune is located. However, when several cities share the same name but are located in different Indian states, a state can be included in the address.
We find a street name (Nagar Road) and the indication of the neighborhood (Kalyani Nagar), but there is no actual street number. You could walk up and down Nagar Road looking for this location. Thus we find the location is specified as being adjacent to a well-known site – the Aga Khan Palace.
I will take an example from one more Hyatt hotel, which you may have seen in the film “Lost In Translation”:
Park Hyatt Tokyo
3-7-1-2 Nishi Shinjuku, Shinjuku-Ku
Tokyo, Japan, 163-1055
Just to state the obvious, I’m using Western formatting and an English translation rather than Japanese characters here.
Again, you can find elements of neighborhood, city, and postal code in this address. “Nishi-Shinjuku” (西新宿) is the skyscraper business district in the Shinjuku neighborhood of Tokyo.
Now, on what street is this hotel located? From the address, you have no clue. Street names are seldom used, and most Japanese streets do not even have names (except in a few cities, such as Kyoto).
And don’t try walking down the street looking for buildings marked 3-7-1-2. These numbers refer to districts, sections, sub-sections and specific city blocks (in this case, within Tokyo).
The Japanese address system, when written in Western style, starts with the smallest entity (typically a house number) and proceeds to the largest. You will often find “Chome” in the address, which is a city district (丁目 chōme).
Why Self-Centric Address Modelling?
One problem with address modelling is when an author creates a model that only supports his local address properly. I’ve explained this in my other post, so I’ll just summarize the main problems:
- A data model that is designed for a local audience often can’t accept information from a different country. For example, if someone from the UK tried to enter their eight-digit alphanumeric post code into a five-digit, numeric-only American ZIP field, it wouldn’t work.
- An international audience may not understand or use the same terms, formats, and layouts that a local audience does.
So why does this happen? There’s a lack of foresight, but it also has to do with training. If data modelers are not taught to think globally, they’re almost always going to think locally. Not to belabor the point – again, I mentioned it in the post referenced above – but it’s common in many database modelling textbooks to focus on a Amero-centric approach.
Fortunately, thinking globally in terms of data modelling for multi-national addresses is not a hard skill to acquire. It has to do with forethought and flexibility.
The Solution: Global Address Modelling
I made a data model to store global addresses.
There are two generic address line fields (
AddressLine2 of the
Address table) that will store nearly any address. This should work whether we’re storing “10 Downing St”, “adjacent to Aga Khan Palace”, or “3-7-1-2 Nishi Shinjuku”. The risk is that two fields of 70 characters might be too short for some addresses. Still, 140 characters should be enough for almost every situation.
I have used the common convention of two address line fields so that I can make
AddressLine1 mandatory (not nullable) and
AddressLine2 optional (nullable). So this model requires that some address information is given, but it allows for both very short and very long, detailed address.
Next, let’s look at the postal/ZIP code. The
PostCode field is flexible and doesn’t have any constraints on the information that may be stored in it. This would support American ZIP codes, American ZIP+4, 4-digit postal codes, 5-digit postal codes, British post codes (with letters and digits) and many other types of post codes.
Region field stores regions within a country (i.e, states, provinces, departments, etc.) but this field is nullable. Of course, a global address model requires that the country be specified via a 2-character ISO country code (the
Country field in the
Now, please open the model and give it a tweak. That’s why we created the model in the first place. Maybe you can see ways to improve the model. Or maybe you need to adapt it to your specific needs. Please feel free to play with it and make it your own.
What is your address? Can it fit into this
Address table design? If not, let us know! It would be interesting to know what must be extended to support your address.