Back to articles list

August 18, 2016 - 10 minutes read

Design patterns

Does Evolving Contact Information Mean Changing Your Database?

Jean-Marc Reynaud

Freelance Data Modeler

Tags:

There are a number of ways to contact someone these days, right?

We have various phones: mobile and landline, personal and work. We have different addresses – residential, mailing, billing, business, etc. – and likely several email addresses, too. Don’t forget Skype and various messaging apps. Now add in LinkedIn and Facebook –which by the way, both have their own messaging elements.

Not that long ago, many of these didn’t exist. So you can pretty much guarantee that in a few years, we’re going to have some new way of contacting people and organizations.

Can we model all of this contact info in such a way that we don’t have to change our database design when ‘the latest thing’ comes along? Read on to find out…

The Party Contact Point Model

In a word, yes. Databases can be designed to accommodate information we don’t even have yet.

I’m going to jump straight in and show you the solution, then I’ll describe how the pieces work together. I’m going to call the various ways of contacting parties contact points, though I have seen contact methods and even contact locations used.

Physically, all these contact points will be stored in a single table column, contact_point.contact_value. Think of a phone number, an email address, or a web address (URL) and you’ll understand why we can store them all here; they’re just strings (varchars) at this level. The differentiation is in the metadata. The only exception to this is the postal address, which will be described in more detail later.

The yellow tables on the left contain metadata, and the blue tables on the right contain business data.

The Major Categories

Although we have many ways of contacting someone, these ways actually fall into a small number of categories, or types. You’ll see what I mean when you look at the list below:

Contact Point Type
Phone Number (landline)
Mobile Number
Fax Number
Email Address
Postal Address
Web Address
Pager

In a sense, these are physically distinct. Of course, you can use a mobile phone to call a landline or another mobile. When it comes to voice calls between landlines and mobiles, the distinction is not that important. Still, we’re more likely to send a text (SMS) to a mobile than a landline.

But you’re not likely to deliberately voice call a fax number. After all, what are you going to say to it when you hear it, apart from ‘Oops, wrong number’? You are naturally much more likely to call with another fax machine, whether it’s physical or emulated. Neither would you send a letter to landline, or attempt to make a voice call to a postal address.

It’s important that we distinguish these types, because we interact differently with them. This will be especially true if your application has any sort of integration with communications services. It needs to know which type to interact with.

How Parties Use Contact Points

This is probably a bit more intuitive, a bit more in line with how we think about contact types. Here’s a longer list (but not an exhaustive one!) that will help you get a feel for these types:

Party Contact Type (Contact Point Type)
Conference line (Phone Number)
Billing address (Postal Address)
Delivery address (Postal Address)
Direct line (Phone Number)
Holiday/vacation address (Postal Address)
Holiday/vacation phone (Phone Number)
Home address (Postal Address)
Home phone (Phone Number)
Home phone/fax (Phone Number)
LinkedIn profile (Web Address)
Main address (Postal Address)
Main email (Email Address)
Main fax (Fax Number)
Main phone (Phone Number)
Main website (Web Address)
Personal email (Email Address)
Personal fax (Fax Number)
Personal mobile (Mobile Number)
Personal pager (Pager)
Personal website (Web Address)
Secondary address (Postal Address)
Secondary phone (Phone Number)
Social media profile (Web Address)
Work address (Postal Address)
Work email (Email Address)
Work fax (Fax Number)
Work mobile (Mobile Number)
Work phone (Phone Number)

The Postal Address – A Special Case

All of these contact point types are stored in a single field, with the exception of a postal address. This normally requires a number of lines (or fields).

There is a blog article here that proposes a simple, language-agnostic way to store postal addresses. If your requirements are rather basic – e.g. to print address labels pretty much as they are entered into the system – this approach will likely suffice. If your needs are more sophisticated, you will probably have to develop a different solution.

To get an idea of how complex addressing can be, have a quick look at this Schema for British Standard BS7666 Address Types. The standard comprises a number of parts covering Street Gazetteers, Land and Property Gazetteers, and delivery points. It does not differentiate between commercial or residential properties; between occupied, developed or vacant land; between urban or rural areas; or between postally-addressable entities and non-postally-addressable entities such as communications masts (towers). To achieve this, it introduces terms that most of us are probably not familiar with, such as Primary Addressable Object (PAO), which is the name given to an addressable object that can be addressed without reference to another addressable object. Familiar examples of PAOs include a building name or a street number. A Secondary Addressable Object (SAO) is given to any addressable object that is addressed by reference to a PAO. This might be the first floor of a named building.

To give us a visualization of this, I quickly reverse-engineered it into a UML modelling tool. Here’s what we get:

My point is that it can get pretty complicated and messy; addressing in some domains can be very complex indeed.

If you were to flatten this out into a single relational table, you’d get something like the following:

While this captures BS7666 address components, it doesn’t tell you how the model works. All the relational logic of the XML schema gets hidden away in application logic.

These two diagrams represent two data modeling extremes. But is there a middle way to model addresses?

It is indeed possible to have a relatively simple address model that is flexible and configurable.

Address Components

An address component is typically a line on an address label, or rather a type of line on an address label. The sort of components we’d typically use for UK addresses are listed in the following table:

Address Component Type
Addressee
Area
Building Name
Building Number
Country
County
Department Name
Dependent Locality
Dependent Thoroughfare Name
Double Dependent Locality
International Post Code
Level
Locality
Mailsort SSC
Organization Name
PAO End Number
PAO End Suffix
PAO Start Number
PAO Start Suffix
PAO Text
PO Box
Post Code
Post Town
Postcode
Postcode Type
SAO End Number
SAO End Suffix
SAO Start Number
SAO Start Suffix
SAO Text
Street
Street Description
Sub-Building Name
Thoroughfare Name
Town

You could have three or four address lines, plus the post town and postcode. However, the difficulty you will encounter is identifying what these lines actually contain when it matters – e.g. when mapping data between systems. When you carry out data profiling, you will find that Address Line 3 sometimes contains a dependent locality, but at other times it contains a county or locality. Now you’re into natural language processing (NLP); you have to recognize the difference between locality and county. And the permutations multiply as you add more countries.

So we must define all the address components for all the countries we operate in.

Address Formats

Address formats are made up of two parts: a header and its detail. The header is basically the name or title that the address format is known by. Examples could include:

Address Format Type
Generic 3-line
Generic 5-line
British Forces Post Office (BFPO)
International
Post Office Address (PAF)
U.S. Address
French Address

Taking the UK’s Full Post Office Address Format (PAF) by way of example, we then define the following address format components:

Format	Component	Sequence	Is Mandatory?
PAF	Addressee	1	N
PAF	Organization Name	2	N
PAF	Department Name	3	N
PAF	PO Box	4	N
PAF	Building Name	5	N
PAF	Sub-Building Name	6	N
PAF	Building Number	7	N
PAF	Thoroughfare	8	N
PAF	Street	9	N
PAF	Double Dependent Locality	10	N
PAF	Dependent Locality	11	N
PAF	Post Town	12	Y
PAF	Postcode	13	Y

Our application reads this metadata and displays the address components in the correct order. When address capture is required, the metadata tells us whether the address component is mandatory or not.

More often, our application requests the postcode from the end user and looks up the corresponding values and populates the address components automatically. Some applications allow the user to edit the address; other [annoying] ones don’t!

It’s not shown in the PDM, but if your organization operates internationally, you can define a many-to-many relationship between address_format_type and country so that the correct address format (based on the user’s country) is presented to the end user (party).

When and only when the contact_point is a postal address contact_point_type, it must have a relationship to an address_format_type. Conversely, it follows that non-postal address types never have a relationship to an address_format_type. Furthermore, the format must remain fixed for the life of the contact_point, otherwise you’ll introduce the possibility of data integrity issues. (For this not to be the case, the target address_format_components must be a subset of the source address_format_components).

The column contact_value has no meaning for a postal address because the values are stored in address_line.line_content. Conversely, contact_value is mandatory for all other contact_point_types. Basically, contact_point.contact_value and address_line.line_content are mutually exclusive.

The Many-To-Many Relationship Between Party and Contact Point

You can think of contact_point (plus address_line) as containing the values and party_contact as defining the usage. This allows a single contact_point to have multiple uses. Our home [postal] address could also be our billing address and delivery address, depending on the context.

So far, the narrative has assumed that a party owns a particular contact_point. But the data model does not impose this ownership rule! It makes no such restriction whatsoever. There’s another possibility that exists with this design: multiple parties for the same contact points.

You need to consider the implications carefully before venturing down this route.

Here’s an example. In the UK, Awarding Organizations (AOs) generally employ teachers as examiners. A teacher has two relationships: one with the school where he or she works, and another with the AO as an examiner. The school will have a bank of contact_points with various phone numbers and possibly one or more postal addresses. These will be things like the school’s main address (postal address), main email (email address), main fax (fax number), and main phone (phone number).

It is entirely feasible that our examiner can use the very same contact_points as his or her school, but he or she will use party_contact to define them as work-related. If the school’s main phone number changes, the teacher’s work number will automatically be updated, which is pretty neat.

If you go down this route, you will need to define at the application level which party or parties are permitted to update contact_points.

A Quick Word On Performance

The yellow metadata tables are going to be constantly used by queries. Consequently, they are likely to remain in memory. On most RDBMSs, you can pin tables into memory to ensure this. In Oracle, I would create these as index-organized tables, which are small and perform well. Do whatever the equivalent is for your RDBMS.

You also want to ensure that party_contact rows are co-located in the same block (or page) using a clustered index on party_id. Do the same with address_line.contact_point_id. This cuts down on the amount of IO.

Another option exists if you want a party to exclusively own a contact_point. You can then can merge contact_point into party_contact to create party_contact_point (still clustered on party_id). This simplifies the model and could aid performance.

Changing Contacts Doesn’t Mean Changing Databases

We live in a time when it can be said that change is the only constant.

That doesn’t mean that each time something changes it has to impact your database. With a bit of thought, we can future-proof our designs – perhaps more than we have done to date. Doing so helps us to respond quickly to the inevitable change.

If you’re embarking on a green-field project, I would recommend using the Party Model (of which Contact Point is a part) for organizations and people. Why not open up the model and tweak it to your needs? Please feel free to grab a copy and make it your own.

But if your database or databases are already determined, the schema I’ve presented here can still be used, in XML form, to define your payload when integrating data between systems.

Tags:

The Party Contact Point Model

The Major Categories

How Parties Use Contact Points

The Postal Address – A Special Case

Address Components

Address Formats

The Many-To-Many Relationship Between Party and Contact Point

A Quick Word On Performance

Changing Contacts Doesn’t Mean Changing Databases

You may also like

Party Relationship Pattern. How to Model Relationships

Identifying the Bill of Materials (BOM) Structure in Databases

The Reference Data Pattern: Extensible and Flexible

Database Modeling Tips

The Star Schema

Tips for Better Database Design