Does Evolving Contact Information Mean Changing Your Database?
There are a number of ways to contact someone these days, right?
We have various phones: mobile and landline, personal and work. We have different addresses – residential, mailing, billing, business, etc. – and likely several email addresses, too. Don’t forget Skype and various messaging apps. Now add in LinkedIn and Facebook –which by the way, both have their own messaging elements.
Not that long ago, many of these didn’t exist. So you can pretty much guarantee that in a few years, we’re going to have some new way of contacting people and organizations.
Can we model all of this contact info in such a way that we don’t have to change our database design when ‘the latest thing’ comes along? Read on to find out…
The Party Contact Point Model
In a word, yes. Databases can be designed to accommodate information we don’t even have yet.
I’m going to jump straight in and show you the solution, then I’ll describe how the pieces work together. I’m going to call the various ways of contacting parties contact points, though I have seen contact methods and even contact locations used.
Physically, all these contact points will be stored in a single table column,
contact_point.contact_value. Think of a phone number, an email address, or a web address (URL) and you’ll understand why we can store them all here; they’re just strings (varchars) at this level. The differentiation is in the metadata. The only exception to this is the postal address, which will be described in more detail later.
The yellow tables on the left contain metadata, and the blue tables on the right contain business data.
The Major Categories
Although we have many ways of contacting someone, these ways actually fall into a small number of categories, or types. You’ll see what I mean when you look at the list below:
|Contact Point Type|
|Phone Number (landline)|
In a sense, these are physically distinct. Of course, you can use a mobile phone to call a landline or another mobile. When it comes to voice calls between landlines and mobiles, the distinction is not that important. Still, we’re more likely to send a text (SMS) to a mobile than a landline.
But you’re not likely to deliberately voice call a fax number. After all, what are you going to say to it when you hear it, apart from ‘Oops, wrong number’? You are naturally much more likely to call with another fax machine, whether it’s physical or emulated. Neither would you send a letter to landline, or attempt to make a voice call to a postal address.
It’s important that we distinguish these types, because we interact differently with them. This will be especially true if your application has any sort of integration with communications services. It needs to know which type to interact with.
How Parties Use Contact Points
This is probably a bit more intuitive, a bit more in line with how we think about contact types. Here’s a longer list (but not an exhaustive one!) that will help you get a feel for these types:
|Party Contact Type (Contact Point Type)|
|Conference line (Phone Number)|
|Billing address (Postal Address)|
|Delivery address (Postal Address)|
|Direct line (Phone Number)|
|Holiday/vacation address (Postal Address)|
|Holiday/vacation phone (Phone Number)|
|Home address (Postal Address)|
|Home phone (Phone Number)|
|Home phone/fax (Phone Number)|
|LinkedIn profile (Web Address)|
|Main address (Postal Address)|
|Main email (Email Address)|
|Main fax (Fax Number)|
|Main phone (Phone Number)|
|Main website (Web Address)|
|Personal email (Email Address)|
|Personal fax (Fax Number)|
|Personal mobile (Mobile Number)|
|Personal pager (Pager)|
|Personal website (Web Address)|
|Secondary address (Postal Address)|
|Secondary phone (Phone Number)|
|Social media profile (Web Address)|
|Work address (Postal Address)|
|Work email (Email Address)|
|Work fax (Fax Number)|
|Work mobile (Mobile Number)|
|Work phone (Phone Number)|
The Postal Address – A Special Case
All of these contact point types are stored in a single field, with the exception of a postal address. This normally requires a number of lines (or fields).
There is a blog article here that proposes a simple, language-agnostic way to store postal addresses. If your requirements are rather basic – e.g. to print address labels pretty much as they are entered into the system – this approach will likely suffice. If your needs are more sophisticated, you will probably have to develop a different solution.
To get an idea of how complex addressing can be, have a quick look at this Schema for British Standard BS7666 Address Types. The standard comprises a number of parts covering Street Gazetteers, Land and Property Gazetteers, and delivery points. It does not differentiate between commercial or residential properties; between occupied, developed or vacant land; between urban or rural areas; or between postally-addressable entities and non-postally-addressable entities such as communications masts (towers). To achieve this, it introduces terms that most of us are probably not familiar with, such as Primary Addressable Object (PAO), which is the name given to an addressable object that can be addressed without reference to another addressable object. Familiar examples of PAOs include a building name or a street number. A Secondary Addressable Object (SAO) is given to any addressable object that is addressed by reference to a PAO. This might be the first floor of a named building.
To give us a visualization of this, I quickly reverse-engineered it into a UML modelling tool. Here’s what we get:
My point is that it can get pretty complicated and messy; addressing in some domains can be very complex indeed.
If you were to flatten this out into a single relational table, you’d get something like the following:
While this captures BS7666 address components, it doesn’t tell you how the model works. All the relational logic of the XML schema gets hidden away in application logic.
These two diagrams represent two data modeling extremes. But is there a middle way to model addresses?
It is indeed possible to have a relatively simple address model that is flexible and configurable.
An address component is typically a line on an address label, or rather a type of line on an address label. The sort of components we’d typically use for UK addresses are listed in the following table:
|Address Component Type|
|Dependent Thoroughfare Name|
|Double Dependent Locality|
|International Post Code|
|PAO End Number|
|PAO End Suffix|
|PAO Start Number|
|PAO Start Suffix|
|SAO End Number|
|SAO End Suffix|
|SAO Start Number|
|SAO Start Suffix|
You could have three or four address lines, plus the post town and postcode. However, the difficulty you will encounter is identifying what these lines actually contain when it matters – e.g. when mapping data between systems. When you carry out data profiling, you will find that Address Line 3 sometimes contains a dependent locality, but at other times it contains a county or locality. Now you’re into natural language processing (NLP); you have to recognize the difference between locality and county. And the permutations multiply as you add more countries.
So we must define all the address components for all the countries we operate in.
Address formats are made up of two parts: a header and its detail. The header is basically the name or title that the address format is known by. Examples could include:
|Address Format Type|
|British Forces Post Office (BFPO)|
|Post Office Address (PAF)|
Taking the UK’s Full Post Office Address Format (PAF) by way of example, we then define the following address format components:
|PAF||Double Dependent Locality||10||N|
Our application reads this metadata and displays the address components in the correct order. When address capture is required, the metadata tells us whether the address component is mandatory or not.
More often, our application requests the postcode from the end user and looks up the corresponding values and populates the address components automatically. Some applications allow the user to edit the address; other [annoying] ones don’t!
It’s not shown in the PDM, but if your organization operates internationally, you can define a many-to-many relationship between
country so that the correct address format (based on the user’s country) is presented to the end user (
When and only when the
contact_point is a postal address
contact_point_type, it must have a relationship to an address_format_type. Conversely, it follows that non-postal address types never have a relationship to an
address_format_type. Furthermore, the format must remain fixed for the life of the
contact_point, otherwise you’ll introduce the possibility of data integrity issues. (For this not to be the case, the target
address_format_components must be a subset of the source
contact_value has no meaning for a postal address because the values are stored in a
contact_value is mandatory for all other
address_line.line_content are mutually exclusive.
The Many-To-Many Relationship Between Party and Contact Point
You can think of
address_line) as containing the values and
party_contact as defining the usage. This allows a single
contact_point to have multiple uses. Our home [postal] address could also be our billing address and delivery address, depending on the context.
So far, the narrative has assumed that a party owns a particular
contact_point. But the data model does not impose this ownership rule! It makes no such restriction whatsoever. There’s another possibility that exists with this design: multiple parties for the same contact points.
You need to consider the implications carefully before venturing down this route.
Here’s an example. In the UK, Awarding Organizations (AOs) generally employ teachers as examiners. A teacher has two relationships: one with the school where he or she works, and another with the AO as an examiner. The school will have a bank of
contact_points with various phone numbers and possibly one or more postal addresses. These will be things like the school’s main address (postal address), main email (email address), main fax (fax number), and main phone (phone number).
It is entirely feasible that our examiner can use the very same
contact_points as his or her school, but he or she will use
party_contact to define them as work-related. If the school’s main phone number changes, the teacher’s work number will automatically be updated, which is pretty neat.
If you go down this route, you will need to define at the application level which party or parties are permitted to update
A Quick Word On Performance
The yellow metadata tables are going to be constantly used by queries. Consequently, they are likely to remain in memory. On most RDBMSs, you can pin tables into memory to ensure this. In Oracle, I would create these as index-organized tables, which are small and perform well. Do whatever the equivalent is for your RDBMS.
You also want to ensure that
party_contact rows are co-located in the same block (or page) using a clustered index on
party_id. Do the same with
address_line.contact_point_id. This cuts down on the amount of IO.
Another option exists if you want a
party to exclusively own a
contact_point. You can then can merge
party_contact to create
party_contact_point (still clustered on
party_id). This simplifies the model and could aid performance.
Changing Contacts Doesn’t Mean Changing Databases
We live in a time when it can be said that change is the only constant.
That doesn’t mean that each time something changes it has to impact your database. With a bit of thought, we can future-proof our designs – perhaps more than we have done to date. Doing so helps us to respond quickly to the inevitable change.
If you’re embarking on a green-field project, I would recommend using the Party Model (of which Contact Point is a part) for organizations and people. Why not open up the model and tweak it to your needs? Please feel free to grab a copy and make it your own.
But if your database or databases are already determined, the schema I’ve presented here can still be used, in XML form, to define your payload when integrating data between systems.