Early in the movie “The Fellowship of the Ring”, the wizard Gandalf asks the hero Frodo: “Is it secret? Is it safe?” We may not have a magic ring to protect, but we’re asking the same question about information.
This is the second in a multi-part series on how to apply information security principles and techniques as part of data modeling. This series uses a simple data model designed to manage non-commercial clubs as an example of security approaches. In later articles, we will address modeling for fine-grained access controls, auditing, authentication, and other key aspects of secure database implementation.
In the first article of this series, we applied some simple access controls to our club-managing database. Obviously, there’s more to it than just providing access controls when adding photos. Let’s take a deep dive into our data model and find what needs to be secured. Along the way, we’ll discover that our data has more information than what we have so far included in our model.
Identifying the Club’s Secure Content
In our first installment, we started with an existing database that provided a bulletin-board service to private clubs. We examined the effect of adding photographs or other images to the data model, and we developed a simple model of access control to give some security for the new data. Now, we’ll look at the other tables in the database and determine the “what” of information security for this application.
Know Your Data
We have a data model that has nothing to support any kind of information security. We know the database carries data, but we need to examine our understanding of the database and its structures to secure it.
Key Learning No. 1:
Scrutinize your existing database before applying security controls.
I’ve built out person here with the typical information you’d enter on any website or give to a club. You’ll certainly notice that some of the information here could be considered sensitive. In fact, apart from the id surrogate primary key, it’s all sensitive. All of these fields are categorized as personally identifiable information (PII) according to the definitive PII guide from the US National Institute of Standards and Technology (NIST). In fact, the information from this table is nearly sufficient to commit identity theft. Handle this wrong and you can get sued by people, sued by corporations, fined by regulators, or even prosecuted. It’s even worse if children’s data is compromised. And this isn’t just in the USA; the EU and many other countries have strict privacy laws.
Key Learning No. 2:
Always start a security review with the tables describing people.
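One way to make that review concrete is to write the classification down next to the column list. Here is a minimal Python sketch, not a definitive implementation. Only id is named in the text above; every other column name, and the "pii" and "surrogate_key" labels, are my assumptions for illustration, so substitute whatever your own person table defines.

```python
# Hypothetical inventory of the person table. Only "id" is named in the
# text; every other column name here is an assumption for illustration.
PERSON_COLUMNS = {
    "id": "surrogate_key",
    "first_name": "pii",
    "last_name": "pii",
    "date_of_birth": "pii",
    "email": "pii",
    "phone": "pii",
    "street_address": "pii",
}


def columns_needing_protection(columns):
    """Return, in sorted order, every column not classified as a surrogate key."""
    return sorted(
        name for name, kind in columns.items() if kind != "surrogate_key"
    )


if __name__ == "__main__":
    # Lists every column except the surrogate key.
    print(columns_needing_protection(PERSON_COLUMNS))
```

Keeping the inventory as data rather than as prose means it can be checked again whenever a column is added.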
Of course, there is more data in this schema than just person. Let’s consider these table by table.
graphic_format – This table has little other than a snapshot of commonly-known items like JPEG, BMP, and such. Nothing sensitive here.
photo_action – This table is itself very minimal. It only has a handful of rows, each describing the sensitivity, not of the data, but of an action on a photo. This isn’t very interesting on its own.
photo – Ah, photo. The saying goes that “a picture is worth a thousand words”. Is that true from the standpoint of security and sensitivity? If you thought person was risky, brace yourself for photo. Take a look at what those “thousand words” may contain:
- details and conditions of important public infrastructure, buildings, etc.
- a copyrighted image
- metadata indicating the exact time and GPS location of the photo
- metadata identifying camera model, serial number, and owner
- incidental information like expensive jewelry, construction, art, vehicles, or businesses
- a record or depiction of actions that are of dubious legal, moral, or ethical status
- textual messages: banal, provocative, hateful, benign
- an association with a club
- an association with the person who uploaded the image
- non-sexual bodily details, such as facial features, injuries, disabilities, height, weight
- faces of non-consenting people, included intentionally or incidentally (more PII)
- the implicit association of the people in the photo with each other and with any of the information previously mentioned. This may suggest employment, military service, cars owned, size or value of houses or real estate…
Key Learning No. 3:
Captures of physical data, as in photos, must be scrutinized for the many sorts of information and relationships they might carry.
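The metadata items in the list above are the easiest to screen mechanically. The following Python sketch assumes you already have a photo's metadata as a plain dictionary from whatever reader you use; that dictionary shape is my assumption. The tag names follow common EXIF naming, and the function name is hypothetical. It only separates tags, it does not read or rewrite image files.

```python
# Tag names follow common EXIF naming. The set below is an illustrative
# starting point, not a complete list of sensitive metadata.
SENSITIVE_TAGS = {
    "DateTimeOriginal",  # exact time the photo was taken
    "Make",
    "Model",             # camera model
    "BodySerialNumber",  # camera serial number
    "CameraOwnerName",
    "Artist",
}
SENSITIVE_PREFIXES = ("GPS",)  # GPSLatitude, GPSLongitude, and so on


def split_metadata(tags):
    """Split a tag dictionary into (safe, sensitive) dictionaries."""
    safe, sensitive = {}, {}
    for name, value in tags.items():
        if name in SENSITIVE_TAGS or name.startswith(SENSITIVE_PREFIXES):
            sensitive[name] = value
        else:
            safe[name] = value
    return safe, sensitive


if __name__ == "__main__":
    example = {"GPSLatitude": 40.7, "Model": "X100", "ImageWidth": 4000}
    safe, sensitive = split_metadata(example)
    print("keep:", safe)
    print("review or strip:", sensitive)
```

Note that this covers only the metadata. Faces, text, and backgrounds in the image itself still need human or specialized review.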
club – Some clubs’ names and descriptions may convey more information than you’d expect. Did you really want to advertise that your club meets at Martha’s house on Elm Street? Does it indicate political activity that others may target?
club_office – Identifies the meaning and privileges of a club leadership position. The use of or description of titles may convey a lot of information about the club. Some of that could be deduced from a club’s public description; other details might reveal private aspects of the club’s operation.
member – Records a person’s history with a club.
officer – Records a member’s leadership history with a club.
Clearly, there are items here that should be protected. But whose responsibility is that?
Who Owns That Data?
You’re storing it, you own it! Right? Wrong. Way, way wrong. Let me illustrate just how wrong with a common example: health care information. Here’s a U.S. scenario – hope it’s not as bad elsewhere! Aldo’s physician Dr. B. found underarm nodules and ordered a blood test. Aldo went to Lab C where Nurse D. drew blood. Results went to endocrinologist Dr. E. via Hospital F, using YOUR system operated by IT contractor YOU. Insurer G got the bills.
So do you own the lab information? Aldo, his doctors, and his insurer all have an interest in it, and you or any one of these people could get sued for doing something that compromises this confidential information. In this way, everyone in the chain is responsible, so everyone “owns” it. (Aren’t you glad I’m using a simple example?)
Key Learning No. 4:
Even simple data may connect to a web of people and organizations you must handle.
Let’s look at our club again. Whew! What do we know about the parties interested in each main data entity?
|Entity|Interested party|Why they have an interest|
|Person|parent or other guardian, if any|parents or guardians are responsible for the person if a minor or if incapacitated|
|Person|court officers|a person under certain legal restrictions may be subject to scrutiny by an officer or designee of a court|
|Club|the club itself||
|Club|officers of the club|officers are responsible for maintaining the club, its description, and its outward appearances|
|Club|members of the club||
|Club Office|the club itself|offices and titles form part of the internal structure of the club|
|Club Office|club officers|depending on the type of powers and responsibilities associated with an office, the officers will be affected in what they do and how they do it|
|Club Office|club members|members may want to seek a club office or understand it, sometimes to hold an officer accountable|
|Photo|owner (copyright holder) of the photo|the photo may not be owned by the person who uploaded it!|
|Photo|licensees of the photo|the photo may be included under a licensing agreement|
|Photo|people in the photo|if your image is included in published material, it could affect your interests in some way – maybe in lots of different ways|
|Photo|owners of land or other objects in photo|such people may have their interests affected by the depictions in a photo|
|Photo|owners of textual messages in the photo|messages and symbols may be subject to intellectual property restrictions|
Know the Relationships Among Your Data
No, we’re not done yet. Take a look at the data model. We have not examined member or officer. Note that member and officer don’t have a single field that is real data. Everything is a foreign or surrogate key, except the dates, which only time-box each record. These are purely relationship tables. What can you derive from this?
- member will suggest a person’s interests because of the club.
- member will suggest what […]
- member says how large the club is.
- member will suggest similar or related clubs when a person has multiple memberships.
- officer will strongly tie a person to the interests of the club.
- officer may suggest access to club money, facilities, or equipment by a person.
- officer will indicate the abilities of a person (leadership in particular) when the office definition suggests other skills. Treasurer would suggest accounting and budgeting skills, for example.
- officer may indicate relatively tight control of a club by a small group when durations are long or when the number of distinct members is small.
Key Learning No. 5:
Data relationships may leak a lot of information about primary data entities.
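To see how little it takes, here is a small Python sketch that derives two of the member inferences (club size, and related clubs through shared memberships) from nothing but key pairs. The rows are made-up sample data and the function names are hypothetical. A real member table would also carry the dates mentioned above, which this sketch ignores.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Made-up member rows as (person_id, club_id) pairs: keys only, no "real" data.
MEMBER_ROWS = [(1, 10), (2, 10), (3, 10), (1, 20), (2, 20), (4, 30)]


def club_sizes(rows):
    """Count the membership rows per club."""
    return Counter(club_id for _, club_id in rows)


def related_clubs(rows):
    """Count, for each pair of clubs, how many persons belong to both."""
    clubs_by_person = defaultdict(set)
    for person_id, club_id in rows:
        clubs_by_person[person_id].add(club_id)

    shared = Counter()
    for clubs in clubs_by_person.values():
        # Each pair of clubs one person belongs to is one shared membership.
        for pair in combinations(sorted(clubs), 2):
            shared[pair] += 1
    return shared


if __name__ == "__main__":
    print(club_sizes(MEMBER_ROWS))
    print(related_clubs(MEMBER_ROWS))
```

Neither function touches a name, an address, or a club description, yet anyone who can read these keys learns which clubs are large and which ones share an audience.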
But let’s not forget our old favorite, photo:
- A person in a photo other than the uploader may suggest a club association akin to membership.
- Multiple persons in a photo suggest relationships among them.
- Activity depicted in a photo may suggest club activities or the interests or abilities of persons in the photo.
- photo GPS information will document the presence of depicted persons in a particular location, as will the background of the photo.
- A photo will typically participate in zero or more photo albums for presentations, etc.
Getting the Full View
With this analysis of the data, we start to see where we have to focus our efforts. We can view the model with some visual assistance:
In other words … nearly our whole data model has some security content.
Key Learning No. 6:
Expect the majority of your schema to have security content.
That’s right. Practically the whole thing. This will happen to you all the time. Any table less trivial than a simple look-up may be involved in your overall database security approach. This makes it important for you to practice economy and care in modeling to minimize the number of tables you’re wrestling with.
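A table-level tally shows the proportion at a glance. In this Python sketch the table names come from the walkthrough above, but the True/False flags are my reading of that analysis (treating the two look-up style tables as having no security content), and the function name is hypothetical.

```python
# Table names are from the article; the flags restate its analysis as I
# read it and are an assumption, not an authoritative classification.
TABLE_HAS_SECURITY_CONTENT = {
    "person": True,
    "photo": True,
    "club": True,
    "club_office": True,
    "member": True,
    "officer": True,
    "photo_action": False,
    "graphic_format": False,
}


def security_coverage(tables):
    """Return (sorted flagged table names, fraction of tables flagged)."""
    flagged = sorted(name for name, has_content in tables.items() if has_content)
    return flagged, len(flagged) / len(tables)


if __name__ == "__main__":
    flagged, ratio = security_coverage(TABLE_HAS_SECURITY_CONTENT)
    print(f"{len(flagged)} of {len(TABLE_HAS_SECURITY_CONTENT)} tables flagged ({ratio:.0%})")
```

Rerunning a tally like this whenever the model changes is a cheap way to notice when a new table quietly joins the sensitive set.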
In Conclusion: Know Your Data
Knowing your data is essential to securing it. Knowing the value of your data and its sensitivity will give you crucial guidance in how to implement a comprehensive security architecture within your database.
Information security is an extensive task, and in this series I am bringing issues and techniques for you to use incrementally in improving database security. In the next installment, I will show how to use this information in the Club’s database to help you identify the sensitivity and value of your data. As we continue in the series, we will improve the access control approach from the last article with more comprehensive and flexible controls. We’ll also see how data modeling can be used to support authentication and auditing, as well as database multi-tenancy and recovery.
I hope that this article has given you tools and – just as importantly – insights on how to go about this crucial step in database security. I eagerly welcome feedback on this article. Please use the box for any comments or critiques.