Database Indexing – Whose Role Is It Anyway?
I remember my own confusion about who was responsible for database indexing when I was a junior programmer some years ago. At one of my very first commercial projects, software architects created a database structure, developers wrote the code, and browser magicians made it look outstanding. The final product was deployed to servers and champagne corks popped. Our good mood didn’t last for too long...
After three months or so it turned out that one of the main features, a search engine, started choking. A rather simple search query took a few seconds to run. The time for optimization came very quickly. After some fast profiling with the production data and the cause of the slowdown became obvious – it was the DB that consumed too much time.
So we needed to create some indexes. But who should do it? The decision seemed very strange to me, a very young programmer: a developer had to add the indexes. I was like: but why? The DB structure was an architect’s work. Why should a developer fix it?!
I was completely wrong. My confusion was a result of my lack of knowledge about indexes.
An Index Is a Derivative of the Queries Being Performed
Creating indexes at the stage of creating a DB structure makes no sense. Not because it’s a premature optimization, but because indexes make the queries execute faster. When a software architect creates a schema he doesn’t know what the queries will look like. Without knowing the exact queries it’s impossible to properly index the DB. It’s like tuning a racecar’s suspension without knowing a type of the road it’ll be competing on.
The developer, on the contrary, has full knowledge and all of the tools required to do the job perfectly. He’s the author of the code, so he knows exactly what queries are executed against the DB. Even if the ORM you use operates on such a level of abstraction that it’s impossible to figure out a query structure looking at the code (by the way, such a situation would make me doubt if using this ORM was a good choice...), the developer can turn a debug mode on or, as a last resort, look at the RDBMS logs.
Knowing the exact queries, the developer may then run them against the DB and get a query execution plan as a result. Based upon this information he may decide how many indexes there should be, what columns (or functions on columns) they should contain and in what order. Having created indexes, he may then test them in his environment before introducing changes to production.
There’s one more reason why it should be a developer who performs such an optimization. Sometimes it turns out that indexing the data will not help that much. No actions on the DB side can help if stupid queries are executed from the code. By “stupid queries” I mean queries fetching all rows from tables when the vast majority of them will never be presented or used in any manner. It’s not that uncommon; test data used during the development process are usually very simple and sparse. An inattentive developer may accidentally omit the LIMIT clause (or any other windowing technique) and the application fueled with test data will give the false impression of working perfectly. Having found such bugs, the author of the code may fix them immediately.
Some developers think that touching the DB in any way is not their job. They feel comfortable with code and ignore everything outside of it. That’s wrong. The database isn’t just another integration point; it’s much closer to the application than any other external part of the system.