Data modeling, or database design, is the process of producing a detailed model of a database. It starts with understanding the business area and the functionality being developed. When we work with an Agile process (in this case, Scrum), there is a tendency to assume that everyone can work on everything. I would like to point out the flaws in that assumption and offer my recommendations for combining data modeling and Scrum.
To effectively perform data modeling, first, we need to know the functionality that is being developed: what User Stories are being supported? Hopefully, the functional requirements of the application are well known, well defined and documented.
Also, we need to listen to the business. As data modelers, we need to solve business problems by providing the underlying data that is needed. I have heard some people say “the business doesn’t know that a database exists”; that just makes me want to laugh. Of course they do; most business people work with data regularly or even constantly. Sure, they don’t care about the details, problems and challenges of databases, but they do want their data to be safely stored, up to date and efficiently available. Nothing drives business people crazier than having to wait for the information (data) they need to do their job; at least that is the case in my environment.
Now, how does data modeling fit into the Agile software development process? Software developers tend to think of the data model as a living outgrowth of their work, while data modelers tend to take a more static and strategic view: that the data model must be created up front, based on user needs, and must fit into the enterprise data model.
I see it somewhere in between: data modeling is partly a team effort, and team members other than the data modeler can contribute to the development of the model. However, there should be an “owner” of the model, which means someone who is responsible for the model and whose baby it is. At the same time, the model should not be a sacred cow that can only be modified by the model owner (to take the analogy one step further, we allow baby-sitters to take care of our children), but the owner must own the model and have the final say. In the same way that Agile methods work best with a complete team approach, data modelers should be integrated into the Agile process as data modelers.
However, this division of labor can be a challenge; data modelers must participate in the development sprints and make the importance of the data clear. Modelers must sprint with the developers: we must quickly turn requirements into model updates so that we are not roadblocks to the development process. Data modelers must work closely with the development team in an interactive way so that the model fits the development needs. The developers might complain that the data model is too normalized to support high performance, or they might ask for additional normalization in certain areas.
Our experience is that data modeling should run closely in line with user story creation: the data modelers need to work in lock-step with the business analysts, so that the user stories and data model are ready at the start of each Sprint. In other words, the data model is prepared in Sprint 0 with the user stories and architectural design, and updates for Sprint n are prepared during Sprint n-1 or n-2. Trying to update the data model and the code in the same Sprint leads to problems and excuses (“my task is not complete because the data modelers didn’t deliver the required tables until the day before the end of the Sprint”; I hate excuses).
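One way to make this "model one Sprint ahead" rhythm concrete is to keep the model changes as an ordered list of schema migrations, so that the changes for the next Sprint can be prepared in advance and applied when that Sprint starts. The following is only a minimal sketch, assuming SQLite and Python; the customer table, its columns and the helper functions are hypothetical and are not taken from any particular project.

```python
import sqlite3

# Hypothetical migrations, one step per model update.
# Table and column names are assumptions made for illustration.
MIGRATIONS = [
    # Sprint 0: initial model
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    # Prepared during Sprint 1, applied at the start of Sprint 2
    "ALTER TABLE customer ADD COLUMN email TEXT",
]


def schema_version(conn):
    """Return the number of migrations already applied to this database."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn, target=None):
    """Apply pending migrations in order, up to `target` (default: all)."""
    target = len(MIGRATIONS) if target is None else target
    for version in range(schema_version(conn), target):
        conn.execute(MIGRATIONS[version])
        # Record progress so the same step is never applied twice.
        conn.execute(f"PRAGMA user_version = {version + 1}")
    conn.commit()
    return schema_version(conn)


def column_names(conn, table):
    """List the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

Calling migrate(conn, 1) would bring a fresh database to the Sprint 0 model, and a later migrate(conn) would add whatever the modelers prepared in the meantime; running it again changes nothing.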
Having a fairly complete list of User Stories at the end of Sprint 0 reduces the risk of discovering new requirements late in the project, when the database model needs to be very stable. Of course, this is challenging and may not be possible in every project, but for small and moderately sized projects, this should be a goal. Large projects often require different approaches to deal with the vast scope and potential for change.
One advantage in an Agile project is that the data model can be tested quickly, early and often: first with examples of the data to be stored in the tables and of how the application will use it, and then with table data for unit testing of the application.
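As an illustration of this kind of early testing, the sketch below loads a small draft model into an in-memory SQLite database, inserts a few example rows, and checks both a typical query and a constraint. It is a minimal example under stated assumptions: the customer and customer_order tables, their columns and the sample values are all hypothetical.

```python
import sqlite3

# Hypothetical draft model; table and column names are assumptions.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
);
"""


def build_test_db():
    """Create a throwaway in-memory database from the draft model."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def load_sample_data(conn):
    """Insert a few example rows of the kind the business expects to store."""
    conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Acme Ltd')")
    conn.executemany(
        "INSERT INTO customer_order (order_id, customer_id, total_cents) "
        "VALUES (?, ?, ?)",
        [(10, 1, 2500), (11, 1, 4000)],
    )
    conn.commit()


def order_total_for(conn, customer_id):
    """A query the application might run: total order value for one customer."""
    row = conn.execute(
        "SELECT COALESCE(SUM(total_cents), 0) FROM customer_order "
        "WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0]


def rejects_orphan_order(conn):
    """Check that the model refuses an order for a customer that does not exist."""
    try:
        conn.execute(
            "INSERT INTO customer_order (order_id, customer_id, total_cents) "
            "VALUES (99, 404, 100)"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

The same sample rows can later serve as fixture data for the application's unit tests, so the model and the code are exercised against identical examples.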
The contribution of the data modeler to the Agile project is more than just an entity relationship diagram (ERD). The ERD will be iterative; it will be a living, breathing beast. There is no need to try to create “the world’s greatest ERD” before handing it over to the developers. The ERD is important, but the ERD must integrate with the progress of the Agile project and be capable of adapting when requirements change, and the underlying data model must fit into the larger enterprise data strategy.
These are some of the observations that I have found useful while working in Agile projects. Hopefully, my recommendations will help you avoid the pitfalls that I have seen.
What are your experiences of data modeling in Agile projects?