Data Modelling

What is data modelling?

Data modelling is about understanding your client's needs and creating an accurate (database) system model to fit those needs.

The process involves performing an initial needs analysis with the client (i.e. to explore new ways, or improve current approaches) including:

an understanding of the current system (if it exists);
defining what data is to be held in the new system and how the data should be related; and
designing on paper or with a computer the improved or enhanced system to meet those needs.

Why do we need data modelling?

We need data modelling because for database developers they need to create an effective database for their clients which is accurate, lean and effective enough to achieve what clients need.

Are there other terms we can use for "data modelling"?

Yes. The terms entity modelling, entity-relationship modelling and logical data modelling all refer to the same term: data modelling.

What are needs?

Data modelling can be applied to almost any situation in life (except human relationships!). But it is usually applied in the business world. So when we talk of the client's needs, we usually mean real world business processes such as the recording, measuring and maximising production (or minimising costs) of certain things. It is really about developing systems for capturing information about the real world, and then processing and presenting the data in a suitable format so that the client can make appropriate decisions.

And by suitable format, we mean:

Accurate
Complete
Timely
Usable

For example, the client's needs could be to record information about the contact details of all customers who use the client's business. Or the client's needs could be to measure how many products get produced on the factory floor and have this information captured on a system. It really doesn't matter.

You will often hear from clients a variety of other words to describe this more common term we call "needs". People may say they have "workloads", "outcomes", "requirements", "progress levels", "information flows", "quantities of resources and outputs", "work practices" and so on to achieve in their business. Basically all this means one thing: the client's needs.

Who does all this work?

Well yes, it does seem like a gruesome job to do but it isn't all that bad. In fact, a person who does the listening when gathering data from the client is usually called a consultant. The person who models the data to create the system for the client is usually called a developer.

What is the most important step in data modelling?

The most important step in developing any new or enhanced system is the initial analysis of client requirements (or more appropriately, "the needs of the client").

Before you can ever hope to design your system as the solution to whatever problem the client has, or even to purchase a single piece of equipment to make your system work, it is paramount that you do the needs analysis first with your client. Otherwise it could result in wasted money and time on developing a system that does not do what it is supposed to for your client.

Generally:

the more time you spend gathering information about the client's needs;
the more clients you decide to talk to about their needs; and
the more you get to the core needs of the client,

the more effective and accurate will be your final system design.

We call this consultative process the "data"(-gathering) part of data modelling whereby we gather data about the client and his/her requirements first.

What sorts of questions should I ask the client?

The sorts of questions you should be asking include the following:

Who are all the clients who will use or be affected by the new or enhanced system?
How do I get into contact with them?
What do all the clients hope to achieve from the new or enhanced system?
With whom are the clients offering their services to?
What specific system(s) are already in place?
What kinds of data are we to collect and how should they be related when developing the new or enhanced system?

In essence, what we are learning from our meeting with the client is:

What needs to feed into something?
What feeds from the core purpose of the system?

We ask these questions because they help us to understand the aims of the system we are developing and in learning more about the people, the services, the products and the current equipment available so we may develop an accurate new or improved system for the client.

When asking questions, don't worry about things like constraints of the current system they are using. Try to concentrate on what it is they need right now and perhaps what they would want in the future. Later, you will determine whether there will be constraints in the new system.

How do I design a good (database) system on paper?

This is done through the use of diagrams consisting of boxes and lines with certain notations added to them to help indicate the type of data to be processed and have it do something in the way the client wants.

We call this the "modelling" part of data modelling whereby we create a model or graphical representation of the data and processes required to meet the needs of the client.

So when we do eventually go out to develop a database (or purchase any necessary equipment) for the client, we use this diagram or model as the framework for the implementation of the new or enhanced system.

What kinds of notations can I include in my system model?

The model we are talking about is really a special diagram known as an entity relationship diagram (or ERD). It consists of entities drawn as boxes and relationships between the entities (if there are any) drawn as lines.

Now there is nothing special we do to the boxes. We just draw a box and put a name inside to show the entity we are looking at for the type of specific service being modelled by the system. For example, if our system looks at the service of watering plants, we would have two entities called "Water" and "Plant". We then draw two boxes on a piece of paper to represent these two simple entities.

Sometimes you will find professional developers designing fancy boxes for their ERDs. For example, some people may use only circles or ovals; or perhaps a mixture of boxes, circles, triangles and so on; turn them into 3D objects; or fill them in all sorts of colours. Never mind about the fancy stuff. Just see it for what they are: simple boxes.

Next, after consultation with our client (if it isn't quite obvious at this stage), we realise there is going to be a relationship between "Water" and "Plant". So we draw a line between the two entities.

Watering System

How we read this diagram is important. The line between the two boxes tell us (from left to right) that water is needed by the plant; and (from right to left) each plant may possibly collect some water for the watering system (e.g. through condensation on the leaves or whatever). Just use your imagine and find something that makes sense to you as you read this diagram.

Now we could rely on our client for even more information to help develop our system more accurately. But for the purposes of this exercise, let us assume that "Water" can come from different sources (e.g. town water, the dam, rain water, condensation on the leaves etc) or that the activity of watering itself can take place at different times of the day or month.

This "multiple sources" or "different times" concept can be represented on the diagram like so,

Watering System

The three small lines converging on a point along the main line merely represents the different water sources or the different times of the day the water will come for the plant (depending on how the client wants to look at it).

Note that some people will like to emphasise this "many-to-one" relationship by including additional smaller boxes inside the "Water" box and then draw individual lines from these smaller boxes to the "Plant" box. Again, it is up to you to design a model that works for you and which is simple to understand.

But we don't just stop there. Things do fortunately get a little more exciting. For example, we can add further details to the relationship depending on whether the process going from one entity to the other or vice versa is required or optional. A solid line coming out of one entity means the process is mandatory or must be done. A dotted line coming out of another entity represents an optional process.

Watering System

For example, in this diagram, we read from left to right how the watering must be done for one and only one plant. If we read from right to left, it means each plant may be given one or more waterings (or it could mean each plant may be a collector of water for the watering system). Whether we give more than one watering to the plant or the plant can be a collector of water for the watering system is optional.

To finish off the ERD, we could add words to indicate the name of the relationship (or process) and other things to make it easier to interpret the system like so,

Watering System

To know whether the relationship is a valid one, look at the logic behind the statement you are making when reading the diagram from left to right and vice versa. Does it make sense? If so, the diagram is probably true. If it doesn't, then there is a mistake and the diagram should be improved.

NOTE: For database developers, you need to avoid "many-to-many" relationships because databases are not designed to handle this situation. Talk to the client and see what can be done to convert the relationship to a "one-to-many", "many-to-one" or "one-to-one" relationship.

So the steps in the data modelling process are...?

The basic steps to creating a model are as follows:

Agree on a scope and objectives statement.
What is the whole purpose for the clients being there and doing their work? What are their objectives? How much scope should the new or enhanced system cover.
Create an audience list.
With whom are the client meant to serve? For example, a client that needs to water plants means the audience is the "plants".
Create a list of services.
What kind of services are to be provided to the audience list through the new or enhanced system? For example, a client that needs to water plants would need someone or something to provide the water. We call this a service. When creating the list of services, look at the services of the existing system (if any) and any desired future services in the new system.
Rank the services required.
Now how important are each of the services? Try to identify the three most important services, and then of those three, pick out the most important service. Eventually we will model all of the services. It is just that focussing on the most important service will at least give you a starting point.
Model the information requirements for the delivery of each service.
This is the fun part. Working from the most important service, start drawing boxes and lines and link them all up. This will show how all the parts to a service and eventually all the services are interrelated. For example, the most important service in keeping plants alive for a nursery is the watering service. So we would draw a box representing the water, a box for the plant, and a line indicating the process of watering the plants. Then we may break this line and add a box to indicate other services like adding fertiliser to the plants (called an additive).
Identify entities.
The word "entities" is a technical term meaning the identification of specific items involved in a business process about which information is to be held (or captured). Entities, whether real or imagined, are items or events of significance to the business in achieving a particular service.

An entity may be a real object like a person or a building; or it may be more intangible like an appointment or some other activity. It could even be conceptual like an organisational unit. The critical thing to remember here is that the entity must be described in real terms and be uniquely identifiable in some way.

For example, in the case of plants in a nursery, one of the processes needed to take place at the business might be tracking plant waterings. The service is clearly the process of "watering the plants". But the system we are developing is actually to measure the amount of water being used for each plant. Now within this system, there are two entities: the Plant entity and the Watering entity. We can call them entities because each plant can be uniquely identified by its biological name. Similarly, we can uniquely identify the water by the date and time of applying the water to a plant.

NOTE: The drawing of the line between any two entities means the process or service being carried out or to be carried out.
Describe entities via their attributes (or primary keys as they are known in the technical jargon).
This means finding those details within an entity which serve to identify, qualify, classify, quantify or otherwise express the state of the entity in a particular relationship or occurrence. Think of the term "attributes" as specific pieces of information called data which may be collected or maintained or needs to be known or held by the business in order to perform a service.

For example, we could choose to measure or identify the following attributes for the Plant entity: description, acquisition date, pot size, deciduous or non-deciduous, and so on.

Remember, not all attributes are required, but optional. For example, the acquisition date for the Plant entity and its description is probably required by the business to help identify the item and when it was acquired. However, pot size is optional as not all plants may come in a pot.

Fortunately, you should have enough information gathered from the needs analysis meeting with your client to determine which attributes are required (as well as to which attributes belong with which entity), and which attributes are optional. Or at least the client will later decide which is more important.

NOTE: The more attributes you assign to an entity, the more accurate and detailed the results will be from the new system. However, it will mean more time and money spent in collecting and maintaining this information. The client therefore has to weigh up the importance of certain attributes for him/herself.
Define the primary key(s) for each entity.
This is a useful step for people developing a database system for their clients. Defining the primary key means looking at the attributes of each entity and identifying the one attribute which has unique values and so is capable of uniquely defining an occurrence of the entity.

If there isn't an attribute to uniquely define the occurrence of the entity, create a new attribute for that purpose. Call this attribute the "ID Number" where unique values will be held.
Establish the relationships between entities.
A relationship is an association between two entities which is important to the system. To define a relationship, we draw a line connecting the two related entities and then give this line a name to help confirm the validity of the relationship. Finally, the type of line we draw will reveal the significance (or degree) of the relationship between the two entities (i.e. "many-to-many", "many-to-one", "one-to-one" or "one-to-many" relationship)..

Not all entities will be related to every other entity. Some entities will have a natural link with other entities, while other types will be best left as its own separate sub-system.

For example, a plant in a nursery needs water. Clearly we have a relationship defined here. Now the plant may be given water at different times of the day or month, but each watering at the base of the plant must be for one and only one specific plant. Therefore, we create a "many-to-one" relationship.

Once you have two or more entities and their attributes defined, it is usually fairly easy to see the relationships between the entities. But to be absolutely certain about the relationships, always talk to your client as he/she is the expert in their field of the business.
Draw an entity relationship diagram (ERD).
The final part of the process is to draw the system using ERD, which by now you should already have done after completing step 9. Perhaps you may need to redraw it better to make it clear how the system will work in the real world.

Sometimes redrawing the diagram is a good idea because you need to make it very clear what the final ERD should look like. Because all design decisions made by the developer will be based on this ERD, if there are any mistakes in the diagram or the relationships are not clearly defined and shown in the ERD, the system will fail to achieve the client's requirements.

What now?

Now that you have an accurate model of the system required by your client, the time has come to make it work.

As a database developer on Microsoft Access, for instance, this means creating tables to represent the entities and the field names for each of the attributes in each entity. Then you define the relationships in Microsoft Access just as you have designed them in your ERD model.

That is all there is to it!

Conclusion

The purpose of data modelling is simply to get it clear in your mind and in the mind of your client exactly what has to be done and how to fulfil those needs through a diagram known as an ERD.

Data modelling may seem like an unnecessary time waster for the experienced developer, but it is actually the most important step in the life of any developer, whether it be a Systems Librarian, a computer programmer, or a manager. It should be the best time for both you, as the developer, and the client to explore new ideas and fully understand the exact purpose of the system so that it will do precisely what it is intended in the real world and nothing less or more.