Sameer Babbar


If you are in a jungle and do not know what a lion looks like, what would you do?

Look for a dictionary? Hmm. A data dictionary serves a similar purpose for data, and it is not so complex.

I was in a conversation with two very senior executives, one from government and the other from the private sector, both in the same room. The conversation almost stalled when the topic of the data dictionary was raised. There is a lot of complex technical information out there about data dictionaries, and it confuses everyone. So I thought I would simplify the jargon and put it in plain English that everyone can understand.

What is a data dictionary?

If you go to a jungle and see a lion, how do you know it is a lion and not a monkey, a cow or a dog?

Well, that's when the data dictionary comes in handy. You have had some information about lions in the past, and you have looked at pictures. Using that information, you are able to picture what a lion is.

So the purpose of a data dictionary is to tell you enough about something to help you interpret what the “thing” is.

  • It gives you a framework for interpretation.

  • It helps you measure how big or how small an object, say an electrical pole or a lion, is.

  • It describes how an object or element relates to other objects, so that you can put it in context: faster, smaller, heavier, lighter. This lets you define a comparative or relative score or measure, say vehicles per linear kilometer of road.

  • It helps you understand hierarchy, for example versions of software, or how the animal kingdom is arranged from species up to kingdom.

  • It records origin and composition: where did the object come from, and what are its sub-parts? (If we are talking about car parts, the same part can be used in other cars.)

  • It describes behaviors: how the object in question behaves in the environment it is exposed to.

  • Each type of object will have its own behaviors that we can define.

  • Custodianship also comes into play when we talk about data and how it originated. Custodian and owner are highly contested words: ownership suggests that someone owns the data, whereas a custodian or guardian is just a caretaker of it. Data in organizations arrives as a flow; you may be upstream, downstream or in the middle of that flow, and the entire creation process is not necessarily yours. For example, if you build a car, you may hold information only about how the car was built. How fast cars are moving on the road might be collected by the car's GPS system, or by the mobile phone of the person driving the car.
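For readers who work with data, the list above can be sketched as a record structure. This is a minimal, hypothetical sketch in Python: the field names (`definition`, `unit`, `parent`, `related_to`, `source`, `custodian`, `owner`), the `lineage` and `vehicles_per_km` helpers and the sample values are assumptions chosen to mirror the bullets, not a standard schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One element in a hypothetical data dictionary."""
    name: str
    definition: str                  # framework to interpret the element
    unit: Optional[str] = None       # how it is measured
    parent: Optional[str] = None     # next level up in the hierarchy
    related_to: List[str] = field(default_factory=list)  # links to other elements
    source: Optional[str] = None     # origin of the data
    custodian: Optional[str] = None  # caretaker of the data
    owner: Optional[str] = None      # owner, if agreed (often contested)


# Hierarchy example: the lion's place in the animal kingdom.
taxonomy: Dict[str, Entry] = {
    "Animalia": Entry("Animalia", "Kingdom of animals"),
    "Chordata": Entry("Chordata", "Phylum", parent="Animalia"),
    "Mammalia": Entry("Mammalia", "Class", parent="Chordata"),
    "Carnivora": Entry("Carnivora", "Order", parent="Mammalia"),
    "Felidae": Entry("Felidae", "Family of cats", parent="Carnivora"),
    "Panthera": Entry("Panthera", "Genus of big cats", parent="Felidae"),
    "Panthera leo": Entry("Panthera leo", "Species: the lion", parent="Panthera"),
}


def lineage(entries: Dict[str, Entry], name: str) -> List[str]:
    """Walk parent links from an element up to the top of its hierarchy."""
    path: List[str] = []
    current: Optional[str] = name
    while current is not None:
        path.append(current)
        current = entries[current].parent
    return path


# Relative measure example: vehicles per linear kilometer of road.
# The source and custodian values are hypothetical placeholders.
road_measure = Entry(
    "vehicles_per_km",
    "Vehicle count divided by road length",
    unit="vehicles/km",
    related_to=["vehicle_count", "road_length_km"],
    source="vehicle GPS or drivers' mobile phones",
    custodian="traffic data team",
)


def vehicles_per_km(vehicle_count: int, road_length_km: float) -> float:
    """Compute the relative measure described by road_measure."""
    if road_length_km <= 0:
        raise ValueError("road length must be positive")
    return vehicle_count / road_length_km
```

Here `lineage(taxonomy, "Panthera leo")` walks from species up to kingdom and returns `['Panthera leo', 'Panthera', 'Felidae', 'Carnivora', 'Mammalia', 'Chordata', 'Animalia']`, while `vehicles_per_km(1200, 4.0)` returns `300.0`. Keeping the custodian and owner as separate fields reflects the point that looking after data and owning it are not the same thing.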

Who owns the data? This is a contentious question, so we will leave that discussion for now. It is similar to asking what it takes to make a cup of coffee, a question covered in Freakonomics**.

How does this help? If you are getting a car repaired, standardization can take place: the same part can fit into multiple cars, so you can have a catalog of parts organized by hierarchy. Go up to the scale of the whole car, and there is a catalog to look at when you buy one.
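The parts-catalog idea can be sketched as a mapping from each standard part to the car models it fits, which can then be inverted into a per-model catalog. This is a hypothetical Python sketch; the part numbers, model names and the `parts_for_model` helper are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical catalog: standard part number -> car models it fits.
parts_catalog: Dict[str, Set[str]] = {
    "BRK-100": {"Model A", "Model B", "Model C"},  # one standard part, three cars
    "FLT-200": {"Model A", "Model C"},
    "WPR-300": {"Model B"},
}


def parts_for_model(catalog: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert the catalog: for each car model, list the parts that fit it."""
    by_model: Dict[str, List[str]] = defaultdict(list)
    # Iterating parts in sorted order keeps each model's list sorted too.
    for part, models in sorted(catalog.items()):
        for model in models:
            by_model[model].append(part)
    return dict(by_model)
```

With this sample data, `parts_for_model(parts_catalog)["Model A"]` returns `['BRK-100', 'FLT-200']`. Because one part appears under several models, a single catalog entry serves every car it fits, which is the standardization benefit described above.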

So the purpose of a data dictionary is to inform others, or communicate intelligently with other people, about something: its instrumentation, that is, the standard for how to build it, or the cause-and-effect relationships you want to investigate when you take a deep dive (say, fixing a broken-down car).

When it comes to data dictionaries, different industry domains have their own lingo, or their own dictionary. The dictionary used in health services may not be the same as the one used in government institutions, the location services industry or the legal services industry. Over time, every industry develops its own data dictionary. Technically we are storing, retrieving and managing information in each case, but the context of that information is different. Just as a dictionary stores words, a data dictionary stores the context of the data, relevant to the industry that uses it.
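The point that the same word carries a different context in different industries can be sketched as one dictionary per domain. In this hypothetical Python sketch, the domain names, the terms and their definitions, and the `define` helper are illustrative assumptions, not taken from any published industry standard.

```python
from typing import Dict, Optional

# Hypothetical domain dictionaries: the same term means different things per industry.
domain_dictionaries: Dict[str, Dict[str, str]] = {
    "health": {"case": "A person diagnosed with a particular condition"},
    "legal": {"case": "A matter brought before a court"},
    "location": {"address": "A place identified for delivery or navigation"},
}


def define(domain: str, term: str) -> Optional[str]:
    """Look up a term in one domain's dictionary; None if that domain lacks it."""
    return domain_dictionaries.get(domain, {}).get(term)
```

Here `define("health", "case")` and `define("legal", "case")` return different definitions for the same word, while `define("legal", "address")` returns `None` because this sketch's legal dictionary has no entry for it. Keying first by domain means a term is always read in its industry's context.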

If you want to play in an industry domain, learn its lingo first. If you want help building a data dictionary for your organisation, please contact me at sbabbar(at)sameerbabbar.com

(c) Sameer Babbar

** Freakonomics: A Rogue Economist Explores the Hidden Side of Everything, by Steven D. Levitt and Stephen J. Dubner.