Devise a powerful and practical data strategy
Use the 4dDX strategic framework to be data-driven, not just data-busy
What is Data Cataloguing?
As the name suggests, data cataloguing is the practice of identifying all of the business’s data and then recording information about those data in an organised inventory. Overall, a data catalogue gives us a clear picture of the whole data landscape. In a sense, it’s like doing a data ‘stock-take’.
Why Catalogue Data?
A data catalogue is a powerful tool which enables people to discover, understand and trust the data they need, when they need it. People who need to use data in their jobs cannot do those jobs effectively when they struggle to find accurate, complete and trustworthy data. In those circumstances, people often spend more time searching for data than actually using data to generate analyses and valuable insights. When people struggle to find and access trusted data, the organisation can suffer from poor decisions, a slow pace of activity and restrictions on growth and competitiveness.
How Does a Prototype Data Catalogue Work?
A prototype data catalogue needs to encompass just three layers of information:
- Layer 1 – Data Relationships.
- Layer 2 – Data Flows.
- Layer 3 – Data Lineages.
Data relationships are simply the data links that exist between the various systems and processes in the business. For example, we may know or see that data moves between our website and CRM system. That’s a data relationship. Think of a data relationship as a data ‘pipe’. At this first stage we’re focused on finding all of these ‘data pipes’, but we’re not yet interested in what’s in those pipes. That comes next…
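As a minimal sketch of this first layer, the snippet below records each ‘data pipe’ once, whichever direction it was first spotted in. The class name `DataRelationship`, the helper `unique_relationships` and the ‘Finance’ system are hypothetical and used only for illustration; the website and CRM come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRelationship:
    """A 'data pipe' between two systems. Its contents are not recorded at this layer."""
    system_a: str
    system_b: str


def unique_relationships(observed):
    """Collapse observed system pairs into one relationship per pipe, ignoring direction."""
    seen = {}
    for a, b in observed:
        # Sort the pair so that (Website, CRM) and (CRM, Website) map to the same pipe.
        first, second = sorted((a, b))
        seen[DataRelationship(first, second)] = None
    return list(seen)


# Hypothetical observations gathered from stakeholders and infrastructure analysis.
observed = [("Website", "CRM"), ("CRM", "Website"), ("CRM", "Finance")]
relationships = unique_relationships(observed)

for rel in relationships:
    print(f"{rel.system_a} <-> {rel.system_b}")
```

Direction is deliberately ignored here, because at this stage we only want to know that a pipe exists.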
Data flows are the actual movements of data through each data relationship. When we analyse data flows, we’re looking inside each of the ‘data pipes’ to see what kinds of data are moving and in which direction. Data flows are defined by their topic or ‘theme’. That is to say, what the data are about. Examples of data topics are ‘customer’, ‘payment’, ‘address’, and so on.
We don’t want to get too detailed, as that would mean we end up taking too long and creating a data dictionary rather than a data catalogue. So for example, while identifying that some data relationships contain data about an ‘address’, we won’t zoom in to a level of detail that recognises ‘house number’, ‘postcode’, and so on. We would, however, recognise the difference between ‘customer address’ and ‘supplier address’, as they are two different topics of data, albeit similar ones.
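One way to picture the second layer is as a list of flows, each with a direction and a topic, grouped by the pipe they travel through. This is only a sketch: the `DataFlow` class, the `topics_by_pipe` helper and the ‘Procurement’ and ‘Finance’ systems are assumptions made for illustration. Topics stay at the level of ‘customer address’ and never drop down to fields such as ‘postcode’.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    """A movement of data on one topic, in one direction, through a data pipe."""
    source: str
    target: str
    topic: str  # theme-level only, e.g. 'customer address', not 'postcode'


# Hypothetical flows; system names other than Website and CRM are invented.
flows = [
    DataFlow("Website", "CRM", "customer"),
    DataFlow("Website", "CRM", "customer address"),
    DataFlow("CRM", "Website", "order confirmation"),
    DataFlow("Procurement", "Finance", "supplier address"),
]


def topics_by_pipe(flows):
    """Group topics by the pipe (unordered pair of systems) they move through."""
    result = {}
    for flow in flows:
        pipe = frozenset({flow.source, flow.target})
        result.setdefault(pipe, set()).add(flow.topic)
    return result


for pipe, topics in topics_by_pipe(flows).items():
    print(sorted(pipe), "->", sorted(topics))
```

Note that ‘customer address’ and ‘supplier address’ are kept as separate topics, in line with the distinction drawn above.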
Data lineages are a number of data flows arranged in a sequence, according to a specific scenario. We use them to understand the whole ‘upstream’ and ‘downstream’ big picture of how data moves in a particular situation. A very simple example would be a customer online order process, where we would expect to see that the first data flow is the customer’s selection of a product into the basket, the second data flow is the customer’s details during checkout, and the third data flow might be the customer’s payment details, before the process completes with a fourth flow of order confirmation data.
In reality, the data lineage would very likely be more complex than that, but the principle is the same. We want to see the sequence in which data flows occur, so that we can understand the ‘upstream’ and ‘downstream’ states of the data involved in the process. Data lineages can be quite time-consuming to analyse, so unlike a data catalogue, which is created as a holistic foundation of our data understanding, data lineages are typically analysed only when necessary.
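The online order example can be sketched as an ordered list of flows, with two small helpers that answer ‘what came before this?’ and ‘what comes after it?’. The `DataFlow` class, the helper names and the ‘Customer’ and ‘Website’ endpoints are assumptions for illustration; the four topics follow the scenario described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    """A movement of data on one topic, in one direction."""
    source: str
    target: str
    topic: str


# A lineage is simply data flows in the order they occur for one scenario.
online_order = [
    DataFlow("Customer", "Website", "basket selection"),
    DataFlow("Customer", "Website", "customer details"),
    DataFlow("Customer", "Website", "payment details"),
    DataFlow("Website", "Customer", "order confirmation"),
]


def position(lineage, topic):
    """Return the index of the first flow carrying the given topic."""
    for index, flow in enumerate(lineage):
        if flow.topic == topic:
            return index
    raise ValueError(f"topic not found in lineage: {topic}")


def upstream(lineage, topic):
    """Topics of all flows that occur before the given topic."""
    return [flow.topic for flow in lineage[:position(lineage, topic)]]


def downstream(lineage, topic):
    """Topics of all flows that occur after the given topic."""
    return [flow.topic for flow in lineage[position(lineage, topic) + 1:]]


print(upstream(online_order, "payment details"))
print(downstream(online_order, "payment details"))
```

A plain list is enough for a linear scenario like this one. A more complex lineage with branches would probably need a graph structure instead.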
How is a Data Catalogue Prototyped?
Quite simply, it’s a matter of gathering the right information and knowledge, and then recording and presenting it in ways that are beneficial.
To create a prototype data catalogue, knowledge and information about data are gathered through analysis of IT infrastructure, and through consultation with business and IT stakeholders. And because we’re just prototyping, we can use simple spreadsheets and diagrams. This will allow us to understand a lot more about the size and shape of data in the organisation, and what we need from a data catalogue long-term. From there, we can use what we’ve learnt from the data catalogue prototype to make well-informed decisions about the costs and benefits of using more advanced or specialised data cataloguing tools and technology.
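Since a spreadsheet is enough at the prototype stage, gathered flows can be written out as CSV text that any spreadsheet tool will open. This is a sketch only: the column names `source`, `target` and `topic`, the helper `to_csv` and the sample rows are assumptions, not a prescribed layout.

```python
import csv
import io

# Hypothetical rows gathered from stakeholder consultation.
rows = [
    {"source": "Website", "target": "CRM", "topic": "customer"},
    {"source": "Website", "target": "CRM", "topic": "customer address"},
]


def to_csv(rows):
    """Render catalogue rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["source", "target", "topic"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


catalogue_csv = to_csv(rows)
print(catalogue_csv)
```

Saving the returned text to a `.csv` file gives a sheet with one row per data flow, which can be filtered by system or topic.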