Data dictionary: a concern for data governance and process agility
If you've gone through the process of designing your conceptual data model and developing your controlled list of terms, it's important to document the terminology and data standards you've established for the purposes of managing your information. In this article, we'll introduce you to what a data dictionary is and why it's important for the long-term integrity of your database.
Our intention with this article is to introduce you to a way of thinking about documenting the logic behind the decisions you've made about the design of your database. Although it may seem like a tedious step to document all the terminology and data patterns you've chosen, it will ultimately be useful for future users of your database and the solutions that come from it.
I was motivated to write about this topic because this week we've been working on a consultancy project in which we've migrated data from SharePoint to Dataverse. This obviously has a number of implications, as we needed to switch all the data sources from various apps, flows and reports to Dataverse. In terms of organisation and time optimisation, we created a data dictionary to help us with this solution and implementation. So I'd like to share some tips on how to make an efficient data dictionary.
What’s Dataverse?
Data is at the centre of everything a company does today and fuels the ideas that can drive what it should do tomorrow. To thrive and grow, companies need to collect, analyse, predict, present and report data and do it all with a high level of agility.
Building the data infrastructure to enable business insight can be time-consuming and costly. Data originates from a variety of devices, applications, systems, services and software as a service (SaaS). This large and growing number of sources often consists of multiple data technologies that store different types of data, expose different APIs, and use a mixture of security models. The programmers needed to create these technologies can be expensive and difficult to find. Programmers must often have an in-depth understanding of how to implement, configure, manage, and integrate these different data technologies.
Dataverse addresses these concerns with an easy-to-use, easy-to-manage, compliant, secure, scalable, and globally available SaaS data service. Dataverse empowers organisations to work with any type of data and any type of application and use the data within it to gain insights and drive business action.
As part of the Microsoft Power Platform, Dataverse has three main centres of focus:
- Availability and Scalability
- Security
- Compliance
These were the points that made us migrate from SharePoint to Dataverse. The main reason being security, as Dataverse uses Azure Active Directory identity and access management mechanisms to help ensure that only authorised users can access the environment, data, and reports. And that makes all the difference!
Dataverse uses role-based security to group together a collection of privileges. These security roles can be associated directly with users, or they can be associated with Dataverse teams and business units.
Another very important reason that motivated us to switch was the delegation limits that SharePoint has when it comes to applications. This problem prevents us from filtering and sorting tables with more than 2000 rows.
What’s a data dictionary?
A data dictionary is:
- A way of organising
- Detailing what data, we have
- What they mean and what type they’re
It’s essentially a rule book:
- what data to include in your database,
- how it should be structured,
- how it should be entered, and
- how it should be accessed
This dictionary will be a repository of names, definitions and attributes that provide contextual information about the data in your database. The dictionary describes each database field with a clear definition of what information is captured in that field and the rules for using that field.
When you start to develop your data dictionary and determine the elements you should include, it's important to consider what would be important for someone unfamiliar with your database to know about the data held within. It should be the ideal tool for anyone to understand everything about your dataset.
A data dictionary is fundamental to the sustainability of any database, especially when it is used by several people, as is our case here at Portal Gestão. You may know how to read and interpret your data. However, if you left your organisation tomorrow, would someone else know how to access, read, and interpret the data you do? Your data dictionary will ensure that, no matter what happens, there is documentation that will explain everything anyone needs to know about your database.
How to create a data dictionary?
It can be done in various ways. But the way I'm going to show you today’s in good old Excel.
To create a data dictionary, it's important to first consider a few questions:
- What does each element in your data represent? What is it describing?
- What tables and columns exist?
- What’s the path (URL, for example) to each of them?
- Which columns have unique values?
- Which columns have null values?
- What are the relationships between the tables? What are the primary keys?
- What values are allowed in each column?
- Is this data linked to any flow (Power Automate) that needs to be changed?
Examples:
Identification of tables
Column created:
- Table number
- Dataverse logical name
- Table
- Description
- SharePoint link
- Dataverse link
- Relationships
- Annotations
Flows created
- Flow number
- Flow name
- Link
- Flow objective
Environments in Power Apps
- Environmen
- Solution
- Link
Columns
- Table number
- Table name
- Column name
- Description
- Allowed values
- Alternate Key (used for data modelling)
- Do you allow unique values?
- Does it have null values?
- Choices (column in Dataverse)
- Table link
- Business rules
- Annotations
- Column name in SharePoint
- Logical column name in Dataverse (useful for programming in Power Apps and Power Automate)
Column to delete in Dataverse
- Table number
- Table name
- Column
- Name of the column in the Dataverse schema
- Logical name of the column in Dataverse
- Reason for deletion
Relationship
- Table number 1
- Name of table 1
- Name of column 1
- Modelling of column 1
- Modelling column 2
- Name of column 2
- Name of table 2
- No. of table 2
- Relationship link in Dataverse
- URL of the model on the draw-io
We usually use the website https://app.diagrams.net/ to do the data modelling. In fact, in a normal app-building process, this is one of the first steps we take: defining the data modelling. This way, we have a clear idea of which tables we'll need, which columns, the types of relationships, the primary keys and the data types.
Final considerations
Cataloguing a company's tables and columns is not always an easy task, but when it is done, it becomes a great asset for any company.
With a view to:
- Communication between employees entering and leaving companies,
- Transparency in the meaning and terminology of different departments (integrated work between teams)
- Data management and governance.
Keep your company's data dictionary up to date and you'll see the benefits.
See you in the next articles.