Things to know about data quality

No data is better than bad data. I am not sure whether this is a quotation from a known person (definitely not Albert Einstein, nor Peter Drucker), but I like the saying, and there is a lot of truth in it. From bad data come bad decisions. And if there is no data, decisions are simply not based on data (gut feeling, anybody?). Taking care of data quality is the next step towards making your BIM project data useful.

I have already told you a lot about data itself (what it is and what data types we have). I have also mentioned many times how important good-quality data is. But what defines data quality? How do we measure it to know whether the quality is good or not? Why is that even important? I will answer these questions in this article.

What is data quality? Data quality attributes

Yet again, let us start with the basics: what is data quality and what does it mean to have good quality data?

Data quality describes the extent to which data serves its purpose. In other words:

  1. We have business goals. They might be different: to represent reality, to make decisions based on gathered data or to solve given business tasks.
  2. We collect related data.
  3. We make business decisions that are determined by collected data.

Data quality describes how reliable the collected data is.

To state whether data is good or bad, we first have to measure it and assess its quality. We measure each of the following attributes:

  • Consistency
  • Accuracy
  • Comprehensiveness (completeness)
  • Format
  • Uniqueness
  • Timeliness

Let me describe each of them and give examples for better understanding.

Consistency

Consistency measures whether each occurrence of a data point exists in all sources and has the same value everywhere. A consistent room number is the same number in all places: the model, the room program spreadsheet and the Facility Management software.

This is an essential measure if the project uses multiple data stores. For every data point, we have to establish which store is the source of truth and communicate it throughout the project. To put it another way, the source of truth is the data master. In case of discrepancies, the value from the source of truth overwrites the value in any other store.

The metric to evaluate this attribute is the number of inconsistencies.

Room Number example:

Source                | Consistent data | Inconsistent data
Room program database | AA.1023         | AA.1023
Arch. Model           | AA.1023         | AA 1023
MEP Model             | AA.1023         | 1023
FM database           | AA.1023         | AA.1023

Unique coding (TFM) example:

Source       | Consistent data | Inconsistent data
Revit family | -SQZ.231        | -SQZ.200
TFM database | -SQZ.231        | -SQZ.231
FM database  | -SQZ.231        | SQZ.231
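
To make the metric concrete, here is a minimal sketch of how the number of inconsistencies could be counted for the room number example above. It assumes the room program database is the source of truth (the examples do not say which source is the master), and the function and variable names are my own.

```python
# Assumption: the room program database is the agreed source of truth.
SOURCE_OF_TRUTH = "Room program database"

# The "inconsistent" column of the room number example.
room_numbers = {
    "Room program database": "AA.1023",
    "Arch. model": "AA 1023",
    "MEP model": "1023",
    "FM database": "AA.1023",
}


def find_inconsistent_sources(values_by_source, master):
    """Return the sources whose value differs from the master's value."""
    expected = values_by_source[master]
    return [
        source
        for source, value in values_by_source.items()
        if source != master and value != expected
    ]


inconsistent = find_inconsistent_sources(room_numbers, SOURCE_OF_TRUTH)
print(f"{len(inconsistent)} inconsistencies: {inconsistent}")
```

The comparison is deliberately strict: "AA 1023" and "AA.1023" count as different values, because downstream systems will treat them as different keys.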

Accuracy

Accuracy checks whether the data in the data model reflects reality. In other words, this measure separates correct values from incorrect ones. If an object is marked as “Load-bearing” but it is a 10 cm plasterboard interior wall, you know that this data point is inaccurate. Often, spelling errors are what make data inaccurate.

The metric is the ratio of the number of objects with data issues to the number of correct ones.

Comprehensiveness

Comprehensiveness tells us whether the data is complete, i.e. whether all available data points made their way to the database. This measurement goes hand in hand with LOIN (Level of Information Need) and project requirements. Does every object have all the property values required at this stage of the project?

The metric to evaluate this attribute is the number of objects that don’t have all the required values.

Parameter name | Requirement | Explanation
TFM17LoSy      | Required    | Location to the system
TFM17SyKo      | Required    | System code
TFM17SyLn      | Required    | Running no. for system code
TFM17KoFk      | Optional    | Running no. for instance

Example of property requirements, and of incomplete project data that should conform to them.
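
A minimal sketch of such a check, assuming the objects have been exported as simple dictionaries of property values. The parameter names come from the requirements above; the object ids and values are made up for illustration.

```python
# Required parameters from the example requirements (TFM17KoFk is optional).
REQUIRED = ("TFM17LoSy", "TFM17SyKo", "TFM17SyLn")

# Hypothetical exported objects; ids and values are invented.
objects = [
    {"id": "duct-1", "TFM17LoSy": "+123456", "TFM17SyKo": "360",
     "TFM17SyLn": "001", "TFM17KoFk": "01"},
    {"id": "duct-2", "TFM17LoSy": "+123456", "TFM17SyKo": "", "TFM17SyLn": "001"},
    {"id": "duct-3", "TFM17SyKo": "360", "TFM17SyLn": "002"},
]


def missing_required(obj, required=REQUIRED):
    """Return the required parameters that are absent or empty on an object."""
    missing = []
    for name in required:
        value = obj.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Keep only the objects that miss at least one required value.
incomplete = {}
for obj in objects:
    missing = missing_required(obj)
    if missing:
        incomplete[obj["id"]] = missing

print(f"{len(incomplete)} incomplete objects: {incomplete}")
```

An empty string counts as missing here, since an empty property is as useless to the receiver as an absent one.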

Format

Data entries have to meet the required format for a given property. For example, fire resistance has the format “EIXX”, where XX must be one of the values described in the standard (e.g. EI30, EI60, EI90). Another example is the concrete class described in Eurocode: C30/37. Date formats can also be ambiguous for an international team (DD.MM.YYYY, MM/DD/YYYY or YYYY-MM-DD), which is why the date format has to be agreed.

The metric is the proportion of entries with an incorrect format.

Data label                | Correct format | Incorrect format
Concrete class in Eurocode | C30/37         | B30, C30-35, C30
Fire resistance            | EI60, REI60    | EI45, EI50, 100
Model Maturity Index       | MMI350         | Mmi 350, 200, m.m.i. 300
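
As an illustration, the formats above could be expressed as regular expressions and the proportion of badly formatted entries computed from them. The patterns below are my own assumptions derived from the examples (for instance, the list of allowed fire resistance durations), not rules taken from a standard.

```python
import re

# Assumed patterns, inferred from the example values above.
FORMAT_RULES = {
    "Concrete class": re.compile(r"C\d{2}/\d{2,3}"),
    "Fire resistance": re.compile(r"R?EI(30|60|90|120)"),
    "Model Maturity Index": re.compile(r"MMI\d{3}"),
}


def has_valid_format(label, value):
    """True if the whole value matches the pattern registered for its label."""
    return FORMAT_RULES[label].fullmatch(value) is not None


entries = [
    ("Concrete class", "C30/37"),
    ("Concrete class", "B30"),
    ("Concrete class", "C30-35"),
    ("Fire resistance", "REI60"),
    ("Fire resistance", "EI45"),
    ("Model Maturity Index", "MMI350"),
    ("Model Maturity Index", "Mmi 350"),
]

invalid = [(label, value) for label, value in entries if not has_valid_format(label, value)]
ratio = len(invalid) / len(entries)
print(f"{len(invalid)} of {len(entries)} entries have an incorrect format ({ratio:.0%})")
```

Using `fullmatch` rather than `search` matters: a value such as "C30/37 (approx.)" would otherwise slip through.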

Uniqueness

This attribute measures how many duplicates are in our datasets. This measurement is especially important when there is a requirement for unique coding of elements to be used in later stages of the project.

The metric to evaluate this attribute is the number of duplicates.

Example of duplicated property labels in Solibri (marked with yellow and turquoise). Those are internal project data required in EIR.
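
Counting duplicates is simple once the values are in a list. A minimal sketch, assuming the codes have been exported from the model; the codes below are invented for illustration.

```python
from collections import Counter


def find_duplicates(values):
    """Return each value that occurs more than once, with its count."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical export of codes that are supposed to be unique.
codes = ["-SQZ.231", "-SQZ.232", "-SQZ.231", "-SQZ.233", "-SQZ.232", "-SQZ.231"]

duplicates = find_duplicates(codes)
# Surplus records: every occurrence beyond the first one.
surplus = sum(n - 1 for n in duplicates.values())
print(duplicates, surplus)
```

Whether you report the number of duplicated values or the number of surplus records is a project decision; the sketch computes both.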

Timeliness

The data has to be up to date and represent reality for a required period of time. Afterwards, the data is refreshed (be it manually reimported or as an automated process). This is extremely important during the design and construction phases, when our data changes constantly.

The metric to evaluate this attribute is the number of outdated records.

Example of IFC files. Does this mean that some of the teams have finished their work or the IFC files have not been uploaded to the CDE?
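
A minimal sketch of counting outdated records, assuming we know the last export date of each IFC file. The file names, the dates and the seven-day limit are all hypothetical.

```python
from datetime import date, timedelta


def outdated_files(last_export_by_file, today, max_age=timedelta(days=7)):
    """Return the files whose last export is older than max_age, sorted by name."""
    return sorted(
        name
        for name, exported in last_export_by_file.items()
        if today - exported > max_age
    )


# Hypothetical export dates per discipline model.
exports = {
    "ARK.ifc": date(2023, 10, 20),
    "RIV.ifc": date(2023, 10, 6),
    "RIE.ifc": date(2023, 9, 15),
}

stale = outdated_files(exports, today=date(2023, 10, 23))
print(f"{len(stale)} outdated files: {stale}")
```

A list like this only flags candidates. As the caption above asks, an old file may simply mean that the team has finished its work.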

Why is data quality important?

I like to say that no data is better than bad data. If the project is going to use the data it creates, that data has to provide valuable information to the parties involved. High-quality data allows for smoother workflows and faster decision-making. Low data quality, on the other hand, brings a range of risk factors and unwanted outcomes to our projects. Let me describe the ones I find the most common and the most challenging.

Unreliable information = lost trust

When somebody repeatedly encounters incorrect project data, the consequences are twofold. First, a person who relies on the data makes a mistake or a poor judgement. This can be a design mistake resulting in a time-consuming redesign, an incorrect quantity take-off leading to an incomplete order, and so on. Second, after this happens a few times, the person loses trust in the data. He or she then either starts to spend more time on data quality control or, which is worse, creates and maintains a separate data source (breaking the uniqueness attribute). That leads to more people losing trust in the original data source, which leads to fewer data refreshes and eventually to the loss of data timeliness.

Example

On the screenshot, we see elements that have been assigned an incorrect contractor number (K29002 instead of K2902). Now, let’s assume that the contractor K2902 filters all objects under his responsibility and uses the data to:

  • Count how many bathrooms he has to mount on that floor and prepare a work schedule based on that
  • Send the list of door numbers to the door automation supplier
  • Count what types and how many elements he has to deliver in week 43

If he now relies only on this one filter (which in reality he shouldn’t, but I am simplifying), this might result in:

  • Work schedule that becomes too tight (one more bathroom than planned)
  • Door automation is not programmed for those doors
  • The elements for this bathroom don’t arrive at the construction site at all

Incomplete data

This one is connected with the project requirements. If the design data forms the basis for construction and the properties of the objects are incomplete, the contractor cannot place a correct order for the elements to be mounted on site. Incomplete data creates risk factors: some contractors order materials they think are sufficient (they might not be), while others send requests for information and slow down the process.

Duplicated data

There might be duplicates in model objects (two walls placed on top of one another) or in the properties (two properties holding the same value). The first results in erroneous quantity take-offs and ordering lists, while the second might result in a heavier model and incorrect data management. If a duplicated property is used as an input in another place, it leads to ambiguity: which property should I use?

Example

Let’s assume that we have an architect’s model and that a window supplier also delivers his own model (because some windows are complex in shape and requirements). The architect should remove from his model all the windows that exist in the supplier’s model. But one window type remained duplicated and slipped through coordination.

The tinsmith contractor counts the windows to order enough tinware for his job. As a result, 15% more tinware than needed arrives on the construction site. He is frustrated because there is not enough storage space on site, so he has to send the surplus back and pay double for the freight.

How to perform data quality checks in BIM projects?

For each of the above-mentioned attributes, we can define metrics, perform quality checks and establish a quality compliance threshold. Since every project has different specifics and requirements, it is crucial to agree on the hierarchy of data checks and their frequency. The essential metrics can be included in weekly BIM coordination checks, while others might be checked only a couple of times during the project’s lifetime, or skipped completely.
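
A compliance threshold can be as simple as a maximum number of findings per metric. The sketch below compares measured values against such limits; the metric names, the limits and the measured values are all hypothetical and would have to be agreed per project.

```python
# Hypothetical thresholds: the maximum number of findings the project tolerates.
THRESHOLDS = {
    "inconsistencies": 0,
    "incomplete_objects": 50,
    "duplicates": 10,
    "outdated_records": 5,
}


def evaluate(metrics, thresholds=THRESHOLDS):
    """Map each metric name to True (compliant) or False (threshold exceeded)."""
    return {name: metrics[name] <= limit for name, limit in thresholds.items()}


# Invented results of one round of checks.
weekly_metrics = {
    "inconsistencies": 3,
    "incomplete_objects": 12,
    "duplicates": 10,
    "outdated_records": 0,
}

report = evaluate(weekly_metrics)
for name, compliant in report.items():
    print(f"{name}: {'OK' if compliant else 'threshold exceeded'}")
```

A missing metric raises a `KeyError` on purpose: a check that was never run should not silently pass.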

Let me describe different possible data quality checks using examples from my experience and share my thoughts on what is important for our project. You don’t have to agree with that! Just remember to adjust this to your project needs.

Consistency

The building owner for the Stavanger University Hospital has high requirements regarding the quality of labelling of the technical systems on-site and in the model. The system is based on the Norwegian Object Classification (Tverrfaglig Merkesystem – TFM). This standard provides a description of how to encode each building element.

Objects’ TFM numbers reside in two separate data sources: dRofus (the requirement database, TFM code generator and master) and the IFC files (TFM numbers sent from dRofus). Both sources are afterwards imported into a single database and transferred to the Facility Management system. That is why the consistency attribute is important here: we want every object from the TFM database to correspond to the same object in the models.

Since over a hundred people manually enter data in both places, errors are bound to occur. It is worth emphasizing that checking the data is difficult here because both databases are enormous (well over 2.5 million objects).

Excel formula that returns TRUE/FALSE to show which TFM marks exist in the database but not in the model.

Since this check sits at the interface between two different software tools, we had to employ good old Excel to compare both sources. I exported all objects from both sources into one spreadsheet and compared them using a formula. The results (TRUE/FALSE) are then placed in a dashboard for further analysis and task assignment.

This is quite a cumbersome process, so we did it only twice in the last year, and the last check will happen just before the end of the design process.
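
We did this in Excel, but the same comparison could also be sketched as a set difference in a few lines of Python. This is not the workflow we used; it assumes both exports are available as plain lists of TFM marks, and the marks below are invented.

```python
def compare_tfm_marks(database_marks, model_marks):
    """Compare two exports and report the marks that exist on one side only."""
    database, model = set(database_marks), set(model_marks)
    return {
        "missing_in_model": sorted(database - model),
        "missing_in_database": sorted(model - database),
    }


# Hypothetical exports from the TFM database and from the IFC models.
database_export = ["-SQZ.231", "-SQZ.232", "-SQZ.233"]
model_export = ["-SQZ.231", "-SQZ.233", "-SQZ.240"]

result = compare_tfm_marks(database_export, model_export)
print(result)
```

Set operations stay fast even for millions of marks, and the check runs in both directions, which the single TRUE/FALSE column does not.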

Accuracy

This attribute is mainly checked by designers during Quality Control and BIM Coordination of their models. Well-defined rulesets based on norms and standards are the way to go here. As a first step, I would use the standard rules provided by the issue-checking software.

Example rules to check for accuracy:

  • Fire resistance must be from an agreed list
  • High walls must be thick enough
  • Openings cannot be too close to the side of the wall
Ruleset example in Solibri to check for accuracy in wall types.
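
In Solibri such rules are configured in the ruleset manager, not written as code. Purely to show the logic, here is a sketch of the first two rules in Python. The agreed list, the height and thickness limits and the wall data are assumptions made for this example.

```python
# Assumed project rules; the limits are invented for illustration.
AGREED_FIRE_RATINGS = {"EI30", "EI60", "EI90", "EI120"}
HIGH_WALL_HEIGHT_M = 3.0
MIN_THICKNESS_FOR_HIGH_WALLS_M = 0.15


def check_wall(wall):
    """Return a list of accuracy issues found on a single wall."""
    issues = []
    if wall["fire_rating"] not in AGREED_FIRE_RATINGS:
        issues.append("fire rating not in agreed list")
    is_high = wall["height_m"] > HIGH_WALL_HEIGHT_M
    if is_high and wall["thickness_m"] < MIN_THICKNESS_FOR_HIGH_WALLS_M:
        issues.append("high wall is too thin")
    return issues


# Hypothetical walls taken from a model export.
walls = [
    {"id": "W1", "fire_rating": "EI60", "height_m": 2.7, "thickness_m": 0.10},
    {"id": "W2", "fire_rating": "EI45", "height_m": 2.7, "thickness_m": 0.10},
    {"id": "W3", "fire_rating": "EI30", "height_m": 4.5, "thickness_m": 0.10},
]

findings = {}
for wall in walls:
    issues = check_wall(wall)
    if issues:
        findings[wall["id"]] = issues

print(findings)
```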

Comprehensiveness

The attribute is also checked in IFC by specific rulesets. The only condition here is: exists or does not exist. I perform this check mainly for project-defined IFC properties that are required by the client.

Example rules to check for comprehensiveness (Nye SUS example, check if every object has a value):

  • “Control Area”
  • “Model Maturity Index”
  • “Responsible Contractor”
The rulesets mentioned above are implemented in the Solibri ruleset manager. We had to create our own sets of rules.

Format

The attribute can be checked in the IFC issue checker (Solibri in our case) or in other places, such as a database. We check for the correct format according to the requirements. This step can become extensive if you want to check that every property has the correct value formatting.

Example rules to check for format:

  • Easy one for fire resistance: One of (EI30, EI60, EI90, EI120)
  • More complicated: a regular expression for the TFM code itself: [+][0-9]{6}[=][0-9]{3}[.][0-9]{3,}[:][0-9]{2,}[-][A-Z]{3}[.][0-9]{3,}[T][/][0-9]{3,}$
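
To try the regular expression outside the model checker, it can be compiled as-is, for example in Python. The sample codes below are made up only to exercise the pattern; they are not real TFM codes from the project.

```python
import re

# Pattern copied from the rule above. fullmatch() also anchors the start,
# which the original pattern (ending in "$" only) leaves open.
TFM_PATTERN = re.compile(
    r"[+][0-9]{6}[=][0-9]{3}[.][0-9]{3,}[:][0-9]{2,}"
    r"[-][A-Z]{3}[.][0-9]{3,}[T][/][0-9]{3,}$"
)


def is_valid_tfm(code):
    """True if the whole string matches the TFM code pattern."""
    return TFM_PATTERN.fullmatch(code) is not None


# Invented sample codes: one well-formed, two malformed.
samples = [
    "+123456=360.001:01-SQZ.231T/001",
    "+123456=360.001:01-sqz.231T/001",
    "-SQZ.231",
]

for code in samples:
    print(code, is_valid_tfm(code))
```

The second sample fails because of the lowercase letters, and the third because it contains only the component part of the code.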

Uniqueness

Checking for unique BIM objects is a relatively easy task: IFC model checkers have a built-in feature to check for duplicated objects. Checking for unique properties is a harder job. Sometimes software allows only unique values (like dRofus databases for TFM coding), but many times it is a manual and time-consuming job.

In such cases, your project has to define which data is worth digging into and checking for duplicates. IFC checkers can be a good help here since they allow for Information Take-off, which is the easiest way to identify which properties have the most duplicates. You work your way from that point until you narrow down the result to a full list of objects with duplicated data. Oftentimes the origin is an incorrect IFC export by one of the designers.

Timeliness

Data timeliness is ensured in the process of model and database updates and joining all the sources together. On our project, we have scheduled weekly IFC exports and data imports to all of our data management systems.

Summary

After performing all those checks, you will have a comprehensive picture of the state of the project. However, this is only a point-in-time picture. If you want to keep an eye on the data quality, a good idea might be to include the results of such a data quality audit in a live dashboard in Power BI.

To conclude, checking data quality holistically can take a lot of time and effort. On the other hand, many of the attributes are often checked together with BIM coordination. Others can be automated or placed in a dashboard.

As a result, a project receives a full picture of what its data is worth and where to make improvements. With such an audit, the project stakeholders know what data is already good to use and what should not be taken into account when making business decisions.
