No data is better than bad data. I am not sure whether this is a quotation from a known person (definitely not Albert Einstein, nor Peter Drucker), but I like the saying, and there is a lot of truth in it. Bad data leads to bad decisions. And if there is no data, decisions are simply not based on data (gut feeling, anybody?). Taking care of data quality is the next step in leveraging your BIM project data.
I have already told you a lot about data itself (what it is and what data types we have), and I have also mentioned many times how important good-quality data is. But what defines data quality? How do we measure it to know whether the quality is good? And why does it even matter? I will answer these questions in this article.
What is data quality? Data quality attributes
Yet again, let us start with the basics: what is data quality and what does it mean to have good quality data?
Data quality describes the extent to which data serves its purpose. In other words:
- We have business goals. They may differ: representing reality, making decisions based on gathered data, or solving given business tasks.
- We collect related data.
- We make business decisions that are determined by collected data.
Data quality describes how reliable the collected data is.
To be able to state whether data is good or bad, we first have to measure it. We measure each of the following attributes:
- Consistency
- Accuracy
- Completeness
- Format
- Uniqueness
- Timeliness
Let me describe each of them and give examples for better understanding.
Consistency
Consistency measures whether each occurrence of a data point in all sources exists and is the same. Consistent data for room number means the same number in all places: the model, room program spreadsheet and Facility Management software.
This is an essential measure if the project uses multiple data stores. For every data point, we have to establish which store is the source of truth and communicate it throughout the project. To put it another way, the source of truth is the data master. In case of discrepancies, the source of truth overwrites the value in every other store.
The metric to evaluate this attribute is the number of inconsistencies.
Room Number example:
| Source | Consistent data | Inconsistent data |
|---|---|---|
| Room program database | AA.1023 | AA.1023 |
| Arch. Model | AA.1023 | AA 1023 |
| MEP Model | AA.1023 | 1023 |
| FM database | AA.1023 | AA.1023 |
Unique coding (TFM) example:
| Source | Consistent data | Inconsistent data |
|---|---|---|
| Revit family | -SQZ.231 | -SQZ.200 |
| TFM database | -SQZ.231 | -SQZ.231 |
| FM database | -SQZ.231 | SQZ.231 |
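The comparison behind these tables is easy to automate. A minimal sketch in Python — the source names, values, and the choice of source of truth below are illustrative, not from a real project:

```python
# Compare one data point across several data stores and list the stores
# whose value differs from the agreed source of truth.
# All names and values below are illustrative.

def find_inconsistencies(values_by_source, source_of_truth):
    """Return the sources whose value differs from the source of truth."""
    master = values_by_source[source_of_truth]
    return [s for s, v in values_by_source.items()
            if s != source_of_truth and v != master]

room_number = {
    "Room program database": "AA.1023",  # agreed source of truth
    "Arch. Model": "AA 1023",            # space instead of dot
    "MEP Model": "1023",                 # building prefix missing
    "FM database": "AA.1023",
}

mismatches = find_inconsistencies(room_number, "Room program database")
print(mismatches)        # ['Arch. Model', 'MEP Model']
print(len(mismatches))   # the metric: number of inconsistencies
```

The metric (number of inconsistencies) is simply the length of the returned list.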
Accuracy
Accuracy checks whether the data in the model reflects reality. In other words, this measure distinguishes correct from incorrect values. If an object is marked as “Load-bearing” but it is a 10 cm plasterboard interior wall, you know that this data point is inaccurate. Often it is simple spelling errors that make data inaccurate.
The metric is the ratio of objects with data issues to the total number of objects.
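As a sketch, the load-bearing wall example above could be turned into a simple rule check. The wall records, field names, and the “plasterboard is never load-bearing” rule are illustrative assumptions:

```python
# Flag walls whose "load-bearing" flag contradicts their build-up.
# Records, field names, and the rule itself are illustrative.

walls = [
    {"id": "W-01", "load_bearing": True,  "material": "concrete",     "thickness_mm": 250},
    {"id": "W-02", "load_bearing": True,  "material": "plasterboard", "thickness_mm": 100},  # suspicious
    {"id": "W-03", "load_bearing": False, "material": "plasterboard", "thickness_mm": 100},
]

def is_suspicious(wall):
    # A plasterboard partition should not be marked as load-bearing.
    return wall["load_bearing"] and wall["material"] == "plasterboard"

issues = [w["id"] for w in walls if is_suspicious(w)]
ratio = len(issues) / len(walls)
print(issues)  # ['W-02']
```

In practice such rules live in the issue-checking software rather than in a script, but the logic is the same: a predicate per rule, applied to every object.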
Completeness
This attribute tells us whether the data is complete, i.e. whether all required data points made their way to the database. This measurement goes hand in hand with LOIN (Level of Information Need) and the project requirements. Does every object have all the property values required at this stage of the project?
The metric to evaluate is the number of objects that are missing required values.
| Parameter Name | Requirement | Explanation |
|---|---|---|
| TFM17LoSy | Required | Location to the system |
| TFM17SyKo | Required | System code |
| TFM17SyLn | Required | Running no. for system code |
| TFM17KoFk | Voluntary | Running no. for instance |
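A completeness check over the parameters above can be sketched in a few lines. The object records and their values are made up for illustration; only the parameter names come from the table:

```python
# Check that every object carries a value for all required properties.
# Parameter names follow the table above; object data is illustrative.

REQUIRED = ["TFM17LoSy", "TFM17SyKo", "TFM17SyLn"]  # TFM17KoFk is voluntary

objects = [
    {"id": "obj-1", "TFM17LoSy": "+256301", "TFM17SyKo": "360", "TFM17SyLn": "001"},
    {"id": "obj-2", "TFM17LoSy": "+256301", "TFM17SyKo": "360"},  # TFM17SyLn missing
]

def missing_required(obj):
    # Treat absent keys and empty strings alike as "no value".
    return [p for p in REQUIRED if not obj.get(p)]

incomplete = {o["id"]: missing_required(o) for o in objects if missing_required(o)}
print(incomplete)       # {'obj-2': ['TFM17SyLn']}
print(len(incomplete))  # the metric: objects missing required values
```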
Format
Data entries have to meet the required format for a given property. For example, fire resistance has the format “EIXX”, where XX must be a value described in the standard (e.g. EI30, EI60, EI90). Another example is the concrete class described in Eurocode: C30/37. Date formats can also be misleading for an international team (DD.MM.YYYY vs MM/DD/YYYY vs YYYY-MM-DD), so the format must be agreed upon.
The metric is the ratio of data entries with an incorrect format.
| Data label | Correct format | Incorrect format |
|---|---|---|
| Concrete class in Eurocode | C30/37 | B30, C30-35, C30 |
| Fire resistance | EI60, REI60 | EI45, EI50, 100 |
| Model Maturity Index | MMI350 | Mmi 350, 200, m.m.i. 300 |
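Format rules like these map naturally onto regular expressions. A sketch mirroring the table above — the patterns are my illustrative reading of the allowed values, not an official list:

```python
import re

# Validate that property values match the agreed format.
# The allowed patterns below mirror the table and are illustrative.

FORMATS = {
    "FireResistance": re.compile(r"^R?EI(30|60|90|120)$"),  # e.g. EI60, REI60
    "ConcreteClass":  re.compile(r"^C\d{2}/\d{2}$"),        # e.g. C30/37
    "MMI":            re.compile(r"^MMI\d{3}$"),            # e.g. MMI350
}

def has_valid_format(prop, value):
    return bool(FORMATS[prop].match(value))

print(has_valid_format("FireResistance", "REI60"))   # True
print(has_valid_format("ConcreteClass", "C30-35"))   # False
print(has_valid_format("MMI", "Mmi 350"))            # False
```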
Uniqueness
This attribute measures how many duplicates are in our datasets. This measurement is especially important when there is a requirement for unique coding of elements to be used in later stages of the project.
The metric to evaluate this attribute is the number of duplicates.
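Counting duplicates of a property that must be unique is a one-liner with the standard library. The TFM codes below are illustrative:

```python
from collections import Counter

# Count duplicated values of a property that must be unique (e.g. a TFM code).
# The codes below are illustrative.

tfm_codes = ["-SQZ.231", "-SQZ.232", "-SQZ.231", "-SQZ.233", "-SQZ.232"]

duplicates = {code: n for code, n in Counter(tfm_codes).items() if n > 1}
print(duplicates)       # {'-SQZ.231': 2, '-SQZ.232': 2}
print(len(duplicates))  # the metric: number of duplicated values
```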
Timeliness
The data has to be up to date and represent reality for a required period of time. Afterwards, the data is refreshed (be it manually reimported or as an automated process). This is extremely important during the design and construction phase where there are constant changes to our data.
The metric to evaluate this attribute is the number of outdated records.
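A timeliness check boils down to comparing each record's last update against an agreed refresh window. The 14-day window, the fixed “today”, and the records below are illustrative:

```python
from datetime import date, timedelta

# Count records whose last update is older than the agreed refresh period.
# The 14-day window and the records are illustrative.

MAX_AGE = timedelta(days=14)
today = date(2023, 10, 30)  # fixed "now" so the example is reproducible

records = [
    {"id": "r1", "last_updated": date(2023, 10, 25)},
    {"id": "r2", "last_updated": date(2023, 9, 1)},   # outdated
]

outdated = [r["id"] for r in records if today - r["last_updated"] > MAX_AGE]
print(outdated)       # ['r2']
print(len(outdated))  # the metric: number of outdated records
```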
Why is data quality important?
Unreliable information = lost trust
Example
In the screenshot, we can see elements that were assigned an incorrect contractor number (K29002 instead of K2902). Now, let’s assume that contractor K2902 filters all objects under his responsibility and uses the data to:
- Count how many bathrooms he has to mount on that floor and prepare a work schedule based on that
- Send the list of door numbers to the door automation supplier
- Count what types and how many elements he has to deliver on week 43
If he relies only on this one filter (which in reality he shouldn’t, but let’s simplify), this might result in:
- A work schedule that is too tight (one more bathroom than planned)
- Door automation that is not programmed for those doors
- The elements for this bathroom don’t arrive at the construction site at all
Incomplete data
Duplicated data
There might be duplicates in model objects (two walls placed on one another) or in the properties (two properties with the same value). The first one results in erratic quantity take-offs and ordering lists, while the other might result in a heavier model and incorrect data management. If a duplicated property is used as an input in another place it leads to ambiguity – which property should I use?
Example
Let’s assume that we have an architect’s model, and a window supplier is also delivering his own model (because some windows are complex in shape and requirements). The architect should remove from his model all the windows delivered in the supplier’s model. But one window type remained duplicated and slipped through coordination.
The tinsmith contractor counts the windows to order enough tinware for his job. As a result, 15% more tinware arrives at the construction site than needed. He is frustrated because there is not enough storage space on site, so he has to send the excess back and pay double for the freight.
How to perform data quality checks in BIM projects?
For each of the above-mentioned attributes, we can define metrics, perform quality checks and establish a quality compliance threshold. Since every project has different specifics and requirements, it is crucial to agree on a hierarchy of data checks and their frequency. The essential metrics can be included in weekly BIM coordination checks, while others might be checked only a couple of times during the project’s lifetime, or skipped completely.
Let me describe different possible data quality checks using examples from my experience and share my thoughts on what is important for our project. You don’t have to agree with that! Just remember to adjust this to your project needs.
Consistency
Objects’ TFM numbers reside in two separate databases: dRofus (the requirement database, TFM code generator and master) and IFC files (the TFM numbers sent from dRofus). Both sources are afterwards imported into a single database and transferred to the Facility Management system. That is why the consistency attribute is important here – we want every object in the TFM database to correspond to the same object in the models.
Since over a hundred people manually enter data in both places, it is more than certain that errors occur. It is worth emphasizing that checking the data here is difficult, because both databases are enormous (well over 2.5 million objects).
Considering that this check sits at the interface of two different software packages, we employed good old Excel to compare both sources. I exported all objects from both sources into one spreadsheet and compared them using a formula. The results (TRUE/FALSE) were then placed in a dashboard for further analysis and task assignment.
This is quite a cumbersome process, so we did it only twice in the past year; the final check will happen just before the end of the design process.
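The same TRUE/FALSE comparison can also be scripted instead of done in Excel. A sketch using Python’s standard library — the column names, GUID key, and file contents are illustrative stand-ins for real exports:

```python
import csv
import io

# Compare TFM numbers exported from two sources, keyed by a shared GUID.
# In practice you would open real CSV exports; io.StringIO stands in here.

drofus_csv = io.StringIO("guid,tfm\ng1,-SQZ.231\ng2,-SQZ.232\n")
ifc_csv    = io.StringIO("guid,tfm\ng1,-SQZ.231\ng2,-SQZ.200\n")

def load(f):
    return {row["guid"]: row["tfm"] for row in csv.DictReader(f)}

drofus, ifc = load(drofus_csv), load(ifc_csv)

# TRUE/FALSE per object, like the spreadsheet formula described above.
result = {guid: drofus[guid] == ifc.get(guid) for guid in drofus}
print(result)  # {'g1': True, 'g2': False}
```

The resulting dictionary can then feed a dashboard the same way the spreadsheet did.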
Accuracy
This attribute is mainly checked by designers during Quality Control and BIM Coordination of their models. Well-defined rulesets based on norms and standards are the way to go here. In the first place, I would go for the standard rules provided by the issue-checking software.
Exemplary rules to check for accuracy:
- Fire resistance must be from an agreed list
- High walls must be thick enough
- Openings cannot be too close to the side of the wall
Completeness
This attribute is also checked in IFC with specific rulesets. The condition here is binary: the value exists or it doesn’t. I perform this check mainly for project-defined IFC properties that are required by the client.
Exemplary rules to check for completeness (Nye SUS example; check whether every object has a value):
- “Control Area”
- “Model Maturity Index”
- “Responsible Contractor”
Format
This attribute can be checked in the IFC issue checker (Solibri in our case) or elsewhere, such as in a database. We check for the correct format according to the requirements. This step can become extensive if you want to verify that each property has the correct value formatting.
Exemplary rules to check for format:
- Easy one for fire resistance: One of (EI30, EI60, EI90, EI120)
- More complicated: a regular expression for the TFM code itself: [+][0-9]{6}[=][0-9]{3}[.][0-9]{3,}[:][0-9]{2,}[-][A-Z]{3}[.][0-9]{3,}[T][/][0-9]{3,}$
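Applied with Python’s `re` module, the TFM expression above behaves like this. The sample codes are constructed to satisfy (or violate) the pattern and are not real project codes:

```python
import re

# The TFM regular expression quoted above; it has a trailing $ but no
# leading ^, so search() anchors only the end of the string.
TFM_PATTERN = r"[+][0-9]{6}[=][0-9]{3}[.][0-9]{3,}[:][0-9]{2,}[-][A-Z]{3}[.][0-9]{3,}[T][/][0-9]{3,}$"

good = "+256301=360.001:01-SQZ.231T/001"  # constructed to match
bad  = "+256301=360.001:01-SQZ.231/001"   # the "T" is missing

print(bool(re.search(TFM_PATTERN, good)))  # True
print(bool(re.search(TFM_PATTERN, bad)))   # False
```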
Uniqueness
Checking for unique BIM objects is a relatively easy task: IFC model checkers have a built-in feature to check for duplicated objects. Checking for unique properties is a harder job. Sometimes software allows only unique values (like dRofus databases for TFM coding), but many times it is a manual and time-consuming job.
In such cases, your project has to define which data is worth digging into and checking for duplicates. IFC checkers can be a good help here since they allow for an Information Take-off – the easiest way to identify which properties have the most duplicates. You then work your way from that point until you narrow the result down to a full list of objects with duplicated data. Oftentimes the origin is a wrong IFC export by one of the designers.
Timeliness
Summary
After performing all these checks, you will have a comprehensive picture of the state of the project. However, it is only a point-in-time picture. If you want to keep an eye on data quality continuously, a good idea is to feed the results of such a data quality audit into a live dashboard in Power BI.
To conclude, checking data quality holistically can take much time and effort. On the other hand, many of the attributes are often checked together with BIM coordination. Others can be automated or placed in the dashboard.
As a result, the project receives a full picture of what its data is worth and where to make improvements. With such an audit, project stakeholders know what data is already good to use and what should not be taken into account when making business decisions.