Prevent property errors – Data Types Explained

How many times have you looked helplessly at schedules or take-offs, wondering how such mistakes were even possible? How many times has a data field contained information that shouldn’t be there? Sounds familiar? I’ve seen it a lot, especially on bigger projects. I might have a solution to some of these problems: data types.

In the previous article about Data Management, I introduced and broke down the information stored in BIM models and, more generally, on construction projects. If you haven’t read it yet, I recommend doing so before this one, as I will reuse its terms without further explanation.

In this entry, I want to take a deeper dive into the subject of Structured Data, especially the different Data Types. This is the next critical topic to understand and apply consciously if you want to manage your project data correctly.


What is a data type?

When we create properties, we obviously want data written in them. Even better, we want the correct data! 😉 That’s why we give labels meaningful names and specify units. For instance, the label “Length [cm]” clearly defines what belongs there. If somebody fills it in with the value “green”, we know something is wrong. A computer does not necessarily know that, though. On the other hand, for us it doesn’t matter whether the field contains “three” or “3”. For a machine, it does.

In other words, each data point needs an attribute that tells the software how to interpret its value. This, in turn, allows a computer to perform the correct operations on the data. That way, each property in our model has a meaning readable to both humans and machines.

To solve the two challenges above, machines rely on an attribute called the data type. If we assign the data type “Number” to the field “Length”, the software will accept neither “green” nor “three”. We would have to enter “3” (or any other number, naturally). That makes later data processing possible: a computer cannot perform mathematical operations on the values “Two” and “Three”, but it definitely can on “2” and “3”. And it does so far better than we do.
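To make the idea concrete, here is a minimal Python sketch of a field that behaves like a “Number” property. The helper `to_number` is hypothetical and only imitates what a BIM tool does when it validates input: text that is not a number is rejected, while accepted values can be used in calculations.

```python
def to_number(value: str) -> float:
    """Hypothetical "Number" field: accept input only if it parses as a number."""
    try:
        return float(value)
    except ValueError:
        raise ValueError(f"'{value}' is not a valid number") from None


accepted = []
for user_input in ["3", "green", "three", "2"]:
    try:
        accepted.append(to_number(user_input))
    except ValueError as err:
        print(f"Rejected: {err}")  # "green" and "three" end up here

# Because the accepted values are real numbers, arithmetic just works.
print(sum(accepted))  # 5.0
```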

An important note here: please do not confuse “data types” with “units”. Mass, length, resistance and other quantities can each be stored using different data types. Units and data types are two entirely separate concepts.

The whole point of selecting and using the correct data types is to force users to enter data correctly: to write “3” where it belongs and not “Three”. This might sound a little vague now, but I will give you details and examples in the following sections.

What data types do we have in BIM models?

We, as humans, understand the sentence “Door D-01 is left swing, made of aluminium and has fire rating EI30.” BIM software would rather have it written differently:
Label              | Value
Category           | Doors
Name               | D-01
Width (m)          | 0,9
Opening direction  | Left
Material           | Aluminium
Fire Door          | True
Fire Rating (EI)   | 30
Start Date         | 2022-08-22
End Date           | 2022-08-26

This is a table of the object’s properties (the magical “I” in BIM). The column on the left contains labels (or field names). The column on the right contains values, each of which has been assigned a data type. The data type determines how we can define values and what we can do with them. On construction projects, we mainly use these data types:

  • String (text). A text data type is an unrestricted sequence of characters. It can contain letters, numbers and special characters, and it is always treated as text (i.e. no calculations are possible). An example is the value “D-01” in the table above.
  • Number. Allows only numeric values. This data type can be further restricted to whole numbers (30) or allow decimals (30,12). The former is called an integer, the latter a float. Calculations are possible with this data type. In the table above: Fire Rating = 30 and Width = 0,9 m.
  • Boolean. It represents the values true and false. Sometimes it takes another form: 1 (true) and 0 (false), or yes (true) and no (false). An example is Fire Door = True in the table above. In software, it usually appears as a checkbox or a radio button.
  • Date. A self-explanatory data type. In the table, you will see it in the fields “Start Date” and “End Date”, which show the period when the door should be installed on site.
  • Enumerated. It contains a predefined set of values to choose from, and you can choose only one value for each variable. This is a derived data type, since each of its values is of one of the four types mentioned above. Let’s assume that in our example you could choose the value of “Material” from a drop-down list containing Aluminium, Steel and Glass. Such a list is called an enumerated type.
  • Array (list). It stores several elements of the same type and is also a derived data type. In our “Material” example, if the door were aluminium with glazing, we could create an array containing Aluminium (at index 0) and Glass (at index 1). The length of this array is 2. In software, this choice is typically made by ticking multiple checkboxes.
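The door table above can be sketched as a typed Python record, with one field per label. The class and field names are illustrative assumptions, not a real BIM schema. Note that Python type hints only document the intended data type and are not enforced at runtime; a BIM tool would enforce them at data entry. The point is that typed values allow typed operations, such as date arithmetic.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Material(Enum):
    """Enumerated type: the only values allowed for "Material"."""
    ALUMINIUM = "Aluminium"
    STEEL = "Steel"
    GLASS = "Glass"


@dataclass
class Door:
    category: str            # String
    name: str                # String, e.g. "D-01"
    width_m: float           # Number (float)
    opening_direction: str   # String
    material: Material       # Enumerated
    fire_door: bool          # Boolean
    fire_rating_ei: int      # Number (integer)
    start_date: date         # Date
    end_date: date           # Date


d01 = Door(
    category="Doors",
    name="D-01",
    width_m=0.9,
    opening_direction="Left",
    material=Material.ALUMINIUM,
    fire_door=True,
    fire_rating_ei=30,
    start_date=date(2022, 8, 22),
    end_date=date(2022, 8, 26),
)

# Typed values support typed operations, e.g. the length of the installation window.
print((d01.end_date - d01.start_date).days)  # 4
```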

How to use data types

A BIM project can contain hundreds of thousands of objects, like the door from the example above, and each of them can have a hundred or more properties describing it. That gives us an enormous amount of data and a good foundation for… data chaos.

When does the chaos begin? During data entry, that is, when we design: while creating objects and setting their parameter values. This is where we make mistakes and typos, or simply forget to enter any data at all.

Correct property management can reduce some of these mistakes, mainly because some data types are more prone to user entry errors than others. Let me describe each of them in detail and explain how they differ. I’ll also show how to manage data types correctly on a BIM project.

String

String is definitely the most popular data type, and also the one in which it is easiest to make mistakes. Capital or lowercase letter at the beginning? Space or no space? Maybe a dash? Abbreviation with or without a dot? We might read all these variants as the same value, but unfortunately, for a machine, they are distinct values. This creates data inconsistency and makes querying difficult, because we get several separate values instead of one. As a result, such data requires clean-up before processing (you will learn about it in upcoming entries).

Example of a freely used string data type that produced many values meaning the same thing.
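A short Python sketch shows why free text splits one value into many. The entries are hypothetical variants of the same material. A naive clean-up (trimming spaces and lowercasing) merges only some of them; abbreviations still need manual mapping, which is the kind of clean-up mentioned above.

```python
# Hypothetical free-text "Material" entries that a human reads as one value.
entries = ["Aluminium", "aluminium", "Aluminium ", "Alu.", "ALUMINIUM"]

# For a computer, every variant is a separate value.
print(len(set(entries)))  # 5

# Naive clean-up: strip surrounding spaces and lowercase.
normalised = {e.strip().lower() for e in entries}
print(sorted(normalised))  # ['alu.', 'aluminium'] - "Alu." still needs mapping
```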

How to use this data type

String is the easiest data type to create, the easiest to input and the most flexible. All these features make it the least desirable in data management. When you create properties, try to limit this input freedom by using the other data types described below wherever possible.

Number

Number is a better data type. It accepts only numeric input, and users receive an error message when they try to enter anything else. Working on various projects, I noticed one issue common to most of them: in BIM authoring tools, some labels should take numeric input, yet BIM managers still create them as text. Why? My guesses are: lack of awareness, trying to make teammates’ lives easier, and trying to minimise errors when transferring data between different applications.

Let me describe what can happen when we transfer data between two properties with different data types. Suppose we transfer room data properties between Revit and a dRofus database (or any other two applications that exchange data). The property “Fresh air requirement” is defined in Revit as Text, and its value is written as 4 033,20 (European formatting). The dRofus database has its data formatting set to English, and this field on the Room Data Sheet has the Number data type. When we try to synchronise the two parameters, we get an error. Why? Because in dRofus the value should read 4,033.20. Had we defined this property in Revit as a Number, the software would have converted it automatically. Because it is text, it is sent exactly as written, and no conversion takes place.

Another, perhaps simpler, example: in Excel set to European formatting, if you type a decimal number with a dot as the separator, the cell is instantly treated as text and you won’t be able to perform any calculations with it.

These are examples of mismatched number formatting between two applications, but there are countless other discrepancies of this kind. I understand BIM managers who try to avoid the headache by assigning “text” to all fields. However, this only shifts the extra work onto the person managing the data.
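The Revit/dRofus mismatch can be illustrated with a small Python analogy. This is not how either application works internally; it only shows the principle. A number is stored independently of how it is displayed, so it can be formatted in any convention. Text has to be parsed first, and only under an assumed convention. The helper `parse_european_number` is hypothetical and assumes spaces as thousands separators and a comma as the decimal separator.

```python
def parse_european_number(text: str) -> float:
    """Hypothetical helper: parse e.g. '4 033,20' (space thousands, comma decimal)."""
    # Drop thousands separators: regular, non-breaking and narrow no-break spaces.
    for sep in (" ", "\u00a0", "\u202f"):
        text = text.replace(sep, "")
    return float(text.replace(",", "."))


raw = "4 033,20"  # the value as typed into a Text property

try:
    float(raw)  # a Number field cannot take this text as-is
except ValueError:
    print("Sync error: text is not a number")

value = parse_european_number(raw)  # once it is a real number...
print(f"{value:,.2f}")              # ...it can be shown in English format: 4,033.20
```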

How to use this data type

Use the Number data type in every field where a numeric value is required. If you are not convinced to use the Number data type, at least require a pure number in the property. If a value has a unit (e.g. EI for fire rating or MMI for model maturity), it is better to put the unit in the label and leave the value field for the number alone. Otherwise, users will write: MMI 300, mmi 300, 300MMI, Mmi300, MMI:300, MMI-300 and more. And that, again, results in additional data clean-up.
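If a Text field has to stay, a simple rule such as “digits only” already blocks the variants listed above. Below is a minimal Python sketch of such a check; the function name `is_pure_number` and the label wording are illustrative assumptions, not a feature of any specific BIM tool.

```python
import re


def is_pure_number(value: str) -> bool:
    """True only for a plain whole number such as '300' (no unit, no prefix)."""
    return re.fullmatch(r"\d+", value) is not None


# Label: "Model maturity (MMI)" - the unit lives in the label, the value is a number.
entries = ["300", "MMI 300", "mmi 300", "300MMI", "Mmi300", "MMI:300", "MMI-300"]
for entry in entries:
    print(f"{entry!r:12} -> {'OK' if is_pure_number(entry) else 'rejected'}")
```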

Boolean

Boolean is very easy both for user entry and for data processing. It is often handy to pair a Boolean field with a second field that becomes editable only when the Boolean is true, so that the user can then give it a specific value. For example, setting Fire Door to True could unlock the Fire Rating (EI) field.
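The “Boolean unlocks another field” pattern can be sketched in Python as a record that refuses a value in the dependent field while the Boolean is false. The class `FireProperties` is a hypothetical illustration of the rule, not an API of any BIM tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FireProperties:
    """Hypothetical pair of fields: a Boolean that unlocks a dependent value."""
    fire_door: bool
    fire_rating_ei: Optional[int] = None  # editable only when fire_door is True

    def __post_init__(self) -> None:
        # The dependent field stays locked (empty) unless the Boolean is True.
        if not self.fire_door and self.fire_rating_ei is not None:
            raise ValueError("Fire Rating (EI) is locked unless Fire Door is True")


d01_fire = FireProperties(fire_door=True, fire_rating_ei=30)
d02_plain = FireProperties(fire_door=False)

try:
    FireProperties(fire_door=False, fire_rating_ei=30)
except ValueError as err:
    print(err)  # the rating was entered on a door that is not a fire door
```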

How to use this data type

The catch lies in how the data field is created: the admin has to define the radio button with three possible values, True/False/null. It cannot be just a checkbox where checked = True, because the state of an unchecked box is unknown. Does it mean “False”, or is it just “null” (value not applicable for that object)?
Boolean checkboxes in Revit.
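The difference between a three-state field and a plain checkbox can be shown in a few lines of Python, using `None` to stand for “null”. The six values are hypothetical. Once the data is squeezed into checked/unchecked, the unknown entries can no longer be told apart from a deliberate “False”.

```python
from collections import Counter
from typing import Optional

# Hypothetical "Fire Door" values for six doors:
# True / False are explicit answers, None means "null" (not applicable / not set).
fire_door: list[Optional[bool]] = [True, False, None, True, None, False]

counts = Counter(fire_door)
print(counts[True], counts[False], counts[None])  # 2 2 2

# A plain checkbox only stores checked/unchecked, so None collapses into False.
checkbox = [bool(v) for v in fire_door]
print(Counter(checkbox)[False])  # 4 - the two unknowns now look like "False"
```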

Date

Date is also a fairly consistent data type. You only have to be aware that users across the globe use different date formats (10/11/2022 or 11/10/2022), which can cause confusion. That’s why the most reasonable choice is the ISO 8601 standard: YYYY-MM-DD. In our example, that is 2022-10-11 (11th October 2022). Because the year comes first, then the month, then the day, sorting these values alphabetically also sorts them chronologically, so even the dumbest software can order them from oldest to newest 🙂
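Both points, the ambiguity of local formats and the sortability of ISO 8601, can be checked with Python’s standard `datetime` module. The date lists are made up for the example.

```python
from datetime import datetime

# The same text means two different dates depending on the assumed format.
text = "10/11/2022"
day_first = datetime.strptime(text, "%d/%m/%Y").date()    # 10 November 2022
month_first = datetime.strptime(text, "%m/%d/%Y").date()  # 11 October 2022
print(day_first == month_first)  # False

# ISO 8601 strings sort chronologically even when treated as plain text.
iso_dates = ["2022-10-11", "2021-12-31", "2022-08-22", "2022-08-26"]
print(sorted(iso_dates))  # ['2021-12-31', '2022-08-22', '2022-08-26', '2022-10-11']

# The same dates as DD/MM/YYYY text do not: the 2021 date ends up last.
local_dates = ["11/10/2022", "31/12/2021", "22/08/2022", "26/08/2022"]
print(sorted(local_dates))
```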

Enumerated

This data type is extremely convenient for data entry. The user is given a predefined set of options, and their only job is to choose the correct one. The result is 100% consistent data. Its drawbacks appear when the options need to be more differentiated (choosing from a long list is rather inconvenient) or when the creator of the data field hasn’t thought of all possible options and left some out. A good example of an enumerated type is choosing a type within a family in Revit: once the types have been created, you can pick the one you need.

Both drawbacks can be addressed by allowing users to type in their own value if the correct one is not on the list. However, this reintroduces a greater risk of data inconsistency.

Enumerated type in a dRofus Room Data Sheet.
Enumerated type in Revit: choosing a Family Type.
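In code, an enumerated type behaves like the drop-down: only listed values are accepted, and even the spelling is fixed. A minimal Python sketch using the standard `enum` module follows; `pick_material` is a hypothetical helper. The rejected “Timber” illustrates the drawback described above, where the list creator did not foresee an option.

```python
from enum import Enum


class Material(Enum):
    """The predefined drop-down list for the "Material" property."""
    ALUMINIUM = "Aluminium"
    STEEL = "Steel"
    GLASS = "Glass"


def pick_material(choice: str) -> Material:
    """Accept exactly one value, and only if it is on the list."""
    try:
        return Material(choice)
    except ValueError:
        allowed = ", ".join(m.value for m in Material)
        raise ValueError(f"'{choice}' is not on the list ({allowed})") from None


print(pick_material("Glass"))  # Material.GLASS

# An option missing from the list simply cannot be entered.
try:
    pick_material("Timber")
except ValueError as err:
    print(err)  # 'Timber' is not on the list (Aluminium, Steel, Glass)
```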

How to use this data type

As desirable and neat as this data type is, it is practically unavailable for properties in the major BIM authoring tools. And that’s a pity, because otherwise we could restrict many properties to values chosen from a predefined list. I am not sure, though, whether it is unavailable in every software. I checked Revit and ArchiCAD: the former requires programming against its API, the latter scripting in GDL. Hence: theoretically possible, practically unavailable for “standard” users of this software. If you know of software that allows it, or of another way of creating it in RVT or AC, let me know. It would be most appreciated!

Array

Like Enumerated, this data type is well structured and easy to input. The only difference between Enumerated and Array is that here the user can choose multiple values.

And, as with Enumerated, this data type doesn’t exist in BIM design software. The only lists we can create are made with visual programming, by extracting the listed values stored in an object’s properties.
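An array of enumerated values is essentially a multi-select drop-down. The Python sketch below reuses the same hypothetical `Material` list and checks that each ticked value is on the list and is ticked only once. The duplicate rule is an assumption about how such a field would sensibly behave.

```python
from enum import Enum


class Material(Enum):
    ALUMINIUM = "Aluminium"
    STEEL = "Steel"
    GLASS = "Glass"


def pick_materials(choices: list[str]) -> list[Material]:
    """Multi-select: every value must be on the list and can be ticked only once."""
    selected = [Material(c) for c in choices]  # ValueError for values not on the list
    if len(set(selected)) != len(selected):
        raise ValueError("each option can be selected only once")
    return selected


door_materials = pick_materials(["Aluminium", "Glass"])
print(door_materials)       # [<Material.ALUMINIUM: 'Aluminium'>, <Material.GLASS: 'Glass'>]
print(len(door_materials))  # 2
```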

Summary

In this entry, you have learned about the six most common data types used in data management. You have also learned why the types that are easiest to enter are not the easiest to manage. As a take-away from this article, remember that using the correct data types keeps data consistent and makes it easier to analyse.

One last important thing to bear in mind is the size each data type takes up in the database. The exact figures depend on the database and programming language, but generally the least memory-consuming data type is Boolean, at only about 2 bytes. Number and date come next. The most memory-consuming data type is string, at roughly 18 bytes plus 2 bytes per character. These numbers look very small at first glance, but remember: each object has a hundred or more properties and each model can have millions of objects. That gives us hundreds of millions of property values in each model, which adds up to gigabytes of data and affects model performance.
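A quick back-of-the-envelope calculation shows how those small numbers add up. The Python sketch uses the approximate sizes quoted above, which vary by database and language. The model size (one million objects, 100 properties each) and the 10-character string length are assumptions chosen for illustration, and the two totals are extremes: a real model mixes data types.

```python
# Approximate per-value sizes from the text above (they vary by database/language).
BOOLEAN_BYTES = 2
STRING_OVERHEAD_BYTES = 18
STRING_BYTES_PER_CHAR = 2


def string_bytes(length: int) -> int:
    """Approximate size of one string value with `length` characters."""
    return STRING_OVERHEAD_BYTES + STRING_BYTES_PER_CHAR * length


objects = 1_000_000          # assumption: a model with one million objects
properties_per_object = 100  # "a hundred or more properties"
values = objects * properties_per_object  # 100 million property values

as_booleans = values * BOOLEAN_BYTES    # if every value were a Boolean
as_strings = values * string_bytes(10)  # if every value were a 10-character string

print(f"Boolean: {as_booleans / 1e9:.1f} GB")  # Boolean: 0.2 GB
print(f"String:  {as_strings / 1e9:.1f} GB")   # String:  3.8 GB
```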

Finally, as you have just read, two of the most secure data types, Enumerated and Array, are largely not supported by BIM software. Therefore, we are stuck with not-so-great-but-still-fine data types. How do we live with that? How do we check and ensure data quality on our projects? Stay with me on this journey through the data world, and I will do my best to give you some answers and point you in the right direction. 🙂



Comments:

Tony Fitzpatrick
1 year ago

Thanks guys, this is a great article on the importance of keeping your data input tidy.
It is something I have been big on at our company and have tended to push our team away from free data entry as much as possible and encourage the use of enumerated data input to reduce the incidence of errors.

Also wherever possible we use expressions in ArchiCAD Properties to actually generate a lot of the data automatically so we only need to input it once and ArchiCAD then extracts it wherever relevant.

Tim Hoffeller
1 year ago

Hello Konrad, cool article and perfect overview of why we need structured data!
You asked for tools known for helping users enter data correctly, especially enums.
Maybe this is interesting for you:
https://www.ekkodale.com/tools/aiaeditor/
A tool for bringing EIR requirements into Revit
And
https://www.ekkodale.com/leade/
The way to collect as much information as possible outside of the BIM tools and make it available to others on the project.
Greetings, Tim

Mikko Lahti
1 year ago

In Archicad, it is possible to use Enumerated property selectors via custom Archicad properties. The difference from GDL object properties is roughly the same as the difference between Revit’s instance and type parameters: custom properties created this way are per-instance, but quite usable for many use cases and can be translated to IFC properties.

Mikko Lahti
1 year ago
Reply to Konrad Fugas

https://helpcenter.graphisoft.com/user-guide/76720/

For enumerated values, you can use the Option Set data type.

These kinds of custom Archicad properties can then be mapped to corresponding IFC property sets and properties for IFC export, https://helpcenter.graphisoft.com/user-guide/137863/#XREF_74699_Custom_IFC
