In the last article about Data Management, I introduced and decomposed the information stored within BIM models and, more generally, on a construction project. If you haven’t read it, I recommend doing so before this one, as I will reuse those terms without further explanation.
In this entry, I want to take a deeper dive into the subject of Structured Data, especially different Data Types. This is the very next critical topic to understand and use consciously if you want to manage your project data correctly.
What is data type?
When we create properties, we obviously want some data written in them. Even better would be to get the correct data! 😉 Because of that, we give the labels meaningful names and specify units, for instance, the label “Length [cm]” clearly defines what we need there. If somebody fills it with the value “green”, we know something is wrong. A computer not necessarily though. On the other hand, for us, it doesn’t matter if the label is filled with the value “three” or “3”. But for a machine it does.
In other words, each data point has an attribute that tells the software how to interpret its value. This, in turn, allows a computer to perform correct operations on the data. Each property in our model then has a meaning readable to both humans and machines.
To solve the two aforementioned challenges, machines rely on an attribute called data type. If we assign the data type “Number” to the field “Length”, the software will allow us to write neither “green” nor “three”. We would be forced to write “3” (or any other number, naturally). That enables later data processing. A computer cannot perform mathematical operations on the values “Two” and “Three”, but it definitely can on “2” and “3”. And it does so far better than we do.
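To make this concrete, here is a minimal Python sketch of what a numeric field does with the inputs mentioned above. The function name parse_length is hypothetical and only stands in for the check a BIM tool performs internally.

```python
def parse_length(raw: str) -> float:
    """Convert user input for a numeric field such as "Length [cm]" to a number.

    Raises ValueError when the input is not a number, which mirrors a typed
    field rejecting "green" or "three".
    """
    return float(raw)


# Arithmetic works once the values are real numbers.
print(parse_length("2") + parse_length("3"))  # 5.0

# Text that only looks meaningful to a human is rejected.
for bad_value in ("green", "three"):
    try:
        parse_length(bad_value)
    except ValueError:
        print(f"rejected: {bad_value!r}")
```

The point is not the conversion itself but where the error appears: at data entry, not months later during analysis.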
An important note here: please do not confuse “data types” with “data units”. Mass, length, resistance and other quantities can still be stored using different data types. A unit is an entirely different concept.
The whole point of selecting and using correct data types is to make users enter data correctly: to write “3” where it belongs, not “Three”. This might sound a little vague now, but I will give you details and examples in the upcoming chapters.
What data types do we have in BIM models?
This is a table with the object’s properties (the magical “I” in BIM). Columns on the left are called labels (or field names). The columns on the right are values that have been assigned a data type. Data type drives how we can define values and what we can do with them. On construction projects, we use mainly these data types:
- String (text). A text data type is an unrestricted string of characters. It can contain letters, numbers and special characters and is always treated as a text value (i.e. no calculations are possible), as in the “D-01” value from the table above.
- Number. Allows only numeric values. This data type can be further limited to values without decimals (30) or with them (30,12). The first is called an integer, the second a float. Calculations are possible with this data type. In the table above: Fire Rating=30 and Width=0,9m.
- Boolean. It represents the values true and false. Sometimes it takes another form: 1 (true) and 0 (false), or yes (true) and no (false), as in Fire Door = True in the table above. In software, it most often appears as a checkbox or radio button.
- Date. A self-explanatory data type. In the table, you will see it in the fields “Start Date” and “End Date”, showing the period when the doors should be mounted on site.
- Enumerated. It contains a predefined set of values to choose from, and you can choose only one value for each variable. This is a derived data type, since each of its values is itself one of the four types mentioned above. Let’s assume that in our example you could choose the value for the label “Material” from a drop-down list containing Aluminium, Steel and Glass. This list is an enumerated type.
- Array (list). It stores several elements of the same type and is also a derived data type. In our “Material” example, if the doors were aluminium with glazing, we could create an array with the values 0 (Aluminium) and 1 (Glass). The length of the array is 2. Typically, the choice is made by ticking multiple checkboxes.
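The six data types above can be sketched as one typed record in Python. This is only an illustration: the class and field names are my own, the dates are invented for the example, and a Python dataclass documents the intended types without enforcing them at runtime.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Material(Enum):
    """Enumerated type: the predefined list of allowed materials."""
    ALUMINIUM = "Aluminium"
    STEEL = "Steel"
    GLASS = "Glass"


@dataclass
class Door:
    mark: str                  # String, e.g. "D-01"
    fire_rating: int           # Number (integer)
    width_m: float             # Number (float)
    fire_door: bool            # Boolean
    start_date: date           # Date
    end_date: date             # Date
    main_material: Material    # Enumerated: exactly one value
    materials: list[Material]  # Array: several values of the same type


door = Door(
    mark="D-01",
    fire_rating=30,
    width_m=0.9,
    fire_door=True,
    start_date=date(2024, 3, 1),   # illustrative date, not from the table
    end_date=date(2024, 3, 15),    # illustrative date, not from the table
    main_material=Material.ALUMINIUM,
    materials=[Material.ALUMINIUM, Material.GLASS],
)

print(len(door.materials))  # 2, the length of the array
```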
How to use data types
In any BIM project there are hundreds of thousands of objects, like the doors from the example above. Each of them has a hundred or more properties that describe it. That gives us an enormous amount of data and a good basis for… data chaos.
When does the chaos begin? In the data entry process. Thus – when we design. While creating objects and setting the value of their parameters. This is where we make mistakes, typos or just forget to put in any data at all.
Correct property management can reduce some of these mistakes, mainly because some data types are more prone to user entry errors than others. Let me describe in detail what those are and what the difference between them is. In addition, I’ll present how to correctly manage data types on a BIM project.
String
String is definitely the most popular data type, and the easiest one to make mistakes in. A capital or a small letter at the beginning? Space or no space? Maybe a dash? An abbreviation with a dot or without? We might read all those variants as the same value, but unfortunately, for the machine, they are different values. That creates data inconsistency and makes the data difficult to query, because we have several separate values instead of one. As a result, such data demands clean-up before processing (you will learn about it in the next entries).
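A small Python sketch shows how quickly this happens. The sample values and the abbreviation mapping are hypothetical; they only imitate what different people might type into the same text field.

```python
from collections import Counter

# Hypothetical values typed by different people for the same material.
raw_values = ["Aluminium", "aluminium", "Aluminium ", "ALUMINIUM", "Alu.", "Steel"]

# Hypothetical mapping of known abbreviations to the agreed spelling.
ABBREVIATIONS = {"alu.": "aluminium", "alu": "aluminium"}


def normalise(value: str) -> str:
    """Trim spaces, unify letter case and expand known abbreviations."""
    cleaned = value.strip().lower()
    return ABBREVIATIONS.get(cleaned, cleaned)


# For the machine, every spelling is a separate value.
print(len(Counter(raw_values)))  # 6

# After clean-up only the two real values remain.
print(Counter(normalise(v) for v in raw_values))
```

A query for “Aluminium” on the raw data would find one door out of five, which is exactly the inconsistency described above.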
How to use this data type
Number
Number is a better data type. It allows only numeric input, and a user will receive an error message when trying to insert any other value. Working on various projects, I noticed one issue most of them have in common. In BIM authoring tools, some labels should have numeric input, yet BIM managers still create them as text data types. Why? My guesses are: unawareness, trying to make a teammate’s life easier and trying to minimise errors when transferring data between various software.
Let me describe what can happen if we transfer data between two properties with different data types. Assume we transfer room data properties between Revit and a dRofus database (or any other two applications that exchange data). The property “Fresh air requirement” in Revit is defined as a Text type and contains: 4 033,20 (European notation). The dRofus database has data formatting set to English, and this field on the Room Data Sheet has the data type Number. When we try to synchronise those two parameters, we receive an error. Why? Because the correct value in dRofus should be: 4,033.20. Had we defined this property in Revit as Number, the software would convert it automatically. But because it is text, it is sent exactly as we see it, with no conversion.
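The difference between a number and its written form can be sketched in a few lines of Python. This is not how Revit or dRofus handle the conversion internally; parse_european_number is a hypothetical helper that only illustrates why a text value needs an explicit conversion step while a true number does not.

```python
def parse_european_number(text: str) -> float:
    """Parse a number written with a space or dot as thousands separator
    and a comma as decimal separator, e.g. "4 033,20"."""
    cleaned = text.replace("\u00a0", "").replace(" ", "").replace(".", "")
    return float(cleaned.replace(",", "."))


value = parse_european_number("4 033,20")

print(value)            # 4033.2, the number itself carries no formatting
print(f"{value:,.2f}")  # 4,033.20, the English-style display of the same number
```

Once the value is stored as a number, each application can display it in its own regional format without changing the data.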
Another, maybe easier, example is an Excel workbook with European formatting in which you try to separate decimals with a dot: the value is instantly treated as text and you won’t be able to perform any calculations on it.
This is an example of mismatched number formatting between two applications, but there are thousands of other examples based on other discrepancies. I understand BIM managers who just try to eliminate their headache by assigning “text” to all fields. But this additional work is then simply transferred onto the person managing the data.
How to use this data type
Boolean
How to use this data type
Date
Enumerated
This data type is extremely convenient for data creation. The user is given a predefined set of choices and their only job is to pick the correct one, which practically eliminates typos and inconsistent spellings. Its drawbacks appear when the choice has to be more differentiated (choosing from a long list is rather inconvenient) or when the creator of the data point hasn’t thought of all possible options and has omitted some. A good example of an enumerated type is the choice of a type within a family in Revit. Once the list has been created, you choose the one value that interests you.
Both of those drawbacks can be solved by allowing users to type in their own value if the correct one doesn’t exist in the list. Nevertheless, this brings back the danger of data inconsistency.
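The trade-off can be sketched in Python as follows. The list of materials comes from the earlier example; the function check_material and its allow_custom switch are hypothetical names used only for illustration.

```python
ALLOWED_MATERIALS = ("Aluminium", "Steel", "Glass")


def check_material(value: str, allow_custom: bool = False) -> str:
    """Accept a value from the predefined list; optionally let custom text through."""
    if value in ALLOWED_MATERIALS:
        return value
    if allow_custom:
        # Flexible, but this is where inconsistent values can enter again.
        return value
    raise ValueError(f"{value!r} is not one of {ALLOWED_MATERIALS}")


print(check_material("Steel"))                      # accepted from the list
print(check_material("Timber", allow_custom=True))  # accepted as free text

try:
    check_material("Timber")                        # strict mode rejects it
except ValueError as err:
    print(err)
```

With allow_custom switched on, the field behaves like a string again, so the values entered this way need the same clean-up as any other free text.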
How to use this data type
Array
Like Enumerated, this data type is well structured and easy to input. The only difference between Enumerated and Array is that here a user can choose multiple values.
And, just like Enumerated, this data type is non-existent in BIM design software. The only way to create lists is to use visual programming and extract listed values written into an object’s properties.
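As a rough sketch of that workaround, assume the listed values are written into a single text property and separated by semicolons (the separator and the function name are my assumptions, not a feature of any particular BIM tool). Plain Python stands in here for the visual programming script:

```python
ALLOWED = {"Aluminium", "Steel", "Glass"}


def split_materials(text: str, separator: str = ";") -> list[str]:
    """Turn "Aluminium; Glass" into ["Aluminium", "Glass"] and check each entry."""
    items = [part.strip() for part in text.split(separator) if part.strip()]
    unknown = [item for item in items if item not in ALLOWED]
    if unknown:
        raise ValueError(f"unknown values: {unknown}")
    return items


materials = split_materials("Aluminium; Glass")
print(materials, len(materials))  # ['Aluminium', 'Glass'] 2
```

The list only exists after the script has run; inside the model the property is still a string, with all the risks described in the String section.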
Summary
In this entry, you have learned about the six most common data types used in data management. You have also learned why the types that are easiest to enter are not the easiest to manage. As a take-away from this article, remember that using correct data types makes data consistent and easier to analyze.
One last important thing to bear in mind is the size that each data type takes in the database. Generally, the least memory-consuming data type is boolean: depending on the software, a yes/no value typically takes only 1 or 2 bytes. Then come number and date. The most memory-consuming data type is string: depending on the database and programming language, it takes ca. 18 bytes + 2 bytes per character. Those numbers look very small at first glance, but remember that each object has a hundred or more properties and each model has millions of objects. That gives us hundreds of millions of properties in each model, which adds up to gigabytes of data and affects model performance.
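A quick back-of-the-envelope calculation in Python shows how this adds up. The string size follows the rule of thumb above; the object count and the average text length are my assumptions, picked only to illustrate the order of magnitude.

```python
OBJECTS = 1_000_000          # assumption: lower end of "millions of objects"
PROPERTIES_PER_OBJECT = 100  # "a hundred or more properties"
AVERAGE_CHARACTERS = 10      # assumption: average length of a text value


def string_bytes(characters: int) -> int:
    """Size of one string by the rule of thumb: 18 bytes + 2 bytes per character."""
    return 18 + 2 * characters


total_properties = OBJECTS * PROPERTIES_PER_OBJECT
total_bytes = total_properties * string_bytes(AVERAGE_CHARACTERS)

print(f"{total_properties:,} properties")                               # 100,000,000 properties
print(f"{total_bytes / 1e9:.1f} GB if every property were a string")    # 3.8 GB
```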
Finally, as you have just read, two of the most secure data types are mostly not supported by BIM software. Therefore, we are stuck with not-so-great-but-still-fine data types. How to live with that? How to check and ensure data quality on our projects? Stay with me on this journey through the data world and I will do my best to give you some answers and point out directions. 🙂
Thanks guys, this is a great article on the importance of keeping your data input tidy.
It is something I have been big on at our company and have tended to push our team away from free data entry as much as possible and encourage the use of enumerated data input to reduce the incidence of errors.
Also wherever possible we use expressions in ArchiCAD Properties to actually generate a lot of the data automatically so we only need to input it once and ArchiCAD then extracts it wherever relevant.
Cool! I’m glad to see people are aware of how important this topic is! Expressions are powerful indeed
Hello Konrad, cool article and perfect overview of why we need structured data!
You asked for tools known for helping the user to enter data correctly, especially enums
Maybe this is interesting for you:
https://www.ekkodale.com/tools/aiaeditor/
A tool for bringing EIR requirements into Revit
And
https://www.ekkodale.com/leade/
The way to collect as much as possible I outside of the BIM tools and make it available for others in the project.
Greetings, Tim
In Archicad, it is possible to use Enumerated property selectors via custom Archicad properties. The difference to GDL object properties is roughly the same as the difference between Revit’s instance and type parameters: custom properties created this way are per-instance, but quite usable for many use cases and can be translated to ifc properties.
That sounds promising. Do you have a link to a guide on how to set this up?
https://helpcenter.graphisoft.com/user-guide/76720/
For enumerated values, you can use Option Set data type.
These kinds of custom Archicad properties can then be mapped to corresponding IFC property sets and properties for IFC export, https://helpcenter.graphisoft.com/user-guide/137863/#XREF_74699_Custom_IFC
Cool, thanks for that! I didn’t know about it 🙂