Introduction
Definition
The Modeler enables you to virtually combine several Virtual Datasets and to transform them using different data operators (e.g. transformation, union, and join).
The Modeler is available as a web service at modeler.virtualdataplatform.com.
In this section you will find general information and best practices.
Best Practices
The following section provides general and background information about the Modeler and its recommended usage. It also gives hints on how to create Models that are clean and set up appropriately.
Virtual Data Modeling
The modeling of Virtual Data Models differs from traditional data modeling: it often combines data from various sources dynamically, incorporates logical processing, and is not necessarily tied to a specific physical storage structure. Virtual data modeling focuses on creating a conceptual representation that facilitates flexible, on-the-fly data retrieval, whereas common data modeling typically emphasizes the design and structure of a static, physically stored database. Virtual Data Models provide a more abstract and dynamic approach, accommodating diverse data origins and supporting real-time data access.
The following core considerations are crucial:
- Bear in mind: VDP facilitates easy user access to any data source.
- Users typically focus on aggregated views rather than sifting through millions of individual data records.
- Source Systems are robust and powerful and usually provide many features. The Virtual Data Platform aims to leverage these strengths to achieve the highest possible aggregation for a specific need.
- The objective is not to implement a single Model for many different data retrieval tasks. Each task might have different requirements (e.g. aggregation or detail view) and should be addressed by a dedicated Model.
Practical Considerations
When creating a Model, it can be beneficial to keep the following considerations in mind:
- Data granularity: Typically, reporting and analysis tasks do not require data at every granularity level, but rather at an aggregated and enriched level. Therefore, for the given data modeling task, apply filters early to reduce the amount of data, conserve memory, and keep the Model fast and efficient. When more granularity is needed, let the user define an appropriate filter by selecting the desired 'area' (see the first sketch after this list).
- Architecture consistency: Ensure consistency in building Models. Models involve various steps, from simple data transformations to complex logic modeling. Since data is retrieved from different Source Systems, columns or types may be similar but not equivalent. To match the final required data quality and structure, transformation steps such as renaming columns, deleting columns, and converting columns to specific data types are necessary. It's recommended to perform these steps before applying more complex merge logic or calculations, and to follow a specific order in every Model you create, such as:
- Delete unnecessary columns.
- Rename columns to match the final data structure for data merging.
- Convert data to the appropriate data types.
- After completing these basic steps, proceed with the advanced logic. Think of the traditional 'ETL' process layers in data warehouses: they, too, should be defined consistently (see the second sketch after this list).
- Description and documentation: Apply consistent naming conventions. For example, if you start with CamelCase, stick to it. If you prefer UPPERCASENAMES, be consistent throughout the modeling process. Additionally, use the description feature in the Modeler to document how and why each operation is performed.
- Cardinality: Avoid multiplying cardinality. Instead, perform summation by iterating through filter sets. Cardinality in the context of data modeling refers to the count of unique values within a dataset, or the distinctiveness of elements in a set. 'Avoid multiplying cardinality' means not unnecessarily increasing the number of unique values, which could lead to larger and more complex datasets. The recommendation is to use aggregation and selective filtering to achieve the desired analytical outcomes without unnecessarily expanding the uniqueness of values in the dataset. This approach helps to maintain simplicity and efficiency in data modeling (see the last sketch below).
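The first sketch illustrates the 'filter early' recommendation from the data granularity item. It uses plain pandas rather than the Modeler's own operators, and the file name and columns (region, order_date, revenue) are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical source extract; file and column names are assumptions.
orders = pd.read_csv("orders.csv", parse_dates=["order_date"])

# Filter early: restrict to the 'area' the user actually needs
# before any further modeling steps touch the data.
recent_eu = orders[
    (orders["region"] == "EU") & (orders["order_date"] >= "2024-01-01")
]

# Work on the aggregated, enriched level rather than on raw records.
monthly_revenue = (
    recent_eu
    .groupby(recent_eu["order_date"].dt.to_period("M"))["revenue"]
    .sum()
    .reset_index(name="monthly_revenue")
)
```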
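The second sketch shows the recommended order of basic transformation steps from the architecture consistency item: delete, rename, convert, and only then the advanced logic. Again this is generic pandas with invented column names, not a prescription of the Modeler's operator set.

```python
import pandas as pd

raw = pd.read_csv("source_system_extract.csv")  # assumed input file

# 1. Delete unnecessary columns.
df = raw.drop(columns=["legacy_flag", "internal_note"])

# 2. Rename columns to match the final data structure for merging.
df = df.rename(columns={"cust_no": "customer_id", "amt": "amount"})

# 3. Convert data to the appropriate data types.
df = df.astype({"customer_id": "string", "amount": "float64"})

# 4. Only now apply the advanced logic (joins, calculations, ...).
customers = pd.read_csv("customers.csv").astype({"customer_id": "string"})
model = df.merge(customers, on="customer_id", how="left")
```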
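The last sketch illustrates 'avoid multiplying cardinality': each side is aggregated to the required grain before joining, instead of joining detailed records and letting the row count multiply. Table and column names (orders, shipments, order_id, shipment_id) are invented for the example.

```python
import pandas as pd

orders = pd.read_csv("orders.csv")        # one row per order line (assumed)
shipments = pd.read_csv("shipments.csv")  # several rows per order (assumed)

# Joining the detailed tables directly would multiply rows
# (every order line x every shipment of the same order).
# Aggregate each side to the grain you actually need first.
order_totals = orders.groupby("order_id", as_index=False)["amount"].sum()
shipment_counts = (
    shipments.groupby("order_id", as_index=False)
    .agg(shipment_count=("shipment_id", "nunique"))
)

# The join now stays at one row per order.
result = order_totals.merge(shipment_counts, on="order_id", how="left")
```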