Introduction
Definition
The Modeler enables you to virtually combine several Virtual Datasets and to transform them using different data operators (e.g. transformation, union, join, and more).
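To give a conceptual picture of what these operators do, here is a minimal pandas sketch. This is an assumption for illustration only: the Modeler itself is a visual tool, and all dataset and column names below are hypothetical.

```python
import pandas as pd

# Two hypothetical Virtual Datasets with the same structure.
orders_eu = pd.DataFrame({"order_id": [1, 2], "customer_id": [10, 11], "amount": [100.0, 250.0]})
orders_us = pd.DataFrame({"order_id": [3, 4], "customer_id": [12, 10], "amount": [80.0, 300.0]})
customers = pd.DataFrame({"customer_id": [10, 11, 12], "name": ["Acme", "Beta", "Gamma"]})

# Union: stack datasets that share the same columns.
orders = pd.concat([orders_eu, orders_us], ignore_index=True)

# Join: enrich orders with customer attributes.
enriched = orders.merge(customers, on="customer_id", how="left")

# Transformation: derive a new column.
enriched["amount_eur"] = enriched["amount"] * 0.92  # illustrative fixed rate

print(enriched)
```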
The Modeler is available as a web service on modeler.virtualdataplatform.com.
In this section you will find more general information about best practices.
Best Practices
The following section provides general and background information about the Modeler and its recommended usage. It also gives hints on how to create clean, appropriately structured Models.
Virtual Data Modeling
The modeling of Virtual Data Models differs from traditional data modeling in that it often combines data from various sources dynamically, incorporates logical processes, and may not be tied to a specific physical storage structure. Virtual data modeling focuses on creating a conceptual representation that facilitates flexible, on-the-fly data retrieval, whereas traditional data modeling typically emphasizes the design and structure of a static, physically stored database. Virtual Data Models provide a more abstract and dynamic approach, accommodating diverse data origins and supporting real-time data access.
The following core considerations are crucial:
- Bear in mind: VDP facilitates easy user access to any data source.
- Users typically focus on aggregated views rather than sifting through millions of individual data records.
- Source Systems are robust and powerful and usually provide many features. The Virtual Data Platform aims to leverage these strengths to achieve the highest possible aggregation for specific needs.
- The objective is not to implement a single Model for many different data retrieval tasks. Each task may have different requirements (e.g. aggregation or detail view) and should be addressed by its own dedicated Model.
Practical Considerations
When creating a Model, it can be beneficial to keep the following considerations in mind:
- Data granularity: Reporting and analysis tasks typically require data at an aggregated and enriched level rather than at every granularity level. Therefore, apply filters early in the given data modeling task to reduce the amount of data, conserve memory, and keep the Model fast and efficient. When more granularity is needed, allow the user to define an appropriate filter by selecting the desired 'area' (see the first sketch after this list).
- Architecture consistency: Ensure consistency in building Models. Models involve various steps, from simple data transformations to complex logic modeling. Since data is retrieved from different Source Systems, columns or types may be similar but not equivalent. To match the final required data quality and structure, transformation steps such as renaming columns, deleting columns, and converting columns to specific data types are necessary. It is recommended to perform these steps before applying more complex merge logic or calculations, and to follow a specific order in every Model you create (see the second sketch after this list), such as:
- Delete unnecessary columns.
- Rename columns to match the final data structure for data merging.
- Convert data to the appropriate data types.
- After completing these basic steps, proceed with the advanced logic. This mirrors the traditional 'ETL' process layers in data warehouses, which should likewise be defined consistently.
- Description and documentation: Apply consistent naming conventions. For example, if you start with CamelCase, stick to it. If you prefer UPPERCASENAMES, be consistent throughout the modeling process. Additionally, use the description feature in the Modeler to document how and why each operation is performed.
- Cardinality: Avoid multiplying cardinality; instead, perform summation by iterating through filter sets. In data modeling, cardinality refers to the count of unique values within a dataset or the distinctiveness of elements in a set. 'Avoid multiplying cardinality' means not unnecessarily increasing the number of unique values, which would lead to larger and more complex datasets. Instead, use aggregation and selective filtering to achieve the desired analytical outcomes without unnecessarily expanding the uniqueness of values in the dataset. This approach keeps data modeling simple and efficient (see the third sketch after this list).
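First, a minimal pandas sketch of the 'filter early' idea from the data granularity point. All names and data are hypothetical; in the Modeler, the filter and aggregation would be dedicated operator steps.

```python
import pandas as pd

def load_sales() -> pd.DataFrame:
    # Stand-in for a Virtual Dataset; assume millions of rows in practice.
    return pd.DataFrame({
        "region": ["EU", "EU", "US", "US"],
        "year": [2023, 2024, 2023, 2024],
        "revenue": [120.0, 150.0, 90.0, 110.0],
    })

# Filter early: restrict to the 'area' the user selected...
user_region = "EU"
sales = load_sales()
sales = sales[sales["region"] == user_region]

# ...then aggregate to the level the report actually needs.
report = sales.groupby("year", as_index=False)["revenue"].sum()
print(report)
```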
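Second, the recommended order of basic transformation steps from the architecture consistency point, again as a hedged pandas sketch with hypothetical column names and types:

```python
import pandas as pd

raw = pd.DataFrame({
    "CUST_NO": ["001", "002"],
    "ORDER_DATE": ["2024-01-05", "2024-02-17"],
    "AMT": ["100.50", "99.90"],
    "LEGACY_FLAG": ["X", ""],  # not needed downstream
})

# 1. Delete unnecessary columns.
df = raw.drop(columns=["LEGACY_FLAG"])

# 2. Rename columns to match the final data structure for merging.
df = df.rename(columns={"CUST_NO": "customer_id", "ORDER_DATE": "order_date", "AMT": "amount"})

# 3. Convert data to the appropriate data types.
df["order_date"] = pd.to_datetime(df["order_date"])
df["amount"] = df["amount"].astype(float)

# Only now apply more complex merge logic or calculations.
print(df.dtypes)
```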
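Third, the cardinality advice illustrated with hypothetical data: instead of joining two datasets at mismatched grains and letting row combinations multiply, aggregate each side to the target grain first, then join.

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [10, 10, 11], "amount": [100.0, 50.0, 70.0]})
visits = pd.DataFrame({"customer_id": [10, 10, 10, 11], "clicks": [1, 2, 3, 4]})

# Anti-pattern: a direct join multiplies rows
# (2 orders x 3 visits = 6 rows for customer 10 alone).
exploded = orders.merge(visits, on="customer_id")

# Better: sum each dataset per customer first, then join one-to-one.
orders_agg = orders.groupby("customer_id", as_index=False)["amount"].sum()
visits_agg = visits.groupby("customer_id", as_index=False)["clicks"].sum()
summary = orders_agg.merge(visits_agg, on="customer_id", how="left")

print(len(exploded), len(summary))  # 7 vs. 2 rows
```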
Versioning
For documentation, maintenance, and compliance purposes, or simply for tracking changes, it is possible to work with different versions of a Model.
Versions Manager
In the menu of the Modeler there is a button for opening the Versions Manager.
A window opens with the following information:
- Comp.: A checkbox that can be set.
- Title: Provides information about the version of the Model. This is set when saving the Model.
- Description: Offers more detailed information about the version of the Model. This is set when saving the Model.
- Change Date: Displays the time and date when the Model was changed.
- User Id: Identifies the user responsible for the changes.
- Version Type: Indicates the type of the version. Only the published version is publicly available.
- Restore: A button for restoring the version.
Any changes made in the Modeler update the 'current draft' version, which can only be tested within the Modeler. When a Model is executed, the latest published version is always used. You can revert to an earlier published version at any time, but note that this will overwrite all changes made in the 'current draft' version.
Saving Model Versions
When saving the Model, you can either 'save current draft' or 'publish new version'. Draft versions are only available in the Modeler, while published versions are available to everyone (or to the members of the Workspace the Model is located in).
There can only be one draft at a time, which is the version opened in the Modeler. If the Model is to be saved as a new published version, providing a title is mandatory, while a description is optional. Please ensure that the title and description are meaningful to enhance maintainability and clarity.