Denodo Data Consolidation & Denormalization


As the data modeling process in denodo moves through the conceptual layers of the logical data warehouse, the data structures and their associated metadata evolve.

The Base Layer

As the modeling process begins, the base layer is the ingestion layer, where the source system data structures are recreated in denodo and fields are converted to denodo Virtual Query Language (VQL) data types. The base layer is what folks with a traditional data warehousing background would think of as staging or landing. These base layer views should most closely mirror the technical structure and data characteristics of the input data source and will be the least business-friendly in their organization, naming, and metadata.
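
As a minimal illustration (the source table CUSTOMER_MASTER and its columns are hypothetical), a base view exposes the source structure one-to-one, with only the source-to-VQL type conversion applied:

```sql
-- Hypothetical Oracle source table:
--   CUSTOMER_MASTER (CUST_ID NUMBER(10), CUST_NM VARCHAR2(100), CREATED_DT DATE)
-- The base view mirrors it column for column; only the data types change,
-- to denodo VQL types. In practice the base view is generated by
-- introspecting the source in Virtual DataPort, not hand-written SQL.
SELECT
    CUST_ID,     -- int  (was NUMBER(10))
    CUST_NM,     -- text (was VARCHAR2(100))
    CREATED_DT   -- date (was DATE)
FROM CUSTOMER_MASTER;
```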

The Semantics Layer

The semantics layer is where the major data reorganization, data transformation, and application of business-friendly field names and metadata begin. The semantics layer is what folks with a traditional data warehousing background would think of as the Data Warehouse (DW) or Enterprise Data Warehouse (EDW). The semantics layer of the logical data warehouse (LDW) performs several tasks, illustrated in the sketch after this list:

  • Data from multiple input sources are consolidated
  • The model becomes multi-dimensional (Fact and Dimension oriented)
  • Field names and descriptive metadata are changed to meaningful, domain normalized, business-friendly names and descriptions.
  • Domain normalizing business rules and transformations are applied.
  • The layer serves as a data source for the business layer and the reporting layer.
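
As a minimal sketch of the consolidation and renaming tasks (the base views bv_na_orders and bv_eu_orders and all column names are hypothetical):

```sql
-- Two regional base views are consolidated into one semantic-layer fact
-- view, and cryptic source fields get domain-normalized, business-friendly
-- names (e.g., INVAMT becomes invoice_amount).
CREATE VIEW f_sales_order AS
SELECT
    ORD_ID  AS order_id,
    CUST_ID AS customer_id,
    ORD_DT  AS order_date,
    INVAMT  AS invoice_amount,
    'NA'    AS source_region      -- record which source fed the row
FROM bv_na_orders
UNION ALL
SELECT
    ORD_ID, CUST_ID, ORD_DT, INVAMT,
    'EU'
FROM bv_eu_orders;
```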

The Business Layer

The business layer, which denodo considers optional, is modeled along a narrower business subject orientation, and more specialized business rules are applied. This is what folks with a traditional data warehousing background would think of as a Datamart (DM).

The business layer of the logical data warehouse (LDW) performs several tasks:

  • Limits and optimizes the data to facilitate business intelligence and reporting activities concerning a specific line of business or business topic (e.g., Financials, Human Resources, Inventory, Asset Management, etc.)
  • Business-specific/customized rules and metadata are applied
  • Supplements the semantic layer and serves as a data source for the reporting layer.
  • Additional data consolidation and data structure denormalization (flattening) may occur in the business layer, as sketched below
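
As a minimal sketch of business-layer flattening (reusing the hypothetical f_sales_order view from the previous sketch and adding a hypothetical d_customer dimension):

```sql
-- A business-layer (datamart-style) view for one business topic flattens
-- the semantic layer's fact/dimension structure into a single wide view.
CREATE VIEW financials_sales_flat AS
SELECT
    f.order_id,
    f.order_date,
    f.invoice_amount,
    d.customer_name,       -- denormalized in from the customer dimension
    d.customer_segment
FROM f_sales_order f
JOIN d_customer d
  ON f.customer_id = d.customer_id;
```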

The Reporting Layer

The reporting layer, which denodo considers optional, is the most customized layer and sees the most reporting-topic specialization and need-specific transformation. The reporting layer is where a traditional data warehouse team might provide customized reporting or system interface views, where interface ETLs produce interface files, and where reporting teams do more of their own development.

The reporting layer of the logical data warehouse (LDW) performs several tasks:

  • Provides consumer-specific customized rules and metadata
  • Provides consumer-specific data organization/layouts
  • Data is optimized for consumer purposes and may be highly or entirely denormalized to meet consumer needs (see the sketch below)
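
As a minimal sketch of a consumer-specific reporting view (the names and the filter date are hypothetical, building on the earlier flattened view):

```sql
-- A reporting-layer view shaped to one consumer's layout: fully
-- denormalized, limited to the fields the report needs, and relabeled
-- to match the report's column headings.
CREATE VIEW rpt_quarterly_sales_extract AS
SELECT
    customer_name  AS "Customer",
    order_date     AS "Order Date",
    invoice_amount AS "Invoice Amount"
FROM financials_sales_flat
WHERE order_date >= DATE '2024-01-01';   -- hypothetical reporting window
```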

Denodo Best Practices For Base Views

The denodo “Base Layer” in the Logical Data Warehouse (LDW) can be thought of as the data staging layer in a more traditional data warehouse (DW) development pattern.  The base layer is the level at which the source system data types are converted into denodo field types and the source data structures are recreated as base views (bv).

Base views (bv) are the first step in virtualizing data: they are the denodo structures reflecting the source system structure, and the second step after the data source connection. They are, therefore, essential elements for the other layers of the Logical Data Warehouse (LDW).  To facilitate the usefulness and performance of base views, here are some best practices, followed by a short sketch of the naming and key practices:

  • Use consistent Object Naming conventions.  It is strongly recommended that the denodo standard naming conventions be used.
  • Import and or create the Primary Keys (PK), Foreign Keys (FK), and Associations.
  • Set up statistics collection and make sure it includes all critical fields.
  • Base views, as a rule, should not be cached unless absolutely necessary for reasons of performance.
  • Create indexes on Primary Keys (PK), Surrogate keys, and Foreign Keys (FK)
  • Create performance indexes that mirror the source system’s indexes.
  • Populate view Metadata properties describing the type and the nature of the data which the view contains. Note:  if you have a governance team, they may want to manage the metadata in the denodo data catalog.
  • Retain the original table name (applying the naming convention prefix) and field names to facilitate data lineage traceability.
  • Use denodo tools against tables, where possible, rather than manual SQL views, database views, or stored procedures.  Denodo cannot rewrite or optimize these objects.
  • Field metadata should be annotated with “Not Used” if the field is always null, blank, or empty.  This saves time and labor when working in the higher layers and researching data issues.
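
As a minimal sketch of the naming and key practices above (standard SQL on hypothetical source tables, not denodo VQL; the bv_ prefix is a common denodo naming convention):

```sql
-- Base view bv_customer_master keeps the original table name,
-- CUSTOMER_MASTER, behind the layer prefix, preserving lineage traceability.
-- Declared on the source side, these keys and indexes can then be imported
-- into (or recreated on) the base view:
ALTER TABLE customer_master
    ADD CONSTRAINT pk_customer_master PRIMARY KEY (cust_id);   -- primary key
ALTER TABLE customer_master
    ADD CONSTRAINT fk_customer_region FOREIGN KEY (region_cd)
        REFERENCES region_master (region_cd);                  -- foreign key
CREATE INDEX ix_customer_master_region
    ON customer_master (region_cd);   -- performance index mirroring the source
```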


Denodo Data Catalog Roles

The denodo Data Catalog provides the data governance and self-service capabilities that supplement the denodo Virtual DataPort (VDP) core capabilities. Six roles provide the ability to assign or deny capabilities within the denodo Data Catalog and supplement the database, row, and column security and permissions of denodo Virtual DataPort (VDP). The roles and the tasks they can perform are listed below; a role-assignment sketch follows the list.

  • data_catalog_classifier: Assign categories, tags, and custom properties groups to views and web services.
  • data_catalog_editor: Edit views, web services, and databases. Create, edit, and delete tags, categories, custom properties groups, and custom properties.
  • data_catalog_manager: Can do the same as a user with the roles “data_catalog_editor” and “data_catalog_classifier”.
  • data_catalog_content_admin: Configure personalization options and content search.
  • data_catalog_admin: Can perform any action of all the other data catalog roles.
  • data_catalog_exporter: Can export the results of a query from the denodo Data Catalog.
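
As a minimal sketch of how such a role might be assigned (the user name report_user is hypothetical, and the GRANT form below is standard-SQL style rather than documented denodo syntax; in practice, role assignment is typically done through the Virtual DataPort administration tool):

```sql
-- Hypothetical example: give a reporting user permission to export
-- query results from the Data Catalog. Standard-SQL GRANT style;
-- not documented denodo VQL syntax.
GRANT data_catalog_exporter TO report_user;
```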

Related References

denodo > User Manuals > Denodo Platform New Features Guide

denodo > User Manuals > Data Catalog Guide > Administration

Denodo Model Best Practices For Creation of Associations

What Are Denodo Associations?

In denodo, associations follow the same concept as in data modeling tools and can be described as an ‘on-demand join.’

Where Should Associations Be Created In the Denodo Model?

You don’t necessarily need to define an Association at every level; usually, the best practice is to apply associations at the following points:

  • On final views published for data consumers, indicating relationships between related views, especially on published web services.
  • On the component views below any derived view that brings together disparate (dissimilar) data sources.  The associations should be defined as Referential Constraints whenever appropriate to aid the optimization engine (see the SQL analogy after this list).
  • On the component views below any derived view that joins a “Base View from Query” with standard views, since Base Views from Query cannot be rewritten by the denodo optimization engine and often create performance bottlenecks.
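
As an illustration of why referential constraints help, here is the standard-SQL analogue of such an association, reusing the hypothetical f_sales_order and d_customer views from earlier and imagining them as physical tables; this is not denodo VQL syntax:

```sql
-- Hypothetical, standard-SQL analogue of an association defined as a
-- referential constraint (not denodo VQL). Declaring that every
-- f_sales_order.customer_id matches exactly one d_customer row lets an
-- optimizer prune the join when a query selects no d_customer columns,
-- which is the 'on-demand join' idea behind associations.
ALTER TABLE f_sales_order
    ADD CONSTRAINT fk_sales_customer FOREIGN KEY (customer_id)
        REFERENCES d_customer (customer_id);
```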

These best practices should cover the majority of scenarios; beyond these guidelines, it is best to take an ad hoc approach and create Associations when you see a specific performance/optimization need.

Why Are Associations Important in Denodo?

In a nutshell, associations improve performance and the efficiency of the denodo execution optimizer, working alongside other model metadata, such as:

  • The SQL of the view(s)
  • Table metadata (Table Keys (PK, FK), Virtual Partitions, etc.)
  • Data statistics, which are used by the Cost Based Optimizer (CBO)

Related References

Associations in Denodo

Importing Associations And Joins From A Database Schema in Denodo

A coworker recently asked whether denodo generates joins automatically from a source RDBMS database schema.  After searching, a few snippets of information became apparent.  First, the subject of inheriting join properties is broader than joins and needs to include modeling associations (joins on demand). Second, there are some denodo design best practices to consider to optimize associations.

Does Denodo Automatically Generate Joins From the Source System?

After some research, the short answer is no.

Can Denodo Inherit Associations From A Logical Model?

The short answer is yes. 

Because denodo bridges allow models to be passed to and from other modeling tools, it is possible to have the associations built automatically, using the top-down design approach and importing a model at the Interface View level, which is the topmost level of the top-down design process.

However, below the Interface View level, associations and/or joins are created manually by the developer.

Where Should Associations Be Created?

You don’t necessarily need to define an Association at every level; usually, the best practice is to apply associations at the same points described above under “Where Should Associations Be Created In the Denodo Model?”

These best practices should cover the majority of scenarios; beyond these guidelines, it is best to take an ad hoc approach and create Associations when you see a specific performance/optimization need.

Related References

Associations in Denodo

Why Business Intelligence (BI) needs a Semantic Data Model

A semantic data model is a method of organizing and representing corporate data that reflects the meaning and relationships among data items. This method of organizing data helps end users access data autonomously using familiar business terms such as revenue, product, or customer via the BI (business intelligence) and other analytics tools. The use of a semantic model offers a consolidated, unified view of data across the business allowing end-users to obtain valuable insights quickly from large, complex, and diverse data sets.

What is the purpose of semantic data modeling in BI and data virtualization?

A semantic data model sits between a reporting tool and the original database in order to assist end-users with reporting. It is the main entry point for accessing data for most organizations when they are running ad hoc queries or creating reports and dashboards. It facilitates reporting and improvements in various areas, such as:

  • No relationships or joins for end-users to worry about because they’ve already been handled in the semantic data model
  • Data such as invoice data, salesforce data, and inventory data have all been pre-integrated for end-users to consume.
  • Columns have been renamed into user-friendly names, such as Invoice Amount as opposed to INVAMT (see the sketch after this list).
  • The model includes powerful time-oriented calculations such as percentage change in sales since last quarter, sales year-to-date, and sales increase year over year.
  • Business logic and calculations are centralized in the semantic data model in order to reduce the risk of incorrect recalculations.
  • Data security can be incorporated. This might include exposing certain measurements to only authorized end-users and/or standard row-level security.
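
As a minimal sketch of these ideas (the view and column names other than INVAMT are hypothetical):

```sql
-- Hypothetical semantic-layer view: cryptic source columns are renamed to
-- business-friendly names, and a time-oriented calculation (sales
-- year-to-date) is centralized so every report computes it the same way.
CREATE VIEW sales_semantic AS
SELECT
    INVNO  AS invoice_number,
    INVAMT AS invoice_amount,        -- user-friendly name, per the list above
    INVDT  AS invoice_date,
    SUM(INVAMT) OVER (
        PARTITION BY EXTRACT(YEAR FROM INVDT)   -- restart each year
        ORDER BY INVDT                          -- running total to date
    ) AS sales_year_to_date
FROM raw_invoices;
```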

A well-designed semantic data model with agile tooling allows end-users to learn and understand how altering their queries results in different outcomes. It also gives them independence from IT while having confidence that their results are correct.

ETL Vs. EAI

Over recent years, business enterprises relying on accurate and consistent data to make informed decisions have been gravitating towards integration technologies. The subject of Enterprise Application Integration (EAI) and Extraction, Transformation & Loading (ETL) lately seems to pop up in most Enterprise Information Management conversations.

From an architectural perspective, both techniques share a striking similarity. However, they essentially serve different purposes when it comes to information management. We’ve decided to do a little bit of research and establish the differences between the two integration technologies.

Enterprise Application Integration

EAI is an integration framework that consists of technologies and services, allowing for seamless coordination of vital systems, processes, as well as databases across an enterprise.

Simply put, this integration technique simplifies and automates your business processes to a whole new level without necessarily having to make major changes to your existing data structures or applications.

With EAI, your business can integrate essential systems like supply chain management, customer relationship management, business intelligence, enterprise resource planning, and payroll. The linking of these apps can be done at the back end via APIs or at the front end via the GUI.

The systems in question might use different databases or computer languages, run on different operating systems, or be older systems that the vendor no longer supports.

The objective of EAI is to develop a single, unified view of enterprise data and information, as well as ensure the information is correctly stored, transmitted, and reflected. It enables existing applications to communicate and share data in real-time.

Extraction, Transformation & Loading

The general purpose of an ETL system is to extract data out of one or more source databases and then transfer it to a target destination system for better user decision making. Data in the target system is usually presented differently from the sources.

The extracted data goes through the transformation phase, which involves checking for data integrity and converting the data into a proper storage format or structure. It is then moved into other systems for analysis or querying.

Data loading typically involves writing data into the target database destination, such as a data warehouse or an operational data store.

ETL can integrate data from multiple systems. The systems we’re talking about in this case are often hosted on separate computer hardware or supported by different vendors.
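
As a minimal sketch of an ETL pass expressed in SQL (all table and column names are hypothetical):

```sql
-- Extract rows from a staging copy of the source, transform them (type
-- normalization and a basic integrity check), and load them into the
-- warehouse target, skipping rows already present in the historical record.
INSERT INTO dw.fact_invoice (invoice_number, customer_id, invoice_amount, invoice_date)
SELECT
    s.invno,
    s.cust_id,
    CAST(s.invamt AS DECIMAL(12, 2)),   -- transform: enforce the storage format
    CAST(s.invdt  AS DATE)              -- transform: normalize the date type
FROM staging.src_invoices s
WHERE s.invno IS NOT NULL               -- transform: basic integrity check
  AND NOT EXISTS (                      -- load: do not duplicate history
      SELECT 1
      FROM dw.fact_invoice t
      WHERE t.invoice_number = s.invno
  );
```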

Differences between ETL and EAI

EAI System

  • Retrieves small amounts of data in one operation and is characterized by a high number of transactions
  • EAI system is utilized for process optimization and workflow
  • The system does not require user involvement after it’s implemented
  • Ensures a bi-directional data flow between the source and target applications
  • Ideal for real-time business data needs
  • Limited data validation
  • Integration operations are pull-, push-, and event-driven.

ETL System

  • It is a one-way process of creating a historical record from homogeneous or heterogeneous sources
  • Mainly designed to process large batches of data from source systems
  • Requires extensive user involvement
  • Meta-data driven complex transformations
  • Integration operations are pull- and query-driven
  • Supports proper profiling and data cleaning
  • Limited messaging capabilities

Both integration technologies are an essential part of EIM, as they provide strong capabilities for business intelligence initiatives and reporting. They can be used separately and sometimes together, in mutual support.