Denodo Model Best Practices For Creation of Associations

What Are Denodo Associations?

In denodo associations follow the same concept as modeling tools, which can be described as an ‘on-demand join.’

Where Should Associations Be Created In the Denodo Model?

You don’t necessarily need to define an Association at every level; usually, the best practice is to apply associations at the following points:

  • On final views published for data consumers, indicating relationships between related views; Especially, on published web services.
  • On the component views below, any derived view that brings together disparate (dissimilar) data sources.  The associations should be defined as Referential Constraints whenever appropriate to aid the optimization engine.
  • On the component views below, any derived view that joins a “Base View from Query” with standard views, since Base Views from Query cannot be rewritten by the denodo optimization engine.  Often Base Views from Query create performance bottlenecks.

These best practices should cover the majority scenarios; beyond these guidelines, it is best to take an ad-hoc approach to create Associations when you see a specific performance/optimization.

Why Are Associations important in Denodo?

In a nutshell, associations performance and the efficiency of the denodo execution optimizer along with other model metadata, such as:  

  • The SQL of the view(s)
  • Table metadata (Table Keys {PK, FK), Virtual Partitions…etc.)
  • Data statistics, which are used by the Cost Based Optimizer (CBO)

Related References

Associations in Denodo

Importing Associations And Joins From A Database Schema in Denodo

A coworker recently asked a question as to whether denodo generated joins automatically from source RDBMS database schema.  After searching, a few snippets of information became obvious.  First, that the subject of inheriting join properties was broader than joins and needed to in modeling associations (joins on demand). Second, that there were some denodo design best practices to be considered to optimize associations.

Does Denodo Automatically Generate Joins From the Source System?

After some research, the short answer is no.

Can Denodo Inherit Accusations From A Logical Model?

The short answer is yes. 

Denodo bridges allow models to be passed to and from other modeling tools, it is possible to have the association build automatically, using the top-down approach design approach and importing a model, at the Interface View level, which is the topmost level of the top-down design process. 

However, below the Interface view level, associations and or joins are created manually by the developer.

Where Should Associations Be Created?

You don’t necessarily need to define an Association at every level, usually, the best practice is to apply associations at following points:

These best practices should cover the majority scenarios, beyond these guidelines it is best to take an ad-hoc approach to create Associations when you see a specific performance/optimization.

Related References

Associations in Denodo

Why Business Intelligence (BI) needs a Semantic Data Model

A semantic data model is a method of organizing and representing corporate data that reflects the meaning and relationships among data items. This method of organizing data helps end users access data autonomously using familiar business terms such as revenue, product, or customer via the BI (business intelligence) and other analytics tools. The use of a semantic model offers a consolidated, unified view of data across the business allowing end-users to obtain valuable insights quickly from large, complex, and diverse data sets.

What is the purpose of semantic data modeling in BI and data virtualization?

A semantic data model sits between a reporting tool and the original database in order to assist end-users with reporting. It is the main entry point for accessing data for most organizations when they are running ad hoc queries or creating reports and dashboards. It facilitates reporting and improvements in various areas, such as:

  • No relationships or joins for end-users to worry about because they’ve already been handled in the semantic data model
  • Data such as invoice data, salesforce data, and inventory data have all been pre-integrated for end-users to consume.
  • Columns have been renamed into user-friendly names such as Invoice Amount as opposed to INVAMT.
  • The model includes powerful time-oriented calculations such as Percentage in sales since last quarter, sales year-to-date, and sales increase year over year.
  • Business logic and calculations are centralized in the semantic data model in order to reduce the risk of incorrect recalculations.
  • Data security can be incorporated. This might include exposing certain measurements to only authorized end-users and/or standard row-level security.

A well-designed semantic data model with agile tooling allows end-users to learn and understand how altering their queries results in different outcomes. It also gives them independence from IT while having confidence that their results are correct.

Data Modeling – Fact Table Effective Practices

Database Table
Database Table

Here are a few guidelines for modeling and designing fact tables.

Fact Table Effective Practices

  • The table naming convention should identify it as a fact table. For example:
    • Suffix Pattern:
      • <<TableName>>_Fact
      • <<TableName>>_F
    • Prefix Pattern:
      • FACT_<TableName>>
      • F_<TableName>>
    • Must contain a temporal dimension surrogate key (e.g. date dimension)
    • Measures should be nullable – this has an impact on aggregate functions (SUM, COUNT, MIN, MAX, and AVG, etc.)
    • Dimension Surrogate keys (srky) should have a foreign key (FK) constraint
    • Do not place the dimension processing in the fact jobs

Related References

Data Modeling – Dimension Table Effective Practices

Database Table
Database Table

I’ve had these notes laying around for a while, so, I thought I consolidate them here.   So, here are few guidelines to ensure the quality of your dimension table structures.

Dimension Table Effective Practices

  • The table naming convention should identify it as a dimension table. For example:
    • Suffix Pattern:
      • <<TableName>>_Dim
      • <<TableName>>_D
    • Prefix Pattern:
      • Dim_<TableName>>
      • D_<TableName>>
  • Have Primary Key (PK) assigned on table surrogate Key
  • Audit fields – Type 1 dimensions should:
    • Have a Created Date timestamp – When the record was initially created
    • have a Last Update Timestamp – When was the record last updated
  • Job Flow: Do not place the dimension processing in the fact jobs.
  • Every Dimension should have a Zero (0), Unknown, row
  • Fields should be ‘NOT NULL’ replacing nulls with a zero (0) numeric and integer type fields or space ( ‘ ‘ ) for Character type files.
  • Keep dimension processing outside of the fact jobs

Related References

Netezza / PureData – How to add comments on a field

The ‘Comment on Column’ provides the same self-documentation capability as ‘Comment On table’, but drives the capability to the column field level.  This provides an opportunity to describe the purpose, business meaning, and/or source of a field to other developers and users.  The comment code is part of the DDL and can be migrated with the table structure DDL.  The statement can be run independently or working with Aginity for PureData System for Analytics, they can be run as a group, with the table DDL, using the ‘Execute as a Single Batch (Ctrl+F5) command.

Basic ‘COMMENT ON field’ Syntax

  • The basic syntax to add a comment to a column is:

COMMENT ON COLUMN <<Schema.TableName.ColumnName>> IS ‘<<Descriptive Comment>>’;

Example ‘COMMENT ON Field’ Syntax

  • This is example syntax, which would need to be changed and applied to each column field:

COMMENT ON COLUMN time_dim.time_srky IS ‘time_srky is the primary key and is a surrogate key derived from the date business/natural key’;

Related References