Denodo Data Catalog Roles

The Denodo Data Catalog provides data governance and self-service capabilities that supplement the core capabilities of Denodo Virtual DataPort (VDP). Six roles provide the ability to assign or deny capabilities within the Data Catalog and supplement the database, row, and column security and permissions of VDP.

The tasks each role can perform:

  • data_catalog_classifier – Assign categories, tags, and custom property groups to views and web services.
  • data_catalog_editor – Edit views, web services, and databases; create, edit, and delete tags, categories, custom property groups, and custom properties.
  • data_catalog_manager – Can do the same as a user with both the “data_catalog_editor” and “data_catalog_classifier” roles.
  • data_catalog_content_admin – Configure personalization options and content search.
  • data_catalog_admin – Can perform any action of all the other Data Catalog roles.
  • data_catalog_exporter – Export the results of a query from the Denodo Data Catalog.

Related References

Denodo > User Manuals > Denodo Platform New Features Guide

Denodo > User Manuals > Data Catalog Guide > Administration

Denodo Model Best Practices For Creation of Associations

What Are Denodo Associations?

In Denodo, associations follow the same concept as in data modeling tools and can be described as an ‘on-demand join.’

Where Should Associations Be Created In the Denodo Model?

You don’t necessarily need to define an Association at every level; usually, the best practice is to apply associations at the following points:

  • On final views published for data consumers, indicating relationships between related views, especially on published web services.
  • Below the final views, on any derived view that brings together disparate (dissimilar) data sources.  These associations should be defined as Referential Constraints whenever appropriate to aid the optimization engine (see the sketch after this list).
  • Below the final views, on any derived view that joins a “Base View from Query” with standard views, since Base Views from Query cannot be rewritten by the Denodo optimization engine and often create performance bottlenecks.
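
For illustration, here is a minimal, hedged VQL-style sketch of an association declared as a Referential Constraint between two hypothetical views, bv_customer and iv_orders; the clause names and multiplicity notation are assumptions and should be verified against the Denodo VQL Guide for your version.

-- Hedged sketch only; verify the exact syntax against the Denodo VQL Guide
-- Declares a one-to-many association flagged as a referential constraint,
-- so the optimizer can prune the join when the orders side is not needed.
CREATE OR REPLACE ASSOCIATION customer_orders
REFERENTIAL CONSTRAINT
ENDPOINT customer bv_customer PRINCIPAL MULTIPLICITY 1
ENDPOINT orders iv_orders MULTIPLICITY *
CONDITION bv_customer.customer_id = iv_orders.customer_id;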

These best practices should cover the majority of scenarios; beyond these guidelines, it is best to take an ad hoc approach, creating associations when you identify a specific performance or optimization opportunity.

Why Are Associations Important in Denodo?

In a nutshell, associations improve performance and the efficiency of the Denodo execution optimizer, which uses them along with other model metadata, such as:

  • The SQL of the view(s)
  • Table metadata (table keys (PK, FK), virtual partitions, etc.)
  • Data statistics, which are used by the Cost Based Optimizer (CBO)

Related References

Associations in Denodo

Importing Associations And Joins From A Database Schema in Denodo

A coworker recently asked whether Denodo generates joins automatically from a source RDBMS database schema.  After some searching, a few things became apparent.  First, the subject of inheriting join properties is broader than joins and extends to modeling associations (joins on demand).  Second, there are some Denodo design best practices to consider to optimize associations.

Does Denodo Automatically Generate Joins From the Source System?

After some research, the short answer is no.

Can Denodo Inherit Associations From A Logical Model?

The short answer is yes. 

Denodo bridges allow models to be passed to and from other modeling tools, so it is possible to have associations built automatically by using the top-down design approach and importing a model at the Interface View level, which is the topmost level of the top-down design process.

However, below the Interface View level, associations and/or joins are created manually by the developer.

Where Should Associations Be Created?

You don’t necessarily need to define an Association at every level; usually, the best practice is to apply associations at the points described under “Where Should Associations Be Created In the Denodo Model?” above.

These best practices should cover the majority of scenarios; beyond these guidelines, it is best to take an ad hoc approach, creating associations when you identify a specific performance or optimization opportunity.

Related References

Associations in Denodo

Netezza / PureData – How to add comments on a field

The ‘Comment on Column’ statement provides the same self-documentation capability as ‘Comment On Table’, but drives the capability down to the column (field) level.  This provides an opportunity to describe the purpose, business meaning, and/or source of a field to other developers and users.  The comment code is part of the DDL and can be migrated with the table structure DDL.  The statements can be run independently or, when working with Aginity for PureData System for Analytics, they can be run as a group with the table DDL using the ‘Execute as a Single Batch’ (Ctrl+F5) command.

Basic ‘COMMENT ON Field’ Syntax

  • The basic syntax to add a comment to a column is:

COMMENT ON COLUMN <<Schema.TableName.ColumnName>> IS '<<Descriptive Comment>>';

Example ‘COMMENT ON Field’ Syntax

  • This is example syntax, which would need to be changed and applied to each column field:

COMMENT ON COLUMN time_dim.time_srky IS 'time_srky is the primary key and is a surrogate key derived from the date business/natural key';
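
To illustrate running the comments as a group with the table DDL, here is a minimal, hypothetical batch; the table and column definitions are illustrative only and would be replaced with your own DDL.

-- Hypothetical batch: table DDL and its comments migrated and run together
-- (in Aginity, execute with 'Execute as a Single Batch' / Ctrl+F5)
CREATE TABLE time_dim
(
    time_srky  BIGINT  NOT NULL,
    cal_date   DATE
)
DISTRIBUTE ON (time_srky);

COMMENT ON TABLE time_dim IS 'Calendar dimension table at day grain';
COMMENT ON COLUMN time_dim.time_srky IS 'time_srky is the primary key and is a surrogate key derived from the date business/natural key';
COMMENT ON COLUMN time_dim.cal_date IS 'Calendar date business/natural key';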

What Are the Factors Affecting the Selection of a Data Warehouse Naming Convention?

The primary factors affecting the choices in the creation of Data Warehouse (DW) naming convention policy standards are the type of implementation, the pattern of the implementation, and any preexisting conventions.

Type of implementation

The type of implementation will affect your naming convention choices. Basically, this boils down to: are you working with a Commercial-Off-The-Shelf (COTS) data warehouse or doing a custom build?

Commercial-Off-The-Shelf (COTS)

If it is a Commercial-Off-The-Shelf (COTS) warehouse that you are modifying and/or enhancing, then it is very strongly recommended that you conform to the naming conventions of the COTS product.  However, you may want to add an identifier to the conventions to mark your custom objects.

Using this information as an exemplar:

  • FAV = Favinger, Inc. (Company Name – Custom Identifier)
  • GlobalSales = Global Sales (Subject)
  • MV = Materialized View (Object Type)

Suffix Pattern Naming Convention

<<Custom Identifier>>_<<Object Subject Name>>_<<Object Type>>

Example:  FAV_GlobalSales_MV

Prefix Pattern Naming Convention

<<Object Type>>_<<Custom Identifier>>_<<Object Subject Name>>

Example: MV_FAV_GlobalSales

Custom Data Warehouse Build

If you are creating a custom data warehouse from scratch, then you have more flexibility in choosing your naming convention.  However, you will still need to take a few factors into account to achieve the maximum benefit from your naming conventions.

  • What is the high-level pattern of your design?
  • Are there any preexisting naming conventions?

Data Warehouse Patterns

Your naming convention will need to take into account the overall intent and design pattern of the data warehouse. The objects and naming conventions of each pattern will vary, if for no other reason than the differences in the objects, their purpose, and the depth of their relationships.

High-Level Pattern of the Data Warehouse Implementation

The high-level pattern of your design, whether an Operational Data Store (ODS), Enterprise Data Warehouse (EDW), Data Mart (DM), or something else, will need to guide your naming convention, as the depth of the logical and/or processing zones of each pattern will vary, and each has some generally accepted industry conventions.

Structural Pattern of the Data Warehouse Implementation

The structural pattern of your data warehouse design, whether Snowflake, 3rd Normal Form, or Star Schema, will need to guide your naming convention, as the depth of relationships in each pattern varies, each has some generally accepted industry conventions, and each relates directly to your high-level data warehouse pattern.

Preexisting Conventions

An often omitted factor in data warehouse naming conventions is the set of preexisting conventions, which can have significant impacts from both an engineering and a political point of view. The sources of these conventions can vary and may or may not be formally documented.

A common source of naming convention conflict is a preexisting implementation, which may not even be documented.  However, the conventions that system objects and consumers are already familiar with will need to be taken into account when assessing impacts to systems, political culture, user training, and the creation of a standard convention for your data warehouse.

The Relational Database Management System (RDBMS) in which you intend to build the data warehouse may have generally accepted conventions with which consumers may be familiar and about which they may have preconceived expectations (whether expressed or implied).

Change Management

Whatever data warehouse naming convention you choose, the naming conventions, along with the data warehouse design pattern assumptions, should be well documented and placed in a managed and readily accessible change management (CM) repository.

PureData – Table Effective Practices

Here are a few tips which can make a significant difference in the efficiency and effectiveness of developers and users by making information available to them when developing and creating analytic objects.  This information can also be very helpful to data modelers.  While some of these recommendations are not enforced by Netezza/PureData, that fact makes them no less helpful to your community.

Alter Table to Identify Primary Keys (PK)

  • Visually helps developers and users know what the primary keys of the table are (see the sketch after this list)
  • Primary key information can also be imported as metadata by other IBM tools (e.g., InfoSphere DataStage, Data Architect, Governance Catalog, Aginity, etc.)
  • The query optimizer will use these definitions to define efficient query execution plans
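
A minimal, hypothetical example follows; the table and constraint names are illustrative only, and Netezza records the constraint as metadata rather than enforcing it.

-- Hypothetical example: declare the primary key so developers, tools,
-- and the optimizer can see it (recorded as metadata; not enforced)
ALTER TABLE time_dim
    ADD CONSTRAINT time_dim_pk PRIMARY KEY (time_srky);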

Alter Table to Identify Foreign Keys (FK)

  • Illustrates table relationships for developers and users (see the sketch after this list)
  • Foreign key information can also be imported as metadata by other IBM tools (e.g., InfoSphere DataStage, Data Architect, Governance Catalog, Aginity, etc.)
  • The query optimizer will use these definitions to define efficient query execution plans
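
Again, a minimal, hypothetical sketch; the fact and dimension names are illustrative only.

-- Hypothetical example: declare the fact-to-dimension relationship
-- (recorded as metadata; not enforced by Netezza)
ALTER TABLE sales_fact
    ADD CONSTRAINT sales_fact_time_fk
    FOREIGN KEY (time_srky) REFERENCES time_dim (time_srky);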

Limit Distribution Key to Non-Updatable Fields

  • This one seems obvious, but the problem occurs regularly if tables and optimizations are not properly planned: an error will be generated if an update is attempted against a field contained in the distribution key of a table (see the sketch below).
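
A minimal, hypothetical example, distributing on a surrogate key that is never updated (names are illustrative only):

-- Hypothetical example: distribute on a stable surrogate key,
-- since updating a distribution-key field raises an error
CREATE TABLE sales_fact
(
    sale_srky   BIGINT         NOT NULL,  -- surrogate key; never updated
    time_srky   BIGINT         NOT NULL,
    sale_amt    NUMERIC(18,2)
)
DISTRIBUTE ON (sale_srky);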

Use Not Null on Fields

  • Using ‘Not Null’ whenever the field data and ETL transformation rules can enforce it helps improve performance by reducing the number of null-condition checks performed, and it reduces storage (see the sketch below).
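
A short, hypothetical illustration; the column names and the assumption that the ETL always populates them are illustrative only.

-- Hypothetical example: declare NOT NULL wherever the ETL rules guarantee a value
CREATE TABLE customer_dim
(
    customer_srky  BIGINT        NOT NULL,  -- always populated by the ETL
    customer_nm    VARCHAR(100)  NOT NULL,  -- required business field
    middle_nm      VARCHAR(50)              -- nullable: source cannot guarantee a value
);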

Use Consistent Field Properties

  • Using the same data type and field length in all tables with the same field name reduces the amount of interpretation/conversion required by the system, developers, and report SQL.

Schedule Table Optimizations

  • Work with your DBAs to determine the best scheduling time, system user, and priority for groom and generate-statistics operations (see the sketch below). Keep in mind when the optimizations occur in relation to when users need to consume the data. All too often, this operation is not performed before users need the performance and/or is driven by DBA choice, without proper consideration of other processing performance needs.  This has proven especially true in data warehousing when the DBA does not have data warehousing experience and/or does not understand the load patterns of the ETL/ELT process.
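
A minimal, hypothetical maintenance batch, scheduled after the nightly ETL load completes (the table name is illustrative only):

-- Hypothetical maintenance batch: run after the nightly load, before peak query hours
GROOM TABLE sales_fact RECORDS ALL;   -- reclaim space from deleted/updated rows
GENERATE STATISTICS ON sales_fact;    -- refresh statistics for the cost-based optimizer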
