A Denodo virtualization project typically classifies the duties of the primary implementation team into four primary roles.
Denodo Data Virtualization Project Roles
Data Virtualization Architect
Denodo Platform Administrator
Data Virtualization Developer
Denodo Platform Java Programmer
Data Virtualization Internal Support Team
Project Team Member Alignment
While a Denodo project groups security permissions and duties into roles, it is important to note that role assignments among project team members can be very dynamic. The team member who performs a given role can change over the lifecycle of a Denodo project. One team member may hold more than one role at any given time, or acquire or lose roles based on the needs of the project.
Data Virtualization Project Role Duties
The knowledge, responsibilities, and duties of a Denodo Data Virtualization Architect include:
A deep understanding of Denodo security features and data governance
Defines and documents best practices for users, roles, and security permissions
A strong understanding of enterprise architecture
Defines the data virtualization architecture
Guides the definition and documentation of the virtual data model, including delivery modes, data sources, and data combinations
The knowledge, responsibilities, and duties of a Denodo Platform Administrator include:
Denodo Platform installation and maintenance, such as:
Installs Denodo Platform servers
Defines Denodo Platform update and upgrade policies
Creates, edits, and removes environments, clusters, and servers
Manages Denodo licenses
Defines Denodo Platform backup policies
Defines procedures for artifact promotion between environments
Denodo Platform configuration and management, such as:
Configures Denodo Platform server ports
Configures platform memory and Java Virtual Machine (JVM) options
Sets the maximum number of concurrent requests
Sets up database configuration
Specifies the cache server
Configures authentication for users connecting to the Denodo Platform (e.g., LDAP)
Secures (SSL) communications connections of Denodo components
Provides connectivity credential details for client tools/applications (JDBC, ODBC, etc.)
Configures resources
Sets up Version Control System (VCS) configuration for Denodo
Creates new virtual databases
Creates users and roles, and assigns privileges/roles
Executes diagnostic and monitoring operations, analyzes logs, and identifies potential issues
Manages load-balancing variables
The Data Virtualization Developer role is divided into three sub-roles. The knowledge, responsibilities, and duties of a Denodo Data Virtualization Developer, by sub-role, include:
The Denodo data engineer’s duties include:
Implements the virtual data model construction
Imports data sources and creates base views
Creates derived views, applying combinations and transformations to the datasets
Writes documentation and defines testing to eliminate development errors before code promotion to other environments
The Denodo business developer’s duties include:
Creates business views for a specific business area from derived and/or interface views
Implements data services delivery
The Denodo application developer’s duties include:
Creates reporting views from business views for reports and/or datasets frequently consumed by users
Denodo Platform Java Programmer
The Denodo Platform Java Programmer role is an optional, specialized role, which:
Creates custom Denodo components, such as data sources, stored procedures, and VDP/ITPilot functions
Implements custom filters in data routines
Tests and debugs any custom components using Denodo4e
Data Virtualization Internal Support Team
The Denodo data virtualization internal support team’s duties include:
Access to, and knowledge of, the use and troubleshooting of developed solutions
Tools and procedures to manage and support project users and developers
Denodo provides some general Virtual DataPort naming convention recommendations and guidance. First, there is general guidance for basic Virtual DataPort object types and, second, more detailed naming guidance recommendations.
Today, data-driven decision making is at the center of all things. The emergence of data science and machine learning has further reinforced the importance of data as the most critical commodity in today’s world. From FAAMG (the biggest five tech companies: Facebook, Amazon, Apple, Microsoft, and Google) to governments and non-profits, everyone is busy leveraging the power of data to achieve their goals. Unfortunately, this growing demand for data has exposed the inefficiency of current systems to support ever-growing data needs. This inefficiency is what led to the evolution of what we today know as Logical Data Lakes.
What Is a Logical Data Lake?
In simple words, a data lake is a data repository that is capable of storing any data in its original format. As opposed to traditional repositories that use the ETL (Extract, Transform, and Load) strategy, data lakes work on the ELT (Extract, Load, and Transform) strategy. This means data does not have to be transformed before it is loaded, which essentially translates into reduced time and effort. Logical data lakes have captured wide attention because they do away with the need to integrate data from different data repositories. Thus, with this open access to data, companies can begin to draw correlations between separate data entities and use this exercise to their advantage.
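To make the ETL/ELT distinction concrete, here is a minimal sketch of the ELT pattern in generic SQL. The table and column names (raw_events, curated_events, payload) are hypothetical, and real lakes often hold files rather than tables; the point is simply that data lands as-is and is shaped later, inside the target system.

-- 1. Extract and Load: the raw record is stored exactly as received.
CREATE TABLE raw_events (
    payload     VARCHAR(4000),   -- original record, untransformed
    ingested_at TIMESTAMP
);

-- 2. Transform: shaping happens afterwards, only when a consumer needs it.
CREATE TABLE curated_events AS
SELECT UPPER(SUBSTRING(payload, 1, 10)) AS event_code,
       ingested_at
FROM   raw_events
WHERE  payload IS NOT NULL;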
Primary Use Case Scenarios of Data Lakes
Logical data lakes are a
relatively new concept, and thus, readers can benefit from some knowledge of
how logical data lakes can be used in real-life scenarios.
Experimental Analysis of Data:
Logical data lakes can
play an essential role in the experimental analysis of data to establish its
value. Since data lakes work on the ELT strategy, they grant agility and speed to processes during such experiments.
To Store and Analyze IoT Data:
Logical data lakes can efficiently store Internet of Things (IoT) data. Data lakes are capable of storing both relational and non-relational data. Under logical data lakes, it is not mandatory to define the structure or schema of the stored data. Moreover, logical data lakes can run analytics on IoT data and surface ways to enhance quality and reduce operational cost.
To Improve Customer Interactions:
Logical data lakes can methodically combine CRM data with social media analytics to give businesses an understanding of customer behavior, as well as customer churn and its various causes.
To Create a Data Warehouse:
Logical data lakes
contain raw data. Data warehouses, on the other hand, store structured and
filtered data. Creating a data lake is the first step in the process of data
warehouse creation. A data lake may also be used to augment a data warehouse.
To Support the Reporting and Analytical Function:
Data lakes can also be used to support the reporting and analytical function in organizations. By storing the maximum amount of data in a single repository, logical data lakes make it easier to analyze all data and arrive at relevant and valuable findings.
A logical data lake is a comparatively new area of study; however, it can be said with some confidence that logical data lakes will reshape traditional approaches to data management.
Tuning SQL is one of those skills that is part art and part science. However, a few fundamental approaches can help ensure optimal SELECT statement performance.
Structuring your SQL
Much performance can be gained through good SQL statement organization and sound logic.
Where Clause Concepts:
Use criteria ordering and set-theory thinking. SQL can be coupled with set theory to aid in conceiving the operations being conducted. Order your selection criteria so that the criteria that arrive at the smallest possible row set execute first; doing so reduces the volume of rows to be processed by follow-on operations. This requires an understanding of the data relationships to be effective.
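As a brief, hedged illustration (the orders table and its columns are hypothetical), the idea is to let the most selective predicate define the working row set before broader, costlier conditions apply:

-- Hypothetical table: orders, with millions of rows.
-- The selective date predicate narrows the set first; the costly
-- LIKE test then applies to far fewer rows.
SELECT o.order_id, o.customer_id
FROM   orders o
WHERE  o.order_date >= DATE '2024-01-01'   -- highly selective
  AND  o.status = 'OPEN'                   -- moderately selective
  AND  o.comments LIKE '%expedite%';       -- expensive test, listed last

Note that many modern optimizers reorder predicates on their own, so written order matters most where you can verify the execution plan.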
Join Rules (equijoins, etc.)
When constructing your joins, consider the following rules (a brief sketch follows this list):
Join on keys and indexed columns: The efficiency of your query improves when tables are joined on indexed columns rather than on non-indexed ones.
Use equijoins (=) whenever possible.
Avoid using subqueries.
Rewrite EXISTS and NOT EXISTS subqueries as outer joins.
Avoid outer joins on fields containing nulls.
Avoid RIGHT OUTER JOINs: Always select FROM your primary table (or derived table) and LEFT OUTER JOIN to auxiliary tables.
Use joins instead of subqueries: A join can be more efficient than a correlated subquery or a subquery using IN.
Use caution when specifying ORDER BY with a join: When the results of a join must be sorted, limiting the ORDER BY to columns of a single table can allow the database to avoid a sort.
Provide adequate search criteria: When possible, provide additional search criteria in the WHERE clause for every table in a join. These criteria are in addition to the join criteria, which are mandatory to avoid Cartesian products.
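Here is a minimal sketch that combines several of these rules; the customers and orders tables, their key columns, and their indexes are hypothetical:

-- Equijoin on indexed key columns, selecting FROM the primary table,
-- LEFT OUTER JOIN to the auxiliary table, search criteria on both
-- tables, and an ORDER BY limited to columns of a single table.
SELECT   c.customer_id,
         c.customer_name,
         o.order_id
FROM     customers c                      -- primary table
LEFT OUTER JOIN orders o                  -- auxiliary table (no RIGHT OUTER JOIN)
    ON   o.customer_id = c.customer_id    -- equijoin on indexed keys
   AND   o.status = 'OPEN'                -- extra criteria on the joined table
WHERE    c.region = 'WEST'                -- extra criteria on the primary table
ORDER BY c.customer_name;                 -- single-table sort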
Order of Operations SQL & “PEMDAS”
To improve your SQL, pay careful attention to the mathematical order of operations, especially parentheses, since they not only set the order of operations but also the boundaries of each subset operation.
PEMDAS is “Parentheses, Exponents, Multiplication and Division, and Addition and Subtraction.”
Use parentheses () to group and specify the order of execution. SQL observes the normal rules of arithmetic operator precedence.
If parentheses are nested, the expression in the innermost pair is evaluated first. If there are several un-nested sets of parentheses, they are evaluated left to right.
Operators   Meaning                              Associativity
* / %       Multiplication, Division, Modulus    If there are several, evaluation is left to right.
+ -         Addition, Subtraction                If there are several, evaluation is left to right.
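A quick arithmetic illustration (databases that require a FROM clause, such as Oracle, would need FROM DUAL appended):

-- Multiplication binds before addition: 2 + 3 * 4 = 14.
-- Parentheses change the grouping:     (2 + 3) * 4 = 20.
SELECT 2 + 3 * 4   AS no_parens,
       (2 + 3) * 4 AS with_parens;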
Index Leveraging (criteria ordering, hints, append, etc.)
Avoid full table scans: Within the scope of a SQL statement, there are many conditions that will cause the SQL optimizer to invoke a full table scan. Avoid queries (an illustration follows this list):
with NULL conditions (IS NULL, IS NOT NULL)
against unindexed columns
with LIKE conditions
with not-equals conditions (<>, !=, NOT IN)
with built-in functions applied to columns (TO_CHAR, SUBSTR, DECODE, UPPER)
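As a sketch, using Oracle-style syntax and a hypothetical employees table with an index on hire_date: wrapping the indexed column in a function typically defeats the index, while moving the computation to the constant side keeps the predicate index-friendly:

-- Likely full table scan: the indexed column is wrapped in TO_CHAR.
SELECT employee_id
FROM   employees
WHERE  TO_CHAR(hire_date, 'YYYY') = '2024';

-- Index-friendly rewrite: the column is left bare and compared to a range.
SELECT employee_id
FROM   employees
WHERE  hire_date >= DATE '2024-01-01'
AND    hire_date <  DATE '2025-01-01';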
Use UNION ALL instead of UNION if business rules allow
UNION: Specifies that multiple result sets are to be combined and returned as a single result set. The query optimizer performs extra work to remove duplicate rows (see the sketch below).
UNION ALL: Incorporates all rows into the results, including duplicates. The query optimizer just needs to concatenate the result sets, with no extra work.
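A minimal sketch with hypothetical archive tables; both statements return the combined rows, but only UNION pays for the duplicate-elimination step:

-- UNION: the engine must sort or hash the combined set to drop duplicates.
SELECT order_id FROM orders_2023
UNION
SELECT order_id FROM orders_2024;

-- UNION ALL: result sets are simply concatenated, which is usually cheaper.
SELECT order_id FROM orders_2023
UNION ALL
SELECT order_id FROM orders_2024;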
Use stored procedures instead of ad hoc queries when possible; stored procedures are precompiled and cached.
Avoid cursor use when possible.
Select only the rows needed.
Use the NOLOCK hint in the SELECT statement to avoid blocking (accepting the risk of reading uncommitted data).
Commit transactions in smaller batches.
Whenever possible, use tables instead of views.
Make sure comparison columns, whether used in a JOIN or a WHERE clause, are exactly the same data type; for example, if we compare a VARCHAR column to an NCHAR column, the query optimizer has to perform a CONVERT before comparing the values (see the sketch below).
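A short SQL Server-flavored sketch of the data-type point; the customers table and its VARCHAR(10) customer_code column are hypothetical:

-- The NVARCHAR parameter forces an implicit CONVERT on the VARCHAR column,
-- which can prevent an index seek on customer_code.
DECLARE @code NVARCHAR(10) = N'AB123';
SELECT customer_id
FROM   customers
WHERE  customer_code = @code;      -- implicit conversion, possible scan

-- Matching the parameter type to the column avoids the conversion.
DECLARE @code2 VARCHAR(10) = 'AB123';
SELECT customer_id
FROM   customers
WHERE  customer_code = @code2;     -- straight comparison, seek-friendly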
Note: You do not necessarily need to remove all full table scans from your query’s execution plan. Tables with few rows, few columns, or thin columns may fit into a few database blocks; in this case, a full table scan will always be the most efficient access path.
Now, I know this seems like a simple question, but for folks new to Netezza, it has come up more than a few times. Also, this choice ultimately impacts how performant Netezza will be once you complete your maintenance operations.
As a general guideline, groom operations should be completed first, then followed by statistics operations.
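For example, using the documented GROOM TABLE and GENERATE STATISTICS commands (the sales table name is hypothetical):

-- Reclaim deleted and outdated rows first...
GROOM TABLE sales RECORDS ALL;
-- ...then refresh optimizer statistics on the groomed table.
GENERATE STATISTICS ON sales;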
Reference: IBM PureData System for Analytics 7.2.1, Netezza database user documentation, Netezza SQL command reference, GROOM TABLE.