A collection of information technology, consulting, and business knowledge
Category: Data Modeling
Data modeling is the process of creating a data model for the data to be stored in a database. The data model is a conceptual representation of the data objects, the associations between different data objects, and the rules that govern them.
Data modeling provides a visual representation of data and enforces business rules, regulatory compliance, and government policies on the data. Data models ensure consistency in naming conventions, default values, semantics, and security, while ensuring the quality of the data.
A data model emphasizes what data is needed and how it should be organized, rather than what operations need to be performed on the data. A data model is like an architect's building plan: it helps build a conceptual model and establishes the relationships between data items.
A coworker recently asked whether Denodo generates joins automatically from a source RDBMS database schema. After searching, a few snippets of information became obvious. First, the subject of inheriting join properties is broader than joins and needed to be considered in modeling associations (joins on demand). Second, there are some Denodo design best practices to be considered to optimize associations.
Does Denodo Automatically Generate Joins From the Source System?
After some research, the short answer is no.
Does Denodo Inherit Associations From a Logical Model?
The short answer is yes.
Denodo bridges allow models to be passed to and from other modeling tools, so it is possible to have associations built automatically by using the top-down design approach and importing a model at the Interface View level, which is the topmost level of the top-down design. However, below the Interface View level, associations and/or joins are created manually by the developer.
A semantic data model is a method of organizing and representing corporate data that reflects the meaning of, and relationships among, data items. This method of organizing data helps end users access data autonomously using familiar business terms such as revenue, product, or customer via BI (business intelligence) and other analytics tools. The use of a semantic model offers a consolidated, unified view of data across the business, allowing end users to obtain valuable insights quickly from large, complex, and diverse data sets.
What is the purpose
of semantic data modeling in BI and data virtualization?
A semantic data model sits between a reporting tool and the original database in order to assist end-users with reporting. It is the main entry point for accessing data for most organizations when they are running ad hoc queries or creating reports and dashboards. It facilitates reporting and improvements in various areas, such as:
No relationships or joins for end-users to worry about because they’ve already been handled in the semantic data model
Data such as invoice data, salesforce data, and inventory data have all been pre-integrated for end-users to consume.
Columns have been renamed into user-friendly names such as Invoice Amount as opposed to INVAMT.
The model includes powerful time-oriented calculations such as Percentage in sales since last quarter, sales year-to-date, and sales increase year over year.
Business logic and calculations are centralized in the semantic data model in order to reduce the risk of incorrect recalculations.
Data security can be incorporated. This might include exposing certain measurements to only authorized end-users and/or standard row-level security.
A well-designed semantic data model with agile tooling allows end-users to learn and understand how altering their queries results in different outcomes. It also gives them independence from IT while having confidence that their results are correct.
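As a minimal sketch of the renaming and pre-integration points above (all table and column names are hypothetical, not taken from any particular system), a semantic layer can be as simple as a database view that joins sources once and exposes business-friendly names:

```sql
-- Hypothetical semantic-layer view: joins handled once, columns renamed.
CREATE VIEW sales_semantic AS
SELECT
    i.INVNO    AS invoice_number,   -- cryptic source name made user-friendly
    i.INVAMT   AS invoice_amount,   -- INVAMT exposed as Invoice Amount
    i.INVDATE  AS invoice_date,
    c.CUSTNAME AS customer_name
FROM invoice  i
JOIN customer c ON c.CUSTID = i.CUSTID;  -- relationship pre-handled for end users
```

End users then query `sales_semantic` directly, with no knowledge of the underlying join or the original column names.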
Denodo 7.0 saves some manual coding when building 'Base Views' by performing some initial data type conversions from ANSI SQL types to Denodo Virtual DataPort data types. Here is a quick reference showing what the Denodo Virtual DataPort data type mappings are:
ANSI SQL types To Virtual DataPort Data types Mapping
ANSI SQL Type
Virtual DataPort Type
BIT VARYING (n)
CHARACTER VARYING (n)
DECIMAL (n, m)
NUMERIC (n, m)
TIMESTAMP WITH TIME ZONE
VARCHAR ( MAX )
ANSI SQL Type Conversion Notes
The function CAST truncates the output when converting a value to text, when these two conditions are met:
You specify a SQL type with a length for the target data type (e.g., VARCHAR(20)).
And this length is lower than the length of the input value.
When casting a boolean to an integer, true is mapped to 1 and false to 0.
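For example (illustrative ANSI SQL; exact behavior may vary by platform), both rules can be seen with:

```sql
-- Truncation: the target type's length (9) is shorter than the input value.
SELECT CAST('The quick brown fox' AS VARCHAR(9));  -- yields 'The quick'

-- Boolean to integer: true maps to 1, false maps to 0.
SELECT CAST(true AS INTEGER);   -- yields 1
SELECT CAST(false AS INTEGER);  -- yields 0
```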
When analyzing individual column data, at its most foundational level, column data can be classified by its fundamental use/characteristics. Granted, when you start rolling up the structure into multiple columns, table structure, and table relationships, then other classifications/behaviors, such as keys (primary and foreign), indexes, and distribution, come into play. However, many times when working with existing data sets it is essential to understand the nature of the existing data to begin the modeling and information governance process.
Column Data Classification
Generally, individual columns can be classified into the following classifications:
Identifier — A column/field which is unique to a row and/or can identify related data (e.g., Person ID, National Identifier). Basically, think primary key and/or foreign key.
Indicator — A column/field, often called a Flag, that has a binary condition (e.g., True or False, Yes or No, Female or Male, Active or Inactive). Frequently used to identify compliance with a specific business rule.
Code — A column/field that has a distinct and defined set of values, often abbreviated (e.g., State Code, Currency Code)
Temporal — A column/field that contains some type of date, timestamp, time, interval, or numeric duration data
Quantity — A column/field that contains a numeric value (decimals, integers, etc.) and is not classified as an Identifier or Code (e.g., Price, Amount, Asset Value, Count)
Text — A column/field that contains alphanumeric values, possibly long text, and is not classified as an Identifier or Code (e.g., Name, Address, Long Description, Short Description)
Large Object (LOB) — A column/field that contains data such as traditional long text fields or binary data like graphics. Large objects can be broadly classified as Character Large Objects (CLOBs), Binary Large Objects (BLOBs), and Double-Byte Character Large Objects (DBCLOBs or NCLOBs).
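The classifications above can be illustrated with a hypothetical table definition (names and types are invented for illustration), one column per classification:

```sql
-- Hypothetical table annotated with the column classifications above.
CREATE TABLE customer_order (
    order_id      INTEGER NOT NULL,  -- Identifier (primary key)
    customer_id   INTEGER NOT NULL,  -- Identifier (foreign key)
    is_active     BOOLEAN,           -- Indicator (flag: true/false)
    currency_code CHAR(3),           -- Code (e.g., 'USD', 'EUR')
    order_date    DATE,              -- Temporal
    order_amount  DECIMAL(12,2),     -- Quantity
    ship_address  VARCHAR(200),      -- Text
    contract_doc  BLOB               -- Large Object (LOB)
);
```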
A Common Data Model (CDM) is a shared data structure designed to provide well-formed and standardized data structures within an industry (e.g., medical, insurance) or business channel (e.g., human resource management, asset management), which can be applied to provide organizations a consistent, unified view of business information. These common models can be leveraged as accelerators by organizations to form the foundation for their information, including SOA interchanges, mashups, data virtualization, an Enterprise Data Model (EDM), business intelligence (BI), and/or to standardize their data models to improve metadata management and data integration practices.
Adding a foreign key to tables in Netezza / PureData is a best practice, especially when working with dimensionally modeled data warehouse structures and with modern governance, integration (including virtualization), and presentation semantics (including reporting, business intelligence, and analytics).
Foreign Key (FK) Guidelines
A primary key must be defined on the table and field (or fields) to which you intend to link the foreign key
Avoid using distribution keys as foreign keys
Foreign Key field should not be nullable
Your foreign key link field(s) must be of the same format(s) (e.g., integer to integer)
Apply standard naming conventions to the constraint name (e.g., <<Table_Name>>_fk<<Number>>)
Please note that foreign key constraints are not enforced in Netezza
Steps to add a Foreign Key
The process for adding foreign keys involves just a few steps:
Verify guidelines above
Alter table add constraint SQL command
Run statistics, which is optional, but strongly recommended
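The statistics step can be sketched with Netezza's GENERATE STATISTICS command (table name taken from the example later in this post):

```sql
-- Refresh statistics on the altered table (optional, but strongly recommended).
GENERATE STATISTICS ON Blog.job_stage_fact;
```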
Basic Foreign Key SQL Command Structure
Here is the basic syntax for adding a foreign key:
ALTER TABLE <<Owner>>.<<Name_Of_Table_Being_Altered>>
ADD CONSTRAINT <<Constraint_Name>>_fk<<Number>>
FOREIGN KEY (<<Field_Name or Field_Name List>>) REFERENCES <<Owner>>.<<Target_FK_Table_Name>> (<<Field_Name or Field_Name List>>) <<ON UPDATE | ON DELETE>> <<action>>;
Example Foreign Key SQL Command
This is a simple, one-field example of a foreign key (FK):
ALTER TABLE Blog.job_stage_fact
ADD CONSTRAINT job_stage_fact_host_dim_fk1
FOREIGN KEY (hostid) REFERENCES Blog.host_dim(hostid) ON DELETE cascade ON UPDATE no action;
Reference: IBM PureData System for Analytics 7.2.1, Netezza database user documentation, Netezza SQL command reference, ALTER TABLE, constraints
A foreign key (FK) is a constraint that references the unique primary key (PK) of another table.
Facts About Foreign Keys
Foreign keys act as a cross-reference between tables, linking the foreign key (child record) to the primary key (parent record) of another table, which establishes a link/relationship between the table keys
Foreign keys are not enforced by all RDBMS
The concept of referential integrity is derived from foreign key theory
Because Foreign keys involve more than one table relationship, their implementation can be more complex than primary keys
A foreign-key constraint implicitly defines an index on the foreign-key column(s) in the child table; however, manually defining a matching index may improve join performance in some databases
SQL normally provides the following referential integrity actions for deletions when enforcing foreign keys:
Cascade — The deletion of a parent (primary key) record causes the deletion of the corresponding foreign-key records.
Restrict / No Action — Forbids the deletion of a parent (primary key) record if there are dependent foreign-key records. No Action does not mean to suppress the foreign-key constraint.
Set Null — The deletion of a parent (primary key) record causes the corresponding foreign keys to be set to null.
Set Default — The deletion of a parent (primary key) record causes the corresponding foreign keys to be set to a default value instead of null.
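These actions appear in the foreign-key clause itself; a sketch with hypothetical parent/child tables (the action keyword on the last line can be swapped for any of the alternatives noted in the comment):

```sql
-- Hypothetical tables illustrating where the deletion action is declared.
CREATE TABLE parent (pk INTEGER PRIMARY KEY);

CREATE TABLE child (
    id INTEGER PRIMARY KEY,
    pk INTEGER,
    CONSTRAINT child_parent_fk FOREIGN KEY (pk)
        REFERENCES parent (pk)
        ON DELETE SET NULL  -- alternatives: CASCADE, NO ACTION, SET DEFAULT
);
```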