The ‘COMMENT ON COLUMN’ statement provides the same self-documentation capability as ‘COMMENT ON TABLE’, but drives the capability down to the column (field) level. This provides an opportunity to describe the purpose, business meaning, and/or source of a field to other developers and users. The comment code is part of the DDL and can be migrated with the table structure DDL. The statements can be run independently or, when working with Aginity for PureData System for Analytics, they can be run as a group with the table DDL using the ‘Execute as a Single Batch’ (Ctrl+F5) command.
Basic ‘COMMENT ON field’ Syntax
The basic syntax to add a comment to a column is:
COMMENT ON COLUMN <<Schema.TableName.ColumnName>> IS '<<Descriptive Comment>>';
Example ‘COMMENT ON Field’ Syntax
This is example syntax, which would need to be adapted and applied to each column:
COMMENT ON COLUMN time_dim.time_srky IS 'time_srky is the primary key and is a surrogate key derived from the date business/natural key';
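As a minimal sketch of the batch approach described above (the time_dim structure here is a hypothetical example), the table DDL, table comment, and column comments can be kept in one script and migrated or executed together:

CREATE TABLE time_dim
(
    time_srky INTEGER NOT NULL,  -- surrogate primary key
    cal_date  DATE    NOT NULL   -- business/natural key
)
DISTRIBUTE ON (time_srky);

COMMENT ON TABLE time_dim IS 'Calendar date dimension at the day grain';
COMMENT ON COLUMN time_dim.time_srky IS 'Surrogate primary key derived from the date business/natural key';
COMMENT ON COLUMN time_dim.cal_date IS 'Calendar date business/natural key';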
The primary factors affecting the choices made in creating Data Warehouse (DW) naming convention standards are the type of implementation, the pattern of the implementation, and any preexisting conventions.
Type of implementation
The type of implementation will affect your naming convention choices. Basically, this boils down to one question: are you working with a Commercial-Off-The-Shelf (COTS) data warehouse or doing a custom build?
If it is a Commercial-Off-The-Shelf (COTS) warehouse, which you are modifying and/or enhancing, then it is very strongly recommended that you conform to the naming conventions of the COTS product. However, you may want to add an identifier to the conventions to identify your custom objects.
Using this information as an exemplar:
FAV = Favinger, Inc. (Company Name – Custom Identifier)
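As a purely hypothetical illustration (the table and field names below are invented for this sketch), the custom identifier might be applied as an object name prefix so custom objects stand apart from COTS objects:

CREATE TABLE FAV_customer_xref
(
    cots_customer_id INTEGER NOT NULL,  -- key from the COTS warehouse
    crm_customer_id  INTEGER NOT NULL   -- key from the custom CRM source
)
DISTRIBUTE ON (cots_customer_id);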
If you are creating a custom data warehouse from scratch, then you have more flexibility in choosing your naming convention. However, you will still need to take a few factors into account to achieve the maximum benefit from your naming conventions:
What is the high-level pattern of your design?
Are there any preexisting naming conventions?
Data Warehouse Patterns
Your naming convention will need to take into account the overall intent and design pattern of the data warehouse. The objects and naming conventions of each pattern will vary, if for no other reason than the differences in the objects, their purposes, and the depth of their relationships.
High-Level Pattern of the Data Warehouse Implementation
The high-level pattern of your design, whether an Operational Data Store (ODS), Enterprise Data Warehouse (EDW), Data Mart (DM), or something else, will need to guide your naming convention, as the depth of the logical and/or processing zones of each pattern will vary, and each has some generally accepted industry conventions.
Structural Pattern of the Data Warehouse Implementation
The structural pattern of your data warehouse design, whether Snowflake, 3rd Normal Form, or Star Schema, will need to guide your naming convention, as the depth of relationships in each pattern will vary, each has some generally accepted industry conventions, and the structure relates directly to your high-level data warehouse pattern.
Preexisting Naming Conventions
An often omitted factor in data warehouse naming conventions is the source of preexisting conventions, which can have significant impacts from both an engineering and a political point of view. The sources of these conventions can vary and may or may not be formally documented.
A common source of naming convention conflict is a preexisting implementation, which may not even be documented. However, the system objects and conventions that consumers are familiar with, and will continue to be exposed to, will need to be taken into account when assessing impacts to systems, political culture, user training, and the creation of a standard convention for your data warehouse.
The Relational Database Management System (RDBMS) in which you intend to build the data warehouse may have generally accepted conventions, with which consumers may be familiar and about which they may have preconceived expectations (whether expressed or implied).
Whatever data warehouse naming convention you choose, the naming conventions, along with the data warehouse design pattern assumptions, should be well documented and placed in a managed and readily accessible change management (CM) repository.
Here are a few tips which can make a significant difference in the efficiency and effectiveness of developers and users by making information available to them when developing and creating analytic objects. This information can also be very helpful to data modelers. While some of these recommendations are not enforced by Netezza/PureData, that fact makes them no less helpful to your community.
Alter Table to Identify Primary Keys (PK)
Visually helps developers and users to know what the primary keys of the table are (see the example after this list)
Primary key information can also be imported as metadata by other IBM tools (e.g. InfoSphere, DataStage, Data Architect, Governance Catalog, Aginity, etc.)
The query optimizer will use these definitions to define efficient query execution plans
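A minimal sketch, reusing the hypothetical time_dim table from the earlier example; Netezza/PureData records the constraint for metadata and optimizer use but does not enforce it:

ALTER TABLE time_dim ADD CONSTRAINT time_dim_pk PRIMARY KEY (time_srky);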
Alter Table to Identify Foreign Keys (FK)
Illustrates table relationships for developers and users (see the example after this list)
Foreign key information can also be imported as metadata by other IBM tools (e.g. InfoSphere, DataStage, Data Architect, Governance Catalog, Aginity, etc.)
The query optimizer will use these definitions to define efficient query execution plans
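A minimal sketch, assuming a hypothetical sales_fact table that references time_dim; as with primary keys, the constraint is recorded but not enforced:

ALTER TABLE sales_fact ADD CONSTRAINT sales_fact_time_dim_fk
    FOREIGN KEY (time_srky) REFERENCES time_dim (time_srky);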
Limit Distribution Key to Non-Updatable Fields
This one seems obvious, but this problem occurs regularly if tables and optimizations are not properly planned: an error will be generated if an update is attempted against a field contained in the distribution key of a table.
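Continuing the hypothetical sales_fact example, the distribution key is declared in the table DDL, so choose a stable field, such as a surrogate key, that ETL will never update:

CREATE TABLE sales_fact
(
    sales_srky INTEGER       NOT NULL,  -- stable surrogate key, never updated
    time_srky  INTEGER       NOT NULL,  -- FK to time_dim
    sales_amt  NUMERIC(18,2)
)
DISTRIBUTE ON (sales_srky);  -- safe: the distribution field is not updatable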
Use Not Null on Fields
Using ‘Not Null’ whenever the field data and ETL transformation rules can enforce it helps improve performance by reducing the number of null condition checks performed, and it reduces storage.
Use Consistent Field Properties
Using the same data type and field length in all tables for the same field name reduces the amount of interpretation/conversion required by the system, developers, and report SQL.
Schedule Table Optimizations
Work with your DBAs to determine the best scheduling time, system user, and priority of groom and generate statistics operations (see the sketch below). Keep in mind when the optimizations occur in relation to when users need to consume the data. All too often, this operation is not performed before users need the performance and/or is driven by DBA choice, without proper consideration of other processing performance needs. This has proven especially true in data warehousing when the DBA does not have data warehousing experience and/or does not understand the load patterns of the ETL/ELT process.
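A minimal sketch of the maintenance statements themselves, run against the hypothetical sales_fact table: GROOM TABLE reclaims space from logically deleted rows, and GENERATE STATISTICS refreshes the table statistics the query optimizer relies on:

GROOM TABLE sales_fact RECORDS ALL;
GENERATE STATISTICS ON sales_fact;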
Each application in the InfoSphere Information Server architecture has a primary function, which can be synopsized as follows:
IBM InfoSphere Blueprint Director is aimed at the Information Architect designing solution architectures for information-intensive projects.
The IBM Business Analytics software suite provides Semantics, Analytics, Reporting, Data Discovery, and Self-Service BI.
Admin workspaces are provided to investigate data, deploy applications and Web services, and monitor schedules and logs.
Data Architect is an enterprise data modeling and integration design tool. You can use it to discover, model, visualize, relate, and standardize diverse and distributed data assets, including dimensional models.
Data Click is an exciting new capability that helps novices and business users retrieve data and provision systems easily in only a few clicks.
DataStage is a data integration tool that enables users to move and transform data between operational, transactional, and analytical target systems.
Discovery is used to identify the transformation rules that have been applied to source system data to populate a target. Once accurately defined, these business objects and transformation rules provide the essential input into information-centric projects.
FastTrack streamlines collaboration between business analysts, data modelers, and developers by capturing and defining business requirements in a common format and then transforming that business logic (Source-to-Target Mapping (STTM)) directly into DataStage ETL jobs.
The Governance Catalog includes business glossary assets (categories, terms, information governance policies, and information governance rules) and information assets.
Information Analyzer provides capabilities to profile and analyze data.
Information Services Director provides a unified and consistent way to publish and manage shared information services in a service-oriented architecture (SOA).
Metadata Asset Manager imports, exports, and manages common metadata assets in the Metadata Repository and across applications.
QualityStage provides data cleansing capabilities to help ensure quality and consistency by standardizing, validating, matching, and merging information to create comprehensive and authoritative information.
A deployment tool to move, deploy, and control DataStage and QualityStage assets.
The CRIMC1029E / CRIMC1085E errors may be caused by running the incorrect InfoSphere Data Architect installer. If you run the admin installer (launchpad.exe) on 64-bit Windows with insufficient privileges, the process will throw a CRIMC1029E / CRIMC1085E error.
What the Error Looks Like
CRIMC1029E: Adding plug-in com.ibm.etools.cobol.win32_7.0.921.v20140409_0421 to repository E:\Program Files\IBM\SDPShared failed.
Data modeling is the documenting of data relationships, characteristics, and standards based on the intended use of the data. Data modeling techniques and tools capture and translate complex system designs into easily understood representations of the data, creating a blueprint and foundation for information technology development and reengineering.
A data model can be thought of as a diagram that illustrates the relationships between data. Although capturing all the possible relationships in a data model can be very time-intensive, well-documented models allow stakeholders to identify errors and make changes before any programming code has been written.
Data modelers often use multiple models to view the same data and ensure that all processes, entities, relationships, and data flows have been identified.
There are several different approaches to data modeling, including:
Concept Data Model (CDM)
The Concept Data Model (CDM) identifies the high-level information entities and their relationships, which are organized in the Entity Relationship Diagram (ERD).
Logical Data Model (LDM)
The Logical Data Model (LDM) defines detailed business information (in business terms) within each Concept Data Model entity and is a refinement of the information entities of the Concept Data Model. Logical data models are a non-RDBMS-specific business definition of the tables, fields, and attributes contained within each information entity, from which the Physical Data Model (PDM) and Entity Relationship Diagram (ERD) are produced.
Physical Data Model (PDM)
The Physical Data Model (PDM) provides the actual technical details of the model and database objects (e.g. table names, field names, etc.) to facilitate the creation of accurate detailed technical designs and the actual database. Physical Data Models are an RDBMS-specific definition of the logical model, used to build the database, create deployable DDL statements, and produce the Entity Relationship Diagram (ERD).
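As a brief illustrative sketch (a hypothetical customer_dim entity), the PDM supplies the RDBMS-specific details, rendered here as Netezza/PureData DDL, that the logical model intentionally leaves out:

CREATE TABLE customer_dim
(
    customer_srky INTEGER      NOT NULL,  -- surrogate primary key
    customer_nbr  VARCHAR(20)  NOT NULL,  -- business/natural key
    customer_name VARCHAR(100)
)
DISTRIBUTE ON (customer_srky);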