Netezza / PureData – How to Add a Primary Key

While primary keys (PKs) are not enforced in Netezza, they still provide significant value and should be added.  Among the benefits of adding primary keys are:

  • Informing tools that have metadata import capabilities; for example, Aginity, ETL tools, data modeling tools, InfoSphere Data Architect, DataStage and DataQuality, and the InfoSphere Information Server suite of tools (e.g., Governance Console, Information Analyzer, etc.).
  • Visually helping developers, data modelers, and users know what the primary keys of a table are, which may not be obvious from the table structure. This is especially true for tables using a compound key as the primary key.
  • The query optimizer will use these definitions to define efficient query execution plans.
  • Identifying primary keys provides migratable, self-documenting metadata.
  • Aiding future data and application enrichment projects.

Basic Primary Key syntax

ALTER TABLE <<schema.tablename>> ADD CONSTRAINT <<ConstraintName>> PRIMARY KEY (<<FieldNames>>);

Example Primary Key syntax

ALTER TABLE time_dim ADD CONSTRAINT time_dim_pk PRIMARY KEY (time_srky);
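For a table using a compound key as its primary key, list every key column in the constraint. The following is only a sketch: the table and column names are hypothetical, and the `_v_relation_keydata` catalog view is an assumption based on common Netezza system views (names and columns can vary by release).

```sql
-- Compound primary key on a hypothetical sales fact table
ALTER TABLE sales_fact
  ADD CONSTRAINT sales_fact_pk
  PRIMARY KEY (time_srky, product_srky, store_srky);

-- Inspect the recorded key columns (assumed Netezza catalog view)
SELECT relation, constraintname, attname
  FROM _v_relation_keydata
 WHERE relation = 'SALES_FACT';

-- Because Netezza does not enforce the constraint, redefining it
-- is a metadata-only change: drop it, then re-add it.
ALTER TABLE sales_fact DROP CONSTRAINT sales_fact_pk;
```

Since the constraint is informational only, dropping and re-adding a key does not rewrite or validate any table data.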

What Are the Factors Affecting the Selection of a Data Warehouse Naming Convention?

The primary factors affecting the choices in the creation of a Data Warehouse (DW) naming convention policy are the type of implementation, the pattern of the implementation, and any preexisting conventions.

Type of implementation

The type of implementation will affect your naming convention choices. Basically, this boils down to: are you working with a Commercial-Off-The-Shelf (COTS) data warehouse or doing a custom build?

Commercial-Off-The-Shelf (COTS)

If it is a Commercial-Off-The-Shelf (COTS) warehouse that you are modifying and/or enhancing, then it is very strongly recommended that you conform to the naming conventions of the COTS product.  However, you may want to add an identifier to the conventions to mark your custom objects.

Using this information as an exemplar:

  • FAV = Favinger, Inc. (Company Name – Custom Identifier)
  • GlobalSales = Global Sales (Subject)
  • MV = Materialized View (Object Type)

Suffix Pattern Naming Convention

<<Custom Identifier>>_<<Object Subject Name>>_<<Object Type>>

Example:  FAV_GlobalSales_MV

Prefix Pattern Naming Convention

<<Object Type>>_<<Custom Identifier>>_<<Object Subject Name>>

Example: MV_FAV_GlobalSales
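Applied in DDL, the prefix pattern might look like the following sketch. The base table and its columns are hypothetical, and note that Netezza materialized views are generally limited to simple single-table projections.

```sql
-- Name follows <<Object Type>>_<<Custom Identifier>>_<<Object Subject Name>>
CREATE MATERIALIZED VIEW MV_FAV_GlobalSales AS
  SELECT order_id, region_cd, sales_amt
    FROM fav_global_sales_fact;
```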

Custom Data Warehouse Build

If you are creating a custom data warehouse from scratch, then you have more flexibility in choosing your naming convention.  However, you will still need to take a few factors into account to achieve the maximum benefit from your naming conventions.

  • What is the high-level pattern of your design?
  • Are there any preexisting naming conventions?

Data Warehouse Patterns

Your naming convention will need to take into account the overall intent and design pattern of the data warehouse. The objects and naming conventions of each pattern will vary, if for no other reason than the differences in the objects, their purposes, and the depth of their relationships.

High level Pattern of the Data Warehouse Implementation

The high-level pattern of your design, whether an Operational Data Store (ODS), Enterprise Data Warehouse (EDW), Data Mart (DM), or something else, will need to guide your naming convention, as the logical depth and processing zones of each pattern vary and carry some generally accepted industry conventions.

Structural Pattern of the Data Warehouse Implementation

The structural pattern of your data warehouse design, whether Snowflake, 3rd Normal Form, or Star Schema, will need to guide your naming convention, as the depth of relationships in each pattern varies, each carries some generally accepted industry conventions, and each relates directly to your high-level data warehouse pattern.

Preexisting Conventions

An often omitted factor in data warehouse naming conventions is the set of preexisting conventions, which can have significant impacts from both an engineering and a political point of view. The sources of these conventions can vary and may or may not be formally documented.

A common source of naming convention conflict is preexisting implementations, which may not even be documented.  However, the conventions that system objects and consumers are already familiar with will need to be taken into account when assessing impacts to systems, political culture, user training, and the creation of a standard convention for your data warehouse.

The Relational Database Management System (RDBMS) in which you intend to build the data warehouse may have generally accepted conventions with which consumers may be familiar and about which they may have preconceived expectations (whether expressed or implied).

Change Management

Whatever data warehouse naming convention you choose, the naming conventions, along with the data warehouse design pattern assumptions, should be well documented and placed in a managed, readily accessible change management (CM) repository.

IIS Datastage Naming Conventions and Standards

Standardized naming conventions ease the burden on developers switching from one project to another.  Knowing what things are named and where they are located helps in understanding jobs and sequences, and shortens the time to productivity of new team members.  This is also true for augmentation resources, which may need to be brought in to assist your team.

The following tables identify DataStage elements and their standard naming convention.

Job Naming Conventions

Parallel Job <<Application>>_<<job_Name>>_Prl
Server Job <<Application>>_<<job_Name>>_Svr
Sequence <<Application>>_<<job_Name>>_Seq
Extract Job (Multipart processes only) <<Application>>_Src_<<job_Name>>_<<Job Type (Prl or Svr)>>
Load Job (Multipart processes only) <<Application>>_Load_<<job_Name>>_<<Job Type (Prl or Svr)>>
File Generation/Output Job (Multipart processes only) <<Application>>_FG_<<job_Name>>_<<Job Type (Prl or Svr)>>

Properties Naming Conventions

Parallel Shared Container <<Application>>_<<job_Name>>_Psc
Parameter <<Application>>_<<job_Name>>_Parm
Server Shared Container <<Application>>_<<job_Name>>_Ssc


Job Processing Stage Naming Conventions

Aggregator Agg_<<PrimaryFunction>>
Change Apply ChAp_<<PrimaryFunction>>
Change Capture ChCp_<<PrimaryFunction>>
Copy Cp_<<PrimaryFunction>>
Funnel Funl_<<PrimaryFunction>>
Join (Inner) InJn_<<PrimaryFunction>>
Join (Left Outer) LOJn_<<PrimaryFunction>>
Join (Right Outer) ROJn_<<PrimaryFunction>>
Join (Full Outer) FOJn_<<PrimaryFunction>>
FTP Enterprise Ftp_<<PrimaryFunction>>
Lookup Lkp_<<Value Name or Table Name>>
Merge Mrg_<<PrimaryFunction>>
Column Export CExp_<<PrimaryFunction>>
Column Import XImp_<<PrimaryFunction>>
Pivot Pivt_<<PrimaryFunction>>
Remove Duplicates RmDp_<<PrimaryFunction>>
Row Generator RGen_<<Job_Name>>
Slowly Changing Dimension SCD_<<PrimaryFunction>>
Sort Srt_<<PrimaryFunction>>
Surrogate Key Generator SKey_<<PrimaryFunction>>

Links Naming Conventions

Reference (Lookup) Ln_Ref_<<Number or Additional descriptor, if needed to form a unique object name>>
Reject (Lookup, File, DB) Ln_Rej_<<Number or Additional descriptor, if needed to form a unique object name>>
Get (Shared Container) Ln_Get_<<Number or Additional descriptor, if needed to form a unique object name>>
Put (Shared Container) Ln_Put_<<Number or Additional descriptor, if needed to form a unique object name>>
Input Ln_In_<<Number or Additional descriptor, if needed to form a unique object name>>
Output Ln_Out_<<Number or Additional descriptor, if needed to form a unique object name>>
Delete Ln_Del_<<Number or Additional descriptor, if needed to form a unique object name>>
Insert Ln_Ins_<<Number or Additional descriptor, if needed to form a unique object name>>
Update Ln_Upd_<<Number or Additional descriptor, if needed to form a unique object name>>

Database Stage Naming Conventions

Reject Rej_<<Table or Object Name>>
Lookup Lkp_<<Table or Object Name>>
Source Src_<<Table or Object Name>>
Target Tgt_<<Table or Object Name>>

Transformer Stage Naming Conventions

Loop Variable Lv_<<objectName>>
Stage Variable Sv_<<objectName>>
Transformer (native parallel) Tfm_<<objectName>>

File Stage Naming Conventions

Sequential File SF_<<objectName>>
Complex Flat File CFF_<<objectName>>
File Set FS_<<objectName>>
Parallel Dataset DS_<<objectName>>
Lookup File Set LFS_<<objectName>>
External Source XSrc_<<objectName>>
External Target XTgt_<<objectName>>

XML & Real-Time Stages Naming Conventions

RTI Input RTIi_<<objectName>>
RTI Output RTIo_<<objectName>>
XML Input XMLi_<<objectName>>
XML Output XMLo_<<objectName>>
XML Transformer XMLt_<<objectName>>

Sequence Object Naming Conventions

Sequence <<Application>>_SEQ_<<Application Job Stream Name>>
Master Control Sequence (parent) <<Application>>_SEQ_Master_<<Application Job Stream Name>>

Sequence Stage Naming Conventions

Error Handler EH_<<PrimaryFunction>>
Job Activity Job_<<PrimaryFunction>>
Nested Condition NC_<<PrimaryFunction>>
Routine Activity Rtn_<<PrimaryFunction>>
Sequencer (All) SeqAll_<<Identifier>>
Sequencer (Any) SeqAny_<<Identifier>>
Notify Notify_<<PrimaryFunction>>
Terminator Activity TA_<<PrimaryFunction>>
Sequence Links (Non-messages) Lnk_<<Number or Description>>
Sequence Links (messages) Msg_<<Number or Description>>

IBM InfoSphere DataStage Migration Checklist

Assuming that your InfoSphere instance has been installed and configured, here is a quick migration checklist to help ensure that you have performed the essential tasks.

Create Migration Package
  • Create Database scripts
  • Export DataStage components
  • Gather support files
  • Compress migration package
  • Baseline migration package in CM Tool
  • Upload package to target environment

Deploy Database Components
  • Backup target databases
  • Deploy database components
  • Resolve script errors
  • Create JDBC, ODBC, and/or TNSNAMES entries
  • Install and configure RDBMS client on InfoSphere server
  • Load configuration and conversion data (if not loaded by ETL)

Deploy Support Files
  • Create File Structures
  • Surrogate Key Files
  • System Administration Scripts
  • Job Scripts
  • Node Configuration Files

Deploy DataStage Components
  • Create Project (if required)
      • Configure Project and/or Project Parameters (if required)
  • Import ETLs into DataStage
  • Update Parameters and Parameter Sets (if required)
      • File paths
      • Database names
      • Database credentials
  • Update job properties
      • File paths
  • Compile ETL using Multiple Job Compile
  • Resolve compilation errors

Smoke Test

Finalize CM Baseline