Database – What is a foreign key?

Acronyms, Abbreviations, Terms, And Definitions, DDL (Data Definition Language), What is a foreign key
Acronyms, Abbreviations, Terms, And Definitions

Definition of a Foreign Key

  • A foreign Key (FK) is a constraint that references the unique primary key (PK) of another table.

Facts About Foreign Keys

  • Foreign Keys act as a cross-reference between tables linking the foreign key (Child record) to the Primary key (parent record) of another table, which establishing a link/relationship between the table keys
  • Foreign keys are not enforced by all RDBMS
  • The concept of referential integrity is derived from foreign key theory
  • Because Foreign keys involve more than one table relationship, their implementation can be more complex than primary keys
  • A foreign-key constraint implicitly defines an index on the foreign-key column(s) in the child table, however, manually defining a matching index may improve join performance in some database
  • The SQL, normally, provides the following referential integrity actions for deletions, when enforcing foreign-keys

Cascade

  • The deletion of a parent (primary key) record may cause the deletion of corresponding foreign-key records.

No Action

  • Forbids the deletion of a parent (primary key) record, if there are dependent foreign-key records.   No Action does not mean to suppress the foreign-key constraint.

Set null

  • The deletion of a parent (primary key) record causes the corresponding foreign-key to be set to null.

Set default

  • The deletion of a record causes the corresponding foreign-keys be set to a default value instead of null upon deletion of a parent (primary key) record

Related References

Database – What is a Composite Primary Key?

Database Table
Database Table

What is a Composite Primary Key?

A Composite Primary key is Primary key What a primary key, which is defined by having multiple fields (columns) in it.  Like a Primary Key what a composite Primary Key is depends on the database.  Essentially a Composite Primary Key:

  • Is a combination of Fields (columns) which uniquely identifies every row.
  • Is an index in database systems which use indexes for optimization
  • Is a type of table constraint
  • Is applied with a data definition language (DDL) alter command
  • And may define parent-Child relationship between tables

Related References

Database – What is a Primary Key?

Database Table
Database Table

What is a primary Key?

What a primary key is depends, somewhat, on the database.  However, in its simplest form a primary key:

  • Is a field (Column) or combination of Fields (columns) which uniquely identifies every row.
  • Is an index in database systems which use indexes for optimization
  • Is a type of table constraint
  • Is applied with a data definition language (DDL) alter command
  • And, depending on the data model can, define parent-Child relationship between tables

Related References

Data Modeling – Dimension Table Effective Practices

Database Table
Database Table

I’ve had these notes laying around for a while, so, I thought I consolidate them here.   So, here are few guidelines to ensure the quality of your dimension table structures.

Dimension Table Effective Practices

  • The table naming convention should identify it as a dimension table. For example:
    • Suffix Pattern:
      • <<TableName>>_Dim
      • <<TableName>>_D
    • Prefix Pattern:
      • Dim_<TableName>>
      • D_<TableName>>
  • Have Primary Key (PK) assigned on table surrogate Key
  • Audit fields – Type 1 dimensions should:
    • Have a Created Date timestamp – When the record was initially created
    • have a Last Update Timestamp – When was the record last updated
  • Job Flow: Do not place the dimension processing in the fact jobs.
  • Every Dimension should have a Zero (0), Unknown, row
  • Fields should be ‘NOT NULL’ replacing nulls with a zero (0) numeric and integer type fields or space ( ‘ ‘ ) for Character type files.
  • Keep dimension processing outside of the fact jobs

Related References

Infosphere DataStage – Designer Client Repository Structure

Default Repository Structure

When a project is created, there is a default repository structure created for use in the DataStage designer client.

However, some additional organization will be required for most DataStage projects.  Usually, this organization occurs in in these areas:

  • Addition of structure within the “Jobs” folder
  • Addition of a “Parameter Sets” folder
  • Addition of structure within the “Table Definitions” folder
  • Addition of a “Developer Work Area” folder

Repository Structure within the “Jobs” folder

Below is a sample of a folder structure for multiple applications that share a common Repository.  Pattern includes, but does not illustrate all other delivered folders. In addition to the core folder structure, developers can create individual working, test, and in progress folders, which do not migrate, but keep work segregated.

Parameter Sets Folders

The parameter set folders or for two sets of information.

  • First, are the database parameters, which include data connections and the attached parameter sets.
  • The second, for job parameters, which may include parameter sets, for things like e-mail parameters, surrogate key file paths, etc.; which is a best practice, rather creating them as project level parameters.

Table Definitions

The Tables Definition folder have folders added to segregate the imported meta data for source and target system and, in some case, may need folders to logically organize imported meta which may reside within the same database and/or schema, but belong to different logical layer.

Infosphere Datastage – Data type conversions from Netezza to DataStage

Some recent research to eliminate some ETL Data Conversion issues, made me want to make an enhancement to the documentation provided by IBM, to prevent repeating the research.

Netezza data types and their equivalent InfoSphere DataStage data types

Netezza data types

InfoSphere DataStage data types (SQL types)

Notes

 
BYTEINTTinyInt  
SMALLINTSmallInt  
INTInteger  
BIGINTBigInt  
NUMERIC(p, s)Numeric, decimal, doubleDecimal and double are aliases of Numeric 
FLOAT(p)Float  
REALReal  
DOUBLE PRECISIONDouble  
CHAR(n)Char  
VARCHAR(n)VarChar  
NCHAR(n)NChar  
NVARCHAR(n)NVarChar  
BOOLEANBit  
DATEDate  
TIMETime  
TIME WITH TIME ZONEVarCharThe value of time with time zone will be returned without the time zone information. 
TIMESTAMPTimeStamp  
INTERVALVarCharYou cannot load the interval data type from an external table. 
ROWIDBigInt  
TRANSACTION IDBigInt  
DATASLICEInteger  

Related References

Data type conversions from Netezza to DataStage

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from Netezza to DataStage