Infosphere DataStage – Boolean Handling for Netezza

Beware when you see this message when working with Boolean in DataStage, the message displays as informational (at list it did for me) not as a warning or an error.  Even though it seems innocuous, what it meant for my job, was the Boolean (‘true’ / ‘false’) was not being interpreted and everything posted to ‘false’.

In DataStage the Netezza ‘Boolean’ field/Data SQL type maps to the ‘Bit’ SQL type, which expects a numeric input of Zero (0) or one (1).  So, my solution (once I detected the problem during unit testing) was to put Transformer Stage logic in place to convert the Boolean input to the expected number value.

Netezza to Datastage Data Type Mapping

Netezza data types

InfoSphere DataStage

data types (SQL types)

Expected Input value

BOOLEANBit0 or 1 (1 = true, 0 = false)

Transformer Stage logic Boolean Handling Logic

A Netezza Boolean field can store: true values, false values, and null. So, some thought should be given to you desired data outcome for nulls

This first example sets a that the nulls are set to a specific value, which can support a specific business rule for null handling and, also, provide null handling for non-nullable fields.  Here we are setting nulls to the numeric value for ‘true’ and all other non-true inputs to ‘false’.

If isnull(Lnk_Src_In.USER_ACTIVE) then 1 Else if Lnk_Src_In.USER_ACTIVE = ‘true’ Then 1 Else 0

These second examples sets a that the nulls are set by the Else value, if your logic direction is correct value and still provides null handling for non-nullable fields.

  • If  Lnk_Src_In.USER_ACTIVE = ‘true’ Then 1 Else 0

  • If  Lnk_Src_In.USER_ACTIVE = ‘False’ Then 0 Else 1

Director Log Message

Message ID

  • IIS-DSEE-TBLD-00008

Message Text

  • <<Link Name Where Message Occurred>>: Numeric string expected. Use default value.

Or something like this:

  • <<Link Name Where Message Occurred>>: Numeric string expected for input column ‘<<Field Name Here>>‘. Use default value.

Related References

Boolean

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza user-defined functions, UDX data types reference information, Supported data types, Boolean

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.udf.doc/r_udf_boolean_datatype.html

Data types and aliases

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza stored procedures, NZPLSQL statements and grammar, Variables and constants, Data types and aliases

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.sproc.doc/c_sproc_data_types_aliases.html

Logical data types

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Data types, Logical data types

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_data_types_logical.html

Data type conversions from Netezza to DataStage

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from Netezza to DataStage

https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.5.0/com.ibm.swg.im.iis.conn.netezza.use.doc/topics/nzcc_mappingdatatypes.html

Infosphere Datastage – Data type conversions from Netezza to DataStage

Some recent research to eliminate some ETL Data Conversion issues, made me want to make an enhancement to the documentation provided by IBM, to prevent repeating the research.

Netezza data types and their equivalent InfoSphere DataStage data types

Netezza data types

InfoSphere DataStage data types (SQL types)

Notes

 
BYTEINTTinyInt  
SMALLINTSmallInt  
INTInteger  
BIGINTBigInt  
NUMERIC(p, s)Numeric, decimal, doubleDecimal and double are aliases of Numeric 
FLOAT(p)Float  
REALReal  
DOUBLE PRECISIONDouble  
CHAR(n)Char  
VARCHAR(n)VarChar  
NCHAR(n)NChar  
NVARCHAR(n)NVarChar  
BOOLEANBit  
DATEDate  
TIMETime  
TIME WITH TIME ZONEVarCharThe value of time with time zone will be returned without the time zone information. 
TIMESTAMPTimeStamp  
INTERVALVarCharYou cannot load the interval data type from an external table. 
ROWIDBigInt  
TRANSACTION IDBigInt  
DATASLICEInteger  

Related References

Data type conversions from Netezza to DataStage

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from Netezza to DataStage

Netezza / PureData – What is the maximum length of CHAR, VARCHAR, and NVARCHAR fields

During a recent project, I had reason to investigate what the maximum character field links for Netezza to be able to fit the data ETL ‘ed from the source.  The constraints on Netezza character fields are not as long as some other databases, therefore, I thought I would document and a quick post for future reference.   Knowing these limits will help with your ETL design and construction but you may know where to truncate your field input data.

Data typeAlias namesInfoSphere DataStage data types (SQL type)Maximum LengthNotes
CHARCHARACTER, CHAR(n), CHARACTER(n)Char64,000Fixed-length character string, blank padded to length n. If you do not specify n, the default is an unsized CHAR value.
VARCHARCHARACTER VARYING, VARCHAR(n), CHARACTER VARYING(n), CHAR VARYING(n)VarChar, LongVarChar64,000Variable length character string to a maximum length of n. If you do not specify n, the default is an unsized VARCHAR value. There is no blank padding, and the value is stored as entered.
NCHARNATIONAL CHARACTER, NATIONAL CHAR(n), NCHAR(size)NChar, Char,16,000Fixed-length character string, blank padded to length n. If you do not specify n, the default is an unsized NCHAR value.
NVARCHARNATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, NATIONAL CHAR VARYING(n), NATIONAL CHARACTER VARYING(n), and NVARCHAR(n)VarChar, NVarChar ,LongVarChar, LongNVarChar, VarChar16,000Variable length character string to a maximum length of n. If you do not specify n, the default is an unsized NVARCHAR value.

Related References

Data types and aliases

PureData System for Analytics ,PureData System for Analytics 7.2.1, IBM Netezza stored procedures, NZPLSQL statements and grammar, Variables and constants, Data types and aliases

Data type conversions from DataStage to Netezza

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from DataStage to Netezza

Netezza Connector Stage, Table name required warning for User-defined SQL Write mode

Recently, while working at a customer site and I encountered an anomaly in the Netezza Connector stage, when choosing ‘User-defined SQL’ write mode, the ‘Table name’ displays a caution/warning even though a table name should not be required.  If you are using a user-defined SQL statement and/or have parametrized your SQL scripts to make the job reusable, each SQL and/or SQL script would have its own schema and table name being passed in.  After some investigation, a workaround was found, which both allows you to populate table name and leverage with different schema and table names within your SQL statement and/or.

Table Name, User-defined SQL, Warning

You will notice, in a screenshot below the ‘User-defined SQL’, ‘write mode’, the property has been chosen, a parameter has been placed in the ‘User-defined SQL’ property, and ‘Read user-defined SQL from a file’ property has been set to ‘Yes’.  However, the yellow triangle displays on the ‘Table name’ property marking it as a required item.  This, also, occurs when placing SQL statements in the User-defined SQL property, whether reading from a file or not.

Table Name, User-defined SQL, Warning Workaround

After some experimentation, the workaround is straight forward enough.  Basically, give the ‘table name’ property something to read successfully, so it can move on to the user-defined SQL and/or user-defined SQL file script, which the process actually needs to execute. In the screenshot below, the SYSTEM.DEFINITION_SCHEMA._V_DUAL view was used, so, it could be found, then the script file passed by the parameter runs fine.  Another view or table, which the DataStage user has access to, should just as well.

Related References