*DataStage*DSR_SELECT (Action=3); check DataStage is set up correctly in project

Having encountered this DataStage client error in Linux a few times recently, I thought I would document the solution that has worked for me.

Error Message:

Error calling subroutine: *DataStage*DSR_SELECT (Action=3); check DataStage is set up correctly in project

(Subroutine failed to complete successfully (30107))

Probable Cause of Error

  • The NodeAgents process has stopped running
  • Insufficient /tmp disk space

Triage Approach

To fix this error in Linux:

  • Ensure sufficient disk space is available; you may want to clean up the /tmp directory of any excess, non-required files (see the example commands after this list).
  • Start NodeAgents.sh if it is not running.
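
For example, the following commands check the free space and then list and remove files in /tmp older than seven days (the seven-day threshold is only an assumption; confirm nothing listed is still needed before deleting):

df -h /tmp
find /tmp -type f -mtime +7 -ls
find /tmp -type f -mtime +7 -delete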

Command to verify Node Agent is running

ps -ef | grep java | grep Agent

Command to Start Node Agent

This example command assumes the shell script is in its default location; if not, you will need to adjust the path.

/opt/IBM/InformationServer/ASBNode/bin/NodeAgents.sh start

Node Agent Logs

These logs may be helpful:

  • asbagent_startup.err
  • asbagent_startup.out

Node Agent Logs Location

This command will get you to where the logs are normally located:

cd /opt/IBM/InformationServer/ASBNode/
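
From there, assuming the default install path above, the most recent startup messages can be reviewed with, for instance:

tail -50 asbagent_startup.err
tail -50 asbagent_startup.out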

Netezza – [SQLCODE=HY000][Native=46] ERROR: External Table : count of bad input rows reached maxerrors limit

While helping a customer, we encountered the [SQLCODE=HY000][Native=46] ERROR, which was a new one for me. So here are a few notes to help the next unlucky soul who runs into the error.

Netezza Error Reason:

  • [SQLCODE=HY008][Native=51] Operation canceled; [SQLCODE=HY000][Native=46] ERROR: External Table : count of bad input rows reached maxerrors limit

What Does the Error Mean

  • In a nutshell, it means invalid input data was submitted, the rows could not be inserted, and the number of rejected rows reached the maxerrors limit set for the load.

What To Do

  • Basically, you need to go to the Netezza logs to see why the rows were rejected, resolve the input data errors, and then resubmit your transactions. The logs are temporary and reused, so you need to get to them before they are overwritten.

Where Are The Data Logs

  • In Linux, the logs can be found in /tmp:

For nzload Method Logs

  • /tmp/<database name>.<table name>.nzlog
  • /tmp/<database name>.<table name>.nzbad

For External Table Load Logs

  • /tmp/<external table name>.log
  • /tmp/<external table name>.bad
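
For example, the following commands list the most recently written load logs and open the newest pair for review (SALESDB and CUSTOMER are hypothetical names; substitute your own database and table):

ls -lt /tmp/*.nzlog /tmp/*.nzbad | head
less /tmp/SALESDB.CUSTOMER.nzlog
less /tmp/SALESDB.CUSTOMER.nzbad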

What is the URL for InfoSphere Information Analyzer

This is one of those things that usually comes up during new installs. To access the InfoSphere Information Analyzer thin client, there are two approaches: the first is via the InfoSphere Launchpad page, and the second is to go directly to the Information Analyzer login page. Both URLs are provided below.

InfoSphere LaunchPad URL

https://<<Host_Server>>:9443/ibm/iis/launchpad/

Information Analyzer URL 

https://<<Host_Server>>:9443/ibm/iis/dq/da/login.jsp
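
One quick way to confirm the services tier is responding on the default 9443 port is to request the Launchpad page from the command line; this assumes curl is available on the client machine and that myhost.example.com is replaced with your own host name:

curl -k -I https://myhost.example.com:9443/ibm/iis/launchpad/

A 200 response, or a redirect to the login page, indicates the web tier is up.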

InfoSphere DataStage – How to Improve Sequential File Performance Using Parallel Environment Variables

While extensive use of sequential files is not best practice, sometimes there is no way around it, due to legacy systems and/or existing processes. Recently, however, I have encountered a number of customers seeing significant performance issues with sequential-file-intensive processes. Sometimes it is the job design, but often the project configuration still has the default values. This is a quick and easy thing to check and adjust for a quick performance win, if it has not already been done. These are delivered variables, but they should seriously be considered for adjustment in nearly all DataStage ETL projects. The adjustment should be based on the amount of available memory, the volume of sequential-file-intensive workload, and the environment you are working in. Some experimentation may be required, but I have provided a few recommendations below.

Environment Variable Properties

APT_FILE_EXPORT_BUFFER_SIZE
  • Category: Parallel > Operator Specific
  • Type: String
  • Prompt: Sequential write buffer size
  • Size: Adjustable in 8 KB units; recommended values for Dev: 2048; Test & Prod: 4096
  • Default Value: 128

APT_FILE_IMPORT_BUFFER_SIZE
  • Category: Parallel > Operator Specific
  • Type: String
  • Prompt: Sequential read buffer size
  • Size: Adjustable in 8 KB units; recommended values for Dev: 2048; Test & Prod: 4096
  • Default Value: 128
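
If you want to try the larger buffer sizes on a single job before changing the project-level defaults in the Administrator client, the values can also be passed at run time. This sketch assumes both variables have been added to the job as environment-variable parameters; the project and job names are hypothetical:

dsjob -run \
  -param '$APT_FILE_EXPORT_BUFFER_SIZE=2048' \
  -param '$APT_FILE_IMPORT_BUFFER_SIZE=2048' \
  MyProject MySequentialFileJob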

InfoSphere DataStage – How to calculate age in a transformer

Occasionally, there is a need to calculate the age between two dates for any number of reasons: for example, the age of a person, of an asset, or of an event. So, having recently had to think about how to do this in a DataStage Transformer, rather than in SQL, I thought it might be good to document a couple of approaches which can provide the age. This code does it at the year level; however, if you need the decimal digits or other handling, the rounding within the DecimalToDecimal function can be changed accordingly.

Age Calculation using Julian Date

DecimalToDecimal((JulianDayFromDate(<<Input Date (e.g. Date of Birth)>>) - JulianDayFromDate(Lnk_In_Tfm.PROCESSING_DT)) / 365.25, 'trunc_zero')
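
For example, if the two dates are 1980-06-15 and 2020-08-01, the difference between their Julian day numbers is 14,657 days, and 14657 / 365.25 ≈ 40.1, which 'trunc_zero' truncates to an age of 40.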

Age Calculation using Julian Date with Null Handling

If a date can be missing from your source input data, then null handling is recommended to prevent job failure. This code uses 1901-01-01 as the null replacement value, but it can be any date your business requirement stipulates.

DecimalToDecimal((JulianDayFromDate(NullToValue(<<Input Date (e.g. Date of Birth)>>, StringToDate('1901-01-01', "%yyyy-%mm-%dd"))) - JulianDayFromDate(Lnk_In_Tfm.PROCESSING_DT)) / 365.25, 'trunc_zero')

Calculate Age Using DaysSinceFromDate

DecimalToDecimal(DaysSinceFromDate(<<From Date (e.g. Processing Date)>>, <<Input Date (e.g. Date of Birth)>>) / 365.25, 'trunc_zero')

Calculate Age Using DaysSinceFromDate with Null Handling

Here is a second example of null handling being applied to the input data.

DecimalToDecimal(DaysSinceFromDate(<<From Date (e.g. Processing Date)>>, NullToValue(<<Input Date (e.g. Date of Birth)>>, StringToDate('1901-01-01', "%yyyy-%mm-%dd"))) / 365.25, 'trunc_zero')

Netezza Connector Stage, Table name required warning for User-defined SQL Write mode

Recently, while working at a customer site, I encountered an anomaly in the Netezza Connector stage: when choosing the ‘User-defined SQL’ write mode, the ‘Table name’ property displays a caution/warning even though a table name should not be required. If you are using a user-defined SQL statement and/or have parameterized your SQL scripts to make the job reusable, each SQL statement and/or SQL script would have its own schema and table name being passed in. After some investigation, a workaround was found, which allows you both to populate the table name and to leverage different schema and table names within your SQL statement and/or SQL script.

Table Name, User-defined SQL, Warning

You will notice, in the screenshot below, that the ‘User-defined SQL’ write mode has been chosen, a parameter has been placed in the ‘User-defined SQL’ property, and the ‘Read user-defined SQL from a file’ property has been set to ‘Yes’. However, the yellow warning triangle still displays on the ‘Table name’ property, marking it as a required item. This also occurs when placing SQL statements directly in the ‘User-defined SQL’ property, whether reading from a file or not.

Table Name, User-defined SQL, Warning Workaround

After some experimentation, the workaround is straightforward enough. Basically, give the ‘Table name’ property something it can read successfully, so the stage can move on to the user-defined SQL and/or user-defined SQL file script, which the process actually needs to execute. In the screenshot below, the SYSTEM.DEFINITION_SCHEMA._V_DUAL view was used, so it could be found, and then the script file passed in by the parameter runs fine. Another view or table to which the DataStage user has access should work just as well.
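
As an illustrative sketch, the relevant stage properties end up looking something like this (the #psSQLFile# parameter name is hypothetical; any job parameter pointing at your SQL script will do):

Write mode: User-defined SQL
Table name: SYSTEM.DEFINITION_SCHEMA._V_DUAL
User-defined SQL: #psSQLFile#
Read user-defined SQL from a file: Yes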
