Infosphere Datastage – Standard Practice- Sequence Naming Conventions

Standards
Standards

Standard practices help you and other understand your work.  This can be very important when working on large teams, working across team boundaries, or when large complex sets of process and objects may be involved.  When you consider the importance of naming convention, when coupled with standard practice, the benefit should be obvious, but often practice doesn’t execute or document their conventions.  So, these standard naming conventions may help when none exist or you need to assemble your own naming conventions.

<<SomeIdentifier >> = should be replaced with appropriate information

  1. Sequence Object Naming Conventions
EntityConvention
Master Control Sequence (parent)Master_<<Application>>_ <<Application Job Stream Name>>_Seq
Sequence<<Application>>_<<job_Name>>_Seq
  1. Sequence Stage Naming Conventions
EntityConvention
End LoopEL__<<PrimaryFunction>>
Error Handler EH_<<PrimaryFunction>>
Execute CommandEC_<<PrimaryFunction>>
Job Activity Job_<<PrimaryFunction>>
Nested Condition NC_<<PrimaryFunction>>
Notify Notify_<<PrimaryFunction>>
Routine Activity Rtn_<<PrimaryFunction>>
Sequence Links (messages) Msg_<<Number or Description>>
Sequence Links (Non-messages) Lnk_<<Number or Description>>
SequencerSeqr_<<Number or Description>>
Sequencer (All) SeqAll_<<Identifier>>
Sequencer (Any) SeqAny_<<Identifier>>
Start LoopSL__<<PrimaryFunction>>
Terminator ActivityTA__<<PrimaryFunction>>
User VariablesUV__<<PrimaryPurpose>>
Wait For File WFF__<<PrimaryFunction>>

Related References

Infosphere Information Server (IIS) Commonly Used Parameters

Parameters are a very big key in, Infosphere Information Server (IIS), to process flexibility in sequences, DataStage jobs and DataQuality jobs.  Parameterization also help reduce development effort, reduce the number of jobs required, and reuse of jobs, by allowing construction of multi instance jobs, which are essentially reused code.

However, sometimes when starting a project, parameters need to created and doing so from memory, doesn’t always achieve the best results. So here is a quick starter list of parameters, which seem to be commonly encountered.  Hopefully, this list will aide in your parametrization efforts and setup.

TypePromptDescriptionExample
StringDS_DIRDataset Directory Path 
StringDS_LOG_DIRLog File Directory Path 
StringQUOTESQuotes
StringRECIPIENT_EMAILRecipient Email Address 
StringSENDER_EMAILSenders Email Address 
StringSMTP_SERVERSMTP Mail Server Name 
StringSQL_DIRSQL File Directory path 
StringDATE_OFFSETDate Offset Number1
StringDS_ENVIRONMENTDatastage EnvironmentPROD
StringSRC_DIRSource Files Directory 
StringSRC_KEY_GEN_DIRSource Key Generator Files Directory Path 
StringSRC_REJ_DIRSource Reject Files Directory Path 
StringWRK_DIRWorking Directory Path 
StringSRC_TABLESource Table Name 
StringDB_SCHEMADatabase Schema Name 
StringTGT_TBLTarget Table Name 
StringPROC_DTERun Control or processing date 
StringCURR_DTECurrent Date 

Related References

IBM InfoSphere DataStage – Parallel Environment Variables

Most of this list of Parallel Environment Variables can be found in the IBM InfoSphere DataStage, Version 11.5 documentation.  However, I have started to find variables,  which I use and are not included in the IBM list.  So, for simplicity, I will make additions and clarifications to the IBM list, as I run across them, on this page.


Performance Tuning

These environment variable are frequently used in tuning Datastage performance.

  • APT_BUFFER_FREE_RUN (See, also, Buffering)
  • APT_BUFFER_MAXIMUM_MEMORY (See, also,  Buffering)
  • APT_COMPRESS_BOUNDED_FIELDS
  • APT_FILE_IMPORT_BUFFER_SIZE  (See, also, Reading and Writing Files)
  • APT_FILE_EXPORT_BUFFER_SIZE (See, also, Reading and Writing Files)
    TMPDIR (This variable also specifies the directory for Netezza log files on all operating systems.  Setting TMDIR paths is a Netezza best practice.)


Buffering

These environment variable are all concerned with the buffering InfoSphere DataStage performs on stage links to avoid deadlock situations

  • APT_BUFFER_FREE_RUN
  • APT_BUFFER_MAXIMUM_MEMORY
  • APT_BUFFER_MAXIMUM_TIMEOUT
  • APT_BUFFER_DISK_WRITE_INCREMENT
  • APT_BUFFERING_POLICY
  • APT_DISABLE_ROOT_FORKJOIN


Building Custom Stages

These environment variables are concerned with the building of custom operators that form the basis of customized stages.

  • APT_BUILDOP_SOURCE_CHARSET
  • APT_SINGLE_CHAR_CASE
  • DS_OPERATOR_BUILDOP_DIR
  • DS_OPERATOR_WRAPPED_DIR
  • OSH_BUILDOP_CODE
  • OSH_BUILDOP_HEADER
  • OSH_BUILDOP_NO_OPTIMIZE
  • OSH_BUILDOP_OBJECT
  • OSH_BUILDOP_WRAPPER
  • OSH_BUILDOP_XLC_BIN
  • OSH_CBUILDOP_XLC_BIN
  • OSH_STDOUT_MSG


Compiler

These environment variables specify details about the C++ compiler used by InfoSphere DataStage in connection with parallel jobs.

  • APT_COMPILER
  • APT_COMPILEOPT
  • APT_LINKER
  • APT_LINKOPT

DB2 Support

These environment variables are concerned with setting up access to DB2 databases from InfoSphere DataStage.

  • APT_DB2INSTANCE_HOME
  • APT_DB2READ_LOCK_TABLE
  • APT_DBNAME
  • APT_DEBUG_DB2
  • APT_RDBMS_COMMIT_ROWS
  • APT_TIME_ALLOW_24
  • DB2DBDFT

Debugging

These environment variables are concerned with the debugging of InfoSphere DataStage parallel jobs.

  • APT_DEBUG_OPERATOR
  • APT_DEBUG_MODULE_NAMES
  • APT_DEBUG_PARTITION
  • APT_DEBUG_SIGNALS
  • APT_DEBUG_STEP
  • APT_DEBUG_SUBPROC
  • APT_EXECUTION_MODE
  • APT_NO_PM_SIGNAL_HANDLERS
  • APT_PM_DBX
  • APT_PM_SHOW_PIDS
  • APT_PM_XTERM
  • APT_PXDEBUGGER_FORCE_SEQUENTIAL
  • APT_SHOW_LIBLOAD
  • DS_OSH_WRAPPER_DEBUG_CONNECT
  • DS_OSH_WRAPPER_TIMEOUT
  • DS_PXDEBUG

Decimal Support

These environment variables are concerned with support for decimal columns.

  • APT_DECIMAL_INTERM_PRECISION
  • APT_DECIMAL_INTERM_SCALE
  • APT_DECIMAL_INTERM_ROUNDMODE
  • DS_USECOLPREC_FOR_DECIMAL_KEY

Disk I/O

These environment variables are all concerned with when and how InfoSphere DataStage parallel jobs write information to disk.

  • APT_BUFFER_DISK_WRITE_INCREMENT
  • APT_EXPORT_FLUSH_COUNT
  • APT_IO_MAP/APT_IO_NOMAP and APT_BUFFERIO_MAP/APT_BUFFERIO_NOMAP
  • APT_PHYSICAL_DATASET_BLOCK_SIZE

General Job Administration

These environment variables are concerned with details about the running of InfoSphere DataStage and IBM InfoSphere QualityStage® parallel jobs

  • APT_CLOBBER_OUTPUT
  • APT_CONFIG_FILE
  • APT_DISABLE_COMBINATION
  • APT_DONT_COMPRESS_BOUNDED_FIELDS
  • APT_EXECUTION_MODE
  • APT_FILE_EXPORT_ADD_BOM
  • APT_IMPORT_FORCE_QUOTE_DELIM
  • APT_ORCHHOME
  • APT_STARTUP_SCRIPT
  • APT_NO_STARTUP_SCRIPT
  • APT_STARTUP_STATUS
  • DSForceTerminate
  • DSIPC_OPEN_TIMEOUT
  • DSJOB_DOMAIN
  • DSWaitShutdown
  • DS_FORCE_ABORT_AT_WARN_LIMIT
  • DS_LOG_AUTOPURGE_IGNORE_STATUS

Job Monitoring

These environment variables are concerned with the Job Monitor on InfoSphere DataStage.

  • APT_MONITOR_SIZE
  • APT_MONITOR_MINTIME
  • APT_MONITOR_TIME
  • APT_NO_JOBMON
  • APT_PERFORMANCE_DATA

Lookup support

This environment variable is concerned with lookup tables.

  • APT_LUTCREATE_NO_MMAP

Miscellaneous

These environment variables do not fit into the other categories.

  • APT_AGGREGATOR_NULLABLE_OUTPUT
  • APT_COPY_TRANSFORM_OPERATOR
  • APT_DATASET_FLUSH_NOFSYNC
  • APT_DATASET_FLUSH_NOSYNC
  • APT_DATE_CENTURY_BREAK_YEAR
  • APT_DATE_ADD_ROLLOVER
  • APT_DISABLE_FASTALLOC
  • APT_DONT_OPTIMIZE_MODIFY
  • APT_EBCDIC_VERSION
  • APT_FILE_EXPORT_DEFAULTS_TO_CONDUCTOR
  • APT_FIFO_DIRECTORY
  • APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL
  • APT_IMPORT_REJECT_INVALID_CHARS
  • APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS
  • APT_INSERT_COPY_BEFORE_MODIFY
  • APT_ISVALID_BACKCOMPAT
  • APT_OLD_BOUNDED_LENGTH
  • APT_OLD_CUTOFF
  • APT_OUTPUT_LOCALE
  • APT_OPERATOR_REGISTRY_PATH
  • APT_OSL_PARAM_ESC_SQUOTE
  • APT_OSL_RESTORE_BACKSLASH
  • APT_PLAYERS_REPORT_IN
  • APT_PM_ACCEPT_CONNECTION_RETRIES
  • APT_PM_ACCEPT_CONNECTION_TIMEOUT
  • APT_PM_NO_SHARED_MEMORY
  • APT_PM_NO_NAMED_PIPES
  • APT_PM_SCORE_DIR
  • APT_PM_STARTUP_CONCURRENCY
  • APT_RECORD_COUNTS
  • APT_RESPATH
  • APT_SHOW_COMPONENT_CALLS
  • APT_STACK_TRACE
  • APT_SURRKEY_BLOCK_WRITE
  • APT_SURRKEY_LOCKSTATE_RETRIES
  • APT_THREAD_SAFE_FAST_ALLOC
  • APT_TRANSFORM_ABORT_ON_CONVERSION_ERROR
  • APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING
  • APT_TRANSFORM_LOOP_WARNING_THRESHOLD
  • APT_TRANSFORM_OPERATOR_DEBUG
  • APT_USE_CRLF
  • APT_VIEWDATA_TEMP_DIR
  • DSAttLockWait
  • DSE_SLAVE_CLOSE_SOCKET_ON_EXEC
  • DSJobStartedMax
  • DSODBC_EXECUTE_MULTI_STMTS_AS_ONE
  • DSODBC_FATAL_ERRORS
  • DSODBC_NEW_FVMARKERS
  • DSODBC_NO_BIGINT_WARNINGS
  • DSODBC_NO_DB2_DELETE_WARNINGS
  • DSODBC_NO_METADATA_WARNINGS
  • DSR_RECORD_DELETE_BEFORE_WRITERAW
  • DSSocketNotifyTimeout
  • DSWaitResetStartup
  • DSWaitStartup
  • DS_API_DEBUG
  • DS_CHANGE_SEQ_PUT_BEHAVIOR
  • DS_EXECUTE_NO_MASKING
  • DS_HOSTNAME_ALIAS
  • DS_IPCPUT_OLD_TIMEOUT_BEHAVIOR
  • DS_LEGACY_AUDIT
  • DS_LOGDETAIL_INVOCATION_OLDFORM
  • DS_LONG_JOBSTATUS_LOCK
  • DS_MAX_PREC_FOR_FLOATS
  • DS_MMAPPATH
  • DS_MMAPSIZE
  • DS_NO_INSTANCE_PURGING
  • DS_OPTIMIZE_FILE_BROWSE
  • DS_SEQ_BYPASS_CHOWN
  • DS_STATUSSTARTED_CHECK
  • DS_TRX_ALLOW_LINKVARIABLES
  • OSH_PRELOAD_LIBS
  • PX_DBCONNECTHOME
  • DS_USE_OLD_STATUS_PURGE
  • DS_USE_OSHSCRIPT
  • DS_USE_SERVER_AUTH_ONLY
  • ODBCBindingOrder
  • ODBCstripCRLF
  • OSHMON_INIT_RETRY
  • OSHMON_TRACE

Network

These environment variables are concerned with the operation of InfoSphere DataStage parallel jobs over a network.

  • APT_DEBUG_ENVIRONMENT
  • APT_PM_ENV_DEBUG
  • APT_PM_ENV_NOCLOBBER
  • APT_IO_MAXIMUM_OUTSTANDING
  • APT_IOMGR_CONNECT_ATTEMPTS
  • APT_PLAYER_CONNECTION_PORT
  • APT_PM_CONDUCTOR_TIMEOUT
  • APT_PM_CONDUCTOR_HOSTNAME
  • APT_PM_CONNECT_USING_NAME
  • APT_PM_NO_TCPIP
  • APT_PM_NODE_TIMEOUT
  • APT_PM_PLAYER_TIMEOUT
  • APT_PM_SHOWRSH
  • APT_PM_STARTUP_PORT
  • APT_RECVBUFSIZE
  • APT_USE_IPV4

National Language Support (NLS)

These environment variables are concerned with InfoSphere DataStage’s implementation of NLS.

  • APT_COLLATION_SEQUENCE
  • APT_COLLATION_STRENGTH
  • APT_ENGLISH_MESSAGES
  • APT_IMPEXP_CHARSET
  • APT_INPUT_CHARSET
  • APT_OS_CHARSET
  • APT_OUTPUT_CHARSET
  • APT_STRING_CHARSET

Oracle Support

These environment variables are concerned with the interaction between InfoSphere DataStage and Oracle databases.

  • APT_ORACLE_LOAD_OPTIONS
  • APT_ORACLE_NO_OPS
  • APT_ORACLE_PRESERVE_BLANKS
  • APT_ORA_IGNORE_CONFIG_FILE_PARALLELISM
  • APT_ORA_WRITE_FILES
  • APT_ORAUPSERT_COMMIT_ROW_INTERVAL
  • APT_ORAUPSERT_COMMIT_TIME_INTERVAL
  • ODBCKeepSemicolon

Partitioning

The following environment variables are concerned with how InfoSphere DataStage automatically partitions data.

  • APT_NO_PART_INSERTION
  • APT_NO_PARTSORT_OPTIMIZATION
  • APT_PARTITION_COUNT
  • APT_PARTITION_NUMBER

Reading and Writing Files

These environment variables are concerned with reading and writing files.

  • APT_DELIMITED_READ_SIZE
  • APT_FILE_IMPORT_BUFFER_SIZE
  • APT_FILE_EXPORT_BUFFER_SIZE
  • APT_IMPORT_FILE_PATTERN_CMD
  • APT_IMPORT_HANDLE_SHORT
  • APT_IMPORT_PATTERN_USES_CAT
  • APT_IMPORT_PATTERN_USES_FILESET_MOUNTED
  • APT_MAX_DELIMITED_READ_SIZE
  • APT_STRING_PADCHAR

Reporting

These environment variables are concerned with various aspects of InfoSphere DataStage jobs reporting their progress.

  • APT_DUMP_SCORE
  • APT_ERROR_CONFIGURATION
  • APT_MSG_FILELINE
  • APT_PM_PLAYER_MEMORY
  • APT_PM_PLAYER_TIMING
  • APT_RECORD_COUNTS
  • OSH_DUMP
  • OSH_ECHO
  • OSH_EXPLAIN
  • OSH_PRINT_SCHEMAS

SAS Support

These environment variables are concerned with InfoSphere DataStage interaction with SAS.

  • APT_HASH_TO_SASHASH
  • APT_NO_SASOUT_INSERT
  • APT_NO_SAS_TRANSFORMS
  • APT_SAS_ACCEPT_ERROR
  • APT_SAS_CHARSET
  • APT_SAS_CHARSET_ABORT
  • APT_SAS_COMMAND
  • APT_SASINT_COMMAND
  • APT_SAS_DEBUG
  • APT_SAS_DEBUG_IO
  • APT_SAS_DEBUG_LEVEL
  • APT_SAS_DEBUG_VERBOSE
  • APT_SAS_S_ARGUMENT
  • APT_SAS_NO_PSDS_USTRING
  • APT_SAS_SCHEMASOURCE_DUMP
  • APT_SAS_SHOW_INFO
  • APT_SAS_TRUNCATION

Sorting

The following environment variables are concerned with how InfoSphere DataStage automatically sorts data.

  • APT_NO_SORT_INSERTION
  • APT_SORT_INSERTION_CHECK_ONLY
  • APT_TSORT_NO_OPTIMIZE_BOUNDED
  • APT_TSORT_STRESS_BLOCKSIZE

Teradata Support

The following environment variables are concerned with InfoSphere DataStage interaction with Teradata databases.

  • APT_TERA_64K_BUFFERS
  • APT_TERA_NO_ERR_CLEANUP
  • APT_TERA_NO_PERM_CHECKS
  • APT_TERA_NO_SQL_CONVERSION
  • APT_TERA_SYNC_DATABASE
  • APT_TERA_SYNC_PASSWORD
  • APT_TERA_SYNC_USER

Transport Blocks

The following environment variables are all concerned with the block size used for the internal transfer of data as jobs run. Some of the settings only apply to fixed length records.

  • APT_AUTO_TRANSPORT_BLOCK_SIZE
  • APT_LATENCY_COEFFICIENT
  • APT_DEFAULT_TRANSPORT_BLOCK_SIZE
  • APT_MAX_TRANSPORT_BLOCK_SIZE
  • APT_MIN_TRANSPORT_BLOCK_SIZE

Related References

TMPDIR

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Environment variables: Netezza connector, TMPDIR

Environment Variables

InfoSphere Information Server, InfoSphere Information Server 11.5.0, InfoSphere DataStage and QualityStage, Reference, Parallel Job Reference, Environment Variables

Environment Variables For The Parallel Engine

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Installing, Installing IBM InfoSphere Information Server software, Configuring software, Configuring a parallel processing environment, Setting environment variables for the parallel engine, Environment variables for the parallel engine