InfoSphere DataStage – DataStage Parallel Job Peer Code Review Checklist Template

Peer code review happens during the development phase and focuses on the overall quality of code and configuration artifacts and their compliance with standards. However, the hard part of performing a peer code review isn't performing the review, but rather achieving consistency and thoroughness in the review. This is where a checklist can contribute significantly, providing a list of things to check and a relative weight for the findings. I hope this template assists with your DataStage job review process.

What is a Peer Review?

A peer review is an examination of a Software Development Life Cycle (SDLC) work product by team members other than the work product's author to identify defects, omissions, and deviations from standards. This process provides an opportunity for quality assurance, knowledge sharing, and product improvement early in the SDLC.

Migration Path Environments for Large Organizations

After describing the migration path environments for different customers, I thought it might be time to define some migration paths and their associated environments. This is a migration environment pattern I have seen in larger organizations; individual implementations vary somewhat, but they are essentially variations on a theme. The definition of each environment is given below.

Development (DEV): The Development environment is used for developing the application and the submission of baseline code to the source control system.
System Integration Test (SIT): System integration testing is a high-level software testing process in which testers verify that all related systems maintain data integrity and can operate in coordination with other systems in the same environment. The testing process ensures that all subcomponents are integrated successfully to provide the expected results.
Software Integration Test (SWIT): Software Integration Test is where software module or component subset testing occurs to verify the functionality and/or usability of a module or component and its interaction with associated software modules and components.
End-To-End (E2E) Testing: End-to-End Testing exercises a complete, production-like scenario of the software system; it also validates batch and data processing from other upstream/downstream systems (interfaces).
System Acceptance Testing (SAT): System Acceptance Testing simulates the business environment and includes security and regression testing. It is conducted to gain the user community's acceptance of all functionality and to confirm that the system meets user requirements as specified.
Production (PROD): The production environment is the final release environment, where the system will begin its Initial Operating Capability (IOC).
Control (CTRL): Control is the 'Gold' standard baseline environment from which migrations and new environments are provisioned. This environment houses base configurations and metadata. It is not used for testing.

Migration Path Environments for Small Organizations

After describing the migration path environments for different customers, I thought it might be time to define some migration paths and their associated environments. This is a migration environment pattern I have usually seen in small organizations. The definition of each environment is given below.

Development (DEV): The Development environment is used for developing the application and the submission of baseline code to the source control system.
Quality Assurance (QA): The Quality Assurance environment is used for testing of configuration, performance, application processes, and functionality validation.
Production (PROD): The production environment is the final release environment, where the system will begin its Initial Operating Capability (IOC).

Data Modeling – What is Data Modeling?

Data modeling is the documenting of data relationships, characteristics, and standards based on the data's intended use. Data modeling techniques and tools capture and translate complex system designs into easily understood representations of the data, creating a blueprint and foundation for information technology development and reengineering.

A data model can be thought of as a diagram that illustrates the relationships between data. Although capturing all the possible relationships in a data model can be very time-intensive, well-documented models allow stakeholders to identify errors and make changes before any programming code has been written.

Data modelers often use multiple models to view the same data and ensure that all processes, entities, relationships, and data flows have been identified.

There are several different approaches to data modeling, including:

Concept Data Model (CDM)

  • The Concept Data Model (CDM) identifies the high-level information entities and their relationships, which are organized in the Entity Relationship Diagram (ERD).

Logical Data Model (LDM)

  • The Logical Data Model (LDM) defines detailed business information (in business terms) within each information entity of the Concept Data Model and is a refinement of those information entities. Logical data models are a non-RDBMS-specific business definition of the tables, fields, and attributes contained within each information entity, from which the Physical Data Model (PDM) and Entity Relationship Diagram (ERD) are produced.

Physical Data Model (PDM)

  • The Physical Data Model (PDM) provides the actual technical details of the model and the database objects (e.g., table names, field names) to facilitate the creation of accurate, detailed technical designs and the actual database. Physical Data Models are an RDBMS-specific definition of the logical model, used to build the database, create deployable DDL statements, and produce the Entity Relationship Diagram (ERD); a simple DDL example is sketched below.
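
To make the distinction concrete, the following is a minimal sketch of physical-model DDL, assuming a hypothetical CUSTOMER information entity; the table name, column names, data types, and constraints are illustrative assumptions, not taken from any actual model.

    -- Hypothetical physical rendering of a logical CUSTOMER entity.
    -- Table, column, and constraint names are illustrative assumptions only.
    CREATE TABLE customer (
        customer_id        INTEGER       NOT NULL,  -- surrogate key added in the physical design
        customer_name      VARCHAR(100)  NOT NULL,  -- "Customer Name" business attribute
        customer_type_code CHAR(1)       NOT NULL,  -- coded value carried from the logical model
        effective_date     DATE          NOT NULL,
        end_date           DATE,
        CONSTRAINT pk_customer PRIMARY KEY (customer_id)
    );

The same entity in the Logical Data Model would carry only business names, definitions, and attribute domains; the data types, lengths, and constraint syntax above are what make this model physical and RDBMS-specific.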

Information Management Unit Testing

Information management projects generally have the following development work:

  • Data movement software;
  • Data conversion software;
  • Data cleansing routines;
  • Database development DDL; and
  • Business intelligence and reporting analytical solutions.

Module testing validates that each module’s logic satisfies requirements specified in the requirements specification.

Effective Practices

  1. Should focus on testing individual modules to ensure that they perform to specification, handle all exceptions as expected, and produce the appropriate alerts to satisfy error handling.
  2. Should be performed in the development environment.
  3. Should be conducted by the software developer who develops the code.
  4. Should validate the module’s logic, adherence to functional requirements and adherence to technical standards.
  5. Should ensure that all module source code has been executed and each conditional logic branch followed (see the sketch after this list).
  6. Test data and test results should be recorded and form part of the release package when the code moves to production.
  7. Should include a code review, which should:
  • Focus on reviewing code and test results to provide additional verification that the code conforms to data movement best practices and security requirements; and
  • Verify that test results confirm that all conditional logic paths were followed and that all error messages were tested properly.
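
As an illustration of practice 5, the following is a minimal sketch, assuming a hypothetical data cleansing rule written as a SQL CASE expression; the schema, table, and column names are assumptions for the example only. The test rows are chosen so that every branch of the conditional logic is exercised at least once, and the final query returns a row only where the actual result differs from the documented expected result.

    -- Hypothetical cleansing rule: standardize a gender code.
    -- dev_schema is an assumed personal (developer) schema.
    CREATE TABLE dev_schema.gender_test (
        source_value  VARCHAR(10),
        expected_code CHAR(1)
    );

    -- One test row per conditional branch, plus an invalid value.
    INSERT INTO dev_schema.gender_test VALUES ('M',      'M');  -- branch 1: male codes
    INSERT INTO dev_schema.gender_test VALUES ('Female', 'F');  -- branch 2: female codes
    INSERT INTO dev_schema.gender_test VALUES (NULL,     'U');  -- branch 3: default/unknown
    INSERT INTO dev_schema.gender_test VALUES ('xyz',    'U');  -- branch 3: invalid value

    -- Any row returned is a defect: that branch produced an unexpected result.
    SELECT source_value,
           expected_code,
           CASE
               WHEN UPPER(source_value) IN ('M', 'MALE')   THEN 'M'
               WHEN UPPER(source_value) IN ('F', 'FEMALE') THEN 'F'
               ELSE 'U'
           END AS actual_code
    FROM   dev_schema.gender_test
    WHERE  expected_code <> CASE
               WHEN UPPER(source_value) IN ('M', 'MALE')   THEN 'M'
               WHEN UPPER(source_value) IN ('F', 'FEMALE') THEN 'F'
               ELSE 'U'
           END;

An empty result set, recorded alongside the test data, is the evidence that each branch behaved as specified.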

Testing Procedures

  1. Review design specification with the designer.
  2. Prepare test plan before coding.
  3. Create test data and document expected test results (see the sketch after this list).
  4. Ensure that test data validate the module’s logic, adherence to functional requirements and adherence to technical standards.
  5. Ensure that test data test all module source code and each conditional logic branch.
  6. Conduct unit test in a personal schema.
  7. Document test results.
  8. Place test data and test results in project documentation repository.
  9. Check code into the code repository.
  10. Participate in code readiness review with Lead Developer.
  11. Schedule code review with appropriate team members.
  12. Assign code review roles as follows:
  • Author, the developer who created the code;
  • Reader, a developer who will read the code during the code review (the reader may also be the author); and
  • Scribe, a developer who will take notes.
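
Steps 3 through 8 reduce, in practice, to documenting the expected results before the test and then reconciling them against what the module actually produced. The following is a minimal sketch, assuming a hypothetical data movement module that loads dev_schema.customer_target from dev_schema.customer_source in the developer's personal schema; all schema, table, and column names are illustrative assumptions.

    -- Expected result documented before the test:
    -- every source row is loaded exactly once, with the customer name carried over unchanged.

    -- Completeness check: any row returned was not loaded to the target.
    SELECT 'missing in target' AS finding, s.customer_id
    FROM   dev_schema.customer_source s
    LEFT JOIN dev_schema.customer_target t
           ON t.customer_id = s.customer_id
    WHERE  t.customer_id IS NULL;

    -- Accuracy spot check: any row returned is a defect to investigate.
    SELECT s.customer_id,
           s.customer_name AS source_name,
           t.customer_name AS target_name
    FROM   dev_schema.customer_source s
    JOIN   dev_schema.customer_target t
           ON t.customer_id = s.customer_id
    WHERE  COALESCE(s.customer_name, '') <> COALESCE(t.customer_name, '');

The result sets (ideally empty) are what get captured as the documented test results in step 7 and placed in the project documentation repository in step 8.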

Code Review Procedures

  1. Validate that code readiness review has been completed.
  2. Read the code.
  3. Verify that code and test results conform to data movement best practices.
  4. Verify that all conditional logic paths were followed and that all error messages were tested properly.
  5. Verify that coding security vulnerability issues have been addressed.
  6. Verify that test data and test results have been placed in project documentation repository.
  7. Verify that code has been checked into the code repository.
  8. Document action items.

Testing strategies

  1. Unit test data should be created by the developer and should be low volume.
  2. All testing should occur in a developer’s personal schema.

Summary

Unit testing is generally conducted by the developer who develops the code and validates that each module’s logic satisfies requirements specified in the requirements specification.

Where do data models fit in the Software Development Life Cycle (SDLC) Process?

In the classic Software Development Life Cycle (SDLC) process, Data Models are typically initiated, by model type, at key process steps and are maintained as data model detail is added and refinement occurs.

The Concept Data Model (CDM) is usually created in the Planning phase. However, creation of the Concept Data Model can slide forward or backward somewhat within the System Concept Development, Planning, and Requirements Analysis phases, depending upon whether the application being modeled is a custom development effort or a modification of a Commercial-Off-The-Shelf (COTS) application. The CDM is maintained, as necessary, through the remainder of the SDLC process.

The Logical Data Model (LDM) is created in the Requirement Analysis phase and is a refinement of the information entities of the Concept Data Model. The LDM is maintained, as necessary, through the remainder of the SDLC process.

The Physical Data Model (PDM) is created in the Design phase to facilitate the creation of accurate detail technical designs and actual database creation. The PDM is maintained, as necessary, through the remainder of the SDLC process.
