A collection of information technology and consulting knowledge
Category: DataBase Management Systems (DBMS)
A database management system (DBMS) is a software package designed to define, manipulate, retrieve and manage data in a database. A DBMS generally manipulates the data itself, the data format, field names, record structure and file structure. It also defines rules to validate and manipulate this data. A DBMS relieves users of framing programs for data maintenance. Fourth-generation query languages, such as SQL, are used along with the DBMS package to interact with a database. Some other DBMS examples include: MySQL, SQL Server, Oracle, dBASE, FoxPro
Developed by Apache Software Foundation, Apache Derby DB is a completely free, open-source relational database system developed purely with Java. It has multiple advantages that make it a popular choice for Java applications requiring small to medium-sized databases.
With over 15 years in development, Derby DB had time to grow, add new and improve on the existing components. Even though it has an extremely small footprint – only 3.5MB of all JAR files – Derby is a full-featured ANSI SQL database, supporting all the latest SQL standards, transactions, and security factors.
The small footprint adds to its versatility and portability – Derby can easily be embedded into Java applications with almost no performance impact. It’s extremely easy to install and configure, requiring almost no administration afterward. Once implemented, there is no need to further modify or set up the database at the end user’s computer. Alongside the embedded framework, Derby can also be used in a more familiar server mode.
All documentation containing different manuals for specific versions of Derby can be found on their official website, at :
Java is compatible with almost all the different platforms, including Windows, Linux, and MacOS. Since Derby DB is implemented completely in Java, it can be easily transferred without the need for different distribution downloads. It can use all types of Java Virtual Machines as long as they’re properly certified. Apache’s Derby includes the Derby code without any modification to the elemental source code.
Derby supports transactions, which are executed for quick and secure data retrieval from the database as well as referential integrity. Even though the stored procedures are made in Java, in the client/server mode Derby can bind to PHP, Python and Perl programming languages.
All data is encrypted, with support for database triggers to maintain the integrity of the information. Alongside that, custom made functions can be created with any Java library so the users can manipulate the data however they want.
Embedded and Server
Derby’s embedded mode is usually recommended as a beginner-friendly option. The main differences are in who manages the database along with how it’s stored.
When Derby is integrated as a whole and becomes a part of the main program, it acts as a persistent data store and the database is managed through the application. It also runs within the Java Virtual Machine of the application. In this mode, no other user is able to access the database – only the app that it is integrated into. As a result of these limits, the embedded mode is most useful for single-user apps.
If it’s run in server mode, the user starts a Derby network server which is tasked with responding to database requests. Derby runs in a Java Virtual Machine that hosts the server. The database is loaded onto the server, waiting for client applications to connect to it. This is the most typical architecture used by most of the other bigger databases, such as MySQL. Server mode is highly beneficial when more than one user needs to have access to the database across the network.
Derby has to be downloaded and extracted from the .zip package before being used. Downloads can be found at the Apache’s official website:
Interacting with Derby is done through the use of ‘ij’ tool, which is an interactive JDBC scripting program. It can be used for running interactive queries and scripts against a Derby database. The ij tool is run through the command shell.
The initial Derby connection command differs depending on whether it’s going to be run in embedded or server mode.
Today, newfound efficiencies and innovation are key to any business success – small, medium or large. In the rapidly evolving field of data analytics, innovative approaches to handling data are particularly important since data is the most valuable resource any business can have. IBM common SQL Engine is delivering application and query compatibility that is allowing companies to turn their data into actionable insights. This is allowing businesses to unleash the power of their databases without constraints.
But, is this really important?
Yes. Many businesses have accumulated tons of data over the years. This data resides in higher volumes, more locations throughout an enterprise – on-premise and on-cloud –, and in greater variety. Typically, this data should be a huge advantage, providing enterprises with actionable insights. But, often, this doesn’t happen.
IBM Hybrid Data Management.
With such a massive barrel of complex legacy data, many organizations find it confusing to decide what to do with it. Or where to start. The process of migrating all that data into new systems is simply a non-starter. As a solution, enterprises are turning to IBM Db2 – a hybrid, intuitive data approach that marries data and analytics seamlessly. IBM Db2 hybrid data management allows flexible cloud and on-premises deployment of data.
However, such levels of flexibility typically require organizations to rewrite or restructure their queries, and applications that will use the diverse, ever-changing data. These changes may even require you to license new software. This is costly and unfeasible. To bridge this gap, the Common SQL Engine (CSE) comes into play.
How IBM Common SQL Engine is Positioning Db2 for the Future?
The IBM Common SQL Engine inserts a single layer of data abstraction at the very data source. This means that, instead of migrating the data all at once, you can now apply data analytics wherever the data resides – whether on private, public or hybrid cloud – by using the Common SQL Engine as a bridge.
The IBM’s Common SQL Engine provides portability and consistency of SQL commands, meaning that the SQL is functionally portable across multiple implementations. It allows seamless movement of workloads to the cloud and allows for multiplatform integration and configurations regardless of their programming language.
Ideally, the Common SQL Engine is supposed to be the heart of the query and the foundation of application compatibility. But it does so much more!
Its compatibility extends beyond data analytic applications to include security, management, governance, data management, and other functionalities as well.
How does this improve the quality, flexibility, and portability of Db2?
By allowing for integration across multiple platforms, workloads and programming languages, the Common SQL Engine, ultimately, leads to a “data without limits” environment for Db2 hybrid data management family through:
Query and application compatibility
The Common SQL engine (CSE) ensures that users can write a query, and be confident that it will work across the Db2 hybrid data management family of offerings. With the CSE, you can change your data infrastructure and location – on-cloud or on-premises – without having to worry about license costs and application compatibility.
Data virtualization and Integration
The common SQL engine has a built-in data virtualization service that ensures that you can access your data from all your sources. These services position Db2 family of offerings including, IBM Db2 warehouse, IBM Db2, IBM Db2 BigSQL amongst others.
This services also applies to IBM Integrated Analytics System, Teradata, Oracle, Puredata and Microsoft SQL server. Besides, you can work seamlessly with open-source solutions such as HIVE; and cloud sources such as Amazon Redshift. Such levels of integration are unprecedented!
By allowing users to effectively pull data from Db2 data stores and integrate it with data from non-IBM stores using a single query, the common SQL engine places Db2 at an authoritative position as compared to other data stores.
Licensing is one of the hardest nuts to crack, especially for smart organizations who rely on technologies such as the cloud to deliver their services. While application compatibility and data integration will save you time, flexible licensing saves you money, on the spot.
IBM’s common SQL engine allows flexible licensing, meaning that you can purchase one license model and deploy it whenever needed, or as your data architecture evolves. Using IBM’s FlexPoint licensing, you can purchase FlexPoints and use them across all Db2 data management offerings. This is a convenience in one place.
The flexible licensing will not only simplify the adoption and exchange of platform capabilities, but it also positions your business strategically by making it more agile. Your data managers will be able to access the tools needed on the fly, without going through a lethargic and tedious procurement process.
IBM Db2 Data Management Family Is Supported by Common SQL Engine (CSE) .
IBM Db2 is a family of custom, deployable database that allows enterprises to leverage existing investments. IBM Db2 allows businesses to use any type of data from an either structured or unstructured database (or data warehouse). It provides the right data foundation/environment with industry-leading data compression, on-premise and cloud deployment options, modern data security, robust performance for mixed loads and the ability to adjust and scale without redesigning.
The IBM Db2 family enable businesses to adapt, scale quickly and remain competitive without compromising security, risk levels or privacy. It features:
Deployment and flexibility: On-premises, scale-on demand, and private or cloud deployments• Compression and performance
Embedded IoT technology is allowing businesses to act fast on the fly.
Some of these Db2 family offerings that are supported by the common SQL engine include:
Db2 Big SQL
Db2 on Cloud
Db2 Warehouse on Cloud
IBM Integrated Analytics System (IIAS)
Db2 Family Offerings and Beyond
Since the common SQL engine mainly focuses on data federation and propensity, other non-IBM databases can as well plug into the engine for SQL processing. These other 3rd party offerings include:
Watson Data Platform
Microsoft SQL Server
IBM Common SQL engine is allowing organizations to fully use data analytics to future-proof their business, and as well remain agile and competitive. In fact, besides the benefits of having robust tools woven into CSE, this SQL engine offers superior analytics and machine-learning positioning. Data processing can now happen at the speed of light –- 2X to 5X faster. The IBM Common SQL engine adds important capabilities to Db2, including freedom of location, freedom of use, and freedom of assembly.
Data analytics has changed where data is no longer manageable in relational databases only. Data is flowing from various sources which are not of the same format. This means it is not possible to store all data in the same repository. Some are best suited for storing in relational databases, others for Apache Hadoop while others are best suited for NoSQL databases.
During data analyzing, so much time is taken in trying to bring the distributed data together instead of obtaining insights. Db2 Federation has come to the rescue of data analysts. Federation concept in db2 eliminates the need for storing data in different repositories and reduces the hassle of getting insights.
What is DB2 Federation?
DB2 federation is a data integration technology that permits remote database objects to be accessed as local DB2 database objects. This technology connects multiple databases and makes them appear like one database.
How does DB2 federation work?
Federation allows you to access all of your data that is on multiple distributed databases using a single query. When implemented in an organization, this technology can be used to access data that is on any of the organization’s Db2, whether local or in the cloud.
Why use DB2 federation?
So, why should you use the federation? This concept brings data of all formats into one virtual source. With data being retrieved from one virtual source, analyzing it becomes cost-effective and efficient.
What are its primary use cases for DB2 federation?
Merging of various sources of data
DB2 federation facilitates consolidating of data from sources data local and cloud to form one virtual data source. This eliminates the process of migrating data which can be expensive and troublesome.
Increase the capacity of a repository beyond the fixed limits
Physical storage capacity is bound to have a limit which is one reason you may find an organization has distributed its data in various repositories. With federation, the storage is virtual and therefore doesn’t have any limit. This technology can greatly help you if your physical dataset is running low on space.
Linking up to Db2 Warehouse on Cloud
People who use Db2 products can federate data from Db2 on Cloud and Db2 Warehouse on the Cloud. This will give them a joint interface where they can access, add, query, and analyze data without encountering the complex ETL processes. Better still, no additional code will be required to execute all these processes. This makes it easy for people with the low technical know-how to use these products smoothly.
Split data across different servers
At times, you might choose to partition your data. With federation integration technology, partitioned data can be queried with a unified interface. Federation allows you to better balance your workloads, scale precise parts of an app, and create micro-services that work harmoniously.
Generally, db2 federation makes it access data by bringing it together into a single virtual source. This brings about cost and time-saving benefits. When you want to analyze data, you can get insights immediately instead of spending a lot of time querying through repositories.
Over the years have occasionally use the action column feature, however, the last month or so I have found myself using it quite a lot. This is especially true in relation to the tea set and not just in relation to the change capture stage.
The first thing you need to know is, if you want to prevent getting the ‘no action column found’ notice on the target stage, need to ensure that the action column has been coded to be a single character field char (1). Otherwise, the Netezza connector stage will not recognize your field as an action column.
While most developers will commonly work with the action column feature in relation to the change capture stage, it can also be very useful if you have created a field from one or more inputs to tell you what behavior the row requires. I have found that this approach can be very useful and efficient under the right circumstances.
Action column configuration example
Change Code Values Mapping To Action Column
Here’s a quick reference table to provide the interpretation of the change type code to the actual one character action column value to which it will need to be interpreted.
Change Code Type
Change Type Code
Action Column Value
Copy (Data Without Changes)
value for this Change Type
Example Transformer Stage, Derivation
Here is a quick transformer stage derivation coding example to take advantage of the action call capabilities. If you haven’t already handled the removal of the copy rows, you may also want to add a constraint.
The combination I most frequently find myself using is the insert and update combination.
if Lnk_Out_To_Tfm.change_code=1 then ‘I’
Else if Lnk_Out_To_Tfm.change_code=2 then ‘D’
Else if Lnk_Out_To_Tfm.change_code=3 then ‘U’
Home > InfoSphere Information Server 11.7.0 > InfoSphere DataStage and QualityStage > Developing parallel jobs > Introduction to InfoSphere DataStage Balanced Optimization > Job design considerations > Specific considerations for the Netezza connector
I have found myself using this simple, but useful SQL time in recent weeks to research different issues and to help with impact analysis. So, I thought I would post it while I’m thinking about it. It just gives a list of views using a table, which can be handy to know. This SQL is simple and could be converted to an equi-join. I used the like statement mostly because I sometimes want to know if there are other views a similar nature in the same family (by naming convention) of tables.
Select All Fields From The _V_View
This is the simplest form of this SQL to views, which a table.
Select * from _v_view
where DEFINITION like ‘%<<TABLE_NAME>>%’ ;
Select Minimal Fields From The _V_View
This is the version of the SQL, which I normally use, to list the views, which use a table.
Globally, organizations are facing challenges emanating from data issues, including data consolidation, value, heterogeneity, and quality. At the same time, they have to deal with the aspect of Big Data. In other words, consolidating, organizing, and realizing the value of data in an organization has been a challenge over the years. To overcome these challenges, a series of strategies have been devised. For instance, organizations are actively leveraging on methods such as Data Warehouses, Data Marts, and Data Stores to meet their data assets requirements. Unfortunately, the time and resources required to deliver value using these legacy methods is a distressing issue. In most cases, typical Data Warehouses applied for business intelligence (BI) rely on batch processing to consolidate and present data assets. This traditional approach is affected by the latency of information.
As the name suggests, Big Data describes a large volume of data that can either be structured or unstructured. It originates from business processes among other sources. Presently, artificial intelligence, mobile technology, social media, and the Internet of Things (IoT) have become new sources of vast amounts of data. In Big Data, the organization and consolidation matter more than the volume of the data. Ultimately, big data can be analyzed to generate insights that can be crucial in strategic decision making for a business.
Features of Big Data
The term Big Data is relatively new. However, the process of collecting and preserving vast amounts of information for different purposes has been there for decades. Big Data gained momentum recently with the three V’s features that include volume, velocity, and variety.
Volume: First, businesses gather information from a set of sources, such as social media, day-to-day operations, machine to machine data, weblogs, sensors, and so on. Traditionally, storing the data was a challenge. However, the requirement has been made possible by new technologies such as Hadoop.
Velocity: Another defining nature of Big Data is that it flows at an unprecedented rate that requires real-time processing. Organizations are gathering information from RFID tags, sensors, and other objects that need timely processing of data torrents.
Variety: In modern enterprises, information comes in different formats. For instance, a firm can gather numeric and structured data from traditional databases as well as unstructured emails, video, audio, business transactions, and texts.
Complexity: As mentioned above, Big Data comes from diverse sources and in varying formats. In effect, it becomes a challenge to consolidate, match, link, cleanse, or modify this data across an organizational system. Unfortunately, Big Data opportunities can only be explored when an organization successfully correlates relationships and connects multiple data sets to prevent it from spiraling out of control.
Variability: Big Data can have inconsistent flows within periodic peaks. For instance, in social media, a topic can be trending, which can tremendously increase collected data. Variability is also common while dealing with unstructured data.
Big Data Potential and Importance
The vast amount of data collected and preserved on a global scale will keep growing. This fact implies that there is more potential to generate crucial insights from this information. Unfortunately, due to various issues, only a small fraction of this data actually gets analyzed. There is a significant and untapped potential that businesses can explore to make proper and beneficial use of this information.
Analyzing Big Data allows businesses to make timely and effective decisions using raw data. In reality, organizations can gather data from diverse sources and process it to develop insights that can aid in reducing operational costs, production time, innovating new products, and making smarter decisions. Such benefits can be achieved when enterprises combine Big Data with analytic techniques, such as text analytics, predictive analytics, machine learning, natural language processing, data mining and so on.
Big Data Application Areas
Practically, Big Data can be used in nearly all industries. In the financial sector, a significant amount of data is gathered from diverse sources, which requires banks and insurance companies to innovate ways to manage Big Data. This industry aims at understanding and satisfying their customers while meeting regulatory compliance and preventing fraud. In effect, banks can exploit Big Data using advanced analytics to generate insights required to make smart decisions.
In the education sector, Big Data can be employed to make vital improvements on school systems, quality of education and curriculums. For instance, Big Data can be analyzed to assess students’ progress and to design support systems for professors and tutors.
Healthcare providers, on the other hand, collect patients’ records and design various treatment plans. In the healthcare sector, practitioners and service providers are required to offer accurate and timely treatment that is transparent to meet the stringent regulations in the industry and to enhance the quality of life. In this case, Big Data can be managed to uncover insights that can be used to improve the quality of service.
Governments and different authorities can apply analytics to Big Data to create the understanding required to manage social utilities and to develop solutions necessary to solve common problems, such as city congestion, crime, and drug use. However, governments must also consider other issues such as privacy and confidentiality while dealing with Big Data.
In manufacturing and processing, Big Data offers insights that stakeholders can use to efficiently use raw materials to output quality products. Manufacturers can perform analytics on big data to generate ideas that can be used to increase market share, enhance safety, minimize wastage, and solve other challenges faster.
In the retail sector, companies rely heavily on customer loyalty to maintain market share in a highly competitive market. In this case, managing big data can help retailers to understand the best methods to utilize in marketing their products to existing and potential consumers, and also to sustain relationships.
Challenges Handling Big Data
With the introduction of Big Data, the challenge of consolidating and creating value on data assets becomes magnified. Today, organizations are expected to handle increased data velocity, variety, and volume. It is now a business necessity to deal with traditional enterprise data and Big Data. Traditional relational databases are suitable for storing, processing, and managing low-latency data. Big Data has increased volume, variety, and velocity, making it difficult for legacy database systems to efficiently handle it.
Failing to act on this challenge implies that enterprises cannot tap the opportunities presented by data generated from diverse sources, such as machine sensors, weblogs, social media, and so on. On the contrary, organizations that will explore Big Data capabilities amidst its challenges will remain competitive. It is necessary for businesses to integrate diverse systems with Big Data platforms in a meaningful manner, as heterogeneity of data environments continue to increase.
Virtualization involves turning physical computing resources, such as databases and servers into multiple systems. The concept consists of making the function of an IT resource simulated in software, making it identical to the corresponding physical object. Virtualization technique uses abstraction to create a software application to appear and operate like hardware to provide a series of benefits ranging from flexibility, scalability, performance, and reliability.
Typically, virtualization is made possible using virtual machines (VMs) implemented in microprocessors with necessary hardware support and OS-level implementations to enhance computational productivity. VMs offers additional convenience, security, and integrity with little resource overhead.
Benefits of Virtualization
Achieving the economics of wide-scale functional virtualization using available technologies is easy to improve reliability by employing virtualization offered by cloud service providers on fully redundant and standby basis. Traditionally, organizations would deploy several services to operate at a fraction of their capacity to meet increased processing and storage demands. These requirements resulted in increased operating costs and inefficiencies. With the introduction of virtualization, the software can be used to simulate functionalities of hardware. In effect, businesses can outstandingly eliminate the possibility of system failures. At the same time, the technology significantly reduces capital expense components of IT budgets. In future, more resources will be spent on operating, than acquisition expenses. Company funds will be channeled to service providers instead of purchasing expensive equipment and hiring local personnel.
Overall, virtualization enables IT functions across business divisions and industries to be performed more efficiently, flexibly, inexpensively, and productively. The technology meaningfully eliminates expensive traditional implementations.
Apart from reducing capital and operating costs for organizations, virtualization minimizes and eliminates downtime. It also increases IT productivity, responsiveness, and agility. The technology provides faster provisioning of resources and applications. In case of incidents, virtualization allows fast disaster recovery that maintains business continuity.
Types of Virtualization
There are various types of virtualization, such as a server, network, and desktop virtualization.
In server virtualization, more than one operating system runs on a single physical server to increase IT efficiency, reduce costs, achieve timely workload deployment, improve availability and enhance performance.
Network virtualization involves reproducing a physical network to allow applications to run on a virtual system. This type of virtualization provides operational benefits and hardware independence.
In desktop virtualization, desktops and applications are virtualized and delivered to different divisions and branches in a company. Desktop virtualization supports outsourced, offshore, and mobile workers who can access simulate desktop on tablets and iPads.
Characteristics of Virtualization
Some of the features of virtualization that support the efficiency and performance of the technology include:
Partitioning: In virtualization, several applications, database systems, and operating systems are supported by a single physical system since the technology allows partitioning of limited IT resources.
Isolation: Virtual machines can be isolated from the physical systems hosting them. In effect, if a single virtual instance breaks down, the other machine, as well as the host hardware components, will not be affected.
Encapsulation: A virtual machine can be presented as a single file while abstracting other features. This makes it possible for users to identify the VM based on a role it plays.
Data Virtualization – A Solution for Big Data Challenges
Virtualization can be viewed as a strategy that helps derive information value when needed. The technology can be used to add a level of efficiency that makes big data applications a reality. To enjoy the benefits of big data, organizations need to abstract data from different reinforcements. In other words, virtualization can be deployed to provide partitioning, encapsulation, and isolation that abstracts the complexities of Big Data stores to make it easy to integrate data from multiple stores with other data from systems used in an enterprise.
Virtualization enables ease of access to Big Data. The two technologies can be combined and configured using the software. As a result, the approach makes it possible to present an extensive collection of disassociated and structured and unstructured data ranging from application and weblogs, operating system configuration, network flows, security events, to storage metrics.
Virtualization improves storage and analysis capabilities on Big Data. As mentioned earlier, the current traditional relational databases are incapable of addressing growing needs inherent to Big Data. Today, there is an increase in special purpose applications for processing varied and unstructured big data. The tools can be used to extract value from Big Data efficiently while minimizing unnecessary data replication. Virtualization tools also make it possible for enterprises to access numerous data sources by integrating them with legacy relational data centers, data warehouses, and other files that can be used in business intelligence. Ultimately, companies can deploy virtualization to achieve a reliable way to handle complexity, volume, and heterogeneity of information collected from diverse sources. The integrated solutions will also meet other business needs for near-real-time information processing and agility.
In conclusion, it is evident that the value of Big Data comes from processing information gathered from diverse sources in an enterprise. Virtualizing big data offers numerous benefits that cannot be realized while using physical infrastructure and traditional database systems. It provides simplification of Big Data infrastructure that reduces operational costs and time to results. Shortly, Big Data use cares will shift from theoretical possibilities to multiple use patterns that feature powerful analytics and affordable archival of vast datasets. Virtualization will be crucial in exploiting Big Data presented as abstracted data services.
Occasionally, one runs into the problem of hidden field values breaking join criteria. I have had to clean up bad archive and conversion data with hidden characters serval times over the last couple of weeks, so, I thought I might as well capture this note for future use.
I tried the Replace command which is prevalent for Netezza answers to this issue on the web, but my client’s version does not support that command. So, I needed to use the Translate command instead to accomplish it. It took a couple of searches of the usual bad actors to find the character causing the issue, which on this day was chr(0). Here is a quick mockup of the command I used to solve this issue.
Example Select Statement
Here is a quick example select SQL to identify problem rows.
SELECT TRANSLATE(F.BLOGTYPE_CODE, CHR(0), ”) AS BLOGTYPE_CODE, BT.BLOG_TYP_ID, LENGTH(BT.BLOG_TYP_ID) AS LNGTH_BT, LENGTH(F.BLOGTYPE_CODE) AS LNGTH_ BLOGTYPE
FROM BLOGS_TBL F, BLOG_TYPES BT WHERE TRANSLATE(F.BLOGTYPE_CODE, CHR(0), ”) = BT.BLOG_TYP_ID AND LENGTH(BT.BLOG_TYP_ID) <>Length(LENGTH(F.BLOGTYPE_CODE) ;
Example Update Statement
Here is a quick shell update statement to remove the Char(0) characters from the problem field.
Update <<Your Table Name>> A
Set A.<<Your Field Name>> = TRANSLATE(A.<<Your FieldName>>, CHR(0), ”)
where length(A.<<Your Field Name>>) <> Length(A.<<Your FieldName>>) And << Additional criteria>>;