Data-driven decision making is at the center of all things. The emergence of
data science and machine learning has further reinforced the importance of data
as the most critical commodity in today’s world. From FAAMG (the five biggest
tech companies: Facebook, Amazon, Apple, Microsoft, and Google) to governments
and non-profits, everyone is busy leveraging the power of data to achieve their
goals. Unfortunately, this growing demand for data has exposed the inability
of current systems to support ever-growing data needs. This
inefficiency is what led to the evolution of what we today know as Logical Data
Lakes.
What Is a Logical Data Lake?
In simple words, a data lake is a data repository that is capable of storing any
data in its original format. As opposed to traditional data warehouses that
use the ETL (Extract, Transform, and Load) strategy, data lakes work on the ELT
(Extract, Load, and Transform) strategy. This means data does not have to be
transformed before it is loaded, which translates into reduced
time and effort. Logical data lakes have attracted wide attention
because they do away with the need to integrate data from different data
repositories. With this open access to data, companies can begin to
draw correlations between separate data entities and use this exercise to their
advantage.
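To make the ETL/ELT distinction concrete, here is a minimal Python sketch contrasting the two orderings. The records and the `transform` rule are hypothetical. With ETL only the transformed shape is stored; with ELT the raw records are stored as-is and the same transformation is applied later, when the data is read.

```python
# Minimal sketch contrasting ETL and ELT orderings.
# The records and the transform rule are hypothetical examples.

raw_records = [
    {"id": 1, "amount": "19.99", "currency": "usd"},
    {"id": 2, "amount": "5.00", "currency": "eur"},
]

def transform(record):
    """Normalize a raw record: parse the amount, upper-case the currency."""
    return {
        "id": record["id"],
        "amount": float(record["amount"]),
        "currency": record["currency"].upper(),
    }

def etl(records, target):
    # Extract -> Transform -> Load: only the transformed shape is stored.
    target.extend(transform(r) for r in records)

def elt(records, target):
    # Extract -> Load: raw records are stored unchanged...
    target.extend(dict(r) for r in records)

def read_transformed(target):
    # ...and the Transform step runs later, when the data is queried.
    return [transform(r) for r in target]

warehouse, lake = [], []
etl(raw_records, warehouse)
elt(raw_records, lake)

print(warehouse[0])  # transformed at load time
print(lake[0])       # still in its original format
print(read_transformed(lake) == warehouse)  # True
```

Both paths end with the same result; the difference is when the transformation cost is paid and whether the original format is kept.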
Primary Use Case Scenarios of Data Lakes
Logical data lakes are a
relatively new concept, so readers can benefit from seeing
how logical data lakes can be used in real-life scenarios.
Experimental Analysis of Data:
Logical data lakes can
play an essential role in the experimental analysis of data to establish its
value. Since data lakes work on the ELT strategy, they bring agility and speed
to such experiments.
To Store and Analyze IoT Data:
Logical data lakes can
efficiently store Internet of Things (IoT) data. Data lakes are capable
of storing both relational and non-relational data. Under logical data
lakes, it is not mandatory to define the structure or schema of the data
stored. Moreover, logical data lakes can support analytics on IoT data that
reveal ways to improve quality and reduce operational costs.
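Because no schema has to be declared up front, the structure is applied when the data is read ("schema-on-read"). The minimal Python sketch below illustrates the idea with hypothetical device messages: each device reports different fields, the messages are stored unchanged, and the analytics step picks out only the fields it needs at query time.

```python
import json
from collections import defaultdict

# Hypothetical IoT messages: each device reports different fields,
# and no schema is declared before the data is stored.
raw_messages = [
    '{"device": "pump-1", "temp_c": 71.5, "vibration": 0.02}',
    '{"device": "pump-1", "temp_c": 74.5}',
    '{"device": "fan-7", "rpm": 1200, "temp_c": 40.0}',
    '{"device": "door-3", "open": true}',
]

# Load step: store the messages as-is (here, just a list of strings).
lake = list(raw_messages)

def average_temperature(messages):
    """Schema-on-read: parse each message at query time and use only
    the records that happen to carry a temp_c field."""
    totals = defaultdict(lambda: [0.0, 0])  # device -> [sum, count]
    for line in messages:
        record = json.loads(line)
        if "temp_c" in record:
            totals[record["device"]][0] += record["temp_c"]
            totals[record["device"]][1] += 1
    return {device: s / n for device, (s, n) in totals.items()}

print(average_temperature(lake))
# {'pump-1': 73.0, 'fan-7': 40.0}
```

The door sensor's message is kept in the lake even though this particular query ignores it; a later analysis can still use it.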
To Improve Customer Interactions:
Logical data lakes can
methodically combine CRM data with social media analytics to give businesses an
understanding of customer behavior as well as customer churn and its various
causes.
To Create a Data Warehouse:
Logical data lakes
contain raw data. Data warehouses, on the other hand, store structured and
filtered data. Creating a data lake can be the first step in the process of data
warehouse creation. A data lake may also be used to augment a data warehouse.
To Support the Reporting and Analytical Function:
Data lakes can also be
used to support the reporting and analytical function in organizations. By
storing as much data as possible in a single repository, logical data lakes make it easier
to analyze all of the data and arrive at relevant and valuable findings.
The logical data lake is a comparatively new area of study. However, logical data lakes have the potential to revolutionize traditional approaches to data management.
Today, businesses depend heavily on data to gain insights into their processes and operations and to develop new ways to increase market share and profits. In most cases, the data required to generate these insights is located in diverse places, which requires a reliable access mechanism. Currently, data warehousing and data virtualization are the two principal techniques used to store and access critical data sources in a company. Each approach offers different capabilities and suits particular use cases, as described in this article.
A data warehouse is designed and developed to securely host historical data from different sources. In effect, this technique protects the source systems from the performance degradation caused by sophisticated analytics and heavy reporting demands. Today, various tools and platforms have been developed for data warehouse automation. They can be deployed to speed up development and to automate testing, maintenance, and other steps involved in data warehousing. In a data warehouse, data is stored as a series of snapshots, where a record represents data at a particular time. As a result, companies can analyze data warehouse snapshots to compare data between different periods. The results are converted into the insights required to make crucial business decisions.
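The snapshot idea can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical inventory table in which every row carries the date it describes; new snapshots are appended rather than overwriting old ones, so any two periods can be compared.

```python
from datetime import date

# Hypothetical inventory snapshots: each row records the state of one
# product as of a snapshot date. Later snapshots are appended, not updated.
snapshots = [
    {"snapshot_date": date(2024, 1, 31), "product": "widget", "units": 120},
    {"snapshot_date": date(2024, 1, 31), "product": "gadget", "units": 80},
    {"snapshot_date": date(2024, 2, 29), "product": "widget", "units": 150},
    {"snapshot_date": date(2024, 2, 29), "product": "gadget", "units": 60},
]

def snapshot_at(rows, when):
    """Return {product: units} as recorded in the snapshot taken on `when`."""
    return {r["product"]: r["units"] for r in rows if r["snapshot_date"] == when}

def compare_periods(rows, earlier, later):
    """Change in units per product between two snapshots."""
    before, after = snapshot_at(rows, earlier), snapshot_at(rows, later)
    products = before.keys() | after.keys()
    return {p: after.get(p, 0) - before.get(p, 0) for p in products}

print(compare_periods(snapshots, date(2024, 1, 31), date(2024, 2, 29)))
```

In a real warehouse the same comparison would typically be a SQL query over a snapshot fact table; the point here is only that keeping history makes period-over-period questions a simple lookup.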
Moreover, a data warehouse is optimized for other functions, such as data retrieval. It duplicates data through database denormalization, which improves query performance. The solution can also be deployed as an enterprise data warehouse (EDW) that serves the entire organization.
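As a rough illustration of denormalization, the sketch below uses Python's built-in `sqlite3` module with a hypothetical customers/orders schema. Customer attributes are copied onto every order row in a wide `sales_flat` table, trading extra storage for reporting queries that need no join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized source tables (hypothetical schema).
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme', 'East'), (2, 'Globex', 'West');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);

    -- Denormalized copy: customer attributes are duplicated onto every
    -- order row so that reporting queries need no join.
    CREATE TABLE sales_flat AS
    SELECT o.order_id, o.amount, c.name AS customer_name, c.region
    FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
""")

# Reporting query against the wide table: no join required.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales_flat GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('East', 350.0), ('West', 75.0)]
```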
Features of a Data Warehouse
A data warehouse is subject-oriented, and it is designed to help entities analyze data. For instance, a company can start a data warehouse focused on sales to learn more about its sales data. Analytics on this warehouse can help establish insights such as the best customer for the period. The data warehouse is subject-oriented because it is defined around a particular subject area.
A data warehouse is integrated. Data from various sources is first put into a consistent format. The process requires the firm to resolve some challenges, such as naming conflicts and inconsistencies in units of measure.
A data warehouse is nonvolatile. In effect, data entered into the warehouse should not change after it is stored. This feature increases accuracy and integrity in data warehousing.
A data warehouse is time-variant since it focuses on data changes over time. Data warehousing discovers trends in business by using large amounts of historical data. As a result, a typical operation in a data warehouse scans millions of rows to return an output.
A data warehouse is designed and developed to handle ad hoc queries. In most cases, organizations cannot predict the workload of a data warehouse. Therefore, it is advisable to design the data warehouse to perform well across any query it may receive.
A data warehouse is regularly updated by the ETL process using bulk data modification techniques. Therefore, end users cannot directly update the data warehouse.
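The integration and bulk-load features above can be sketched together in Python with `sqlite3`. Everything here is hypothetical: two source extracts name the customer column differently and report revenue in different units, the transform step reconciles both, and the ETL process loads the result in one bulk `executemany` call. SQLite's `PRAGMA query_only` is used only to illustrate that end users get read access and cannot update the warehouse directly.

```python
import sqlite3

# Two hypothetical source extracts describing the same thing differently:
# different column names (a naming conflict) and different units.
crm_rows = [{"cust_nm": "Acme", "revenue_cents": 125000}]
erp_rows = [{"customer": "Globex", "revenue_usd": 980.5}]

def integrate():
    """Transform both sources into one consistent format: (customer, revenue in USD)."""
    unified = [(r["cust_nm"], r["revenue_cents"] / 100) for r in crm_rows]
    unified += [(r["customer"], r["revenue_usd"]) for r in erp_rows]
    return unified

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (customer TEXT, revenue_usd REAL)")

# ETL load step: one bulk insert, committed as a single transaction.
with conn:
    conn.executemany("INSERT INTO revenue VALUES (?, ?)", integrate())

# From here on, the connection behaves as a read-only end-user session.
conn.execute("PRAGMA query_only = ON")
try:
    conn.execute("INSERT INTO revenue VALUES ('Initech', 1.0)")
except sqlite3.DatabaseError as exc:  # SQLite rejects the write
    print("write rejected:", exc)

print(conn.execute("SELECT * FROM revenue ORDER BY customer").fetchall())
```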
Advantages of Data Warehousing
The primary motivation for developing a data warehouse is to provide the timely information required for decision making in an organization. A business intelligence data warehouse serves as an initial checkpoint for crucial business data. When a company stores its data in a data warehouse, tracking it becomes easier. The technology allows users to run quick searches to retrieve and analyze static data.
Another driver for companies investing in data warehouses is integrating data from disparate sources. This capability adds value to operational applications like customer relationship management systems. A well-integrated warehouse translates information into a more usable and straightforward format, making it easy for users to understand the business data.
The technology also allows organizations to perform a range of analyses on data.
A data warehouse reduces the cost to access historical data in an organization.
Data warehousing provides standardization of data across an organization. Moreover, it helps identify and eliminate errors: before data is loaded, inconsistencies are shown to users and corrected.
A data warehouse also improves the turnaround time for analysis and report generation.
The technology makes it easy for users to access and share data. A user can conduct a quick search on a data warehouse to find and analyze static data without wasting time.
Data warehousing removes informational processing load from transaction-oriented databases.
Disadvantages of Data Warehousing
While data warehousing technology is undoubtedly beneficial to many organizations, a data warehouse is not the right fit for every business. In some cases, a data warehouse can be expensive to scale and maintain.
Preparing a data warehouse is time-consuming since it requires users to input raw data, which often has to be done manually.
A data warehouse is not a good choice for handling unstructured and complex raw data. Moreover, it can face compatibility difficulties. Depending on the data sources, companies may require a business intelligence team to ensure compatibility for data coming from sources running distinct operating systems and programs.
The technology incurs ongoing maintenance costs to keep working correctly. Updating the solution with the latest features can be costly. Regularly maintaining a data warehouse will require a business to spend more on top of the initial investment.
The use of a data warehouse can be limited by information privacy and confidentiality issues. In most cases, businesses collect and store sensitive data belonging to their clients. Only certain employees are allowed to view it, which limits the benefits offered by a data warehouse.
Data Warehousing Use Cases
There are a series of ways organizations use data warehouses. Businesses can optimize the technology for performance by identifying the type of data warehouse they have.
A data warehouse can be used by an organization that is struggling to report efficiently on business operations and activities. The solution makes it possible to access the required data.
A data warehouse is necessary for an organization where data is copied separately by different divisions for analysis in spreadsheets that are not consistent with one another.
Data warehousing is crucial in organizations where uncertainties about data accuracy are causing executives to question the veracity of reports.
A data warehouse is crucial for business intelligence acceleration. The technology delivers rapid data insights to analysts at varying scale and concurrency, without requiring manual tuning or optimization of a database.
Data virtualization technology does not require data to be transferred or stored. Instead, it uses a combination of application programming interfaces (APIs) and metadata (data about data) to interface with data in different sources, and queries are executed against, and joined across, the original data sources. In other words, data virtualization offers a simplified, integrated, real-time view of business data to the business users, applications, and analytics that request it. As a result, the technology makes it possible to integrate data from distinct sources, formats, and locations without replication. It creates a unified virtual data layer that delivers data services to users and various business applications.
Data virtualization can perform many of the same data integration functions as extract, transform, and load (ETL), data replication, and federation. It leverages modern technology to deliver real-time data integration with agility, low cost, and high speed. As a result, data virtualization can replace some traditional data integration and, in most cases, reduces the need for replicated data warehouses and data marts.
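A minimal Python sketch of a virtual data layer, assuming two hypothetical sources (a CRM system and a billing database) represented by plain functions. The layer stores only how to reach each source, not the data itself, and joins the sources at query time to present them as a single customer view.

```python
# Minimal sketch of a virtual data layer. Nothing is copied: each
# registered source is a function that fetches rows when a query runs.
# The source names and fields are hypothetical.

def crm_source():
    # Stand-in for a live call to a CRM system's API.
    return [{"customer_id": 1, "name": "Acme"}, {"customer_id": 2, "name": "Globex"}]

def billing_source():
    # Stand-in for a query against a separate billing database.
    return [{"customer_id": 1, "balance": 420.0}, {"customer_id": 2, "balance": 0.0}]

class VirtualLayer:
    def __init__(self):
        self.sources = {}

    def register(self, name, fetch):
        # Store the access path (metadata about the source), not the data.
        self.sources[name] = fetch

    def customer_view(self):
        """Fetch from both sources now and join them into one unified view."""
        balances = {r["customer_id"]: r["balance"] for r in self.sources["billing"]()}
        return [
            {**c, "balance": balances.get(c["customer_id"])}
            for c in self.sources["crm"]()
        ]

layer = VirtualLayer()
layer.register("crm", crm_source)
layer.register("billing", billing_source)
print(layer.customer_view())
```

Because every call to `customer_view` re-reads the sources, the view always reflects their current state; the flip side is that the view can only be built while every registered source is reachable.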
Capabilities and Benefits of Data Virtualization
There are various benefits of implementing data virtualization in an organization.
Firstly, data virtualization allows a firm to access and leverage all of its information, which can help it achieve a competitive advantage. The solution offers a unified virtual layer that abstracts the underlying source complexity and presents disparate data sources as a single source.
Data virtualization is cheaper since it does not require additional hardware to be installed. In other words, organizations no longer need to purchase and dedicate substantial IT resources and additional money to create on-site resources like those used for a data warehouse.
Data virtualization allows speedy deployment of resources. In this solution, resource provisioning is fast and straightforward. Organizations are not required to set up physical machines or to create local networks or install other IT components. Users have a single point of access to a virtual environment that can be distributed to the entire company.
Data virtualization is an energy-efficient system since the solution does not require additional local hardware and software. Therefore, an organization does not need to install additional cooling systems.
Disadvantages of Data Virtualization
Data virtualization creates a security risk. In the modern world, information is valuable and easy to monetize, so company data is frequently targeted by hackers. Exposing data from disparate sources through a single virtualization layer may give malicious users an opportunity to steal critical information and use it for monetary gain.
Data virtualization requires a series of channels or links that must work in cohesion to perform the intended task. In this case, all data sources must be available for virtualization to work effectively.
Data Virtualization Use Cases
Companies that rely on business intelligence require data virtualization for rapid prototyping to meet immediate business needs. Data virtualization can create a real-time reporting solution that unifies access to multiple internal databases.
Provisioning data services for single-view applications, such as customer service and call center applications, also requires data virtualization.
End of Support for IBM InfoSphere Information Server 9.1.0
IBM InfoSphere Information Server 9.1.0 will reach End of Support on 2018-09-30. If you are still on InfoSphere Information Server (IIS) 9.1.0, I hope you have a plan to migrate to an 11-series version soon. IIS 11.7 would be worth considering if you don’t already own an 11-series license. IIS 11.7 will allow you to take advantage of the evolving thin client tools and other capabilities in the 2018 release pipeline without needing to perform another upgrade.
IBM Support, End of support notification: InfoSphere Information Server 9.1.0
During the week, a discussion came up about the different places where a person might read the DataStage and QualityStage logs in InfoSphere. I hadn’t really thought about it, but here are a few places that come to mind:
IBM InfoSphere DataStage and QualityStage Operations Console
IBM InfoSphere DataStage and QualityStage Director client
IBM InfoSphere DataStage and QualityStage Designer client, by pressing Ctrl+L
While investigating a recent InfoSphere Information Server (IIS) DataStage Essbase connector error, I found the explanations of the probable causes of the error not to be terribly meaningful. So, now that I have run the error to ground, I thought it might be useful to jot down a quick note on the potential causes of the ‘Client Commands are Currently Not Being Accepted’ error that I gleaned from the process.
Error Message Id
An error occurred while processing the request on the server. The error information is 1051544 (message on contacting or from application:[<<DateTimeStamp>>]Local////3544/Error(1013204) Client Commands are Currently Not Being Accepted.
Possible Causes of the Error
This error indicates a problem accessing the Essbase object or the security within the Essbase object. This can be the result of multiple issues, such as:
Object doesn’t exist – the Essbase object does not exist in the specified location.
Communications – the location is unavailable or cannot be reached.
Path Security – security settings prevent access to the Essbase object location.
Essbase Security – security within the Essbase object does not support the user or filter being submitted. The Essbase object security may also be corrupted or incomplete.
Essbase Object Structure – the Essbase object is not structured to support the filter, or the Essbase filter is malformed for the current structure.
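For the Communications cause, one quick way to rule it in or out before digging into Essbase security is to check whether the Essbase server port can be reached from the DataStage engine host at all. The minimal Python sketch below does a plain TCP connection test; it knows nothing about Essbase itself, and the host name and port you pass in are assumptions about your environment.

```python
import socket

def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only proves network reachability; it does not authenticate to
    Essbase or check any object or filter security.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, unreachable host, DNS failure, or timeout
        return False
```

For example, `is_reachable("essbase.example.com", 1423)` would test a hypothetical server on port 1423, a common Essbase agent default; substitute the host and port from your own Essbase configuration. A `False` result points at Communications or Path Security rather than at the Essbase object itself.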
IBM Knowledge Center, InfoSphere Information Server 11.7.0, Connecting to data sources, Enterprise applications, IBM InfoSphere Information Server Pack for Hyperion Essbase
When you are controlling a chain of sequences in the job stream and taking advantage of reusable (multiple-instance) jobs, it is useful to pass the Invocation ID down from the master controlling sequence and have it assigned to the job run. This can easily be done without needing to manually enter the values in each of the sequences, by leveraging the DSJobInvocationId variable. For this to work:
The job must have ‘Allow Multiple Instance’ enabled.
The Invocation ID must be provided: the parent sequence must have the Invocation Name entered.
The receiving child sequence must have the invocation variable (DSJobInvocationId) entered.
At runtime, DataStage generates an instance of the multi-instance job, identified by the invocation ID, with its own logs.
This approach allows the job to be reused and meaningful instance names to be assigned, all managed from a single point of entry in the object tree.
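To show how the invocation ID flows, here is a conceptual Python model. It is not a DataStage API and does not call DataStage; the job names are hypothetical. The master sequence is started with one invocation ID, that same ID is handed down to each child sequence (the role DSJobInvocationId plays), and every multi-instance job run is identified as `<job>.<invocation id>` with its own log.

```python
from collections import defaultdict

# Conceptual model only, not a DataStage API.
logs = defaultdict(list)  # one log per job instance, keyed by "<job>.<invocation id>"

def run_job(job_name, invocation_id):
    """Simulate one run of a multi-instance job under the given invocation ID."""
    instance = f"{job_name}.{invocation_id}"
    logs[instance].append(f"{instance} started")
    logs[instance].append(f"{instance} finished")
    return instance

def run_sequence(jobs, invocation_id, children=()):
    """Run this sequence's jobs, then pass the same invocation ID to each
    child sequence instead of entering it by hand at every level."""
    instances = [run_job(job, invocation_id) for job in jobs]
    for child_jobs in children:
        instances += run_sequence(child_jobs, invocation_id)
    return instances

# Hypothetical job names; the master sequence is started with invocation ID "EAST".
print(run_sequence(["LoadCustomers"], "EAST", children=[["LoadOrders"]]))
# ['LoadCustomers.EAST', 'LoadOrders.EAST']
```

Starting the same master sequence again with a different ID, such as "WEST", would produce separate `LoadCustomers.WEST` and `LoadOrders.WEST` instances with their own logs, while the job designs themselves are maintained in one place.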