Using Logical Data Lakes

Today, data-driven decision making is at the center of nearly everything organizations do. The emergence of data science and machine learning has further reinforced the importance of data as one of the most critical commodities in today's world. From FAAMG (the five biggest tech companies: Facebook, Amazon, Apple, Microsoft, and Google) to governments and non-profits, everyone is leveraging the power of data to achieve their goals. Unfortunately, this growing demand for data has exposed the inability of existing systems to support ever-growing data needs. That shortfall is what led to the evolution of what we today know as logical data lakes.

What Is a Logical Data Lake?

In simple words, a data lake is a data repository capable of storing any data in its original format. As opposed to traditional data warehouses, which follow an ETL (Extract, Transform, and Load) strategy, data lakes work on an ELT (Extract, Load, and Transform) strategy. This means data does not have to be transformed before it is loaded, which translates into reduced time and effort. Logical data lakes have attracted wide attention because they do away with the need to physically integrate data from different repositories: the data stays where it is and is exposed through a single logical view. With this open access to data, companies can begin to draw correlations between previously separate data entities and use those findings to their advantage.
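To make the ELT contrast concrete, here is a minimal Python sketch of the pattern. It assumes a local directory standing in for lake storage and a hypothetical orders_export.csv extract with order_date and amount columns; it illustrates the pattern, not any particular product's API.

    import shutil
    from pathlib import Path

    import pandas as pd

    LAKE = Path("data_lake/raw")              # hypothetical lake location
    LAKE.mkdir(parents=True, exist_ok=True)

    # Extract + Load: land the source file in the lake untouched.
    # No schema and no cleansing; the "T" step is deferred.
    source_file = Path("orders_export.csv")   # hypothetical source extract
    shutil.copy(source_file, LAKE / source_file.name)

    # Transform (later, on demand): shape the raw data only when an
    # analysis actually needs it.
    orders = pd.read_csv(LAKE / "orders_export.csv", parse_dates=["order_date"])
    monthly = orders.groupby(orders["order_date"].dt.to_period("M"))["amount"].sum()
    print(monthly)

In a traditional ETL flow, the transformation would have to be designed and executed before the data ever landed; here the load happens immediately, and the transformation waits until a question is actually asked.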

Primary Use Cases of Logical Data Lakes

Logical data lakes are a relatively new concept, so readers can benefit from seeing how they are used in real-life scenarios.

To Conduct Experimental Analysis of Data:

  • Logical data lakes can play an essential role in the experimental analysis of data to establish its value. Because data lakes work on the ELT strategy, they lend speed and flexibility to such experiments; a quick profiling pass, as sketched below, is often enough to decide whether a dataset merits further investment.
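As a minimal illustration, the following Python sketch profiles a raw lake file with pandas; the file path and column contents are hypothetical.

    import pandas as pd

    # Read a raw lake file as-is and profile it before investing in
    # any formal modeling or integration work.
    df = pd.read_csv("data_lake/raw/clickstream_sample.csv")

    print(df.shape)                     # how much data is there?
    print(df.dtypes)                    # what types did we actually get?
    print(df.isna().mean().round(3))    # share of missing values per column
    print(df.describe(include="all"))   # quick statistical profile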

To Store and Analyze IoT Data:

  • Logical data lakes can efficiently store Internet of Things (IoT) data. Data lakes are capable of storing relational as well as non-relational data, and a schema does not have to be defined before the data is stored. Moreover, logical data lakes can run analytics on IoT data to find ways to enhance quality and reduce operational cost. The sketch below shows the schema-on-read pattern this enables.
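Here is a minimal Python sketch of that schema-on-read pattern, assuming a local JSON-lines file stands in for lake storage; the device names and fields are hypothetical.

    import json
    from pathlib import Path

    import pandas as pd

    LAKE = Path("data_lake/iot")
    LAKE.mkdir(parents=True, exist_ok=True)

    # Load: append readings as JSON lines with no schema declared up
    # front; different devices may send different fields.
    readings = [
        {"device": "pump-1", "ts": "2021-06-01T12:00:00", "temp_c": 71.3},
        {"device": "pump-2", "ts": "2021-06-01T12:00:05", "temp_c": 69.8, "rpm": 1450},
    ]
    with open(LAKE / "readings.jsonl", "a") as f:
        for r in readings:
            f.write(json.dumps(r) + "\n")

    # Analyze later: pandas tolerates the ragged fields, filling the
    # ones a device did not send with NaN.
    df = pd.read_json(LAKE / "readings.jsonl", lines=True)
    print(df.groupby("device")["temp_c"].mean())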

To Improve Customer Interaction:

  • Logical data lakes can methodically combine CRM data with social media analytics to give businesses an understanding of customer behavior, including customer churn and its various causes; the sketch below shows the kind of cross-source join involved.
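A minimal Python sketch of that kind of cross-source view, with hypothetical CRM and social-media extracts; a real logical data lake would federate the two sources in place, but a simple merge shows the idea.

    import pandas as pd

    # Hypothetical extracts: CRM account data and social-media sentiment.
    crm = pd.DataFrame({
        "customer_id": [1, 2, 3],
        "tenure_months": [24, 3, 48],
        "open_tickets": [0, 4, 1],
    })
    social = pd.DataFrame({
        "customer_id": [1, 2, 3],
        "sentiment": [0.6, -0.7, 0.2],   # -1 (negative) to 1 (positive)
    })

    # Query both sources side by side through one combined view.
    combined = crm.merge(social, on="customer_id")

    # Flag likely churners: negative sentiment plus open support tickets.
    combined["churn_risk"] = (combined["sentiment"] < 0) & (combined["open_tickets"] > 0)
    print(combined)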

To Create a Data Warehouse:

  • Logical data lakes contain raw data, whereas data warehouses store structured and filtered data. Creating a data lake is often the first step in building a data warehouse, and a data lake may also be used to augment an existing warehouse. The sketch below illustrates the raw-to-curated handoff.
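The following Python sketch shows that handoff, assuming a hypothetical raw extract in the lake and using SQLite as a stand-in for the warehouse.

    import sqlite3

    import pandas as pd

    # Read raw lake data (hypothetical file and columns), then filter
    # and structure it for the warehouse.
    raw = pd.read_csv("data_lake/raw/orders_export.csv", parse_dates=["order_date"])

    # The warehouse keeps only cleansed, conformed rows.
    curated = (
        raw.dropna(subset=["customer_id", "amount"])
           .query("amount > 0")
           .rename(columns={"order_date": "order_dt"})
    )

    # SQLite stands in for the warehouse here.
    with sqlite3.connect("warehouse.db") as conn:
        curated.to_sql("fact_orders", conn, if_exists="replace", index=False)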

To Support Reporting and Analytics:

  • Data lakes can also support the reporting and analytical functions of an organization. By keeping the maximum amount of data reachable through a single repository, logical data lakes make it easier to analyze all of the data and arrive at relevant, valuable findings, as in the aggregate report sketched below.
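For example, a report can be run straight over the raw files, as in this Python sketch; the file pattern and columns are hypothetical.

    from glob import glob

    import pandas as pd

    # Reporting directly over the lake: read every daily extract and
    # aggregate in one pass.
    files = glob("data_lake/raw/sales_*.csv")
    sales = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

    report = (
        sales.groupby("region")["amount"]
             .agg(total="sum", orders="count")
             .sort_values("total", ascending=False)
    )
    print(report)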

The logical data lake is a comparatively new area of study, but it can be said with confidence that logical data lakes will reshape traditional approaches to data management.

Related References

InfoSphere Information Server (IIS) Component Alignment

InfoSphere Information Server SDLC Alignment

[Figure: InfoSphere Information Server SDLC Alignment]

In recent history, I have been asked several times to describe where the different IIS components fit in the Software Development Lifecycle (SDLC) process. The graphic above lists most of the more important IIS components in their relative SDLC relationships. However, it is important to note that these are not absolutes; many applications may cross boundaries depending on the practices of the individual company, the application suite licensed by the company, and/or the applications implemented by the company. For example, many components will participate in the sustainment phase of the SDLC, although I did not list them in that role. This is especially true if you are using the governance tools (e.g., Governance Catalog) and supporting your sustainment activities with modeling and development tools, such as Data Architect.

Related References

InfoSphere Information Server (IIS) Component Descriptions

InfoSphere Information Server LaunchPad Page

Each IIS component has a primary function in the InfoSphere architecture, which can be synopsized as follows:

Blueprint Director: IBM InfoSphere Blueprint Director is aimed at the Information Architect designing solution architectures for information-intensive projects.

Cognos (if purchased): Governance Dashboard (Framework Manager model provided by IBM), semantics, analytics, and reporting.

Data Architect: Data Architect is an enterprise data modeling and integration design tool. You can use it to discover, model, visualize, relate, and standardize diverse and distributed data assets, including dimensional models.

Data Click: Data Click helps novices and business users retrieve data and provision systems easily in only a few clicks.

DataStage: DataStage is a data integration tool that enables users to move and transform data between operational, transactional, and analytical target systems.

Discovery: Discovery is used to identify the transformation rules that have been applied to source system data to populate a target. Once accurately defined, these business objects and transformation rules provide essential input into information-centric projects.

FastTrack: FastTrack streamlines collaboration between business analysts, data modelers, and developers by capturing and defining business requirements in a common format and then transforming that business logic (source-to-target mappings, STTM) directly into DataStage ETL jobs.

Glossary Anywhere: Business Glossary Anywhere, the Governance Catalog's companion module, augments it with additional ease-of-use and extensibility features.

Governance Catalog: The Governance Catalog includes business glossary assets (categories, terms, information governance policies, and information governance rules) and information assets.

Information Analyzer: Information Analyzer provides capabilities to profile and analyze data.

Information Services Director: Information Services Director provides a unified and consistent way to publish and manage shared information services in a service-oriented architecture (SOA).

Metadata Asset Manager: Metadata Asset Manager imports, exports, and manages common metadata assets in the Metadata Repository and across applications.

Operations Console: The Operations Console provides administrative workspaces to investigate data, deploy applications and web services, and monitor schedules and logs.

QualityStage: QualityStage provides data cleansing capabilities that help ensure quality and consistency by standardizing, validating, matching, and merging information to create comprehensive and authoritative information.

Server Manager: Server Manager is a deployment tool used to move, deploy, and control DataStage and QualityStage assets.

Related References