A semantic data model is a method of organizing and representing corporate data that reflects the meaning of, and relationships among, data items. Organizing data this way helps end users access data autonomously, using familiar business terms such as revenue, product, or customer, through BI (business intelligence) and other analytics tools. A semantic model offers a consolidated, unified view of data across the business, allowing end users to obtain valuable insights quickly from large, complex, and diverse data sets.
What is the purpose of semantic data modeling in BI and data virtualization?
A semantic data model sits between a reporting tool and the original database in order to assist end-users with reporting. It is the main entry point for accessing data for most organizations when they are running ad hoc queries or creating reports and dashboards. It facilitates reporting and improvements in various areas, such as:
End users have no relationships or joins to worry about because these have already been handled in the semantic data model.
Data such as invoice data, salesforce data, and inventory data have all been pre-integrated for end-users to consume.
Columns have been renamed into user-friendly names such as Invoice Amount as opposed to INVAMT.
The model includes powerful time-oriented calculations such as percentage change in sales since last quarter, sales year-to-date, and sales increase year over year.
Business logic and calculations are centralized in the semantic data model in order to reduce the risk of incorrect recalculations.
Data security can be incorporated. This might include exposing certain measurements to only authorized end-users and/or standard row-level security.
A well-designed semantic data model with agile tooling allows end-users to learn and understand how altering their queries results in different outcomes. It also gives them independence from IT while having confidence that their results are correct.
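The behaviors listed in this section (pre-defined joins, friendly names, centralized calculations, and a time-oriented measure) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the physical tables INVHDR and CUSTMST, their columns, the SEMANTIC_MODEL dictionary, and the helper functions are hypothetical, and SQLite stands in for whatever database sits behind a real BI tool.

```python
import sqlite3

# Hypothetical physical schema with terse column names (INVAMT, CUSTNM, ...).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTMST (CUSTID INTEGER PRIMARY KEY, CUSTNM TEXT);
    CREATE TABLE INVHDR (INVID INTEGER PRIMARY KEY, CUSTID INTEGER,
                         INVDT TEXT, INVAMT REAL);
    INSERT INTO CUSTMST VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO INVHDR VALUES
        (10, 1, '2023-02-01', 100.0),
        (11, 1, '2024-01-15', 150.0),
        (12, 2, '2024-03-10', 50.0);
""")

# The semantic model: the join is defined once, and business terms map to
# physical expressions, so end users never see INVAMT or write a join.
SEMANTIC_MODEL = {
    "from": "INVHDR i JOIN CUSTMST c ON c.CUSTID = i.CUSTID",
    "dimensions": {
        "Customer": "c.CUSTNM",
        "Invoice Year": "strftime('%Y', i.INVDT)",
    },
    "measures": {
        "Invoice Amount": "SUM(i.INVAMT)",
        "Invoice Count": "COUNT(*)",
    },
}


def query(dimension, measure):
    """Translate a request phrased in business terms into SQL and run it."""
    dim_sql = SEMANTIC_MODEL["dimensions"][dimension]
    measure_sql = SEMANTIC_MODEL["measures"][measure]
    sql = (
        f'SELECT {dim_sql} AS "{dimension}", {measure_sql} AS "{measure}" '
        f'FROM {SEMANTIC_MODEL["from"]} '
        f"GROUP BY {dim_sql} ORDER BY {dim_sql}"
    )
    return conn.execute(sql).fetchall()


def year_over_year(measure):
    """Centralized time calculation: percent change versus the prior year."""
    rows = query("Invoice Year", measure)
    return {
        current[0]: (current[1] - previous[1]) / previous[1] * 100
        for previous, current in zip(rows, rows[1:])
    }


rows_by_customer = query("Customer", "Invoice Amount")
rows_by_year = query("Invoice Year", "Invoice Amount")
yoy = year_over_year("Invoice Amount")

print(rows_by_customer)  # [('Acme', 250.0), ('Globex', 50.0)]
print(yoy)               # {'2024': 100.0}
```

Because the join and the year-over-year formula live in one place, every report that asks for "Invoice Amount" by "Customer" gets the same answer. A production semantic layer would also validate requested names and apply security rules, which this sketch omits.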
At its most foundational level, individual column data can be classified by its fundamental use and characteristics. Granted, once you start rolling columns up into table structures and table relationships, other classifications and behaviors, such as keys (primary and foreign), indexes, and distribution, come into play. However, when working with existing data sets, it is often essential to understand the nature of the existing data before beginning the modeling and information governance process.
Column Data Classification
Generally, individual columns can be classified as follows:
Identifier: A column/field that is unique to a row and/or can identify related data (e.g., Person ID, National Identifier). Basically, think primary key and/or foreign key.
Indicator: A column/field, often called a flag, that has a binary condition (e.g., True or False, Yes or No, Female or Male, Active or Inactive). Frequently used to identify compliance with a specific business rule.
Code: A column/field that has a distinct and defined set of values, often abbreviated (e.g., State Code, Currency Code).
Temporal: A column/field that contains some type of date, timestamp, time, interval, or numeric duration data.
Quantity: A column/field that contains a numeric value (decimals, integers, etc.) and is not classified as an Identifier or Code (e.g., Price, Amount, Asset Value, Count).
Text: A column/field that contains alphanumeric values, possibly long text, and is not classified as an Identifier or Code (e.g., Name, Address, Long Description, Short Description).
Large Object (LOB): A column/field that contains traditional long text or binary data such as graphics. Large objects can be broadly classified as Character Large Objects (CLOBs), Binary Large Objects (BLOBs), and Double-Byte Character Large Objects (DBCLOBs or NCLOBs).
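When profiling an existing data set, a first-pass classification along these lines can be automated. The sketch below is only a heuristic under stated assumptions: the naming conventions ("_id", "_key", "code"), the set of flag values, and the 4000-character cutoff for large text are all hypothetical choices, and any real classification would need review by someone who knows the data.

```python
from datetime import date, datetime, time, timedelta
from decimal import Decimal

# Assumed conventions; adjust to the data set being profiled.
FLAG_VALUES = {"Y", "N", "YES", "NO", "TRUE", "FALSE"}
LOB_TEXT_LENGTH = 4000  # assumed cutoff between Text and a character LOB


def classify_column(name, values):
    """Return a best-guess classification for one column of sample values."""
    present = [v for v in values if v is not None]
    if not present:
        return "Unknown"
    lowered = name.lower()

    # Binary payloads are treated as large objects regardless of name.
    if any(isinstance(v, bytes) for v in present):
        return "Large Object"
    # Identifiers are recognized by naming convention only (an assumption),
    # since foreign keys are identifiers but are not unique per row.
    if lowered == "id" or lowered.endswith(("_id", "_key")):
        return "Identifier"
    if all(isinstance(v, (date, datetime, time, timedelta)) for v in present):
        return "Temporal"
    # Indicator is checked before Quantity because bool is a subclass of int.
    if len(set(present)) <= 2 and all(
        isinstance(v, bool) or str(v).upper() in FLAG_VALUES for v in present
    ):
        return "Indicator"
    if all(isinstance(v, (int, float, Decimal)) for v in present):
        return "Quantity"
    if all(isinstance(v, str) for v in present):
        longest = max(len(v) for v in present)
        if longest > LOB_TEXT_LENGTH:
            return "Large Object"
        if lowered.endswith("code") and longest <= 5:
            return "Code"
        return "Text"
    return "Unknown"


samples = {
    "person_id": [101, 102, 103],
    "is_active": ["Y", "N", "Y"],
    "state_code": ["TX", "CA", "NY"],
    "created_on": [date(2024, 1, 1), date(2024, 2, 1)],
    "price": [Decimal("9.99"), Decimal("12.50")],
    "address": ["1 Main St", "22 Elm Ave"],
    "photo": [b"\x89PNG"],
}
classified = {name: classify_column(name, vals) for name, vals in samples.items()}
for name, label in classified.items():
    print(f"{name}: {label}")
```

The order of the checks matters: a numeric column named "person_id" should come out as an Identifier rather than a Quantity, which mirrors the "not classified as an Identifier or Code" wording in the definitions above.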
A Common Data Model (CDM) is a shared data structure designed to provide well-formed and standardized data structures within an industry (e.g., medical, insurance) or business channel (e.g., human resource management, asset management), which can be applied to give organizations a consistent, unified view of business information. Organizations can leverage these common models as accelerators to form the foundation for their information, including SOA interchanges, mashups, data virtualization, the Enterprise Data Model (EDM), and business intelligence (BI), and/or to standardize their data models to improve metadata management and data integration practices.
Data modeling is the documenting of data relationships, characteristics, and standards based on the intended use of the data. Data modeling techniques and tools capture and translate complex system designs into easily understood representations of the data, creating a blueprint and foundation for information technology development and reengineering.
A data model can be thought of as a diagram that illustrates the relationships between data. Although capturing all the possible relationships in a data model can be very time-intensive, well-documented models allow stakeholders to identify errors and make changes before any programming code has been written.
Data modelers often use multiple models to view the same data and ensure that all processes, entities, relationships, and data flows have been identified.
There are several different approaches to data modeling, including:
Concept Data Model (CDM)
The Concept Data Model (CDM) identifies the high-level information entities and their relationships, and organizes them in the Entity Relationship Diagram (ERD).
Logical Data Model (LDM)
The Logical Data Model (LDM) defines detailed business information (in business terms) within each entity of the Concept Data Model and is a refinement of the information entities of the Concept Data Model. Logical data models are a non-RDBMS-specific business definition of the tables, fields, and attributes contained within each information entity, from which the Physical Data Model (PDM) and Entity Relationship Diagram (ERD) are produced.
Physical Data Model (PDM)
The Physical Data Model (PDM) provides the actual technical details of the model and database objects (e.g., table names, field names) to facilitate the creation of accurate, detailed technical designs and actual database creation. Physical Data Models are an RDBMS-specific definition of the logical model, used to build the database, create deployable DDL statements, and produce the Entity Relationship Diagram (ERD).
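The step from a logical model to a physical one can be sketched in a few lines. This is an illustration under assumptions, not a modeling tool: the Attribute and Entity classes, the logical type names, the naming rule in physical_name, and the SQLITE_TYPES mapping are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    business_name: str         # logical, business-facing name
    logical_type: str          # RDBMS-neutral type, e.g. "identifier", "amount"
    required: bool = True
    primary_key: bool = False


@dataclass(frozen=True)
class Entity:
    business_name: str
    attributes: tuple


# Hypothetical mapping from logical types to one target engine (SQLite).
SQLITE_TYPES = {
    "identifier": "INTEGER",
    "text": "TEXT",
    "amount": "NUMERIC",
    "date": "TEXT",
}


def physical_name(business_name):
    """Assumed naming standard: lower case with underscores."""
    return business_name.strip().lower().replace(" ", "_")


def to_ddl(entity, type_map):
    """Render a logical entity as an engine-specific CREATE TABLE statement."""
    columns = []
    for attr in entity.attributes:
        column = f"{physical_name(attr.business_name)} {type_map[attr.logical_type]}"
        if attr.primary_key:
            column += " PRIMARY KEY"
        elif attr.required:
            column += " NOT NULL"
        columns.append(column)
    body = ",\n    ".join(columns)
    return f"CREATE TABLE {physical_name(entity.business_name)} (\n    {body}\n)"


# Logical model: business terms only, no engine-specific detail.
invoice = Entity(
    "Customer Invoice",
    (
        Attribute("Invoice ID", "identifier", primary_key=True),
        Attribute("Invoice Date", "date"),
        Attribute("Invoice Amount", "amount"),
        Attribute("Short Description", "text", required=False),
    ),
)

# Physical model: deployable DDL for the chosen engine.
ddl = to_ddl(invoice, SQLITE_TYPES)
print(ddl)

conn = sqlite3.connect(":memory:")
conn.execute(ddl)
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer_invoice)")]
print(columns)
```

Keeping the type mapping separate from the entity definition reflects the split described above: the same logical entity could be rendered for a different RDBMS by supplying a different mapping.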
In the classic Software Development Life Cycle (SDLC) process, Data Models are typically initiated, by model type, at key process steps and are maintained as data model detail is added and refinement occurs.
The Concept Data Model (CDM) is usually created in the Planning phase. However, creation of the Concept Data Model can slide somewhat forward or backward within the System Concept Development, Planning, and Requirements Analysis phases, depending upon whether the application being modeled is a custom development effort or a modification of a Commercial-Off-The-Shelf (COTS) application. The CDM is maintained, as necessary, through the remainder of the SDLC process.
The Logical Data Model (LDM) is created in the Requirements Analysis phase and is a refinement of the information entities of the Concept Data Model. The LDM is maintained, as necessary, through the remainder of the SDLC process.
The Physical Data Model (PDM) is created in the Design phase to facilitate the creation of accurate, detailed technical designs and actual database creation. The PDM is maintained, as necessary, through the remainder of the SDLC process.