PostgreSQL – Useful Links

PostgreSQLUseful Links

Here are some PostgreSQL references, which may be useful.

Reference Type Link
PostgreSQL Home page & Download https://www.postgresql.org/
PostgreSQL Online Documentation https://www.postgresql.org/docs/manuals/
Citus – Scalability and Parallelism Extension https://github.com/citusdata
Free PostgreSQL Training https://www.enterprisedb.com/free-postgres-training

Related References

The Business Industries Using PostgreSQL

PostgreSQL is an open-source database, which was released in 1996. So, PostgreSQL has been around a long time.   So, among the many companies and industries which know they are using PostgreSQL, many others are using PostgreSQL and don’t know it because it is embedded as the foundation in some other application’s software architecture.  

I hadn’t paid much attaint to PostgreSQL even though it as been on the list leading databases used by business for years.  Mostly I have been focused on the databases my customer were using (Oracle, DB2, Microsoft SQL Server, and MySQL/MariaDB).  However, during a recent meeting I was surprised to learn that io had been using and administering PostrgresSQL embedded as part of another software vendors application, which made me take the time to pay attention to PostgreSQL. Especially, who is using PostgreSQL and what opportunities that may provide for evolving my career? 

The Industries Using PostgreSQL

According to enlyft, the major using the PostgreSQL are Computer Software and Information Technology And services companies.  

  PostgreSQL Consumers Information

Here is the  link to enlyft page, which provides additional information companies and industries using PostgreSQL:

Related References

The Most Important Databases In Business

If you are looking to learn a Structured Query Language (SQL) skills and have marketable database knowledge, then it is time to talk about which are the most important business databases.

Why Learn The Most Important Databases Used By Business Are?

Well, I assume that you are reading this article with an eye toward improving your employment prospects.   If improving your employment prospects is what you are after, then why waste time learning databases skills, learning advanced SQL language chrematistics of a database, or get after a certification, which will have a small market?  The most important or most used databases, if you like, to business will likely be where the most work opportunities will be no matter whether you plan to be a company employee, independent contractor, consultant or like me have moved through all of these at one time or another.

Focus On Market Opportunity

The top five databases have a solid 85% of the market, so, this allows a person to build their skill is a database which will have employment opportunities.  While there are non-business users of databases of other databases not discussed here, most of us will, thorough out our careers, be working in work for or with business to earn our living.  

The Top Five Database Most Popular In Businesses

Now to the heart of this article, in 2019 the top five databases used in a business are listed in order of most popular to least popular are:

Database Name Percent of market
Oracle 25.31%
MySQL 24.04%
Microsoft SQL Server 20.39%
PostgreSQL 9.06%
MongoDB 7.71%

Related References

OLTP vs Data Warehousing

OLTP Versus Data Warehousing

I’ve tried to explain the difference between OLTP systems and a Data Warehouse to my managers many times, as I’ve worked at a hospital as a Data Warehouse Manager/data analyst for many years. Why was the list that came from the operational applications different than the one that came from the Data Warehouse? Why couldn’t I just get a list of patients that were laying in the hospital right now from the Data Warehouse? So I explained, and explained again, and explained to another manager, and another. You get the picture.
In this article I will explain this very same thing to you. So you know  how to explain this to your manager. Or, if you are a manager, you might understand what your data analyst can and cannot give you.

OLTP

OLTP stands for OLine Transactional Processing. With other words: getting your data directly from the operational systems to make reports. An operational system is a system that is used for the day to day processes.
For example: When a patient checks in, his or her information gets entered into a Patient Information System. The doctor put scheduled tests, a diagnoses and a treatment plan in there as well. Doctors, nurses and other people working with patients use this system on a daily basis to enter and get detailed information on their patients.
The way the data is stored within operational systems is so the data can be used efficiently by the people working directly on the product, or with the patient in this case.

Data Warehousing

A Data Warehouse is a big database that fills itself with data from operational systems. It is used solely for reporting and analytical purposes. No one uses this data for day to day operations. The beauty of a Data Warehouse is, among others, that you can combine the data from the different operational systems. You can actually combine the number of patients in a department with the number of nurses for example. You can see how far a doctor is behind schedule and find the cause of that by looking at the patients. Does he run late with elderly patients? Is there a particular diagnoses that takes more time? Or does he just oversleep a lot? You can use this information to look at the past, see trends, so you can plan for the future.

The difference between OLTP and Data Warehousing

This is how a Data Warehouse works:

The data gets entered into the operational systems. Then the ETL processes Extract this data from these systems, Transforms the data so it will fit neatly into the Data Warehouse, and then Loads it into the Data Warehouse. After that reports are formed with a reporting tool, from the data that lies in the Data Warehouse.

This is how OLTP works:

Reports are directly made from the data inside the database of the operational systems. Some operational systems come with their own reporting tool, but you can always use a standalone reporting tool to make reports form the operational databases.

Pro’s and Con’s

Data Warehousing

Pro’s:

  • There is no strain on the operational systems during business hours
    • As you can schedule the ETL processes to run during the hours the least amount of people are using the operational system, you won’t disturb the operational processes. And when you need to run a large query, the operational systems won’t be affected, as you are working directly on the Data Warehouse database.
  • Data from different systems can be combined
    • It is possible to combine finance and productivity data for example. As the ETL process transforms the data so it can be combined.
  • Data is optimized for making queries and reports
    • You use different data in reports than you use on a day to day base. A Data Warehouse is built for this. For instance: most Data Warehouses have a separate date table where the weekday, day, month and year is saved. You can make a query to derive the weekday from a date, but that takes processing time. By using a separate table like this you’ll save time and decrease the strain on the database.
  • Data is saved longer than in the source systems
    • The source systems need to have their old records deleted when they are no longer used in the day to day operations. So they get deleted to gain performance.

Con’s:

  • You always look at the past
    • A Data Warehouse is updated once a night, or even just once a week. That means that you never have the latest data. Staying with the hospital example: you never knew how many patients are in the hospital are right now. Or what surgeon didn’t show up on time this morning.
  • You don’t have all the data
    • A Data Warehouse is built for discovering trends, showing the big picture. The little details, the ones not used in trends, get discarded during the ETL process.
  • Data isn’t the same as the data in the source systems
    • Because the data is older than those of the source systems it will always be a little different. But also because of the Transformation step in the ETL process, data will be a little different. It doesn’t mean one or the other is wrong. It’s just a different way of looking at the data. For example: the Data Warehouse at the hospital excluded all transactions that were marked as cancelled. If you try to get the same reports from both systems, and don’t exclude the cancelled transactions in the source system, you’ll get different results.

online transactional processing (OLTP)

Pro’s

  • You get real time data
    • If someone is entering a new record now, you’ll see it right away in your report. No delays.
  • You’ve got all the details
    • You have access to all the details that the employees have entered into the system. No grouping, no skipping records, just all the raw data that’s available.

Con’s

  • You are putting strain on an application during business hours.
    • When you are making a large query, you can take processing space that would otherwise be available to the people that need to work with this system for their day to day operations. And if you make an error, by for instance forgetting to put a date filter on your query, you could even bring the system down so no one can use it anymore.
  • You can’t compare the data with data from other sources.
    • Even when the systems are similar. Like an HR system and a payroll system that use each other to work. Data is always going to be different because it is granulated on a different level, or not all data is relevant for both systems.
  • You don’t have access to old data
    • To keep the applications at peak performance, old data, that’s irrelevant to day to day operations is deleted.
  • Data is optimized to suit day to day operations
    • And not for report making. This means you’ll have to get creative with your queries to get the data you need.

So what method should you use?

That all depends on what you need at that moment. If you need detailed information about things that are happening now, you should use OLTP.
If you are looking for trends, or insights on a higher level, you should use a Data Warehouse.

 Related References

Databases – Database Isolation Level Cross Reference

Database Type Isolation Levels Cross Reference
Database And Tables

 

Here is a table quick reference of some common database and/or connection types, which use connection level isolation and the equivalent isolation levels. This quick reference may prove useful as a job aid reference, when working with and making decisions about isolation level usage.

Database isolation levels

Data sources

Most restrictive isolation level

More restrictive isolation level

Less restrictive isolation level

Least restrictive isolation level

Amazon SimpleDB

Serializable Repeatable read Read committed Read Uncommitted

dashDB

Repeatable read Read stability Cursor stability Uncommitted read

DB2® family of products

Repeatable read Read stability* Cursor stability Uncommitted read

Informix®

Repeatable read Repeatable read Cursor stability Dirty read

JDBC

Serializable Repeatable read Read committed Read Uncommitted

MariaDB

Serializable Repeatable read Read committed Read Uncommitted

Microsoft SQL Server

Serializable Repeatable read Read committed Read Uncommitted

MySQL

Serializable Repeatable read Read committed Read Uncommitted

ODBC

Serializable Repeatable read Read committed Read Uncommitted

Oracle

Serializable Serializable Read committed Read committed

PostgreSQL

Serializable Repeatable read Read committed Read committed

Sybase

Level 3 Level 3 Level 1 Level 0

 

Related References

 

Structured Query Language (SQL) Tuning

Tuning SQL is one of those skills, which is part art and part science.  However, there are a few fundamental approaches, which can help ensure optimal SQL select statement performance.

Structuring your SQL

By Structuring SQL Statements, much performance can be gained through good SQL statement organization and sound logic.

Where Clause Concepts:

Use criteria ordering and Set Theory thinking.  SQL  can be coupled with set-theory to aid conception of the operations being conducted. Order your selection criteria to execute criteria which arrives at the smallest possible row set first. Doing so reduces the volume of rows to be processed by follow-on operations. This does require an understanding of the data relationships to be effective.

Join Rules (equijoins, etc.)

When  constructing your joins, consider these rules:

  • Join on keys and indexed columns: The efficiency of your program improves when tables are joined based on indexed columns, rather than on non-indexed ones.
  • Use equijoins (=), whenever possible
  • Avoid using of sub-queries
  • Re-write EXISTS and NOT EXISTS subqueries as outer joins
  • Avoid OUTER Joins on fields containing nulls
  • Avoid RIGHT OUTER JOINS: Always select FROM your primary table (or derived table) and LEFT OUTER JOIN to auxiliary tables.
  • Use Joins Instead of Subqueries: A join can be more efficient than a correlated subquery or a subquery using IN. Use caution when specifying ORDER BY with a Join: When the results of a join must be sorted, limiting the ORDER BY to columns of a single table can cause the database to avoid a sort.
  • Provide Adequate Search Criteria: When possible, provide additional search criteria in the WHERE clause for every table in a join. These criteria are in addition to the join criteria, which are mandatory to avoid Cartesian products

Order of Operations SQL & “PEMDAS”

To improve your SQL, careful attention needs to be paid to the mathematical order of operations; especially, parentheses since they not only set the order of operation but also the boundaries of each subset operation.

  • PEMAS is “Parentheses, Exponents, Multiplication and Division, and Addition and Subtraction”.
  • Use parentheses () to group and specify the order of execution. SQL observes the normal rules of arithmetic operator precedence.
PrecedenceOperator(s)Operation(s)Notes
1( )ParenthesesIf the parentheses are nested, the expression in the innermost pair is evaluated first. If there are several un-nested parentheses, then parentheses are evaluated left to right.
2*
/
%
Multiplication
Division
Modulus
If there are several, evaluation is left to right.
3+
Addition
Subtraction
If there are several, evaluation is left to right.

Index Leveraging (criteria ordering, hints, append, etc.)

  • Avoid Full Table Scans: within the scope of a SQL statement, there are many conditions that will cause the SQL optimizer to invoke a full-table scan.  Avoid Queries:
  • with NULL Conditions (Is NUll, Is Not NUll)
  • Against Unindexed Columns
  • with Like Conditions
  • with Not Equals Condition (<>, !=, not in)
  • with use built-in Function (to_char, substr, decode, UPPER)
  • Use UNION ALL instead of UNION if business rules allow
  • UNION: Specifies that multiple result sets are to be combined and returned as a single result set. Query optimizer performs extra work to return to avoid duplicate rows.
  • UNION ALL: Incorporates all rows into the results. This includes duplicates. Query optimizer just needs to concatenate the result sets with no extra work
  • Use stored procedures instead of ad hoc queries when possible. Stored procedures are precompiled and cached
  • Avoid cursor use when possible
  • Select only the rows needed
  • Use NOLOCK hint in the select statement to avoid blocking
  • Commit transactions in smaller batches
  • Whenever possible use tables instead of views
  • Make sure comparison columns whether using JOIN or WHERE clause are exactly the same data type. For example, if we are comparing Varchar column to nchar columns the query optimizer has to do a CONVERT before comparing the values

Note: You do not necessarily need to remove all full table scans from your query’s execution plan. Tables with few rows, few columns, or thin columns may fit into few database blocks. In this case, a full table scan will always be the most efficient access