20 May 2024

Sample Queries using Explain Analyze and Explain Plan

Reference

Before reading this page, please familiarize yourself with the Guidelines for Structuring a Data Query using the JDBC V3.

Overview

In the PostgreSQL Explain Analyze This is a profiling tool for your queries that will show you where the time is spent on your query and why. The ANALYZE option causes the statement to be actually executed, not only planned. The statistics listed depend on the query type and the execution engine used. It provides provides execution statistics such as the execution times and row counts. and Explain Plan This is a quick, simple lightweight SQL tool that shows you a compiled plan in tabular form without executing it. It is a tool or function provided by the DBMS (Database Management System) that makes an execution plan visible. This will show index usage and how the scans of the exection will appear. examples provided on this page, the following terminology is utilized:

Terminology	Usage
INDEX Scan	This technique involves scanning data based on a particular index. If a condition is present and the corresponding property is indexed, this scanning method should be employed.
FULL Scan	This approach entails a comprehensive scan of the entire related table, If there is a filter fetching only some of the table data, considers adding a related index to improve performance. Executing Explain Analyze provides details of the time it takes to retrieve records from the table and how many records are fetched in that time. For example: (time=6ms rows=514). Efforts should be made to reduce query execution time by leveraging indexes. Additionally, optimize the query's table definition to minimize the number of rows returned, ensuring all filtering conditions are properly applied. Refer to Data Query Guidelines for additional tips.
Broadcast Table (collocated joins)	When dealing with a small table intended for joining with others, it is advisable to define it as a broadcast table. In this scenario, rather than transferring all the data to the client and executing the join there, the join operation occurs within each partition, minimizing unnecessary data movement. Refer to Broadcast Objects for additional information.

Terminology

Usage

INDEX Scan

This technique involves scanning data based on a particular index. If a condition is present and the corresponding property is indexed, this scanning method should be employed.

FULL Scan

This approach entails a comprehensive scan of the entire related table, If there is a filter fetching only some of the table data, considers adding a related index to improve performance.

Executing Explain Analyze provides details of the time it takes to retrieve records from the table and how many records are fetched in that time. For example: (time=6ms rows=514).
Efforts should be made to reduce query execution time by leveraging indexes. Additionally, optimize the query's table definition to minimize the number of rows returned, ensuring all filtering conditions are properly applied. Refer to Data Query Guidelines for additional tips.

Broadcast Table (collocated joins)

When dealing with a small table intended for joining with others, it is advisable to define it as a broadcast table. In this scenario, rather than transferring all the data to the client and executing the join there, the join operation occurs within each partition, minimizing unnecessary data movement. Refer to Broadcast Objects for additional information.

To streamline data processing and boost query performance, adhere to these principles: maintain a lean row count in your tables and leverage indexes effectively for filtering data

If you have multiple conditions on a table, the index chosen will be the one with the minimal size between basic indexes. Refer to Indexing Overview for additional information.

Sample Queries

Below are several examples of queries using PostgreSQL Explain Analyze and Explain Plan. Each query is followed by its analysis.

Query 1:

This query retrieves all Customers who have the lastName = 'Levi' and displays the result in descending order by firstName field, NULLS LAST for order tells us that if firstName will be null, such values will be shown last in response. As far as there is EQUALS index on lastName field, index is used for getting data.

explain analyze for select * from  "com.gs.Customer" where lastName = Levi order by firsName DESC NULLS LAST

PLAN (Explain Analyze) Analysis:
order by [firsName DESC-nulls-last]    (time=1ms rows=2) 
	table= com.gs.Customer    (time=1ms rows=2)
		Server-Side Execution:
			projections= [birthday, firsName, id, lastName]
			filter= (lastName = Levi)
			access= INDEX SCAN
				partitions: [1, 2]
					selected index:
						- (lastName = Levi), size=1

Query 2:

This Explain Analyze shows that there will be two FULL SCANS on tables Product and Purchase and all result sets from both will be taken to the client side and Equi JOIN (hash join algorithm will be used for match comparison) will be performed on the client side.

explain analyze for select P.id as product_id, P.name, P.price, sum(PU This is the unit of packaging and deployment in the GigaSpaces Data Grid, and is essentially the main GigaSpaces service. The Processing Unit (PU) itself is typically deployed onto the Service Grid. When a Processing Unit is deployed, a Processing Unit instance is the actual runtime entity..amount) as purchase_amount, sum(PU.amount)*P.price as total from "com.gs.Purchase" 
as PU LEFT JOIN "com.gs.Product" as P ON PU.productId=P.id group by P.id,P.name,P.price

PLAN (Explain Analyze) Analysis:
projections= [id, name, price, purchase_amount, TIMES(purchase_amount, price)]    (time=8ms rows=10)
	aggregate by [SUM(amount) AS purchase_amount]    (time=8ms rows=10)
	group by [id, name, price]
	join (type= LEFT, algorithm= Hash join, Equi Join)    (time=7ms rows=514)
	  │ conditions= EQUALS(productId, id)
	  │ dynamic collocated join disabled (candidate=com.gs.Product)
	  	└─┌projections= [amount, productId]    (time=6ms rows=514)
	      │  table= com.gs.Purchase
	      │    Server-Side Execution:
	      │      projections= [amount, productId]
	      │      access= FULL SCAN
	      └projections= [id, name, price]    (time=1ms rows=10)
	      table= com.gs.Product
		  Server-Side Execution:
		  projections= [id, name, price]
		  access= FULL SCAN

Query 3:

In this Explain Analyze, the Department table is a broadcast table. This means that every partition in our distributed system has a full copy of the Department table. The query fetches products whose prices are above a certain threshold and retrieves the related department details. If the expensive products were only in a few departments, we could fetch only the relevant department details. Additionally, if we add a condition on the department, the server can filter the products, reducing the amount of data sent to the client. If the Department table was not broadcast, we couldn't do this server-side filtering and would have to send all the data to the client and filter it there.

explain analyze for select  D.name as Department, P.name,  P.price from (Select name,price,depId  from "com.gs.Product"  where price > 100 ) 
AS P  Join "com.gs.Department" D  ON P.depId=D.id
				
PLAN (Explain Analyze) Analysis:
join (type= INNER, algorithm= Hash join, Equi Join)    (time=11ms rows=6)
 │ projections= [name0, name, price]
 │ conditions= EQUALS(depId, id)
 │ static collocated join (broadcast={com.gs.Department})
 │ dynamic collocated join disabled (candidate=com.gs.Product)
 └─┌projections= [name, price, depId]    (time=9ms rows=6)
   │  table= com.gs.Product
   │    Server-Side Execution:
   │      projections= [name, price, depId]
   │      filter= (price > 100.0)
   │      access= INDEX SCAN
   │        partitions: [1, 2]
   │          selected index:
   │            - (price > 100.0), size=unknown
   └table= com.gs.Department    (time=0ms rows=4)
		Server-Side Execution:
		projections= [id, name]
		access= FULL SCAN

Query 4:

This plan shows us, that there is INDEX scan for the Product table by using the price field, that should have ORDERED index (otherwise index would not be used). This INDEX scan was executed for 1 ms and respond with 6 rows to the client side. There was FULL SCAN for table Department, it was executed for 1 ms and respond with 4 rows to the client side. And then JOIN operation was performed using hash join algorithm for comparing the id field from Department with depId field from Product. This operation took 1 ms and in response 2 rows were retrieved.

explain analyze for select D.id, D.name  from "com.gs.Department" as D  WHERE EXISTS (select P.name  from "com.gs.Product" as 
P where  P.depId=D.id and P.price > 100?)
					
PLAN (Explain Analyze) Analysis:
join (type= SEMI, algorithm= Hash join, Equi Join)    (time=1ms rows=2)
 │ conditions= EQUALS(id, depId)
 └─┌table= com.gs.Department    (time=0ms rows=4)
   │  Server-Side Execution:
   │    projections= [id, name]
   │    access= FULL SCAN
   └table= com.gs.Product    (time=1ms rows=6)
		Server-Side Execution:
		projections= [depId, id, name, price]
		filter= (price > 100.0)
		access= INDEX SCAN
		partitions: [1, 2]
		selected index:
		- (price > 100.0), size=unknown

Query 1:

explain plan for select * from  "com.gs.Customer" where lastName = ? order by firsName DESC NULLS LAST

PLAN (Explain Plan) Analysis:
order by [firsName DESC-nulls-last]
	table= com.gs.Customer
		Server-Side Execution:
			projections= [birthday, firsName, id, lastName]
			filter= (lastName = Levi)
			access= INDEX SCAN
				partitions: [1, 2]
					selected index:
					- (lastName = Levi), size=1

Query 2:

explain plan for select P.id as product_id, P.name, P.price, sum(P.amount) as purchase_amount, sum(PU.amount)*P.price as total from "com.gs.Purchase" as 
PU  LEFT JOIN "com.gs.Product" as P ON P.productId=P.id group by P.id,P.name,P.price

PLAN (Explain Plan) Analysis:
projections= [id, name, price, purchase_amount, TIMES(purchase_amount, price)]
	aggregate by [SUM(amount) AS purchase_amount]
	group by [id, name, price]
		join (type= LEFT, algorithm= Hash join, Equi Join)
		  │ conditions= EQUALS(productId, id)
		  │ dynamic collocated join disabled (candidate=com.gs.Product)
		  └─┌projections= [amount, productId]
			│  table= com.gs.Purchase
			│    Server-Side Execution:
			│      projections= [amount, productId]
			│      access= FULL SCAN
			└projections= [id, name, price]
			  table= com.gs.Product
			  Server-Side Execution:
			  projections= [id, name, price]
		  	  access= FULL SCAN

Query 3:

explain plan for select  D.name as Department, P.name,  P.price from (Select name,price,depId  from "com.gs.Product"  where price > ? ) AS P  Join "com.gs.Department" D  ON P.depId=D.id
						
PLAN (Explain Plan) Analysis:
join (type= INNER, algorithm= Hash join, Equi Join)
  │ projections= [name0, name, price]
  │ conditions= EQUALS(depId, id)
  │ static collocated join (broadcast={com.gs.Department})
  │ dynamic collocated join disabled (candidate=com.gs.Product)
  └─┌projections= [name, price, depId]
    │  table= com.gs.Product
	│    Server-Side Execution:
	│      projections= [name, price, depId]
	│      filter= (price > 100.0)
	│      access= INDEX SCAN
	│        partitions: [1, 2]
	│          selected index:
	│            - (price > 100.0), size=unknown
	└table= com.gs.Department
		Server-Side Execution:
		projections= [id, name]
		access= FULL SCAN

Query 4:

Select D.id, D.name  from "com.gs.Department" as D  WHERE EXISTS (select P.name  from "com.gs.Product" as P where  P.depId=D.id and P.price > ?)

PLAN (Explain Plan) Analysis:
join (type= SEMI, algorithm= Hash join, Equi Join)
  │ conditions= EQUALS(id, depId)
  └─┌table= com.gs.Department
	│  Server-Side Execution:
	│    projections= [id, name]
	│    access= FULL SCAN
	└table= com.gs.Product
		Server-Side Execution:
		projections= [depId, id, name, price]
		filter= (price > 100.0)
		access= INDEX SCAN
		partitions: [1, 2]
		selected index:
		- (price > 100.0), size=unknown