Sample Queries using Explain Analyze and Explain Plan

Reference

Before reading this page, please familiarize yourself with the Guidelines for Structuring a Data Query using the JDBC V3.

Overview

In the PostgreSQL Explain AnalyzeClosed and Explain PlanClosed examples provided on this page, the following terminology is utilized:

Terminology Usage
INDEX Scan This technique involves scanning data based on a particular index. If a condition is present and the corresponding property is indexed, this scanning method should be employed.
FULL Scan

This approach entails a comprehensive scan of the entire related table, If there is a filter fetching only some of the table data, considers adding a related index to improve performance.

  • Executing Explain Analyze provides details of the time it takes to retrieve records from the table and how many records are fetched in that time. For example: (time=6ms rows=514).

  • Efforts should be made to reduce query execution time by leveraging indexes. Additionally, optimize the query's table definition to minimize the number of rows returned, ensuring all filtering conditions are properly applied. Refer to Data Query Guidelines for additional tips.

Broadcast Table (collocated joins) When dealing with a small table intended for joining with others, it is advisable to define it as a broadcast table. In this scenario, rather than transferring all the data to the client and executing the join there, the join operation occurs within each partition, minimizing unnecessary data movement. Refer to Broadcast Objects for additional information.

To streamline data processing and boost query performance, adhere to these principles: maintain a lean row count in your tables and leverage indexes effectively for filtering data

If you have multiple conditions on a table, the index chosen will be the one with the minimal size between basic indexes. Refer to Indexing Overview for additional information.

Sample Queries

Below are several examples of queries using PostgreSQL Explain Analyze and Explain Plan. Each query is followed by its analysis.

Query 1:

This query retrieves all Customers who have the lastName = 'Levi' and displays the result in descending order by firstName field, NULLS LAST for order tells us that if firstName will be null, such values will be shown last in response. As far as there is EQUALS index on lastName field, index is used for getting data.

explain analyze for select * from "com.gs.Customer" where lastName = Levi order by firsName DESC NULLS LAST PLAN (Explain Analyze) Analysis: order by [firsName DESC-nulls-last] (time=1ms rows=2) table= com.gs.Customer (time=1ms rows=2) Server-Side Execution: projections= [birthday, firsName, id, lastName] filter= (lastName = Levi) access= INDEX SCAN partitions: [1, 2] selected index: - (lastName = Levi), size=1

 

Query 2:

This Explain Analyze shows that there will be two FULL SCANS on tables Product and Purchase and all result sets from both will be taken to the client side and Equi JOIN (hash join algorithm will be used for match comparison) will be performed on the client side.

explain analyze for select P.id as product_id, P.name, P.price, sum(PU This is the unit of packaging and deployment in the GigaSpaces Data Grid, and is essentially the main GigaSpaces service. The Processing Unit (PU) itself is typically deployed onto the Service Grid. When a Processing Unit is deployed, a Processing Unit instance is the actual runtime entity..amount) as purchase_amount, sum(PU.amount)*P.price as total from "com.gs.Purchase" as PU LEFT JOIN "com.gs.Product" as P ON PU.productId=P.id group by P.id,P.name,P.price PLAN (Explain Analyze) Analysis: projections= [id, name, price, purchase_amount, TIMES(purchase_amount, price)] (time=8ms rows=10) aggregate by [SUM(amount) AS purchase_amount] (time=8ms rows=10) group by [id, name, price] join (type= LEFT, algorithm= Hash join, Equi Join) (time=7ms rows=514) │ conditions= EQUALS(productId, id) │ dynamic collocated join disabled (candidate=com.gs.Product) └─┌projections= [amount, productId] (time=6ms rows=514)table= com.gs.Purchase │ Server-Side Execution: │ projections= [amount, productId] │ access= FULL SCAN └projections= [id, name, price] (time=1ms rows=10) table= com.gs.Product Server-Side Execution: projections= [id, name, price] access= FULL SCAN

Query 3:

In this Explain Analyze, the Department table is a broadcast table. This means that every partition in our distributed system has a full copy of the Department table. The query fetches products whose prices are above a certain threshold and retrieves the related department details. If the expensive products were only in a few departments, we could fetch only the relevant department details. Additionally, if we add a condition on the department, the server can filter the products, reducing the amount of data sent to the client. If the Department table was not broadcast, we couldn't do this server-side filtering and would have to send all the data to the client and filter it there.

explain analyze for select D.name as Department, P.name, P.price from (Select name,price,depId from "com.gs.Product" where price > 100 ) AS P Join "com.gs.Department" D ON P.depId=D.id PLAN (Explain Analyze) Analysis: join (type= INNER, algorithm= Hash join, Equi Join) (time=11ms rows=6) │ projections= [name0, name, price] │ conditions= EQUALS(depId, id) │ static collocated join (broadcast={com.gs.Department}) │ dynamic collocated join disabled (candidate=com.gs.Product) └─┌projections= [name, price, depId] (time=9ms rows=6)table= com.gs.Product │ Server-Side Execution: │ projections= [name, price, depId] │ filter= (price > 100.0) │ access= INDEX SCAN │ partitions: [1, 2] │ selected index: │ - (price > 100.0), size=unknown └table= com.gs.Department (time=0ms rows=4) Server-Side Execution: projections= [id, name] access= FULL SCAN

Query 4:

This plan shows us, that there is INDEX scan for the Product table by using the price field, that should have ORDERED index (otherwise index would not be used). This INDEX scan was executed for 1 ms and respond with 6 rows to the client side. There was FULL SCAN for table Department, it was executed for 1 ms and respond with 4 rows to the client side. And then JOIN operation was performed using hash join algorithm for comparing the id field from Department with depId field from Product. This operation took 1 ms and in response 2 rows were retrieved.

explain analyze for select D.id, D.name from "com.gs.Department" as D WHERE EXISTS (select P.name from "com.gs.Product" as P where P.depId=D.id and P.price > 100?) PLAN (Explain Analyze) Analysis: join (type= SEMI, algorithm= Hash join, Equi Join) (time=1ms rows=2) │ conditions= EQUALS(id, depId) └─┌table= com.gs.Department (time=0ms rows=4) │ Server-Side Execution: │ projections= [id, name] │ access= FULL SCAN └table= com.gs.Product (time=1ms rows=6) Server-Side Execution: projections= [depId, id, name, price] filter= (price > 100.0) access= INDEX SCAN partitions: [1, 2] selected index: - (price > 100.0), size=unknown