XAP

Data Query Language (DQL) Guidelines

Guidelines for Structuring a Data Query using the JDBC V3

The purpose of these tips is to enable fetching from each partition a minimal number of objects and filtering out what we can at the partition level.

If one of two table being joined is very small, instead of distributing it between partitions, we can have a copy of it in each partition enabling us to cut the data that is brought in from the larger table.

  • Have an understanding of the service will be asking the query against and determine if more conditions should be added per table.

  • Preparing an execution plan takes time. By using a prepared statement instead of hard-coded parameters we can reuse the same plan as it is kept in the cache.

  • Use Explain Analyze for debugging/troubleshooting to understand the execution flow and expected time-line. This will also help to improve your query.

  • Make sure each table's conditions are included in the select query that goes with it. This way, each table will have its conditions stated clearly, helping to get only the necessary records.

  • Avoid a full scan of tables by adding indexes which allow faster access. Choose indexes per each table according to condition.

  • When conditions are related to 2 tables as t1.f1 > t2.f3 add the condition to the join: ON ti.f1=t2.f2 and t1.f1 > t2.f3

  • Consider using broadcast tables for small tables. Also, use colocated filters and colocated joins where suitable. To allow colocated joins with small tables. A colocated join will copy the smaller table in join to all partitions - consider if the small table of the join is indeed small. In the future, routingClosed The mechanism that is in charge of routing the objects into and out of the corresponding partitions. The routing is based on a designated attribute inside the objects that are written to the Space, called the Routing Index. properties will also be used to optimize joins.

  • Imagine you have large amount of data that is split into several smaller amounts, called partitions. Now, let's say you want to combine information from two different tables, but both of these tables are also big and split into partitions.

    Normally, when you're combining data from two tables, you can filter out the records you don't need based on certain conditions. But if these tables are both split into partitions and spread out, it becomes tricky. You can't directly filter out records from each partition based on the conditions you want to apply for the join. Instead, you have to bring all the data from these partitions to your client and then do the filtering there. It's like gathering all the puzzle pieces onto your table before you can start putting the puzzle together.

 

The following table shows how to enhance basic database queries using a GigaSpaces query using JDBCClosed Java DataBase Connectivity. This is an application programming interface (API) for the Java programming language, which defines how a client may access a database. V3:

Database Query GigaSpaces Query

Select t1.f1, t1,f2, t2.f3,t2.f4,t2.f5

from table1 t1

Join table2 t2

On t1.f1=t2.f1

Where t1.f2=X and t2.f3<Y

Select t1.f1, t1,f2, t2.f3,t2.f4 ,t2.f5 From (Select f1, f2 from table1 where f2=?) AS t1

Join (Select f1,f2,f3 from table2 where f3<?) t2

On t1.f1=t2.f1

 

In the case where the tables are large and condition placed upon table 1 cut out most of the rows

 

 

 

 

 

 

 

 

  1. Select f1 from t1 where f2=? -> list of f1

  2. Select t1.f1, t1,f2, t2.f3,t2.f4 ,t2.f5

From (Select f1, f2 from table1 where f2=?) AS t1

Join (Select f1,f2,f3 from table2 where f1 in (list of f1) and f3<?) t2

On t1.f1=t2.f1

The 'in' query should be replaced with a constant number of ? which is the max number expected. It should also be set with values from query 1 and a placeholder such as (1,2,3,0,0,0)