18 March 2024

Application

This section lists recommendations for tuning an application that uses GigaSpaces, to boost its performance and improve its scalability.

Better Data Model
Use readById
Use Paging
Use Delta Update
Design Your Space Class
Make Proper Use of Indexes
Use Asynchronous Operation
Use Delta Read
Co-locate Data and Business Logic
Intelligent Partitioning
Blocking Take and Thread Consumption
Use Batch Operations
Use Transactions Cautiously
Query Optimizations
Use an Embedded Space if Possible
Distribute Data and User Requests among Several Partitions
Memory Usage Considerations
Using prepareTemplate for Efficient Query Execution
Determine Cache Size
Determine Database Connection Pools
Benchmarking Your Tuning

Better Data Model

Where possible, consider an embedded relationship model instead of separate space objects.

Use readById

Reading by ID avoids running a query. It accesses the data directly, without any broadcast calls to the entire cluster.

Use Paging

Use the IteratorBuilder when accessing a large number of objects.

Use Delta Update

Consider the Change operation when updating space objects.

Design Your Space Class

Pay attention to the size of your space class: do you really need all this information in the space? The bigger your space objects, the longer it takes to move them around, store them to disk, and fetch them back. Consider replacing a heavyweight blob field with a simple string URL, and use it later to fetch the content on demand. Contact GigaSpaces support for an example of this pattern. If you use user-defined classes for your Space Class fields, consider implementing java.io.Externalizable efficiently in these classes. This reduces the amount of data transferred over the network, saving both time and memory. Use binary serialization with large collections.
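As a minimal sketch of the Externalizable recommendation, the class below writes only its own fields in a fixed order instead of relying on default Java serialization. The `Address` class, its fields, and the `roundTrip` helper are hypothetical names chosen for illustration; they are not part of the GigaSpaces API, and the sketch assumes `city` is never null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical user-defined type used as a field of a space class.
class Address implements Externalizable {
    private String city;   // assumed non-null in this sketch
    private int zipCode;

    // Externalizable requires a public no-argument constructor.
    public Address() {
    }

    public Address(String city, int zipCode) {
        this.city = city;
        this.zipCode = zipCode;
    }

    public String getCity() {
        return city;
    }

    public int getZipCode() {
        return zipCode;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Write only the fields that are needed, in a fixed order.
        out.writeUTF(city);
        out.writeInt(zipCode);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Read the fields back in the same order they were written.
        city = in.readUTF();
        zipCode = in.readInt();
    }

    // Serializes and deserializes a copy to check that both methods agree.
    static Address roundTrip(Address original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Address) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Address copy = roundTrip(new Address("Haifa", 31000));
        System.out.println(copy.getCity() + " " + copy.getZipCode());
    }
}
```

The read and write methods must stay in step: adding a field to one without the other corrupts the stream, so a round-trip check like the one above is a cheap safeguard.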

Make Proper Use of Indexes

GigaSpaces includes a sophisticated built-in real-time indexing engine (regardless of whether the space is persistent or not) that maintains hash-based and B-tree-like indexes for each indexed Space Class attribute. If you store a large number of space objects of the same class type in the space, consider defining one or more indexes for attributes used with template matching or SQL Query. Defining indexes significantly improves the response time of read/take/readMultiple/takeMultiple/clear/count operations. Remember that indexes affect the response time of write and take operations, so choose your indexed fields carefully; each index has an overhead. GigaSpaces supports indexes for equality and comparison (greater than/less than) queries, with a Regular Index for a specific field and a Compound Index for multiple fields. Indexes can be defined for a root-level property of the space class or for a nested field, allowing you to query different types of objects ("join") using the same query without any performance penalty. For greater than/less than/between queries, use the Extended index.

Use Asynchronous Operation

Consider using asyncChange, the ONE WAY write modifier, and similar asynchronous options when available.

Use Delta Read

Consider using query projections to retrieve only the specific portions needed when reading space objects.

Co-locate Data and Business Logic

Implement a Task or Distributed Task for use with the execute operation, or use a co-located notify or polling container, to move the business logic to the data side. This avoids serialization and network usage.

Intelligent Partitioning

Partition data based on the business logic and not on some unique value. This allows the co-located logic to access its data without any network calls. If needed, run a local cache or local view to store "reference data" within each Processing Unit (PU) instance, together with the transactional data.
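The effect of routing on a business key can be sketched as follows. This is an illustration only: it assumes a simple hash-modulo scheme and does not reproduce the product's internal routing. The `RoutingSketch` class, the `partitionFor` method, and the account ID value are hypothetical.

```java
// Illustrative hash-modulo routing; not the product's internal implementation.
class RoutingSketch {

    // Maps a routing value to a partition index in the range [0, partitionCount).
    static int partitionFor(Object routingValue, int partitionCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(routingValue.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int partitions = 4;
        String accountId = "ACC-1001"; // hypothetical business key

        // An order and a payment that are both routed by the account ID
        // land in the same partition, so co-located logic sees both locally.
        int orderPartition = partitionFor(accountId, partitions);
        int paymentPartition = partitionFor(accountId, partitions);
        System.out.println("co-located: " + (orderPartition == paymentPartition));
    }
}
```

If the order were routed by its own unique order ID instead, it could land in a different partition from the payment, and logic that needs both would have to cross the network.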

Blocking Take and Thread Consumption

When performing blocking operations (read or take with a timeout greater than 0), set the operation timeout to a short duration (5-30 seconds) rather than FOREVER. This allows the space's internal thread pool to balance the different requests without exhausting the thread pool that serves pending operations.
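The pattern of waiting in short, bounded slices can be sketched with the standard library. Here a `java.util.concurrent.BlockingQueue` stands in for the space, and `poll` with a timeout stands in for a blocking take; the `ShortTimeoutTake` class, the method name, and the timeout values are assumptions made for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Stand-in sketch: a BlockingQueue plays the role of the space.
class ShortTimeoutTake {

    // Waits in short, bounded slices instead of blocking forever.
    // Returns the item, or null if nothing arrived within maxAttempts slices.
    static <T> T takeWithShortTimeouts(BlockingQueue<T> source,
                                       long timeoutMillis,
                                       int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                T item = source.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                if (item != null) {
                    return item;
                }
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return null;
            }
            // Between slices the caller regains control and can check a
            // shutdown flag or report progress before waiting again.
        }
        return null;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("order-1");
        // Hypothetical values: 5-second slices, up to 6 attempts.
        System.out.println(takeWithShortTimeouts(queue, 5_000, 6));
    }
}
```

Each slice releases the waiting resources when it expires, so a consumer that has gone idle does not hold them indefinitely.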

Use Batch Operations

Batch operations (writeMultiple, readMultiple, takeMultiple, clear, change) perform actions on groups of space objects in one call. Instead of paying a penalty for every space object (remote call, database access, and so on), you pay it only once. Try to design your hot spots around batch operations; this can improve your application performance drastically, by a factor of ten to fifty.

Use Transactions Cautiously

Each transaction has an overhead. Do not use read under a transaction if you do not have a very good reason to do so. Use non-transactional read instead. This reduces database access for persistent spaces and eliminates transaction locks. If you really need to do some operations inside a transaction, use batch operations with transactions.

Query Optimizations

When using the or logical operation together with and operations as part of your query (JDBC, JavaSpaces with SQLQuery), you can speed up query execution by adding the and conditions to each or condition. For example:

select uid,* from table where (A = 'X' or A = 'Y') and (B > '2000-10-1' and B < '2003-11-1')

executes much faster when rewritten as:

select uid,* from table where (A = 'X' and B > '2000-10-1' and B < '2003-11-1')
or (A = 'Y' and B > '2000-10-1' and B < '2003-11-1')
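When queries are assembled in code, the rewrite above can be applied mechanically. The following is a minimal sketch, assuming each condition is a simple comparison with no top-level or of its own; the `OrQueryRewriter` class and its `distribute` method are hypothetical helpers, not part of any GigaSpaces API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that copies the shared "and" conditions into each "or" branch.
class OrQueryRewriter {

    // Builds "(or1 and c1 and c2) or (or2 and c1 and c2) ...".
    // Assumes every condition is a simple comparison without a nested "or".
    static String distribute(List<String> orConditions, List<String> andConditions) {
        String common = String.join(" and ", andConditions);
        List<String> branches = new ArrayList<>();
        for (String orCondition : orConditions) {
            String branch = common.isEmpty()
                    ? orCondition
                    : orCondition + " and " + common;
            branches.add("(" + branch + ")");
        }
        return String.join(" or ", branches);
    }

    public static void main(String[] args) {
        String where = distribute(
                Arrays.asList("A = 'X'", "A = 'Y'"),
                Arrays.asList("B > '2000-10-1'", "B < '2003-11-1'"));
        System.out.println("select uid,* from table where " + where);
    }
}
```

The two forms are logically equivalent, so the rewrite changes only how the query is executed, not which objects match.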

Use an Embedded Space if Possible

If you access the space from a single JVM, or access it a large number of times from one JVM, use the embedded space mode. This eliminates the overhead of remote calls to the space. The slower your network is compared to other resources (for example, a disk), the more noticeable the improvement.

Distribute Data and User Requests among Several Partitions

A single machine is always limited in the amount of data and user requests it can handle. You can use data partitioning to distribute the data, with the calculation co-located with each partition. In more advanced scenarios, use the Master-Worker pattern to distribute the data and the calculation in a different ratio.

Memory Usage Considerations

Here are several guidelines to reduce the client and space server memory footprint:

To reduce memory consumption, store multiple long/integer space object attribute values in a single long/integer array. If you have a large number of space objects, this reduces the server footprint.
Use indexes only for attributes used for matching. Make sure your space uses the -1 value for the space implicit indexing property. This will ensure that indexes will be created upon request only.
Make sure the statistics filter is turned off.
Make sure all space workers are turned off.
Encapsulate all non-indexed fields in an inner custom class, and declare any wrapper-class fields (Integer, Long, and so on) in that inner class as primitive types (int, long).
Replace string space object fields with a custom implementation that supports only the basic ASCII subset (backed by bytes).
Replace string fields that have a small number of possible values (a source field, for instance) with an enum.
The index footprint can be reduced by directing the system to work in Economy mode. The downside of Economy mode is a performance penalty of up to 15% in embedded space operations. To work in Economy mode, set space-config.engine.use_economy_hashmap=true.
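The byte-backed string guideline can be sketched as below. The `AsciiString` class is a hypothetical example, not a GigaSpaces type; it stores one byte per character and rejects anything outside 7-bit ASCII. Note that on Java 9 and later the JVM's compact strings already store Latin-1 text in one byte per character, so the saving mainly applies to older JVMs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical ASCII-only string that uses one byte per character.
final class AsciiString implements CharSequence {
    private final byte[] bytes;

    AsciiString(String value) {
        byte[] encoded = new byte[value.length()];
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c > 127) {
                throw new IllegalArgumentException(
                        "Non-ASCII character at index " + i);
            }
            encoded[i] = (byte) c;
        }
        this.bytes = encoded;
    }

    // Used internally by subSequence; the array is already validated.
    private AsciiString(byte[] bytes) {
        this.bytes = bytes;
    }

    @Override
    public int length() {
        return bytes.length;
    }

    @Override
    public char charAt(int index) {
        // Every stored byte is in 0..127, so the cast cannot go negative.
        return (char) bytes[index];
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return new AsciiString(Arrays.copyOfRange(bytes, start, end));
    }

    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof AsciiString
                && Arrays.equals(bytes, ((AsciiString) other).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    public static void main(String[] args) {
        AsciiString source = new AsciiString("feed-A");
        System.out.println(source + " uses " + source.length() + " bytes");
    }
}
```

Implementing `equals` and `hashCode` matters if such a field is ever used for matching, since two instances with the same content must compare as equal.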

Using prepareTemplate for Efficient Query Execution

The prepareTemplate method creates a precompiled SQL template and stores it in a prepared template object, which is the result of the GigaSpace.prepareTemplate call. The returned result includes a GigaSpaces internal representation of the template object that does not need to undergo any inspection before it is sent to the GigaSpaces server.

The method returns an object you can use for subsequent matching.

Determine Cache Size

When using a persistent space and reusing data, you must take caching into account. The cache manager caches space objects for use and performs LRU (Least Recently Used) based cleanup on the cache, discarding the least recently used items first when the cache is full. When searching for a space object, the cache is searched first. Set the cache size to the number of space objects that your environment can reasonably hold resident in the JVM heap. This prevents unnecessary queries on your database. If you want the cache size to be based on the memory of the JVM running the space, you can use the memory usage options.
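A rough starting point for the cache size can be worked out from the heap size and the average object footprint. The sketch below is back-of-the-envelope arithmetic only: the `CacheSizeEstimate` class, the 50% heap share, and the 2 KB average object size are assumptions for illustration and should be replaced with figures measured in your own environment.

```java
// Back-of-the-envelope estimate of how many objects fit in the cache.
class CacheSizeEstimate {

    // maxHeapBytes:      maximum JVM heap size in bytes
    // cacheHeapFraction: share of the heap reserved for cached objects, in (0, 1]
    // avgObjectBytes:    measured average footprint of one space object
    static long estimateCacheSize(long maxHeapBytes,
                                  double cacheHeapFraction,
                                  long avgObjectBytes) {
        if (maxHeapBytes <= 0 || avgObjectBytes <= 0
                || cacheHeapFraction <= 0.0 || cacheHeapFraction > 1.0) {
            throw new IllegalArgumentException(
                    "Sizes must be positive and the fraction must be in (0, 1]");
        }
        long cacheBudgetBytes = (long) (maxHeapBytes * cacheHeapFraction);
        return cacheBudgetBytes / avgObjectBytes;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        // Hypothetical inputs: half the heap for the cache, 2 KB per object.
        long objects = estimateCacheSize(maxHeap, 0.5, 2048);
        System.out.println("Estimated cache size: " + objects + " objects");
    }
}
```

The remaining heap has to cover indexes, in-flight operations, and garbage collection headroom, which is why the estimate reserves only part of the heap for cached objects.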

Determine Database Connection Pools

When using a persistent space that a large number of users or threads access concurrently, each of them requires a database connection. Configure enough connections in the connection pool so that users are not blocked. Calculate the number of concurrent requests the space needs to handle based on the number of users that will access the space simultaneously.

Benchmarking Your Tuning

The Benchmark View provides a user interface for benchmarking the space.

For more details, refer to: