18 March 2024

Paging Support with the Space Iterator API

If a collection of entries must be returned from the Space, this is usually carried out using one of the readMultiple overloads in GigaSpace. However, if there are many matching entries, you may encounter several problems:

Memory usage - Both the server and client must allocate enough memory for the entire result set.
Latency - Because all the results are returned in a single batch, the client must wait until the final result arrives before it can process the first one.

A better approach is to create an iterator that iterates over the matching entries one at a time. Under the hood, the server returns the results in batches, and when the client's buffer is exhausted the next batch is implicitly returned from the server.
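The batching idea can be illustrated with a minimal, self-contained sketch. It is not the GigaSpaces implementation: BatchedIterator, serverOver and remoteCalls are hypothetical names, and the "server" is just an in-memory list. It only shows how hasNext()/next() can be served from a local buffer that is refilled one batch at a time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

/** Hypothetical stand-in for a batch-fetching iterator; not the GigaSpaces code. */
final class BatchedIterator<T> implements Iterator<T> {
    private final BiFunction<Integer, Integer, List<T>> fetchBatch; // (offset, size) -> batch
    private final int batchSize;
    private List<T> buffer = Collections.emptyList();
    private int posInBuffer = 0;
    private int nextOffset = 0;
    private boolean exhausted = false;
    private int remoteCalls = 0;

    BatchedIterator(BiFunction<Integer, Integer, List<T>> fetchBatch, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.fetchBatch = fetchBatch;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        // Refill only when the local buffer is drained and the source may have more.
        while (posInBuffer >= buffer.size() && !exhausted) {
            buffer = fetchBatch.apply(nextOffset, batchSize);
            remoteCalls++;
            posInBuffer = 0;
            nextOffset += buffer.size();
            // A short (or empty) batch means the source has nothing further to return.
            if (buffer.size() < batchSize) exhausted = true;
        }
        return posInBuffer < buffer.size();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffer.get(posInBuffer++);
    }

    /** Number of simulated round trips made so far. */
    int remoteCalls() { return remoteCalls; }

    /** Simulated "server": returns a slice of an in-memory list. */
    static <T> BiFunction<Integer, Integer, List<T>> serverOver(List<T> data) {
        return (offset, size) -> new ArrayList<>(
            data.subList(Math.min(offset, data.size()), Math.min(offset + size, data.size())));
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) data.add(i);
        BatchedIterator<Integer> it = new BatchedIterator<>(serverOver(data), 4);
        int count = 0;
        while (it.hasNext()) { it.next(); count++; }
        // prints "10 entries in 3 batch requests"
        System.out.println(count + " entries in " + it.remoteCalls() + " batch requests");
    }
}
```

In this sketch the client never holds more than one batch in memory, and the first entry is available as soon as the first batch arrives.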

Using the Space Iterator API

Use the GigaSpace.iterator(template) method to create an iterator over all the objects in the Space that match the template (either an SQLQuery or a template object). This returns a SpaceIterator<T>, which implements both Iterator<T> and Iterable<T>, and can be used to iterate over the results. For example:

private void demoForEach(GigaSpace gigaSpace) {
    // Match all MySpaceClass entries whose lastName is 'Smith'
    SQLQuery<MySpaceClass> query = new SQLQuery<>(MySpaceClass.class, "lastName = 'Smith'");
    AtomicInteger counter = new AtomicInteger();
    SpaceIterator<MySpaceClass> spaceIterator = gigaSpace.iterator(query);
    // Print a running count next to each matching entry's last name
    spaceIterator.forEach((e) -> System.out.println(counter.incrementAndGet() + " " + e.getLastName()));
}

Iterator Types

The Space Iterator API has two implementations, CURSOR and PREFETCH_UIDS. In theory, each hasNext()/next() invocation on the iterator requires at least one remote call, which is extremely inefficient. In practice, the iterator implementation uses different techniques to improve performance. Each iterator type uses a different optimization approach, and the advantages and disadvantages of each implementation are described below.

CURSOR is the default implementation, but you can explicitly select PREFETCH_UIDS when necessary.

SpaceIteratorType.CURSOR

The CURSOR implementation has the following characteristics:

The server-side iterator returns a batch of entries instead of a single entry. That batch is used implicitly to optimize hasNext()/next(). Users can control the batch size using SpaceIteratorConfiguration.setBatchSize().
When the Space iterator is initialized, it asynchronously requests a batch from all partitions, so that:
- As soon as any partition returns a result, the iterator can start serving entries.
- After that batch is consumed, the iterator can continue serving entries from other batches that arrived in the meantime.
When the Space iterator starts consuming entries from a partition batch, it implicitly sends an asynchronous request in the background to that partition for the next batch, which further reduces the time waiting for entries.
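The fan-out behavior described above can be sketched in plain Java. This is only an illustration under stated assumptions: PartitionFanOutSketch, Partition, Reply and drain are hypothetical names, partitions are in-memory lists, and a thread pool stands in for asynchronous remote requests. The sketch requests a first batch from every partition, serves whichever reply arrives first, and asks a partition for its next batch as soon as it starts consuming that partition's current one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PartitionFanOutSketch {
    /** Hypothetical partition that serves its entries in fixed-size batches. */
    static final class Partition {
        private final List<String> entries;
        private int offset = 0;
        Partition(List<String> entries) { this.entries = entries; }
        synchronized List<String> nextBatch(int batchSize) {
            int end = Math.min(offset + batchSize, entries.size());
            List<String> batch = new ArrayList<>(entries.subList(offset, end));
            offset = end;
            return batch;
        }
    }

    /** A batch together with the partition it came from. */
    static final class Reply {
        final Partition source;
        final List<String> batch;
        Reply(Partition source, List<String> batch) { this.source = source; this.batch = batch; }
    }

    /** Serves all entries from all partitions, in arrival order of the batches. */
    static List<String> drain(List<Partition> partitions, int batchSize) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        CompletionService<Reply> replies = new ExecutorCompletionService<>(pool);
        List<String> served = new ArrayList<>();
        try {
            int pending = 0;
            // Ask every partition for its first batch at the same time.
            for (Partition p : partitions) {
                replies.submit(() -> new Reply(p, p.nextBatch(batchSize)));
                pending++;
            }
            while (pending > 0) {
                Reply reply = replies.take().get(); // whichever batch arrives first
                pending--;
                if (reply.batch.isEmpty()) continue; // this partition is exhausted
                // Request the next batch in the background before consuming this one.
                replies.submit(() -> new Reply(reply.source, reply.source.nextBatch(batchSize)));
                pending++;
                served.addAll(reply.batch);
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("batch request failed", e);
        } finally {
            pool.shutdown();
        }
        return served;
    }

    public static void main(String[] args) {
        List<Partition> partitions = new ArrayList<>();
        partitions.add(new Partition(List.of("a1", "a2", "a3")));
        partitions.add(new Partition(List.of("b1", "b2")));
        // Order across partitions depends on which reply arrives first.
        System.out.println(drain(partitions, 2));
    }
}
```

Because only one request per partition is outstanding at a time, entries from the same partition keep their relative order in this sketch, while the interleaving across partitions is not deterministic.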

The CURSOR implementation has a short latency until the first entry is served, and requires only a small memory footprint on the client side. Both of these are independent of the number of matching entries. In addition, the CURSOR implementation places only a small workload on the Space, because a single batch is fetched from each partition at any given time.

If a primary Space fails, this implementation is not tolerant of failover, and entries will no longer be fetched from that partition. However, the Space iterator will continue to serve entries from other partitions.

SpaceIteratorType.PREFETCH_UIDS

The PREFETCH_UIDS implementation iterates over entries by first fetching the UIDs of all matching entries, and then fetching the actual entries in batches by UID.

This implementation is tolerant of failover; if a primary Space fails, it can continue to fetch entries from that partition when the backup takes over.
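The two-phase approach can be sketched as follows. This is a simplified illustration, not the GigaSpaces code: UidPrefetchSketch, matchingUids and fetchInBatches are hypothetical names, and the Space is modeled as an in-memory map from UID to last name. Phase one collects every matching UID up front; phase two fetches the entries for those UIDs one batch at a time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UidPrefetchSketch {
    /** Phase 1: collect the UIDs of all matching entries (hypothetical in-memory store). */
    static List<String> matchingUids(Map<String, String> lastNameByUid, String lastName) {
        List<String> uids = new ArrayList<>();
        for (Map.Entry<String, String> e : lastNameByUid.entrySet()) {
            if (e.getValue().equals(lastName)) uids.add(e.getKey());
        }
        return uids;
    }

    /** Phase 2: fetch the entries by UID, batchSize UIDs per request. */
    static List<List<String>> fetchInBatches(Map<String, String> lastNameByUid,
                                             List<String> uids, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < uids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, uids.size());
            List<String> batch = new ArrayList<>();
            for (String uid : uids.subList(from, to)) {
                batch.add(uid + "=" + lastNameByUid.get(uid));
            }
            batches.add(batch);
        }
        return batches;
    }

    public static void main(String[] args) {
        Map<String, String> store = new LinkedHashMap<>();
        store.put("u1", "Smith");
        store.put("u2", "Jones");
        store.put("u3", "Smith");
        store.put("u4", "Smith");
        store.put("u5", "Brown");

        List<String> uids = matchingUids(store, "Smith");
        // prints [[u1=Smith, u3=Smith], [u4=Smith]]
        System.out.println(fetchInBatches(store, uids, 2));
    }
}
```

In this model the client already holds the full UID list before any entry is fetched, so a later batch can be requested by UID from whichever instance currently serves the partition. The same up-front list is also one plausible reason for the higher first-entry latency and client memory use noted in the summary table.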

Summary of Implementation Characteristics

When deciding which implementation is preferable for your environment, you can reference the following table to determine which Space iterator type best suits your needs.

Characteristic	CURSOR	PREFETCH_UIDS
Low latency before first entry is served	Yes	No
Small memory footprint on client side	Yes	No
Light workload on Space	Yes	No
Supports primary/backup failover	No	Yes

Configuring the Space Iterator

Use the SpaceIteratorConfiguration class to modify the Space Iterator parameters, as shown in this syntax.

SpaceIterator<T> spaceIterator = gigaSpace.iterator(T template, SpaceIteratorConfiguration spaceIteratorConfiguration);

The available API parameters are described below.

Iterator Type

You can define the iterator implementation using the com.gs.iterator.type parameter. The default value is CURSOR.

Batch Size

Under the hood, the Space iterator API uses batching to fetch entries from the server. The API then serves the entries from the batch to the client on demand. The parameter is setBatchSize, and the default value is 1000.

Read Modifiers

Use the setReadModifiers parameter to configure the read isolation level for the Space iterator. The default value is REPEATABLE_READ.

For more information, see the Read Modifiers topic.

Maximum Inactivity Duration

When the client initiates the Space iterator, an iterator is created on the server side. In order to know when to close the iterator that was opened on the server side, there is a timer that measures the activity of the client-side iterator. If the client-side iterator is inactive for a specific period of time, the server-side iterator closes automatically. This parameter, setMaxInactiveDuration, has a default value of 1 minute.

The Space iterator implements the Java Closeable interface. When close() is invoked, the server-side iterators are also closed.

When using SpaceIteratorType.PREFETCH_UIDS, setMaxInactiveDuration is not supported.

Automatic Activity Renewal

The Space iterator on the server side is active throughout the lifetime of the iterator on the client side. When the Space iterator is initialized on the client side, a background thread starts that periodically renews the activity duration for the iterator on the server side. The period for this background task is half of the Maximum Inactivity Duration value; when using the default value for the setMaxInactiveDuration parameter, the task is triggered every 30 seconds.
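The relationship between the two durations is simple arithmetic, shown here as a small sketch. RenewalPeriod and renewalPeriod are hypothetical names used only for illustration; the only fact taken from the text above is that the renewal period is half of the Maximum Inactivity Duration.

```java
import java.time.Duration;

final class RenewalPeriod {
    /** Renewal period as described above: half of the maximum inactivity duration. */
    static Duration renewalPeriod(Duration maxInactiveDuration) {
        return maxInactiveDuration.dividedBy(2);
    }

    public static void main(String[] args) {
        // Default of 1 minute gives a renewal every 30 seconds: prints PT30S
        System.out.println(renewalPeriod(Duration.ofMinutes(1)));
        // A 5-minute setting gives a renewal every 2.5 minutes: prints PT2M30S
        System.out.println(renewalPeriod(Duration.ofMinutes(5)));
    }
}
```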

Transactions

Iterating through the matched set does not lock the objects. Objects that are under a transaction and match the specified template are not included in the matched set.

Space Iterator Configuration Examples

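The following example shows how to use the SpaceIteratorConfiguration class. In this example, the batch size, read modifiers, and maximum inactivity duration are set to non-default values, and the iterator type is set explicitly to CURSOR, which is also the default.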

gigaSpace.iterator(template,
    new SpaceIteratorConfiguration()
        .setBatchSize(500)
        .setIteratorType(SpaceIteratorType.CURSOR)
        .setReadModifiers(ReadModifiers.DIRTY_READ)
        .setMaxInactiveDuration(Duration.ofMinutes(5)));

The following examples show how to start the Space iterator with the default configuration, or with only the batch size and read modifiers specified.

// Starting the iterator with the default configuration
gigaSpace.iterator(template);

// Batch size only
gigaSpace.iterator(template, batchSize);

// Batch size and read modifiers
gigaSpace.iterator(template, batchSize, readModifier);

Limitations

The Space iterator only supports simple SQL queries. For more information about the differences between simple and complex queries, see the SQL Query API section of the SQL Query API topic.