22 July 2024

Tiered Storage Configuration for Service Grid

For a Tiered Storage Automatically assigns data to different categories of storage types based on considerations of cost, performance, availability, and recovery. overview refer to Intelligent Tiering Overview.

Initial Load

The initial loading of data into each tier can be a time-consuming process, starting with the System of Record or other external data sources. Additionally, data in both tiers can be updated during the normal course of operations during the day.

During the normal use of Tiered Storage, priority data is stored in the Hot tier, and other data (except transient) is stored in the Warm tier Part of GigaSpaces Tiered Storage Mechanism. Warm tier is SSD (recommended media) storage which is mostly used for data that is read-only and is less frequently used.. If there is data in the Warm tier, it will be faster to recover it rather than recovering it from an external database.

Initial load from external data source will be performed as usual if the local Database (DB) is empty (empty = no metadata).
If the local DB is not empty, the data of the primary will be loaded from the local disk and backup will perform recovery from primary (delete disk data in init).
We use ZooKeeper (ZK) persistent to store the last primary of each partition and where each instance should be located, so election will be performed as before.
pu This is the unit of packaging and deployment in the GigaSpaces Data Grid, and is essentially the main GigaSpaces service. The Processing Unit (PU) itself is typically deployed onto the Service Grid. When a Processing Unit is deployed, a Processing Unit instance is the actual runtime entity..autogenerated-instance-sla="true" to allow auto generation of sla per instance.
In the case of deploy/undeploy with a different number of partitions, an API needs to be called to clean the internal database and configuration, otherwise the initial load will fail.

Using data persistency, data in both tiers is stored in a system repository. When the data is undeployed, the user has an option to persist the data. Subsequent redeployment of the data is performed at optimum speed.

Log: Source of Data

In the primary Space Where GigaSpaces data is stored. It is the logical cache that holds data objects in memory and might also hold them in layered in tiering. Data is hosted from multiple SoRs, consolidated as a unified data model. log:

Recovery from tier layer

Data source recovery

Entries found in warm tier:

Entries inserted to hot tier:

Total Time:

This data can be compared with primary log of the initial run Recovery from external data source:

Entries found in data source

Entries inserted to Space:

Entries ignored:

Total Time:

Transient Objects

Transient objects are not kept in SQLite but can still be mirrored to an external source.
- In this case, after undeploy+deploy, the transient objects will not be in the Space.

If it is required to load from an external source, delete pu metadata & data.

Tiered Storage Configuration for Space Types

Registering a Space Type with Tiered Storage Configuration

In the Java example below, take note of the highlighted section.

SpaceTypeDescriptor typeDescriptorDoc1 = 
   new SpaceTypeDescriptorBuilder("SpaceTypeExample")
	.addFixedProperty("id", Integer.class)
	.addFixedProperty("field1", String.class)
	.addFixedProperty("field2", String.class)
	.addFixedProperty("field3", String.class)
	.addFixedProperty("expireDate", LocalDateTime.class)
	.idProperty("id")
	.supportsDynamicProperties(false)
	.setTieredStorageTableConfig(
		new TieredStorageTableConfig()
			>.setName("SpaceTypeExample")// set to space type name
			>.setCriteria("field1='1'")// see examples below
	)
	.create()

Tiered Storage Policies in Type Registration

Example

Hot/Warm Tiers Distribution

new TieredStorageTableConfig() .setName("SpaceTypeExample") .setCriteria("field1='1'") Note: Supported operators: = != >= <= and or

Criteria-Based

Hot Tier Part of GigaSpaces Tiered Storage Mechanism. Hot Tier is RAM and is used for priority data and read/write operations that require the fasted access	Matching provided criteria
Warm Tier	All

new TieredStorageTableConfig() .setName("SpaceTypeExample") .setTimeColumn("expireDate") .setPeriod(Duration.ofMinutes(60))

Time-Based

Hot Tier	Matching condition expireDate+60min<now
Warm Tier	All

Note: Only the initial value of the expireDate column will define Hot Tier retention of an object.

new TieredStorageTableConfig() .setName("SpaceTypeExample") .setTransient(true)

Memory Only

Hot Tier	All
Warm Tier	None

null (*) or no call to: .setTieredStorageTableConfi

Warm Tier Only

Hot Tier	None
Warm Tier	All

new TieredStorageTableConfig() .setName("SpaceTypeExample") .setCriteria("all")

Hot and Warm

Hot Tier	All
Warm Tier	All

Underploy-Deploy

In the case of deploying using a different topology:

The Undeployed Processing Units (PUs)

The undeployed services can be viewed using the GET /pus/undeployed REST API REpresentational State Transfer. Application Programming Interface An API, or application programming interface, is a set of rules that define how applications or devices can connect to and communicate with each other. A REST API is an API that conforms to the design principles of the REST, or representational state transfer architectural style..

An example of sample output is shown below.

 [
				{
				"name": "space-10-SNAPSHOT",
				"unDeployedAt": "2021-08-03T12:18:25.437",
				"isPersistent": true,
				"gracefulShutdown": true,
				"lastPrimaries": [
				{
				"partitionId": 1,
				"insatnceId": "1_1"
				}
				],
				"spaceInstancesHosts": [
				{
				"instanceId": "1_0",
				"host": "127.0.1.1"
				},
				{
				"instanceId": "1_1",
				"host": "127.0.1.1"
				}
				],
				"schema": "partitioned",
				"numOfInstances": 1,
				"backupsPerPartition": 1
				}

Deleting Previous Data and Metadata

Safe Undeploy: Ensure all Data in Disk is Updated

Added options to undeploy: DrainMode and DrainTimeout

NONE - make no attempt to wait for all data to be drained before shutting down
REQUIRED - make an attempt to wait for all data to be drained before shutting down,
shutdown even if drain wasn't completed before drainTimeout
ATTEMPT - make an attempt to wait for all data to be drained before shutting down,
shutdown will not occur if drain wasn't completed before drainTimeout

Admin API:

pu.undeployAsync(new UndeployOptions(DrainMode.ATTEMPT,60000L));

CLI:

./gs.sh service undeploy --drain-mode=ATTEMPT --drain-timeout=60000ms test.

If drain was not completed with mode attempt, the PU will be at quiesce status.

Even if undeploy did not drain all, data is considered as consistent and will recover from disk in the case where it is not empty. The same goes for failover with async replication.

Undeploy - REST

The data of undeployed services can be deleted using the DELETE /pus/undeployed/{id} REST REpresentational State Transfer. Application Programming Interface An API, or application programming interface, is a set of rules that define how applications or devices can connect to and communicate with each other. A REST API is an API that conforms to the design principles of the REST, or representational state transfer architectural style. API.

Limitations and Recommendations

Supported	Not Supported
Broadcast Table: Recommended all (both disk and cache). At this stage, primitive property types and selected Java classes are supported in Tiered Storage. Only native value types (integer, string, double, byte, date types, array, etc.). SpaceDocument: Only fixed properties. The number of backups per partition is zero or one. Transaction Support: Local transactions are supported Local View. Kubernetes An open-source container orchestration system for automating software deployment, scaling, and management of containerized applications.. Dynamic Indexes.	Storage Optimization. Tiered Storage policies cannot be changed after Space Type registration. Inheritance. FIFO FIFO is an acronym for first in, first out, a method for organizing the manipulation of a data structure where the oldest entry, or "head" of the queue, is processed first.. Lease (for non-transient types). Cannot have both time and criteria for the same type. Compound Indexes (id is the exception). Scale In/Out.

Supported

Not Supported

Broadcast Table: Recommended all (both disk and cache).
At this stage, primitive property types and selected Java classes are supported in Tiered Storage.
Only native value types (integer, string, double, byte, date types, array, etc.).
SpaceDocument: Only fixed properties.
The number of backups per partition is zero or one.
Transaction Support: Local transactions are supported Local View.
Kubernetes An open-source container orchestration system for automating software deployment, scaling, and management of containerized applications..
Dynamic Indexes.

Storage Optimization.
Tiered Storage policies cannot be changed after Space Type registration.
Inheritance.
FIFO FIFO is an acronym for first in, first out, a method for organizing the manipulation of a data structure where the oldest entry, or "head" of the queue, is processed first..
Lease (for non-transient types). Cannot have both time and criteria for the same type. Compound Indexes (id is the exception).
Scale In/Out.