04 June 2024

JVM Settings and Tuning

A basic Space instance JVM (Java Virtual Machine) usually holds the following in memory: Space entries, indexes, Space class metadata, transactions, the replication redo log, lease data, statistics, and temporal data. The Space is where GigaSpaces data is stored: it is the logical cache that holds data objects in memory and might also hold them in tiered storage, with data from multiple SoRs consolidated as a unified data model.

In most cases, applications that use GigaSpaces run on machines with very fast CPUs, where the number of temporary objects created is too large for the JVM garbage collector to handle with its default settings. Careful tuning of the JVM is therefore very important to ensure stable and flawless behavior of the application.

JDK 17

GigaSpaces code is now compiled for Java 17, so Java 17 is the minimum version that can be used for a client.

Space Memory Footprint

It may be necessary to calculate the Space Object Footprint. For instructions on how to do this for XAP (GigaSpaces eXtreme Application Platform), refer to Capacity Planning.

A Compound Index can be used with AND queries to speed up the query execution time. This approach combines multiple fields into a single index. Using a Compound Index avoids having multiple indexes on multiple fields, which in turn can reduce the index footprint.

There are several types of storage optimization:

If there is still not enough room in RAM, consider using Tiered Storage. This option is not available for XAP.NET.

Customized Initial Load

The default Space Data source Initial Load behavior loads all Space class data into each partition, and later filters out irrelevant objects. This is the default behavior for non-numerical routing keys; for numerical routing keys, each partition loads only its related data. (Routing is the mechanism in charge of routing objects into and out of the corresponding partitions. It is based on a designated attribute of the objects written to the Space, called the Routing Index.) Loading everything and filtering afterwards may introduce a large amount of garbage to be collected. You can use the SQL MOD query to fetch only the relevant data items to be loaded into each partition, which speeds up the initial load time and drastically reduces the amount of garbage generated during this process. Refer to Custom Initial Load Queries for additional information.
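The MOD-based filtering described above can be sketched as follows. This is only an illustration of the idea, not the GigaSpaces API: the column name `customer_id` and the partition count are hypothetical, and the sketch assumes 1-based partition IDs where partition P owns the keys for which key mod N equals P - 1. Check Custom Initial Load Queries for the actual syntax and placeholders.

```shell
#!/bin/sh
# Hypothetical values: replace with your routing column and partition count.
ROUTING_COLUMN="customer_id"
NUM_PARTITIONS=4

# Print the MOD predicate for one partition.
# Assumption: partition IDs are 1-based and partition P owns keys where
# MOD(key, NUM_PARTITIONS) = P - 1.
initial_load_predicate() {
  partition_id=$1
  echo "MOD(${ROUTING_COLUMN}, ${NUM_PARTITIONS}) = $((partition_id - 1))"
}

# Show the predicate each partition would use to load only its own data.
p=1
while [ "$p" -le "$NUM_PARTITIONS" ]; do
  echo "partition ${p}: $(initial_load_predicate "$p")"
  p=$((p + 1))
done
```

Each partition then fetches only the rows it owns, instead of loading every row and discarding most of them.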

Redo Log Sizing

The redo log is a structure that stores all Space operations until the relevant replication targets acknowledge that they have received them.

There is a single redo log for all targets, which tracks the last key that each target has acknowledged (ACK).
Targets can be backups, mirrors, local views, durable notifications, and WAN Gateways.
The redo log can be stored both in memory and on disk.
The difference between the defined maximum capacity and the defined maximum memory capacity is the amount of data that might be written to disk.
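As a small worked example of the last point, the amount that might spill to disk is simply the difference between the two limits. The values below are hypothetical, and the sketch assumes both limits are expressed in the same unit (here, a number of redo log entries).

```shell
#!/bin/sh
# Hypothetical limits, both expressed as a number of redo log entries.
MAX_CAPACITY=1000000
MAX_MEMORY_CAPACITY=150000

# Whatever exceeds the in-memory limit, up to the overall limit, may go to disk.
MAX_ON_DISK=$((MAX_CAPACITY - MAX_MEMORY_CAPACITY))
echo "Up to ${MAX_ON_DISK} redo log entries might be written to disk"
```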

The amount of redo log data depends on the following:

Amount of in-flight activity
Backup performance
Primary backup connectivity (long disconnection means a lot of redo log data in memory).

The redo log swaps over to the hard disk at some point, so it is recommended to place it on an SSD drive. Do not use a regular hard drive to store redo log data. The redo log data footprint is similar to the actual raw data footprint without indexes.

When the target is a backup or a local view, it is recommended to keep the redo log in memory. Take the redo log size into account when you plan what will reside in RAM.

For additional information refer to Controlling the Replication Redo Log.

JVM Basic Settings

This section provides examples of the JVM settings that are recommended for applications that generate a large number of temporary objects. In such situations, you cannot afford long pauses due to garbage collection (GC) activity, in which the garbage collector reclaims memory that was allocated by the program but is no longer referenced. These settings are appropriate when you are running an IMDG (In-Memory Data Grid), or when the business logic and the data grid are co-located. For example, a data grid with co-located polling/notify containers, task executors, or service remoting.

Soft References in Java 17

Unlike in previous versions of Java, in Java 17 soft references cannot be set as they are handled internally.

Optimizing Memory Usage

In Java 17, the -XX:+UseCompressedOops option is enabled by default; the JVM turns it off automatically once the maximum heap size reaches roughly 32 GB. It is an important feature for optimizing memory usage in Java applications, especially those with large heaps. It provides memory efficiency benefits without significant performance impact.

MaxDirectMemorySize

The -XX:MaxDirectMemorySize option in Java 17 allows you to specify the maximum amount of memory that can be allocated for direct buffers used by the java.nio package, such as ByteBuffer.allocateDirect(). This memory is allocated outside of the Java heap, often by the operating system, and is not subject to garbage collection by the JVM.

Purpose

Direct memory is used for operations that require interaction with native code or I/O operations, such as reading or writing data from files, network sockets, or using certain APIs like NIO channels. By default, the maximum direct memory size is limited by the maximum heap size (-Xmx), but -XX:MaxDirectMemorySize allows you to specify a separate limit.

Usage

Specifying Maximum Direct Memory Size: You can specify the maximum amount of direct memory in bytes, kilobytes, megabytes, or gigabytes. For example:

java -XX:MaxDirectMemorySize=1G -jar your-application.jar

Default Value: If you don't specify -XX:MaxDirectMemorySize, the maximum direct memory size defaults to the same value as the maximum heap size (-Xmx).
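A minimal sketch of setting an explicit direct memory limit alongside the heap size, so that the default (equal to -Xmx) does not apply. The sizes and the `your-application.jar` name are placeholders, not recommendations.

```shell
#!/bin/sh
# Hypothetical sizes: choose values that fit your machine and workload.
HEAP_SIZE="8g"
DIRECT_MEMORY_SIZE="1g"

# Without -XX:MaxDirectMemorySize, the direct memory limit would default
# to the maximum heap size (8g here).
JAVA_OPTS="-Xmx${HEAP_SIZE} -XX:MaxDirectMemorySize=${DIRECT_MEMORY_SIZE}"

# Print the resulting command line instead of running it.
echo "java ${JAVA_OPTS} -jar your-application.jar"
```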

Considerations

Impact on System Resources: Direct memory is allocated outside of the Java heap and is managed by the operating system. Allocating too much direct memory can impact system resources, potentially leading to out-of-memory errors or decreased performance.

Use Cases: Use direct memory for operations that require interaction with native code or I/O operations. Examples include reading or writing large files, working with network sockets, or using NIO channels.

Conclusion

The -XX:MaxDirectMemorySize option allows you to specify the maximum amount of memory that can be allocated for direct buffers in Java. It's useful for controlling the usage of direct memory, especially in scenarios involving I/O operations or interaction with native code. Be mindful of the potential impact on system resources and allocate direct memory judiciously based on your application's requirements.

Extra Memory

Extra memory is the memory required for NIO direct memory buffers, JIT code cache, classloaders, socket buffers (receive/send), JNI, and GC internal information. Direct memory buffer usage for socket buffers on the GSC (Grid Service Container) side is calculated as follows:

com.gs.transport_protocol.lrmi.maxBufferSize X com.gs.transport_protocol.lrmi.max-threads

For example, with the default maxBufferSize (64 KB) and 100 threads:

64 KB X 100 = 6400 KB (about 6.25 MB)

With large objects and batch operations (readMultiple, writeMultiple, Space Iterator) increasing the maxBufferSize may improve system performance.
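The estimate above can be reproduced with a few lines of shell arithmetic, which is handy when you want to see the effect of a larger maxBufferSize before changing it. The function name and the 128 KB value are illustrative assumptions; the 64 KB default and the 100 threads come from the example above.

```shell
#!/bin/sh
# Inputs from the example above: 64 KB buffers and 100 threads.
MAX_BUFFER_SIZE_KB=64
MAX_THREADS=100

# usage: estimate_direct_kb <buffer_size_kb> <threads>
# Returns maxBufferSize multiplied by max-threads, in KB.
estimate_direct_kb() {
  echo $(( $1 * $2 ))
}

DIRECT_KB=$(estimate_direct_kb "$MAX_BUFFER_SIZE_KB" "$MAX_THREADS")
echo "Estimated socket buffer direct memory: ${DIRECT_KB} KB"

# Hypothetical tuning: doubling the buffer size doubles the estimate.
echo "With 128 KB buffers: $(estimate_direct_kb 128 "$MAX_THREADS") KB"
```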

Capturing Detailed Garbage Collection Statistics

In Java 17, the default Garbage Collector (GC) is the G1 Garbage Collector. This makes it a good starting point for most applications unless specific requirements dictate otherwise.

As with any Java application, the Garbage Collector (GC) should be fine-tuned, and care should be taken to select a GC algorithm that meets your requirements. It is recommended to enable verbose GC logging and analyze the GC log files in order to tune the system optimally.

For more information about garbage collection in Java-based systems, see Oracle’s Garbage Collection Tuning Guide for Java 17.

Selecting the Appropriate GC Algorithm:

To enable a specific GC algorithm, you can use JVM options when starting your Java application.

Option	Recommendation	Rationale	Configuration
1. General Purpose Use	G1 GC	Balanced performance and pause times. Suitable for a wide range of applications.	No special configuration needed as it is the default.
2. Low Latency Requirements	ZGC or Shenandoah GC	Both provide very low pause times, suitable for applications where latency is a critical factor.	ZGC: Enable with `-XX:+UseZGC`; Shenandoah: Enable with `-XX:+UseShenandoahGC`
3. Throughput-Centric Applications	Parallel GC	Maximizes throughput, suitable for batch processing or applications where throughput is more critical than pause times.	Enable with `-XX:+UseParallelGC`

From the options above, GigaSpaces recommends using option 2 (ZGC or Shenandoah GC).
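One way to keep the GC choice explicit in a startup script is to map a deployment profile to the matching JVM flag. This is a sketch only: the profile names and the `GC_PROFILE` variable are hypothetical, while the flags themselves are the standard HotSpot options listed above (with `-XX:+UseG1GC` naming the default collector explicitly). Shenandoah could be substituted for ZGC with `-XX:+UseShenandoahGC`, provided your JDK build includes it.

```shell
#!/bin/sh
# Hypothetical profile name; defaults to the low-latency option.
GC_PROFILE="${GC_PROFILE:-low-latency}"

# Map a profile to the JVM flag that enables the corresponding collector.
select_gc_flag() {
  case "$1" in
    general)     echo "-XX:+UseG1GC" ;;
    low-latency) echo "-XX:+UseZGC" ;;
    throughput)  echo "-XX:+UseParallelGC" ;;
    *) echo "unknown GC profile: $1" >&2; return 1 ;;
  esac
}

GC_FLAG="$(select_gc_flag "$GC_PROFILE")"
JAVA_OPTS="${JAVA_OPTS:-} ${GC_FLAG}"
echo "JAVA_OPTS=${JAVA_OPTS}"
```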

How to Define a Verbose Garbage Collection

The following example, and the explanation after it, show how to enable verbose garbage collection logging, disable explicit GC calls, and generate a heap dump when an out-of-memory error occurs.

GigaSpaces strongly recommends that ALL the options explained below should be set.

#!/bin/bash
LOG_DIR="/path/to/logs"
JAVA_OPTS="-Xlog:gc*:file=${LOG_DIR}/gc_%p.log:tags,uptime,time,level:filecount=10,filesize=10M"

Explanation of Options

-Xlog:gc*: Enables logging of all garbage collection-related events.
file=/path/to/logs/gc_%p.log: Specifies the path and name pattern for the log file, where %p is replaced with the PID.
tags,uptime,time,level: Specifies the decorators to include in each log entry:
- tags: Include the tags associated with the log message.
- uptime: Include the JVM uptime in the log entries.
- time: Include the timestamp in the log entries.
- level: Include the log level (e.g., info, warning, error).

filecount=10: Specifies the number of log files to keep.
filesize=10M: Specifies the maximum size of each log file.

-XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/heapdumps/ -Dgs.gc.collectionTimeThresholdWarning=1000

Detailed Explanation of Options

-XX:+DisableExplicitGC:
- This option disables explicit garbage collection calls (System.gc()) made by the application or third-party libraries. This can help in avoiding performance issues caused by frequent full GCs triggered explicitly.

-XX:+HeapDumpOnOutOfMemoryError:
- This option enables the JVM to generate a heap dump when it encounters an OutOfMemoryError. A heap dump is a snapshot of the JVM’s memory at a specific point in time and is useful for diagnosing memory leaks and other memory-related issues.
-XX:HeapDumpPath=/path/to/heapdumps/:
- This option specifies the directory where the heap dump file should be saved. Make sure the specified directory exists and the JVM has write permissions to this directory.

-Dgs.gc.collectionTimeThresholdWarning=1000:
- This option specifies that if a garbage collection takes longer than 1 second (1000 ms), a warning is added to the process log.
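Because all of these options are meant to be set together, it can help to assemble them in one place. The sketch below combines the options explained above into a single JAVA_OPTS value. The directory paths are the same placeholders used above, the grouping variables are illustrative, and it is an assumption that your startup script passes JAVA_OPTS to the JVM.

```shell
#!/bin/sh
# Placeholder directories: both must exist and be writable by the JVM.
LOG_DIR="/path/to/logs"
HEAP_DUMP_DIR="/path/to/heapdumps"

# Verbose GC logging with rotation (10 files of 10 MB each).
GC_LOG_OPTS="-Xlog:gc*:file=${LOG_DIR}/gc_%p.log:tags,uptime,time,level:filecount=10,filesize=10M"

# Ignore System.gc() calls and capture a heap dump on OutOfMemoryError.
SAFETY_OPTS="-XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${HEAP_DUMP_DIR}/"

# Warn in the process log when a GC takes longer than 1000 ms.
GS_OPTS="-Dgs.gc.collectionTimeThresholdWarning=1000"

JAVA_OPTS="${GC_LOG_OPTS} ${SAFETY_OPTS} ${GS_OPTS}"
export JAVA_OPTS
echo "${JAVA_OPTS}"
```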