GigaSpaces Considerations

The Runtime Environment (GSA, LUS, GSM and GSC)

In a dynamic environment where you want to start GSCs and GSMs remotely, manually, or dynamically, the GSA is the only component that should be running on the machine hosting the GigaSpaces runtime environment. This lightweight service acts as an agent, and starts a GSC, GSM, or LUS when needed.

Plan the initial number of GSCs and GSMs based on the application memory footprint and the amount of processing you might need. The most basic deployment should include 2 GSMs (running on different machines), 2 Lookup Services (running on different machines), and 2 GSCs running on each machine. These host your data grid or any other application components (services, web servers, a mirror service) that you deploy.
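A minimal sketch of this basic setup, assuming each of the two machines runs a single agent that starts all three component types:

# On each of the two machines: one LUS, one GSM, and two GSCs
gs-agent.sh --lus=1 --gsm=1 --gsc=2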

In general, the total number of GSCs that should run across the machines hosting the system depends on the application's memory footprint and the amount of processing it requires.

The recommended number of GSCs per machine is half the machine's total CPU cores, with each GSC having a maximum heap size of no more than 10GB.
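For example, applying this rule of thumb to a 16-core machine gives at most 8 GSCs; a sketch, assuming the full 10GB heap per GSC:

# 16 CPU cores => up to 8 GSCs, each capped at a 10GB heap
export GS_GSC_OPTIONS='-Xmx10g'
gs-agent.sh --gsc=8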

Configuring the Runtime Environment

JVM parameters (system properties, heap settings, etc.) that are shared between all components are best set using the GS_OPTIONS_EXT environment variable. GSA-specific JVM parameters can be easily passed using GS_GSA_OPTIONS, which is appended to GS_OPTIONS_EXT. As good practice, add all the component environment variables (GS_GSA_OPTIONS, GS_GSM_OPTIONS, GS_GSC_OPTIONS, GS_LUS_OPTIONS) to the GSA script or a wrapper script, and the values will be passed to the corresponding components.

Linux:

#Wrapper Script
export GS_GSA_OPTIONS='-Xmx256m'
export GS_GSC_OPTIONS='-Xmx2048m'
export GS_GSM_OPTIONS='-Xmx1024m'
export GS_LUS_OPTIONS='-Xmx1024m'

#call gs-agent.sh
. ./gs-agent.sh

Windows:

@rem Wrapper Script
@set GS_GSA_OPTIONS=-Xmx256m
@set GS_GSC_OPTIONS=-Xmx2048m
@set GS_GSM_OPTIONS=-Xmx1024m
@set GS_LUS_OPTIONS=-Xmx1024m

@rem call gs-agent.bat
call gs-agent.bat

The LUS configuration above (a 1GB maximum heap) will serve up to 50 partitions running on 100 GSCs. For larger environments, you must increase the heap size and perform GC tuning.

Running Multiple Groups

You may have a set of LUS/GSMs managing GSCs associated with a specific group. To "break" your network into two groups, start the GigaSpaces runtime environment as follows:

# Start a LUS and a GSM for GroupX
export GS_LOOKUP_GROUPS=GroupX
gs-agent.sh --lus=1 --gsm=1
# Start 4 GSCs in GroupX
export GS_LOOKUP_GROUPS=GroupX
gs-agent.sh --gsc=4
# Start a LUS and a GSM for GroupY
export GS_LOOKUP_GROUPS=GroupY
gs-agent.sh --lus=1 --gsm=1
# Start 2 GSCs in GroupY
export GS_LOOKUP_GROUPS=GroupY
gs-agent.sh --gsc=2
# Deploy a 4-partition space into GroupX
export GS_LOOKUP_GROUPS=GroupX
gs deploy-space -cluster schema=partitioned total_members=4 spaceX
# Deploy a 2-partition space into GroupY
export GS_LOOKUP_GROUPS=GroupY
gs deploy-space -cluster schema=partitioned total_members=2 spaceY

Running Multiple Locators

You may have a set of LUS/GSMs managing GSCs associated with a specific locator. To "break" your network into two groups using different lookup locators, start the GigaSpaces runtime environment as follows:

# Start a LUS listening on unicast port 8888, plus a GSM
export GS_LUS_OPTIONS=-Dcom.sun.jini.reggie.initialUnicastDiscoveryPort=8888
export GS_LOOKUP_LOCATORS=127.0.0.1:8888
export GS_OPTIONS_EXT=-Dcom.gs.multicast.enabled=false
gs-agent.sh --lus=1 --gsm=1
# Start 4 GSCs that discover via the locator on port 8888
export GS_LOOKUP_LOCATORS=127.0.0.1:8888
export GS_OPTIONS_EXT=-Dcom.gs.multicast.enabled=false
gs-agent.sh --gsc=4
# Start a second LUS listening on unicast port 9999, plus a GSM
export GS_LUS_OPTIONS=-Dcom.sun.jini.reggie.initialUnicastDiscoveryPort=9999
export GS_LOOKUP_LOCATORS=127.0.0.1:9999
export GS_OPTIONS_EXT=-Dcom.gs.multicast.enabled=false
gs-agent.sh --lus=1 --gsm=1
# Start 2 GSCs that discover via the locator on port 9999
export GS_LOOKUP_LOCATORS=127.0.0.1:9999
export GS_OPTIONS_EXT=-Dcom.gs.multicast.enabled=false
gs-agent.sh --gsc=2
# Deploy a 4-partition space via the locator on port 8888
export GS_LOOKUP_LOCATORS=127.0.0.1:8888
gs deploy-space -cluster schema=partitioned total_members=4 spaceX
# Deploy a 2-partition space via the locator on port 9999
export GS_LOOKUP_LOCATORS=127.0.0.1:9999
gs deploy-space -cluster schema=partitioned total_members=2 spaceY

In addition to the Lookup Service, there is an alternative way to export the Space proxy: via the RMI registry (JNDI). It is started by default within any JVM running a GSC/GSM, and by default uses port 10098 and above. This option should be used only in special cases where there is no way to use the default Lookup Service. Because this is the usual RMI registry, it suffers from known problems, such as being non-distributed and not highly available.

The Lookup Service runs by default as a standalone JVM process started by the GSA. You can also embed it to run together with the GSM. In general, you should run two Lookup Services per system. Running more than two Lookup Services may cause increased overhead, due to the chatter and heartbeat mechanism between each service and the Lookup Services that signals the service's existence.

Zones

Zones allow you to "label" running GSCs before starting them. Zones should be used to isolate applications and a data grid running on the same network. They are designed to let users deploy a processing unit into a specific set of GSCs, where all of the GSCs share the same set of LUSs and GSMs.


The Zone property can be used, for example, to deploy your data grid only into GSCs labeled with specific zones. The zone is specified prior to GSC startup, and cannot be changed after the GSC has been started.

Verify that you have an adequate number of GSCs running before deploying an application whose SLA specifies a specific zone.

To use zones when deploying your PU, do the following:

export GS_OPTIONS_EXT="-Dcom.gs.zones=webZone ${GS_OPTIONS_EXT}"
gs-agent.sh --gsc=2
gs deploy -zones webZone myWar.war

Running Multiple Zones

You may have one set of LUS/GSMs managing multiple zones (recommended), or a separate LUS/GSM set per zone. If a single LUS/GSM set manages multiple zones, run them as follows:

# Start a LUS and a GSM, shared by all zones
gs-agent.sh --lus=1 --gsm=1
# Start 4 GSCs labeled with zoneX
export GS_OPTIONS_EXT="-Dcom.gs.zones=zoneX ${GS_OPTIONS_EXT}"
gs-agent.sh --gsc=4
# Start 2 GSCs labeled with zoneY
export GS_OPTIONS_EXT="-Dcom.gs.zones=zoneY ${GS_OPTIONS_EXT}"
gs-agent.sh --gsc=2
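A processing unit can then be targeted at either zone using the same -zones deploy option shown earlier; a sketch, assuming hypothetical PU files myPuX.jar and myPuY.jar:

# Deploy each PU only into GSCs labeled with the matching zone
gs deploy -zones zoneX myPuX.jar
gs deploy -zones zoneY myPuY.jar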

Runtime File Location

GigaSpaces generates some files while the system is running. You can change the location of the generated files using the following system properties:

com.gigaspaces.logger.RollingFileHandler.filename-pattern
The location of the log files and their file name pattern. Default: <GS_HOME>\logs

com.gs.deploy
The location of the GSM's deploy directory. Default: <GS_HOME>\deploy

com.gs.work
The location of the work directory of the GSM and GSC. Because this directory is critical to proper system function, it should be set to local storage, to avoid failures when remote storage becomes unreachable due to network problems. Default: <GS_HOME>\work

user.home
The location of the system defaults configuration, used by the GigaSpaces Management Center and runtime system components. Default: the user's home directory

com.gigaspaces.lib.platform.ext
The shared class-loader libraries folder for PUs. JARs in this folder are loaded once into the JVM system class loader and shared between all the PU instance class loaders within the GSC. In most cases this is a better option than com.gs.pu-common for JDBC drivers and other third-party libraries. It is useful when you want multiple processing units to share the same third-party JAR files and do not want to repackage the processing unit JAR whenever one of these third-party JARs changes. Default: <GS_HOME>\lib\platform\ext

com.gs.pu-common
The location of common classes used across multiple processing units. Libraries in this folder are loaded into each PU instance class loader (and not into the system class loader, as with com.gigaspaces.lib.platform.ext). Default: <GS_HOME>\lib\optional\pu-common

com.gigaspaces.grid.gsa.config-directory
The location of the GSA configuration files. The GSA manages different process types; each process type is defined in this folder in an XML file that identifies the process type by its name. Default: <GS_HOME>\config\gsa

java.util.logging.config.file
The path to the Java logging configuration file. Use it to enable the finest logging for troubleshooting various GigaSpaces services. You can control this setting via the GS_LOGS_CONFIG_FILE environment variable. Default: <GS_HOME>\config\log\xap_logging.properties

You can use com.gigaspaces.lib.platform.ext and com.gs.pu-common to decrease deployment time if your processing unit contains many third-party JAR files. Otherwise, each GSC downloads the processing unit JAR file (along with all the JARs it depends on) from the GSM to its local working directory. In large deployments spanning tens or hundreds of GSCs, this can be very time-consuming. In such cases, consider placing the JARs your processing unit depends on in a shared location on your network, and pointing the com.gs.pu-common or com.gigaspaces.lib.platform.ext directory to that location.
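A minimal sketch, assuming the shared JARs have been copied to /mnt/shared/pu-common (a hypothetical network path):

# Point all GSCs at the shared third-party JARs instead of downloading them from the GSM
export GS_OPTIONS_EXT="-Dcom.gs.pu-common=/mnt/shared/pu-common ${GS_OPTIONS_EXT}"
gs-agent.sh --gsc=2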

PU Packaging and CLASSPATH

User PU Application Libraries

A Processing Unit JAR file or a web application WAR file should include (within its lib folder) all the JARs required by the application. Resource files should be placed within one of the JAR files under the PU JAR's lib folder. In addition, the PU JAR should include the pu.xml file within the META-INF\spring folder. To close LRMI threads when closing the application, use LRMIManager.shutdown().
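For example, a typical PU JAR layout (myPU.jar and my-app-classes.jar are hypothetical names) looks like this:

myPU.jar
|-- META-INF/spring/pu.xml
|-- lib/my-app-classes.jar    (application classes and resource files)
|-- lib/...                   (third-party JARs required by the application)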

Data Grid PU Libraries

When deploying a data grid PU, it is recommended to include all space classes and their dependency classes as part of the PU JAR file. This PU JAR should include a pu.xml file within the META-INF\spring folder, containing the space declarations and the relevant tuning parameters.
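Deploying the packaged data grid PU then uses the standard deploy command; a sketch, assuming a hypothetical myDataGridPU.jar:

# The space declarations and tuning parameters are read from META-INF/spring/pu.xml inside the JAR
gs deploy myDataGridPU.jar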

GigaSpaces Management Center Libraries

The GigaSpaces Management Center has been deprecated and will be removed in a future release.

It is recommended to include all space classes and their dependency classes in the GS-UI CLASSPATH. This ensures that you can query the data via the GigaSpaces Management Center. To set the GigaSpaces Management Center classpath, set the POST_CLASSPATH variable to the application JAR locations before calling the GS-UI script.
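A minimal sketch, assuming the application classes are packaged in /opt/myapp/lib/my-app-classes.jar (a hypothetical path) and the Management Center is launched via a gs-ui.sh script:

# Add the application JARs to the GS-UI classpath before launching
export POST_CLASSPATH=/opt/myapp/lib/my-app-classes.jar
./gs-ui.sh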

To avoid packaging the same libraries with each processing unit running within the GSC, place common libraries (such as JDBC drivers, logging libraries, and Hibernate libraries with their dependencies) in the <GS_HOME>\lib\optional\pu-common folder. You can specify the location of this folder using the com.gs.pu-common system property.

Space Memory Management

The Space supports two memory management modes: ALL_IN_CACHE, where all data is kept in memory, and LRU, which evicts the least recently used objects.

When running with ALL_IN_CACHE, the memory manager does the following:

  • Stops clients from writing data into the Space when the JVM's utilized memory crosses the WRITE threshold (a percentage of the maximum heap size).

  • Throws a MemoryShortageException back to the client when the JVM's utilized memory crosses the high_watermark_percentage threshold.

When running with ALL_IN_CACHE, ensure that the default memory management parameters are tuned according to the JVM heap size. A large heap (over 2GB) requires special attention. Here is an example of memory manager settings for a 10GB heap:

<os-core:embedded-space id="space" space-name="mySpace" >
    <os-core:properties>
        <props>
            <prop key="space-config.engine.memory_usage.high_watermark_percentage">95</prop>
            <prop key="space-config.engine.memory_usage.write_only_block_percentage">94</prop>
            <prop key="space-config.engine.memory_usage.write_only_check_percentage">93</prop>
            <prop key="space-config.engine.memory_usage.low_watermark_percentage">92</prop>
        </props>
    </os-core:properties>
</os-core:embedded-space>

Distributing the Primary Spaces

By default, when running GSCs on multiple machines and deploying a Space with backups, GigaSpaces tries to provision primary Spaces to all available GSCs across all machines. Set the max-instances-per-vm and max-instances-per-machine deploy parameters when deploying your data grid, to determine how the deployed processing unit (e.g., a Space) is provisioned into the different running GSCs.

The number of backups per partition is zero or one.

Without max-instances-per-vm and max-instances-per-machine set, GigaSpaces might provision the primary and backup instances of the same partition into GSCs running on the same physical machine. To avoid this, set max-instances-per-vm=1 and max-instances-per-machine=1. This ensures that the primary and backup instances of the same partition are provisioned into different GSCs running on different machines. If only one machine is running GSCs and max-instances-per-machine=1, backup instances are not provisioned.

Here is an example of how to deploy a data grid with 4 partitions and a backup per partition (a total of 8 Spaces), with 2 Spaces per GSC, and the primary and backup of each partition running on different machines (even when other GSCs are running):

# 4 partitions with 1 backup each (8 instances total); at most 2 instances per GSC,
# and a partition's primary and backup are never placed on the same machine
gs deploy-space -cluster schema=partitioned-sync2backup total_members=4,1 \
   -max-instances-per-vm 2 -max-instances-per-machine 1 MySpace

Log Files

GigaSpaces generates log files for each running component, including the GSA, GSC, GSM, Lookup Service, and client-side components. By default, the log files are created within the <GS_HOME>\logs folder. Over time, you may end up with a large number of files that are difficult to maintain and search, so it is recommended to back up or delete old log files. You can use the logging backup policy to manage your log files.
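For example, a simple housekeeping sketch for Linux (the 14-day retention period is an arbitrary choice, and GS_HOME is assumed to point at the installation root):

# Delete GigaSpaces log files older than 14 days
find "$GS_HOME/logs" -type f -name '*.log' -mtime +14 -delete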