Cluster Setup for GigaSpaces

This topic explains how to install and run GigaSpaces on a cluster.

Starting a Whole Cluster

Your cluster should consist of one master node and several slave nodes, configured as follows:

Master nodes usually host the Spark master and the GigaSpaces Manager (for data grid management)
Slave nodes host the Spark workers and data grid cluster members (Processing Unit instances)

Several environment variables must be set for your GigaSpaces cluster to function correctly. They are defined in the $GS_HOME/bin/setenv-overrides.sh (or setenv-overrides.bat) file, and can be configured as described in the Configuration page of the Getting Started guide.

XAP_MANAGER_SERVERS - Must be configured on each machine and is required for the master node, which starts the GigaSpaces Manager along with Apache Zookeeper for high availability. See the Manager page for more information.
XAP_LOOKUP_GROUPS - This property is used to discover GigaSpaces components across the network.
XAP_GSC_OPTIONS - Set this value based on the size of the JVMs that will host the Processing Unit instances. For example, you can configure the amount of memory required as -Xmx5g -Xms5g.
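
As a rough sketch, the three variables above might be set in $GS_HOME/bin/setenv-overrides.sh as follows. The host name master1 and the group name my-cluster are hypothetical placeholders, not values from this guide; only the memory options come from the example above.

```shell
# Illustrative values only; replace them with your own hosts and sizes.

# Host that runs the GigaSpaces Manager ("master1" is an assumed name).
export XAP_MANAGER_SERVERS="master1"

# Lookup group used to discover GigaSpaces components ("my-cluster" is an assumed name).
export XAP_LOOKUP_GROUPS="my-cluster"

# JVM options for the containers that host the Processing Unit instances.
export XAP_GSC_OPTIONS="-Xmx5g -Xms5g"
```

The same values would need to be present on every machine in the cluster so that all agents resolve the same manager and lookup group.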

Starting a Cluster Locally

The run-agent command automatically resolves which services to run on the current host. The resolution is based on the XAP_MANAGER_SERVERS environment variable; when that variable is undefined, localhost is used as the server address.

$GS_HOME/bin/gs.sh host run-agent --auto

This command starts a GigaSpaces Manager, the Web Management Console, a Spark master, a Spark worker, and the Zeppelin interpreter.

REST URL - http://localhost:8090
Web Management Console - http://localhost:8099
Spark master - http://localhost:8080/
Spark worker - http://localhost:8081/
Zeppelin - http://localhost:9090/
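
One way to confirm that the local services have started is to probe each of the URLs listed above. The following is a minimal sketch that assumes curl is installed; the check_url helper is a hypothetical name introduced here and is not part of GigaSpaces.

```shell
# Hypothetical helper: report whether anything answers HTTP at the given URL.
# It only checks that a connection succeeds, not that the page content is correct.
check_url() {
  if curl --silent --max-time 5 --output /dev/null "$1"; then
    echo "UP $1"
  else
    echo "DOWN $1"
  fi
}

# Probe the default local endpoints listed above.
for url in \
  http://localhost:8090 \
  http://localhost:8099 \
  http://localhost:8080/ \
  http://localhost:8081/ \
  http://localhost:9090/
do
  check_url "$url"
done
```

Services can take some time to come up after run-agent is launched, so a DOWN result immediately after startup does not necessarily indicate a failure.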

Starting a Master Node

Master nodes consist of a GigaSpaces Manager and a Spark master. On each master node, run the following:

$GS_HOME/bin/gs.sh host run-agent --manager --spark-master

Starting Slave Nodes

Slave nodes consist of GigaSpaces containers and a Spark worker. On each slave node, run the following:

$GS_HOME/bin/gs.sh host run-agent --spark-worker [--containers=n]

Use --containers=n to start n GigaSpaces containers on that machine. If the option is omitted, no GigaSpaces containers are started.

After installation, you can verify that the Spark workers are up and running using the Spark master web UI at http://your-master-ip-here:8080.
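
The same check can be scripted instead of opened in a browser. The sketch below assumes curl is installed and relies on the JSON view that the Spark standalone master web UI serves under /json/; the function names are hypothetical, and the exact fields in the response depend on your Spark version.

```shell
# Hypothetical helper: build the URL of the Spark master's JSON status view.
# $1 is the master host, $2 is the web UI port (defaults to 8080).
spark_master_json_url() {
  printf 'http://%s:%s/json/\n' "$1" "${2:-8080}"
}

# Hypothetical helper: fetch the status document from the Spark master.
fetch_spark_master_status() {
  curl --silent --max-time 5 "$(spark_master_json_url "$@")"
}
```

For example, running fetch_spark_master_status your-master-ip-here should print a JSON document that describes the master, including the workers registered with it.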

Deploying an Empty Space