Cluster Setup for InsightEdge
This topic explains how to install and run InsightEdge on a cluster.
Your cluster should consist of one master node and several slave nodes, configured as follows:
- Master nodes usually host the Spark master and the GigaSpaces Manager (for data grid management)
- Slave nodes host the Spark workers and data grid cluster members (Processing Unit instances)
Several environment variables must be set for your InsightEdge cluster to function correctly. These variables are located in the
$GS_HOME/bin/setenv-overrides.sh/bat file, and can be configured as described on the Configuration page of the Getting Started guide.
GS_MANAGER_SERVERS - Must be configured on every machine, and is required for the master node, which starts the GigaSpaces Manager along with Apache ZooKeeper for high availability. See the Manager page for more information.
GS_LOOKUP_GROUPS - Used to discover GigaSpaces components across the network.
GS_GSC_OPTIONS - Set this value based on the size of the JVMs that will host the Processing Unit instances; for example, you can configure the maximum heap size.
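For reference, a minimal setenv-overrides.sh might look like the following. The host name, lookup group, and heap size shown here are placeholder values for illustration; substitute the values for your own cluster:

```shell
# setenv-overrides.sh -- example values only; adjust for your environment

# Comma-separated list of the machines running the GigaSpaces Manager
export GS_MANAGER_SERVERS=master-1

# Lookup group used to discover GigaSpaces components on the network
export GS_LOOKUP_GROUPS=insightedge

# JVM options for the containers hosting Processing Unit instances,
# e.g. an 8 GB maximum heap
export GS_GSC_OPTIONS="-Xmx8g"
```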
The run-agent command automatically resolves which services to run on the current host. The resolution is based on the GS_MANAGER_SERVERS environment variable; if it is undefined, localhost is used as the server IP.
$GS_HOME/bin/gs.sh host run-agent --auto
This command runs a GigaSpaces Manager, the Web Management Console, a Spark master, a Spark worker, and the Zeppelin interpreter.
Master nodes consist of a GigaSpaces Manager and a Spark master. On each master node, run the following:
$GS_HOME/bin/gs.sh host run-agent --manager --spark-master
Slave nodes consist of GigaSpaces containers and a Spark worker. On each slave node, run the following:
$GS_HOME/bin/gs.sh host run-agent --spark-worker [--containers=n]
Use --containers=n to start n GigaSpaces containers on a specific machine. If not specified, no GigaSpaces containers will be started.
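For example, to start a Spark worker together with two GigaSpaces containers on a slave node (the container count of 2 here is purely illustrative; size it to your Processing Unit topology):

```shell
# Start a Spark worker and two GigaSpaces containers on this machine
$GS_HOME/bin/gs.sh host run-agent --spark-worker --containers=2
```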
After installation, you can verify that the Spark workers are up and running using the Spark master web UI, which is available by default at http://<master-node>:8080.
Finally, deploy the InsightEdge data grid (a space named insightedge-space):
# topology 2,1 starts 2 primary partitions with 1 backup partition for each primary
$GS_HOME/bin/gs.sh space deploy --partitions=2 --ha insightedge-space