GigaSpaces has a Python API available via PySpark. Its functionality is limited to the DataFrame API.

Interactive Use

You can use the following tools to analyze data interactively in Python:

Zeppelin Notebook

To develop notebooks in Python, use the %pyspark interpreter in the Zeppelin web notebook. See the InsightEdge Python example notebook for reference.

Command Line Shell

To start the command line shell, run the ./bin/insightedge-pyspark script in the GigaSpaces directory.

For example, start the GigaSpaces demo:

$GS_HOME/bin/gs.sh demo
$GS_HOME\bin\gs demo

Then start the command line shell:

$GS_HOME/insightedge/bin/insightedge-pyspark --master spark://
$GS_HOME\insightedge\bin\insightedge-pyspark --master spark://

Saving and Loading DataFrames in Python

To operate on GigaSpaces DataFrames, use the regular PySpark DataFrame API with the org.apache.spark.sql.insightedge format, and specify the Data Grid collection or class options. For example:

import os
from pyspark.sql import SparkSession

# create pyspark.sql.SparkSession
spark = SparkSession.builder.getOrCreate()

# load SF salaries dataset from file
jsonFilePath = os.path.join(os.environ["GS_HOME"], "insightedge/data/sf_salaries_sample.json")
jsonDf =

# save DataFrame to the grid
jsonDf.write.format("org.apache.spark.sql.insightedge").mode("overwrite").save("salaries")

# load DataFrame from the grid
gridDf ="org.apache.spark.sql.insightedge").option("collection", "salaries").load()

You can also load a DataFrame backed by a Data Grid Scala class with the class option, for example:

df ="org.apache.spark.sql.insightedge").option("class", my_class_name).load()
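The collection and class options are two mutually exclusive ways of addressing the same Data Grid data. As a minimal sketch (the helper function and the class name below are illustrative assumptions, not part of the InsightEdge API), the reader options could be built like this:

```python
# Hypothetical helper (not part of the InsightEdge API) that builds the
# option map for the two addressing modes shown above.
INSIGHTEDGE_FORMAT = "org.apache.spark.sql.insightedge"

def grid_read_options(collection=None, clazz=None):
    """Return reader options addressing either a Data Grid collection or a
    backing Scala class; exactly one of the two must be given."""
    if (collection is None) == (clazz is None):
        raise ValueError("specify exactly one of collection or clazz")
    if collection is not None:
        return {"collection": collection}
    return {"class": clazz}
```

With a SparkSession in scope, this might be used as, e.g., ``.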

Self-Contained Applications

To develop a self-contained, submittable application, use regular PySpark and configure the GigaSpaces settings in SparkConf:

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
conf.setAppName("InsightEdge Python Example")
conf.set("spark.insightedge.space.name", "demo")
conf.set("spark.insightedge.space.lookup.groups", "insightedge")
conf.set("spark.insightedge.space.lookup.locators", "")

spark = SparkSession.builder.config(conf=conf).getOrCreate()

The complete source code is available at $GS_HOME/insightedge/examples/python/

The application can be submitted with the insightedge-submit script, for example:

$GS_HOME/insightedge/bin/insightedge-submit --master spark:// $GS_HOME/insightedge/examples/python/
$GS_HOME\insightedge\bin\insightedge-submit --master spark:// $GS_HOME\insightedge\examples\python\

PySpark doesn't support Java 11. Make sure your JAVA_HOME environment variable points to a JDK 8 installation. See the PySpark documentation for more information about PySpark and Java.