Python
InsightEdge has a Python API available via PySpark. Its functionality is limited to the DataFrame API.
Interactive Use
There are two options for analyzing data interactively in Python:
- Zeppelin
- command line shell
Zeppelin Notebook
To develop notebooks in Python, use the %pyspark interpreter in the Zeppelin web notebook. See the InsightEdge Python example notebook for reference.
Command Line Shell
To start the command line shell, run the ./bin/insightedge-pyspark script in the InsightEdge directory.
For example, start the InsightEdge demo:
<XAP-HOME>/bin/insightedge demo
<XAP-HOME>\bin\insightedge demo
Then start the command line shell:
<XAP-HOME>/insightedge/bin/insightedge-pyspark --master spark://127.0.0.1:7077
<XAP-HOME>\insightedge\bin\insightedge-pyspark --master spark://127.0.0.1:7077
Saving and Loading DataFrames in Python
To operate on InsightEdge DataFrames, use the regular PySpark DataFrame API with the org.apache.spark.sql.insightedge format, and specify the Data Grid collection or class option. For example:
import os
from pyspark.sql import SparkSession
# create pyspark.sql.SparkSession
spark = SparkSession.builder.getOrCreate()
# load SF salaries dataset from file
jsonFilePath = os.path.join(os.environ["XAP_HOME"], "insightedge/data/sf_salaries_sample.json")
jsonDf = spark.read.json(jsonFilePath)
# save DataFrame to the grid
jsonDf.write.format("org.apache.spark.sql.insightedge").mode("overwrite").save("salaries")
# load DataFrame from the grid
gridDf = spark.read.format("org.apache.spark.sql.insightedge").option("collection", "salaries").load()
gridDf.show()
You can also load a DataFrame backed by a Data Grid Scala class by using the class option, for example:
df = spark.read.format("org.apache.spark.sql.insightedge").option("class", my_class_name).load()
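The two reader options shown above can be wrapped in a small helper. The following is a minimal sketch and not part of the InsightEdge API: grid_read_options and load_from_grid are hypothetical names, and the rule that exactly one of the two options must be supplied is an assumption made for illustration.

```python
INSIGHTEDGE_FORMAT = "org.apache.spark.sql.insightedge"


def grid_read_options(collection=None, class_name=None):
    """Return reader options for a grid collection or a grid class (hypothetical helper)."""
    # Assumption: a single read targets either a collection or a class, not both.
    if (collection is None) == (class_name is None):
        raise ValueError("specify exactly one of collection or class_name")
    if collection is not None:
        return {"collection": collection}
    return {"class": class_name}


def load_from_grid(spark, collection=None, class_name=None):
    """Load a DataFrame from the grid using an existing SparkSession."""
    options = grid_read_options(collection, class_name)
    # DataFrameReader.options accepts keyword arguments, so the dict is unpacked.
    return spark.read.format(INSIGHTEDGE_FORMAT).options(**options).load()
```

With a running session, load_from_grid(spark, collection="salaries") would correspond to the collection example above.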
Self-Contained Applications
To develop a self-contained submittable application, use regular PySpark and configure the InsightEdge settings in SparkConf:
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
conf = SparkConf()
conf.setAppName("InsightEdge Python Example")
conf.set("spark.insightedge.space.name", "insightedge-space")
conf.set("spark.insightedge.space.lookup.group", "insightedge")
conf.set("spark.insightedge.space.lookup.locator", "127.0.0.1:4174")
spark = SparkSession.builder.config(conf=conf).getOrCreate()
The complete source code is available at <XAP-HOME>/insightedge/examples/python/sf_salaries.py.
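The three connection settings used above can also be kept in one place and applied to the configuration in a single call. This is a minimal sketch under stated assumptions: insightedge_settings and build_session are hypothetical helper names, the default values are copied from the example above, and pyspark is imported lazily so that the settings helper can be inspected without Spark installed.

```python
def insightedge_settings(space_name="insightedge-space",
                         lookup_group="insightedge",
                         lookup_locator="127.0.0.1:4174"):
    """Return the InsightEdge connection settings as a plain dict (hypothetical helper)."""
    return {
        "spark.insightedge.space.name": space_name,
        "spark.insightedge.space.lookup.group": lookup_group,
        "spark.insightedge.space.lookup.locator": lookup_locator,
    }


def build_session(app_name, settings):
    """Create a SparkSession whose SparkConf carries the given settings."""
    # Imported here so that insightedge_settings works without pyspark.
    from pyspark.conf import SparkConf
    from pyspark.sql import SparkSession

    conf = SparkConf().setAppName(app_name)
    # setAll takes a list of (key, value) pairs.
    conf.setAll(list(settings.items()))
    return SparkSession.builder.config(conf=conf).getOrCreate()
```

For instance, build_session("InsightEdge Python Example", insightedge_settings()) would apply the same three settings as the example above.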
The application can be submitted with the insightedge-submit script, for example:
<XAP-HOME>/insightedge/bin/insightedge-submit --master spark://127.0.0.1:7077 <XAP-HOME>/insightedge/examples/python/sf_salaries.py
<XAP-HOME>\insightedge\bin\insightedge-submit --master spark://127.0.0.1:7077 <XAP-HOME>\insightedge\examples\python\sf_salaries.py
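When the submission has to be triggered from another Python program, the command above can be assembled and run with the standard subprocess module. This is a sketch only: submit_command and submit are hypothetical names, the installation directory is passed in explicitly, and whether the launcher can be started this way on Windows (for example, whether it needs a file extension) is not covered here.

```python
import os
import subprocess


def submit_command(xap_home, script, master="spark://127.0.0.1:7077"):
    """Build the insightedge-submit argument list (hypothetical helper)."""
    # os.path.join uses the path separator of the current platform.
    launcher = os.path.join(xap_home, "insightedge", "bin", "insightedge-submit")
    return [launcher, "--master", master, script]


def submit(xap_home, script, master="spark://127.0.0.1:7077"):
    """Run the submit script and raise CalledProcessError if it exits with an error."""
    return subprocess.run(submit_command(xap_home, script, master), check=True)
```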