InsightEdge Data Modeling

This section describes how to define the Data Grid model.

Class Annotations

The Data Grid native modeling is designed for Java POJO classes. To use Scala case classes with the grid, do the following:

  • Annotate each property in the case class with scala.beans.BeanProperty (instructs the compiler to generate a bean-compatible getter and setter).
  • Mark each property as var.
  • Add a no-args constructor with default values.

There is partial support for immutable case classes in the data grid. For more information, refer to Constructor Based Properties.

In addition, you can use data grid-specific annotations from org.insightedge.scala.annotation to enhance your data model. The @SpaceId annotation is mandatory; the rest are optional.

Here is an example of a Product class:

import org.insightedge.scala.annotation._
import scala.beans.{BeanProperty, BooleanBeanProperty}

case class Product(
   @BeanProperty @SpaceId var id: Long,
   @BeanProperty var description: String,
   @BeanProperty var quantity: Int
) {
   def this() = this(-1, null, -1)
}

The Spark shell does not support defining annotations on your class model. Pre-compile and import your model classes, or use the Zeppelin Web Notebook instead of the shell.

Autogenerate ID

If you want the id property to be generated automatically when an object is saved to the data grid, use @SpaceId(autoGenerate = true). Auto-generated IDs are supported only for String fields.
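As a sketch, an auto-generated ID might look like the following (the Order class itself is hypothetical; note that the id field is a String and is initialized to null so the grid can assign it on write):

import org.insightedge.scala.annotation._
import scala.beans.BeanProperty

case class Order(
   // autoGenerate = true requires a String field; the grid assigns the value on save
   @BeanProperty @SpaceId(autoGenerate = true) var id: String,
   @BeanProperty var amount: Double
) {
   def this() = this(null, -1)
}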

Indexing

You can improve the speed of data filtering and retrieval operations by indexing relevant fields with the @SpaceIndex annotation.

For more information, see the Indexing section.
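For illustration, a hypothetical Customer class indexing a field that is frequently used in query filters might look like this (a sketch, assuming the basic @SpaceIndex form without an explicit index type):

import org.insightedge.scala.annotation._
import scala.beans.BeanProperty

case class Customer(
   @BeanProperty @SpaceId var id: Long,
   // Index the field used in lookups, e.g. filtering customers by last name
   @BeanProperty @SpaceIndex var lastName: String,
   @BeanProperty var balance: Double
) {
   def this() = this(-1, null, -1)
}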


Controlling Spark Partitions

By default, there is a one-to-one mapping between Spark and Data Grid partitions. If you want your RDD or DataFrame to have more partitions than the Data Grid, mix the org.insightedge.spark.model.BucketedGridModel trait into your class. The BucketedGridModel.metaBucketId property should be uniformly distributed between 0 and 128.
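A sketch of mixing the trait into a model class follows (the Event class is hypothetical; this assumes the BucketedGridModel trait itself supplies the metaBucketId property):

import org.insightedge.scala.annotation._
import org.insightedge.spark.model.BucketedGridModel
import scala.beans.BeanProperty

// Extending BucketedGridModel allows Spark to split each Data Grid
// partition into multiple RDD/DataFrame partitions (buckets).
case class Event(
   @BeanProperty @SpaceId var id: Long,
   @BeanProperty var payload: String
) extends BucketedGridModel {
   def this() = this(-1, null)
}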

Limitations

If you use a Java client to write POJO objects to the Space, and then read them as a DataFrame from the Apache Zeppelin notebook, you may encounter the following limitations:

  • If you try to read a class in which one or more of the properties is defined as an Enum, the read operation will fail.
  • The InsightEdge JDBC interpreter doesn’t support nested properties. Instead, use the Spark SQL interpreter to read POJOs that contain nested properties.
  • In some edge cases, the Java class definition may be incompatible with Scala; for example, if the Java class contains a sub-class that references the parent class. When this type of incompatibility is encountered, read operations from Apache Zeppelin may fail with an error.