InsightEdge Data Modeling


This section describes how to define the Data Grid model.

Class Annotations

The Data Grid native modeling is designed for Java POJO classes. In order to use Scala case classes with the grid, do the following:

  • Annotate each property in the case class with scala.beans.BeanProperty (instructs the compiler to generate a bean-compatible getter and setter).
  • Mark each property as var.
  • Add a no-args constructor with default values.
See also:

There is partial support for immutable case classes in the data grid. For more information, refer to Constructor Based Properties.

In addition, you can use data-grid specific annotations from org.insightedge.scala.annotation to enhance your data model. @SpaceId is mandatory, the rest is optional.

Here is an example of Product class:

import org.insightedge.scala.annotation._
import scala.beans.{BeanProperty, BooleanBeanProperty}

case class Product(
   @BeanProperty @SpaceId var id: Long,
   @BeanProperty var description: String,
   @BeanProperty var quantity: Int
) {
   def this() = this(-1, null, -1)
}
Annotations and Shell

The Spark shell does not support defining annotations on your class model. Instead, pre-compile and import your model classes, or use the Zeppelin Web Notebook instead of the shell.

Autogenerate ID

If you want to increment the id property automatically when saving to the data grid, use @SpaceId(autoGenerate = true) (only for String fields).

Indexing

You can improve the speed of data filtering and retrieval operations by indexing relevant fields with the @SpaceIndex annotation.

See also:

For more information, refer to Indexing.


Controlling Spark Partitions

By default there is a one-to-one mapping between Spark and Data Grid partitions. If you want your RDD or DataFrame to have more partitions than Data Grid, you have to mixin org.insightedge.spark.model.BucketedGridModel trait into your class. The BucketedGridModel.metaBucketId property should be uniformly distributed between 0 and 128.