Aggregators

In pricing, risk management, trading, and other analytic and business intelligence applications, you may need to aggregate data stored in the data grid when generating reports or running a business process. Because the data is held in memory, such aggregation is much faster than performing it against a database. GigaSpaces provides common functionality for performing aggregations across the space, so there is no need to retrieve the entire data set to the client side, iterate over the result set, and perform the aggregation there. That approach would be expensive because it might return a large amount of data to the client application.

Built-in aggregators perform the entire aggregation on the space side, so the underlying data is never retrieved to the client side. Only the intermediate result computed by each partition is returned to the client side, where the results are reduced and handed to the client application. This takes advantage of the partitioned nature of the data grid: each partition runs the aggregation on its local data in parallel, and the partitions' intermediate results are then combined on the client side by the relevant reducer implementation.
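
The partition-then-reduce flow described above can be sketched in plain C#, without any GigaSpaces API. This is an illustration only: PartialAverage, AverageSketch, and the in-memory "partitions" are hypothetical names invented here, and the actual intermediate results and reducers used by the product are not shown in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the intermediate result a single partition returns.
public sealed class PartialAverage
{
    public PartialAverage(long sum, long count) { Sum = sum; Count = count; }
    public long Sum { get; private set; }
    public long Count { get; private set; }
}

public static class AverageSketch
{
    // "Space side": a partition scans only its own local values.
    public static PartialAverage AggregateLocally(IEnumerable<int> localAges)
    {
        long sum = 0, count = 0;
        foreach (int age in localAges) { sum += age; count++; }
        return new PartialAverage(sum, count);
    }

    // "Client side": combine the per-partition results into the final value.
    // Returns null when no partition held any matching value.
    public static double? Reduce(IEnumerable<PartialAverage> partials)
    {
        long sum = 0, count = 0;
        foreach (PartialAverage p in partials) { sum += p.Sum; count += p.Count; }
        return count == 0 ? (double?)null : (double)sum / count;
    }

    public static double? AverageAcrossPartitions(IEnumerable<IEnumerable<int>> partitions)
    {
        // AsParallel mimics partitions working on their local data concurrently.
        List<PartialAverage> partials =
            partitions.AsParallel().Select(p => AggregateLocally(p)).ToList();
        return Reduce(partials);
    }
}
```

Each partition in this sketch returns a (sum, count) pair rather than its own average. With partitions holding {20, 30} and {40}, averaging the two partition averages would give (25 + 40) / 2 = 32.5, whereas the correct overall average is 90 / 3 = 30. Only a small fixed-size value per partition crosses the "network", regardless of how many entries each partition scans.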

How Do Aggregators Work?

Aggregators are executed by iterating over the internal data grid structure that maintains the space objects. The original user object is not materialized during this iteration (scan), which keeps the scan relatively fast. The aggregated fields (paths) do not need to be indexed; only the fields (paths) used by the query that produces the scanned result set should be indexed. Future GigaSpaces releases may use indexes to perform the aggregation.

Supported Aggregators

GigaSpaces comes with several built-in aggregators. The aggregation is executed across all data grid partitions when using a partitioned data grid, or across the proxy master replica when using a replicated data grid. You may also route the aggregation to a specific partition.


Name       Description
Min        Returns the minimum value for a set of data grid entries for a given field (path) based on a given query.
Max        Returns the maximum value for a set of data grid entries for a given field (path) based on a given query.
Average    Returns the average value for a given set of data grid entries for a given field (path) based on a given query.
Sum        Returns the sum value for a set of data grid entries for a given field (path) based on a given query.
MaxEntry   Returns the Entry (space object) with the maximum value for a set of data grid entries for a given field (path) based on a given query.
MinEntry   Returns the Entry (space object) with the minimum value for a set of data grid entries for a given field (path) based on a given query.

Interoperability

Aggregators can be applied to data written by any type of client. For example, an aggregation call from a .NET application can operate on space objects that were written to the space by a .NET application using the XAP.NET API or by a C++ application using the GigaSpaces C++ API.

Usage

using GigaSpaces.Core.Linq;

...
var queryable = from p in spaceProxy.Query<Person>("Country='UK' OR Country='U.S.A'") select p;
// retrieve the maximum value stored in the field "Age"
int maxAgeInSpace = queryable.Max(p => p.Age);
// retrieve the minimum value stored in the field "Age"
int minAgeInSpace = queryable.Min(p => p.Age);
// Sum the "Age" field over all space objects that match the query.
int combinedAgeInSpace = queryable.Sum(p => p.Age);
// Sum the "Age" field over all matching space objects, then divide by their count.
double averageAge = queryable.Average(p => p.Age);
// Retrieve the space object with the highest value for the field "Age".
Person oldestPersonInSpace = queryable.MaxEntry(p => p.Age);
// Retrieve the space object with the lowest value for the field "Age".
Person youngestPersonInSpace = queryable.MinEntry(p => p.Age);

[SpaceClass]
public class Person
{
    [SpaceID(AutoGenerate = true)]
    public string Id { get; set; }

    public string Name { get; set; }

    [SpaceIndex]
    public string Country { get; set; }

    public int Age { get; set; }
}

Group Aggregation

The following examples show how to group data in various ways:

using GigaSpaces.Core.Linq;

...
/* The examples below assume that Person also exposes a Gender property. */

/* group by a single property with default select */
var byGender = from p in spaceProxy.Query<Person>()
               group p by p.Gender into g
               select g;

/* group by a single property with single select */
var ageSumByGender = from p in spaceProxy.Query<Person>()
                     group p by p.Gender into g
                     select g.Sum(p => p.Age);

/* group by a single property with multiple select */
var ageRangeByGender = from p in spaceProxy.Query<Person>()
                       group p by p.Gender into g
                       select new { Max = g.Max(p => p.Age), Gender = g.Key, Min = g.Min(p => p.Age) };

/* group by multiple properties with default select */
var byGenderAndCountry = from p in spaceProxy.Query<Person>()
                         group p by new { p.Gender, p.Country } into g
                         select g;

/* group by multiple properties with single select */
var ageSumByGenderAndCountry = from p in spaceProxy.Query<Person>()
                               group p by new { p.Gender, p.Country } into g
                               select g.Sum(p => p.Age);

/* group by multiple properties with multiple select */
var ageRangeByGenderAndCountry = from p in spaceProxy.Query<Person>()
                                 group p by new { p.Gender, p.Country } into g
                                 select new { Max = g.Max(p => p.Age), TheKey = g.Key, Min = g.Min(p => p.Age) };
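
To show what the grouped results look like when consumed, the following sketch runs the same query shapes with standard LINQ to Objects over a local sequence instead of spaceProxy.Query<Person>(). GroupSamplePerson and GroupingSketch are hypothetical names used only for this illustration; whether the space-backed provider materializes results in exactly the same way is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a space class with Gender, Country and Age.
public class GroupSamplePerson
{
    public string Gender { get; set; }
    public string Country { get; set; }
    public int Age { get; set; }
}

public static class GroupingSketch
{
    // Single grouping key, multiple selected aggregates:
    // one result row per distinct Gender, holding (Min, Max) of Age.
    public static Dictionary<string, KeyValuePair<int, int>> MinMaxAgeByGender(
        IEnumerable<GroupSamplePerson> people)
    {
        var query = from p in people
                    group p by p.Gender into g
                    select new { Gender = g.Key, Min = g.Min(p => p.Age), Max = g.Max(p => p.Age) };

        return query.ToDictionary(r => r.Gender, r => new KeyValuePair<int, int>(r.Min, r.Max));
    }

    // Composite grouping key: one result row per distinct (Gender, Country) pair.
    public static Dictionary<string, int> SumAgeByGenderAndCountry(
        IEnumerable<GroupSamplePerson> people)
    {
        var query = from p in people
                    group p by new { p.Gender, p.Country } into g
                    select new { g.Key, Total = g.Sum(p => p.Age) };

        // Flatten the anonymous composite key into a "Gender/Country" string.
        return query.ToDictionary(r => r.Key.Gender + "/" + r.Key.Country, r => r.Total);
    }
}
```

In both methods the grouping key is available through g.Key; with a composite key it is an anonymous object whose properties (Gender, Country) match the names used in the group clause.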

Compound Aggregation

Compound aggregation executes multiple aggregation operations across the space and returns all of their results at once. When several aggregates are needed, the compound aggregation API is significantly faster than invoking each aggregate individually.


SqlQuery<Person> query = new SqlQuery<Person>("Country=? OR Country=?");
query.SetParameter(1,"UK");
query.SetParameter(2,"U.S.A");

var aggregationSet = new AggregationSet();
aggregationSet.MaxEntry("Age");
aggregationSet.MinEntry("Age");
aggregationSet.Sum("Age");
aggregationSet.Average("Age");
aggregationSet.MinValue("Age");
aggregationSet.MaxValue("Age");

var result = spaceProxy.Aggregate(query, aggregationSet);
// Results are indexed in the order the aggregations were added to the set above.

var oldest = (Person)result.Results[0];
var youngest = (Person)result.Results[1];
var sum = (int)result.Results[2];
var average = (double)result.Results[3];
var min = (int)result.Results[4];
var max = (int)result.Results[5];
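
One plausible reason the compound form is faster is that several aggregates can be accumulated in a single scan and returned in one response, rather than scanning once per aggregate. The sketch below illustrates that idea in plain C# over a local sequence. AgeSummary and CompoundSketch are hypothetical names, and the single-scan behavior is an assumption made for illustration, not a description of the product's internals.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical accumulator holding several aggregates at once.
public sealed class AgeSummary
{
    public long Count;
    public long Sum;
    public int Min = int.MaxValue;
    public int Max = int.MinValue;

    // NaN signals that no values were seen.
    public double Average
    {
        get { return Count == 0 ? double.NaN : (double)Sum / Count; }
    }
}

public static class CompoundSketch
{
    // Computes count, sum, min and max in one pass over the data,
    // instead of iterating the sequence once per aggregate.
    public static AgeSummary Summarize(IEnumerable<int> ages)
    {
        var summary = new AgeSummary();
        foreach (int age in ages)
        {
            summary.Count++;
            summary.Sum += age;
            if (age < summary.Min) summary.Min = age;
            if (age > summary.Max) summary.Max = age;
        }
        return summary;
    }
}
```

Calling the individual aggregates separately would correspond to running a loop like this four times, each keeping only one of the fields.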

Aggregate Embedded Fields

Aggregation against the members of embedded space classes is supported by supplying the field path while invoking the desired aggregate function.

using GigaSpaces.Core.Linq;


var queryable = from p in spaceProxy.Query<Person>() where p.Country == "UK" || p.Country == "U.S.A" select p;
// Retrieve the maximum value stored in the embedded field "Demographics.Age".
var result = queryable.Max(p => p.Demographics.Age);

[SpaceClass]
public class Person
{
    [SpaceID(AutoGenerate = true)]
    public string Id { get; set; }
    public string Name { get; set; }
    public string State { get; set; }
    // Country is referenced by the query above, so it is declared here as well.
    public string Country { get; set; }
    public Demographics Demographics { get; set; }
}

[Serializable]
public class Demographics
{
    public int Age { get; set; }

    public char Gender { get; set; }
}