08 August 2024

Data Grid Architecture

The GigaSpaces data grid is built from the following sub-systems:

Open Interfacing Layer

Supports any language, any platform, and any API. Achieve interoperability, easy migration, reduced learning curves, and faster time to market by leveraging existing assets, such as code and programming expertise, through:

Standard API Support: SQL, JPA, Spring, REST, and more.
Multi-language Interoperability: Java, .NET, and C++
Multi-platform Support: Any OS, physical or virtual.
API Mashup: Easily leverage modern APIs alongside existing standard APIs, so you can use the right tool for the job at hand.

OpenSpaces

OpenSpaces is the GigaSpaces native programming API. It is an open-source Spring-based application interface designed to make Space-based development easy, reliable, and scalable. In addition, the programming model is non-intrusive, based on a simple POJO (Plain Old Java Object) programming model and a clean integration point with other development frameworks.

The OpenSpaces API is divided into four parts:

Core API
Messaging and Events
Space-Based Remoting
Integrations

Core API

Note: The .NET version is for XAP (GigaSpaces eXtreme Application Platform) only.

The core package of OpenSpaces provides APIs for direct access to a data grid, internally referred to as a "Space." The main interface is the GigaSpace, which enables the basic interaction with the data grid. The core components include basic infrastructure support such as Space construction (Java version | .NET version) and a simplified API using the GigaSpace interface, including Transaction Management (Java version | .NET version) and declarative transaction support.
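To make the idea of a Space more concrete, the following is a minimal Python sketch of an in-memory store with write, read, and take operations matched by template. It is not the GigaSpaces API: the `TinySpace` class, the `Order` object, and the rule that `None` fields act as wildcards are all assumptions made for illustration.

```python
from dataclasses import dataclass, asdict
from typing import Any, Optional


@dataclass
class Order:
    # Hypothetical data object; None fields in a template act as wildcards.
    order_id: Optional[int] = None
    status: Optional[str] = None


class TinySpace:
    """Illustrative in-memory store, not the GigaSpaces implementation."""

    def __init__(self) -> None:
        self._entries: list[Any] = []

    def write(self, entry: Any) -> None:
        self._entries.append(entry)

    def _matches(self, entry: Any, template: Any) -> bool:
        # An entry matches when it has the template's type and agrees
        # with every non-None field of the template.
        if type(entry) is not type(template):
            return False
        return all(
            value is None or getattr(entry, name) == value
            for name, value in asdict(template).items()
        )

    def read(self, template: Any) -> Optional[Any]:
        # Return the first matching entry and leave it in the store.
        return next((e for e in self._entries if self._matches(e, template)), None)

    def take(self, template: Any) -> Optional[Any]:
        # Return the first matching entry and remove it from the store.
        entry = self.read(template)
        if entry is not None:
            self._entries.remove(entry)
        return entry
```

In this sketch, `read` leaves the matched object in place while `take` removes it, which is the distinction that queue-like consumption patterns rely on.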

Messaging and Events

The events package is built on top of the core package and provides simple object-based event processing components through the event containers, making it roughly equivalent to Java EE's message-driven beans. (The primary differences outside of API semantics are in the power of OpenSpaces' selection criteria and routing. Routing is the mechanism in charge of directing objects into and out of the corresponding partitions, based on a designated attribute inside the objects written to the Space, called the Routing Index.) The events package enables simple construction of event-driven applications.

The events module includes components for simplified EDA (Event Driven Architecture)/Service Bus development. These components allow event-driven programming and provide two mechanisms for event generation: a Polling Container (Java version | .NET version), which polls the Space with receive operations, and a Notify Container (Java version | .NET version), which uses the Space's built-in notification support.
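Routing by a designated attribute can be sketched as follows. This is a minimal Python illustration that assumes a hash-modulo scheme; the function names `partition_for` and `route` are hypothetical, and the actual GigaSpaces routing implementation may differ.

```python
import zlib


def partition_for(routing_value: object, partition_count: int) -> int:
    """Map a routing value to a partition index in [0, partition_count)."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    # A stable hash keeps the mapping identical across processes;
    # Python's built-in hash() is randomized per process for strings.
    digest = zlib.crc32(repr(routing_value).encode("utf-8"))
    return digest % partition_count


def route(objects: list[dict], routing_key: str, partition_count: int) -> dict[int, list[dict]]:
    """Group objects by the partition their routing attribute maps to."""
    partitions: dict[int, list[dict]] = {i: [] for i in range(partition_count)}
    for obj in objects:
        partitions[partition_for(obj[routing_key], partition_count)].append(obj)
    return partitions
```

Under this assumption, all objects that share a routing value land in the same partition, so logic that needs them together can run against a single partition.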
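The difference between the polling and notification styles can be sketched in a few lines of Python. This is an illustration of the two consumption models only, not the Polling Container or Notify Container API; `EventSpace` and `poll_all` are hypothetical names.

```python
from collections import deque
from typing import Callable, Optional


class EventSpace:
    """Illustrative store supporting both polling and notification."""

    def __init__(self) -> None:
        self._queue: deque[str] = deque()
        self._listeners: list[Callable[[str], None]] = []

    def subscribe(self, listener: Callable[[str], None]) -> None:
        # Notify style: the store pushes each new entry to the listener.
        self._listeners.append(listener)

    def write(self, entry: str) -> None:
        self._queue.append(entry)
        for listener in self._listeners:
            listener(entry)

    def take(self) -> Optional[str]:
        # Polling style: the consumer pulls (and removes) the oldest entry.
        return self._queue.popleft() if self._queue else None


def poll_all(space: EventSpace, handler: Callable[[str], None]) -> int:
    """Drain the store by repeated take calls; return the number handled."""
    handled = 0
    while (entry := space.take()) is not None:
        handler(entry)
        handled += 1
    return handled
```

In this sketch a notified listener only observes an entry, while a polling consumer removes it, so the polling style suits work that must be processed exactly once by a single consumer.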

The messaging grid aspect of the Space provides messaging capabilities such as:

Event-Driven capabilities - the ability to build event-driven processing applications. This model enables fast (in-memory-based) asynchronous modular processing, resulting in a very efficient and scalable processing paradigm.
Asynchronous production and consumption of information.
One-to-One, Many-to-One, One-to-Many, and Many-to-Many relationships.
FIFO (first in, first out) ordering (Java version | .NET version).
Transaction Management (Java version | .NET version).

Space-Based Remoting and Task Execution

The Remoting (Java version | .NET version) package provides capabilities for clients to access remote services. Remoting in GigaSpaces is implemented on top of the data grid's clustering model, which provides location transparency, fail-over, and performance to remote service invocations. GigaSpaces implements remoting using the Space as the transport layer, similar to the Spring remoting components.

Remoting can be viewed as an alternative to Java EE Session Beans or Java RMI, as it provides all of their capabilities and also supports synchronous and asynchronous invocations and dynamic scripting languages, enabling you to use Groovy or Ruby in your Space-based applications.

Compute Grid

The compute grid is a mechanism that allows you to run user code on all or some nodes of the grid, so that the code can run locally with the data.

Compute grids are an efficient solution when a computation must process a large data set, because moving the code to the data is much more efficient than moving the data to the code.

The process is widely known as map/reduce, and is used extensively by companies like Google whenever a large data set needs to be processed in a short amount of time.

For dynamic execution of code on the server side, use Task Execution (Java version | .NET version). The efficiency derives from the fact that the processing task is sent to all the desired grid nodes concurrently. A partial result is calculated using the data on that particular node, and then sent back to the client, where all the partial results are reduced to a final result.
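The flow of sending a task to every node, computing partial results locally, and reducing them on the client can be sketched as follows. This is a minimal Python illustration in which threads stand in for grid nodes and in-memory lists stand in for partitions; `execute_distributed` is a hypothetical name, not the Task Execution API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def execute_distributed(
    partitions: list[list[T]],
    task: Callable[[list[T]], R],
    reducer: Callable[[Iterable[R]], R],
) -> R:
    """Run task on every partition concurrently, then reduce the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        # Each worker sees only its own partition's data (the "map" step).
        partial_results = list(pool.map(task, partitions))
    # The caller combines the partial results (the "reduce" step).
    return reducer(partial_results)
```

For example, `execute_distributed([[1, 2], [3], [4, 5, 6]], sum, sum)` sums each partition locally and then sums the three partial results, so only three numbers travel back to the caller rather than the full data set.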

Processing Services

With Messaging and Events together with Space-Based Remoting and Task Execution, you can build distributed parallel processing services.

Parallel Processing

Sometimes the scalability bottleneck is processing capacity, which means that more processing power is needed so that work can execute concurrently. In other words, there is a need for parallel processing. When there is no state involved, it is common to spawn many processes on multiple machines, and to assign a different task to each process.

However, the problem becomes much more complex when the tasks for execution require sharing of information. GigaSpaces has built-in services for parallel processing. The master/worker pattern is used, where one process serves as the master and writes objects into the space, and many worker services each take work for execution and share the results. The workers then request a new piece of work, and so on. This pattern is important in practice, since it automatically balances the load.
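The master/worker pattern described above can be sketched as follows. This is a minimal Python illustration in which a shared queue stands in for the Space and threads stand in for worker services; `run_master_worker` and the squaring workload are assumptions for the example, not GigaSpaces services.

```python
import queue
import threading


def run_master_worker(work_items: list[int], worker_count: int) -> list[int]:
    """Square each work item using workers that pull from a shared queue."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")

    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    stop = object()  # Sentinel telling a worker there is no more work.

    def worker() -> None:
        while True:
            item = tasks.get()
            if item is stop:
                break
            # A worker asks for new work only after finishing the previous
            # item, so faster workers naturally process more items.
            results.put(item * item)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()

    # The master writes the work items, then one sentinel per worker.
    for item in work_items:
        tasks.put(item)
    for _ in threads:
        tasks.put(stop)
    for thread in threads:
        thread.join()

    return sorted(results.get() for _ in range(len(work_items)))
```

Because each worker pulls its next item only when it is free, no central scheduler has to decide which worker gets which item, which is where the automatic load balancing comes from.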

Scaling

Scaling is increasing or decreasing the resources based on the application requirements at any given time.

There are several types of scaling:

Horizontal Scaling - Increasing the number of instances or partitions if the service includes a space (partition), or having more instances if it is a stateless service (instance).

Vertical Scaling - Keeping the same number of instances while changing the memory or CPU allocated to each instance.

Spring Container

The Spring framework container integration is included as part of GigaSpaces, and provides the ability to take advantage of Spring components, programming model and capabilities.

The Spring framework provides elegant abstractions that make it easy to build layered and decoupled applications.

Jetty Web Container - XAP Only

Jetty is a very popular web container, which provides support for JEE web container specification services such as: Servlet, JavaServer Pages, JavaServer Faces, and others.

The integration with the Jetty web container allows you to run JEE web applications (.war files) in GigaSpaces.

GigaSpaces can host your Java web modules so your application is entirely managed and scaled on a single platform, providing load balancing and extreme throughput, and ensuring end-to-end scalability.

GigaSpaces allows you to deploy web applications (packaged as WAR files) onto the GSC (Grid Service Container). This support provides:

Dynamic allocation of several instances of a web application (probably fronted by a load balancer).
Management of the instances running (if a GSC fails, the web application instances running on it will be instantiated on a different GSC).
SLA-monitor-based dynamic allocation and de-allocation of web application instances.

The deployed WAR is a pure Java EE-based web application. The application can be the most generic web application, and automatically make use of the Service Grid features. The web application can define a Space (either embedded or remote) very easily (either using Spring or not).

Instead of using the Jetty Web Container, a Spring Boot web application can be used as a client of the Space, especially in a Kubernetes environment where there is no need to manage the SLA.

Microsoft .NET Container - XAP Only

The .NET SBA (Space-Based Architecture) application takes advantage of the ability to run business services and .NET code co-located with the data stored within the Space.

The .NET container bridges the technical gap and provides a native .NET experience for .NET applications.

Unified In-Memory Services

Data access, messaging, and parallel processing services that speed up your application performance.

In-memory speed: Delivering unmatched performance by removing all physical I/O bottlenecks from the runtime flow
Scalability: Intelligently distribute any data and messaging load across all available resources
Capacity: Support terabytes of application data
High Availability: Built-in hot backup and self-healing capabilities for zero downtime
Consistency: Maintain data integrity with 100% transactional data handling

As an application platform, GigaSpaces provides integrated, memory-based runtime capabilities. The core of these capabilities is backed by the Space technology.

In-Memory Data Grid

An In-Memory Data Grid (IMDG) is a way of storing data across a grid of memory nodes. This service provides the application with:

Data storage capabilities.
Data query capabilities - single object, multiple object and aggregated complex queries.
Caching semantics - the ability to retrieve information from in-memory data structures.
Ability to execute business logic within the data - similar to database stored procedure capabilities.

It is important to note that the data grid, although a memory-based service, is fully transactional, and follows the ACID (Atomicity, Consistency, Isolation, and Durability) transactional rules.

For information about ACID compliance, read our blog on How to Achieve ACID Compliance on Distributed, Highly Available Systems.

The data grid uses the unified clustering layer to provide a highly available and reliable service.

The main API to access the data grid service is the GigaSpace interface.

Virtualized Deployment Infrastructure

Detailed information about the virtualized deployment infrastructure can be found in the Orchestration section.