Smart DIH - Platform Components

Smart DIH is an operational data hub designed to address the IT challenges of supporting modern digital applications over heterogeneous, mostly legacy data architectures. A Digital Integration Hub (DIH) is an application architecture that decouples digital applications from the systems of record and aggregates operational data into a low-latency data fabric. By creating an event-driven, highly performing, efficient and available replica of the data from multiple systems and applications, Smart DIH allows enterprises to develop and deploy digital services in an agile manner, without disturbing core business applications. Strategic initiatives such as an integration data hub, digital innovation over legacy systems, API scaling, cloud migration, and business 360 are common use cases for which Smart DIH is utilized.

Platform Components

At a high level, the GigaSpaces Smart DIH platform bridges an enterprise's data sources and its high-end applications. It does so by streaming data through these three stages (south to north):

  1. Data Integration (data capture and transformation)

  2. Caching and Backup

  3. Data Services

Another aspect is the platform's control and monitoring facilities. More information about these can be found on the Application Lifecycle page.


Data Integration

  1. System of Record Agent

    This thin agent, which resides close to the System of Record (e.g., DB2), fetches raw data changes from the data source and sends them on to the transformation stage (a sketch of this producer side appears after this list).

  2. Change Data Capture (CDC) Technology for Real-time Events

    CDC tools capture real-time changes from the on-premises System of Record and propagate them to downstream systems.

  3. Kafka Bus for Event Sourcing

    Apache Kafka is used as a message bus to stream data change events from the agent to the data hub, ensuring reliable and scalable data streaming (see the consumer sketch following this list).

  4. Data Catalog

    The DIH platform learns the source's data structures and creates a catalog where the source and data-grid metadata are kept.

  5. Data Transformation

    A layer that extracts and transforms the incoming data into an effective data structure. Multiple online functions can be applied to the stream.
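
To make the capture flow concrete, below is a minimal sketch of the producer side: an agent process publishing a raw change event to a Kafka topic. The broker address, the cdc-events topic name, and the JSON payload are illustrative assumptions, not part of the product.

    // Sketch of a System of Record agent publishing change events to Kafka.
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import java.util.Properties;

    public class SorAgentSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // A captured change, serialized as JSON purely for illustration.
                String changeEvent = "{\"table\":\"ORDERS\",\"op\":\"UPDATE\",\"id\":42,\"status\":\"SHIPPED\"}";
                // Keying by table name keeps changes to one table in one partition, preserving order.
                producer.send(new ProducerRecord<>("cdc-events", "ORDERS", changeEvent));
            }
        }
    }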
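
The matching consumer sketch below stands in for the transformation stage: it polls the same assumed topic and applies a placeholder online function to each event before the result would be written to the grid.

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class TransformStageSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "dih-transform");           // assumed consumer group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("cdc-events")); // same assumed topic as the producer sketch
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Placeholder transformation; a real deployment would map the raw
                        // event into the grid's data model here.
                        String transformed = record.value().toUpperCase();
                        System.out.println(transformed);
                    }
                }
            }
        }
    }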

Caching and Backup

  1. GigaSpaces Data Hub

    An in-memory data grid (IMDG) for high-performance caching. It provides fast access to frequently accessed data and enhances the system's overall responsiveness (a usage sketch follows this list).

  2. Tiered Storage

    Through Intelligent Data Tiering, data can extend beyond the data grid's memory to lower-cost storage tiers. The data grid oversees the persistence of data using a rule-based process.
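
As an illustration of how an application interacts with the data hub, the sketch below writes and reads a simple POJO through the GigaSpaces Java (OpenSpaces) API. The dih-space space name and the Order class are assumptions made for the example.

    import com.gigaspaces.annotation.pojo.SpaceId;
    import org.openspaces.core.GigaSpace;
    import org.openspaces.core.GigaSpaceConfigurer;
    import org.openspaces.core.space.SpaceProxyConfigurer;

    public class DataHubSketch {
        // Minimal POJO; @SpaceId marks the grid's primary key.
        public static class Order {
            private Integer id;
            private String status;
            public Order() {}
            public Order(Integer id, String status) { this.id = id; this.status = status; }
            @SpaceId
            public Integer getId() { return id; }
            public void setId(Integer id) { this.id = id; }
            public String getStatus() { return status; }
            public void setStatus(String status) { this.status = status; }
        }

        public static void main(String[] args) {
            // Connect to a running space; "dih-space" is an assumed space name.
            GigaSpace gigaSpace = new GigaSpaceConfigurer(
                    new SpaceProxyConfigurer("dih-space")).gigaSpace();
            gigaSpace.write(new Order(42, "SHIPPED"));
            Order cached = gigaSpace.readById(Order.class, 42); // served from memory
            System.out.println(cached.getStatus());
        }
    }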

Data Services

  1. Micro-Services

    Using Smart DIH, micro-services or serverless functions can be created by consuming data from the data grid. These services encapsulate specific business logic, providing a scalable and modular architecture. Refer to SpaceDeck – Services to see how micro-services can be created through SpaceDeck; a sketch of such a service appears after this list.

  2. API Gateway

    API Gateway Integration (e.g., AWS API Gateway, Azure API Management) manages and exposes APIs for data access and integration. The API Gateway can enforce security policies, rate limiting, and handle authentication.
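
As a sketch of the kind of micro-service this enables, the snippet below exposes grid data through a plain Java method using a SQL-style query; the surrounding REST endpoint (or a service generated through SpaceDeck) is omitted. It reuses the assumed Order POJO from the data hub sketch above.

    import com.j_spaces.core.client.SQLQuery;
    import org.openspaces.core.GigaSpace;

    public class OrderQueryService {
        private final GigaSpace gigaSpace;

        public OrderQueryService(GigaSpace gigaSpace) {
            this.gigaSpace = gigaSpace;
        }

        // Returns all orders with the given status, served from the in-memory grid.
        public DataHubSketch.Order[] findByStatus(String status) {
            SQLQuery<DataHubSketch.Order> query =
                    new SQLQuery<>(DataHubSketch.Order.class, "status = ?", status);
            return gigaSpace.readMultiple(query);
        }
    }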

Extensions

The following enhancements can be tailored to the needs of our customers:

  1. Mirror to Persistence Database

    With the mirror customization functionality, a cloud-based data store (e.g., Amazon RDS, Azure Database for PostgreSQL) can be used to persistently replicate data. This database serves as a replica of the on-premises System of Record (see the endpoint sketch after this list).

  2. Event Driven Application Push

    GigaSpaces data-grid technology can notify the application in real time when there is any change to the data. To that end, the customer can add a program that defines the desired behavior and destination, packaged as special processing units (see the listener sketch after this list).

  3. Network Data Sync

    Data change events can also be synchronized across networks over the same Kafka-based message bus, extending reliable and scalable streaming between environments.
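
To picture the mirror extension, one hedged approach is a custom synchronization endpoint: the grid delivers batches of changes and the endpoint persists them to the target store. The persistence logic below is a placeholder; wiring the endpoint into a mirror service is deployment-specific.

    import com.gigaspaces.sync.DataSyncOperation;
    import com.gigaspaces.sync.OperationsBatchData;
    import com.gigaspaces.sync.SpaceSynchronizationEndpoint;

    public class CloudStoreSyncEndpoint extends SpaceSynchronizationEndpoint {
        @Override
        public void onOperationsBatchSynchronization(OperationsBatchData batchData) {
            for (DataSyncOperation operation : batchData.getBatchDataItems()) {
                if (operation.supportsDataAsObject()) {
                    // Placeholder: write the change to the cloud database (e.g., via JDBC)
                    // according to the operation type (write, update, remove).
                    System.out.println(operation.getDataSyncOperationType()
                            + ": " + operation.getDataAsObject());
                }
            }
        }
    }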
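
For the event-driven push, the notify event container pattern registers a listener that fires on matching changes. The template and the push action below are illustrative, and the sketch again reuses the assumed Order POJO.

    import org.openspaces.events.EventDriven;
    import org.openspaces.events.EventTemplate;
    import org.openspaces.events.adapter.SpaceDataEvent;
    import org.openspaces.events.notify.Notify;

    @EventDriven
    @Notify
    public class OrderChangeListener {
        // Template: only objects matching this example trigger the listener.
        @EventTemplate
        public DataHubSketch.Order shippedOrdersTemplate() {
            return new DataHubSketch.Order(null, "SHIPPED");
        }

        // Invoked in real time whenever a matching object is written or updated in the grid.
        @SpaceDataEvent
        public void onOrderShipped(DataHubSketch.Order order) {
            // Placeholder: push a notification to the destination application.
            System.out.println("Order shipped: " + order.getId());
        }
    }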


For more information about Smart DIH, refer back to the Smart DIH contents page and choose another topic.