Installation Procedure for GigaSpaces Smart DIH Using ODSX

This document describes basic operational procedures for installing GigaSpaces Smart DIH, including the Tiered Storage and Data Integration modules.

First we install and deploy the basic Smart DIH system:

  • Download the Smart DIH software and place it in the proper directories. The downloaded software includes the ODSX script. ODSX allows us to install Smart DIH using a simple CLI-based menu.

  • Run the ODSX script to install and deploy a basic Smart DIH installation, including the Tiered Storage module.

Then we add the Data Integration module:

  • Note that the installation of Data Integration is performed manually and is not automated by the ODSX script.

Finally, we run the ODSX script again, this time to display the status of the Data Integration servers.

What is Smart DIH?

Digital Integration Hub (DIH) is an application architecture that decouples digital applications from the systems of record, and aggregates operational data into a low-latency data fabric. A digital integration hub supports modernization initiatives by offloading from legacy architecture and providing a decoupled API layer that effectively supports modern online applications.

For more information on the architecture of DIH, see SmartDIH-Overview.

Architecture

The example DIH architecture referenced in this topic consists of a Pivot server, used for common access to scripts and resources, and several DIH servers. The architecture is shown below.

Installing DIH via ODSX – Basic System

This example installs the basic DIH system, including Tiered Storage. For an example using ODSX for the Data Integration module, see Installing DIH via ODSX – Data Integration Module.

Install the ODSX Installation Script

Smart DIH can be installed using an automated script called ODSX. In order to use ODSX, please note the following prerequisites:

  1. ODSX is supported on RHEL/CentOS 7.7.
  2. ODSX must run as root.
  3. ODSX must be able to connect to the other machines as root via passwordless SSH (using an SSH key).
  4. The Pivot machine should have the following folders:
    • /dbagigashare - a local folder which contains the sources (installation and configuration files) and is shared (e.g., via NFS) with the other DIH machines.
    • /dbagiga - a local folder (work directory that contains ODSX and DIH)
  5. Download the latest version of ODSX and DIH sources to /dbagigashare on the Pivot server.
  6. In order to download ODSX and DIH sources, please contact Customer Success.

  7. Extract the tar files to /dbagiga and create soft links.
    For example,
    gs-odsx → gs-odsx-3.46-sox-release
    gigaspaces-smart-ods → gigaspaces-smart-dih-enterprise-16.2.0-m26-thu-30
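    A minimal command sketch for this step is shown below; the archive file names are assumptions inferred from the link targets above and will differ per release.

    # Run on the Pivot as root; archive names are illustrative
    cd /dbagiga
    tar -xzf /dbagigashare/gs-odsx-3.46-sox-release.tar.gz
    tar -xzf /dbagigashare/gigaspaces-smart-dih-enterprise-16.2.0-m26-thu-30.tar.gz
    ln -s gs-odsx-3.46-sox-release gs-odsx
    ln -s gigaspaces-smart-dih-enterprise-16.2.0-m26-thu-30 gigaspaces-smart-ods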
  8. Create a mount point on all the DIH machines (except the Pivot) and configure auto-mount.
    For example, mount /dbagigashare from Pivot:/dbagigashare, as in the sketch below.
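    One way to do this is a plain NFS mount via /etc/fstab (autofs works as well); the host name Pivot below is a placeholder for your Pivot server.

    # On each DIH machine (not the Pivot), as root: create the mount point and add an fstab entry
    mkdir -p /dbagigashare
    echo "Pivot:/dbagigashare  /dbagigashare  nfs  defaults  0  0" >> /etc/fstab
    mount -a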

Run the ODSX Installation Script

Prerequisites: Server setup

For this example, we will install using the following servers and IP addresses:

Number | Instance Type  | Instance Name            | Private IP
1      | ODSX - PIVOT   | odsx_demo_pivot          | 10.0.0.165
2      | Manager1       | odsx_demo_manager1       | 10.0.0.198
3      | Manager2       | odsx_demo_manager2       | 10.0.0.161
4      | Manager3       | odsx_demo_manager3       | 10.0.0.36
5      | Space1         | odsx_demo_space1         | 10.0.0.247
6      | Space2         | odsx_demo_space2         | 10.0.0.56
7      | Space3         | odsx_demo_space3         | 10.0.0.72
8      | Grafana/Influx | odsx_demo_influx_grafana | 10.0.0.82

Server 1 – Pivot server, containing shared files and the ODSX script

Servers 2, 3, 4 – Managers

Servers 5, 6, 7 – Space objects (stateful Processing Units)

Server 8 – Grafana and InfluxDB, for metrics

 

Navigate to the Pivot server, install and run the ODSX script

To install the ODSX script, navigate to the scripts folder and run the setup script, logged in as root:

cd gs-odsx/scripts
./setup.sh

Then exit from the terminal and re-login to the Pivot (ODSX) machine.

After connecting to the Pivot server, go to the gs-odsx directory and run the ODSX script:

cd gs-odsx
./odsx.py

The ODSX Menu

The ODSX script presents a structured menu of options and operations, as shown below.

The menu options used in this example are as follows:

[2] Servers – Define and manage servers

[11] Tiered Storage – Manage and update Tiered Storage

 

View Server Information

The [2] Servers option displays the Servers menu, which allows you to install and manage the servers in the system.

This sub menu allows you to specify actions for a specific server.

Install Grafana

Choose options [2] Servers > [5] Grafana > [1] Install to install Grafana.

When complete, messages similar to the following are displayed:

Choose options [2] Servers > [5] Grafana > [4] List to see the status of the Grafana server.

Start Grafana

Choose options [2] Servers > [5] Grafana > [2] Start to start Grafana. A display similar to the following will appear:

Install InfluxDB

Choose options [2] Servers > [6] Influxdb > [1] Install to install InfluxDB.

When complete, messages similar to the following are displayed:

Install Managers

Choose options [2] Servers > [1] Manager > [1] Install to install the Managers.

The script will show the servers that are to be installed:

When complete, messages similar to the following are displayed:

Start Managers

Choose options [2] Servers > [1] Manager > [3] Start to start the Managers.

The script will show information similar to that shown below, as the processing is executing:

When complete, choose options [2] Servers > [1] Manager > [6] List to see information similar to that shown below.

Install Spaces

Choose options [2] Servers > [2] Space > [1] Install to install the Space servers.

The script will show the current cluster configuration:

When complete, choose options [2] Servers > [2] Space > [4] List to see information similar to that shown below.

Start Spaces

Choose options [2] Servers > [2] Space > [2] Start to start the Space servers.

The script will show information similar to that shown below, as the processing is executing:

When complete, choose options [2] Servers > [2] Space > [4] List to see information similar to that shown below.

View Spaces with Version Information

Choose options [2] Servers > [2] Space > [6] ListWithVersion to see the list of Space servers with their version information.

The script will show information similar to that shown below, as the processing is executing:

When complete, choose options [2] Servers > [2] Space > [4] List to see information similar to that shown below.

Deploy Tiered Storage

Choose options [11] Tiered Storage > [1] Deploy to deploy Tiered Storage.

In this example, there are no existing Spaces on the cluster with Tiered Storage. Three available Space hosts are also shown.

Enter the information requested by the prompts. Sample information is shown below, including the path to the criteria file for Tiered Storage.

A typical Tiered Storage criteria file is shown below.
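For illustration only, the sketch below shows the general idea of a criteria file: each Space object type is mapped to a rule (an SQL-like condition, or all) that decides which entries stay in the hot RAM tier. The type names and the exact file syntax here are hypothetical; use the format required by your GigaSpaces release.

# Hypothetical criteria file sketch -- type names and syntax are illustrative only
com.mycompany.model.Trade,tradeDate > CURRENT_DATE - 30
com.mycompany.model.Customer,all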

When deployment is complete, a screen similar to the one below will display.

Update Tiered Storage

After making changes to the Tiered Storage criteria file, choose options [11] Tiered Storage > [2] UpdateCachePolicy to update Tiered Storage.

You will see a display similar to the example shown below. Note that the upper part of the screen shows the available Spaces. After choosing Space number 1, the instances of that Space are shown. Follow the prompts to continue the process of updating Tiered Storage.

Installing the Data Integration Module

Flink installation

To install standalone Flink, do the following.

1. Flink should be installed on the same server as Kafka. Download Flink version 1.15.0 from here.

2. As gsods OS user, copy the downloaded tar.gz file to the following directory: /home/gsods/di-flink

3. Extract the downloaded Flink archive file:

tar -xzf flink-1.15.0-bin-scala_2.12.tgz

4. Create a symbolic link latest-flink to the new flink directory:

ln -s flink-1.15.0 latest-flink

5. Start Flink as gsods OS user:

cd  /home/gsods/di-flink/latest-flink/bin
./start-cluster.sh

6. The Flink UI starts on port 8081. Connect to the Flink UI:

http://<flink server>:8081
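As an additional check, the Flink REST API listens on the same port and can be queried from the shell; replace <flink server> as above.

# Query the Flink cluster overview via the REST API (same port as the UI)
curl http://<flink server>:8081/overview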

This is a basic Apache Flink installation. More advanced instructions for configuring Flink in cluster mode and for configuring Flink services will be provided later.

DI Metadata Manager Installation

To install the DI Metadata Manager (MDM) for the first time, do the following.

1. Download the latest DI Metadata Manager tar.gz file

2. Create a directory for DI Metadata Manager software as gsods OS user:

mkdir /home/gsods/di-mdm

3. Copy the downloaded tar.gz file to the /home/gsods/di-mdm directory.

4. Change to the /home/gsods/di-mdm directory:

cd /home/gsods/di-mdm

5. Extract the downloaded di-mdm archive file:

tar -xzf di-mdm-<version>.tar.gz

6. Create a symbolic link latest-di-mdm to the newly created directory:

ln -s di-mdm-<version> latest-di-mdm

7. As the root OS user, change to the config directory of di-mdm:

su - root
cd /home/gsods/di-mdm/latest-di-mdm/config

8. Copy the di-mdm service file to the systemd services directory (as the root OS user):

cp di-mdm.service /etc/systemd/system

9. Reload systemd configuration (as root OS user)

systemctl daemon-reload

10. Start the di-mdm service (as the root OS user):

systemctl start di-mdm

11. Monitor the log of di-mdm (as the gsods OS user):

/home/gsods/di-mdm/latest-di-mdm/logs/di-mdm.log
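For example, the log can be followed as it is written with a standard tail (path as above):

# Follow the MDM log in real time (run as the gsods OS user)
tail -f /home/gsods/di-mdm/latest-di-mdm/logs/di-mdm.log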

12. The DI Metadata Manager service starts on port 6081. Check the DI Metadata Manager REST service:

http://<di-mdm host>:6081/swagger-ui

DI Manager Installation

To install the DI Manager for the first time, do the following.

1. Download the latest DI Manager tar.gz file

2. Create a directory for DI Manager software as gsods OS user

mkdir /home/gsods/di-manager

3. Copy the downloaded tar.gz to the /home/gsods/di-manager directory

4. Change to the /home/gsods/di-manager directory

cd  /home/gsods/di-manager

5. Extract the downloaded di-manager archive file:

tar -xzf di-manager-<version>.tar.gz

6. Create a symbolic link latest-di-manager to the newly created directory

ln -s di-manager-<version> latest-di-manager

7. Change the di-manager parameter to point to the correct di-mdm server (as the gsods OS user):

cd  /home/gsods/di-manager/latest-di-manager/config
vi di-manager-application.properties

8. Change mdm.server.url to point to the MDM server.

Do not use localhost; use the actual hostname where the MDM is running. Usually DI-MDM and DI-Manager run on the same host. A hypothetical excerpt is shown below.
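A hypothetical excerpt of di-manager-application.properties after this change is shown below; the property name follows step 8 above, and the host name is taken from the Postman example values later in this document, so adjust both to your environment.

# di-manager-application.properties (hypothetical excerpt)
# Point the DI Manager at the host where the MDM runs -- do not use localhost
mdm.server.url=http://di-stage-s1:6081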

9. As the root OS user, change to the config directory of the di-manager:

su - root
cd /home/gsods/di-manager/latest-di-manager/config

10. Copy the di-manager service file to the systemd services directory (as root OS user)

cp di-manager.service /etc/systemd/system

11. Reload systemd configuration (as root OS user)

systemctl daemon-reload

12. Start the di-manager service (as root OS user)

systemctl start di-manager

13. Monitor the log of the di-manager (as gsods OS user)

/home/gsods/di-manager/latest-di-manager/logs/di-manager.log

14. The DI Manager service starts on port 6080. Check the DI Manager REST service:

http://<di manager host>:6080/swagger-ui

DI Packages – Upgrade Procedure

To install a new DI package, do the following.

1. Via Postman, stop the active pipelines.

2. As gsods OS user, download the new DI package (di-mdm or di-manager) to the /home/gsods/<DI component>

3. For di-mdm, download the new tar.gz package to the /home/gsods/di-mdm directory

4. For the di-manager, download the new tar.gz package to the /home/gsods/di-manager directory

5. As gsods OS user, unzip / untar the newly downloaded package

tar -xzf <new tar file>

6. For di-mdm, do the following manual steps as the gsods OS user:

cd /home/gsods/di-mdm/<new package>/lib
cp /home/gsods/di-mdm/latest-di-mdm/lib/sqlitesample.db .

7. For di-manager do the following manual step as gsods OS user:

tar -xzf di-manager-<version>.tar.gz
cd /home/gsods/di-manager/<new package>/config
cp /home/gsods/di-manager/latest-di-manager/config/di-manager-application.properties .

The two manual steps shown above will be automated in a future release.

8. As root OS user do the following:

9. Go to the utils directory of the newly installed package

cd /home/gsods/di-mdm/<new package>/utils

10. Run the installation script

./install_new_version.sh

The installation script does the following:

  • Stops the running service of the DI component (di-manager, di-mdm)

  • Changes the symbolic link latest-<di component> to the active package directory

  • Starts the DI component service

11. As the gsods OS user, upload the updated di-processor jar file to Flink.

12. Remove the previously uploaded di-processor jar file from the Flink UI.

13. Verify in Postman that the diProcessorJar environment variable points to the latest processor jar:

14. Via Postman, run the Configure Flink API:

15. Via Postman, start a pipeline:

Working with Postman

The standalone Postman application is very useful for organizing the various DI layer REST APIs in a correct logical order.

Create environment variables

As a first step, let's define an environment with all the required environment variables.

  1. Open Postman.

  2. Go to the Environments tab:

  3. Create a new variables environment:

  4. Provide a name for this new environment, for example “Development”:

  5. Define the environment variables listed in the table below.

Variable name       | Example                                                        | Description
managerUrl          | http://di-stage-s1:6080                                        | The HTTP URL endpoint of the DI Manager, including the port
mdmUrl              | http://di-stage-s1:6081                                        | The HTTP URL endpoint of the MDM, including the port
flinkUrl            | http://di-stage-kafka1:8081                                    | The HTTP URL endpoint of Flink, including the port
bootstrapServers    | di-stage-kafka1:9092,di-stage-kafka2:9092,di-stage-kafka3:9092 | Kafka bootstrap servers, including ports. Multiple Kafka servers can be listed, comma separated
spaceLookupLocators | di-stage-gs1                                                   | Space server
spaceLookupGroups   | xap-16.2.0                                                     |
spaceName           | DEV                                                            | Space name
kafkaGroupId        | diprocessor                                                    | Kafka DI consumer group name
kafkaTopic_CDC      | pipeline_cdc                                                   | The name of the Kafka topic for CDC (Change Data Capture) changes
kafkaTopic_IL       | pipeline_il                                                    | The name of the Kafka topic for initial load changes
diProcessorJar      | /home/gsods/di-manager/latest-di-manager/lib/job-0.0.7.jar     | The full path of the DI Processor jar
pipelineId          |                                                                | The ID of the pipeline. Generated upon pipeline creation; can be retrieved later via the list pipelines REST API: GET {{managerUrl}}/api/v1/pipeline/
dbUrl               |                                                                | JDBC URL used to connect to the source database. Db2 z/OS example: jdbc:db2://<IP>:<db port>/<DB location>
dbUsername          |                                                                | DI database user name used to connect to the source database
dbName              |                                                                | The name of the source database inside the DI internal repository (this can be any name that logically represents the System of Record)
dbSchemaName        |                                                                | The source database schema that owns the tables from which DI captures changes

  6. Save the newly created environment variables.
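If you prefer the command line, the list-pipelines call referenced in the table above can also be issued with curl; the host name below comes from the example managerUrl value and should be replaced with your own.

# List pipelines via the DI Manager REST API (equivalent to GET {{managerUrl}}/api/v1/pipeline/)
curl http://di-stage-s1:6080/api/v1/pipeline/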

Import Collection of APIs

A Postman API collection is a group of APIs organized in a logical order that represents certain product functionality, such as creating a pipeline, defining a generic environment, or administering a pipeline.

The DI Postman collection is released with every DI release and can be imported into the local Postman environment.

To import the released DI collection, proceed as follows:

  1. Open Postman

  2. Go to the Collections tab and click Import:

  3. In the File tab, click Upload Files and choose the newly downloaded DI Postman collection file to import:

  4. At the end of the import, you should see a new collection present under the “Collections” tab.

Display the Status of the Data Integration Servers

After connecting to the Pivot server, go to the gs-odsx directory and run the ODSX script, logged in as root:

cd gs-odsx
./odsx.py

The ODSX Menu

The ODSX script presents a structured menu of options and operations, as shown below.

View Server Information

The [2] Servers option displays the Servers menu, which allows you to install and manage the servers in the system.

The sub menu allows you to specify actions for a specific server.

Choose [3] DI, and the following menu appears:

Option [4] List shows the available DI servers and their status. An example display is shown below.

APPENDIX – DI v2.0 Layer - Ports

#  | DI Component       | Protocol : Port | Accessed by                                              | Remarks
1  | IIDR DB2ZOS Agent  | TCP:11801       | IIDR Windows UI                                          |
2  | IIDR Kafka Agent   | TCP:11701       | IIDR Windows UI                                          |
3  | IIDR Access Server | TCP:10101       | IIDR Windows UI                                          |
4  | Flink server       | HTTP:8081       | Windows UI, DI Manager                                   |
5  | Kafka              | TCP:9092        | Flink server, IIDR Kafka Agent                           | Flink and Kafka are running on the same server
6  | Zookeeper          | TCP:2181        | Metadata Manager                                         |
7  | DB2 ZOS            | TCP:<DB port>   | IIDR DB2ZOS Agent, Metadata Manager, DI Manager          |
8  | ZOS                | TCP:22          | IIDR DB2ZOS Agent                                        |
9  | DI Manager         | HTTP:6080       | GS UI (Windows)                                          |
10 | Metadata Manager   | HTTP:6081       | GS UI (Windows), DI Manager, Flink Server (DI Processor) | Metadata Manager and DI Manager are running on the same server
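If the servers run firewalld (the default on RHEL/CentOS 7.x), the listed ports must be reachable by the listed components. A minimal sketch for opening one of them, assuming firewalld is in use, is shown below; repeat for the other ports as needed.

# Open the DI Manager HTTP port (6080) on the host that runs it (as root)
firewall-cmd --permanent --add-port=6080/tcp
firewall-cmd --reload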

 
