SpaceDeck – Data Pipeline – Create New Pipeline

Data Pipelines provide a convenient, no-code way to pipe data from the System of Record to the GigaSpaces in-memory data grid.

A new data pipeline definition includes the System of Record databases, tables and fields that provide data to the pipeline. The definition also specifies the in-memory Space that receives the pipeline data.

The definition can also include optional validation rules and automatic conversion of specified field definitions.

Display the Configuration screen

From the Data Pipeline main screen, click Create New + to begin defining your first pipeline.

Pipeline Configuration Screen

STATUS: The possible pipeline statuses are Inactive, Running, Error and Warning.

The Pipeline Configuration screen appears as follows:

Basic Pipeline Information

You can fill in some or all of the pipeline configuration items (shown below) from a JSON-format configuration file by clicking the Load Configuration button.
The configuration file may contain some or all of the required details. After the details are loaded from the file, you can edit them as needed before saving.
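The following is a minimal sketch of the "load, then edit before saving" behavior described above. The real SpaceDeck configuration schema is not described in this section, so the key names (`pipelineName`, `spaceName`, `dataSource`) and the helper `load_into_form` are hypothetical and for illustration only.

```python
import json

# Hypothetical form keys: the actual SpaceDeck JSON schema is not documented here.
FORM_FIELDS = ("pipelineName", "spaceName", "dataSource")


def load_into_form(config_text: str, form: dict) -> dict:
    """Fill form fields from a JSON configuration.

    Keys missing from the file keep their current form value, mirroring the
    fact that the file may contain only some of the required details.
    """
    loaded = json.loads(config_text)
    merged = dict(form)
    for key in FORM_FIELDS:
        if key in loaded:
            merged[key] = loaded[key]
    return merged


empty_form = {key: "" for key in FORM_FIELDS}
config_file = '{"pipelineName": "meals-pipeline", "spaceName": "mealsSpace"}'

form = load_into_form(config_file, empty_form)
form["spaceName"] = "mealsSpaceV2"  # loaded values can still be edited before saving
```

Here `dataSource` stays empty because the file does not supply it, and would have to be filled in manually before the pipeline is created.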

Click Create Pipeline to create the new data pipeline.

Select Tables for the Pipeline

Once the pipeline has been created, click Add / Remove tables to select which tables to include in the pipeline.

A list of available tables is displayed. Select the tables to be added, and click Add to add them to the pipeline.

After you click Add, the pipeline may take a few seconds to update.

Edit the Tables

Tables can only be edited during the initial setup and configuration process. To edit them again later, the pipeline must be deleted and configured from scratch.

From the Included Tables area of the screen, click Edit.

1. Return to the Data Pipelines screen

2. Go to the previous Space Type entry

3. Go to the next Space Type entry

4. Add indexes

You can give the Space Type a name that differs from the table name.

Indexes, Space ID and Routing ID

The Indexes area contains the following fields and options:

  • Index Name – Enter a name for the index.

  • Properties – Select the columns to be part of this index.
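Conceptually, an index maps the values of its selected properties to the entries that hold them, so lookups on those properties do not need to scan every entry. The sketch below illustrates only that idea; it is not how GigaSpaces implements indexes internally. `MEAL_ID` and `COMPANY_ID` follow the example field names used later on this page, while `NAME` and the helper `build_index` are hypothetical.

```python
from collections import defaultdict


def build_index(rows: dict, properties: list) -> dict:
    """Map each combination of the selected property values to the IDs of matching rows."""
    index = defaultdict(list)
    for row_id, row in rows.items():
        key = tuple(row[p] for p in properties)  # one index key per property combination
        index[key].append(row_id)
    return dict(index)


meals = {
    1: {"MEAL_ID": 1, "COMPANY_ID": 10, "NAME": "Soup"},
    2: {"MEAL_ID": 2, "COMPANY_ID": 10, "NAME": "Salad"},
    3: {"MEAL_ID": 3, "COMPANY_ID": 20, "NAME": "Soup"},
}

# Single-property index on COMPANY_ID
company_idx = build_index(meals, ["COMPANY_ID"])
# Index over two properties, analogous to selecting two columns under Properties
company_name_idx = build_index(meals, ["COMPANY_ID", "NAME"])
```

For example, `company_idx[(10,)]` returns the IDs `[1, 2]` directly, without inspecting meal 3.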

The Space ID area contains the following fields and options:

  • Position – For a compound key (a key made up of multiple columns), specify the position of each field in the key (first column, second column, and so on). For a single-column key, the position is always 1 (the default).

  • Primary Key – Serves as the unique identifier in the table (Space type).
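To make the Position setting concrete, the following sketch builds a Space ID by ordering the key columns by their configured positions. It is a conceptual illustration only; the helper `space_id` and the rule that positions must run 1, 2, ..., n without gaps are assumptions, not documented SpaceDeck behavior.

```python
def space_id(row: dict, key_positions: dict) -> tuple:
    """Build a Space ID from the key columns, ordered by their configured Position.

    key_positions maps column name -> position (1 = first column of the key).
    Assumption: positions must be 1..n with no gaps or duplicates.
    """
    positions = sorted(key_positions.values())
    if positions != list(range(1, len(positions) + 1)):
        raise ValueError("key positions must be 1..n with no gaps or duplicates")
    ordered = sorted(key_positions, key=key_positions.get)  # columns sorted by position
    return tuple(row[column] for column in ordered)


row = {"MEAL_ID": 7, "COMPANY_ID": 42, "NAME": "Soup"}

# Single-column key: the position is 1 (the default)
single = space_id(row, {"MEAL_ID": 1})
# Compound key: COMPANY_ID is the first column, MEAL_ID the second
compound = space_id(row, {"MEAL_ID": 2, "COMPANY_ID": 1})
```

With these settings, `single` is `(7,)` and `compound` is `(42, 7)`. Swapping the positions would produce a different ID for the same row, which is why each field's position must be stated explicitly.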

The Routing ID area contains the following fields and options:

  • Routing ID – Select the column to use as the routing key.
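The routing key determines which partition of the Space each entry is written to. The sketch below uses a simplified model, assuming partition = |hash(routing value)| mod number of partitions with Java-style hash codes. The exact GigaSpaces routing computation may differ, and choosing `COMPANY_ID` as the routing column is a hypothetical example.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode() as a 32-bit signed int (exact for BMP characters)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def partition_for(routing_value, partitions: int) -> int:
    """Simplified model: pick a partition from the routing value's hash code."""
    if isinstance(routing_value, int):
        h = routing_value  # Java's Integer.hashCode() is the value itself
    else:
        h = java_string_hash(str(routing_value))
    return abs(h) % partitions


rows = [
    {"MEAL_ID": 1, "COMPANY_ID": 10},
    {"MEAL_ID": 2, "COMPANY_ID": 10},
    {"MEAL_ID": 3, "COMPANY_ID": 11},
]
# Route by COMPANY_ID across 4 partitions (hypothetical choice of routing column)
placement = {row["MEAL_ID"]: partition_for(row["COMPANY_ID"], 4) for row in rows}
```

Whatever the exact hash function, the design consequence is the same: all rows that share a routing value land in the same partition. Here meals 1 and 2 both go to partition 2, which keeps data that is queried together co-located.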

Tiered storage will be part of a future release.

In the Fields tab of this screen, the field (column) names are initially the names of the database table fields that are included in the data pipeline. You can rename them to provide different property names in the GigaSpaces object type (table), and you can include or exclude individual fields.
Other fields in this screen will be editable in a future release.

The Fields tab contains the following fields and options:

  • Field ID – Search for a field by its field ID.

  • Field (column) names – Initially, the names of the database table fields that are included in the data pipeline. These can be edited to provide different property names (column names) in the GigaSpaces object type (table).
    In this example, the fields are named MEAL_ID, COMPANY_ID, etc.

If you click one of the field names, the Pipeline Fields section appears, which allows you to edit the field characteristics:

  • Name – Name of the field

  • Included toggle – Includes or excludes the field of the table from the Space type.
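The effect of the Name and Included settings can be pictured as a projection from a table row onto a Space type entry: included fields are copied under their (possibly renamed) property names, and excluded fields are dropped. This is a conceptual sketch only; `FieldMapping`, `to_space_entry`, the camel-case names and the `INTERNAL_NOTES` column are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FieldMapping:
    column: str            # column name in the source database table
    name: str              # property name in the Space type (editable)
    included: bool = True  # corresponds to the Included toggle


def to_space_entry(row: dict, mappings: list) -> dict:
    """Copy included fields under their Space property names; skip excluded fields."""
    return {m.name: row[m.column] for m in mappings if m.included}


mappings = [
    FieldMapping("MEAL_ID", "mealId"),
    FieldMapping("COMPANY_ID", "companyId"),
    FieldMapping("INTERNAL_NOTES", "internalNotes", included=False),
]
row = {"MEAL_ID": 1, "COMPANY_ID": 10, "INTERNAL_NOTES": "n/a"}
entry = to_space_entry(row, mappings)
# entry == {"mealId": 1, "companyId": 10}
```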

When setup is complete, return to the main Data Pipelines screen and click Save.

To edit a table again later, remove it from the pipeline and then add it again.

Start the Pipeline

Once the pipeline has been defined, you can start it from either of two menus:

  1. From the Configuration menu, which you open by selecting the pipeline name from the main menu.

  2. From the main screen by clicking Start Pipeline from the kebab menu (vertical three-dot menu) on the far right.

The Parameters to apply pop-up is then displayed.

Point in time options:

  • EARLIEST – Start the pipeline from the beginning: process everything that is still available for consumption in the queue, starting with the oldest message.

  • LATEST – Start the pipeline from now: consume and process data starting from the current message, regardless of the last stored checkpoint.

  • COMMITTED – Start the pipeline from the last successfully processed message (effectively a checkpoint).
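The three options differ only in where processing begins. The sketch below models the queue as numbered message offsets, in the style of a Kafka-like log. That model, the helper `start_offset`, and the fallback when no checkpoint exists are assumptions, not documented SpaceDeck behavior.

```python
from enum import Enum
from typing import Optional


class StartFrom(Enum):
    EARLIEST = "EARLIEST"
    LATEST = "LATEST"
    COMMITTED = "COMMITTED"


def start_offset(option: StartFrom, earliest_available: int,
                 next_offset: int, committed: Optional[int]) -> int:
    """Return the offset of the first message the pipeline will process.

    earliest_available: offset of the oldest message still held in the queue
    next_offset: offset the next newly arriving message will receive
    committed: offset of the last successfully processed message, or None
    """
    if option is StartFrom.EARLIEST:
        return earliest_available
    if option is StartFrom.LATEST:
        return next_offset  # ignores any stored checkpoint
    # COMMITTED: resume right after the checkpoint.
    # Assumption: with no checkpoint, fall back to the oldest available message.
    if committed is None:
        return earliest_available
    # Assumption: a checkpoint older than the queue's retention clamps to what is available.
    return max(committed + 1, earliest_available)
```

For example, with messages 100 through 249 in the queue and a checkpoint at 180, EARLIEST starts at 100, LATEST waits for message 250, and COMMITTED resumes at 181.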

Once the pipeline has been started, it cannot be edited. To make changes, delete the pipeline (from the main Data Pipelines screen) and build it again.

After you have added the tables and saved your changes, click Start to start the pipeline.

The pipeline will show as Started in the Data Pipeline Status screen and Active in the Spaces screen.

Deleting a Pipeline

To delete a pipeline, click Delete Pipeline from the kebab menu (vertical three-dot menu) on the far right of the main screen.

Before a pipeline is deleted, the following confirmation pop-up is displayed:

Once Approve has been clicked, the pipeline is deleted immediately and cannot be restored.