Export/Import Tool
The Export/Import tool was originally created to help our engineers quickly replicate test scenarios. Since then, it has evolved into an easy-to-use method for migrating data to new GigaSpaces deployments, capturing data snapshots, and bootstrapping disparate environments.
The Export/Import tool leverages several GigaSpaces features that make it effective for each of these use cases. The fundamental one is the GigaSpaces Task Execution API.
When the Export/Import tool starts, it sends either an Export or an Import task to each partition, and this is where the actual operation is performed. Put simply, each primary instance is responsible for exporting or importing its own data.
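As a rough illustration of this pattern, here is a minimal sketch of a per-partition task built on the Task Execution API. The class and method names are illustrative assumptions, not the tool's actual sources:

```java
import org.openspaces.core.GigaSpace;
import org.openspaces.core.executor.Task;
import org.openspaces.core.executor.TaskGigaSpace;

import com.gigaspaces.async.AsyncFuture;

// Illustrative per-partition task; not the tool's actual source.
public class ExportTask implements Task<Integer> {

    // Injects a proxy to the space instance this task executes in.
    @TaskGigaSpace
    private transient GigaSpace localSpace;

    @Override
    public Integer execute() throws Exception {
        // The real tool enumerates the local space classes and writes the
        // export files here; this sketch just counts the local objects.
        return localSpace.count(null);
    }

    // Dispatch one task per partition by using the partition index as the
    // routing value (an Integer routes by hashCode() % partition count).
    public static void dispatch(GigaSpace clusteredSpace, int partitionCount) throws Exception {
        for (int partition = 0; partition < partitionCount; partition++) {
            AsyncFuture<Integer> result = clusteredSpace.execute(new ExportTask(), partition);
            System.out.println("Partition routed " + partition + ": " + result.get());
        }
    }
}
```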
When is the Export/Import Tool Useful?
- Creating a snapshot of data
- Introducing a new environment
- Upgrading product versions
- Setting up integration-style test scenarios
You can download and build the source code from our GitHub repository. Directions on how to build the project can be found in the repository's README document.
Alternatively you can contact your GigaSpaces Technical Account Manager for pre-built binaries.
Export
During the export, a remote task is sent to each primary space instance on the grid, or to a subset of the space instances if specified via the command-line options.
Once the task begins executing, it acquires a list of all space classes described in that instance and uses this list to drive the creation of export files. At this point a new thread pool is created, which dictates how many files can be exported in parallel.
For each combination of class name and target partition, a query is performed on the local space instance. Any space class instances that route to that target partition are written to disk; if no matches are found, the file is not written.
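When a new partition count is supplied (the -n option below), the target partition for each object is recomputed from its routing value. A minimal sketch of the standard GigaSpaces routing rule, assuming 1-based partition ids to match the file name pattern below:

```java
public final class Routing {
    // Sketch of the standard GigaSpaces routing rule used to decide the
    // target partition for an object's routing value. Returns a 1-based
    // id to match the file name pattern below.
    public static int targetPartition(Object routingValue, int newPartitionCount) {
        int hash = routingValue.hashCode();
        // "Safe" absolute value: Math.abs(Integer.MIN_VALUE) is still negative.
        int safeHash = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return (safeHash % newPartitionCount) + 1;
    }
}
```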
File Name Pattern:
{class-name-with-package}.{originating-partition}.{target-partition}.ser.gz
Example: com.j_spaces.examples.benchmark.messages.MessagePOJO.1.1.ser.gz
File Content Structure (Uncompressed):
UTF: Class Name
Obj: Specialized Type Description (Portable/Serializable Class Definition)
Obj: Space Class Instance (x Row Count)
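As a rough illustration of this layout, the sketch below reads such a file back with plain Java deserialization over GZIP. It assumes standard ObjectInputStream framing, which may differ from the tool's exact stream format:

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.zip.GZIPInputStream;

public final class ExportFileReader {
    public static void dump(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new FileInputStream(file)))) {
            String className = in.readUTF();         // UTF: class name
            Object typeDescriptor = in.readObject(); // Obj: specialized type description
            System.out.println(className + " (" + typeDescriptor + ")");
            while (true) {                           // Obj: one class instance per row
                Object row;
                try {
                    row = in.readObject();
                } catch (EOFException endOfFile) {
                    break;                           // row count reached
                }
                System.out.println(row);
            }
        }
    }
}
```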
Usage
Due to the number of configuration options available, we cannot show every permutation of the tool; the simplest and most common usage is shown below.
Linux:
./setAppEnv.sh
$JAVA_HOME/bin/java -cp $GS_HOME/lib/required/*:./lib/* com.gigaspaces.tools.importexport.Program -o export -g $LOOKUPGROUPS -s myspace --jarless -d /tmp/export/output

Windows:
call "%~dp0\setAppEnv.bat"
%JAVA_HOME%\bin\java.exe -cp %GS_HOME%\lib\required\*;.\lib\* com.gigaspaces.tools.importexport.Program -o export -g %LOOKUPGROUPS% -s myspace --jarless -d c:\tmp\output
PS C:\var\import-export> .\export.ps1
2015-12-12 23:02:15,130 CONFIG [com.gigaspaces.logger] - Log file: C:\opt\gigaspaces\xap-10.2.0-ga\logs\2015-12-12~23.02-gigaspaces-service-riomhairenua-13864.log
2015-12-12 23:02:15,125 INFO [com.gigaspaces.tools.importexport.config.SpaceConnectionFactory] - Creating connection with url:
/./myspace?total_members=2,0&schema=default&cluster_schema=partitioned-sync2backup&id=1&groups=space-test-10&state=started
2015-12-12 23:02:15,199 INFO [import-export] - Started import/export operation with the following configuration:
EXPORT [Space: myspace, Lookup Groups: [space-test-10], Lookup Locators: [], Output/Input Directory: c:\var\import-export\output, Operating Partitions: '[]', Export/Import Classes: '[]', XAP Read Batch Size: 1000, PU Name Override: null, Security level: null, New partition count: Not specified, Threads: 20, Jarless: true, Thread sleep ms: 1000]
2015-12-12 23:02:17,203 INFO [import-export] - Partition 1 Finished ---------------------
Partition Id: 1
Process Id: 10048
Hostname: 127.0.0.1
Elapsed Process Time (ms): 1610
Files:
com.j_spaces.examples.benchmark.messages.MessagePOJO.1.1.ser.gz | Records: 5000 | Elapsed time (ms): 742
2015-12-12 23:02:17,204 INFO [import-export] - Partition 2 Finished ---------------------
Partition Id: 2
Process Id: 11592
Hostname: 127.0.0.1
Elapsed Process Time (ms): 1595
Files:
com.j_spaces.examples.benchmark.messages.MessagePOJO.2.2.ser.gz | Records: 5000 | Elapsed time (ms): 737
PS C:\var\import-export>
Options
Short Name | Long Name | Optional / Required | Default Value | Acceptable Values | Description |
---|---|---|---|---|---|
Grid Connection Information | |||||
-s | --space | required | n/a | n/a | Name of the target space to perform the operation on. |
-l | --locators | optional | n/a | n/a | A comma separated list of data grid lookup locators for the target grid. |
-g | --groups | optional | n/a | n/a | A comma separated list of data grid lookup groups for the target grid. |
-u | --username | optional | n/a | n/a | Specifies a data grid username with read and execute privileges. Required when connecting to a secured grid. |
-a | --password | optional | n/a | n/a | Specifies a data grid password corresponding to the specified data grid username. Required when connecting to a secured grid. |
n/a | --security-level | optional | n/a | grid, space, both | Indicates the level of security for the grid. |
General Configuration | |||||
-o | --operation | required | export | export, import | A flag indicating whether an export or import will be performed. |
-d | --directory | required | n/a | n/a | A full path to the directory containing either previously exported files, or where the exported files should be placed. |
n/a | --pu-name | optional | n/a | n/a | Overrides the name of the processing unit; relevant only when the processing unit name differs from the space name. |
n/a | --jarless | optional | n/a | n/a | Indicates that the import/export will not use Java class definitions during processing. |
-c | --classes | optional | n/a | n/a | A comma separated list of class names to operate on. The class names are case sensitive. |
-p | --partitions | optional | n/a | n/a | A comma separated list of partitions that will be operated on. |
-n | --number | optional | n/a | n/a | The partition count of the target grid. Relevant only when exporting data for use in a grid with a different partition count (e.g., exporting data from a 6-partition grid to a 2-partition grid, or vice versa). |
Performance Configuration | |||||
-b | --batch | optional | 1000 | n/a | Performance option to batch records retrieved from the space. |
n/a | --thread-sleep | optional | 1000 | n/a | Number of milliseconds to sleep between checks for task completion. |
-t | --threads | optional | 20 | n/a | Number of threads to simultaneously process import or export files. |
Import
During the import, a remote task is sent to each primary space instance. Each primary instance then searches the file system for relevant files. Relevance is determined by the second integer in the file name, also known as the destination partition ID.
All files destined for the partition are then queued and processed by the export/import thread pool. The number of files processed in parallel for each partition is configurable, and is based on the size of the export/import thread pool.
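For illustration, the per-partition file selection could look like the following sketch; the parsing is an assumption based on the documented file name pattern, not the tool's actual code:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public final class ImportFileFilter {
    // Selects the files whose target partition (the second integer in
    // {class}.{source}.{target}.ser.gz) matches this partition's id.
    public static List<File> filesForPartition(File directory, int partitionId) {
        List<File> relevant = new ArrayList<>();
        File[] candidates = directory.listFiles((dir, name) -> name.endsWith(".ser.gz"));
        if (candidates == null) {
            return relevant; // directory missing on this host (see the FAQ below)
        }
        for (File file : candidates) {
            String[] parts = file.getName().split("\\.");
            // {class}.{source}.{target}.ser.gz => target sits third from the end
            int targetPartition = Integer.parseInt(parts[parts.length - 3]);
            if (targetPartition == partitionId) {
                relevant.add(file);
            }
        }
        return relevant;
    }
}
```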
Usage
Due to the number of configuration options available, we cannot show every permutation of the tool; the simplest and most common usage is shown below.
Linux:
./setAppEnv.sh
$JAVA_HOME/bin/java -cp $GS_HOME/lib/required/*:./lib/* com.gigaspaces.tools.importexport.Program -o import -g $LOOKUPGROUPS -s myspace -d /tmp/export/output

Windows:
call "%~dp0\setAppEnv.bat"
%JAVA_HOME%\bin\java.exe -cp %GS_HOME%\lib\required\*;.\lib\* com.gigaspaces.tools.importexport.Program -o import -g %LOOKUPGROUPS% -s myspace -d c:\tmp\output
PS C:\var\import-export> .\import.ps1
2015-12-13 00:58:12,624 CONFIG [com.gigaspaces.logger] - Log file: C:\opt\gigaspaces\xap-10.2.0-ga\logs\2015-12-13~00.58-gigaspaces-service-riomhairenua-3836.log
2015-12-13 00:58:12,618 INFO [com.gigaspaces.tools.importexport.config.SpaceConnectionFactory] - Creating connection with url:
/./myspace?total_members=2,0&schema=default&cluster_schema=partitioned-sync2backup&id=1&groups=space-test-10&state=started
2015-12-13 00:58:12,697 INFO [import-export] - Started import/export operation with the following configuration:
IMPORT [Space: myspace, Lookup Groups: [space-test-10], Lookup Locators: [], Output/Input Directory: c:\var\import-export\output, Operating Partitions: '[]', Export/Import Classes: '[]', XAP Read Batch Size: 1000, PU Name Override: null, Security level: null, New partition count: Not specified, Threads: 20, Jarless: false, Thread sleep ms: 1000]
2015-12-13 00:58:14,699 INFO [import-export] - Partition 1 Finished ---------------------
Partition Id: 1
Process Id: 10048
Hostname: 127.0.0.1
Elapsed Process Time (ms): 1096
Files:
com.j_spaces.examples.benchmark.messages.MessagePOJO.1.1.ser.gz | Records: 5000 | Elapsed time (ms): 1095
2015-12-13 00:58:14,700 INFO [import-export] - Partition 2 Finished ---------------------
Partition Id: 2
Process Id: 11592
Hostname: 127.0.0.1
Elapsed Process Time (ms): 1099
Files:
com.j_spaces.examples.benchmark.messages.MessagePOJO.2.2.ser.gz | Records: 5000 | Elapsed time (ms): 1092
PS C:\var\import-export>
Options
Short Name | Long Name | Optional / Required | Default Value | Acceptable Values | Description |
---|---|---|---|---|---|
Grid Connection Information | |||||
-s | --space | required | n/a | n/a | Name of the target space to perform the operation on. |
-l | --locators | optional | n/a | n/a | A comma separated list of data grid lookup locators for the target grid. |
-g | --groups | optional | n/a | n/a | A comma separated list of data grid lookup groups for the target grid. |
-u | --username | optional | n/a | n/a | Specifies a data grid username with read and execute privileges. Required when connecting to a secured grid. |
-a | --password | optional | n/a | n/a | Specifies a data grid password corresponding to the specified data grid username. Required when connecting to a secured grid. |
n/a | --security-level | optional | n/a | grid, space, both | Indicates the level of security for the grid. |
General Configuration |||||
-o | --operation | required | export | export, import | A flag indicating whether an export or import will be performed. |
-d | --directory | required | n/a | n/a | A full path to the directory containing either previously exported files, or where the exported files should be placed. |
n/a | --pu-name | optional | n/a | n/a | Overrides the name of the processing unit; relevant only when the processing unit name differs from the space name. |
Performance Configuration |||||
-b | --batch | optional | 1000 | n/a | Performance option to batch records retrieved from the space. |
n/a | --thread-sleep | optional | 1000 | n/a | Number of milliseconds to sleep between checks for task completion. |
-t | --threads | optional | 20 | n/a | Number of threads to simultaneously process import or export files. |
Troubleshooting & Frequently Asked Questions
Can I use the Export/Import tool with a secured grid?
Yes, the Export/Import tool works with secured infrastructure components and/or secured spaces. The username and password provided should have sufficient privileges to execute a remote task on the grid.
If using a secured space, the user must be authenticated with sufficient privileges to read, write, and update all classes, as well as monitor_pu.
I'm receiving a FileNotFoundException; what does this mean?
The most common reason for this exception is that your storage directory (the -d command line option) may not exist on all hosts. Double-check that the directory exists before re-running the Export/Import tool.
Can I use this to upgrade product versions?
Yes, this tool has been tested for upgrades between XAP 9.7, XAP 10, and XAP 11.
One or more of my files did not get imported, why is that?
Each partition is responsible for exporting or importing its own data. Because of this, the files are stored in the directory provided on that partition's host machine.
It is recommended that all machines mount an NFS drive to be used during the export and import operations. This ensures that all partitions, regardless of host, have access to the export files.
Why should I use the --jarless option during export?
If you do not provide the --jarless option during export, you will be required to include your application jars on the Export/Import tool's classpath as well as on the classpath used by your processing unit running on the data grid. The jars are required for both import and export operations.
When --jarless is provided, all objects are read as data grid SpaceDocuments, which removes the requirement to include your application jars on the classpath. Documents and space classes are interoperable when you follow the data grid documentation and best practices.
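The sketch below illustrates the document/POJO interoperability that --jarless relies on: entries written as POJOs can be read back as SpaceDocuments by type name, with no application class on the classpath. The type name reuses the MessagePOJO example from the logs above; the property access is illustrative:

```java
import com.gigaspaces.document.SpaceDocument;
import org.openspaces.core.GigaSpace;

public final class JarlessReadExample {
    public static void readAll(GigaSpace gigaSpace) {
        // Query by type name only; no MessagePOJO class is needed here.
        SpaceDocument template =
            new SpaceDocument("com.j_spaces.examples.benchmark.messages.MessagePOJO");
        SpaceDocument[] docs = gigaSpace.readMultiple(template, Integer.MAX_VALUE);
        for (SpaceDocument doc : docs) {
            // Each entry comes back as a type name plus a property map.
            System.out.println(doc.getTypeName() + " -> " + doc.getProperties());
        }
    }
}
```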
Why am I receiving a NoClassDefFoundError and/or a ClassNotFoundException?
There are several possible reasons. If you're seeing this on one of your space classes during import, it is because the classes were exported without the --jarless option; you will need to include your jars on the Export/Import tool's classpath as well as on the processing unit's classpath.
If this occurs during export, you may be missing jars on the classpath, or the class definitions in the space may not be available to the export thread. In the latter case, make your jars available to the processing unit instances by placing the required jars in your pu_common folder before the processing units are deployed.