18 March 2024

Full Text Search

GigaSpaces products provide full text search capability, leveraging the Lucene search engine library.

The following features are supported:

Keyword matching
Search for phrase
Wildcard matching
Proximity matching
Range searching
Boosting a term
Regular expressions
Fuzzy search

Full text search queries can be used with any Space operation that supports SQL queries (read, readMultiple, take, etc.).

Dependencies
In order to use this feature, include the $GS_HOME/lib/optional/full-text-search/xap-full-text-search.jar file on your classpath or use Maven dependencies:

<dependency>
    <groupId>org.gigaspaces</groupId>
    <artifactId>xap-full-text-search</artifactId>
    <version>17.0-m1</version>
</dependency>

For more information about dependencies, see Maven Artifacts.

Examples

Text search queries are available through the text: extension to the SQL query syntax.

For example, suppose we have a class called NewsArticle with a String property called content and a String property called type:

// Matching 
SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "content text:match ?");
query.setParameter(1, "deployment"); 
    
// Wildcard search
// To perform a single character wildcard search use the "?" symbol. 
SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "content text:match ?");
query.setParameter(1, "GigaSpac?s");
        
// To perform a multiple character wildcard search use the "*" symbol.
SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "content text:match ?");
query.setParameter(1, "clou*y");
    
//Regular Expression search
SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "content text:match ?");
query.setParameter(1, "/[tp]es/");

// Fuzzy Search
SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "content text:match ?");
query.setParameter(1, "space~");

// Boolean operator
SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "content text:match ? AND type text:match ?");
query.setParameter(1, "space");
query.setParameter(2, "blog");
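
The examples above do not show the phrase, proximity, range, or boosting features from the list at the top of this page. Because the text:match parameter is interpreted as Lucene query parser syntax, these searches can plausibly be expressed by passing the matching Lucene expression as the parameter. The following is a minimal sketch that only builds such parameter strings. TextMatchTerms and its method names are hypothetical and not part of the GigaSpaces API, and the sketch assumes the inputs contain no characters that are special in Lucene syntax.

```java
// Hypothetical helper (not part of the GigaSpaces API): builds parameter
// strings in Lucene query parser syntax for use with text:match.
final class TextMatchTerms {
    private TextMatchTerms() {
    }

    // Phrase search: wraps the words in double quotes.
    static String phrase(String words) {
        return "\"" + words + "\"";
    }

    // Proximity search: a quoted phrase followed by ~ and a maximum distance.
    static String proximity(String words, int maxDistance) {
        return phrase(words) + "~" + maxDistance;
    }

    // Boosting a term: the term followed by ^ and a boost factor.
    static String boost(String term, int factor) {
        return term + "^" + factor;
    }

    // Range search with inclusive bounds: [from TO to].
    static String inclusiveRange(String from, String to) {
        return "[" + from + " TO " + to + "]";
    }

    public static void main(String[] args) {
        System.out.println(phrase("data grid"));
        System.out.println(proximity("data grid", 5));
        System.out.println(boost("space", 4));
        System.out.println(inclusiveRange("alpha", "omega"));
    }
}
```

A result would be bound like any other parameter, for example query.setParameter(1, TextMatchTerms.proximity("data grid", 5)).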

Supported Search Operations

GigaSpaces supports the Lucene Query Parser Syntax except Fields.
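
One consequence of using query parser syntax is that characters such as ?, *, ~ and / act as operators inside the parameter value. To match them literally, Lucene expects them to be escaped with a backslash. If the Lucene query parser library is on the classpath, its own QueryParser.escape method does this. The sketch below reproduces the same idea without any dependency; LuceneEscaper is a hypothetical name, and whether escaped characters survive analysis depends on the Analyzer configured for the property.

```java
// Hypothetical, dependency-free sketch of backslash-escaping the characters
// that are special in Lucene query parser syntax.
final class LuceneEscaper {
    // Characters that Lucene's query parser treats as syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    private LuceneEscaper() {
    }

    static String escape(String text) {
        StringBuilder escaped = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                escaped.append('\\'); // prefix each special character with a backslash
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("C++"));
        System.out.println(escape("what?"));
    }
}
```

The escaped value is then passed as the query parameter, for example query.setParameter(1, LuceneEscaper.escape(userInput)), where userInput stands for any untrusted search text.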

Nested Properties

In the example below, the author is a property of type Person which is a property of NewsArticle:

@SpaceClass
public class NewsArticle {
    private UUID id;
    private String content;
    private Person author;
    private Long articleNumber;
    private String type;

    public String getContent() {
        return content;
    }
    public Person getAuthor() {
        return author;
    }
    public void setAuthor(Person author) {
        this.author = author;
    }
    //......
}

public class Person {
    private String firstName;
    private String lastName;

    public String getFirstName() {
        return firstName;
    }
    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }
    public String getLastName() {
        return lastName;
    }
    public void setLastName(String lastName) {
        this.lastName = lastName;
    }
}

Here is an example of how to query nested properties:

SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "author.firstName text:match ? AND author.lastName text:match ?");
query.setParameter(1, "Friedrich");
query.setParameter(2, "Durrenmatt");
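
When several properties are matched in one query, the criteria string and the parameter indexes have to stay aligned by hand. The sketch below shows one possible way to generate the criteria from a list of property paths, so that parameter N always belongs to the N-th path. TextMatchCriteria is a hypothetical helper, not a GigaSpaces class; it only produces the string and relies on the text:match syntax shown above.

```java
import java.util.StringJoiner;

// Hypothetical helper: joins "<path> text:match ?" predicates with AND.
final class TextMatchCriteria {
    private TextMatchCriteria() {
    }

    // Parameters are later bound 1-based, in the same order as the paths.
    static String allOf(String... propertyPaths) {
        StringJoiner criteria = new StringJoiner(" AND ");
        for (String path : propertyPaths) {
            criteria.add(path + " text:match ?");
        }
        return criteria.toString();
    }

    public static void main(String[] args) {
        System.out.println(allOf("author.firstName", "author.lastName"));
    }
}
```

The returned string can be passed as the second constructor argument of SQLQuery, followed by setParameter(1, ...) and setParameter(2, ...) in path order.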

Combining Text and Standard Predicates

Suppose our NewsArticle class contains an articleNumber property as well, and we want to narrow the query to articles whose articleNumber is below a given value. We can simply add the relevant predicate to the query's criteria:

SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "content text:match ? AND articleNumber < ?");
query.setParameter(1, "deployment");
query.setParameter(2, 1000L);

Analyzer

An Analyzer is responsible for supplying a TokenStream which can be consumed by the indexing and searching processes in Lucene. There are several different Analyzers available.

You can use the @SpaceTextAnalyzer annotation to choose the Analyzer:

import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.openspaces.textsearch.SpaceTextAnalyzer;
import org.openspaces.textsearch.SpaceTextIndex;
import org.openspaces.textsearch.SpaceTextIndexes;

import com.gigaspaces.annotation.pojo.SpaceClass;
import com.gigaspaces.annotation.pojo.SpaceId;

@SpaceClass
public class NewsArticle {
    private UUID id;
    private String content;
    private Person author;
    private Long articleNumber;
    private String type;
    
    @SpaceTextIndex
    @SpaceTextAnalyzer(analyzer = StandardAnalyzer.class)
    public String getContent() {
        return content;
    }

    @SpaceTextAnalyzer(analyzer = KeywordAnalyzer.class)
    public String getType() {
        return type;
    }
  // ....
}
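
To see why content and type use different Analyzers here: StandardAnalyzer breaks text into lowercased word tokens, while KeywordAnalyzer keeps the whole value as a single token, which suits identifier-like values such as type. The sketch below imitates the two behaviors with plain Java so the difference is visible without Lucene on the classpath. It is a rough approximation only (the real StandardAnalyzer follows Unicode word-break rules), and AnalyzerSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Rough, dependency-free imitation of two Lucene analyzers (illustration only).
final class AnalyzerSketch {
    private AnalyzerSketch() {
    }

    // StandardAnalyzer-like: split on non-alphanumeric characters, lowercase each word.
    static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // KeywordAnalyzer-like: the entire value is one token, unchanged.
    static List<String> keywordLike(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        System.out.println(standardLike("Breaking News: Deployment!"));
        System.out.println(keywordLike("Breaking News"));
    }
}
```

With the keyword-style behavior, a search has to supply the complete value to match; with the standard-style behavior, any single word of the text can match.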

For nested properties, you can use the @SpaceTextAnalyzers annotation:

@SpaceClass
public class NewsArticle {
    private UUID id;
    private String content;
    private Person author;
    private Long articleNumber;
    private String type;

    @SpaceTextAnalyzers({ @SpaceTextAnalyzer(path = "firstName", analyzer = KeywordAnalyzer.class),
            @SpaceTextAnalyzer(path = "lastName", analyzer = StandardAnalyzer.class) })
    public Person getAuthor() {
        return author;
    }
    // .....

If the @SpaceTextAnalyzer annotation is omitted, the StandardAnalyzer is applied.

For collection properties, you can use the @SpaceTextAnalyzers annotation:

@SpaceClass
public class Director {
    private UUID id;
    private List<Movie> movies;

    @SpaceTextAnalyzers({ @SpaceTextAnalyzer(analyzer = KeywordAnalyzer.class,path = "[*].title")})
    public List<Movie> getMovies() {
        return movies;
    }
    // .....

Indexing

The performance of text search queries can be vastly improved by indexing the relevant properties. See Indexing for more information.

Space Document

Text search is also supported with Space Documents. Let's take the NewsArticle example above and use it as a SpaceDocument:

DocumentProperties author = new DocumentProperties();
author.put("firstName", "Friedrich");
author.put("lastName", "Durrenmatt");

SpaceDocument doc =  new SpaceDocument("NewsArticle")
    .setProperty("id", 1)
    .setProperty("content", "The quick brown fox jumps over the lazy dog")
    .setProperty("author", author);
 
// ...

Define the type descriptor and register it with the Space. Analyzers are assigned to properties with the addQueryExtensionInfo method:

GigaSpace gigaSpace = new GigaSpaceConfigurer(new EmbeddedSpaceConfigurer("xapSpace")).gigaSpace();

// Simple 
gigaSpace.getTypeManager().registerTypeDescriptor(new SpaceTypeDescriptorBuilder(typeName).idProperty("id").create());
                
// Analyzer                                     
gigaSpace.getTypeManager().registerTypeDescriptor(new SpaceTypeDescriptorBuilder(typeName).idProperty("id")
                .addQueryExtensionInfo("content",LuceneTextSearchQueryExtensionProvider.analyzer(KeywordAnalyzer.class))
                .create());

// Nested Analyzer
gigaSpace.getTypeManager().registerTypeDescriptor(new SpaceTypeDescriptorBuilder(typeName).idProperty("id")
                .addQueryExtensionInfo("author.firstName",LuceneTextSearchQueryExtensionProvider.analyzer(KeywordAnalyzer.class))
                .addQueryExtensionInfo("author.lastName",LuceneTextSearchQueryExtensionProvider.analyzer(StandardAnalyzer.class)).create());

Search the space for SpaceDocuments:

SQLQuery<SpaceDocument> query = new SQLQuery<SpaceDocument>("NewsArticle", "content text:match ?").setParameter(1, "The quick brown fox jumps over the lazy dog");
SpaceDocument result = gigaSpace.read(query);

Refer to SpaceDocument for more information on SpaceDocument.

Configuration

Property	Description	Default
lucene.storage.location	The location of the Lucene index.	The deploy path of the Space instance when deployed in the service grid; otherwise <user.dir>/xap/full_text_search.
lucene.storage.directory-type	The directory type. Available values: MMapDirectory, RAMDirectory.	MMapDirectory
lucene.max-uncommitted-changes	The buffer size for uncommitted changes. When an indexed document is written to the Space, it is not flushed to the Lucene index immediately; it is flushed after a search or when the buffer overflows.	1000
lucene.max-results	The maximum number of documents retrieved from Lucene during a search.	Integer.MAX_VALUE

Configuration Code Example:

final Properties luceneProperties = new Properties();
luceneProperties.setProperty("lucene.max-results", "10000");
luceneProperties.setProperty("lucene.storage.directory-type", "RAMDirectory");
final LuceneTextSearchQueryExtensionProvider queryExtensionProvider = new LuceneTextSearchQueryExtensionProvider(luceneProperties);

this.gs = new GigaSpaceConfigurer(new EmbeddedSpaceConfigurer("testSpace")
        .addProperties(gsProperties)
        .addQueryExtensionProvider(queryExtensionProvider))
        .gigaSpace();
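
Because every setting is passed as a string, a misspelled directory type is easy to miss. The following sketch shows one way to assemble the properties from typed values and reject a directory type that is not listed in the table above. LuceneSettings is a hypothetical helper, not a GigaSpaces class; only the property keys and the two directory type names come from this page.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: builds the Lucene properties from typed arguments.
final class LuceneSettings {
    // Directory types documented in the configuration table.
    private static final List<String> DIRECTORY_TYPES =
            Arrays.asList("MMapDirectory", "RAMDirectory");

    private LuceneSettings() {
    }

    static Properties build(String directoryType, int maxUncommittedChanges, int maxResults) {
        if (!DIRECTORY_TYPES.contains(directoryType)) {
            throw new IllegalArgumentException("Unsupported directory type: " + directoryType);
        }
        Properties properties = new Properties();
        properties.setProperty("lucene.storage.directory-type", directoryType);
        properties.setProperty("lucene.max-uncommitted-changes", Integer.toString(maxUncommittedChanges));
        properties.setProperty("lucene.max-results", Integer.toString(maxResults));
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(build("RAMDirectory", 500, 10000));
    }
}
```

The returned Properties object can be passed to the LuceneTextSearchQueryExtensionProvider constructor in the same way as luceneProperties in the example above.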

Processing Unit (PU) example:


<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:os-core="http://www.openspaces.org/schema/core"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
    http://www.openspaces.org/schema/core http://www.openspaces.org/schema/core/openspaces-core.xsd">
    <bean id="propertiesConfigurer" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
        <property name="properties">
            <props>
                <prop key="dataGridName">dataGrid</prop>
                <prop key="maxResults">100</prop>
                <prop key="maxUncommitedChanges">100</prop>
                <prop key="directoryType">RAMDirectory</prop>
            </props>
        </property>
    </bean>
    <bean id="luceneSpatialQueryExtensionProvider" class="org.openspaces.textsearch.LuceneTextSearchQueryExtensionProvider">
        <constructor-arg name="customProperties">
            <props>
                <prop key="lucene.max-results">${maxResults}</prop>
                <prop key="lucene.max-uncommitted-changes">${maxUncommitedChanges}</prop>
                <prop key="lucene.storage.directory-type">${directoryType}</prop>
            </props>
        </constructor-arg>
    </bean>
    <os-core:space id="space" url="/./${dataGridName}">
        <os-core:query-extension-provider ref="luceneSpatialQueryExtensionProvider"/>
    </os-core:space>
</beans>