Full Text Search
GigaSpaces products provide full text search capability, leveraging the Lucene search engine library.
The following features are supported:
- Keyword matching
- Search for phrase
- Wildcard matching
- Proximity matching
- Range searching
- Boosting a term
- Regular expressions
- Fuzzy search
Full text search queries can be used with any Space operation that supports SQL queries (read, readMultiple, take, etc.).
Dependencies
To use this feature, include the $GS_HOME/lib/optional/full-text-search/xap-full-text-search.jar file on your classpath, or use the following Maven dependency:
<dependency>
<groupId>org.gigaspaces</groupId>
<artifactId>xap-full-text-search</artifactId>
<version>16.4.0-m1</version>
</dependency>
For more information about dependencies, see Maven Artifacts.
Examples
Text search queries are available through the text: extension to the SQL query syntax.
For example, suppose we have a class called NewsArticle with a String property called content and a String property called type:
// Matching
SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "content text:match ?");
query.setParameter(1, "deployment");
// Wildcard search
// To perform a single character wildcard search use the "?" symbol.
SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "content text:match ?");
query.setParameter(1, "GigaSpac?s");
// To perform a multiple character wildcard search use the "*" symbol.
SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "content text:match ?");
query.setParameter(1, "clou*y");
// Regular expression search
SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "content text:match ?");
query.setParameter(1, "/[tp]es/");
// Fuzzy Search
SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "content text:match ?");
query.setParameter(1, "space~");
// Boolean operator
SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "content text:match ? AND type text:match ?");
query.setParameter(1, "space");
query.setParameter(2, "blog");
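To make the wildcard and fuzzy examples above more concrete, here is a minimal, self-contained sketch of what those patterns mean for a single indexed term. It is not GigaSpaces or Lucene code: TermMatchSketch and its methods are hypothetical names, the sketch ignores analyzers, and it uses plain Levenshtein distance, whereas Lucene's fuzzy matching also counts a transposition of two adjacent characters as a single edit.

```java
import java.util.regex.Pattern;

// Illustrative only: approximates how a wildcard or fuzzy term relates to one indexed term.
// TermMatchSketch is a hypothetical name, not part of any GigaSpaces or Lucene API.
final class TermMatchSketch {

    // Translates a wildcard term into a regular expression:
    // '?' matches exactly one character, '*' matches zero or more characters,
    // and every other character is matched literally.
    static boolean wildcardMatches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), term);
    }

    // Classic Levenshtein distance: the number of single-character insertions,
    // deletions, and substitutions needed to turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Under this model, GigaSpac?s matches GigaSpaces and clou*y matches cloudy but not cloud, while a fuzzy term such as space~ is meant to match terms within a small edit distance, for example spaces or spade.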
Supported Search Operations
GigaSpaces supports the Lucene Query Parser Syntax, except for Fields.
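The feature list at the top of this page includes phrase search, proximity matching, range searching, and term boosting, which the examples do not show. The sketch below builds the corresponding Lucene query parser strings. QuerySyntaxSketch is a hypothetical helper, not part of GigaSpaces or Lucene, and it assumes that these strings are passed as the parameter of a text:match predicate in the same way as the other examples. It does not escape special characters in its inputs.

```java
// Hypothetical helper that builds Lucene query parser strings.
// It only concatenates text; it does not call GigaSpaces or Lucene.
final class QuerySyntaxSketch {

    // Phrase search: the words are wrapped in double quotes.
    static String phrase(String words) {
        return "\"" + words + "\"";
    }

    // Proximity search: a quoted phrase followed by ~ and the maximum word distance.
    static String proximity(String words, int maxDistance) {
        return phrase(words) + "~" + maxDistance;
    }

    // Boosting: a term followed by ^ and the boost factor.
    static String boost(String term, int factor) {
        return term + "^" + factor;
    }

    // Inclusive range: [lower TO upper].
    static String inclusiveRange(String lower, String upper) {
        return "[" + lower + " TO " + upper + "]";
    }
}
```

For example, query.setParameter(1, QuerySyntaxSketch.proximity("quick fox", 3)) would pass the string "quick fox"~3 to the query.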
Nested Properties
In the example below, author is a property of NewsArticle, and its type is Person:
@SpaceClass
public class NewsArticle {
private UUID id;
private String content;
private Person author;
private Long articleNumber;
private String type;
public String getContent() {
return content;
}
public Person getAuthor() {
return author;
}
public void setAuthor(Person author) {
this.author = author;
}
//......
}
public class Person {
private String firstName;
private String lastName;
public String getFirstName() {
return firstName;
}
public void setFirstName(String firstName) {
this.firstName = firstName;
}
public String getLastName() {
return lastName;
}
public void setLastName(String lastName) {
this.lastName = lastName;
}
}
The following example shows how to query nested properties:
SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "author.firstName text:match ? AND author.lastName text:match ?");
query.setParameter(1, "Friedrich");
query.setParameter(2, "Durrenmatt");
Combining Text and Standard Predicates
Suppose our NewsArticle class contains an articleNumber property as well, and we want to narrow the query to articles whose articleNumber is below a given value. We can simply add the relevant predicate to the query's criteria:
SQLQuery<NewsArticle> query = new SQLQuery<NewsArticle>(NewsArticle.class, "content text:match ? AND articleNumber < ?");
query.setParameter(1, "deployment");
query.setParameter(2, 1000L);
Analyzer
An Analyzer is responsible for supplying a TokenStream which can be consumed by the indexing and searching processes in Lucene. There are several different Analyzers available.
You can use the @SpaceTextAnalyzer annotation to choose the Analyzer:
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.openspaces.textsearch.SpaceTextAnalyzer;
import org.openspaces.textsearch.SpaceTextIndex;
import org.openspaces.textsearch.SpaceTextIndexes;
import com.gigaspaces.annotation.pojo.SpaceClass;
import com.gigaspaces.annotation.pojo.SpaceId;
@SpaceClass
public class NewsArticle {
private UUID id;
private String content;
private Person author;
private Long articleNumber;
private String type;
@SpaceTextIndex
@SpaceTextAnalyzer(analyzer = StandardAnalyzer.class)
public String getContent() {
return content;
}
@SpaceTextAnalyzer(analyzer = KeywordAnalyzer.class)
public String getType() {
return type;
}
// ....
}
For nested properties, you can use the @SpaceTextAnalyzers annotation:
@SpaceClass
public class NewsArticle {
private UUID id;
private String content;
private Person author;
private Long articleNumber;
private String type;
@SpaceTextAnalyzers({ @SpaceTextAnalyzer(path = "firstName", analyzer = KeywordAnalyzer.class),
@SpaceTextAnalyzer(path = "lastName", analyzer = StandardAnalyzer.class) })
public Person getAuthor() {
return author;
}
// .....
}
If the @SpaceTextAnalyzer annotation is omitted, the StandardAnalyzer is applied.
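The choice of analyzer determines which terms a property contributes to the index. As a rough illustration, the sketch below imitates the two analyzers used in this section with plain Java. It is a simplification and not Lucene code: the real StandardAnalyzer applies Unicode text segmentation rules rather than a simple split, and AnalyzerSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Simplified stand-ins for two Lucene analyzers, for illustration only.
final class AnalyzerSketch {

    // Approximates StandardAnalyzer: split the text into words and lowercase them.
    static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Approximates KeywordAnalyzer: the whole value is kept as one unchanged token.
    static List<String> keywordLike(String text) {
        return Collections.singletonList(text);
    }
}
```

With the value Breaking News, the standard-like analysis yields the two terms breaking and news, so a search for news can match. The keyword-like analysis yields the single term Breaking News, which suits properties such as type that should only match as a whole.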
For collection properties, you can also use the @SpaceTextAnalyzers annotation:
@SpaceClass
public class Director {
private UUID id;
private List<Movie> movies;
@SpaceTextAnalyzers({ @SpaceTextAnalyzer(analyzer = KeywordAnalyzer.class,path = "[*].title")})
public List<Movie> getMovies() {
return movies;
}
// .....
}
Indexing
The performance of text search queries can be vastly improved by indexing the relevant properties. See Indexing for more information.
Space Document
Text search is also supported with Space Documents. Let's take the NewsArticle example above and use it as a SpaceDocument:
DocumentProperties author = new DocumentProperties();
author.put("firstName", "Friedrich");
author.put("lastName", "Durrenmatt");
SpaceDocument doc = new SpaceDocument("NewsArticle")
.setProperty("id", 1)
.setProperty("content", "The quick brown fox jumps over the lazy dog")
.setProperty("author", author);
// ...
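In the document above, author is itself a set of properties, so a dotted path such as author.firstName refers to an entry inside the nested properties. The sketch below shows that idea with plain maps standing in for the document and its nested properties. NestedPathSketch is a hypothetical name, and this is only a model of how such a path can be read, not how the Space resolves it internally.

```java
import java.util.Map;

// Illustration of dotted-path lookup over nested maps; not GigaSpaces code.
final class NestedPathSketch {

    // Walks a dotted path such as "author.firstName" through nested maps.
    // Returns null if a segment is missing or an intermediate value is not a map.
    static Object resolve(Map<String, Object> root, String path) {
        Object current = root;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }
}
```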
The TypeDescriptor is defined and registered with the Space as shown below. Analyzers are specified with the addQueryExtensionInfo method:
GigaSpace gigaSpace = new GigaSpaceConfigurer(new EmbeddedSpaceConfigurer("xapSpace")).gigaSpace();
// Simple
gigaSpace.getTypeManager().registerTypeDescriptor(new SpaceTypeDescriptorBuilder(typeName).idProperty("id").create());
// Analyzer
gigaSpace.getTypeManager().registerTypeDescriptor(new SpaceTypeDescriptorBuilder(typeName).idProperty("id")
.addQueryExtensionInfo("content",LuceneTextSearchQueryExtensionProvider.analyzer(KeywordAnalyzer.class))
.create());
// Nested Analyzer
gigaSpace.getTypeManager().registerTypeDescriptor(new SpaceTypeDescriptorBuilder(typeName).idProperty("id")
.addQueryExtensionInfo("author.firstName",LuceneTextSearchQueryExtensionProvider.analyzer(KeywordAnalyzer.class))
.addQueryExtensionInfo("author.lastName",LuceneTextSearchQueryExtensionProvider.analyzer(StandardAnalyzer.class)).create());
Search the space for SpaceDocuments:
SQLQuery<SpaceDocument> query = new SQLQuery<SpaceDocument>("NewsArticle", "content text:match ?").setParameter(1, "The quick brown fox jumps over the lazy dog");
SpaceDocument result = this.gigaSpace.read(query);
Refer to SpaceDocument for more information.
Configuration
Property | Description | Default |
---|---|---|
lucene.storage.location | The location of the Lucene index. | When deployed in the service grid, the deploy path of this Space instance; otherwise <user.dir>/xap/full_text_search |
lucene.storage.directory-type | The directory type. Available values: MMapDirectory, RAMDirectory. | MMapDirectory |
lucene.max-uncommitted-changes | The buffer size for uncommitted changes. When an indexed document is written to the Space, it is not flushed to the Lucene index immediately. It is flushed on the next search or when the buffer overflows. | 1000 |
lucene.max-results | The maximum number of documents retrieved from Lucene during a search. | Integer.MAX_VALUE |
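The behavior described for lucene.max-uncommitted-changes can be pictured with a small conceptual model: writes go into a buffer, and the buffer is flushed either when it overflows or when a search needs to see the pending changes. The sketch below is only such a model. BufferedIndexSketch is a hypothetical class, the exact overflow condition (here, more pending changes than the limit) is an assumption, and the search is a simple substring match rather than a Lucene query.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual model of a write buffer in front of an index; not GigaSpaces code.
final class BufferedIndexSketch {
    private final int maxUncommittedChanges;
    private final List<String> uncommitted = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();
    private int flushCount = 0;

    BufferedIndexSketch(int maxUncommittedChanges) {
        this.maxUncommittedChanges = maxUncommittedChanges;
    }

    // A write is buffered first. Assumption: the buffer is flushed once the
    // number of pending changes exceeds the configured limit.
    void write(String document) {
        uncommitted.add(document);
        if (uncommitted.size() > maxUncommittedChanges) {
            flush();
        }
    }

    // A search flushes pending changes first so that it sees every written document.
    List<String> search(String word) {
        if (!uncommitted.isEmpty()) {
            flush();
        }
        List<String> hits = new ArrayList<>();
        for (String document : committed) {
            if (document.contains(word)) {
                hits.add(document);
            }
        }
        return hits;
    }

    private void flush() {
        committed.addAll(uncommitted);
        uncommitted.clear();
        flushCount++;
    }

    int getFlushCount() {
        return flushCount;
    }

    int getCommittedCount() {
        return committed.size();
    }
}
```

In this model a larger limit means fewer flushes during bursts of writes, at the cost of more work deferred to the next search.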
Configuration Code Example:
final Properties luceneProperties = new Properties();
luceneProperties.setProperty("lucene.max-results", "10000");
luceneProperties.setProperty("lucene.storage.directory-type", "RAMDirectory");
final LuceneTextSearchQueryExtensionProvider queryExtensionProvider = new LuceneTextSearchQueryExtensionProvider(luceneProperties);
this.gs = new GigaSpaceConfigurer(new EmbeddedSpaceConfigurer("testSpace")
.addProperties(gsProperties)
.addQueryExtensionProvider(queryExtensionProvider))
.gigaSpace();
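Because the properties are passed as plain strings, a typo in a key or value is easy to miss. One option is to build the Properties object through a small helper that checks the values first. The following is a minimal sketch using only java.util. LucenePropertiesSketch is a hypothetical helper, and the rule that both numeric values must be positive is an assumption rather than something stated in the table above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper that validates values before building the Lucene properties.
final class LucenePropertiesSketch {
    // Directory types listed in the configuration table.
    private static final List<String> DIRECTORY_TYPES = Arrays.asList("MMapDirectory", "RAMDirectory");

    static Properties build(String directoryType, int maxUncommittedChanges, int maxResults) {
        if (!DIRECTORY_TYPES.contains(directoryType)) {
            throw new IllegalArgumentException("Unknown directory type: " + directoryType);
        }
        // Assumption: non-positive limits are treated as configuration mistakes.
        if (maxUncommittedChanges <= 0 || maxResults <= 0) {
            throw new IllegalArgumentException("Limits must be positive");
        }
        Properties properties = new Properties();
        properties.setProperty("lucene.storage.directory-type", directoryType);
        properties.setProperty("lucene.max-uncommitted-changes", Integer.toString(maxUncommittedChanges));
        properties.setProperty("lucene.max-results", Integer.toString(maxResults));
        return properties;
    }
}
```

The resulting Properties object could then be passed to the LuceneTextSearchQueryExtensionProvider constructor in the same way as luceneProperties in the example above.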
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:os-core="http://www.openspaces.org/schema/core"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.openspaces.org/schema/core http://www.openspaces.org/schema/core/openspaces-core.xsd">
<bean id="propertiesConfigurer" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
<property name="properties">
<props>
<prop key="dataGridName">dataGrid</prop>
<prop key="maxResults">100</prop>
<prop key="maxUncommitedChanges">100</prop>
<prop key="directoryType">RAMDirectory</prop>
</props>
</property>
</bean>
<bean id="luceneSpatialQueryExtensionProvider" class="org.openspaces.textsearch.LuceneTextSearchQueryExtensionProvider">
<constructor-arg name="customProperties">
<props>
<prop key="lucene.max-results">${maxResults}</prop>
<prop key="lucene.max-uncommitted-changes">${maxUncommitedChanges}</prop>
<prop key="lucene.storage.directory-type">${directoryType}</prop>
</props>
</constructor-arg>
</bean>
<os-core:space id="space" url="/./${dataGridName}">
<os-core:query-extension-provider ref="luceneSpatialQueryExtensionProvider"/>
</os-core:space>
</beans>