Summary: How to model application data for an in-memory data grid
Moving from Centralized to Distributed Data Model

When moving from a centralized to a distributed data store, your data must be partitioned across multiple nodes (partitions). Implementing the partitioning mechanism is not technically difficult; however, planning the distribution of your data for scalability and performance requires some thought.

Planning for Data Partitioning

Two issues should be taken into consideration when planning the data partitioning:
When planning the partitioning and how much memory to allocate for the application, the important factors to consider are how much the data is expected to grow over time and how long the data is expected to remain available. Estimated data storage needs should not be confused with the data structure, which is considered separately. Use the following table to predict the memory necessary to support your application:
While you might be used to modeling your data on the logical relationships between your data items, a different approach should be adopted for distributed data. The key is to avoid cross-cluster relationships as much as possible. Cross-cluster relationships lead to cross-cluster queries and updates, which are usually much less scalable and slower than their local counterparts.

Thinking in terms of traditional relationships ("one to one", "one to many" and "many to many") can be misleading with distributed data. Instead, you must consider how many different associations each entity has. If an entity is associated with several containers (parent entities), it can't be embedded within the containing entity. It might also be impossible to store it with all of its containers on the same partition.

In the Pet Clinic application, a Pet is only associated with an Owner. We can therefore store each Pet with its Owner on the same partition. We can even embed the Pet object within the physical Owner entry.

What are Embedded and Non Embedded Relationships?

Embedded Relationships mean that one object physically contains the associated objects and there is a strong lifecycle dependency between them. When the containing object is deleted, so are all of its contained objects. With this type of object association, you ensure there is always a local transaction, since the entire object graph is stored in the same entry within the Space.

Data Access for Embedded Relationships

Embedded Object Query: The info property is an object within the Person class:

    SqlQuery<Person> query = new SqlQuery<Person>(Person.class,
        "info.socialSecurity < ? and info.socialSecurity >= ?");

Embedded Map Query: The info property is a Map within the Person class:

    SqlQuery<Person> query = new SqlQuery<Person>(Person.class,
        "info.salary < 15000 and info.salary >= 8000");

Embedded Collection Query: The employees property is a collection within the Company class:

    SqlQuery<Company> query = new SqlQuery<Company>(Company.class,
        "employees[*].children[*].name = 'Junior Doe'");
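To make the collection query more concrete, here is a minimal plain-Java sketch of a class shape such a query could run against, together with an in-memory check that mirrors the assumed meaning of [*] (a Company matches when any employee has any child with the given name). The Employee and Child classes, their constructors, and the hasChildNamed helper are hypothetical names introduced for illustration; only Company, employees, children, and name come from the query text. No GigaSpaces API is used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical classes shaped like the ones the collection query assumes.
class Child {
    final String name;
    Child(String name) { this.name = name; }
}

class Employee {
    final List<Child> children;
    Employee(Child... children) { this.children = Arrays.asList(children); }
}

class Company {
    final List<Employee> employees = new ArrayList<>();

    // In-memory equivalent of: employees[*].children[*].name = ?
    // Assumption: [*] matches when ANY element of the collection matches.
    boolean hasChildNamed(String name) {
        for (Employee employee : employees) {
            for (Child child : employee.children) {
                if (child.name.equals(name)) {
                    return true;
                }
            }
        }
        return false;
    }
}

class EmbeddedCollectionDemo {
    public static void main(String[] args) {
        Company acme = new Company();
        acme.employees.add(new Employee(new Child("Junior Doe")));

        Company other = new Company();
        other.employees.add(new Employee(new Child("Jane Roe")));

        System.out.println(acme.hasChildNamed("Junior Doe"));
        System.out.println(other.hasChildNamed("Junior Doe"));
    }
}
```

Because the whole graph (company, employees, children) lives inside one entry, a check like this never needs to leave the partition that holds the Company.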
See the SqlQuery section for details about querying and indexing embedded entities.

Non Embedded Relationships mean that one object is associated with a number of other objects, so you can navigate from one object to another. However, there is no lifecycle dependency between them, so deleting the referencing object does not automatically delete the referenced object(s). The association is therefore expressed by storing IDs rather than the associated objects themselves. With this type of relationship you don't duplicate data, but you are more likely to access more than one node in the cluster when querying or updating your data.
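The difference between the two relationship types can be sketched with a toy partitioned store in plain Java. This is an illustration under stated assumptions, not the GigaSpaces API: PartitionedStore, partitionFor, and the hash-modulo routing rule are hypothetical, as are the field names on Owner and Pet. The sketch shows a Pet stored as its own entry that references its Owner by ID and is routed by that owner ID, so both entries land on the same partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a partitioned store; NOT the GigaSpaces API.
class Pet {
    final String id;
    final String ownerId; // non-embedded style: the owner is referenced by ID only
    Pet(String id, String ownerId) { this.id = id; this.ownerId = ownerId; }
}

class Owner {
    final String id;
    // Embedded style: pets added here live and die with the owner entry.
    final List<Pet> embeddedPets = new ArrayList<>();
    Owner(String id) { this.id = id; }
}

class PartitionedStore {
    private final List<Map<String, Object>> partitions = new ArrayList<>();

    PartitionedStore(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Assumed routing rule: hash of the routing key modulo the partition count.
    int partitionFor(String routingKey) {
        return Math.floorMod(routingKey.hashCode(), partitions.size());
    }

    void write(String routingKey, String entryId, Object entry) {
        partitions.get(partitionFor(routingKey)).put(entryId, entry);
    }

    Object read(String routingKey, String entryId) {
        return partitions.get(partitionFor(routingKey)).get(entryId);
    }

    void remove(String routingKey, String entryId) {
        partitions.get(partitionFor(routingKey)).remove(entryId);
    }
}

class RelationshipDemo {
    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore(4);

        Owner owner = new Owner("owner-1");
        owner.embeddedPets.add(new Pet("pet-embedded", owner.id));
        store.write(owner.id, owner.id, owner);

        // Non-embedded pet: separate entry, routed by ownerId for co-location.
        Pet pet = new Pet("pet-1", owner.id);
        store.write(pet.ownerId, pet.id, pet);

        // Removing the owner drops the embedded pet with it,
        // but the separately stored pet remains.
        store.remove(owner.id, owner.id);
        System.out.println(store.read(owner.id, owner.id) == null);
        System.out.println(store.read(pet.ownerId, pet.id) == pet);
    }
}
```

Had the Pet been routed by its own ID instead of the owner ID, it could land on a different partition, and reading an Owner together with its pets would become a cross-partition operation.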
Embedded vs. Non Embedded Relationships

We have already seen that embedding objects is not ideal for distributed data storage systems. Other factors to consider when choosing a relationship type are:
When Should Objects be Embedded?