GigaSpaces Ops Manager
The GigaSpaces Ops Manager is an enterprise-grade management platform for GigaSpaces-based applications running in on-premise, cloud, and Kubernetes environments. Ops Manager is a web interface that provides visibility and real-time information about your GigaSpaces clusters and their services. It helps system administrators monitor their clusters and troubleshoot issues based on performance metrics and defined thresholds.
Ops Manager enables reviewing and analyzing data that resides within the cluster. Users can review managed Spaces, their types and properties, and query for data discovery purposes.
Using Ops Manager, you can:
- View the status of services and microservices to identify healthy and unhealthy clusters
- Review and monitor cluster performance in terms of CPU, latency, IOPS and memory usage metrics
- Drill down to specific services and review performance metrics and alerts
- Analyze alert trends and volume filtered by time, type, etc.
- Review individual service instances and analyze alerts per instance
- Download service logs for root-cause analysis
Accessing Ops Manager
The Ops Manager is deployed along with the Manager service and the REST Manager API, and uses the same entry point. To open Ops Manager in your browser and view your GigaSpaces environment, browse to the Manager's IP address on port 8090.
If your GigaSpaces environment is deployed on a local agent, use the following URL: http://localhost:8090.
If your GigaSpaces environment is deployed on a cluster, you can open Ops Manager using the IP address of any machine that hosts a Manager.
If you deploy a GigaSpaces application in the cloud, you must obtain the Manager IP address manually. For example, if you deploy on Amazon AWS, you can get the Manager's IP address from the AWS Console. The format of the URL is: http://<MANAGER_IP>:8090.
In previous versions of GigaSpaces, entering the Manager IP address and port in your browser (http://<MANAGER_IP>:8090/) to access the REST Manager API automatically redirected you to http://<MANAGER_IP>:8090/v2. If you used a previous version, your browser may have cached this redirect, which can prevent Ops Manager from opening. If this occurs, clear your browser cache to enable access to Ops Manager.
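The URL format described in this section can be sketched as a small helper. This is a minimal illustration, not part of any GigaSpaces tooling: the function name `ops_manager_url` and its parameters are hypothetical, and only the port (8090) and the `http://<MANAGER_IP>:8090` format come from this topic.

```python
OPS_MANAGER_PORT = 8090  # port stated in this topic


def ops_manager_url(manager_host: str = "localhost", port: int = OPS_MANAGER_PORT) -> str:
    """Build the Ops Manager URL for a machine that hosts a Manager.

    `manager_host` is a host name or IP address; the default matches a
    local agent deployment. (Hypothetical helper for illustration only.)
    """
    host = manager_host.strip()
    if not host:
        raise ValueError("manager_host must not be empty")
    return f"http://{host}:{port}"


# Local agent deployment
print(ops_manager_url())
# Cluster or cloud deployment: pass the IP address of any machine hosting a Manager
print(ops_manager_url("10.0.0.5"))
```

Passing the address of any machine that hosts a Manager should work for a cluster deployment, as described above.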
Understanding the Ops Manager Interface
If you aren't familiar with general GigaSpaces terminology, see the General Terms and Concepts topic in the Product Overview.
A service is also known as a Processing Unit in GigaSpaces products. The following services are supported.
|Stateless|Contains one or more microservices. This kind of service may act like a client that interacts with other services.|
|Stateful|Contains data in a Space instance. A set of stateful services comprise a GigaSpaces data grid.|
|Mirror|A mirror service is stateless and provides asynchronous persistence to ensure that data isn't lost. The mirror acts as a dispatcher, pushing all primary Space partition changes to the database (or any other data source).|
|Web|Allows deploying web applications using packaged or exploded WAR files.|
|WAN gateway|Replicates services between different clusters for disaster recovery planning and data locality.|
In addition to the main services, you can also deploy the following service sub-types.
|MemoryXtend|Service that has been configured for MemoryXtend.|
There are three graphs in the cluster overview that display the following information:
- Usage - view the average CPU usage, actual RAM usage, and disk usage.
- Latency - view the average latency, memory latency (both disk and RAM), and MemoryXtend latency (off-heap and SSD) per second.
- IOPS - view the reads, writes, and ??? per second.
Ops Manager metrics can be reported to InfluxDB, so that you can use external monitoring and visualization tools (such as Grafana) to perform additional analysis.
You can view the services sorted by health and severity. You can also apply filters by service type and RAM utilization, or search for specific services by name. Click a service to drill through to see the individual performance metrics and alerts for that service. For deeper analysis, you can create a dump file of the service logs.
There are two types of alerts in Ops Manager: availability and resource. Alert summaries can be viewed at the service level. You can drill through to specific service instances to view and analyze alerts by group or individually, which helps you assess system and cluster performance and identify problem areas.
Availability alerts describe issues with service availability. The following alerts are used:
- Disconnected - there is no backup service for this primary service.
- Unavailable - the service isn't responding.
- Overprovisioned - the numbers of primary and backup services don't match.
Resource alerts relate to system resource usage. The default threshold is 80%. The following resource alerts are available by default:
- CPU usage
- Redo log
You can configure custom resource alerts per service type, and modify the resource thresholds via the configuration file.
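The threshold behavior described above can be sketched as follows. This is an illustrative model only: the function `resource_alerts`, the resource names, and the override dictionary are assumptions made for the example and do not reflect the actual configuration file format or Ops Manager internals. Only the 80% default comes from this topic, and whether an alert fires at or above the threshold (as assumed here) or strictly above it is not stated.

```python
from typing import Dict, List, Optional

DEFAULT_THRESHOLD_PERCENT = 80.0  # default threshold stated in this topic


def resource_alerts(
    usage_percent: Dict[str, float],
    thresholds: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Return the names of resources whose usage reaches their threshold.

    `usage_percent` maps a resource name (hypothetical, e.g. "cpu") to its
    current usage in percent. `thresholds` holds optional per-resource
    overrides; resources without an override use the 80% default.
    """
    overrides = thresholds or {}
    triggered = []
    for name in sorted(usage_percent):
        limit = overrides.get(name, DEFAULT_THRESHOLD_PERCENT)
        # Assumption: an alert is raised when usage is at or above the limit.
        if usage_percent[name] >= limit:
            triggered.append(name)
    return triggered


# CPU exceeds the default threshold; the redo log does not.
print(resource_alerts({"cpu": 85.0, "redo_log": 40.0}))
# Raising the CPU threshold to 90% suppresses the alert.
print(resource_alerts({"cpu": 85.0}, {"cpu": 90.0}))
```

Modeling thresholds as per-resource overrides on top of a single default mirrors the description above, where one default applies unless a threshold is modified.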