Monitoring Health of Your Cluster

Health Monitoring provides insight into your DNIF cluster at both the system and application levels. It shows how the health, performance, and availability of the underlying infrastructure affect your business services over the long term.

Why Health Metrics?

  • Collecting health metrics is an effective way to track components and measure their performance.
  • It provides objective information across the environment and reduces ambiguity.
  • It accurately describes the status of the utilities and processes of each component.
  • Usage metrics support a proactive management strategy by identifying risks to assess and manage, and problems to evaluate and prioritize.
  • Metrics foster early discovery and correction of problems that can be more difficult or costly to resolve later.

How to view Health Charts?

  • Hover over the Administration icon in the left navigation panel of the Home screen and select Manage Components from the options displayed. The following screen is displayed.

image.png

  • The charts on the components list page shown above cover all the components in the cluster.

Health Chart | Description
EPS Total | The cumulative EPS, that is, the total EPS received from all the listeners at a particular time.
Storage Utilization | Displays the total storage used by the cluster at a particular time.
Compute CPU | Displays the total compute percentage used by each datanode in the cluster at a particular time.
Compute Memory | Displays the total memory utilized by each datanode at a particular time.
Running Queries | Displays the total number of queries running per server at a particular time.
Query Load | Displays the query load per server at a particular time.
Terminated Queries | Displays the total number of queries terminated per server at a particular time.

  • To view the health of an individual component, click the component name. A new health page opens and displays the health status of that component in chart form.
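
The EPS Total chart described above is a sum over listeners at each point in time. As a rough illustration only, the sketch below aggregates per-listener EPS readings into a cluster-wide total. The sample format, the listener names, and the function name `total_eps_by_time` are assumptions made for this example; they are not part of the DNIF product or its API.

```python
from collections import defaultdict


def total_eps_by_time(samples):
    """Sum per-listener EPS readings into one cumulative value per timestamp.

    `samples` is an iterable of (timestamp, listener, eps) tuples.
    This tuple layout is hypothetical and chosen only for illustration.
    """
    totals = defaultdict(float)
    for timestamp, _listener, eps in samples:
        totals[timestamp] += eps
    return dict(totals)


# Hypothetical readings from two listeners
samples = [
    ("10:00", "listener-a", 1200.0),
    ("10:00", "listener-b", 800.0),
    ("10:01", "listener-a", 1500.0),
]

totals = total_eps_by_time(samples)
print(totals)  # {'10:00': 2000.0, '10:01': 1500.0}
```

A drop in the total with no matching drop on an individual component page would suggest looking at a listener that has stopped sending.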

Component Page Functions

Field Name | Description
Refresh | Refreshes the details on the components page.
Snapshot Logs | Downloads logs. Click the vertical ellipsis displayed at the extreme right of each component. Note: Logs can be downloaded for a specific component or globally for all the components.

To edit the component name, click the Component Name displayed in the top-left corner of the Health Metrics screen and enter a name of your choice.

Core

The following health charts are displayed for Core. Each chart displays the health status of a particular Core characteristic.

image.png

Health Chart | Description
CPU Utilization | Displays the percentage of CPU utilization and the IOWait (input/output wait time) of Core in the cluster at a particular time.
Memory Utilization | Displays the total memory utilized (used and cached) by Core at a particular time.
Disk Utilization | Displays the total disk utilization by Core at a particular time.
Network Performance | Displays the network performance (bytes transferred, in millions) in Core.
IO Workload | Displays the bytes read and written (in millions) in Core at a particular time.
IO Availability | Displays the duration (in seconds) spent by Core in different stages (read, write, and busy) in a particular minute.
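
Because IO Availability is reported as seconds spent per stage within one minute, it can be easier to read as a share of that minute. The sketch below shows that conversion. The input dictionary, the stage values, and the function name `io_stage_share` are illustrative assumptions and do not reflect how DNIF computes or exposes the chart.

```python
def io_stage_share(seconds_by_stage, window_seconds=60.0):
    """Convert seconds spent per IO stage into a percentage of the window.

    `seconds_by_stage` maps a stage name (read, write, busy) to the seconds
    spent in that stage during one reporting window. The 60-second default
    matches the per-minute reporting described in the table above.
    """
    return {
        stage: round(100.0 * seconds / window_seconds, 1)
        for stage, seconds in seconds_by_stage.items()
    }


# Hypothetical values for one minute
shares = io_stage_share({"read": 3.0, "write": 6.0, "busy": 9.0})
print(shares)  # {'read': 5.0, 'write': 10.0, 'busy': 15.0}
```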

Datanode

The following health charts are displayed for a Datanode. Each chart displays the health status of a particular Datanode characteristic.

image.png

Health Chart | Description
CPU Utilization | Displays the CPU utilization and the IOWait (input/output wait time) of this datanode in the cluster at a particular time.
Memory Utilization | Displays the total memory utilized (used and cached) by this datanode at a particular time.
Disk Utilization | Displays the total disk utilization by this datanode at a particular time.
Network Performance | Displays the network performance (total bytes in and out) of this datanode in the cluster.
IO Workload | Displays the bytes read and written (in millions) by this datanode at a particular time.
IO Availability | Displays the duration (in seconds) spent by this datanode in different stages (read, write, and busy) in a particular minute.

Adapter

The following health charts are displayed for an Adapter. Each chart displays the health status of a particular Adapter characteristic.

image.png

Health Chart | Description
EPS - Timeline | The cumulative EPS, that is, the total EPS received by the adapter at a particular time.
EPS By Connector | Displays the EPS received by a connector at a particular time.
Parsing Status | Displays the total count of PER, NLF, and PAD logs at a particular time.
EPS by Process | Displays the total count of logs streamed at each process (Parser, Enricher, Compacto) at a particular time.
Indexing Rate | Displays the count of indexed events written into data nodes in a particular minute.
CPU Utilization | Displays the CPU utilization and the IOWait (input/output wait time) of data nodes in the cluster at a particular time.
Memory Utilization | Displays the percentage of memory consumed and available.
Disk Utilization | Displays the total disk utilization by the compute node at a particular time.
Network Performance | Displays the network performance (total bytes in and out) of the compute node in the cluster.
IO Workload | Displays the bytes read and written (in millions) in the compute node at a particular time.
IO Availability | Displays the duration (in seconds) spent by the compute node in different stages (read, write, and busy) in a particular minute.
Queue Utilisation | Displays the rate at which events are published to the queue.

PICO

The following health charts are displayed for a PICO. Each chart displays the health status of a particular PICO characteristic.

image.png

Health Chart | Description
EPS By Connector | Displays the EPS received by all the connectors configured in PICO at a particular time.
Filter Engine EPS | Displays the incoming and outgoing EPS of the filter engine at a particular time.
Native Forwarder EPS | Displays the EPS forwarded by the Native Forwarder at a particular time.
Raw Forwarder EPS | Displays the EPS forwarded by the Raw Forwarder at a particular time.
CPU Utilization | Displays the CPU utilization and the IOWait (input/output wait time) of PICO in the cluster at a particular time.
Memory Utilization | Displays the total memory utilized (used and cached) by PICO at a particular time.
Disk Utilization | Displays the total disk utilization by PICO at a particular time.
Network Performance | Displays the network performance (total bytes in and out) of PICO in the cluster.
IO Workload | Displays the bytes read and written (in millions) in PICO at a particular time.
IO Availability | Displays the duration (in seconds) spent by PICO in different stages (read, write, and busy) in a particular minute.
Queue Utilization | Displays the rate at which events are published to the PICO queue.

  • The health page of each component is refreshed every minute. The timer in the right corner of the page shows the time remaining until the next refresh.
  • If a component's health is not reported for 15 consecutive minutes, the component state automatically changes from ACTIVE to OFFLINE.
  • If a component's health is not reported for two consecutive days, the component state changes from OFFLINE to UNKNOWN.
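
The state rules above can be sketched as a small function of how long a component has been silent. This is an illustration only, not DNIF's implementation. It assumes that both thresholds are measured from the last health report and that a state changes as soon as its threshold is reached; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# Thresholds taken from the rules above
OFFLINE_AFTER = timedelta(minutes=15)
UNKNOWN_AFTER = timedelta(days=2)


def component_state(last_report, now):
    """Return the expected state given the time of the last health report.

    Assumption: the silence period is measured from the last report, and a
    threshold counts as reached when the silence is equal to or longer than it.
    """
    silence = now - last_report
    if silence >= UNKNOWN_AFTER:
        return "UNKNOWN"
    if silence >= OFFLINE_AFTER:
        return "OFFLINE"
    return "ACTIVE"


now = datetime(2024, 1, 10, 12, 0, 0)
print(component_state(now - timedelta(minutes=5), now))  # ACTIVE
print(component_state(now - timedelta(hours=3), now))    # OFFLINE
print(component_state(now - timedelta(days=3), now))     # UNKNOWN
```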
