What are the recommended Prometheus metrics to scrape?

My YB cluster has a Prometheus 2.2.1 instance running. I want to send some of its metrics to DataDog for monitoring. Does the YB team have a set of recommended metrics to send?

Right now, I send the "up" and "node_filesystem_avail" metrics to DataDog. There are a number of "handler_latency_yb_" metrics. We have YCQL, but the handler_latency_yb_ metrics do not seem to include YCQL. We also have YEDIS, but I am not sure which metrics to send.

So, I am looking for a set of YCQL and YEDIS metrics to send to DataDog. The purpose is to monitor cluster health.

Hi @Steve_Liang

Have you seen the "Prometheus integration | YugabyteDB Docs" page?

Hello @Steve_Liang!

Please use the documentation provided by @dorian_yugabyte to set up the relabel configs for Prometheus if you haven’t already.

A full list of useful metrics should be available in the Grafana JSON config at YugabyteDB | Grafana Labs. I would recommend bringing up the Grafana dashboard after configuring Prometheus as described at Prometheus integration | YugabyteDB Docs, so you can see the key metrics organized by category (YCQL, YSQL, DocDB, etc.).

Hi
There are a few prerequisite steps before you can monitor cluster health.
Create the universe
If you are already running a local universe, destroy it first. Then start a new YugabyteDB cluster as follows:
$ ./bin/yb-ctl create --rf
Run the YugabyteDB workload generator
$ wget https://github.com/yugabyte/yb-sample-apps/releases/download/1.3.1/yb-sample-apps.jar?raw=true -O yb-sample-apps.j
Run the CassandraKeyValue workload application in a separate shell as follows:
$ java -jar ./yb-sample-apps.jar \
    --workload CassandraKeyValue \
    --nodes 127.0.0.1:9042 \
    --num_threads_read 1 \
    --num_threads_write 1
Prepare the Prometheus configuration file
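For reference, a config along the following lines should work for a local cluster. Treat it as a sketch rather than the exact file from the docs: the single node at 127.0.0.1, the default web ports (7000 master, 9000 tserver, 11000 YEDIS, 12000 YCQL, 13000 YSQL), and the `group` label values are assumptions, and the relabel rules are modeled on the ones in the Prometheus integration page, so please check them against that page.

```shell
# Write a Prometheus config named yugabytedb.yml into the current directory.
# Assumptions: one local node at 127.0.0.1 using the default web ports;
# add your other nodes to each targets list.
# The quoted 'EOF' stops the shell from expanding $1..$4, which Prometheus needs verbatim.
cat > yugabytedb.yml <<'EOF'
global:
  scrape_interval: 5s
  evaluation_interval: 5s

scrape_configs:
  - job_name: "yugabytedb"
    metrics_path: /prometheus-metrics

    static_configs:
      - targets: ["127.0.0.1:7000"]
        labels:
          group: "yb-master"
      - targets: ["127.0.0.1:9000"]
        labels:
          group: "yb-tserver"
      - targets: ["127.0.0.1:11000"]
        labels:
          group: "yedis"
      - targets: ["127.0.0.1:12000"]
        labels:
          group: "ycql"
      - targets: ["127.0.0.1:13000"]
        labels:
          group: "ysql"

    # Turn handler_latency_<server>_<service>_<method>[_sum|_count]
    # into rpc_latency[_sum|_count] with server/service/method labels.
    metric_relabel_configs:
      - source_labels: ["__name__"]
        regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(_sum|_count)?"
        target_label: "server_type"
        replacement: "$1"
      - source_labels: ["__name__"]
        regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(_sum|_count)?"
        target_label: "service_type"
        replacement: "$2"
      - source_labels: ["__name__"]
        regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(_sum|_count)?"
        target_label: "service_method"
        replacement: "$3"
      - source_labels: ["__name__"]
        regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(_sum|_count)?"
        target_label: "__name__"
        replacement: "rpc_latency$4"
EOF
```

The rule that rewrites `__name__` comes last on purpose: the earlier rules match on the original handler_latency_ name, so renaming first would stop them from matching.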
Start the Prometheus server
Go to the directory where Prometheus is installed and start the Prometheus server as follows:
$ ./prometheus --config.file=yugabytedb.yml
Analyze Key Metrics
On the Prometheus Graph UI, you can now plot the read/write throughput and latency for the CassandraKeyValue sample app. As you can see from the source code of the app, it uses only SELECT statements for reads and INSERT statements for writes (aside from the initial CREATE TABLE). For example, the following query plots the read throughput (IOPS):
sum(irate(rpc_latency_count{server_type="yb_cqlserver", service_type="SQLProcessor", service_method="SelectStmt"}[1m]))
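Note that the server_type, service_type and service_method labels in this query are not emitted by YugabyteDB directly. As far as I can tell, they are produced by the relabel rules in the Prometheus integration docs, which split the raw handler_latency_yb_ metric names into labels. That is also where the YCQL metrics are hiding: they show up under the yb_cqlserver prefix. The snippet below imitates that renaming with sed so you can see the mapping. The `relabel` function and the sample metric name are illustrative assumptions, and the regex is modeled on the docs' rules rather than copied from your setup.

```shell
# Imitate the docs-style relabel rule with sed:
# handler_latency_<server>_<service>_<method>[_sum|_count]
#   -> rpc_latency[_sum|_count]{server_type, service_type, service_method}
# `relabel` is a hypothetical helper for illustration, not a Prometheus or YugabyteDB tool.
relabel() {
  printf '%s\n' "$1" | sed -E \
    's/^handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(_sum|_count)?$/rpc_latency\4{server_type="\1", service_type="\2", service_method="\3"}/'
}

# Assumed example of a raw YCQL metric name.
relabel 'handler_latency_yb_cqlserver_SQLProcessor_SelectStmt_count'
# prints: rpc_latency_count{server_type="yb_cqlserver", service_type="SQLProcessor", service_method="SelectStmt"}
```

If you forward metrics to DataDog without these relabel rules in place, you would filter on the raw handler_latency_yb_cqlserver_ names instead of the rpc_latency form.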
Clean Up
Optionally, you can shut down the local cluster created in the first step.
$ ./bin/yb-ctl destroy