====== [HOWTO] Ceph+grafana+prometheus ======
^ Documentation ^|
^Name:| [HOWTO] Ceph+grafana+prometheus |
^Description:| How to set up Ceph with Prometheus and Grafana for advanced statistics |
^Modification date :| 24/12/2021 |
^Owner:|dodger|
^Notify changes to:|Owner |
^Tags:|ceph, object storage |
^Escalate to:|The_fucking_bofh|
====== Documentation ======
* [[https://docs.ceph.com/en/nautilus/mgr/dashboard/?#enabling-the-embedding-of-grafana-dashboards|Ceph Dashboard + grafana integration]]
* [[https://docs.ceph.com/en/latest/mgr/dashboard/#dashboard-grafana|Same as previous but for latest version of ceph (has additional info)]]
* [[https://docs.ceph.com/en/nautilus/mgr/prometheus/|Ceph+Prometheus official documentation]]
* [[https://github.com/ceph/ceph/tree/master/monitoring/grafana/dashboards|Official ceph grafana dashboards]]
* [[https://dev.to/ingoleajinkya/ceph-cluster-monitoring-using-prometheus-and-grafana-472i|Non-official howto]]
====== Pre-Requisites ======
===== Prometheus node exporter =====
From the salt-master:
export THEHOSTNAME='avmlp-os*'
salt "${THEHOSTNAME}" test.ping
salt "${THEHOSTNAME}" pkg.install golang-github-prometheus-node-exporter
salt "${THEHOSTNAME}" service.start node_exporter
salt "${THEHOSTNAME}" service.enable node_exporter
salt "${THEHOSTNAME}" service.status node_exporter
Check:
salt "${THEHOSTNAME}" cmd.run "netstat -nap | egrep 9100 | egrep LISTEN"
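A listening port doesn't prove that it is really ''node_exporter'' answering. A minimal sketch of a second check, assuming ''curl'' is available wherever you run it; ''has_node_metrics'' is a hypothetical helper name, not part of any package:

```shell
# Hypothetical helper: succeeds if the metrics text on stdin contains
# at least one sample whose name starts with "node_".
has_node_metrics() {
    grep -q '^node_'
}

# Usage (the hostname is just one of the examples from this page):
#   curl -s http://bvmlm-osd-001.ciberterminal.net:9100/metrics | has_node_metrics && echo OK
```

It only looks at the metric names, so it works the same from the salt-master or from the Prometheus server.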
Obtain the list of nodes for configuring prometheus to scrape the ''node_exporter'':
salt "${THEHOSTNAME}" service.status node_exporter | grep "^${THEHOSTNAME}" | awk -F\: '{print "\047"$1":9100\047,"}'
Example:
root@avmlm-salt-001 /home/bofher/scripts/nutanix_buster $ salt "${THEHOSTNAME}" service.status node_exporter | grep "^${THEHOSTNAME}" | awk -F\: '{print "\047"$1":9100\047,"}'
'bvmlm-osd-001.ciberterminal.net:9100',
'bvmlm-osd-019.ciberterminal.net:9100',
'bvmlm-osd-013.ciberterminal.net:9100',
'bvmlm-osm-003.ciberterminal.net:9100',
'bvmlm-osd-005.ciberterminal.net:9100',
'bvmlm-oslb-001.ciberterminal.net:9100',
'bvmlm-osd-010.ciberterminal.net:9100',
'bvmlm-osd-003.ciberterminal.net:9100',
'bvmlm-osd-020.ciberterminal.net:9100',
'bvmlm-osfs-003.ciberterminal.net:9100',
'bvmlm-osd-002.ciberterminal.net:9100',
'bvmlm-osm-001.ciberterminal.net:9100',
'bvmlm-osm-004.ciberterminal.net:9100',
'bvmlm-osd-015.ciberterminal.net:9100',
'bvmlm-osd-018.ciberterminal.net:9100',
'bvmlm-osgw-001.ciberterminal.net:9100',
'bvmlm-osd-017.ciberterminal.net:9100',
'bvmlm-osd-011.ciberterminal.net:9100',
'bvmlm-osd-007.ciberterminal.net:9100',
'bvmlm-osgw-004.ciberterminal.net:9100',
'bvmlm-osgw-003.ciberterminal.net:9100',
'bvmlm-osd-006.ciberterminal.net:9100',
'bvmlm-osfs-004.ciberterminal.net:9100',
'bvmlm-osm-002.ciberterminal.net:9100',
'bvmlm-osd-008.ciberterminal.net:9100',
'bvmlm-osfs-002.ciberterminal.net:9100',
'bvmlm-osfs-001.ciberterminal.net:9100',
'bvmlm-osd-004.ciberterminal.net:9100',
'bvmlm-oslb-002.ciberterminal.net:9100',
'bvmlm-osd-012.ciberterminal.net:9100',
'bvmlm-osd-009.ciberterminal.net:9100',
'bvmlm-osgw-002.ciberterminal.net:9100',
'bvmlm-osd-014.ciberterminal.net:9100',
'bvmlm-osm-005.ciberterminal.net:9100',
'bvmlm-osnx-002.ciberterminal.net:9100',
'bvmlm-osd-016.ciberterminal.net:9100',
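The list above comes out in whatever order salt answers and with a comma after the last entry. A possible sketch that sorts the hosts and drops the final comma, so the result can be pasted straight between the brackets of a ''targets'' list; ''format_targets'' is a hypothetical helper and expects one hostname per line on stdin:

```shell
# Hypothetical helper: turn one hostname per line (stdin) into quoted
# 'host:port' entries, sorted and de-duplicated, with no comma after
# the last entry. The port defaults to 9100 (node_exporter).
format_targets() {
    sort -u | awk -v port="${1:-9100}" '
        NF { hosts[++n] = $1 }          # skip empty lines
        END {
            for (i = 1; i <= n; i++)
                printf "\047%s:%s\047%s\n", hosts[i], port, (i < n ? "," : "")
        }'
}

# Usage, feeding it any command that prints one hostname per line:
#   salt "${THEHOSTNAME}" service.status node_exporter | grep "^${THEHOSTNAME}" | awk -F: '{print $1}' | format_targets 9100
```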
===== Prometheus =====
Bare-minimum install instructions:
cat >/etc/yum.repos.d/prometheus.repo<
Prometheus setup: add a scrape config for Ceph. For example, in dev, with ''larry'' as the Ceph target:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['0.0.0.0:9090']
  - job_name: 'ceph-larry'
    static_configs:
      - targets: ['larry.ciberterminal.net:9283']
  - job_name: 'node-exporter'
    static_configs:
      - targets: [
          'bvmlm-osd-001.ciberterminal.net:9100',
          'bvmlm-osd-019.ciberterminal.net:9100',
          'bvmlm-osd-013.ciberterminal.net:9100',
          'bvmlm-osm-003.ciberterminal.net:9100',
          'bvmlm-osd-005.ciberterminal.net:9100',
          'bvmlm-oslb-001.ciberterminal.net:9100',
          'bvmlm-osd-010.ciberterminal.net:9100',
          'bvmlm-osd-003.ciberterminal.net:9100',
          'bvmlm-osd-020.ciberterminal.net:9100',
          'bvmlm-osfs-003.ciberterminal.net:9100',
          'bvmlm-osd-002.ciberterminal.net:9100',
          'bvmlm-osm-001.ciberterminal.net:9100',
          'bvmlm-osm-004.ciberterminal.net:9100',
          'bvmlm-osd-015.ciberterminal.net:9100',
          'bvmlm-osd-018.ciberterminal.net:9100',
          'bvmlm-osgw-001.ciberterminal.net:9100',
          'bvmlm-osd-017.ciberterminal.net:9100',
          'bvmlm-osd-011.ciberterminal.net:9100',
          'bvmlm-osd-007.ciberterminal.net:9100',
          'bvmlm-osgw-004.ciberterminal.net:9100',
          'bvmlm-osgw-003.ciberterminal.net:9100',
          'bvmlm-osd-006.ciberterminal.net:9100',
          'bvmlm-osfs-004.ciberterminal.net:9100',
          'bvmlm-osm-002.ciberterminal.net:9100',
          'bvmlm-osd-008.ciberterminal.net:9100',
          'bvmlm-osfs-002.ciberterminal.net:9100',
          'bvmlm-osfs-001.ciberterminal.net:9100',
          'bvmlm-osd-004.ciberterminal.net:9100',
          'bvmlm-oslb-002.ciberterminal.net:9100',
          'bvmlm-osd-012.ciberterminal.net:9100',
          'bvmlm-osd-009.ciberterminal.net:9100',
          'bvmlm-osgw-002.ciberterminal.net:9100',
          'bvmlm-osd-014.ciberterminal.net:9100',
          'bvmlm-osm-005.ciberterminal.net:9100',
          'bvmlm-osnx-002.ciberterminal.net:9100',
          'bvmlm-osd-016.ciberterminal.net:9100'
        ]
We will restart Prometheus and check it after setting up the remaining components :-)
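Since the restart comes later, it can be worth validating the file right after editing it. A minimal sketch using ''promtool check config'', which ships with Prometheus; the default path ''/etc/prometheus/prometheus.yml'' and the helper name ''check_prom_config'' are assumptions, adjust them to your install:

```shell
# Hypothetical wrapper around "promtool check config".
# Return codes: 2 = config file not readable, 3 = promtool not installed,
# otherwise whatever promtool itself returns.
check_prom_config() {
    cfg="${1:-/etc/prometheus/prometheus.yml}"   # assumed default path
    if [ ! -r "$cfg" ]; then
        echo "config not readable: $cfg" >&2
        return 2
    fi
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found, skipping validation" >&2
        return 3
    fi
    promtool check config "$cfg"
}

# Usage:
#   check_prom_config /etc/prometheus/prometheus.yml && echo "safe to restart"
```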
===== grafana =====
* A working Grafana instance
I didn't set it up myself, so I can't give install instructions here xD\\
\\
Additional setup for grafana to work with ceph:
--- grafana.ini 2021-12-24 10:38:20.669668776 +0100
+++ grafana.ini.orig 2021-12-24 12:36:44.083311253 +0100
@@ -185,7 +185,6 @@
# set to true if you want to allow browsers to render Grafana in a <frame>, <iframe>, <embed> or <object>. default is false.
\\
You'll also need the following Grafana plugins:
grafana-cli plugins install vonage-status-panel
grafana-cli plugins install grafana-piechart-panel
\\
Import **all** of the official dashboards :-)\\
Here are some handy one-liners to simplify the process:
wget "https://github.com/ceph/ceph/tree/master/monitoring/grafana/dashboards"
for i in $(cat dashboards| egrep json |egrep "dashboard" | awk -F\" '{print $6}' | egrep "\.json") ; do wget "https://raw.githubusercontent.com/ceph/ceph/master/monitoring/grafana/dashboards/${i}" ; done
for i in *json ; do cat ${i} | jq . >/dev/null && echo "### OK ${i}" || echo "@@@ KO ${i}" ; done
Then import them with the web UI (I couldn't import them through the API).
\\
You'll also have to add Prometheus as a data source in Grafana and point it at the Prometheus server:
{{:documentation:dba:ceph:howtos:ceph_grafana_prometheus:2021-12-24_12-28.png|}}
====== Instructions ======
Following the official documentation, on **any** of the Ceph admin nodes:
ceph mgr module enable prometheus
ceph config set mgr mgr/prometheus/server_port 9283
ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
ceph config set mgr mgr/prometheus/scrape_interval 15
ceph dashboard set-grafana-api-url http://avvmld-graf-001.ciberterminal.net:3000/
ceph dashboard set-grafana-api-ssl-verify False
You **must** change the Grafana URL to match your setup.
\\
Check:
bvmlm-osm-001 /home/bofher # ceph config dump | egrep -v "KEY"
WHO MASK LEVEL OPTION VALUE RO
mgr advanced mgr/dashboard/GRAFANA_API_URL https://grafana-bavel.ciberterminal.net/ *
mgr advanced mgr/prometheus/scrape_interval 15 *
mgr advanced mgr/prometheus/server_addr 0.0.0.0 *
mgr advanced mgr/prometheus/server_port 9283 *
bvmlm-osm-001 /home/bofher # ceph mgr services
{
"dashboard": "https://bvmlm-osm-002.ciberterminal.net:8443/",
"prometheus": "http://bvmlm-osm-002.ciberterminal.net:9283/"
}
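To confirm that the exporter really answers, you can pull the URL out of that JSON and fetch the metrics. A small sketch that avoids a ''jq'' dependency by using ''sed''; ''mgr_prometheus_url'' is a hypothetical helper that reads the output of ''ceph mgr services'' on stdin:

```shell
# Hypothetical helper: print the value of the "prometheus" key from the
# JSON emitted by "ceph mgr services" (read from stdin).
mgr_prometheus_url() {
    sed -n 's/.*"prometheus"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage: the URL already ends with "/", so just append "metrics".
#   curl -s "$(ceph mgr services | mgr_prometheus_url)metrics" | head
```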
\\
HAProxy configuration so that it **magically** balances to the working monitor server running the dashboard & prometheus modules:
# Frontend for the Prometheus scraper
frontend http_web *:9283
mode http
default_backend ceph_prometheus
backend ceph_prometheus
mode http
option httpchk GET /
http-check expect status 200
server monscraper1 bvmlm-osm-001.ciberterminal.net:9283 check verify none
server monscraper2 bvmlm-osm-002.ciberterminal.net:9283 check verify none
server monscraper3 bvmlm-osm-003.ciberterminal.net:9283 check verify none
server monscraper4 bvmlm-osm-004.ciberterminal.net:9283 check verify none
server monscraper5 bvmlm-osm-005.ciberterminal.net:9283 check verify none
\\
Go and restart Prometheus to begin scraping Ceph:
systemctl restart prometheus
systemctl status prometheus
Check targets on prometheus: [[http://avmlm-prom-001:9090/targets]] (**change** the prometheus server...)
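The same information is available from the Prometheus HTTP API at ''/api/v1/targets'', which is handy from a terminal. A rough sketch that just counts target states; ''count_target_health'' is a hypothetical helper, and it assumes the API returns compact JSON (no space after the colon), so treat it as a quick check rather than a real JSON parser:

```shell
# Hypothetical helper: count targets per health state in the JSON
# returned by /api/v1/targets (read from stdin).
count_target_health() {
    json="$(cat)"
    for state in up down unknown; do
        n="$(printf '%s' "$json" | grep -o "\"health\":\"$state\"" | wc -l | tr -d ' ')"
        printf '%s=%s\n' "$state" "$n"
    done
}

# Usage (change the Prometheus server, as above):
#   curl -s http://avmlm-prom-001:9090/api/v1/targets | count_target_health
```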
== Need more instructions? RTFM! ==
====== For NX nodes (nginx) ======
Add firewall rules:
firewall-cmd --permanent --zone=public --add-rich-rule='rule family=ipv4 source address=10.40.3.64/32 port port=9100 protocol=tcp accept'
firewall-cmd --zone=public --add-rich-rule='rule family=ipv4 source address=10.40.3.64/32 port port=9100 protocol=tcp accept'
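The first command writes the rule to the permanent configuration and the second one applies it to the running firewall, so the rule text has to be identical in both. A small sketch that builds it once; ''fw_rich_rule'' is a hypothetical helper, and the source address is whatever host scrapes port 9100 in your setup (presumably the Prometheus server, which is an assumption here):

```shell
# Hypothetical helper: build the rich rule text for a source CIDR and a TCP port.
fw_rich_rule() {
    printf 'rule family=ipv4 source address=%s port port=%s protocol=tcp accept\n' "$1" "$2"
}

# Usage: apply the same rule to the permanent and the runtime configuration.
#   rule="$(fw_rich_rule 10.40.3.64/32 9100)"
#   firewall-cmd --permanent --zone=public --add-rich-rule="$rule"
#   firewall-cmd --zone=public --add-rich-rule="$rule"
```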
====== Final thoughts ======
{{:documentation:dba:ceph:howtos:ceph_grafana_prometheus:2021-12-24_13-48.png|}}