====== [HOWTO] Ceph+grafana+prometheus ======
^ Documentation ^^
^ Name: | [HOWTO] Ceph+grafana+prometheus |
^ Description: | How to set up Ceph with Prometheus and Grafana for advanced statistics |
^ Modification date: | 24/12/2021 |
^ Owner: | dodger |
^ Notify changes to: | Owner |
^ Tags: | ceph, object storage |
^ Escalate to: | Thefuckingbofh |
Documentation
====== Pre-Requisites ======
===== Prometheus node exporter =====
From the salt-master:
<code bash>
export THEHOSTNAME='avmlp-os*'
salt "${THEHOSTNAME}" test.ping
salt "${THEHOSTNAME}" pkg.install golang-github-prometheus-node-exporter
salt "${THEHOSTNAME}" service.start node_exporter
salt "${THEHOSTNAME}" service.enable node_exporter
salt "${THEHOSTNAME}" service.status node_exporter
</code>
Check:
<code bash>
salt "${THEHOSTNAME}" cmd.run "netstat -nap | egrep 9100 | egrep LISTEN"
</code>
Obtain the list of nodes needed to configure Prometheus to scrape ''node_exporter'':
<code bash>
salt "${THEHOSTNAME}" service.status node_exporter | grep "^${THEHOSTNAME}" | awk -F\: '{print "\047"$1":9100\047,"}'
</code>
Example:
<code bash>
root@avmlm-salt-001 /home/bofher/scripts/nutanixbuster $ salt "${THEHOSTNAME}" service.status node_exporter | grep "^${THEHOSTNAME}" | awk -F\: '{print "\047"$1":9100\047,"}'
'bvmlm-osd-001.ciberterminal.net:9100',
'bvmlm-osd-019.ciberterminal.net:9100',
'bvmlm-osd-013.ciberterminal.net:9100',
'bvmlm-osm-003.ciberterminal.net:9100',
'bvmlm-osd-005.ciberterminal.net:9100',
'bvmlm-oslb-001.ciberterminal.net:9100',
'bvmlm-osd-010.ciberterminal.net:9100',
'bvmlm-osd-003.ciberterminal.net:9100',
'bvmlm-osd-020.ciberterminal.net:9100',
'bvmlm-osfs-003.ciberterminal.net:9100',
'bvmlm-osd-002.ciberterminal.net:9100',
'bvmlm-osm-001.ciberterminal.net:9100',
'bvmlm-osm-004.ciberterminal.net:9100',
'bvmlm-osd-015.ciberterminal.net:9100',
'bvmlm-osd-018.ciberterminal.net:9100',
'bvmlm-osgw-001.ciberterminal.net:9100',
'bvmlm-osd-017.ciberterminal.net:9100',
'bvmlm-osd-011.ciberterminal.net:9100',
'bvmlm-osd-007.ciberterminal.net:9100',
'bvmlm-osgw-004.ciberterminal.net:9100',
'bvmlm-osgw-003.ciberterminal.net:9100',
'bvmlm-osd-006.ciberterminal.net:9100',
'bvmlm-osfs-004.ciberterminal.net:9100',
'bvmlm-osm-002.ciberterminal.net:9100',
'bvmlm-osd-008.ciberterminal.net:9100',
'bvmlm-osfs-002.ciberterminal.net:9100',
'bvmlm-osfs-001.ciberterminal.net:9100',
'bvmlm-osd-004.ciberterminal.net:9100',
'bvmlm-oslb-002.ciberterminal.net:9100',
'bvmlm-osd-012.ciberterminal.net:9100',
'bvmlm-osd-009.ciberterminal.net:9100',
'bvmlm-osgw-002.ciberterminal.net:9100',
'bvmlm-osd-014.ciberterminal.net:9100',
'bvmlm-osm-005.ciberterminal.net:9100',
'bvmlm-osnx-002.ciberterminal.net:9100',
'bvmlm-osd-016.ciberterminal.net:9100',
</code>
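If the salt output is not at hand, the same target lines can be built from a plain list of hostnames. This is only a minimal sketch: the function name ''to_targets'' and its optional port argument are made up for illustration, and it assumes one hostname per line on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of salt or Prometheus).
# Reads one hostname per line on stdin and prints 'host:port', entries,
# sorted and de-duplicated, ready to paste into a targets list.
to_targets() {
    local port="${1:-9100}"    # 9100 is the node_exporter port used above
    sort -u | awk -v port="${port}" 'NF { print "\047" $1 ":" port "\047," }'
}
```

For example, ''printf '%s\n' bvmlm-osd-002.ciberterminal.net bvmlm-osd-001.ciberterminal.net | to_targets'' prints both hosts with '':9100'' appended, in sorted order. Remember to drop the trailing comma of the last entry when pasting into a YAML list.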
===== Prometheus =====
Bare-minimum install instructions:
<code bash>
# Quote the heredoc delimiter so the shell leaves $releasever and $basearch for yum to expand
cat >/etc/yum.repos.d/prometheus.repo <<'EOF'
[prometheus]
name=prometheus
baseurl=https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch
repo_gpgcheck=1
enabled=1
gpgkey=https://packagecloud.io/prometheus-rpm/release/gpgkey
       https://raw.githubusercontent.com/lest/prometheus-rpm/master/RPM-GPG-KEY-prometheus-rpm
gpgcheck=1
metadata_expire=300
EOF
yum install prometheus2.x86_64 \
    apache_exporter.x86_64 \
    collectd_exporter.x86_64 \
    consul_exporter.x86_64 \
    elasticsearch_exporter.x86_64 \
    graphite_exporter.x86_64 \
    haproxy_exporter.x86_64 \
    kafka_exporter.x86_64 \
    memcached_exporter.x86_64 \
    mysqld_exporter.x86_64 \
    nginx_exporter.x86_64 \
    node_exporter.x86_64 \
    postgres_exporter.x86_64 \
    process_exporter.x86_64 \
    pushgateway.x86_64 \
    rabbitmq_exporter.x86_64 \
    redis_exporter.x86_64 \
    sachet.x86_64 \
    smokeping_prober.x86_64 \
    snmp_exporter.x86_64 \
    statsd_exporter.x86_64 \
    thanos.x86_64
systemctl start prometheus
systemctl enable prometheus
systemctl status prometheus
</code>
Prometheus setup: add a scrape config for Ceph. For example, in dev with larry:
<file yaml prometheus.yml>
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['0.0.0.0:9090']

  - job_name: 'ceph-larry'
    static_configs:
      - targets: ['larry.ciberterminal.net:9283']

  - job_name: 'node-exporter'
    static_configs:
      - targets: [
            'bvmlm-osd-001.ciberterminal.net:9100',
            'bvmlm-osd-019.ciberterminal.net:9100',
            'bvmlm-osd-013.ciberterminal.net:9100',
            'bvmlm-osm-003.ciberterminal.net:9100',
            'bvmlm-osd-005.ciberterminal.net:9100',
            'bvmlm-oslb-001.ciberterminal.net:9100',
            'bvmlm-osd-010.ciberterminal.net:9100',
            'bvmlm-osd-003.ciberterminal.net:9100',
            'bvmlm-osd-020.ciberterminal.net:9100',
            'bvmlm-osfs-003.ciberterminal.net:9100',
            'bvmlm-osd-002.ciberterminal.net:9100',
            'bvmlm-osm-001.ciberterminal.net:9100',
            'bvmlm-osm-004.ciberterminal.net:9100',
            'bvmlm-osd-015.ciberterminal.net:9100',
            'bvmlm-osd-018.ciberterminal.net:9100',
            'bvmlm-osgw-001.ciberterminal.net:9100',
            'bvmlm-osd-017.ciberterminal.net:9100',
            'bvmlm-osd-011.ciberterminal.net:9100',
            'bvmlm-osd-007.ciberterminal.net:9100',
            'bvmlm-osgw-004.ciberterminal.net:9100',
            'bvmlm-osgw-003.ciberterminal.net:9100',
            'bvmlm-osd-006.ciberterminal.net:9100',
            'bvmlm-osfs-004.ciberterminal.net:9100',
            'bvmlm-osm-002.ciberterminal.net:9100',
            'bvmlm-osd-008.ciberterminal.net:9100',
            'bvmlm-osfs-002.ciberterminal.net:9100',
            'bvmlm-osfs-001.ciberterminal.net:9100',
            'bvmlm-osd-004.ciberterminal.net:9100',
            'bvmlm-oslb-002.ciberterminal.net:9100',
            'bvmlm-osd-012.ciberterminal.net:9100',
            'bvmlm-osd-009.ciberterminal.net:9100',
            'bvmlm-osgw-002.ciberterminal.net:9100',
            'bvmlm-osd-014.ciberterminal.net:9100',
            'bvmlm-osm-005.ciberterminal.net:9100',
            'bvmlm-osnx-002.ciberterminal.net:9100',
            'bvmlm-osd-016.ciberterminal.net:9100'
          ]
</file>
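Since YAML is picky about indentation, it may be worth validating the file before any restart. The sketch below wraps ''promtool check config'', the checker that normally ships alongside the Prometheus binary. The function name ''check_prom_config'' and the default path ''/etc/prometheus/prometheus.yml'' are assumptions, so adjust them to your install.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around "promtool check config".
# Returns 1 if the file is unreadable, 2 if promtool is missing,
# otherwise whatever promtool itself returns.
check_prom_config() {
    local cfg="${1:-/etc/prometheus/prometheus.yml}"   # assumed default location
    if [ ! -r "${cfg}" ]; then
        echo "@@@ KO cannot read ${cfg}" >&2
        return 1
    fi
    if ! command -v promtool >/dev/null 2>&1; then
        echo "@@@ KO promtool not found in PATH" >&2
        return 2
    fi
    promtool check config "${cfg}"
}
```

Usage would be something like ''check_prom_config /etc/prometheus/prometheus.yml && echo "### OK"''.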
We will restart Prometheus and check it after setting up the remaining elements.
===== grafana =====
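  * A working Grafana installation
I haven't set it up myself, so I can't give instructions here xD
Additional setup for Grafana to work with Ceph (the diff compares the modified ''grafana.ini'' against ''grafana.ini.orig'', so the lines marked with ''-'' are the ones that were added):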
<code diff>
--- grafana.ini 2021-12-24 10:38:20.669668776 +0100
+++ grafana.ini.orig 2021-12-24 12:36:44.083311253 +0100
@@ -185,7 +185,6 @@
 
 # set to true if you want to allow browsers to render Grafana in a <frame>, <iframe>, <embed> or <object>. default is false.
 ;allow_embedding = false
-allow_embedding = true
 
 # Set to true if you want to enable http strict transport security (HSTS) response header.
 # This is only sent when HTTPS is enabled in this configuration.
@@ -308,16 +307,12 @@
 [auth.anonymous]
 # enable anonymous access
 ;enabled = false
-enabled = true
 
 # specify organization name that should be used for unauthenticated users
 ;org_name = Main Org.
-;org_name = ciberterminal.net
-org_name = ciberterminal DEMO
 
 # specify role for unauthenticated users
 ;org_role = Viewer
-org_role = Viewer
 
 #################################### Github Auth ##########################
 [auth.github]
</code>
You'll also need the following Grafana plugins:
<code bash>
grafana-cli plugins install vonage-status-panel
grafana-cli plugins install grafana-piechart-panel
</code>
Import all of the official dashboards.
Here you have some nice oneliners to simplify the process:
<code bash>
wget "https://github.com/ceph/ceph/tree/master/monitoring/grafana/dashboards"
for i in $(cat dashboards| egrep json |egrep "dashboard" | awk -F\" '{print $6}' | egrep ".json") ; do wget "https://raw.githubusercontent.com/ceph/ceph/master/monitoring/grafana/dashboards/${i}" ; done
for i in *json ; do cat ${i} | jq . >/dev/null && echo "### OK ${i}" || echo "@@@ KO ${i}" ; done
</code>
And import them with the web-ui (I couldn't import them through API).
You'll also have to set up Prometheus as a data source in Grafana and set up the Prometheus server:
====== Instructions ======
Following official documentation, on any of the ceph admin nodes:
<code bash>
ceph mgr module enable prometheus
ceph config set mgr mgr/prometheus/server_port 9283
ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
ceph config set mgr mgr/prometheus/scrape_interval 15
ceph dashboard set-grafana-api-url http://avvmld-graf-001.ciberterminal.net:3000/
ceph dashboard set-grafana-api-ssl-verify False
</code>
Change the Grafana URL to match your setup.
Check:
<code bash>
bvmlm-osm-001 /home/bofher # ceph config dump | egrep -v "KEY"
WHO MASK LEVEL OPTION VALUE RO
mgr advanced mgr/dashboard/GRAFANA_API_URL https://grafana-bavel.ciberterminal.net/ *
mgr advanced mgr/prometheus/scrape_interval 15 *
mgr advanced mgr/prometheus/server_addr 0.0.0.0 *
mgr advanced mgr/prometheus/server_port 9283 *
bvmlm-osm-001 /home/bofher # ceph mgr services
{
    "dashboard": "https://bvmlm-osm-002.ciberterminal.net:8443/",
    "prometheus": "http://bvmlm-osm-002.ciberterminal.net:9283/"
}
</code>
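The endpoint reported by ''ceph mgr services'' can also be probed by hand, since the prometheus module serves plain-text metrics under ''/metrics''. This is a rough sketch and not an official check: the function name ''count_ceph_metrics'' is made up, and it simply counts the sample lines whose metric name starts with ''ceph_''.

```bash
#!/usr/bin/env bash
# Hypothetical filter: reads Prometheus text-format metrics on stdin and
# prints how many sample lines start with "ceph_".
# Comment lines (# HELP / # TYPE) do not match and are therefore ignored.
count_ceph_metrics() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the exit code clean
    grep -c '^ceph_' || true
}

# Usage (assumes curl is installed; replace the host with your active mgr):
#   curl -s http://bvmlm-osm-002.ciberterminal.net:9283/metrics | count_ceph_metrics
```

A count of 0 suggests the module is not serving Ceph metrics on that host, which is worth checking before pointing Prometheus at it.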
HAProxy configuration so that it magically balances to whichever monitor server is currently running the dashboard and prometheus modules:
<code yaml>
# Frontend for the prometheus scraper
frontend httpweb *:9283
    mode http
    default_backend ceph_prometheus

backend ceph_prometheus
    mode http
    option httpchk GET /
    http-check expect status 200
    server monscraper1 bvmlm-osm-001.ciberterminal.net:9283 check verify none
    server monscraper2 bvmlm-osm-002.ciberterminal.net:9283 check verify none
    server monscraper3 bvmlm-osm-003.ciberterminal.net:9283 check verify none
    server monscraper4 bvmlm-osm-004.ciberterminal.net:9283 check verify none
    server monscraper5 bvmlm-osm-005.ciberterminal.net:9283 check verify none
</code>
Now restart Prometheus to begin scraping Ceph:
<code bash>
systemctl restart prometheus
systemctl status prometheus
</code>
Check targets on prometheus: http://avmlm-prom-001:9090/targets (change the prometheus server…)
== Need more instructions? RTFM! ==
====== For NX nodes (nginx) ======
Add firewall rules:
<code bash>
firewall-cmd --permanent --zone=public --add-rich-rule='rule family=ipv4 source address=10.40.3.64/32 port port=9100 protocol=tcp accept'
firewall-cmd --zone=public --add-rich-rule='rule family=ipv4 source address=10.40.3.64/32 port port=9100 protocol=tcp accept'
</code>
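If more than one source address or port needs the same treatment, the two commands above can be generated instead of typed. The sketch below only prints the commands so they can be reviewed before running. The function name ''print_fw_rules'' is hypothetical, and the ''public'' zone is taken from the example above.

```bash
#!/usr/bin/env bash
# Hypothetical generator: prints (does not execute) the permanent and runtime
# firewall-cmd rich rules for one source address and one TCP port.
print_fw_rules() {
    local src="$1" port="$2" rule
    rule="rule family=ipv4 source address=${src} port port=${port} protocol=tcp accept"
    # Permanent rule first, then the same rule for the running configuration
    printf "firewall-cmd --permanent --zone=public --add-rich-rule='%s'\n" "${rule}"
    printf "firewall-cmd --zone=public --add-rich-rule='%s'\n" "${rule}"
}
```

For instance, ''print_fw_rules 10.40.3.64/32 9100'' reproduces the two commands shown above. Pipe the output to ''bash'' once it looks right.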
====== Final thoughts ======