Documentation | |
---|---|
Name: | [HOWTO] Ceph+grafana+prometheus |
Description: | How to set up ceph with prometheus and grafana for advanced statistics |
Modification date : | 24/12/2021 |
Owner: | dodger |
Notify changes to: | Owner |
Tags: | ceph, object storage |
Escalate to: | The_fucking_bofh |
From the salt-master, install and start node_exporter on the target nodes:
export THEHOSTNAME='avmlp-os*'
salt "${THEHOSTNAME}" test.ping
salt "${THEHOSTNAME}" pkg.install golang-github-prometheus-node-exporter
salt "${THEHOSTNAME}" service.start node_exporter
salt "${THEHOSTNAME}" service.enable node_exporter
salt "${THEHOSTNAME}" service.status node_exporter
Check:
salt "${THEHOSTNAME}" cmd.run "netstat -nap | egrep 9100 | egrep LISTEN"
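A plain `egrep 9100` also matches lines where 9100 shows up as a PID or as part of another port number. As a rough sketch, the check can be narrowed to the local-address and state columns of `netstat -nat` style output. `listening_on` is a hypothetical helper name and the sample line is made up for the demonstration:

```shell
#!/usr/bin/env bash
# listening_on: hypothetical helper. Reads `netstat -nat` style output on stdin
# and succeeds only if some socket in LISTEN state has a local address
# (4th column) ending in ":PORT".
listening_on() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }'
}

# Made-up sample line in netstat format, only to show the call.
printf '%s\n' 'tcp6 0 0 :::9100 :::* LISTEN 1234/node_exporter' \
    | listening_on 9100 && echo "9100 is listening"
```

On a node it could be run as `netstat -nat | listening_on 9100`.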
Obtain the list of nodes to configure as node_exporter scrape targets in prometheus:
salt "${THEHOSTNAME}" service.status node_exporter | grep "^${THEHOSTNAME}" | awk -F\: '{print "\047"$1":9100\047,"}'
Example:
root@avmlm-salt-001 /home/bofher/scripts/nutanix_buster $ salt "${THEHOSTNAME}" service.status node_exporter | grep "^${THEHOSTNAME}" | awk -F\: '{print "\047"$1":9100\047,"}'
'bvmlm-osd-001.ciberterminal.net:9100',
'bvmlm-osd-019.ciberterminal.net:9100',
'bvmlm-osd-013.ciberterminal.net:9100',
'bvmlm-osm-003.ciberterminal.net:9100',
'bvmlm-osd-005.ciberterminal.net:9100',
'bvmlm-oslb-001.ciberterminal.net:9100',
'bvmlm-osd-010.ciberterminal.net:9100',
'bvmlm-osd-003.ciberterminal.net:9100',
'bvmlm-osd-020.ciberterminal.net:9100',
'bvmlm-osfs-003.ciberterminal.net:9100',
'bvmlm-osd-002.ciberterminal.net:9100',
'bvmlm-osm-001.ciberterminal.net:9100',
'bvmlm-osm-004.ciberterminal.net:9100',
'bvmlm-osd-015.ciberterminal.net:9100',
'bvmlm-osd-018.ciberterminal.net:9100',
'bvmlm-osgw-001.ciberterminal.net:9100',
'bvmlm-osd-017.ciberterminal.net:9100',
'bvmlm-osd-011.ciberterminal.net:9100',
'bvmlm-osd-007.ciberterminal.net:9100',
'bvmlm-osgw-004.ciberterminal.net:9100',
'bvmlm-osgw-003.ciberterminal.net:9100',
'bvmlm-osd-006.ciberterminal.net:9100',
'bvmlm-osfs-004.ciberterminal.net:9100',
'bvmlm-osm-002.ciberterminal.net:9100',
'bvmlm-osd-008.ciberterminal.net:9100',
'bvmlm-osfs-002.ciberterminal.net:9100',
'bvmlm-osfs-001.ciberterminal.net:9100',
'bvmlm-osd-004.ciberterminal.net:9100',
'bvmlm-oslb-002.ciberterminal.net:9100',
'bvmlm-osd-012.ciberterminal.net:9100',
'bvmlm-osd-009.ciberterminal.net:9100',
'bvmlm-osgw-002.ciberterminal.net:9100',
'bvmlm-osd-014.ciberterminal.net:9100',
'bvmlm-osm-005.ciberterminal.net:9100',
'bvmlm-osnx-002.ciberterminal.net:9100',
'bvmlm-osd-016.ciberterminal.net:9100',
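The list above comes out unsorted and with a trailing comma on the last entry, which has to be removed by hand before pasting it into the `targets: [...]` array. A minimal sketch of a formatter that takes care of both, assuming the hostnames arrive one per line on stdin (how you collect them is up to you). `format_targets` is a hypothetical helper name, not part of salt or prometheus:

```shell
#!/usr/bin/env bash
# format_targets: hypothetical helper. Reads hostnames (one per line) on stdin
# and prints them sorted and de-duplicated as quoted 'host:port' entries,
# with a comma after every entry except the last one.
# The port defaults to 9100 (node_exporter).
format_targets() {
    local port="${1:-9100}"
    sort -u | awk -v port="${port}" -v q="'" '
        NF { hosts[++n] = $1 }
        END {
            for (i = 1; i <= n; i++)
                printf "%s%s:%s%s%s\n", q, hosts[i], port, q, (i < n ? "," : "")
        }'
}

# Example with two of the hostnames from the output above.
printf '%s\n' bvmlm-osd-002.ciberterminal.net bvmlm-osd-001.ciberterminal.net \
    | format_targets
```

Sorting is only there to make the resulting config easier to diff; prometheus does not care about the order of the targets.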
Bare-minimum install instructions:
# The heredoc delimiter is quoted so the shell leaves $releasever and $basearch for yum to expand
cat >/etc/yum.repos.d/prometheus.repo<<'EOF'
[prometheus]
name=prometheus
baseurl=https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch
repo_gpgcheck=1
enabled=1
gpgkey=https://packagecloud.io/prometheus-rpm/release/gpgkey
       https://raw.githubusercontent.com/lest/prometheus-rpm/master/RPM-GPG-KEY-prometheus-rpm
gpgcheck=1
metadata_expire=300
EOF

yum install prometheus2.x86_64 \
    apache_exporter.x86_64 \
    collectd_exporter.x86_64 consul_exporter.x86_64 \
    elasticsearch_exporter.x86_64 \
    graphite_exporter.x86_64 \
    haproxy_exporter.x86_64 \
    kafka_exporter.x86_64 \
    memcached_exporter.x86_64 \
    mysqld_exporter.x86_64 \
    nginx_exporter.x86_64 \
    node_exporter.x86_64 \
    postgres_exporter.x86_64 \
    process_exporter.x86_64 \
    pushgateway.x86_64 \
    rabbitmq_exporter.x86_64 \
    redis_exporter.x86_64 \
    sachet.x86_64 \
    smokeping_prober.x86_64 \
    snmp_exporter.x86_64 \
    statsd_exporter.x86_64 \
    thanos.x86_64

systemctl start prometheus
systemctl enable prometheus
systemctl status prometheus
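One detail of the repo file is worth checking: `$releasever` and `$basearch` are yum variables and have to land in the file literally. In a heredoc with an unquoted delimiter the shell expands them first, normally to empty strings, while quoting the delimiter writes the text verbatim. A small demonstration of the difference, writing to a temporary directory instead of /etc/yum.repos.d (the file names are just for the demo):

```shell
#!/usr/bin/env bash
# Demonstration only: everything is written to a temporary directory.
tmpdir="$(mktemp -d)"
trap 'rm -rf "${tmpdir}"' EXIT

# Set to empty here to mimic a shell where these variables are not defined.
releasever=''
basearch=''

# Unquoted delimiter: the shell expands $releasever and $basearch itself.
cat >"${tmpdir}/unquoted.repo" <<EOF
baseurl=https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch
EOF

# Quoted delimiter: the text is written verbatim, so yum gets its variables.
cat >"${tmpdir}/quoted.repo" <<'EOF'
baseurl=https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch
EOF

echo "unquoted: $(cat "${tmpdir}/unquoted.repo")"
echo "quoted:   $(cat "${tmpdir}/quoted.repo")"
```

If the `baseurl` line in /etc/yum.repos.d/prometheus.repo ends in `el//`, the variables were expanded by the shell and the file has to be rewritten.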
Prometheus setup: add a scrape config for ceph. For example, in dev with larry:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['0.0.0.0:9090']

  - job_name: 'ceph-larry'
    static_configs:
    - targets: ['larry.ciberterminal.net:9283']

  - job_name: 'node-exporter'
    static_configs:
    - targets: [
        'bvmlm-osd-001.ciberterminal.net:9100',
        'bvmlm-osd-019.ciberterminal.net:9100',
        'bvmlm-osd-013.ciberterminal.net:9100',
        'bvmlm-osm-003.ciberterminal.net:9100',
        'bvmlm-osd-005.ciberterminal.net:9100',
        'bvmlm-oslb-001.ciberterminal.net:9100',
        'bvmlm-osd-010.ciberterminal.net:9100',
        'bvmlm-osd-003.ciberterminal.net:9100',
        'bvmlm-osd-020.ciberterminal.net:9100',
        'bvmlm-osfs-003.ciberterminal.net:9100',
        'bvmlm-osd-002.ciberterminal.net:9100',
        'bvmlm-osm-001.ciberterminal.net:9100',
        'bvmlm-osm-004.ciberterminal.net:9100',
        'bvmlm-osd-015.ciberterminal.net:9100',
        'bvmlm-osd-018.ciberterminal.net:9100',
        'bvmlm-osgw-001.ciberterminal.net:9100',
        'bvmlm-osd-017.ciberterminal.net:9100',
        'bvmlm-osd-011.ciberterminal.net:9100',
        'bvmlm-osd-007.ciberterminal.net:9100',
        'bvmlm-osgw-004.ciberterminal.net:9100',
        'bvmlm-osgw-003.ciberterminal.net:9100',
        'bvmlm-osd-006.ciberterminal.net:9100',
        'bvmlm-osfs-004.ciberterminal.net:9100',
        'bvmlm-osm-002.ciberterminal.net:9100',
        'bvmlm-osd-008.ciberterminal.net:9100',
        'bvmlm-osfs-002.ciberterminal.net:9100',
        'bvmlm-osfs-001.ciberterminal.net:9100',
        'bvmlm-osd-004.ciberterminal.net:9100',
        'bvmlm-oslb-002.ciberterminal.net:9100',
        'bvmlm-osd-012.ciberterminal.net:9100',
        'bvmlm-osd-009.ciberterminal.net:9100',
        'bvmlm-osgw-002.ciberterminal.net:9100',
        'bvmlm-osd-014.ciberterminal.net:9100',
        'bvmlm-osm-005.ciberterminal.net:9100',
        'bvmlm-osnx-002.ciberterminal.net:9100',
        'bvmlm-osd-016.ciberterminal.net:9100'
      ]
We will restart prometheus and check it after setting up the rest of the elements.
I haven't set it up myself, so I can't give instructions here xD
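Additional setup for grafana to work with ceph (the diff compares the modified grafana.ini against the original, so the lines marked `-` are the ones that were added):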
--- grafana.ini 2021-12-24 10:38:20.669668776 +0100
+++ grafana.ini.orig 2021-12-24 12:36:44.083311253 +0100
@@ -185,7 +185,6 @@
 
 # set to true if you want to allow browsers to render Grafana in a <frame>, <iframe>, <embed> or <object>. default is false.
 ;allow_embedding = false
-allow_embedding = true
 
 # Set to true if you want to enable http strict transport security (HSTS) response header.
 # This is only sent when HTTPS is enabled in this configuration.
@@ -308,16 +307,12 @@
 [auth.anonymous]
 # enable anonymous access
 ;enabled = false
-enabled = true
 
 # specify organization name that should be used for unauthenticated users
 ;org_name = Main Org.
-;org_name = ciberterminal.net
-org_name = ciberterminal DEMO
 
 # specify role for unauthenticated users
 ;org_role = Viewer
-org_role = Viewer
 
 #################################### Github Auth ##########################
 [auth.github]
You'll also need the following plugins for grafana:
grafana-cli plugins install vonage-status-panel
grafana-cli plugins install grafana-piechart-panel
Import all of the official dashboards.
Here you have some nice one-liners to simplify the process:
wget "https://github.com/ceph/ceph/tree/master/monitoring/grafana/dashboards"
for i in $(cat dashboards| egrep json |egrep "dashboard" | awk -F\" '{print $6}' | egrep "\.json") ; do
    wget "https://raw.githubusercontent.com/ceph/ceph/master/monitoring/grafana/dashboards/${i}"
done
for i in *json ; do
    cat ${i} | jq . >/dev/null && echo "### OK ${i}" || echo "@@@ KO ${i}"
done
And import them with the web-ui (I couldn't import them through API).
You'll also have to set up prometheus as a data source in grafana and set up the prometheus endpoint on the ceph side.
Following the official documentation, on any of the ceph admin nodes:
ceph mgr module enable prometheus
ceph config set mgr mgr/prometheus/server_port 9283
ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
ceph config set mgr mgr/prometheus/scrape_interval 15
ceph dashboard set-grafana-api-url http://avvmld-graf-001.ciberterminal.net:3000/
ceph dashboard set-grafana-api-ssl-verify False
You must change the grafana URL according to your setup.
Check:
bvmlm-osm-001 /home/bofher # ceph config dump | egrep -v "KEY"
WHO  MASK  LEVEL     OPTION                          VALUE                                     RO
mgr        advanced  mgr/dashboard/GRAFANA_API_URL   https://grafana-bavel.ciberterminal.net/  *
mgr        advanced  mgr/prometheus/scrape_interval  15                                        *
mgr        advanced  mgr/prometheus/server_addr      0.0.0.0                                   *
mgr        advanced  mgr/prometheus/server_port      9283                                      *
bvmlm-osm-001 /home/bofher # ceph mgr services
{
    "dashboard": "https://bvmlm-osm-002.ciberterminal.net:8443/",
    "prometheus": "http://bvmlm-osm-002.ciberterminal.net:9283/"
}
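`ceph mgr services` reports the endpoints published by the currently active manager, so its output is a quick way to find where the prometheus endpoint lives right now. A rough sketch that pulls one URL out of that JSON with grep and cut only. `service_url` is a hypothetical helper name, and it relies on the flat "name": "url" layout shown above:

```shell
#!/usr/bin/env bash
# service_url: hypothetical helper. Reads the JSON printed by
# `ceph mgr services` on stdin and prints the URL of the named service.
# It only handles the flat "name": "url" layout; for anything more
# complex a real JSON parser is the safer choice.
service_url() {
    local name="$1"
    grep -o "\"${name}\": *\"[^\"]*\"" | cut -d'"' -f4
}

# Example using the output shown above instead of a live cluster.
sample='{ "dashboard": "https://bvmlm-osm-002.ciberterminal.net:8443/", "prometheus": "http://bvmlm-osm-002.ciberterminal.net:9283/" }'
printf '%s\n' "${sample}" | service_url prometheus
```

Against a live cluster it would be called as `ceph mgr services | service_url prometheus`.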
haproxy configuration, so it magically balances to whichever monitor server is running the dashboard & prometheus modules:
# Frontend for prometheus scraper
frontend http_web *:9283
    mode http
    default_backend ceph_prometheus

backend ceph_prometheus
    mode http
    option httpchk GET /
    http-check expect status 200
    server monscraper1 bvmlm-osm-001.ciberterminal.net:9283 check verify none
    server monscraper2 bvmlm-osm-002.ciberterminal.net:9283 check verify none
    server monscraper3 bvmlm-osm-003.ciberterminal.net:9283 check verify none
    server monscraper4 bvmlm-osm-004.ciberterminal.net:9283 check verify none
    server monscraper5 bvmlm-osm-005.ciberterminal.net:9283 check verify none
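The `server` lines in the backend differ only in the counter and the hostname, so with a different number of monitors they can be generated rather than typed. A small sketch, where `backend_servers` is a hypothetical helper name and the options on each line are copied from the config above:

```shell
#!/usr/bin/env bash
# backend_servers: hypothetical helper. Takes a port followed by a list of
# monitor hostnames and prints one haproxy "server" line per host, named
# monscraper1..N, with the same check options as the config above.
backend_servers() {
    local port="$1" i=0 host
    shift
    for host in "$@"; do
        i=$((i + 1))
        printf '    server monscraper%d %s:%s check verify none\n' \
            "${i}" "${host}" "${port}"
    done
}

# Example with the first two monitors of this setup.
backend_servers 9283 \
    bvmlm-osm-001.ciberterminal.net \
    bvmlm-osm-002.ciberterminal.net
```

After editing the real file, `haproxy -c -f` followed by the path of your config checks the syntax before reloading.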
Go and restart prometheus to begin scraping ceph:
systemctl restart prometheus
systemctl status prometheus
Check targets on prometheus: http://avmlm-prom-001:9090/targets (change the prometheus server…)
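The same information as the targets page is available from the Prometheus HTTP API under `/api/v1/targets`, which is handier for a quick check from a shell. A rough sketch that counts targets per health state using grep only. `target_health` is a hypothetical helper name, and the sample JSON is a trimmed-down illustration of the response shape, not real output:

```shell
#!/usr/bin/env bash
# target_health: hypothetical helper. Reads the JSON returned by the
# Prometheus endpoint /api/v1/targets on stdin and prints how many targets
# are in each health state, one "state=count" pair per line.
target_health() {
    grep -o '"health": *"[a-z]*"' \
        | cut -d'"' -f4 \
        | sort \
        | uniq -c \
        | awk '{ print $2 "=" $1 }'
}

# Trimmed-down sample in the shape of the API response (real responses carry
# many more fields per target); the hostnames are taken from this page.
sample='{"status":"success","data":{"activeTargets":[{"scrapeUrl":"http://larry.ciberterminal.net:9283/metrics","health":"up"},{"scrapeUrl":"http://bvmlm-osd-001.ciberterminal.net:9100/metrics","health":"up"},{"scrapeUrl":"http://bvmlm-osd-002.ciberterminal.net:9100/metrics","health":"down"}]}}'
printf '%s\n' "${sample}" | target_health
```

Against a real server it would be fed with `curl -s http://avmlm-prom-001:9090/api/v1/targets | target_health`, changing the prometheus server to yours.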
Add firewall rules:
firewall-cmd --permanent --zone=public --add-rich-rule='rule family=ipv4 source address=10.40.3.64/32 port port=9100 protocol=tcp accept'
firewall-cmd --zone=public --add-rich-rule='rule family=ipv4 source address=10.40.3.64/32 port port=9100 protocol=tcp accept'
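The rule is added twice on purpose: once with `--permanent` so it survives a reload or reboot, and once without so it applies to the running firewall right away. A minimal sketch that wraps both calls, so the source address and port are typed only once. `rich_rule`, `open_port` and the `DRY_RUN` switch are hypothetical names introduced here:

```shell
#!/usr/bin/env bash
# rich_rule: hypothetical helper. Prints the firewalld rich rule accepting
# TCP traffic from SOURCE (CIDR) to PORT, in the same form as used above.
rich_rule() {
    printf 'rule family=ipv4 source address=%s port port=%s protocol=tcp accept' "$1" "$2"
}

# open_port: hypothetical helper. Applies the rule to the permanent and to
# the runtime configuration of the public zone. With DRY_RUN=1 the commands
# are printed instead of executed (the printed form loses the quoting).
open_port() {
    local rule run=""
    rule="$(rich_rule "$1" "$2")"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    ${run} firewall-cmd --permanent --zone=public --add-rich-rule="${rule}"
    ${run} firewall-cmd --zone=public --add-rich-rule="${rule}"
}

# Dry run with the source address and port used above.
DRY_RUN=1
open_port 10.40.3.64/32 9100
```

If the manager nodes are firewalled as well, the same helper could be called with port 9283 there; whether that is needed depends on your setup.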