====== [TROUBLESHOOT] ceph-mon: can't start daemon ======
^ Documentation ^|
^Name:| [TROUBLESHOOT] ceph-mon: can't start daemon |
^Description:| How to recover a ceph-mon daemon that aborts on startup with an AuthMonitor assert |
^Modification date :| 04/06/2020|
^Owner:|dodger|
^Notify changes to:|Owner |
^Tags:|ceph, object storage |
^Escalate to:|The_fucking_bofh|
====== The errors ======
===== On mon server =====
This is an excerpt; the full stack trace is longer.
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread 7efccfa0e040 time 2020-06-04 10:30:26.887956
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: 278: FAILED ceph_assert(ret == 0)
ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7efcc6c7e875]
2: (()+0x253a3d) [0x7efcc6c7ea3d]
3: (AuthMonitor::update_from_paxos(bool*)+0x1b0a) [0x555ef6812f3a]
4: (PaxosService::refresh(bool*)+0x103) [0x555ef68a63a3]
5: (Monitor::refresh_from_paxos(bool*)+0x194) [0x555ef6794514]
6: (Monitor::init_paxos()+0xfc) [0x555ef67947ec]
7: (Monitor::preinit()+0xa32) [0x555ef67b3532]
8: (main()+0x23e2) [0x555ef674cfc2]
9: (__libc_start_main()+0xf5) [0x7efcc2854555]
10: (()+0x2332d0) [0x555ef677e2d0]
*** Caught signal (Aborted) **
in thread 7efccfa0e040 thread_name:ceph-mon
2020-06-04 10:30:26.888 7efccfa0e040 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread 7efccfa0e040 time 2020-06-04 10:30:26.887956
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: 278: FAILED ceph_assert(ret == 0)
The string to look for in the log is the failing function:
'virtual void AuthMonitor::update_from_paxos(bool*)'
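To check whether a monitor log shows this particular assert, something like the following could be used. It is a minimal sketch: the helper name ''has_authmon_assert'' is hypothetical, and the log path in the usage comment assumes the default log location, which may differ on your cluster.

```shell
# Hypothetical helper: succeeds (exit 0) if the given log file contains
# both the failing function name and the assert message shown above.
has_authmon_assert() {
    log_file="$1"
    grep -qF "AuthMonitor::update_from_paxos" "$log_file" &&
        grep -qF "FAILED ceph_assert(ret == 0)" "$log_file"
}

# Usage (assumed default log path, adjust as needed):
# has_authmon_assert /var/log/ceph/ceph-mon.bvmlm-osm-002.log && echo "AuthMonitor assert found"
```

Both strings are matched as fixed text (''grep -F''), so the parentheses in the assert message are not interpreted as a regular expression.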
===== On ceph health =====
mon: 5 daemons, quorum bvmlm-osm-001,bvmlm-osm-003,bvmlm-osm-004,bvmlm-osm-005 (age 2d), out of quorum: bvmlm-osm-002
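The monitors that dropped out of quorum can be pulled out of that status line with a small filter. This is only a sketch: ''mons_out_of_quorum'' is a hypothetical helper, and it assumes the status text has the same "out of quorum: ..." layout as the line above, which may vary between Ceph releases.

```shell
# Hypothetical helper: read `ceph -s`-style text on stdin and print the
# monitors listed after "out of quorum:", one per line.
mons_out_of_quorum() {
    sed -n 's/.*out of quorum: *//p' | tr ',' '\n' | sed 's/^ *//; s/ *$//'
}

# Usage:
# ceph -s | mons_out_of_quorum
```

When every monitor is in quorum the status line has no "out of quorum" part, and the filter prints nothing.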
====== The solution ======
Redeploy the failed monitor from any admin node:
ceph-deploy mon destroy bvmlm-osm-002
ceph-deploy mon create bvmlm-osm-002.ciberterminal.net
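After the redeploy it is worth confirming that the monitor has rejoined the quorum. The sketch below checks the quorum list in the status output; ''mon_in_quorum'' is a hypothetical helper and it assumes the "N daemons, quorum a,b,c" layout shown in the health output above.

```shell
# Hypothetical helper: succeeds (exit 0) if monitor "$1" appears in the
# quorum list of `ceph -s`-style text read from stdin.
mon_in_quorum() {
    mon_name="$1"
    sed -n 's/.*daemons, quorum \([^ ]*\).*/\1/p' | tr ',' '\n' | grep -qxF "$mon_name"
}

# Usage:
# ceph -s | mon_in_quorum bvmlm-osm-002 && echo "bvmlm-osm-002 is back in quorum"
```

The monitor may need a little time to synchronise before it shows up in the list, so a failed check right after ''ceph-deploy mon create'' is not necessarily a problem.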
====== The Reason ======
Source: [[https://access.redhat.com/solutions/4721981]]\\
Relevant quote:\\
''It is likely that monitor store.db is corrupted and hence asserts are happening.''