====== [TROUBLESHOOT] ceph-mon: can't start daemon ======

^ Documentation ^|
^Name:| [TROUBLESHOOT] ceph-mon: can't start daemon |
^Description:| How to solve this "issue" |
^Modification date:| 04/06/2020 |
^Owner:| dodger |
^Notify changes to:| Owner |
^Tags:| ceph, object storage |
^Escalate to:| The_fucking_bofh |

====== The errors ======

===== On mon server =====

This is a summary; the full stack trace is longer.

  /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread 7efccfa0e040 time 2020-06-04 10:30:26.887956
  /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: 278: FAILED ceph_assert(ret == 0)
   ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7efcc6c7e875]
   2: (()+0x253a3d) [0x7efcc6c7ea3d]
   3: (AuthMonitor::update_from_paxos(bool*)+0x1b0a) [0x555ef6812f3a]
   4: (PaxosService::refresh(bool*)+0x103) [0x555ef68a63a3]
   5: (Monitor::refresh_from_paxos(bool*)+0x194) [0x555ef6794514]
   6: (Monitor::init_paxos()+0xfc) [0x555ef67947ec]
   7: (Monitor::preinit()+0xa32) [0x555ef67b3532]
   8: (main()+0x23e2) [0x555ef674cfc2]
   9: (__libc_start_main()+0xf5) [0x7efcc2854555]
   10: (()+0x2332d0) [0x555ef677e2d0]
  *** Caught signal (Aborted) **
   in thread 7efccfa0e040 thread_name:ceph-mon
  2020-06-04 10:30:26.888 7efccfa0e040 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread 7efccfa0e040 time 2020-06-04 10:30:26.887956
  /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: 278: FAILED ceph_assert(ret == 0)

The keywords to look for are:

  'virtual void AuthMonitor::update_from_paxos(bool*)'

===== On ceph health =====

  mon: 5 daemons, quorum bvmlm-osm-001,bvmlm-osm-003,bvmlm-osm-004,bvmlm-osm-005 (age 2d), out of quorum: bvmlm-osm-002

====== The solution ======

Re-deploy the monitor from any admin node:

  ceph-deploy mon destroy bvmlm-osm-002
  ceph-deploy mon create bvmlm-osm-002.ciberterminal.net

====== The Reason ======

Found on: [[https://access.redhat.com/solutions/4721981]]\\
Quote from there:\\
''It is likely that monitor store.db is corrupted and hence asserts are happening.''
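
Before re-deploying, it helps to confirm which monitor is out of quorum. The following is a minimal sketch, not part of the original procedure: ''out_of_quorum_mons'' is a hypothetical helper name, and it assumes the status output contains a ''mon:'' line in the same format as the one shown under "On ceph health". It is demonstrated on the sample line from this page; on a live cluster the same function could be fed the cluster status output instead.

```shell
#!/bin/sh
# Hypothetical helper (not a Ceph command): read status text on stdin and
# print each monitor listed after "out of quorum:", one name per line.
# Lines without "out of quorum:" produce no output.
out_of_quorum_mons() {
    sed -n 's/.*out of quorum:[[:space:]]*//p' | tr -d ' ' | tr ',' '\n'
}

# Sample "mon:" line copied from the "On ceph health" section of this page.
sample='mon: 5 daemons, quorum bvmlm-osm-001,bvmlm-osm-003,bvmlm-osm-004,bvmlm-osm-005 (age 2d), out of quorum: bvmlm-osm-002'

printf '%s\n' "$sample" | out_of_quorum_mons
```

Each name printed is a candidate for the ''ceph-deploy mon destroy'' / ''ceph-deploy mon create'' pair described in the solution. If the function prints nothing, all monitors are in quorum according to that status text.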