[TROUBLESHOOT] ceph-mon: can't start daemon

Documentation
Name: [TROUBLESHOOT] ceph-mon: can't start daemon
Description: how to solve this “issue”
Modification date : 04/06/2020
Owner:dodger
Notify changes to:Owner
Tags:ceph, object storage
Escalate to:The_fucking_bofh

The errors

On mon server

This is an excerpt; the full stack trace is longer.

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread 7efccfa0e040 time 2020-06-04 10:30:26.887956
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: 278: FAILED ceph_assert(ret == 0)
 ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7efcc6c7e875]
 2: (()+0x253a3d) [0x7efcc6c7ea3d]
 3: (AuthMonitor::update_from_paxos(bool*)+0x1b0a) [0x555ef6812f3a]
 4: (PaxosService::refresh(bool*)+0x103) [0x555ef68a63a3]
 5: (Monitor::refresh_from_paxos(bool*)+0x194) [0x555ef6794514]
 6: (Monitor::init_paxos()+0xfc) [0x555ef67947ec]
 7: (Monitor::preinit()+0xa32) [0x555ef67b3532]
 8: (main()+0x23e2) [0x555ef674cfc2]
 9: (__libc_start_main()+0xf5) [0x7efcc2854555]
 10: (()+0x2332d0) [0x555ef677e2d0]
*** Caught signal (Aborted) **
 in thread 7efccfa0e040 thread_name:ceph-mon
2020-06-04 10:30:26.888 7efccfa0e040 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread 7efccfa0e040 time 2020-06-04 10:30:26.887956
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: 278: FAILED ceph_assert(ret == 0)

Keywords to look for in the log:

'virtual void AuthMonitor::update_from_paxos(bool*)'
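
To confirm that a monitor hit this particular assertion, you can search its log for the function name above together with the failed assertion from the trace. This is a minimal sketch: the helper name `has_authmon_assert` is made up for illustration, and the log path in the usage comment is the usual packaging default, which may differ on your hosts.

```shell
# Hypothetical helper: succeed only if the given log file contains both the
# AuthMonitor::update_from_paxos frame and the failed "ret == 0" assertion.
has_authmon_assert() {
    log_file="$1"
    grep -qF 'AuthMonitor::update_from_paxos(bool*)' "$log_file" &&
        grep -qF 'FAILED ceph_assert(ret == 0)' "$log_file"
}

# Usage (log path is an assumption; adjust to your host):
#   has_authmon_assert /var/log/ceph/ceph-mon.bvmlm-osm-002.log && echo "matches this issue"
```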

On ceph health

    mon: 5 daemons, quorum bvmlm-osm-001,bvmlm-osm-003,bvmlm-osm-004,bvmlm-osm-005 (age 2d), out of quorum: bvmlm-osm-002
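
If you want to pull the affected monitor names out of a status line like the one above, a small filter is enough. This is a sketch only: the helper name `out_of_quorum_mons` is hypothetical, and it assumes the text arrives on stdin in the same format as the line shown here (for example, piped from `ceph -s`).

```shell
# Hypothetical helper: read status output on stdin and print every monitor
# listed after "out of quorum:", one name per line. Prints nothing when all
# monitors are in quorum.
out_of_quorum_mons() {
    sed -n 's/.*out of quorum: *//p' |
        tr ',' '\n' |
        sed 's/^ *//; s/ *$//; /^$/d'
}

# Usage:
#   ceph -s | out_of_quorum_mons
```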

The solution

Redeploy the failed monitor from any admin node:

ceph-deploy mon destroy bvmlm-osm-002
ceph-deploy mon create bvmlm-osm-002.ciberterminal.net
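
After the redeploy it can take a moment for the monitor to sync and rejoin. The sketch below polls `ceph -s` until the monitor appears in the quorum list. The helper name `wait_for_mon_quorum`, the default attempt count, and the default delay are assumptions for illustration. The parsing relies on the `mon: N daemons, quorum ...` line format shown earlier on this page.

```shell
# Hypothetical helper: poll `ceph -s` until the named monitor shows up in
# the quorum list, or give up after a number of attempts.
wait_for_mon_quorum() {
    mon_name="$1"
    attempts="${2:-30}"   # how many times to poll (assumed default)
    delay="${3:-10}"      # seconds to sleep between polls (assumed default)
    n=0
    while [ "$n" -lt "$attempts" ]; do
        # Extract the comma-separated quorum list from the "mon:" line and
        # look for an exact match on the monitor name.
        if ceph -s | sed -n 's/.*daemons, quorum \([^ ]*\).*/\1/p' |
            tr ',' '\n' | grep -qxF "$mon_name"; then
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}

# Usage:
#   wait_for_mon_quorum bvmlm-osm-002 && echo "back in quorum"
```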

The Reason

Found on: https://access.redhat.com/solutions/4721981
Quote from that article:
"It is likely that monitor store.db is corrupted and hence asserts are happening."
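
Because the store is only suspected to be corrupted, you may want to keep a copy of the monitor data directory for later inspection before destroying the monitor. This is a sketch under assumptions: the helper name `backup_mon_store` is hypothetical, and the directory in the usage comment follows the default `/var/lib/ceph/mon/<cluster>-<id>` layout, which your deployment may override.

```shell
# Hypothetical helper: archive a monitor data directory (including store.db)
# into a gzip-compressed tarball.
backup_mon_store() {
    mon_dir="$1"   # monitor data directory to archive
    dest="$2"      # path of the .tar.gz file to create
    if [ ! -d "$mon_dir" ]; then
        echo "no such directory: $mon_dir" >&2
        return 1
    fi
    # Store paths relative to the parent so the archive unpacks cleanly.
    tar -czf "$dest" -C "$(dirname "$mon_dir")" "$(basename "$mon_dir")"
}

# Usage on the monitor host (path is an assumed default):
#   backup_mon_store /var/lib/ceph/mon/ceph-bvmlm-osm-002 /root/mon-bvmlm-osm-002.tar.gz
```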

linux/ceph/troubleshooting/monitor_crash.txt · Last modified: 2022/02/11 11:36 (external edit)