
[TROUBLESHOOT] PG_DEGRADED: inactive

Documentation
Name: [TROUBLESHOOT] PG_DEGRADED: inactive
Description: how to solve this “issue”
Modification date : 25/07/2019
Owner:dodger
Notify changes to:Owner
Tags:ceph, object storage
Escalate to:The_fucking_bofh

The errors

HEALTH_WARN Reduced data availability: 40 pgs inactive; Degraded data redundancy: 52656/2531751 objects degraded (2.080%), 30 pgs degraded, 780 pgs undersized
PG_AVAILABILITY Reduced data availability: 40 pgs inactive
    pg 24.1 is stuck inactive for 57124.776905, current state undersized+peered, last acting [16]
    pg 24.3 is stuck inactive for 57196.756183, current state undersized+peered, last acting [14]
    pg 24.15 is stuck inactive for 57196.769225, current state undersized+peered, last acting [6]
    pg 24.22 is stuck inactive for 57124.781368, current state undersized+peered, last acting [18]
    pg 24.2a is stuck inactive for 57124.776592, current state undersized+peered, last acting [16]
    pg 26.39 is stuck inactive for 57148.799116, current state undersized+peered, last acting [16]
    pg 27.13 is stuck inactive for 57148.794318, current state undersized+degraded+peered, last acting [10]
    pg 27.1c is stuck inactive for 57196.754097, current state undersized+degraded+peered, last acting [16]
    pg 27.22 is stuck inactive for 57124.769972, current state undersized+degraded+peered, last acting [10]
...
...
PG_DEGRADED Degraded data redundancy: 52656/2531751 objects degraded (2.080%), 30 pgs degraded, 780 pgs undersized
    pg 29.5b is stuck undersized for 57219.217454, current state active+undersized+remapped, last acting [6,14]
    pg 29.5c is stuck undersized for 57110.686713, current state active+undersized+remapped, last acting [12,2]
    pg 29.5d is stuck undersized for 57131.448252, current state active+undersized+remapped, last acting [8,10]
    pg 29.5e is stuck undersized for 57154.989293, current state active+undersized+remapped, last acting [14,18]
    pg 29.5f is stuck undersized for 57194.741017, current state active+undersized+remapped, last acting [6,16]
    pg 29.60 is stuck undersized for 57170.144684, current state active+undersized+remapped, last acting [0,10]
    pg 29.63 is stuck undersized for 57147.771698, current state active+undersized+remapped, last acting [10,0]
...
...
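
With hundreds of stuck PGs the listing above is hard to read by eye. A minimal sketch for grouping those lines by state, assuming the text comes from `ceph health detail` and keeps the "pg ... is stuck ... current state ..." shape shown above (the function name `summarize_stuck_pgs` is made up for this example):

```shell
# Count the "pg X is stuck ..." lines read on stdin, grouped by PG state.
# Prints one line per state: "<count> <state>".
summarize_stuck_pgs() {
    awk '$1 == "pg" && $4 == "stuck" {
             # The state is the word after "current state", minus the trailing comma.
             for (i = 1; i < NF - 1; i++)
                 if ($i == "current" && $(i + 1) == "state") {
                     state = $(i + 2)
                     sub(/,$/, "", state)
                     count[state]++
                 }
         }
         END { for (s in count) print count[s], s }' | sort -k2
}

# Three lines copied from the listing above, used as sample input.
summarize_stuck_pgs <<'EOF'
    pg 24.1 is stuck inactive for 57124.776905, current state undersized+peered, last acting [16]
    pg 27.13 is stuck inactive for 57148.794318, current state undersized+degraded+peered, last acting [10]
    pg 29.5b is stuck undersized for 57219.217454, current state active+undersized+remapped, last acting [6,14]
EOF
```

On a live cluster the input would presumably be piped in as `ceph health detail | summarize_stuck_pgs`.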
avmlp-osm-001 /var/log/ceph # ceph -s
  cluster:
    id:     aefcf554-f949-4457-a049-0bfb432e40c4
    health: HEALTH_WARN
            Reduced data availability: 40 pgs inactive
            Degraded data redundancy: 52656/2531751 objects degraded (2.080%), 30 pgs degraded, 780 pgs undersized
 
  services:
    mon: 6 daemons, quorum avmlp-osm-001,avmlp-osm-002,avmlp-osm-003,avmlp-osm-004,avmlp-osm-006,avmlp-osm-005 (age 22h)
    mgr: avmlp-osm-002.ciberterminal.net(active, since 23h), standbys: avmlp-osm-004.ciberterminal.net, avmlp-osm-003.ciberterminal.net, avmlp-osm-001.ciberterminal.net
    mds: cephfs:1 {0=avmlp-osfs-002.ciberterminal.net=up:active} 3 up:standby
    osd: 20 osds: 20 up (since 15h), 20 in (since 6w); 1132 remapped pgs
    rgw: 1 daemon active (avmlp-osgw-004.ciberterminal.net)
 
  data:
    pools:   10 pools, 1232 pgs
    objects: 843.92k objects, 49 GiB
    usage:   264 GiB used, 40 TiB / 40 TiB avail
    pgs:     3.247% pgs not active
             52656/2531751 objects degraded (2.080%)
             1635162/2531751 objects misplaced (64.586%)
             740 active+undersized+remapped
             392 active+clean+remapped
             60  active+clean
             30  undersized+degraded+peered
             10  undersized+peered
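
The percentages in this output can be cross-checked from the raw counts. A small sketch (the helper name `pct` is made up here); the 40 inactive PGs are the 30 undersized+degraded+peered plus the 10 undersized+peered:

```shell
# pct NUMERATOR DENOMINATOR: print the ratio as a percentage with three decimals.
pct() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.3f\n", 100 * n / d }'
}

pct 52656 2531751     # degraded objects, reported as 2.080%
pct 1635162 2531751   # misplaced objects, reported as 64.586%
pct 40 1232           # inactive PGs, reported as 3.247%
```

The PG state counts also add up: 740 + 392 + 60 + 30 + 10 = 1232 PGs, and 740 + 30 + 10 = 780 undersized PGs.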

Official Documentation: http://docs.ceph.com/docs/master/rados/operations/health-checks/#pg-degraded

The solution

Force Ceph to move the data. I think this is a patch rather than a real solution.
Change the pool size and min_size to force Ceph to re-balance the data.
First check the current values:

ceph osd pool ls detail
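
The detail output is long, so it can help to keep only the pool name, size and min_size. A minimal sketch, assuming each pool line looks like `pool <id> '<name>' replicated size <n> min_size <m> ...`; the function name `pool_sizes` and the sample pool below are hypothetical, not taken from this cluster:

```shell
# Print "<pool> <size> <min_size>" for every "pool ..." line read on stdin.
pool_sizes() {
    awk -v q="'" '$1 == "pool" {
             name = $3
             gsub(q, "", name)                 # strip the quotes around the name
             size = ""; min = ""
             for (i = 4; i < NF; i++) {
                 if ($i == "size")     size = $(i + 1)
                 if ($i == "min_size") min  = $(i + 1)
             }
             print name, size, min
         }'
}

# Hypothetical sample line, for illustration only.
pool_sizes <<'EOF'
pool 24 'example.pool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8
EOF
```

Against a real cluster this would be `ceph osd pool ls detail | pool_sizes`.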


Then increase them by 1, for example:

for POOL_NAME in $(ceph osd pool ls) ; do SIZE=$(ceph osd pool get "${POOL_NAME}" size | awk '{print $2}') ; MINSIZE=$(ceph osd pool get "${POOL_NAME}" min_size | awk '{print $2}') ; echo "ceph osd pool set ${POOL_NAME} size $((SIZE + 1))" ; echo "ceph osd pool set ${POOL_NAME} min_size $((MINSIZE + 1))" ; done

The one-liner-of-the-dead above only prints the commands to run; it does not execute them :-P
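
The same idea can be sketched as a small function that is easier to review than a one-liner. It reads "<pool> <size> <min_size>" lines and prints one `ceph osd pool set` command per setting, since that command takes a single setting per call. The function name `print_bump_commands`, the input format and the sample pool are assumptions made for this example:

```shell
# Read "<pool> <size> <min_size>" lines on stdin and print the
# `ceph osd pool set` commands that raise both values by one.
# Nothing is executed; the output is meant to be reviewed first.
print_bump_commands() {
    while read -r pool size min_size; do
        [ -n "$pool" ] || continue            # skip blank lines
        echo "ceph osd pool set ${pool} size $((size + 1))"
        echo "ceph osd pool set ${pool} min_size $((min_size + 1))"
    done
}

# Hypothetical pool name and values, for illustration only.
print_bump_commands <<'EOF'
example.pool 3 2
EOF
```

Once the printed commands look right, they can be pasted into the shell one by one.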

linux/ceph/troubleshooting/pg_degraded_undersized.txt · Last modified: 2022/02/11 11:36 (external edit)