[TROUBLESHOOT] PG_DEGRADED Degraded data redundancy

Documentation
Name: [TROUBLESHOOT] PG_DEGRADED Degraded data redundancy
Description: how to solve this “issue”
Modification date: 13/06/2019
Owner: dodger@ciberterminal.net
Notify changes to: Owner
Tags: ceph, object storage

The error

avmlp-osm-001 /etc/ceph # ceph health 
HEALTH_WARN Degraded data redundancy: 1329 pgs undersized
avmlp-osm-001 /etc/ceph # ceph health detail
HEALTH_WARN Degraded data redundancy: 1329 pgs undersized
PG_DEGRADED Degraded data redundancy: 1329 pgs undersized
    pg 1.168 is stuck undersized for 72516.102692, current state active+undersized+remapped, last acting [14,4,2]
    pg 1.169 is stuck undersized for 72516.105401, current state active+undersized+remapped, last acting [18,2,6]
    pg 1.16a is stuck undersized for 72516.098682, current state active+undersized+remapped, last acting [8,0,10]
    pg 1.16b is stuck undersized for 72516.102384, current state active+undersized+remapped, last acting [8,12,6]
    pg 1.16d is stuck undersized for 72516.105526, current state active+undersized+remapped, last acting [18,2,10]
    pg 1.16e is stuck undersized for 72516.097173, current state active+undersized+remapped, last acting [4,8,2]
    pg 1.16f is stuck undersized for 72516.105602, current state active+undersized+remapped, last acting [6,18,2]
    pg 1.170 is stuck undersized for 72516.102068, current state active+undersized+remapped, last acting [14,4,2]
    pg 1.171 is stuck undersized for 72516.101080, current state active+undersized+remapped, last acting [14,2,6]
    pg 1.172 is stuck undersized for 72516.108720, current state active+undersized+remapped, last acting [16,0,10]
    pg 1.173 is stuck undersized for 72516.102557, current state active+undersized+remapped, last acting [12,6,14]
    pg 1.174 is stuck undersized for 72516.103496, current state active+undersized+remapped, last acting [18,4,2]
    pg 1.175 is stuck undersized for 72516.101506, current state active+undersized+remapped, last acting [4,2,10]
    pg 1.176 is stuck undersized for 72516.098652, current state active+undersized+remapped, last acting [2,8,4]
    pg 1.177 is stuck undersized for 72516.104113, current state active+undersized+remapped, last acting [0,18,2]
    pg 1.1f0 is stuck undersized for 72516.100203, current state active+undersized+remapped, last acting [8,4,2]
    pg 1.1f1 is stuck undersized for 72516.104306, current state active+undersized+remapped, last acting [18,2,6]
    pg 1.1f2 is stuck undersized for 72516.101514, current state active+undersized+remapped, last acting [14,0,10]
    pg 1.1f3 is stuck undersized for 72516.101139, current state active+undersized+remapped, last acting [12,6,14]
    pg 2.168 is stuck undersized for 72491.414882, current state active+undersized+remapped, last acting [18,14,16]
    pg 2.169 is stuck undersized for 72491.416090, current state active+undersized+remapped, last acting [6,0,10]
    pg 2.16a is stuck undersized for 72491.408729, current state active+undersized+remapped, last acting [10,0,14]
    pg 2.16b is stuck undersized for 72491.409007, current state active+undersized+remapped, last acting [10,2,18]
    pg 2.16c is stuck undersized for 72491.413090, current state active+undersized+remapped, last acting [0,12,18]
    pg 2.16d is stuck undersized for 72491.415706, current state active+undersized+remapped, last acting [16,0,12]
    pg 2.16e is stuck undersized for 72491.411571, current state active+undersized+remapped, last acting [8,2,12]
    pg 2.16f is stuck undersized for 72491.414464, current state active+undersized+remapped, last acting [10,16,18]
    pg 2.170 is stuck undersized for 72491.405008, current state active+undersized+remapped, last acting [18,14,16]
    pg 2.171 is stuck undersized for 72491.411614, current state active+undersized+remapped, last acting [6,0,10]
    pg 2.172 is stuck undersized for 72491.411492, current state active+undersized+remapped, last acting [10,0,14]
    pg 2.173 is stuck undersized for 72491.413998, current state active+undersized+remapped, last acting [10,2,18]
    pg 2.174 is stuck undersized for 72491.406730, current state active+undersized+remapped, last acting [0,12,18]
    pg 2.175 is stuck undersized for 72491.416080, current state active+undersized+remapped, last acting [16,0,12]
    pg 2.176 is stuck undersized for 72491.409336, current state active+undersized+remapped, last acting [8,2,12]
    pg 2.177 is stuck undersized for 72491.415159, current state active+undersized+remapped, last acting [10,16,18]
    pg 3.168 is stuck undersized for 72490.394675, current state active+undersized+remapped, last acting [2,4,6]
    pg 3.169 is stuck undersized for 72490.399679, current state active+undersized+remapped, last acting [6,2,18]
    pg 3.16a is stuck undersized for 72490.381105, current state active+undersized+remapped, last acting [12,2,14]
    pg 3.16b is stuck undersized for 72490.386668, current state active+undersized+remapped, last acting [6,0,18]
    pg 3.16c is stuck undersized for 72490.396841, current state active+undersized+remapped, last acting [0,12,14]
    pg 3.16d is stuck undersized for 72490.393868, current state active+undersized+remapped, last acting [12,10,18]
    pg 3.16e is stuck undersized for 72490.396822, current state active+undersized+remapped, last acting [12,4,6]
    pg 3.16f is stuck undersized for 72490.393344, current state active+undersized+remapped, last acting [6,2,8]
    pg 3.170 is stuck undersized for 72490.391844, current state active+undersized+remapped, last acting [2,4,6]
    pg 3.171 is stuck undersized for 72490.398862, current state active+undersized+remapped, last acting [6,2,18]
    pg 3.172 is stuck undersized for 72490.393152, current state active+undersized+remapped, last acting [12,2,14]
    pg 3.173 is stuck undersized for 72490.398763, current state active+undersized+remapped, last acting [6,0,18]
    pg 3.174 is stuck undersized for 72490.397356, current state active+undersized+remapped, last acting [0,12,14]
    pg 3.175 is stuck undersized for 72490.398161, current state active+undersized+remapped, last acting [12,10,18]
    pg 3.176 is stuck undersized for 72490.393288, current state active+undersized+remapped, last acting [12,4,6]
    pg 3.177 is stuck undersized for 72490.397813, current state active+undersized+remapped, last acting [6,2,8]

Official Documentation: http://docs.ceph.com/docs/master/rados/operations/health-checks/#pg-degraded
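
The long listing above is easier to read when condensed per pool. The following is a minimal sketch, not part of the original procedure: the helper name count_undersized_by_pool is hypothetical, and it assumes the "pg <pool>.<id> is stuck undersized ..." line format shown above. It reads the text on stdin, so it can be tried on a saved copy of the output.

```shell
# Hypothetical helper: count stuck-undersized PGs per pool.
# Reads `ceph health detail` text on stdin; the pool id is the part of the
# PG id before the dot (for example "1" in "pg 1.168").
count_undersized_by_pool() {
    awk '$1 == "pg" && /undersized/ {
             split($2, id, ".")   # id[1] = pool id, id[2] = PG number
             n[id[1]]++
         }
         END { for (p in n) printf "pool %s: %d\n", p, n[p] }' | sort -k2,2n
}

# On a live cluster (not run here):
#   ceph health detail | count_undersized_by_pool
```

Note that ceph health detail may truncate the list of PGs, so the counts reflect only the lines that were printed.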

The solution

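The problem is that the pools request more replicas than the cluster can currently provide: in the listing below, pool 1 ('.rgw.root') has size 4 while the other pools have size 3. Reducing the pool size to 3 removes the alert.
THIS MUST BE STUDIED before reusing the fix: a smaller size means fewer copies of each object.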
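
Before changing anything, it may help to list only the pools whose size is above the intended value. This is a rough sketch under stated assumptions: the helper name pools_over_size is hypothetical, the threshold is passed as an argument, and the parsing relies on the "pool <id> '<name>' replicated size <n> ..." layout printed by ceph osd pool ls detail. Pool names containing spaces are not handled.

```shell
# Hypothetical helper: print "<pool name> <size>" for every pool whose
# replica count is greater than the threshold given as $1.
# Reads `ceph osd pool ls detail` text on stdin.
pools_over_size() {
    awk -v max="$1" '$1 == "pool" {
        # Look for the literal token "size"; "min_size" is a different token.
        for (i = 4; i < NF; i++)
            if ($i == "size" && $(i + 1) + 0 > max + 0)
                print $3, $(i + 1)
    }' | tr -d "'"   # drop the quotes around the pool name
}

# On a live cluster (not run here), only the oversized pools would be changed:
#   ceph osd pool ls detail | pools_over_size 3 |
#       while read -r name size; do ceph osd pool set "$name" size 3; done
```

Compared with a loop over every pool, this leaves pools that already have the target size untouched.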

ceph@avmlp-osm-001 ~/ceph-deploy $ sudo ceph osd pool ls detail
pool 1 '.rgw.root' replicated size 4 min_size 2 crush_rule 1 object_hash rjenkins pg_num 500 pgp_num 500 autoscale_mode warn last_change 665 lfor 0/0/220 flags hashpspool stripe_width 0 application rgw
pool 2 'default.rgw.control' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 390 pgp_num 8 pg_num_target 60 pgp_num_target 60 pg_num_pending 389 autoscale_mode warn last_change 668 lfor 0/668/668 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 390 pgp_num 8 pg_num_target 60 pgp_num_target 60 pg_num_pending 389 autoscale_mode warn last_change 668 lfor 0/668/668 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.log' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 60 pgp_num 8 pgp_num_target 60 autoscale_mode warn last_change 269 lfor 0/0/243 flags hashpspool stripe_width 0 application rgw
avmlp-osm-001 /home/ceph # ceph health
HEALTH_WARN Degraded data redundancy: 1329 pgs undersized
avmlp-osm-001 /home/ceph # for i in $(ceph osd pool ls) ; do ceph osd pool set ${i} size 3 ; done
set pool 1 size to 3
set pool 2 size to 3
set pool 3 size to 3
set pool 4 size to 3
avmlp-osm-001 /home/ceph # ceph health
HEALTH_WARN Reduced data availability: 197 pgs peering; Degraded data redundancy: 196 pgs undersized
avmlp-osm-001 /home/ceph # ceph health
HEALTH_OK
avmlp-osm-001 /home/ceph # ceph osd pool ls detail
pool 1 '.rgw.root' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 500 pgp_num 500 autoscale_mode warn last_change 696 lfor 0/0/220 flags hashpspool stripe_width 0 application rgw
pool 2 'default.rgw.control' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 339 pgp_num 8 pg_num_target 60 pgp_num_target 60 pg_num_pending 338 autoscale_mode warn last_change 880 lfor 0/880/880 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 339 pgp_num 8 pg_num_target 60 pgp_num_target 60 pg_num_pending 338 autoscale_mode warn last_change 880 lfor 0/880/880 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.log' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 60 pgp_num 8 pgp_num_target 60 autoscale_mode warn last_change 699 lfor 0/0/243 flags hashpspool stripe_width 0 application rgw
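
In the session above, ceph health was re-run by hand until the PGs finished peering and the cluster reported HEALTH_OK. That wait could be scripted roughly as follows. This is only a sketch: the function name wait_for_health_ok and the HEALTH_CMD variable are hypothetical (HEALTH_CMD exists so the loop can be tried without a cluster), and the default of 30 tries with a 10 second delay is an arbitrary assumption.

```shell
# Hypothetical helper: poll the cluster health until it reports HEALTH_OK.
#   $1 = maximum number of tries (default 30, assumed)
#   $2 = seconds to sleep between tries (default 10, assumed)
# HEALTH_CMD overrides the command used to read the status; it defaults
# to `ceph health`.
wait_for_health_ok() {
    tries="${1:-30}"
    delay="${2:-10}"
    while [ "$tries" -gt 0 ]; do
        status=$(${HEALTH_CMD:-ceph health})
        case "$status" in
            HEALTH_OK*) echo "$status"; return 0 ;;
        esac
        echo "waiting: $status" >&2   # progress goes to stderr
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1                          # gave up without seeing HEALTH_OK
}

# On a live cluster (not run here):
#   wait_for_health_ok 60 5 || ceph health detail
```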

ceph osd repair may also work; this has not been verified.

ceph/troubleshooting/pg_degraded_redundancy.txt · Last modified: 2019/07/18 09:17 (external edit)