ceph:troubleshooting:pg_degraded_redundancy

ceph:troubleshooting:pg_degraded_redundancy [2019/07/18 07:17] (current)
 +====== [TROUBLESHOOT] PG_DEGRADED Degraded data redundancy ======
 +
 +^  Documentation ​ ^|
 +^Name:| [TROUBLESHOOT] PG_DEGRADED Degraded data redundancy |
 +^Description:​| how to solve this "​issue"​ |
 +^Modification date :​|13/​06/​2019|
 +^Owner:​|dodger@ciberterminal.net|
 +^Notify changes to:|Owner |
 +^Tags:​|ceph,​ object storage |
 +
 +
 +====== The error ======
 +
 +<code bash>
 +avmlp-osm-001 /etc/ceph # ceph health ​
 +HEALTH_WARN Degraded data redundancy: 1329 pgs undersized
 +avmlp-osm-001 /etc/ceph # ceph health detail
 +HEALTH_WARN Degraded data redundancy: 1329 pgs undersized
 +PG_DEGRADED Degraded data redundancy: 1329 pgs undersized
 +    pg 1.168 is stuck undersized for 72516.102692,​ current state active+undersized+remapped,​ last acting [14,4,2]
 +    pg 1.169 is stuck undersized for 72516.105401,​ current state active+undersized+remapped,​ last acting [18,2,6]
 +    pg 1.16a is stuck undersized for 72516.098682,​ current state active+undersized+remapped,​ last acting [8,0,10]
 +    pg 1.16b is stuck undersized for 72516.102384,​ current state active+undersized+remapped,​ last acting [8,12,6]
 +    pg 1.16d is stuck undersized for 72516.105526,​ current state active+undersized+remapped,​ last acting [18,2,10]
 +    pg 1.16e is stuck undersized for 72516.097173,​ current state active+undersized+remapped,​ last acting [4,8,2]
 +    pg 1.16f is stuck undersized for 72516.105602,​ current state active+undersized+remapped,​ last acting [6,18,2]
 +    pg 1.170 is stuck undersized for 72516.102068,​ current state active+undersized+remapped,​ last acting [14,4,2]
 +    pg 1.171 is stuck undersized for 72516.101080,​ current state active+undersized+remapped,​ last acting [14,2,6]
 +    pg 1.172 is stuck undersized for 72516.108720,​ current state active+undersized+remapped,​ last acting [16,0,10]
 +    pg 1.173 is stuck undersized for 72516.102557,​ current state active+undersized+remapped,​ last acting [12,6,14]
 +    pg 1.174 is stuck undersized for 72516.103496,​ current state active+undersized+remapped,​ last acting [18,4,2]
 +    pg 1.175 is stuck undersized for 72516.101506,​ current state active+undersized+remapped,​ last acting [4,2,10]
 +    pg 1.176 is stuck undersized for 72516.098652,​ current state active+undersized+remapped,​ last acting [2,8,4]
 +    pg 1.177 is stuck undersized for 72516.104113,​ current state active+undersized+remapped,​ last acting [0,18,2]
 +    pg 1.1f0 is stuck undersized for 72516.100203,​ current state active+undersized+remapped,​ last acting [8,4,2]
 +    pg 1.1f1 is stuck undersized for 72516.104306,​ current state active+undersized+remapped,​ last acting [18,2,6]
 +    pg 1.1f2 is stuck undersized for 72516.101514,​ current state active+undersized+remapped,​ last acting [14,0,10]
 +    pg 1.1f3 is stuck undersized for 72516.101139,​ current state active+undersized+remapped,​ last acting [12,6,14]
 +    pg 2.168 is stuck undersized for 72491.414882,​ current state active+undersized+remapped,​ last acting [18,14,16]
 +    pg 2.169 is stuck undersized for 72491.416090,​ current state active+undersized+remapped,​ last acting [6,0,10]
 +    pg 2.16a is stuck undersized for 72491.408729,​ current state active+undersized+remapped,​ last acting [10,0,14]
 +    pg 2.16b is stuck undersized for 72491.409007,​ current state active+undersized+remapped,​ last acting [10,2,18]
 +    pg 2.16c is stuck undersized for 72491.413090,​ current state active+undersized+remapped,​ last acting [0,12,18]
 +    pg 2.16d is stuck undersized for 72491.415706,​ current state active+undersized+remapped,​ last acting [16,0,12]
 +    pg 2.16e is stuck undersized for 72491.411571,​ current state active+undersized+remapped,​ last acting [8,2,12]
 +    pg 2.16f is stuck undersized for 72491.414464,​ current state active+undersized+remapped,​ last acting [10,16,18]
 +    pg 2.170 is stuck undersized for 72491.405008,​ current state active+undersized+remapped,​ last acting [18,14,16]
 +    pg 2.171 is stuck undersized for 72491.411614,​ current state active+undersized+remapped,​ last acting [6,0,10]
 +    pg 2.172 is stuck undersized for 72491.411492,​ current state active+undersized+remapped,​ last acting [10,0,14]
 +    pg 2.173 is stuck undersized for 72491.413998,​ current state active+undersized+remapped,​ last acting [10,2,18]
 +    pg 2.174 is stuck undersized for 72491.406730,​ current state active+undersized+remapped,​ last acting [0,12,18]
 +    pg 2.175 is stuck undersized for 72491.416080,​ current state active+undersized+remapped,​ last acting [16,0,12]
 +    pg 2.176 is stuck undersized for 72491.409336,​ current state active+undersized+remapped,​ last acting [8,2,12]
 +    pg 2.177 is stuck undersized for 72491.415159,​ current state active+undersized+remapped,​ last acting [10,16,18]
 +    pg 3.168 is stuck undersized for 72490.394675,​ current state active+undersized+remapped,​ last acting [2,4,6]
 +    pg 3.169 is stuck undersized for 72490.399679,​ current state active+undersized+remapped,​ last acting [6,2,18]
 +    pg 3.16a is stuck undersized for 72490.381105,​ current state active+undersized+remapped,​ last acting [12,2,14]
 +    pg 3.16b is stuck undersized for 72490.386668,​ current state active+undersized+remapped,​ last acting [6,0,18]
 +    pg 3.16c is stuck undersized for 72490.396841,​ current state active+undersized+remapped,​ last acting [0,12,14]
 +    pg 3.16d is stuck undersized for 72490.393868,​ current state active+undersized+remapped,​ last acting [12,10,18]
 +    pg 3.16e is stuck undersized for 72490.396822,​ current state active+undersized+remapped,​ last acting [12,4,6]
 +    pg 3.16f is stuck undersized for 72490.393344,​ current state active+undersized+remapped,​ last acting [6,2,8]
 +    pg 3.170 is stuck undersized for 72490.391844,​ current state active+undersized+remapped,​ last acting [2,4,6]
 +    pg 3.171 is stuck undersized for 72490.398862,​ current state active+undersized+remapped,​ last acting [6,2,18]
 +    pg 3.172 is stuck undersized for 72490.393152,​ current state active+undersized+remapped,​ last acting [12,2,14]
 +    pg 3.173 is stuck undersized for 72490.398763,​ current state active+undersized+remapped,​ last acting [6,0,18]
 +    pg 3.174 is stuck undersized for 72490.397356,​ current state active+undersized+remapped,​ last acting [0,12,14]
 +    pg 3.175 is stuck undersized for 72490.398161,​ current state active+undersized+remapped,​ last acting [12,10,18]
 +    pg 3.176 is stuck undersized for 72490.393288,​ current state active+undersized+remapped,​ last acting [12,4,6]
 +    pg 3.177 is stuck undersized for 72490.397813,​ current state active+undersized+remapped,​ last acting [6,2,8]
 +</​code>​
 +
 +Official Documentation:​ [[http://​docs.ceph.com/​docs/​master/​rados/​operations/​health-checks/#​pg-degraded]]
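With more than a thousand PGs listed, it can help to see how the undersized PGs are spread across pools before changing anything. The following is a minimal sketch, assuming the ''ceph health detail'' output format shown above; ''count_undersized_by_pool'' is a hypothetical helper name, not a Ceph command.

```bash
# Sketch: count undersized PGs per pool from `ceph health detail` text on stdin.
# count_undersized_by_pool is a hypothetical helper name, not part of Ceph.
count_undersized_by_pool() {
    awk '$1 == "pg" && /undersized/ {
             split($2, id, ".")        # a PG id has the form <pool>.<hex>
             count[id[1]]++
         }
         END {
             for (pool in count) print pool, count[pool]
         }' | sort -n
}
```

Usage: ''ceph health detail | count_undersized_by_pool'' prints one ''<pool id> <count>'' pair per line, sorted by pool id.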
 +
 +
 +
 +====== The solution ======
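 +The pools' replication ''size'' is higher than the cluster can currently satisfy, so the affected PGs stay ''undersized''. Reducing the pool ''size'' clears the alert, as the session below shows.\\
 +Lowering ''size'' also lowers redundancy, so this approach still needs to be studied before it is used as a general fix.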
 +
 +<code bash>
 +ceph@avmlp-osm-001 ~/​ceph-deploy $ sudo ceph osd pool ls detail
 +pool 1 '​.rgw.root'​ replicated size 4 min_size 2 crush_rule 1 object_hash rjenkins pg_num 500 pgp_num 500 autoscale_mode warn last_change 665 lfor 0/0/220 flags hashpspool stripe_width 0 application rgw
 +pool 2 '​default.rgw.control'​ replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 390 pgp_num 8 pg_num_target 60 pgp_num_target 60 pg_num_pending 389 autoscale_mode warn last_change 668 lfor 0/668/668 flags hashpspool stripe_width 0 application rgw
 +pool 3 '​default.rgw.meta'​ replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 390 pgp_num 8 pg_num_target 60 pgp_num_target 60 pg_num_pending 389 autoscale_mode warn last_change 668 lfor 0/668/668 flags hashpspool stripe_width 0 application rgw
 +pool 4 '​default.rgw.log'​ replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 60 pgp_num 8 pgp_num_target 60 autoscale_mode warn last_change 269 lfor 0/0/243 flags hashpspool stripe_width 0 application rgw
 +avmlp-osm-001 /home/ceph # ceph health
 +HEALTH_WARN Degraded data redundancy: 1329 pgs undersized
 +avmlp-osm-001 /home/ceph # for i in $(ceph osd pool ls) ; do  ceph osd pool set ${i} size 3 ; done
 +set pool 1 size to 3
 +set pool 2 size to 3
 +set pool 3 size to 3
 +set pool 4 size to 3
 +avmlp-osm-001 /home/ceph # ceph health
 +HEALTH_WARN Reduced data availability:​ 197 pgs peering; Degraded data redundancy: 196 pgs undersized
 +avmlp-osm-001 /home/ceph # ceph health
 +HEALTH_OK
 +avmlp-osm-001 /home/ceph # ceph osd pool ls detail
 +pool 1 '​.rgw.root'​ replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 500 pgp_num 500 autoscale_mode warn last_change 696 lfor 0/0/220 flags hashpspool stripe_width 0 application rgw
 +pool 2 '​default.rgw.control'​ replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 339 pgp_num 8 pg_num_target 60 pgp_num_target 60 pg_num_pending 338 autoscale_mode warn last_change 880 lfor 0/880/880 flags hashpspool stripe_width 0 application rgw
 +pool 3 '​default.rgw.meta'​ replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 339 pgp_num 8 pg_num_target 60 pgp_num_target 60 pg_num_pending 338 autoscale_mode warn last_change 880 lfor 0/880/880 flags hashpspool stripe_width 0 application rgw
 +pool 4 '​default.rgw.log'​ replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 60 pgp_num 8 pgp_num_target 60 autoscale_mode warn last_change 699 lfor 0/0/243 flags hashpspool stripe_width 0 application rgw
 +</​code>​
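The loop in the session above sets ''size 3'' on every pool, although only ''.rgw.root'' had a larger value. A more selective variant first lists the pools whose ''size'' exceeds a chosen limit. This is a minimal sketch, assuming the ''ceph osd pool ls detail'' output format shown above and pool names without spaces; ''pools_with_size_above'' is a hypothetical helper name, and the limit is whatever value the operator decides the cluster can satisfy.

```bash
# Sketch: read `ceph osd pool ls detail` text on stdin and print "name size"
# for each pool whose size is greater than the limit given as $1.
# pools_with_size_above is a hypothetical helper name, not part of Ceph.
pools_with_size_above() {
    awk -v max="$1" '$1 == "pool" {
        for (i = 3; i < NF; i++) {
            if ($i == "size") {              # first "size" field, not min_size
                name = $3
                gsub("\047", "", name)       # strip the quotes around the pool name
                if ($(i + 1) + 0 > max + 0) print name, $(i + 1)
                break
            }
        }
    }'
}
```

Usage: ''ceph osd pool ls detail | pools_with_size_above 3'' lists the candidate pools, and ''ceph osd pool set <pool> size 3'' can then be run only for those.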
 +
 +
 +''ceph osd repair'' may also work; see:
 +  * https://​pastebin.com/​NpNuJpnR
 +  * https://​forum.proxmox.com/​threads/​cephfs-filesystem-is-degraded.41342/​
  
ceph/troubleshooting/pg_degraded_redundancy.txt · Last modified: 2019/07/18 07:17 (external edit)