PG (Placement groups) States
|Name:||PG (Placement groups) States|
|Description:||Information about PG states|
|Modification date :||25/07/2019|
|Notify changes to:||Owner|
|Tags:||ceph, object storage|
- `creating`: Ceph is still creating the placement group.
- `active`: Ceph will process requests to the placement group. Active placement groups serve data.
- `clean`: Ceph has replicated all objects in the placement group the correct number of times.
- `active+clean` is the ideal PG state.
- `down`: A replica with necessary data is down, so the placement group is offline.
- A PG with fewer than min_size replicas available will be marked as down. Use `ceph health detail` to understand the backing OSD state.
`replay`: The placement group is waiting for clients to replay operations after an OSD crashed.
`splitting`: Ceph is splitting the placement group into multiple placement groups. (functional?)
`scrubbing`: Ceph is checking the placement group for inconsistencies.
`degraded`: Ceph has not yet replicated some objects in the placement group the correct number of times.
`inconsistent`: Ceph detects inconsistencies in one or more replicas of an object in the placement group (e.g. objects are the wrong size, objects are missing from one replica after recovery finished, etc.).
- `peering`: The placement group is undergoing the peering process.
- Peering should normally complete quickly; if the number of PGs in the peering state does not decrease over time, the peering may be stuck.
- To understand why a PG is stuck in peering, query the placement group and check whether it is waiting on any other OSDs. To query a PG, use:
# ceph pg <pg.id> query
If the PG is waiting on another OSD for peering to finish, bringing that OSD up should resolve the issue.
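The query output is JSON, and for a stuck PG its `recovery_state` entries can name the OSDs blocking peering. Below is a minimal sketch of extracting those OSD ids; the sample document is an illustrative, heavily abridged fragment of real `ceph pg <pg.id> query` output, not a verbatim capture.

```python
import json

def peering_blockers(pg_query_json: str):
    """Return the OSD ids (if any) blocking peering for a PG,
    taken from the JSON printed by `ceph pg <pg.id> query`."""
    info = json.loads(pg_query_json)
    blockers = []
    for state in info.get("recovery_state", []):
        # A stuck peering entry may carry a "peering_blocked_by" list.
        for entry in state.get("peering_blocked_by", []):
            blockers.append(entry["osd"])
    return blockers

# Illustrative, abridged query output; a real dump has many more fields.
sample = json.dumps({
    "state": "down+peering",
    "recovery_state": [
        {"name": "Started/Primary/Peering",
         "blocked": "peering is blocked due to down osds",
         "down_osds_we_would_probe": [1],
         "peering_blocked_by": [
             {"osd": 1, "current_lost_at": 0,
              "comment": "starting or marking this osd lost may let us proceed"}]},
        {"name": "Started", "enter_time": "2019-07-25 10:00:00"},
    ],
})
print(peering_blockers(sample))  # -> [1]
```

If the list is non-empty, those are the OSDs to bring back up (or mark lost as a last resort).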
`repair`: Ceph is checking the placement group and repairing any inconsistencies it finds (if possible).
`recovering`: Ceph is migrating/synchronising objects and their replicas.
`backfilling`: Ceph is scanning and synchronising the entire contents of a placement group instead of inferring which contents need to be synchronised from the logs of recent operations. Backfill is a special case of recovery.
`backfill_wait`: The placement group is waiting in line to start backfill.
- `backfill_toofull`: A backfill operation is waiting because the destination OSD is over its full ratio.
- Placement groups in the backfill_toofull state have backing OSDs that have hit the osd_backfill_full_ratio (0.85 by default).
- Any OSD hitting this threshold will refuse to accept backfill data from other OSDs.
- NOTE: Any PGs hitting osd_backfill_full_ratio will still serve reads and writes, and will also rebalance. Only backfill is blocked, to prevent the OSD from reaching the full_ratio sooner.
- To check the osd_backfill_full_ratio of the OSDs, use:
# ceph daemon /var/run/ceph/ceph-mon.*.asok config show | grep backfill_full_ratio
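The threshold logic itself is simple and can be sketched as below. The utilization figure would come from `ceph osd df` (%USE per OSD); the function and values here are illustrative, not part of the Ceph API.

```python
def backfill_blocked(osd_utilization: float,
                     backfill_full_ratio: float = 0.85) -> bool:
    """An OSD at or above the backfill-full ratio refuses to be a
    backfill *target*; it still serves reads and writes."""
    return osd_utilization >= backfill_full_ratio

# 87% used with the default 0.85 ratio: backfill into this OSD is blocked.
print(backfill_blocked(0.87))  # True
# 60% used: backfill proceeds normally.
print(backfill_blocked(0.60))  # False
```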
`incomplete`: Ceph detects that a placement group is missing information about writes that may have occurred, or does not have any healthy copies. If any placement groups are in this state, try starting any failed OSDs that may contain the needed information, or temporarily adjust min_size to allow recovery.
`remapped`: The placement group is temporarily mapped to a different set of OSDs from what CRUSH specified.
`undersized`: The placement group has fewer copies than the configured pool replication level.
`peered`: The placement group has peered but cannot serve client IO because it does not have enough copies to reach the pool's configured min_size parameter. Recovery may proceed in this state, so the PG may eventually heal up to min_size.
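How the number of live replicas maps to these states for a replicated pool can be sketched roughly as follows. This is a simplification for illustration (it ignores peering progress, backfill, and other modifiers); `size` and `min_size` are the per-pool settings.

```python
def pg_io_state(live_replicas: int, size: int = 3, min_size: int = 2) -> str:
    """Rough sketch: map the count of available replicas to the PG
    state for a replicated pool with the given size/min_size."""
    if live_replicas >= size:
        return "active+clean"               # all copies present
    if live_replicas >= min_size:
        return "active+undersized+degraded" # serves IO, copies missing
    if live_replicas > 0:
        return "peered"                     # peered, but no client IO
    return "down"                           # no copy available

print(pg_io_state(3))  # active+clean
print(pg_io_state(2))  # active+undersized+degraded
print(pg_io_state(1))  # peered
```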
A placement group can be in any of the above states; a state other than active+clean does not necessarily indicate a problem. A PG should ultimately reach the active+clean state automatically, but manual intervention may sometimes be needed. Placement groups in active+<some-state-other-than-clean> should still serve data, since the PG is active.
Usually, Ceph tries to fix/repair placement group states and bring them to active+clean, but PGs can end up in a stuck state in certain cases. The stuck states include:
- inactive: Placement groups in the inactive state won't accept any I/O. They are usually waiting for an OSD with the most up-to-date data to come back up. If the UP set and ACTING set are the same, and the OSDs are not blocked on any other OSDs, this can be a problem with peering. Manually marking the primary OSD down forces peering to restart: Ceph automatically brings the primary OSD back up, and the peering process is kickstarted once an OSD comes up.
- stale: The placement group is in an unknown state because the OSDs that host it have not reported to the monitor cluster in a while (configured by mon_osd_report_timeout).
- unclean: Placement groups contain objects that are not replicated the desired number of times. A very common reason is OSDs that are down, or OSDs with a CRUSH weight of 0, which prevents the PGs from replicating data onto those OSDs and thus from reaching a clean state.
The following two PG states were added in the Jewel release for the snapshot trimming feature:
- snaptrim: The PGs are currently being trimmed
- snaptrim_wait: The PGs are waiting to be trimmed
Identifying stuck placement groups
To identify stuck placement groups, execute the following:
# ceph pg dump_stuck [unclean|inactive|stale|undersized|degraded]
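When many PGs are stuck, a per-state tally is often more useful than the raw listing. A small sketch, assuming the command's output has already been reduced to (pgid, state) pairs; the sample data below is illustrative, not real cluster output.

```python
from collections import Counter

def summarize_stuck(pg_states):
    """Count stuck PGs per state. `pg_states` is a list of
    (pgid, state) pairs, e.g. scraped from `ceph pg dump_stuck`."""
    return Counter(state for _pgid, state in pg_states)

# Illustrative input; real data comes from the command above.
sample = [("1.0", "stale+active+clean"),
          ("1.4", "down+peering"),
          ("2.1", "down+peering")]
counts = summarize_stuck(sample)
print(counts["down+peering"])        # 2
print(counts["stale+active+clean"])  # 1
```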
Note: For a more detailed explanation of placement group states, please check monitoring_placement_group_states.