【Case Sharing】ceph troubleshooting - Technical Guide - Antute (Beijing) Technology Limited


Fault description

After receiving a customer service request, we arrived on site and found a Ceph health warning: one PG was reporting an abnormal state, which was slowing service responses. The status output also showed an unfinished rebalance involving osd.1:

[root@dfs01 ceph]# ceph -s

cluster:

id: 798fb87a-0d6c-4c20-8298-95074eb642fe

health: HEALTH_WARN

Reduced data availability: 1 pg inactive, 1 pg stale

Degraded data redundancy: 1 pg undersized

services:

mon: 5 daemons, quorum dfs01,dfs02,dfs03,dfs04,dfs05 (age 17h)

mgr: mgr1(active, since 17h), standbys: mgr2, mgr3

mds: ora_arch:1 {0=dfs02=up:active} 2 up:standby

osd: 6 osds: 6 up (since 14h), 6 in (since 14h)

data:

pools: 5 pools, 417 pgs

objects: 38 objects, 85 KiB

usage: 6.1 GiB used, 24 GiB / 30 GiB avail

pgs: 0.240% pgs not active

416 active+clean

1 stale+undersized+peered

progress:

Rebalancing after osd.1 marked in (14h)

[............................]

PG autoscaler decreasing pool 4 PGs from 128 to 32 (13h)

[............................]

PG autoscaler decreasing pool 5 PGs from 128 to 16 (13h)

[............................]

PG autoscaler decreasing pool 2 PGs from 128 to 32 (14h)

[............................]

Troubleshooting

2.1 Check OSD status

All six OSDs are up and in, so the OSD status is normal:

[root@dfs01 ceph]# ceph osd status

ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE

1 dfs01 1034M 4081M 0 0 0 0 exists,up

2 dfs02 1034M 4081M 0 0 0 0 exists,up

3 dfs03 1034M 4081M 0 0 0 0 exists,up

4 dfs04 1034M 4081M 0 0 0 0 exists,up

5 dfs05 1034M 4081M 0 0 0 0 exists,up

6 dfs06 1034M 4081M 0 0 0 0 exists,up

[root@dfs01 ceph]# ceph osd stat

6 osds: 6 up (since 14h), 6 in (since 14h); epoch: e76

[root@dfs01 ceph]# ceph osd tree

ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF

-1 0.02939 root default

-3 0.00490 host dfs01

1 hdd 0.00490 osd.1 up 1.00000 1.00000

-5 0.00490 host dfs02

2 hdd 0.00490 osd.2 up 1.00000 1.00000

-7 0.00490 host dfs03

3 hdd 0.00490 osd.3 up 1.00000 1.00000

-9 0.00490 host dfs04

4 hdd 0.00490 osd.4 up 1.00000 1.00000

-11 0.00490 host dfs05

5 hdd 0.00490 osd.5 up 1.00000 1.00000

-13 0.00490 host dfs06

6 hdd 0.00490 osd.6 up 1.00000 1.00000

2.2 Locate the faulty PG

[root@dfs01 ceph]# ceph health detail

HEALTH_WARN Reduced data availability: 1 pg inactive, 1 pg stale; Degraded data redundancy: 1 pg undersized

[WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive, 1 pg stale

pg 1.0 is stuck stale for 14h, current state stale+undersized+peered, last acting [0]

[WRN] PG_DEGRADED: Degraded data redundancy: 1 pg undersized

pg 1.0 is stuck undersized for 14h, current state stale+undersized+peered, last acting [0]

The output shows that pg 1.0 is the problem. Its last acting set is [0], yet the OSD tree lists only osd.1 through osd.6, so no existing OSD holds this PG. Querying the PG directly fails accordingly:

[root@dfs01 ceph]# ceph pg 1.0 query

Error ENOENT: i don't have pgid 1.0

[root@dfs01 ceph]# ceph pg dump_stuck inactive

[root@dfs01 ceph]# ceph pg dump_stuck unclean

PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY

1.0 stale+undersized+peered [0]
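
The check done by eye above (comparing the "last acting" set of the stuck PG with the OSDs that actually exist) can be sketched as a small shell filter. This is a minimal sketch, not part of the original procedure: the function names stuck_pg_osds and missing_osds are hypothetical, and it assumes the `ceph health detail` line format shown in this section and that `ceph osd ls` prints one OSD id per line.

```shell
#!/bin/sh
# Hypothetical helpers; they only parse text and never modify the cluster.

# Read `ceph health detail` text on stdin and print "<pgid> <osd-id>" for
# every OSD in the "last acting" set of each stuck PG (duplicates removed).
stuck_pg_osds() {
    sed -n 's/^ *pg \([0-9][0-9]*\.[0-9a-f][0-9a-f]*\) is stuck .*last acting \[\([0-9,]*\)].*$/\1 \2/p' |
    sort -u |
    while read -r pgid osds; do
        for osd in $(printf '%s' "$osds" | tr ',' ' '); do
            printf '%s %s\n' "$pgid" "$osd"
        done
    done
}

# Read "<pgid> <osd-id>" pairs on stdin; $1 is the list of existing OSD ids,
# one per line. Print a message for every pair whose OSD does not exist.
missing_osds() {
    existing=$1
    while read -r pgid osd; do
        if ! printf '%s\n' "$existing" | grep -qx "$osd"; then
            printf 'pg %s references osd.%s, which does not exist\n' "$pgid" "$osd"
        fi
    done
}

# Usage on a live cluster (not run here):
#   ceph health detail | stuck_pg_osds | missing_osds "$(ceph osd ls)"
```

On the cluster in this case, the filter would be expected to report pg 1.0 against osd.0, because the OSD tree contains only osd.1 through osd.6.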

2.3 Viewing Storage Pool Information

[root@dfs01 ceph]# ceph osd lspools

1 device_health_metrics

2 database_pool

4 fs_data

5 fs_metadata

[root@dfs01 ceph]# ceph osd pool ls detail

pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 12 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth

pool 2 'database_pool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 pg_num_target 32 pgp_num_target 32 autoscale_mode on last_change 58 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd

pool 4 'fs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 pg_num_target 32 pgp_num_target 32 autoscale_mode on last_change 75 flags hashpspool stripe_width 0 application cephfs

pool 5 'fs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 pg_num_target 16 pgp_num_target 16 autoscale_mode on last_change 76 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs

Fault handling

3.1 Query the affected storage pools

ceph pg ls-by-pool device_health_metrics | grep '^1\.0'

The query confirms that pg 1.0 belongs to the device_health_metrics storage pool. Because device_health_metrics is not a critical storage pool, we decided to rebuild the PG.
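
The pool can also be read from the PG id itself, since the part before the dot is the pool id listed by `ceph osd lspools`. The following is a minimal sketch of that lookup; pool_of_pg is a hypothetical helper name, and it assumes the two-column "<id> <name>" output format shown in section 2.3.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the pool a PG belongs to.
# $1 is the PG id; stdin is `ceph osd lspools` text ("<id> <name>" per line).
pool_of_pg() {
    pool_id=${1%%.*}          # "1.0" -> "1"
    awk -v id="$pool_id" '$1 == id { print $2 }'
}

# Usage on a live cluster (not run here):
#   ceph osd lspools | pool_of_pg 1.0
```

An empty result means that no pool with that id exists any more.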

3.2 Attempt a repair

ceph pg repair 1.0

3.3 Recreate the PG

ceph osd force-create-pg 1.0 --yes-i-really-mean-it

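force-create-pg recreates the PG as a new, empty PG, so any objects it previously held are discarded. Use it only when all possible locations have been searched and the data is confirmed to be unrecoverable or expendable. Once the PG has been recreated, querying it shows the state active+clean: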

[root@dfs01 ceph]# ceph pg 1.0 query

{

"snap_trimq": "[]",

"snap_trimq_len": 0,

"state": "active+clean",

"epoch": 106,

"up": [

"acting": [

"acting_recovery_backfill": [

"2",

"3",

"4"

"info": {

"pgid": "1.0",

"last_update": "0'0",

"last_complete": "0'0",

"log_tail": "0'0",

"last_user_version": 0,

"last_backfill": "MAX",

"purged_snaps": [],

"history": {

"epoch_created": 77,

"epoch_pool_created": 77,

"last_epoch_started": 79,

"last_interval_started": 77,

"last_epoch_clean": 79,

"last_interval_clean": 77,

"last_epoch_split": 0,

"last_epoch_marked_full": 0,

"same_up_since": 77,

"same_interval_since": 77,

"same_primary_since": 77,

"last_scrub": "0'0",

"last_scrub_stamp": "2021-10-09T10:16:06.538634+0800",

"last_deep_scrub": "0'0",

"last_deep_scrub_stamp": "2021-10-09T10:16:06.538634+0800",

"last_clean_scrub_stamp": "2021-10-09T10:16:06.538634+0800",

"prior_readable_until_ub": 0

"stats": {

"version": "0'0",

"reported_seq": 37,

"reported_epoch": 106,

"state": "active+clean",

"last_fresh": "2021-10-09T10:16:48.108134+0800",

"last_change": "2021-10-09T10:16:08.944500+0800",

"last_active": "2021-10-09T10:16:48.108134+0800",

"last_peered": "2021-10-09T10:16:48.108134+0800",

"last_clean": "2021-10-09T10:16:48.108134+0800",

"last_became_active": "2021-10-09T10:16:08.943940+0800",

"last_became_peered": "2021-10-09T10:16:08.943940+0800",

"last_unstale": "2021-10-09T10:16:48.108134+0800",

"last_undegraded": "2021-10-09T10:16:48.108134+0800",

"last_fullsized": "2021-10-09T10:16:48.108134+0800",

"mapping_epoch": 77,

"log_start": "0'0",

"ondisk_log_start": "0'0",

"created": 77,

"last_epoch_clean": 79,

"parent": "0.0",

"parent_split_bits": 0,

"last_scrub": "0'0",

"last_scrub_stamp": "2021-10-09T10:16:06.538634+0800",

"last_deep_scrub": "0'0",

"last_deep_scrub_stamp": "2021-10-09T10:16:06.538634+0800",

"last_clean_scrub_stamp": "2021-10-09T10:16:06.538634+0800",

"log_size": 0,

"ondisk_log_size": 0,

"stats_invalid": false,

"dirty_stats_invalid": false,

"omap_stats_invalid": false,

"hitset_stats_invalid": false,

"hitset_bytes_stats_invalid": false,

"pin_stats_invalid": false,

"manifest_stats_invalid": false,

"snaptrimq_len": 0,

}
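
Reading the state out of the long query output can be scripted. The sketch below is an illustration only: pg_state and pg_is_clean are hypothetical names, and it assumes the pretty-printed JSON layout shown above, with one key per line and the top-level "state" key appearing before any nested one.

```shell
#!/bin/sh
# Hypothetical helpers that parse `ceph pg <pgid> query` output from stdin.

# Print the first "state" value found, which is the top-level PG state
# in the layout shown above.
pg_state() {
    sed -n 's/^[[:space:]]*"state": "\([^"]*\)".*$/\1/p' | head -n 1
}

# Succeed only if that state is exactly active+clean.
pg_is_clean() {
    [ "$(pg_state)" = "active+clean" ]
}

# Usage on a live cluster (not run here):
#   ceph pg 1.0 query | pg_is_clean && echo "pg 1.0 is active+clean"
```

If the query fails, as it did before the PG was recreated, nothing arrives on stdin and pg_is_clean returns a non-zero status.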

Lessons learned

4.1 Multi-copy storage pool

Store critical data in replicated storage pools configured with 2 or 3 copies.
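
One way to audit this is to scan `ceph osd pool ls detail` for pools whose replica settings fall below a chosen threshold. The sketch below assumes the line format shown in section 2.3; pools_below is a hypothetical helper, and the thresholds are passed as arguments because the right values depend on the site.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph osd pool ls detail` text on stdin and print
# every replicated pool whose size is below $1 or whose min_size is below $2.
pools_below() {
    awk -v q="'" -v want_size="$1" -v want_min="$2" '
    $1 == "pool" && $4 == "replicated" {
        name = $3; gsub(q, "", name)        # strip the quotes around the name
        size = 0; minsz = 0
        for (i = 5; i < NF; i++) {
            if ($i == "size")     size  = $(i + 1)
            if ($i == "min_size") minsz = $(i + 1)
        }
        if (size + 0 < want_size + 0 || minsz + 0 < want_min + 0)
            print name, "size=" size, "min_size=" minsz
    }'
}

# Usage on a live cluster (not run here):
#   ceph osd pool ls detail | pools_below 3 2
```

Any pool that is reported can then be adjusted with `ceph osd pool set <pool> size <n>` and `ceph osd pool set <pool> min_size <n>`.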

4.2 Careful Operation

When replacing a hard disk, follow the procedure strictly: first mark the OSD out and stop its service, then remove the OSD from the cluster, and only then replace the hard disk.
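
The order of operations can be sketched as a script that prints each command by default and runs it only when RUN=1 is set. This is a sketch under several assumptions, not the procedure used in this case: the helper names step and retire_osd are hypothetical, the systemd unit name ceph-osd@<id> applies to package-based deployments (container-based deployments name their units differently), and the safe-to-destroy check is an extra precaution that the text above does not mention.

```shell
#!/bin/sh
# Hypothetical helpers. By default nothing is executed; set RUN=1 to execute.

# Run a command, or only print it when RUN is not 1.
step() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# Retire one OSD before its disk is pulled. Each step runs only if the
# previous one succeeded.
retire_osd() {
    id=$1
    step ceph osd out "$id" &&                          # mark the OSD out
    step ceph osd safe-to-destroy "osd.$id" &&          # assumption: extra safety check
    step systemctl stop "ceph-osd@$id" &&               # assumption: package-based unit name
    step ceph osd purge "$id" --yes-i-really-mean-it    # remove the OSD from the cluster
}

# Usage (dry run): retire_osd 1
```

After marking the OSD out, data migration takes time, so on a live cluster the safe-to-destroy step may need to be retried until it succeeds.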

4.3 Status Interpretation

Undersized: The current Acting Set of the PG contains fewer OSDs than the number of replicas configured for the storage pool (size);

Peered: Peering has been completed, but the current Acting Set of the PG is smaller than the minimum number of replicas (min_size) specified by the storage pool.
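
The two definitions reduce to two comparisons, which can be sketched as follows. This is a deliberately simplified model: pg_flags is a hypothetical helper, it ignores every other PG state (stale, degraded, clean and so on), and it assumes min_size is not larger than size.

```shell
#!/bin/sh
# Hypothetical helper: derive the state flags described above.
# $1 = acting-set size, $2 = pool size, $3 = pool min_size.
pg_flags() {
    acting=$1
    size=$2
    min_size=$3
    if [ "$acting" -lt "$min_size" ]; then
        # Fewer than min_size replicas: peered, cannot serve I/O.
        printf 'undersized+peered\n'
    elif [ "$acting" -lt "$size" ]; then
        # At least min_size but fewer than size replicas: still serves I/O.
        printf 'active+undersized\n'
    else
        printf 'active\n'
    fi
}

# The PG in this case: one OSD in the acting set, pool size 3, min_size 2.
#   pg_flags 1 3 2
```

For the values in this case (1, 3, 2) the helper prints undersized+peered, which matches the state reported for pg 1.0 apart from the stale flag.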

For more information, please visit Antute's official website: www.antute.com.cn
