【Case Sharing】Ceph troubleshooting
Fault description
After receiving a customer service request, we arrived on site and found a Ceph status alarm: one PG was reporting an error that involved osd.1 and was causing slow service responses.
Troubleshooting
2.1 Check OSD status
The OSD status is normal.
2.2 Find the faulty PG
The commands above show that PG 1.0 has a problem. Query this PG for detailed information:
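The original command output is not reproduced here. As a minimal sketch, assuming the standard `ceph pg <pgid> query` command (which prints JSON with top-level `state`, `up` and `acting` fields), the detail query could be scripted as follows. The helper names and the sample dictionary are hypothetical and are not taken from this incident.

```python
import json
import subprocess
from typing import Dict, List


def pg_query_cmd(pgid: str) -> List[str]:
    """Build the argv for `ceph pg <pgid> query`, which prints JSON."""
    return ["ceph", "pg", pgid, "query"]


def summarize_pg_query(doc: Dict) -> Dict:
    """Keep only the fields most useful for a first look at a PG."""
    return {
        # The state is a '+'-joined string such as "active+clean".
        "states": [s for s in doc.get("state", "").split("+") if s],
        "up": doc.get("up", []),
        "acting": doc.get("acting", []),
    }


def query_pg(pgid: str) -> Dict:
    """Run the query on a live cluster (needs the ceph CLI and credentials)."""
    result = subprocess.run(
        pg_query_cmd(pgid), check=True, capture_output=True, text=True
    )
    return summarize_pg_query(json.loads(result.stdout))


# Hypothetical, heavily trimmed sample; NOT the output from this incident.
sample = {"state": "undersized+peered", "up": [1], "acting": [1]}
print(summarize_pg_query(sample))
```

On a node with admin access, `query_pg("1.0")` would return the same three fields for the PG discussed in this case.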
2.3 Viewing Storage Pool Information
Fault handling
3.1 Query the affected storage pools
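A PG ID has the form `<pool-id>.<pg-number>`, so the pool affected by PG 1.0 is the pool with ID 1. The sketch below shows one way to resolve that ID to a pool name. It assumes the pool list has already been loaded from `ceph osd lspools --format json`, whose entries are assumed to carry `poolnum` and `poolname` keys (both are parameters in case your release differs). The function names and the second sample pool are hypothetical.

```python
from typing import Dict, List, Optional


def pool_id_of(pgid: str) -> int:
    """Return the pool ID, i.e. the part of the PG ID before the dot."""
    pool_part, dot, _ = pgid.partition(".")
    if not dot:
        raise ValueError(f"not a PG ID: {pgid!r}")
    return int(pool_part)


def pool_name_for_pg(
    pgid: str,
    pools: List[Dict],
    id_key: str = "poolnum",      # assumed key name in the lspools JSON
    name_key: str = "poolname",   # assumed key name in the lspools JSON
) -> Optional[str]:
    """Look up the name of the pool that owns the given PG."""
    wanted = pool_id_of(pgid)
    for pool in pools:
        if pool[id_key] == wanted:
            return pool[name_key]
    return None


# Sample data in the assumed lspools shape; the "rbd" entry is hypothetical.
pools = [
    {"poolnum": 1, "poolname": "device_health_metrics"},
    {"poolnum": 2, "poolname": "rbd"},
]
print(pool_name_for_pg("1.0", pools))
```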
The query above shows that this PG belongs to the device_health_metrics storage pool. Because device_health_metrics is not a critical core storage pool, we decided to rebuild the PG.
3.2 Attempts to fix
3.3 Reconstruction of PG
If an object is still missing after all possible locations have been searched, it has to be given up: the unfound object is marked as lost.
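The source does not show the exact commands that were run. As a hedged sketch, the Ceph CLI exposes this step as `ceph pg <pgid> mark_unfound_lost revert|delete`, and `ceph pg <pgid> list_unfound` lists the affected objects beforehand. The helpers below only assemble those command lines; their names are hypothetical, and the mode must be passed explicitly because `delete` discards data.

```python
from typing import List

VALID_MODES = ("revert", "delete")


def list_unfound_cmd(pgid: str) -> List[str]:
    """argv that lists the unfound objects of a PG before giving them up."""
    return ["ceph", "pg", pgid, "list_unfound"]


def mark_unfound_lost_cmd(pgid: str, mode: str) -> List[str]:
    """argv that marks the unfound objects of a PG as lost.

    'revert' rolls back to a previous version of the object where one
    exists; 'delete' forgets the object entirely.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return ["ceph", "pg", pgid, "mark_unfound_lost", mode]


# Print the commands instead of running them, so they can be reviewed first.
for cmd in (list_unfound_cmd("1.0"), mark_unfound_lost_cmd("1.0", "delete")):
    print(" ".join(cmd))
```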
Lessons learned
4.1 Multi-copy storage pool
Use a 2/3-replica configuration for pools that store critical data.
4.2 Careful operation
Follow the process strictly when replacing a hard disk: first mark the OSD out and stop the OSD service, then delete the OSD, and only then replace the hard disk.
4.3 Status interpretation
Undersized: the current Acting Set of the PG has fewer members than the number of replicas configured for the storage pool.
Peered: peering has completed, but the current Acting Set of the PG is smaller than the minimum number of replicas (min_size) specified by the storage pool.
For more information, please visit Antute's official website: www.antute.com.cn
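The two status definitions in 4.3 can be expressed as a small check. This is a simplified illustration of those definitions only, not Ceph's actual peering logic; the function name and the example pool settings (3 replicas, min_size 2) are assumptions.

```python
from typing import List


def replica_flags(acting_size: int, pool_size: int, min_size: int) -> List[str]:
    """Derive the flags described in 4.3 from the Acting Set size."""
    flags = []
    if acting_size < pool_size:
        # Fewer copies than the pool's configured replica count.
        flags.append("undersized")
    if acting_size < min_size:
        # Peering finished, but too few copies to serve client I/O.
        flags.append("peered")
    return flags


# Assumed example pool: 3 replicas, min_size 2.
for acting in (3, 2, 1):
    print(acting, replica_flags(acting, pool_size=3, min_size=2))
```

With these assumed settings, losing one copy leaves the PG undersized, while losing two copies also drops it below min_size.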

