[Ceph Troubleshooting] After adding an OSD to a Ceph cluster, the OSD stays down
Background: an OSD was added to the Ceph cluster, and its state has remained down ever since.
Error messages
The relevant status output is below.
1. Check the OSD tree
# ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.05388 root default
-2 0.01469     host node1
 0 0.00490         osd.0       up  1.00000          1.00000
 1 0.00490         osd.1       up  1.00000          1.00000
 2 0.00490         osd.2       up  1.00000          1.00000
-3 0.01959     host node2
 4 0.00490         osd.4       up  1.00000          1.00000
 5 0.00490         osd.5       up  1.00000          1.00000
 6 0.00490         osd.6       up  1.00000          1.00000
 7 0.00490         osd.7       up  1.00000          1.00000
-4 0.01959     host node3
 8 0.00490         osd.8       up  1.00000          1.00000
 9 0.00490         osd.9       up  1.00000          1.00000
 3 0.00490         osd.3       up  1.00000          1.00000
10 0.00490         osd.10      up  1.00000          1.00000
11       0 osd.11            down        0          1.00000
#
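Since osd.11 shows up with weight 0 and is not attached to any host bucket, a quick way to see what the cluster actually knows about it is to query the OSD map directly (a generic check, not something I ran at the time):
# ceph osd dump | grep osd.11     # osdmap entry: up/down, in/out, weight, addresses
# ceph osd find 11                # reported address and CRUSH location, if any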
2. Check the OSD status
# systemctl status ceph-osd@11
● ceph-osd@11.service - Ceph object storage daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: disabled)
Active: failed (Result: start-limit) since Sun 2018-09-09 22:15:25 EDT; 4h 57min ago
Process: 10331 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=1/FAILURE)
Sep 09 22:15:05 node1 systemd: ceph-osd@11.service: control process exited, code=exited status=1
Sep 09 22:15:05 node1 systemd: Failed to start Ceph object storage daemon.
Sep 09 22:15:05 node1 systemd: Unit ceph-osd@11.service entered failed state.
Sep 09 22:15:05 node1 systemd: ceph-osd@11.service failed.
Sep 09 22:15:25 node1 systemd: ceph-osd@11.service holdoff time over, scheduling restart.
Sep 09 22:15:25 node1 systemd: start request repeated too quickly for ceph-osd@11.service
Sep 09 22:15:25 node1 systemd: Failed to start Ceph object storage daemon.
Sep 09 22:15:25 node1 systemd: Unit ceph-osd@11.service entered failed state.
Sep 09 22:15:25 node1 systemd: ceph-osd@11.service failed.
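The "start request repeated too quickly" lines mean the unit has hit systemd's start rate limit. If a manual start gets rejected with that same message, the failure counter has to be cleared first; this is plain systemd administration, nothing Ceph-specific:
# systemctl reset-failed ceph-osd@11
# systemctl start ceph-osd@11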
3. Start the OSD
# systemctl start ceph-osd@11
Job for ceph-osd@11.service failed because the control process exited with error code. See "systemctl status ceph-osd@11.service" and "journalctl -xe" for details.
4. Check the error
[root@node1 /]# journalctl -xe
Sep 10 03:12:52 node1 polkitd: Unregistered Authentication Agent for unix-process:10473:4129481 (system bus name :1.52, object p
Sep 10 03:13:12 node1 systemd: ceph-osd@11.service holdoff time over, scheduling restart.
Sep 10 03:13:12 node1 systemd: Starting Ceph object storage daemon...
-- Subject: Unit ceph-osd@11.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-osd@11.service has begun starting up.
Sep 10 03:13:12 node1 ceph-osd-prestart.sh: OSD data directory /var/lib/ceph/osd/ceph-11 does not exist; bailing out.
Sep 10 03:13:12 node1 systemd: ceph-osd@11.service: control process exited, code=exited status=1
Sep 10 03:13:12 node1 systemd: Failed to start Ceph object storage daemon.
-- Subject: Unit ceph-osd@11.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-osd@11.service has failed.
--
-- The result is failed.
Sep 10 03:13:12 node1 systemd: Unit ceph-osd@11.service entered failed state.
Sep 10 03:13:12 node1 systemd: ceph-osd@11.service failed.
To be honest I am not sure what caused the error above; the journal only shows the prestart script bailing out because the OSD data directory /var/lib/ceph/osd/ceph-11 does not exist. What I do know from my notes is that the whole cluster was in the ERR state when this OSD was added.
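Before rebuilding anything it is worth confirming exactly what the prestart script complains about: whether the data directory really is missing and whether the OSD's data partition was ever mounted there. A minimal check; the device name /dev/sde is an assumption taken from the re-add steps further down:
# ls -ld /var/lib/ceph/osd/ceph-11
# lsblk -f /dev/sde               # device name assumed, see the zap step below
# mount | grep ceph-11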
Fixing the error
The cluster state at the time the OSD was added was as follows:
# ceph -s
cluster 8eaa3f15-0946-4500-b018-6d31d1cc69f6
health HEALTH_ERR
clock skew detected on mon.node2, mon.node3
121 pgs are stuck inactive for more than 300 seconds
121 pgs peering
121 pgs stuck inactive
121 pgs stuck unclean
Monitor clock skew detected
monmap e1: 3 mons at {node1=192.168.209.100:6789/0,node2=192.168.209.101:6789/0,node3=192.168.209.102:6789/0}
election epoch 266, quorum 0,1,2 node1,node2,node3
osdmap e5602: 12 osds: 11 up, 11 in; 120 remapped pgs
flags sortbitwise,require_jewel_osds
pgmap v16259: 128 pgs, 1 pools, 0 bytes data, 0 objects
1421 MB used, 54777 MB / 56198 MB avail
120 remapped+peering
7 active+clean
1 peering
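One of the items behind the HEALTH_ERR above is the monitor clock skew. Independently of the OSD problem, that part is normally cured by fixing time synchronisation on the mon nodes; a sketch assuming ntpd is the time service (chronyd would be the equivalent alternative):
# ntpq -p                   # run on each mon node, check peer offsets
# systemctl restart ntpd    # assumes ntpd; resync if the offset is large
# ceph health detail        # the clock skew warning should clear shortly after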
Adding an OSD while the cluster is in the ERR state can cause problems, so I decided to remove osd.11 and add it again (the cluster is currently back to OK).
Removal steps:
1. Current cluster state
# ceph -s
cluster 8eaa3f15-0946-4500-b018-6d31d1cc69f6
health HEALTH_OK
monmap e1: 3 mons at {node1=192.168.209.100:6789/0,node2=192.168.209.101:6789/0,node3=192.168.209.102:6789/0}
election epoch 292, quorum 0,1,2 node1,node2,node3
osdmap e5664: 12 osds: 12 up, 12 in
flags sortbitwise,require_jewel_osds
pgmap v16508: 128 pgs, 1 pools, 0 bytes data, 0 objects
1500 MB used, 59806 MB / 61307 MB avail
128 active+clean
2. Mark the down OSD out of the cluster
# ceph osd out osd.11
osd.11 is already out.
3. Remove the down OSD from the OSD map
# ceph osd rm osd.11
removed osd.11
4. Remove the down OSD from the CRUSH map
# ceph osd crush rm osd.11
device 'osd.11' does not appear in the crush map
5. Delete the OSD's authentication key
# ceph auth del osd.11
updated
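For reference, on Luminous and later releases the rm / crush rm / auth del sequence above is collapsed into a single command. It is not available on the Jewel cluster in this post, so it is shown only as a note:
# ceph osd purge osd.11 --yes-i-really-mean-it   # Luminous+ only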
Re-adding steps:
1. Zap the disk
# ceph-disk zap /dev/sde
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.
Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
on the recovery & transformation menu to examine the two tables.
Warning! One or more CRCs don't match. You should repair the disk!
****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
#
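If you want to double-check the wipe before re-creating the OSD, print the partition table again; this is a quick sanity check, not part of the original procedure:
# sgdisk -p /dev/sde    # should show a fresh GPT with no partitions
# lsblk /dev/sde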
2. Create the OSD
# ceph-deploy --overwrite-conf osd create node1:/dev/sde
found configuration file at: /root/.cephdeploy.conf
Invoked (1.5.39): /usr/bin/ceph-deploy --overwrite-conf osd create node1:/dev/sde
ceph-deploy options:
username : None
block_db : None
disk : [('node1', '/dev/sde', None)]
dmcrypt : False
verbose : False
bluestore : None
block_wal : None
overwrite_conf : True
subcommand : create
dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
quiet : False
cd_conf :
cluster : ceph
fs_type : xfs
filestore : None
func :
ceph_conf : None
default_release : False
zap_disk : False
Preparing cluster ceph disks node1:/dev/sde:
connected to host: node1
detect platform information from remote host
detect machine type
find the location of an executable
Distro info: CentOS Linux 7.5.1804 Core
Deploying osd to node1
write cluster configuration to /etc/ceph/{cluster}.conf
Preparing host node1 disk /dev/sde journal None activate True
find the location of an executable
Running command: /usr/sbin/ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sde
command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
command: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
command: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
command: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
set_type: Will colocate journal with data on /dev/sde
command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
ptype_tobe_for_name: name = journal
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
create_partition: Creating journal partition num 2 size 5120 on /dev/sde
command_check_call: Running command: /usr/sbin/sgdisk --new=2:0:+5120M --change-name=2:ceph journal --partition-guid=2:22ea9667-570d-4697-b9dc-21968d31c445 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sde
The operation has completed successfully.
update_partition: Calling partprobe on created device /dev/sde
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
command: Running command: /usr/bin/flock -s /dev/sde /usr/sbin/partprobe /dev/sde
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sde2 uuid path is /sys/dev/block/8:66/dm/uuid
prepare_device: Journal is GPT partition /dev/disk/by-partuuid/22ea9667-570d-4697-b9dc-21968d31c445
prepare_device: Journal is GPT partition /dev/disk/by-partuuid/22ea9667-570d-4697-b9dc-21968d31c445
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
set_data_partition: Creating osd partition on /dev/sde
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
ptype_tobe_for_name: name = data
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
create_partition: Creating data partition num 1 size 0 on /dev/sde
command_check_call: Running command: /usr/sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:e9aecd36-93a6-456a-b05f-b8097d16d88d --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sde
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
update_partition: Calling partprobe on created device /dev/sde
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
command: Running command: /usr/bin/flock -s /dev/sde /usr/sbin/partprobe /dev/sde
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sde1 uuid path is /sys/dev/block/8:65/dm/uuid
populate_data_path_device: Creating xfs fs on /dev/sde1
command_check_call: Running command: /usr/sbin/mkfs -t xfs -f -i size=2048 -- /dev/sde1
meta-data=/dev/sde1 isize=2048 agcount=4, agsize=327615 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0, sparse=0
data = bsize=4096 blocks=1310459, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
mount: Mounting /dev/sde1 on /var/lib/ceph/tmp/mnt.5St2Fg with options noatime,inode64
command_check_call: Running command: /usr/bin/mount -t xfs -o noatime,inode64 -- /dev/sde1 /var/lib/ceph/tmp/mnt.5St2Fg
command: Running command: /usr/sbin/restorecon /var/lib/ceph/tmp/mnt.5St2Fg
populate_data_path: Preparing osd data dir /var/lib/ceph/tmp/mnt.5St2Fg
command: Running command: /usr/sbin/restorecon -R /var/lib/ceph/tmp/mnt.5St2Fg/ceph_fsid.10803.tmp
command: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.5St2Fg/ceph_fsid.10803.tmp
command: Running command: /usr/sbin/restorecon -R /var/lib/ceph/tmp/mnt.5St2Fg/fsid.10803.tmp
command: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.5St2Fg/fsid.10803.tmp
command: Running command: /usr/sbin/restorecon -R /var/lib/ceph/tmp/mnt.5St2Fg/magic.10803.tmp
command: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.5St2Fg/magic.10803.tmp
command: Running command: /usr/sbin/restorecon -R /var/lib/ceph/tmp/mnt.5St2Fg/journal_uuid.10803.tmp
command: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.5St2Fg/journal_uuid.10803.tmp
adjust_symlink: Creating symlink /var/lib/ceph/tmp/mnt.5St2Fg/journal -> /dev/disk/by-partuuid/22ea9667-570d-4697-b9dc-21968d31c445
command: Running command: /usr/sbin/restorecon -R /var/lib/ceph/tmp/mnt.5St2Fg
command: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.5St2Fg
unmount: Unmounting /var/lib/ceph/tmp/mnt.5St2Fg
command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.5St2Fg
get_dm_uuid: get_dm_uuid /dev/sde uuid path is /sys/dev/block/8:64/dm/uuid
command_check_call: Running command: /usr/sbin/sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/sde
The operation has completed successfully.
update_partition: Calling partprobe on prepared device /dev/sde
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
command: Running command: /usr/bin/flock -s /dev/sde /usr/sbin/partprobe /dev/sde
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
command_check_call: Running command: /usr/bin/udevadm trigger --action=add --sysname-match sde1
Running command: systemctl enable ceph.target
checking OSD status...
find the location of an executable
Running command: /bin/ceph --cluster=ceph osd stat --format=json
there is 1 OSD down
there is 1 OSD out
Host node1 is now ready for osd use.
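The "there is 1 OSD down / 1 OSD out" lines are only the snapshot ceph-deploy takes before the udev-triggered activation (the udevadm trigger on sde1 above) has finished; once the data partition is activated it gets mounted under /var/lib/ceph/osd/ceph-11 and the daemon starts. A quick way to confirm on node1 (generic checks, not in the original transcript):
# ceph-disk list | grep sde       # sde1 should show as an active ceph data partition
# df -h /var/lib/ceph/osd/ceph-11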
3. Check the OSD tree
# ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.05878 root default
-2 0.01959     host node1
 0 0.00490         osd.0       up  1.00000          1.00000
 1 0.00490         osd.1       up  1.00000          1.00000
 2 0.00490         osd.2       up  1.00000          1.00000
11 0.00490         osd.11      up  1.00000          1.00000
-3 0.01959     host node2
 4 0.00490         osd.4       up  1.00000          1.00000
 5 0.00490         osd.5       up  1.00000          1.00000
 6 0.00490         osd.6       up  1.00000          1.00000
 7 0.00490         osd.7       up  1.00000          1.00000
-4 0.01959     host node3
 8 0.00490         osd.8       up  1.00000          1.00000
 9 0.00490         osd.9       up  1.00000          1.00000
 3 0.00490         osd.3       up  1.00000          1.00000
10 0.00490         osd.10      up  1.00000          1.00000
4. Check the OSD status
# systemctl status ceph-osd@11
● ceph-osd@11.service - Ceph object storage daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: disabled)
Active: active (running) since Mon 2018-09-10 03:20:37 EDT; 20min ago
Main PID: 11379 (ceph-osd)
CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@11.service
└─11379 /usr/bin/ceph-osd -f --cluster ceph --id 11 --setuser ceph --setgroup ceph
Sep 10 03:20:36 node1 systemd: ceph-osd@11.service holdoff time over, scheduling restart.
Sep 10 03:20:36 node1 systemd: Starting Ceph object storage daemon...
Sep 10 03:20:37 node1 ceph-osd-prestart.sh: create-or-move updating item name 'osd.11' weight 0.0049 at location {hos...sh map
Sep 10 03:20:37 node1 systemd: Started Ceph object storage daemon.
Sep 10 03:20:38 node1 ceph-osd: starting osd.11 at :/0 osd_data /var/lib/ceph/osd/ceph-11 /var/lib/ceph/osd/ceph-11/journal
Sep 10 03:21:13 node1 ceph-osd: 2018-09-10 03:21:13.399072 7f09b5797ac0 -1 osd.11 0 log_to_monitors {default=true}
Hint: Some lines were ellipsized, use -l to show in full.
#
While the OSD is being added, the cluster briefly goes into the ERR state:
# ceph -s
cluster 8eaa3f15-0946-4500-b018-6d31d1cc69f6
health HEALTH_ERR
11 pgs are stuck inactive for more than 300 seconds
13 pgs peering
11 pgs stuck inactive
11 pgs stuck unclean
monmap e1: 3 mons at {node1=192.168.209.100:6789/0,node2=192.168.209.101:6789/0,node3=192.168.209.102:6789/0}
election epoch 292, quorum 0,1,2 node1,node2,node3
osdmap e5664: 12 osds: 12 up, 12 in
flags sortbitwise,require_jewel_osds
pgmap v16499: 128 pgs, 1 pools, 0 bytes data, 0 objects
1519 MB used, 59788 MB / 61307 MB avail
98 active+clean
17 activating
13 peering
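The peering and activating PGs are just the new OSD being folded into the map; once all 128 PGs return to active+clean the cluster settles back to HEALTH_OK. To watch it converge:
# ceph -w        # streams pg/health state changes, Ctrl-C to stop
# ceph health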