I replaced both of the drives in my RAID1 boot array on an Ubuntu server yesterday and had some struggles, so I thought I would write down my experience both for my future self and anyone else who needs to do something similar. I think my main problem was following a guide that was not for UEFI setups 🙂
So, my setup is two NVMe drives on nvme0n1 and nvme1n1. Both drives have the same partition table:
- UEFI 512M
- md0 / boot 1G
- md1 / LVM <rest of disk>
Before starting, check that all disks are online:
# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 nvme1n1p3[2] nvme0n1p3[3]
      x blocks super 1.2 [2/2] [UU]
      bitmap: 3/7 pages [12KB], xKB chunk
md0 : active raid1 nvme0n1p2[3] nvme1n1p2[2]
      x blocks super 1.2 [2/2] [UU]
It should say [UU] for both arrays; [U_] (or anything with an underscore) means a member is missing and you should not pull a disk yet.
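The [UU] check can also be scripted if you want a quick pass/fail before touching hardware. A minimal sketch (the helper name and the sample text are mine, not from any tool):

```shell
# Hypothetical helper: succeeds (exit 0) if the given mdstat text shows a
# degraded array, i.e. an underscore inside the [UU] status field.
mdstat_degraded() {
  printf '%s\n' "$1" | grep -Eq '\[U*_+U*\]'
}

# Made-up snapshot; on the real box you would pass in "$(cat /proc/mdstat)".
healthy='md0 : active raid1 nvme0n1p2[3] nvme1n1p2[2]
      1046528 blocks super 1.2 [2/2] [UU]'

if mdstat_degraded "$healthy"; then
  echo "array degraded - do not pull a disk yet"
else
  echo "array healthy"
fi
```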
In my case I also did smartctl -ia /dev/nvme0
to write down serial numbers etc, so I would know which physical disk to replace in the box.
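The serial line can be pulled out of the smartctl output with a bit of awk. A sketch, using a made-up sample of smartctl -i output (real output varies by drive model):

```shell
# Hypothetical helper: extract the serial number from `smartctl -i` output.
extract_serial() {
  awk -F: '/Serial Number/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Made-up sample; on the real box pipe in `smartctl -i /dev/nvme0` instead.
sample='Model Number:     ACME NVMe SSD 1TB
Serial Number:    S0METH1NG123
Firmware Version: 1.0'

printf '%s\n' "$sample" | extract_serial
```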
Next up is to save a copy of your EFI partition for later use:
dd if=/dev/nvme0n1p1 bs=4M of=/root/EFI_PARTITITON.img
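It's cheap to verify the image byte-for-byte before pulling the drive. A sketch using a throwaway file standing in for the partition (on the live system you would compare the image against /dev/nvme0n1p1 instead):

```shell
# Verify a dd image matches its source byte-for-byte with cmp.
verify_image() {
  cmp -s "$1" "$2"
}

# Demonstrated on a 1 MiB temp file in place of the real EFI partition.
src=$(mktemp) && img=$(mktemp)
head -c 1048576 /dev/urandom > "$src"
dd if="$src" of="$img" bs=4M status=none
if verify_image "$src" "$img"; then echo "backup verified"; else echo "MISMATCH"; fi
rm -f "$src" "$img"
```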
Pick one drive to start with and make it offline:
mdadm /dev/md0 --fail /dev/nvme0n1p2 --remove /dev/nvme0n1p2
mdadm /dev/md1 --fail /dev/nvme0n1p3 --remove /dev/nvme0n1p3
Physically replace the drive in the server. It might also be a good idea to do a firmware upgrade of the new drive if it’s available.
After disk is replaced, copy over the partition table from the remaining good disk to the newly replaced disk, and then generate new UUIDs for the partitions:
sfdisk -d /dev/nvme1n1 | sfdisk /dev/nvme0n1
sgdisk -G /dev/nvme0n1
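To double-check that the copy worked, you can compare the two sfdisk -d dumps with the fields that legitimately differ (device names, UUIDs, disk label id) stripped out. A sketch with made-up dump text; the helper and samples are mine:

```shell
# Hypothetical helper: normalize an sfdisk dump so two disks with the
# same layout but different identifiers compare equal.
strip_ids() {
  sed -e 's/uuid=[^,]*//' -e 's|/dev/nvme[0-9]n[0-9]p\?||g' -e '/^label-id/d'
}

# Made-up dumps; on the real box: sfdisk -d /dev/nvme0n1 | strip_ids
dump_a='label: gpt
label-id: AAAA-1111
/dev/nvme0n1p1 : start=2048, size=1048576, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B, uuid=AAA'
dump_b='label: gpt
label-id: BBBB-2222
/dev/nvme1n1p1 : start=2048, size=1048576, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B, uuid=BBB'

a=$(printf '%s\n' "$dump_a" | strip_ids)
b=$(printf '%s\n' "$dump_b" | strip_ids)
if [ "$a" = "$b" ]; then echo "layouts match"; else echo "layouts differ"; fi
```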
Add the device back to the RAID1 array:
mdadm --manage /dev/md0 --add /dev/nvme0n1p2
mdadm --manage /dev/md1 --add /dev/nvme0n1p3
Monitor the rebuild status with a command like watch cat /proc/mdstat
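If you just want the percentage, the recovery line in mdstat is easy to grep out. A sketch with a made-up mdstat snippet:

```shell
# Hypothetical helper: pull the resync/recovery percentage out of
# /proc/mdstat text.
resync_progress() {
  grep -oE '(recovery|resync) = *[0-9.]+%'
}

# Made-up snapshot; on the real box: resync_progress < /proc/mdstat
sample='md0 : active raid1 nvme0n1p2[3] nvme1n1p2[2]
      [======>..............]  recovery = 34.9% (365440/1046528) finish=0.5min speed=20302K/sec'

printf '%s\n' "$sample" | resync_progress
```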
If the rebuild is slow/limited to around 200 MB/s you can raise the cap, eg to ~1.6 GB/s (the value is in KiB/s):
echo 1600000 > /proc/sys/dev/raid/speed_limit_max
After the resync is done we need to fix EFI. Start by copying back the partition we backed up before:
dd if=/root/EFI_PARTITITON.img bs=4M of=/dev/nvme0n1p1
Fix grub:
update-grub
grub-install /dev/nvme0n1
And lastly reinstall the UEFI boot option (for ubuntu):
efibootmgr -v | grep ubuntu # only shows one entry
efibootmgr --create --disk /dev/nvme0n1 --part 1 --label "ubuntu" --loader "\EFI\ubuntu\shimx64.efi"
After this you should have two ubuntu entries in efibootmgr, and that should be it! Try rebooting and make sure the machine boots from both drives. If you need to replace the other drive, follow the same procedure as above but use eg nvme1n1.
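The two-entries check can be done in one line if you like. A sketch with a made-up efibootmgr sample (the helper is mine; real output has more fields):

```shell
# Hypothetical helper: count "ubuntu" boot entries in efibootmgr output.
count_ubuntu_entries() {
  grep -cE '^Boot[0-9A-F]{4}\*? ubuntu' || true
}

# Made-up sample; on the real box pipe in `efibootmgr -v` instead.
sample='BootOrder: 0000,0001
Boot0000* ubuntu	HD(1,GPT,...)/File(\EFI\ubuntu\shimx64.efi)
Boot0001* ubuntu	HD(1,GPT,...)/File(\EFI\ubuntu\shimx64.efi)'

printf '%s\n' "$sample" | count_ubuntu_entries
```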
If the new drive is bigger you can also grow the RAID1:
mdadm /dev/md1 --fail /dev/nvme0n1p3 --remove /dev/nvme0n1p3
parted /dev/nvme0n1
Use the resizepart command in parted to change the size.
Add back to the array:
mdadm --manage /dev/md1 --add /dev/nvme0n1p3
Wait for resync (cat /proc/mdstat) and then grow the array:
mdadm --grow /dev/md1 --size=max
Then you need to resize the filesystem, in my case it’s LVM with ext filesystem on top so:
pvresize /dev/md1
lvextend -L +1G /dev/vg-md1/lv-root
resize2fs /dev/vg-md1/lv-root
Or just resize2fs /dev/md1 if you're not using LVM.