Replacing software RAID1 boot drives with UEFI

I replaced both drives in the RAID1 boot array of an Ubuntu server yesterday and struggled a bit, so I thought I would write down my experience both for my future self and for anyone else who needs to do something similar. I think my main problem was following a guide that wasn't written for UEFI setups 🙂

So, my setup is two NVMe drives on nvme0n1 and nvme1n1. Both drives have the same partition table:

  1. EFI system partition 512M
  2. md0 / boot 1G
  3. md1 / LVM <rest of disk>

Before starting, check that all disks are online:

# cat /proc/mdstat
 Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
 md1 : active raid1 nvme1n1p3[2] nvme0n1p3[3]
       x blocks super 1.2 [2/2] [UU]
       bitmap: 3/7 pages [12KB], xKB chunk
 md0 : active raid1 nvme0n1p2[3] nvme1n1p2[2]
       x blocks super 1.2 [2/2] [UU]

Every array should show [UU]; [U_] or [_U] means one member is missing or failed.
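If you want to script that check, something like this works (a sketch; check_md is a made-up helper name, and here it reads a saved sample instead of the live /proc/mdstat):

```shell
# Sketch: report DEGRADED if any md array shows a missing member like [U_].
# check_md is a hypothetical helper; point it at /proc/mdstat on a real box.
check_md() {
  if grep -q '\[U*_U*\]' "$1"; then
    echo "DEGRADED"
  else
    echo "OK"
  fi
}

# Sample content standing in for /proc/mdstat:
cat > /tmp/mdstat.sample <<'EOF'
md0 : active raid1 nvme0n1p2[3] nvme1n1p2[2]
      1046528 blocks super 1.2 [2/2] [UU]
EOF

check_md /tmp/mdstat.sample   # prints OK
```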

In my case I also ran smartctl -ia /dev/nvme0 to note down serial numbers etc., so I knew which physical disk to replace in the box.

Next up is to save a copy of your EFI partition for later use:

dd if=/dev/nvme0n1p1 bs=4M of=/root/EFI_PARTITITON.img
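Before pulling the drive it's worth checking that the image actually matches the source. A sketch, demonstrated on scratch files here; on the real box you'd compare /dev/nvme0n1p1 against the image:

```shell
# Sketch: verify a dd backup byte-for-byte with checksums.
# /tmp/src.img is a stand-in for /dev/nvme0n1p1.
dd if=/dev/urandom of=/tmp/src.img bs=1M count=4 status=none
dd if=/tmp/src.img of=/tmp/backup.img bs=4M status=none

a=$(sha256sum < /tmp/src.img | cut -d' ' -f1)
b=$(sha256sum < /tmp/backup.img | cut -d' ' -f1)
[ "$a" = "$b" ] && echo "backup verified"   # prints "backup verified"
```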

Pick one drive to start with and take it offline:

mdadm /dev/md0 --fail /dev/nvme0n1p2 --remove /dev/nvme0n1p2
mdadm /dev/md1 --fail /dev/nvme0n1p3 --remove /dev/nvme0n1p3

Physically replace the drive in the server. It might also be a good idea to upgrade the new drive's firmware if an update is available.

After the disk is replaced, copy the partition table from the remaining good disk to the new disk, and then generate new UUIDs for its partitions (so the two disks don't share identifiers):

sfdisk -d /dev/nvme1n1 | sfdisk /dev/nvme0n1
sgdisk -G /dev/nvme0n1

Add the device back to the RAID1 array:

mdadm --manage /dev/md0 --add /dev/nvme0n1p2
mdadm --manage /dev/md1 --add /dev/nvme0n1p3

Monitor the rebuild status with a command like watch cat /proc/mdstat

If the rebuild is slow, e.g. capped around 200MB/s, you can raise the max speed to e.g. 1.6GB/s (the value is in KB/s):

echo 1600000 > /proc/sys/dev/raid/speed_limit_max

After the resync is done we need to fix EFI. Start by copying back the partition we backed up before:

dd if=/root/EFI_PARTITITON.img bs=4M of=/dev/nvme0n1p1

Reinstall GRUB (on a UEFI system the device argument is effectively ignored; GRUB installs into the ESP mounted at /boot/efi):

grub-install /dev/nvme0n1

And lastly reinstall the UEFI boot option (for ubuntu):

efibootmgr -v | grep ubuntu  # only shows one entry
efibootmgr --create --disk /dev/nvme0n1 --part 1 --label "ubuntu" --loader "\EFI\ubuntu\shimx64.efi"

After this you should have two ubuntu entries in efibootmgr, and that should be it! Reboot and make sure the machine boots from both drives. If you need to replace the other drive, follow the same procedure as above but with e.g. nvme1n1.
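The two-entries check can be done mechanically as well. A sketch against made-up efibootmgr -v output (the HD(...) details are placeholders, not real values):

```shell
# Sketch: count "ubuntu" entries in efibootmgr -v style output.
# /tmp/efiboot.sample is hypothetical; on a real box pipe efibootmgr -v in.
cat > /tmp/efiboot.sample <<'EOF'
BootCurrent: 0000
BootOrder: 0000,0001
Boot0000* ubuntu HD(1,GPT,...)/File(\EFI\ubuntu\shimx64.efi)
Boot0001* ubuntu HD(1,GPT,...)/File(\EFI\ubuntu\shimx64.efi)
EOF

grep -c '^Boot[0-9A-F]\{4\}\* ubuntu' /tmp/efiboot.sample   # prints 2
```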

If the new drive is bigger you can also grow the RAID1:

mdadm /dev/md1 --fail /dev/nvme0n1p3 --remove /dev/nvme0n1p3
parted /dev/nvme0n1

Use the resizepart command inside parted to grow partition 3 to fill the disk.

Add back to the array:

mdadm --manage /dev/md1 --add /dev/nvme0n1p3

Wait for resync (cat /proc/mdstat) and then grow the array:

mdadm --grow /dev/md1 --size=max

Then you need to resize the filesystem. In my case it's LVM with an ext filesystem on top, so:

pvresize /dev/md1
lvextend -L +1G /dev/vg-md1/lv-root
resize2fs /dev/vg-md1/lv-root

Or just resize2fs /dev/md1 if you're not using LVM.
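The grow step can also be rehearsed on a file-backed ext4 image; mke2fs and resize2fs from e2fsprogs work on plain files with no root needed. A sketch (file names made up):

```shell
# Sketch: simulate "array got bigger, now grow the filesystem on it".
truncate -s 32M /tmp/fs.img
mke2fs -F -q -t ext4 /tmp/fs.img         # small ext4 fs in a plain file
truncate -s 64M /tmp/fs.img              # the "array" doubles in size
e2fsck -fp /tmp/fs.img >/dev/null        # resize2fs wants a checked fs
resize2fs /tmp/fs.img >/dev/null 2>&1    # grow to fill, like resize2fs /dev/md1
```

dumpe2fs -h /tmp/fs.img should now report a block count matching the new 64M size.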
