Recover VMs from a dead hypervisor

Follow these steps when a hypervisor has crashed and you need to recover VMs that were running on that hypervisor.

Kill it!

Before continuing, make sure the hypervisor is really dead and/or the VMs on that hypervisor will not boot automatically when the hypervisor is brought back online! Otherwise you may end up with some very ugly split-brain issues when identical VMs that share MAC addresses and storage are running at the same time on different hypervisors. There are some checks to prevent that, but it's always better to be safe.

Deactivate host

Use the admin UI to set the appropriate host status. If the hypervisor is not quite dead yet, use MAINTENANCE. If it's dead and unreachable, use OFFLINE. The status change dialog in the admin UI explains all the host statuses in detail. Here's a quick overview.

| Status | Description |
|---|---|
| AVAILABLE | This host is considered a valid target for new and migrated VMs. |
| NOT ACCEPTING | No new VMs will be allocated to this host. An admin can still explicitly migrate VMs to it. |
| MAINTENANCE | All VMs on this host will be automatically migrated to other hosts. |
| OFFLINE | The host is considered unavailable, and all VMs on this host will be recreated on other hosts. Potentially dangerous, read the warning above! |

Use the admin API and/or the dashboard to monitor how the host is being emptied.

Check the action log for the specific host to see which VMs were affected. Specify a reasonable since date so that only relevant events are returned:

curl -X GET -H "apikey: <Admin_Apikey>" "https://<api.host>/v1/base-operator/admin/action_log?host_uuid=<Host_UUID>&since=2020-05-27"
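If you run this query repeatedly while the host is being emptied, it may help to wrap it in a small shell helper. The following is only a sketch: the function names and the ADMIN_APIKEY environment variable are made up for illustration, and only the URL path and query parameters come from the command above. Note that the URL has to be quoted, because an unquoted & makes the shell run curl in the background and drop the since parameter.

```shell
# Hypothetical helper: build the action-log URL for one host.
# $1 = API host name, $2 = host UUID, $3 = since date (YYYY-MM-DD)
action_log_url() {
  printf 'https://%s/v1/base-operator/admin/action_log?host_uuid=%s&since=%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: fetch the action log, reading the admin API key
# from the ADMIN_APIKEY environment variable (an assumed name).
# --fail makes curl exit non-zero on HTTP error responses.
fetch_action_log() {
  curl --fail --silent --show-error -X GET \
    -H "apikey: ${ADMIN_APIKEY}" \
    "$(action_log_url "$1" "$2" "$3")"
}
```

Example call, with placeholders as in the command above: ADMIN_APIKEY=<Admin_Apikey> fetch_action_log <api.host> <Host_UUID> 2020-05-27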

Check for any unmigrated VMs and other errors in the base-operator log, if possible.

Activate host

Once the host is brought back online, mark it as NOT ACCEPTING.

Now migrate a test VM to the host to verify that it is ready to accept VMs. In the admin UI VM list, find the VM you want to test with and migrate it to that host. You can select any host from the target selection dropdown, even one in NOT ACCEPTING status.

Check the VM: try to log in, see if the network is available, etc.
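The basic reachability part of that check could be scripted roughly as follows. This is a minimal sketch, not part of the platform tooling: check_vm is a hypothetical helper, it assumes the VM answers ICMP and runs SSH on port 22, and it assumes a netcat variant that supports the -z and -w options. It does not replace actually logging in to the VM.

```shell
# Hypothetical helper: basic reachability checks for a freshly migrated VM.
# $1 = IP address or host name of the VM
check_vm() {
  vm_addr="$1"

  # Send three ICMP echo requests; a non-zero exit status means no reply.
  if ! ping -c 3 "$vm_addr" >/dev/null 2>&1; then
    echo "FAIL: $vm_addr does not answer ping"
    return 1
  fi

  # Open and immediately close a TCP connection to the SSH port (5 s timeout).
  if ! nc -z -w 5 "$vm_addr" 22 >/dev/null 2>&1; then
    echo "FAIL: $vm_addr is not accepting connections on port 22"
    return 1
  fi

  echo "OK: $vm_addr answers ping and accepts SSH connections"
}
```

Example call: check_vm <VM_address>. The function returns a non-zero status on the first failed check, so it can be used in a larger script before marking the host as AVAILABLE.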

Finally, mark the host as AVAILABLE.