To perform maintenance that brings Ceph down, all VMs must be stopped beforehand and restarted once the maintenance is finished.

Prerequisites

Have the following command line tools available:

  • curl
  • jq
  • xargs

As an admin user, create an API key and store it in the ADMIN_APIKEY environment variable. E.g. export ADMIN_APIKEY=x4g...

Store the API hostname in the API_HOST environment variable. E.g. export API_HOST=api.pilw.io
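
Both variables can be set in one step (the key value here is truncated; substitute your own):

Code Block
languagebash
# Admin API key (truncated here; use your real key) and API hostname.
export ADMIN_APIKEY=x4g...
export API_HOST=api.pilw.io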

Step-by-Step Guide

Mark all hypervisors as not accepting workloads so no new resources can be created.

...
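If these commands are not at hand, the following is a minimal sketch: the host flag is cleared through the same host_flags endpoint used later to re-enable workloads, while the vm/list endpoint used to build running_vms.json is an assumption, not confirmed by this guide.

Code Block
languagebash
# Mark every hypervisor as not accepting workloads
# (the inverse of the re-enable step at the end of this guide).
curl -sS -X GET -H "apikey: $ADMIN_APIKEY" https://$API_HOST/v1/base-operator/host/list | \
jq -r '.[]|.uuid' | \
xargs -n 1 -I {} curl -X PUT -H "apikey: $ADMIN_APIKEY" https://$API_HOST/v1/base-operator/admin/host_flags -d uuid={} -d is_accepting_workloads=0

# Save the currently running VMs so they can be restarted later.
# NOTE: /v1/user-resource/admin/vm/list is an assumed endpoint.
curl -sS -X GET -H "apikey: $ADMIN_APIKEY" https://$API_HOST/v1/user-resource/admin/vm/list | \
jq '[.[]|select(.status=="running")]' > running_vms.json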

Code Block
languagebash
# Ask each previously-running VM to stop.
cat running_vms.json | jq -r '.[]|.uuid' | \
xargs -n 1 -I {} curl -X POST -H "apikey: $ADMIN_APIKEY" https://$API_HOST/v1/user-resource/admin/vm/stop -d uuid={}

Open the VMs panel in the admin UI, change Filter by status to show only running VMs, and press the Reload button. Keep reloading every once in a while and watch the list get shorter.
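
The same check can also be scripted; a minimal sketch that counts the VMs still reported as running, using the per-VM status endpoint shown at the end of this guide:

Code Block
languagebash
# Prints how many VMs from running_vms.json are still running.
cat running_vms.json | jq -r '.[]|.uuid' | \
xargs -n 1 -I {} bash -c \
"curl -sS -X GET -H \"apikey: $ADMIN_APIKEY\" https://$API_HOST/v1/user-resource/admin/vm?uuid={} | jq -r .status" | \
grep -c running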

Not all VMs agree to stop; Windows in particular is known for ignoring stop requests. These VMs must be stopped forcefully.

Code Block
languagebash
# There is no harm in sending stop again to VMs that are already stopped. We can reuse the same list.
cat running_vms.json | jq -r '.[]|.uuid' | \
xargs -n 1 -I {} curl -X POST -H "apikey: $ADMIN_APIKEY" https://$API_HOST/v1/user-resource/admin/vm/stop -d uuid={} -d force=True

It is now safe to perform maintenance and bring Ceph offline.

...
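Once the maintenance is finished, mark all hypervisors as accepting workloads again.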

Code Block
languagebash
curl -sS -X GET -H "apikey: $ADMIN_APIKEY" https://$API_HOST/v1/base-operator/host/list | \
jq -r '.[]|.uuid' | \
xargs -n 1 -I {} curl -X PUT -H "apikey: $ADMIN_APIKEY" https://$API_HOST/v1/base-operator/admin/host_flags -d uuid={} -d is_accepting_workloads=1
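
Optionally verify that the flag was set. This assumes the host objects returned by host/list expose an is_accepting_workloads field:

Code Block
languagebash
# Print each host UUID together with its is_accepting_workloads flag.
curl -sS -X GET -H "apikey: $ADMIN_APIKEY" https://$API_HOST/v1/base-operator/host/list | \
jq -r '.[]|.uuid+"\t"+(.is_accepting_workloads|tostring)'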

Now start all VMs that were running before.

Code Block
languagebash
cat running_vms.json | jq -r '.[]|.uuid' | \
xargs -n 1 -I {} curl -X POST -H "apikey: $ADMIN_APIKEY" https://$API_HOST/v1/user-resource/admin/vm/start -d uuid={}

This process will take some time. Some starts might fail; these need investigation, possibly by the VM owner.

Finally, check the current status of the VMs listed in the running_vms.json file.

Code Block
languagebash
cat running_vms.json | jq -r '.[]|.uuid' | \
xargs -n 1 -I {} bash -c \
"curl -sS -X GET -H \"apikey: $ADMIN_APIKEY\" https://$API_HOST/v1/user-resource/admin/vm?uuid={} | jq -r '.uuid+\"\t\"+(.user_id|tostring)+\"\t\"+.status'"

The result has three columns:

  • VM UUID
  • User ID
  • VM status

Make note of all VMs that do not have status running. Either try to start the VM again, perhaps manually from the UI while impersonating the user, or send the user a notification that their VM was unable to start and that they should have a look. The Virtual Console is useful for troubleshooting VM boot issues.
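
Where retrying from the command line is preferred, the two commands above can be combined to send start only to the VMs that are not running (same endpoints as before):

Code Block
languagebash
# Find VMs from the original list that are not running and try to start them again.
cat running_vms.json | jq -r '.[]|.uuid' | \
xargs -n 1 -I {} bash -c \
"curl -sS -X GET -H \"apikey: $ADMIN_APIKEY\" https://$API_HOST/v1/user-resource/admin/vm?uuid={} | jq -r 'select(.status!=\"running\")|.uuid'" | \
xargs -n 1 -I {} curl -X POST -H "apikey: $ADMIN_APIKEY" https://$API_HOST/v1/user-resource/admin/vm/start -d uuid={}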