Cinder

Cleanup of backups fails for non-existing volumes

Series yoga
Bug #2016138

Bug #2016138 reported by Waldemar Reger on 2023-04-13

This bug affects 3 people

Affects   Status        Importance   Assigned to   Milestone
Cinder    In Progress   Undecided    Unassigned
Yoga      Incomplete    Undecided    Unassigned

Bug Description

Using: OpenStack Yoga

Some of our volumes had backups that stayed in the "creating" state for an unusually long time and never changed. In that case we try to delete the backup by hand (openstack client). To do this, the backup needs to be in state "available" or "error". It is not possible to change the state of a backup in the "creating" state with the openstack client.
For that reason we restart the cinder-backup service on the host that is responsible for that backup.
What was expected: during its init phase, the cinder backup manager checks the state of the backups related to that host. If a backup is in state "creating" [1], a cleanup starts and sets the state to "error". This behaviour works for backups whose volumes still exist.

Bug:
If the volume of a backup in state "creating" has been deleted, the cleanup described above throws a VolumeNotFound exception.

INFO oslo_service.service [req-392f55c2-b353-41cd-a54e-865d64130f62 - - - - -] Starting 1 workers
INFO cinder.service [-] Starting cinder-backup node (version 20.1.0)
INFO cinder.backup.manager [req-1f53cac1-07c4-4ae4-a983-d4f6654b5618 - - - - -] Cleaning up incomplete backup operations.
INFO cinder.backup.manager [req-1f53cac1-07c4-4ae4-a983-d4f6654b5618 - - - - -] Resetting backup 20e6c872-b308-4635-b67b-d3f4aacc2c56 to error (was creating).
ERROR cinder.backup.manager [req-1f53cac1-07c4-4ae4-a983-d4f6654b5618 - - - - -] Problem cleaning up backup 20e6c872-b308-4635-b67b-d3f4aacc2c56.: cinder.exception.VolumeNotFound: Volume eb0c0145-db93-4258-b549-f1963e9cf205 could not be found.
cinder.backup.manager Traceback (most recent call last):
cinder.backup.manager File "/usr/lib/python3/dist-packages/cinder/backup/manager.py", line 202, in _cleanup_incomplete_backup_operations
cinder.backup.manager self._cleanup_one_backup(ctxt, backup)
ERROR cinder.backup.manager File "/usr/lib/python3/dist-packages/cinder/backup/manager.py", line 235, in _cleanup_one_backup
ERROR cinder.backup.manager volume = objects.Volume.get_by_id(ctxt, backup.volume_id)
ERROR cinder.backup.manager File "/usr/lib/python3/dist-packages/cinder/objects/base.py", line 339, in get_by_id
ERROR cinder.backup.manager orm_obj = db.get_by_id(context, cls.model, id, *args, **kwargs)
ERROR cinder.backup.manager File "/usr/lib/python3/dist-packages/cinder/db/api.py", line 109, in get_by_id
ERROR cinder.backup.manager return IMPL.get_by_id(context, model, id, *args, **kwargs)
ERROR cinder.backup.manager File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 187, in wrapper
ERROR cinder.backup.manager return f(*args, **kwargs)
ERROR cinder.backup.manager File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 427, in get_by_id
ERROR cinder.backup.manager return _GET_METHODS[model](context, id, *args, **kwargs)
ERROR cinder.backup.manager File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 187, in wrapper
ERROR cinder.backup.manager return f(*args, **kwargs)
ERROR cinder.backup.manager File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 2367, in volume_get
ERROR cinder.backup.manager return _volume_get(context, volume_id)
ERROR cinder.backup.manager File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 187, in wrapper
ERROR cinder.backup.manager return f(*args, **kwargs)
ERROR cinder.backup.manager File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 2126, in _volume_get
ERROR cinder.backup.manager raise exception.VolumeNotFound(volume_id=volume_id)
ERROR cinder.backup.manager cinder.exception.VolumeNotFound: Volume eb0c0145-db93-4258-b549-f1963e9cf205 could not be found.

---
[1]: https://github.com/openstack/cinder/blob/78fac2960d28e97f42454bb5f385b46fbae23a3c/cinder/backup/manager.py#L230
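
The failure mode described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: it is not Cinder's code and not the proposed fix. The names `Backup`, `VOLUMES`, `get_volume`, `cleanup_strict` and `cleanup_tolerant` are hypothetical, and the volume status values are made up for illustration. It shows why looking up the volume before updating the backup leaves the backup stuck in "creating", and how a cleanup that tolerates a missing volume could still reach "error".

```python
from dataclasses import dataclass


class VolumeNotFound(Exception):
    """Stand-in for cinder.exception.VolumeNotFound (hypothetical model)."""


@dataclass
class Backup:
    id: str
    volume_id: str
    status: str = "creating"
    fail_reason: str = ""


# Hypothetical in-memory table of volumes that still exist.
VOLUMES = {"vol-1": {"status": "backing-up"}}


def get_volume(volume_id):
    """Return the volume record or raise VolumeNotFound, like a DB lookup."""
    try:
        return VOLUMES[volume_id]
    except KeyError:
        raise VolumeNotFound(f"Volume {volume_id} could not be found.") from None


def cleanup_strict(backup):
    """Model of the reported behaviour: the volume lookup comes first,
    so a deleted volume aborts the cleanup before the backup is updated."""
    if backup.status == "creating":
        volume = get_volume(backup.volume_id)  # raises for deleted volumes
        volume["status"] = "available"         # illustrative volume cleanup
        backup.status = "error"
        backup.fail_reason = "Backup creation was interrupted."


def cleanup_tolerant(backup):
    """Assumed alternative: skip volume-related cleanup when the volume
    is gone, but still move the backup to the error state."""
    if backup.status == "creating":
        try:
            volume = get_volume(backup.volume_id)
        except VolumeNotFound:
            volume = None
        if volume is not None:
            volume["status"] = "available"
        backup.status = "error"
        backup.fail_reason = "Backup creation was interrupted."
```

With a backup whose `volume_id` is not in `VOLUMES`, `cleanup_strict` raises `VolumeNotFound` and the backup remains in "creating", whereas `cleanup_tolerant` sets it to "error" so that it can be deleted afterwards.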

Sofia Enriquez (lsofia-enriquez) wrote on 2023-04-19:

Hey Waldemar,

I hope you're doing great! 😊 I saw your bug report and it seems like you're dealing with two issues related to the backup service. Let me know if I got this right:

To help you out, I need some extra info from you:
Which Cinder backup driver are you using?

1) After restarting the service, do the backups stay in the creating state or change? Have you tried resetting the backup status in the db manually like this: `cinder backup-reset-state --state error <backup-uuid>`? Just a heads up, this will update the backup's state in the db, letting you remove it, but you'll lose the backup data.
For more on resetting backup state, check this out: https://docs.openstack.org/python-cinderclient/latest/cli/details.html#cinder-backup-reset-state

2) You want the backup service to detect failing backups and switch them to an error state:
Thing is, there should already be some code that does this. So, just let me know which backup driver you're using, and we'll figure it out together.

Looking forward to your reply!

Changed in cinder:
status:	New → Incomplete
tags:	added: backup-service yoga

Waldemar Reger (wreger) wrote on 2023-04-20:

Hi Sofia,

thanks for your answer.

We are using as Cinder backup driver: 'cinder.backup.drivers.ceph.CephBackupDriver'

1) I used the openstackclient to change the backup state: `openstack volume backup set --state error <backup-id>`
That should do the same, shouldn't it? I did not use the cinder client directly as you suggested. The state could not be changed with the command I used.

2) Yes, I tried to solve the problem by letting the cinder-backup service change the state to error. But the service first checks whether the volume for that backup still exists. I found the code line for that here [1]. It seems that cinder needs an existing volume even when the goal is only to change a backup's state.

So the request fails because of the VolumeNotFound exception thrown here [2].

Thanks for your help.

[1]: https://github.com/openstack/cinder/blob/78fac2960d28e97f42454bb5f385b46fbae23a3c/cinder/backup/manager.py#L230
[2]: https://github.com/openstack/cinder/blob/d4535c77493a7b362091b962f42f2613dea65dbe/cinder/db/sqlalchemy/api.py#L2126

Christian Rohmann (christian-rohmann) wrote on 2023-06-21:

We just had a list of backups left in the "creating" state even though cinder-backup was not working on them anymore (is there no check loop to see whether the individual backup process is still alive?).

I restarted cinder-backup to have it move all those backups to the "error" state on startup.
But some volumes had already been deleted in the meantime, causing the following error:

2023-06-21 08:24:35.353 1345602 INFO cinder.backup.manager [req-cfd983eb-a261-495f-bada-ad52ac04f866 - - - - -] Resetting backup fe1b42f6-49ee-4f24-9220-841c37c25928 to error (was creating).
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager [req-cfd983eb-a261-495f-bada-ad52ac04f866 - - - - -] Problem cleaning up backup fe1b42f6-49ee-4f24-9220-841c37c25928.: cinder.exception.VolumeNotFound: Volume f7b070c4-daea-4f0b-a8d9-95470f750e83 could not be found.
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager Traceback (most recent call last):
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager   File "/usr/lib/python3/dist-packages/cinder/backup/manager.py", line 202, in _cleanup_incomplete_backup_operations
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager     self._cleanup_one_backup(ctxt, backup)
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager   File "/usr/lib/python3/dist-packages/cinder/backup/manager.py", line 235, in _cleanup_one_backup
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager     volume = objects.Volume.get_by_id(ctxt, backup.volume_id)
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager   File "/usr/lib/python3/dist-packages/cinder/objects/base.py", line 339, in get_by_id
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager     orm_obj = db.get_by_id(context, cls.model, id, *args, **kwargs)
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager   File "/usr/lib/python3/dist-packages/cinder/db/api.py", line 109, in get_by_id
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager     return IMPL.get_by_id(context, model, id, *args, **kwargs)
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager   File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 187, in wrapper
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager     return f(*args, **kwargs)
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager   File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 427, in get_by_id
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager     return _GET_METHODS[model](context, id, *args, **kwargs)
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager   File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 187, in wrapper
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager     return f(*args, **kwargs)
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager   File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 2367, in volume_get
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager     return _volume_get(context, volume_id)
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager   File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 187, in wrapper
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager     return f(*args, **kwargs)
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager   File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 2126, in _volume_get
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager     raise exception.VolumeNotFound(volume_id=volume_id)
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager cinder.exception.VolumeNotFound: Volume f7b070c4-daea-4f0b-a8d9-95470f750e83 could not be found.
2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager
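
When many backups are affected, the backup and volume IDs can be pulled out of the cinder-backup log to see which backups failed cleanup because their volume is gone. The following is a minimal sketch that assumes the log lines look like the ones above; the function name `find_failed_cleanups` and the regular expression are illustrative, not part of Cinder.

```python
import re

# Matches the "Problem cleaning up backup ..." line shown in the log above.
CLEANUP_ERROR = re.compile(
    r"Problem cleaning up backup (?P<backup>[0-9a-f-]{36})\.: "
    r"cinder\.exception\.VolumeNotFound: "
    r"Volume (?P<volume>[0-9a-f-]{36}) could not be found\."
)


def find_failed_cleanups(lines):
    """Return a dict mapping backup ID to the missing volume ID."""
    found = {}
    for line in lines:
        match = CLEANUP_ERROR.search(line)
        if match:
            found[match.group("backup")] = match.group("volume")
    return found


# Two lines copied from the log above.
sample = [
    "2023-06-21 08:24:35.353 1345602 INFO cinder.backup.manager "
    "[req-cfd983eb-a261-495f-bada-ad52ac04f866 - - - - -] Resetting backup "
    "fe1b42f6-49ee-4f24-9220-841c37c25928 to error (was creating).",
    "2023-06-21 08:24:35.359 1345602 ERROR cinder.backup.manager "
    "[req-cfd983eb-a261-495f-bada-ad52ac04f866 - - - - -] Problem cleaning up "
    "backup fe1b42f6-49ee-4f24-9220-841c37c25928.: "
    "cinder.exception.VolumeNotFound: Volume "
    "f7b070c4-daea-4f0b-a8d9-95470f750e83 could not be found.",
]

for backup_id, volume_id in find_failed_cleanups(sample).items():
    print(f"backup {backup_id} references missing volume {volume_id}")
```

Only the "Problem cleaning up backup" line is matched; the final traceback line carries the volume ID but not the backup ID, so it is ignored.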

OpenStack Infra (hudson-openstack) wrote on 2023-06-21: Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/886584

Changed in cinder:
status:	Incomplete → In Progress

Christian Rohmann (christian-rohmann) wrote on 2023-07-11:

This issue here (and therefore the proposed fix) is similar to https://launchpad.net/bugs/1996049 which is also about deleted volumes.
