Default token-expiration of 1 hour causes live-migration failure states

Bug #1856876 reported by Drew Freiberger
This bug affects 2 people
Affects: OpenStack Keystone Charm
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

When performing live host evacuations or live-migrations, we have found that if a migration takes longer than the token expiration period, the migration itself completes, but the follow-up metadata updates on the destination host, such as network-vif-plugged (nova-compute talking to neutron-api once the libvirt live-migration finishes), fail because the token has expired.
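A minimal sketch of the failure condition, assuming a single token is obtained when the migration starts and reused for the destination-host updates once the copy finishes. The helper token_is_valid is hypothetical and is not a Keystone or Nova API; it only models the validity window of a bearer token, using the 5.5-hour migration reported in this bug.

```python
from datetime import datetime, timedelta


def token_is_valid(issued_at: datetime, expiration_s: int, now: datetime) -> bool:
    """Hypothetical helper: a token is usable until issued_at + expiration_s."""
    return now < issued_at + timedelta(seconds=expiration_s)


# Token obtained when the live-migration is kicked off.
issued = datetime(2020, 1, 1, 0, 0, 0)
# The libvirt copy finishes 5.5 hours later; only then are the
# destination-host updates (e.g. network-vif-plugged) attempted.
migration_done = issued + timedelta(hours=5, minutes=30)

print(token_is_valid(issued, 3600, migration_done))   # False: 1 hour default
print(token_is_valid(issued, 86400, migration_done))  # True: proposed 1 day
```

The migration data copy never checks the token again, which is why it succeeds while the final API calls are rejected.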

I believe that there are three main concerns surrounding token expiration:

1. A time limit reduces the platform's exposure to compromised tokens or compromised workstations (think Horizon auto-logouts, etc.).

2. The volume of managed tokens stored in the database depends directly on how long tokens remain valid and on how often the keystone-manage token expiration cron job runs; that volume can in turn affect Keystone's authentication response times.

3. HR action response: a user leaving the organization may still be able to authenticate with a valid token if the only action taken is locking or disabling the user's password.
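Concern 2 can be made concrete with a rough steady-state estimate: if tokens are issued at a constant rate and each one lives for the full expiration period, the number of simultaneously valid tokens is about issuance rate × lifetime (Little's law). The rate of 500 tokens per hour below is an illustrative assumption, not a measurement from any cloud.

```python
def valid_tokens(issue_rate_per_hour: float, expiration_s: int) -> float:
    """Approximate number of tokens valid at once in steady state (Little's law).

    Assumes a constant issuance rate and that no token is revoked early.
    """
    return issue_rate_per_hour * (expiration_s / 3600)


rate = 500  # hypothetical tokens issued per hour
for expiration in (3600, 86400):
    print(expiration, valid_tokens(rate, expiration))
```

Whatever the actual rate, moving from 1 hour to 1 day multiplies this figure by 24, which is the growth a database-backed token store and its expiration cron job would have to absorb.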

On a cloud I was recently migrating workloads on, a 2 TB VM took roughly 5.5 hours to live-migrate. This is a common size for a database VM that may be placed on local ephemeral storage and must be kept running during hypervisor maintenance by live-migrating it to another hypervisor. With a typical environment hosting 4 to 10 TB of ephemeral storage per node, evacuating a node using a single authentication token (as in the case of nova host-evacuate-live <hypervisorname>) could take upwards of 24 hours.
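The arithmetic behind the 24-hour figure can be sketched by scaling the observed throughput (2 TB in about 5.5 hours, i.e. 2.75 hours per TB) linearly, assuming the instances on a node are migrated one after another at that same rate. Real throughput depends on network bandwidth, guest dirty-page rate, and migration concurrency, so this is only an estimate.

```python
OBSERVED_TB = 2.0
OBSERVED_HOURS = 5.5
HOURS_PER_TB = OBSERVED_HOURS / OBSERVED_TB  # 2.75 hours per TB


def evacuation_hours(ephemeral_tb: float) -> float:
    """Estimate hours to live-migrate all ephemeral data off one node.

    Assumes sequential migrations at the observed throughput.
    """
    return ephemeral_tb * HOURS_PER_TB


for tb in (4, 10):
    print(f"{tb} TB -> {evacuation_hours(tb):.1f} h")
# 4 TB -> 11.0 h
# 10 TB -> 27.5 h
```

A 10 TB node lands at roughly 27.5 hours, consistent with the "upwards of 24 hours" estimate.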

For day-2 operations of the cloud, I'd like to suggest that we consider raising the default token-expiration from the current 1 hour to 1 day.
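To compare the proposal against specific operations, the sketch below converts durations to seconds, assuming token-expiration is expressed in seconds like the underlying keystone [token] expiration option. The helper required_expiration_s and its 1.25 safety margin are hypothetical, chosen only for illustration.

```python
import math

HOUR_S = 3600
CURRENT_DEFAULT_S = 1 * HOUR_S     # current default: 1 hour
PROPOSED_DEFAULT_S = 24 * HOUR_S   # proposed default: 1 day


def required_expiration_s(expected_hours: float, margin: float = 1.25) -> int:
    """Hypothetical helper: smallest whole-hour expiration, in seconds, that
    covers an operation of expected_hours with a safety margin applied."""
    return math.ceil(expected_hours * margin) * HOUR_S


print(PROPOSED_DEFAULT_S)            # 86400
print(required_expiration_s(5.5))    # 25200 (5.5 h * 1.25 = 6.875 h, rounded up to 7 h)
print(required_expiration_s(11.0))   # 50400 (11 h * 1.25 = 13.75 h, rounded up to 14 h)
```

Both the observed 5.5-hour migration and a hypothetical 11-hour evacuation fit within one day but not within the current one-hour default.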

Drew Freiberger (afreiberger) wrote:

This bug may be related to my long-running migration issues: https://bugs.launchpad.net/nova/+bug/1657774

James Page (james-page) wrote:

To quote the help text for Keystone's [token] expiration option:

"The amount of time that a token should remain valid (in seconds). Drastically
reducing this value may break "long-running" operations that involve multiple
services to coordinate together, and will force users to authenticate with
keystone more frequently. Drastically increasing this value will increase the
number of tokens that will be simultaneously valid. Keystone tokens are also
bearer tokens, so a shorter duration will also reduce the potential security
impact of a compromised token."

Canonical Solutions QA Bot (oil-ci-bot) wrote:

This bug is fixed with commit 1d8b2738 to fce-templates on branch fcb/intermediate/master.
To view that commit see the following URL:
https://git.launchpad.net/fce-templates/commit/?id=1d8b2738

James Page (james-page)
Changed in charm-keystone:
status: New → Triaged
importance: Undecided → High
milestone: none → 20.05
milestone: 20.05 → 20.02
Liam Young (gnuoy)
Changed in charm-keystone:
milestone: 20.02 → 20.05
David Ames (thedac)
Changed in charm-keystone:
milestone: 20.05 → 20.08
James Page (james-page)
Changed in charm-keystone:
milestone: 20.08 → none
Alvaro Uria (aluria) wrote:

Since raising token-expiration is already covered in the how-to about live-migration operations (see [1]), should this bug be marked "Won't Fix", given the security concerns that a 1-day default would raise?

1. https://docs.openstack.org/charm-guide/latest/admin/ops-live-migrate-vms.html#avoid-expired-keystone-tokens
