Upgrading from SMI-S to RESTAPI based driver fails for long hostnames
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Cinder | New | Undecided | Unassigned |
Bug Description
Upgrading from the SMI-S based driver to the RESTAPI based driver fails.
Seamless upgrades from an SMI-S based driver to a RESTAPI based driver, following the setup instructions above, are supported with a few exceptions:
1. Live migration functionality will not work on already attached/in-use legacy volumes. These volumes will first need to be detached and reattached using the RESTAPI based driver. This is because we have changed the masking view architecture from Pike to better support this functionality.
2. Consistency groups are deprecated in Pike. Generic Volume Groups are supported from Pike onwards.
The problem is with #1: detaching the volume so that it can be attached again later into a new masking view with cascaded storage groups. I send the terminate_connection request and get this warning:
2019-02-04 17:24:48.448 128227 WARNING cinder.
But this warning is not true: the volume is in a legacy masking view.
Our server hostnames are longer than 16 characters but shorter than the number of characters allowed by the SMI-S driver. For example, the connector may contain 'host': 'csky-old-
This is 26 characters long.
Tags: dell drivers powermax vmax
Code from Queens: In File (cinder-stable-queens/cinder/volume/drivers/dell_emc/vmax/utils.py Line 253)

def generate_unique_trunc_host(self, host_name):
    """Create a unique short host name under 16 characters.
    :param host_name: long host name
    :returns: truncated host name
    """
    if host_name and len(host_name) > 16:
    ...
Similar code from Ocata (SMI-S) level: In File (cinder-stable-ocata/cinder/volume/drivers/dell_emc/vmax/utils.py Line 2547)

def generate_unique_trunc_host(self, hostName):
    """Create a unique short host name under 40 chars
    :param sgName: long storage group name
    :returns: truncated storage group name
    """
    if hostName and len(hostName) > 38:
    ....
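The two length guards quoted above can be contrasted directly. This is a hedged illustration: the 38- and 16-character thresholds come from the driver code shown, but the helper below is a simplification of the length check only, not the drivers' actual truncation logic.

```python
OCATA_LIMIT = 38   # SMI-S driver only truncates names longer than 38 chars
QUEENS_LIMIT = 16  # RESTAPI driver truncates names longer than 16 chars

def needs_truncation(host_name, limit):
    """Mirror the drivers' length guard: mangle only names past the limit."""
    return bool(host_name) and len(host_name) > limit

host = "csky-old-hostname-26-chars"  # a 26-character host name, as in this bug
assert len(host) == 26
print(needs_truncation(host, OCATA_LIMIT))   # False: SMI-S kept the full name
print(needs_truncation(host, QUEENS_LIMIT))  # True: REST driver shortens it
```

So any host name between 17 and 38 characters is left intact by the old driver but mangled by the new one, which is exactly the window our hostnames fall into.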
It seems this is what causes the masking view not to be found. In fc->terminate_connection(), there is: In File (cinder-stable-queens/cinder/volume/drivers/dell_emc/vmax/fc.py Line 292)

    if connector:
        zoning_mappings = self._get_zoning_mappings(volume, connector)
        if zoning_mappings:
            self.common.terminate_connection(volume, connector)
            data = self._cleanup_zones(zoning_mappings)
    return data
If no zoning mappings are found, it skips calling the terminate_connection flow entirely. So it responds as though the detach worked, but it does nothing.
The reason is that _get_zoning_mappings() eventually gets to _get_masking_views_from_volume() to do the lookup. Since the 'host' has been shortened (mangled), the host comparison logic runs but never evaluates to true: the old masking view name contains the full 26-character host name, which does not match the now 16-character host name.

    if host_compare:
        if host.lower() in mv.lower():
            maskingview_list.append(mv)
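The failed substring test can be reproduced in miniature. The names below are hypothetical, and the mangling is a simplified stand-in for the driver's md5-based scheme (last few characters plus a hash fragment); the exact scheme in the driver differs, but in both cases the short name is not a substring of the legacy masking view name.

```python
import hashlib

# Hypothetical legacy masking view created by the SMI-S driver, with the
# full 26-character host name embedded in it.
full_host = "csky-old-hostname-26-chars"
legacy_mv = "OS-" + full_host + "-I-MV"

# Simplified stand-in for the REST driver's 16-char mangling (not the
# actual driver code): last 6 chars of the host plus an md5 fragment.
short_host = full_host[-6:] + hashlib.md5(full_host.encode()).hexdigest()[:10]

# The lookup from _get_masking_views_from_volume(), as quoted above:
maskingview_list = []
host_compare = True
if host_compare:
    if short_host.lower() in legacy_mv.lower():
        maskingview_list.append(legacy_mv)

print(maskingview_list)  # [] -> no zoning mappings, terminate flow is skipped
```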
This suggests the workaround of passing an empty 'host' value in the connector. I tried this, and the flow deadlocks; the analysis follows:
common._remove_members() -> masking.remove_and_reset_members() -> masking._cleanup_deletion()
- Loop over storage groups because no 'host':
-->masking.remove_volume_from_sg(storagegroup_name=OS-no_SLO-SG) [lock on OS-no_SLO-SG]
-->masking.do_remove_volume_from_sg(mv-sg)
-->masking.multiple_vols_in_sg()
-->masking.add_volume_to_default_storage_group(src_sg=<dft-sg>) [move=true flow]
-->masking.get_or_create_default_storage_group()
-->masking._move_vol_to_default_sg() [already there, deadlocks on OS-no_SLO-SG because that lock is already held]
-->rest.move_volume_between_storage_groups()
It tries to operate on the default storage group in a nested fashion, causing the deadlock.
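The nested-lock pattern described above can be sketched in miniature. This is not the driver's actual locking code; sg_lock stands in for the per-storage-group lock on OS-no_SLO-SG, and a non-blocking acquire is used only so the demo reports the problem instead of hanging as the real (blocking) acquire does.

```python
import threading

sg_lock = threading.Lock()  # non-reentrant, like a per-storage-group lock

def move_vol_to_default_sg():
    # Inner step tries to take the same storage-group lock again.
    if not sg_lock.acquire(blocking=False):
        return "deadlock: OS-no_SLO-SG lock already held"
    try:
        return "moved"
    finally:
        sg_lock.release()

def remove_volume_from_sg():
    with sg_lock:                        # outer flow holds the lock...
        return move_vol_to_default_sg()  # ...then re-enters for the default SG

print(remove_volume_from_sg())
```

Because threading.Lock is not reentrant, the inner acquire can never succeed while the outer flow still holds the lock, which is the shape of the deadlock seen in the loop over storage groups.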
Therefore, it appears the driver will need to be fixed for the original case of passing the host on the connector so that the terminate flow is not skipped.