Share Replication: File-based locks don't provide concurrency control in multi-node/multi-AZ deployments

Bug #1585241 reported by Goutham Pacha Ravi
This bug affects 1 person
Affects: OpenStack Shared File Systems Service (Manila)
Status: Triaged
Importance: Low
Assigned to: Goutham Pacha Ravi

Bug Description

Workflows in the share replication feature introduced in Mitaka are coordinated by the share manager using locks from oslo_concurrency. These are file-based locks, so they only serialize processes that can see the same lock file. If deployers run the manila-share service across multiple controller nodes (a multi-node deployment), each node takes its own local file lock, and the locks provide no mutual exclusion between nodes.

https://github.com/openstack/manila/blob/27b2974/manila/share/manager.py#L119
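To illustrate the failure mode described above, here is a minimal sketch. It is not Manila's code: it uses `fcntl.flock` from the Python standard library as a stand-in for a file-based external lock, and the two temporary directories and the lock name `replica-lock` are hypothetical stand-ins for the lock directories on two controller nodes. It assumes a POSIX system.

```python
import fcntl
import os
import tempfile


def try_lock(lock_dir, name):
    """Try to take an exclusive, non-blocking file lock on <lock_dir>/<name>.

    Returns the open file object on success (keep it open to hold the lock),
    or None if the lock file is already locked by another holder.
    """
    handle = open(os.path.join(lock_dir, name), "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def demo():
    # Hypothetical stand-ins for the local lock directories of two controllers.
    node_a_dir = tempfile.mkdtemp(prefix="node-a-")
    node_b_dir = tempfile.mkdtemp(prefix="node-b-")
    lock_name = "replica-lock"  # hypothetical lock name

    # Each "node" locks a file in its own directory: both succeed, so the
    # critical section is not actually protected across nodes.
    held_a = try_lock(node_a_dir, lock_name)
    held_b = try_lock(node_b_dir, lock_name)
    both_nodes_acquired = held_a is not None and held_b is not None

    # A second attempt against the same directory is refused while the
    # first holder keeps the lock, which is the behavior the workflows need.
    second_on_a = try_lock(node_a_dir, lock_name)
    same_dir_contended = second_on_a is None

    for handle in (held_a, held_b, second_on_a):
        if handle is not None:
            handle.close()  # closing the file releases the lock

    return {
        "both_nodes_acquired": both_nodes_acquired,
        "same_dir_contended": same_dir_contended,
    }


if __name__ == "__main__":
    print(demo())
```

The lock only excludes callers that open the same file; callers on different nodes with separate local lock directories never contend with each other.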

Tags: races
monika (monikaparkar25)
Changed in manila:
assignee: nobody → monika (monikaparkar25)
Changed in manila:
assignee: monika (monikaparkar25) → Goutham Pacha Ravi (gouthamr)
status: New → In Progress
Changed in manila:
importance: Undecided → Medium
Changed in manila:
assignee: Goutham Pacha Ravi (gouthamr) → Tom Barron (tpb)
Changed in manila:
assignee: Tom Barron (tpb) → Goutham Pacha Ravi (gouthamr)
Revision history for this message
Goutham Pacha Ravi (gouthamr) wrote :

After much deliberation and multiple design summit discussions, our impression is that this should not be treated as a bug. We will introduce a mechanism to provide distributed lock management underneath manila. In the meantime, deployers have the option of using file locks that live on a shared file system accessible to the distributed services.

The tooz abstraction is proposed here: https://review.openstack.org/#/c/318336

It will not be backported to the releases where this "bug" exists.

Please note that tooz can also be used with no further configuration; that is, it defaults to file locks, which is the current behavior, so the share replication feature continues to work as intended.

Changed in manila:
status: In Progress → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to manila (master)

Reviewed: https://review.openstack.org/318336
Committed: https://git.openstack.org/cgit/openstack/manila/commit/?id=02ab18c5dfa8d49631052d9951b27d57a5340968
Submitter: Jenkins
Branch: master

commit 02ab18c5dfa8d49631052d9951b27d57a5340968
Author: Goutham Pacha Ravi <email address hidden>
Date: Tue May 17 10:55:20 2016 -0400

    Tooz integration

    Manila currently uses file locks from oslo_concurrency to
    coordinate operations racing with each other to perform a
    particular action. In many situations, deployers may need a
    distributed lock to a local file lock (or even a file lock living on
    a shared file system). This need is accentuated if they were running
    Manila services in HA or if they were using Share Replication across
    AZs where manila-share services were running off different controllers
    that would not be able to share a common oslo_concurrency
    file lock or be protected against service/lock management failures.

    Integrate Tooz library with helper methods to create a locking
    coordinator and allow deployers to make the choice between file
    and distributed locks.

    Start the manila share service with Tooz backed coordination.

    Replace the locks used for Share Replication work-flows in the
    share manager to use Tooz based locks.

    Co-Authored-By: Goutham Pacha Ravi <email address hidden>
    Co-Authored-By: Szymon Wroblewski <email address hidden>
    Co-Authored-By: Tom Barron <email address hidden>

    Related-Bug: #1585241
    Partially-implements: bp distributed-locking-with-tooz
    Change-Id: I710e86bd42034fa3b93b87ff77fa48ada8661168

Tom Barron (tpb)
tags: added: races
Revision history for this message
Jason Grosso (jgrosso) wrote : Re: File-based locks don't provide concurrency over multi-node/multi-AZ deployments

Goutham, any update on this defect?

Revision history for this message
Goutham Pacha Ravi (gouthamr) wrote :

Thanks for checking, Jason. These file locks have still not been replaced by tooz based locks. There is going to be a general effort to replace all file locks used by the core share manager service, moving from oslo_concurrency to tooz based locks that can be backed by a distributed lock management system (such as tooz+etcd or tooz+zookeeper). We're going to discuss this at the Denver PTG in April 2019 [1]. I'll take an action item to come back and update this bug report.

[1] https://etherpad.openstack.org/p/manila-denver-train-ptg-planning

Revision history for this message
Goutham Pacha Ravi (gouthamr) wrote :

This bug has been added to the work items for Train. Please see this wiki for progress on these items, and for whether this one will be acked/completed in the Train release: https://wiki.openstack.org/wiki/Manila/TrainCycle#Active-Active_Share_Service

Jason Grosso (jgrosso)
Changed in manila:
milestone: none → train-rc1
Revision history for this message
Jason Grosso (jgrosso) wrote :

Goutham, is this pushed out to Ussuri?

Revision history for this message
Goutham Pacha Ravi (gouthamr) wrote :

Yes, thanks for checking, Jason!

Changed in manila:
milestone: train-rc1 → ussuri-3
Changed in manila:
milestone: ussuri-3 → victoria-rc1
Revision history for this message
Vida Haririan (vhariria) wrote :
Changed in manila:
milestone: victoria-rc1 → wallaby-1
summary: - File-based locks don't provide concurrency over multi-node/multi-AZ
- deployments
+ Share Replication: File-based locks don't provide concurrency control in
+ multi-node/multi-AZ deployments
Changed in manila:
milestone: wallaby-1 → wallaby-3
Changed in manila:
milestone: wallaby-3 → none
Changed in manila:
importance: Medium → Low
milestone: none → zed-3
Changed in manila:
milestone: zed-3 → antelope-1
Changed in manila:
milestone: antelope-1 → antelope-2
Changed in manila:
milestone: antelope-2 → antelope-rc1
Changed in manila:
milestone: antelope-rc1 → bobcat-1
Revision history for this message
MichaelWik (michael-wik1122) wrote :
Changed in manila:
milestone: bobcat-1 → bobcat-2
Changed in manila:
milestone: bobcat-2 → bobcat-3
Changed in manila:
milestone: bobcat-3 → caracal-1