Ringsync puppet failed during storage node installation in HA deploys

Bug #1218981 reported by Chip
This bug affects 2 people
Affects            Status         Importance   Assigned to   Milestone
Cisco Openstack    Confirmed      Medium       Chip
Grizzly            Fix Released   Medium       Chip

Bug Description

During installation of Swift storage nodes, a puppet failure occurs.

err: Could not retrieve catalog from remote server: Error 400 on SERVER: Exported resource Swift::Ringsync[account] cannot override local resource on node <node_name>

The current HA documentation suggests commenting out
Swift::Ringsync<<||>> in swift-storage.pp.

Commenting this out is not a solution, as it disables the ring sync and causes start-up failures on the storage nodes.

Manually copying the rings from the proxy server allows the nodes to operate properly.
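
For context, a rough sketch of the Puppet pattern that produces this error; the resource title comes from the error message above, but the surrounding declarations and parameter names are assumed, not quoted from the actual manifests:

# On the proxy node, the ring sync is exported for other nodes to collect:
@@swift::ringsync { 'account':
  ring_server => $ring_server_ip,   # assumed parameter name
}

# In swift-storage.pp, the collector realizes every exported ringsync:
Swift::Ringsync<<||>>

# If the storage node's own catalog also declares Swift::Ringsync[account]
# locally, Puppet will not let the exported copy override it and aborts
# catalog compilation with the Error 400 shown above. Removing the
# collector avoids the compile error but leaves the node with no rings.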

Tags: ha
Chip (cbaesema)
Changed in openstack-cisco:
milestone: none → g.2
Revision history for this message
Shweta P (shweta-ap05) wrote :

Observed on my setup too. Copying the rings and restarting all services worked for me as well.

Revision history for this message
Mark T. Voelker (mvoelker) wrote :

Both of these failures were seen on HA deployments, correct?

Revision history for this message
Chip (cbaesema) wrote :

Yes, both of these failures were seen on an HA deployment.

Changing the launch order (proxy then storage, or storage then proxy) makes no difference.

Revision history for this message
Chris Ricker (chris-ricker) wrote :

This also occurs on non-HA. When we run puppet on the proxy, we get:

err: Could not retrieve catalog from remote server: Error 400 on SERVER: Exported resource Ring_object_device[172.29.75.141:6000/sdb] cannot override local resource on node ci-os-sw-proxy1.ctocllab.cisco.com

Revision history for this message
Chris Ricker (chris-ricker) wrote :

Ignore https://bugs.launchpad.net/openstack-cisco/+bug/1218981/comments/4 -- I had a mistake in site.pp

I don't get this issue on non-HA when I have a proper site.pp

Revision history for this message
Mark T. Voelker (mvoelker) wrote :

I can also verify that this doesn't occur in non-HA deployments.

Changed in openstack-cisco:
status: New → Triaged
importance: Undecided → Medium
tags: added: ha
Revision history for this message
Daneyon Hansen (danehans) wrote :

I built my HA environment from scratch last night, using the latest modules and manifests. My Swift nodes were deployed without issue:

From Proxy 1

root@swiftproxy01:~# swift-ring-builder /etc/swift/account.builder
/etc/swift/account.builder, build version 15
262144 partitions, 3.000000 replicas, 1 regions, 3 zones, 15 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices: id region zone ip address port name weight partitions balance meta
             0 1 3 192.168.222.73 6002 sde 1.00 52429 0.00
             1 1 2 192.168.222.72 6002 sdd 1.00 52428 -0.00
             2 1 3 192.168.222.73 6002 sdc 1.00 52429 0.00
             3 1 2 192.168.222.72 6002 sdb 1.00 52429 0.00
             4 1 3 192.168.222.73 6002 sdb 1.00 52429 0.00
             5 1 1 192.168.222.71 6002 sdb 1.00 52429 0.00
             6 1 1 192.168.222.71 6002 sdc 1.00 52429 0.00
             7 1 2 192.168.222.72 6002 sdf 1.00 52429 0.00
             8 1 1 192.168.222.71 6002 sdd 1.00 52429 0.00
             9 1 2 192.168.222.72 6002 sdc 1.00 52429 0.00
            10 1 1 192.168.222.71 6002 sde 1.00 52428 -0.00
            11 1 1 192.168.222.71 6002 sdf 1.00 52429 0.00
            12 1 3 192.168.222.73 6002 sdf 1.00 52429 0.00
            13 1 2 192.168.222.72 6002 sde 1.00 52429 0.00
            14 1 3 192.168.222.73 6002 sdd 1.00 52428 -0.00
root@swiftproxy01:~# swift-ring-builder /etc/swift/container.builder
/etc/swift/container.builder, build version 15
262144 partitions, 3.000000 replicas, 1 regions, 3 zones, 15 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices: id region zone ip address port name weight partitions balance meta
             0 1 2 192.168.222.72 6001 sdb 1.00 52429 0.00
             1 1 1 192.168.222.71 6001 sdc 1.00 52429 0.00
             2 1 3 192.168.222.73 6001 sdf 1.00 52429 0.00
             3 1 3 192.168.222.73 6001 sdb 1.00 52429 0.00
             4 1 3 192.168.222.73 6001 sde 1.00 52429 0.00
             5 1 1 192.168.222.71 6001 sdb 1.00 52429 0.00
             6 1 2 192.168.222.72 6001 sdc 1.00 52429 0.00
             7 1 1 192.168.222.71 6001 sdf 1.00 52429 0.00
             8 1 2 192.168.222.72 6001 sdf 1.00 52429 0.00
             9 1 1 192.168.222.71 6001 sdd 1.00 52429 0.00
            10 1 1 192.168.222.71 6001 sde 1.00 52428 -0.00
            11 1 2 192.168.222.72 6001 sde 1.00 52429 0.00
            12 1 2 192.16...


Revision history for this message
Mark T. Voelker (mvoelker) wrote :

OK, so current status is that:
1.) We don't see this on non-HA
2.) Not everyone sees this on HA

And as a workaround in case you do run into this, you can copy the rings manually by doing something like:

scp /etc/swift/*.gz swift-storage01:/etc/swift/
scp /etc/swift/*.gz swift-storage02:/etc/swift/
scp /etc/swift/*.gz swift-storage03:/etc/swift/

ssh swift-storage01 "swift-init all restart"
ssh swift-storage02 "swift-init all restart"
ssh swift-storage03 "swift-init all restart"

Given all that, I'm going to retarget, as I don't think this is a showstopper for today's scheduled release.

Changed in openstack-cisco:
milestone: g.2 → g.3
Revision history for this message
Chip (cbaesema) wrote :

Pull https://github.com/CiscoSystems/puppet-openstack/pull/29 for testing before submission to upstream.

Revision history for this message
Daneyon Hansen (danehans) wrote :

I think we are missing a dependency. I should be able to have a clean puppet run after the initial storage run and proxy run, but I get the following errors:

err: /Service[swift-container-replicator]/ensure: change from stopped to running failed: Could not start Service[swift-container-replicator]: Execution of '/sbin/start swift-container-replicator' returned 1: at /usr/share/puppet/modules/swift/manifests/storage/generic.pp:61

err: /Service[swift-object-replicator]/ensure: change from stopped to running failed: Could not start Service[swift-object-replicator]: Execution of '/sbin/start swift-object-replicator' returned 1: at /usr/share/puppet/modules/swift/manifests/storage/generic.pp:61

err: /Service[swift-account-replicator]/ensure: change from stopped to running failed: Could not start Service[swift-account-replicator]: Execution of '/sbin/start swift-account-replicator' returned 1: at /usr/share/puppet/modules/swift/manifests/storage/generic.pp:61

err: /Service[swift-container-sync]/ensure: change from stopped to running failed: Could not start Service[swift-container-sync]: Execution of '/sbin/start swift-container-sync' returned 1: at /usr/share/puppet/modules/swift/manifests/storage/container.pp:45

notice: /Stage[main]/Openstack::Swift::Storage-node/Swift::Ringsync[account]/Rsync::Get[/etc/swift/account.ring.gz]/Exec[rsync /etc/swift/account.ring.gz]/returns: executed successfully

notice: /Stage[main]/Openstack::Swift::Storage-node/Swift::Ringsync[object]/Rsync::Get[/etc/swift/object.ring.gz]/Exec[rsync /etc/swift/object.ring.gz]/returns: executed successfully

notice: /Stage[main]/Openstack::Swift::Storage-node/Swift::Ringsync[container]/Rsync::Get[/etc/swift/container.ring.gz]/Exec[rsync /etc/swift/container.ring.gz]/returns: executed successfully

After I perform a second puppet run on the storage nodes, the errors clear up.
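
If this is just an ordering problem, a fix would presumably add a dependency so the ring sync completes before the storage services are started. A rough sketch (the service titles are taken from the errors above; the chaining itself is an assumption about the fix, not quoted from any patch):

# Ensure the collected ring files are in place before the affected
# services are started, so the first puppet run comes up clean:
Swift::Ringsync<<||>> ->
Service['swift-account-replicator',
        'swift-container-replicator',
        'swift-object-replicator',
        'swift-container-sync']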

Changed in openstack-cisco:
assignee: nobody → Chip (cbaesema)
status: Triaged → In Progress
Revision history for this message
Chip (cbaesema) wrote :

Upstream Bug: puppet-openstack 1224592

Revision history for this message
Chip (cbaesema) wrote :

Upstream proposed patch 47085

Revision history for this message
Mark T. Voelker (mvoelker) wrote :
summary: - Ringsync puppet failured during storage node installation
+ Ringsync puppet failured during storage node installation in HA deploys
summary: - Ringsync puppet failured during storage node installation in HA deploys
+ Ringsync puppet failed during storage node installation in HA deploys
Changed in openstack-cisco:
status: In Progress → Fix Committed
Revision history for this message
Shweta P (shweta-ap05) wrote :

I see this on an i.1 full HA setup as well.

My workaround was to add this line to user.full_ha.yaml, setting ring_server to the real IP of the first swift proxy server (not the VIP):
openstack::swift::storage-node::ring_server: 172.29.XX.XXX

If this additional line is not added, the storage nodes try to reach the VIP to copy over the common files, but they really need to point at one of the swift proxy servers.

Once this was done, subsequent puppet runs on the storage nodes were able to copy over the expected files.
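
For what it's worth, a minimal sketch of where that hiera key lands, based on the Swift::Ringsync/Rsync::Get resource paths in the log above (the class and parameter structure is assumed, not copied from the module):

# user.full_ha.yaml sets openstack::swift::storage-node::ring_server,
# which ends up as the rsync source the storage node pulls *.ring.gz
# from. Pointing it at a real proxy IP rather than the VIP makes that
# rsync succeed.
class { 'openstack::swift::storage-node':
  ring_server => '172.29.XX.XXX',   # first swift proxy's real IP, not the VIP
}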

Changed in openstack-cisco:
status: Fix Released → Confirmed
milestone: g.3 → i.1