[library] Rabbit error: Generic server <0.4361.0> terminating

Bug #1346163 reported by Anastasia Palkina
This bug affects 2 people
Affects             Status         Importance  Assigned to                Milestone
Fuel for OpenStack  Fix Committed  Medium      Fuel Library (Deprecated)
5.0.x               Won't Fix      Medium      Fuel Library (Deprecated)

Bug Description

"build_id": "2014-07-17_00-31-14",
"mirantis": "yes",
"build_number": "134",
"ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f",
"nailgun_sha": "1d08d6f80b6514085dd8c0af4d437ef5d37e2802",
"production": "docker",
"api": "1.0",
"fuelmain_sha": "069686abb90f458f67cfcb4018cacc19971e4b4d",
"astute_sha": "9a74b788be9a7c5682f1c52a892df36e4766ce3f",
"release": "5.0.1",
"fuellib_sha": "2d1e1369c13bc9771e9473086cb064d257a21fc2"

1. Create new environment (CentOS, HA mode)
2. Choose GRE segmentation
3. Choose Ceph for volumes
4. Add 3 controllers+ceph, compute
5. Start deployment
6. RabbitMQ died

There are errors in /var/log/docker-logs/rabbitmq/rabbit@7fe1ee63962c.log:

=ERROR REPORT==== 18-Jul-2014::14:50:52 ===
** Generic server <0.4361.0> terminating
** Last message in was {'$gen_cast',client_timeout}
** When Server state == {state,"session-qtE5Tt2l41drxk3bOuaTXg",<0.4371.0>,
                         <0.4365.0>,
                         {dict,12,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[[<<"T_mcollective_broadcast_discovery">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/discovery",
                               <0.4371.0>,auto,false,
                               "id='mcollective_broadcast_discovery'"}],
                             [<<"T_mcollective_broadcast_uploadfile">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/uploadfile",
                               <0.4371.0>,auto,false,
                               "id='mcollective_broadcast_uploadfile'"}]],
                            [],
                            [[<<"T_mcollective_broadcast_erase_node">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/erase_node",
                               <0.4371.0>,auto,false,
                               "id='mcollective_broadcast_erase_node'"}],
                             [<<"T_mcollective_broadcast_puppetsync">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/puppetsync",
                               <0.4371.0>,auto,false,
                               "id='mcollective_broadcast_puppetsync'"}],
                             [<<"T_mcollective_broadcast_execute_shell_command">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/execute_shell_command",
                               <0.4371.0>,auto,false,
                               "id='mcollective_broadcast_execute_shell_command'"}]],
                            [],[],[],
                            [[<<"T_mcollective_broadcast_fake">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/fake",
                               <0.4371.0>,auto,false,
                               "id='mcollective_broadcast_fake'"}]],
                            [[<<"T_mcollective_12_directed_to_identity">>|
                              {subscription,
                               "/exchange/mcollective_directed/12",<0.4371.0>,
                               auto,false,
                               "id='mcollective_12_directed_to_identity'"}]],
                            [[<<"T_mcollective_broadcast_systemtype">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/systemtype",
                               <0.4371.0>,auto,false,
                               "id='mcollective_broadcast_systemtype'"}]],
                            [[<<"T_mcollective_broadcast_rpcutil">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/rpcutil",
                               <0.4371.0>,auto,false,
                               "id='mcollective_broadcast_rpcutil'"}],
                             [<<"T_mcollective_broadcast_mcollective">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/mcollective",
                               <0.4371.0>,auto,false,
                               "id='mcollective_broadcast_mcollective'"}]],
                            [[<<"T_mcollective_broadcast_puppetd">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/puppetd",
                               <0.4371.0>,auto,false,
                               "id='mcollective_broadcast_puppetd'"}]],
                            [],[],[],
                            [[<<"T_mcollective_broadcast_net_probe">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/net_probe",
                               <0.4371.0>,auto,false,
                               "id='mcollective_broadcast_net_probe'"}]],
                            []}}},
                         "1.1",#Fun<rabbit_stomp_reader.1.84385891>,undefined,
                         {stomp_configuration,"guest","guest",false,false},
                         {set,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         #Fun<rabbit_stomp_processor.5.11954007>,
                         {amqp_adapter_info,
                          {0,0,0,0,0,65535,44049,3},
                          61613,
                          {0,0,0,0,0,65535,44049,10753},
                          47328,<<"172.17.42.1:47328 -> 172.17.0.3:61613">>,
                          {'STOMP',0},
                          [{ssl,false}]},
                         #Fun<rabbit_stomp_reader.0.123360627>,none}
** Reason for termination ==
** client_timeout
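
The client_timeout above is the missed-heartbeat path of RabbitMQ's STOMP adapter (port 61613, as the adapter info in the state dump shows): the client negotiates heartbeats in the CONNECT frame's heart-beat header, and when its beats stop arriving the broker ends the session, which in this RabbitMQ version surfaces as the crash report above (the fix discussion further down confirms this reading). What follows is a minimal Python sketch of that negotiation, assuming a broker on localhost:61613 with the guest/guest credentials from the state dump; Fuel's real client is MCollective's Ruby STOMP connector, so everything here is illustrative only.

# Minimal STOMP 1.1 heartbeat sketch (illustrative only; Fuel's real client
# is MCollective's Ruby STOMP connector). Assumes a broker on
# localhost:61613 accepting the guest/guest credentials from the state dump.
import socket
import time

def frame(command, headers, body=""):
    # A STOMP frame is: command line, header lines, blank line, body, NUL.
    lines = [command] + [f"{k}:{v}" for k, v in headers.items()]
    return ("\n".join(lines) + "\n\n" + body + "\x00").encode()

sock = socket.create_connection(("localhost", 61613))

# heart-beat:cx,cy -- the client promises a beat at least every cx ms and
# asks the server to beat every cy ms. A client that negotiates this and
# then goes silent is exactly what the broker kills with client_timeout.
sock.sendall(frame("CONNECT", {
    "accept-version": "1.1",
    "host": "/",
    "login": "guest",
    "passcode": "guest",
    "heart-beat": "5000,5000",
}))
print(sock.recv(4096).decode())  # expect a CONNECTED frame back

# Outside of frames, a single end-of-line counts as a heartbeat, so keeping
# the session alive is just a matter of sending newlines on schedule.
for _ in range(3):
    time.sleep(4)  # stay under the promised 5-second interval
    sock.sendall(b"\n")

sock.sendall(frame("DISCONNECT", {}))
sock.close()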

Tags: library
Changed in fuel:
milestone: 5.0.2 → 5.0.1
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Is the reported issue related to the OpenStack nodes' RabbitMQ or to the Fuel master node's (whose log did you quote in the ticket)?

Changed in fuel:
importance: High → Medium
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

It is the master node, obviously.

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

There are also Ceph errors in logs:
2014-07-18T14:48:50.820762 node-11 ./node-11.domain.tld/puppet-apply.log:2014-07-18T14:48:50.820762+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-11][ERROR ] INFO:ceph-disk:re-reading known partitions will display errors
2014-07-18T14:48:52.446200 node-10 ./node-10.domain.tld/puppet-apply.log:2014-07-18T14:48:52.446200+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-10][ERROR ] INFO:ceph-disk:re-reading known partitions will display errors
2014-07-18T14:48:52.488495 node-10 ./node-10.domain.tld/puppet-apply.log:2014-07-18T14:48:52.488495+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-10][ERROR ] INFO:ceph-disk:re-reading known partitions will display errors
2014-07-18T14:49:00.054663 node-9 ./node-9.domain.tld/puppet-apply.log:2014-07-18T14:49:00.054663+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-9][ERROR ] INFO:ceph-disk:re-reading known partitions will display errors
2014-07-18T14:49:00.108285 node-9 ./node-9.domain.tld/puppet-apply.log:2014-07-18T14:49:00.108285+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-9][ERROR ] INFO:ceph-disk:re-reading known partitions will display errors
2014-07-18T14:49:17.899223 node-9 ./node-9.domain.tld/puppet-apply.log:2014-07-18T14:49:17.899223+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) cmd=cmd, ret=ret, out=out, err=err)
2014-07-18T14:49:17.990683 node-9 ./node-9.domain.tld/puppet-apply.log:2014-07-18T14:49:17.990683+00:00 err: ceph-deploy osd activate node-9:/dev/sdb4 node-9:/dev/sdc4 returned 1 instead of one of [0]

Dmitry Ilyin (idv1985)
summary: - Rabbit error: Generic server <0.4361.0> terminating
+ [library] Rabbit error: Generic server <0.4361.0> terminating
tags: added: library
Changed in fuel:
milestone: 5.0.1 → 5.0.2
Revision history for this message
Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #366
"build_id": "2014-07-28_02-01-14",
"ostf_sha": "8c328521b1444f22c50463b9432193e20ed33813",
"build_number": "366",
"auth_required": true,
"api": "1.0",
"nailgun_sha": "83cc9ed44ebc8dd97248483b6d414ebbc4cff3c0",
"production": "docker",
"fuelmain_sha": "9adfbf5a52cedbdd16ec1a74f6c44c5b3419b87c",
"astute_sha": "aa5aed61035a8dc4035ab1619a8bb540a7430a95",
"feature_groups": ["mirantis"],
"release": "5.1",
"fuellib_sha": "d1c7f67b3cf51978d3178c8666ea398f2477dcb5"

At the same time I started 3 deployments: 2 HA and 1 simple. The simple one was successful, but the HA deployments hung because of an error with RabbitMQ.

=ERROR REPORT==== 28-Jul-2014::13:50:28 ===
** Generic server <0.3263.0> terminating
** Last message in was {'$gen_cast',client_timeout}
** When Server state == {state,"session-JhSE8VZM7fkb-nEmFuh_ew",<0.3273.0>,
                         <0.3267.0>,
                         {dict,12,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[[<<"T_mcollective_broadcast_discovery">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/discovery",
                               <0.3273.0>,auto,false,
                               "id='mcollective_broadcast_discovery'"}],
                             [<<"T_mcollective_broadcast_uploadfile">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/uploadfile",
                               <0.3273.0>,auto,false,
                               "id='mcollective_broadcast_uploadfile'"}]],
                            [[<<"T_mcollective_4_directed_to_identity">>|
                              {subscription,
                               "/exchange/mcollective_directed/4",<0.3273.0>,
                               auto,false,
                               "id='mcollective_4_directed_to_identity'"}]],
                            [[<<"T_mcollective_broadcast_erase_node">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/erase_node",
                               <0.3273.0>,auto,false,
                               "id='mcollective_broadcast_erase_node'"}],
                             [<<"T_mcollective_broadcast_puppetsync">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/puppetsync",
                               <0.3273.0>,auto,false,
                               "id='mcollective_broadcast_puppetsync'"}],
                             [<<"T_mcollective_broadcast_execute_shell_command">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/execute_shell_command",
                               <0.3273.0>,auto,false,
                               "id='mcollective_broadcast_execute_shell_command'"}]],
                            [],[],[],
                            [[<<"T_mcollective_broadcast_fake">>|
                              {subscription,
                               "/exchange/mcollective_broadcast/fake",
           ...


Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

> There are also Ceph errors in logs:
Ceph can't impact RabbitMQ (other than starving it of CPU/RAM if it runs amok). If there's a genuine problem with Ceph that can be seen in the logs from this bug but is unrelated to rabbit, please raise a new bug to track that.

Revision history for this message
Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #169
"build_id": "2014-08-11_12-45-06",
"mirantis": "yes",
"build_number": "169",
"ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f",
"nailgun_sha": "04ada3cd7ef14f6741a05fd5d6690260f9198095",
"production": "docker",
"api": "1.0",
"fuelmain_sha": "43374c706b4fdce28aeb4ef11e69a53f41646740",
"astute_sha": "6db5f5031b74e67b92fcac1f7998eaa296d68025",
"release": "5.0.1",
"fuellib_sha": "a31dbac8fff9cf6bc4cd0d23459670e34b27a9ab"

I started 2 deployments at the same time.

First env:
1. Create new environment (Ubuntu, HA mode)
2. Choose GRE segmentation
3. Choose Ceph for images
4. Add 3 controllers, 1 compute, 1 cinder, 2 ceph
5. Start deployment

Second env:
1. Create new environment (Ubuntu, simple mode)
2. Choose nova-network, flat
3. Choose Ceph for volumes
4. Add 1 controller, 1 compute, 2 ceph
5. Start deployment

As a result, RabbitMQ died. There were no Ceph errors this time.

Logs are here: https://drive.google.com/a/mirantis.com/file/d/0B6SjzarTGFxaWXdSZ1B0RnQ1Umc/edit?usp=sharing

Controllers for first env: node-23-29

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/5.1.x
Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

The RabbitMQ upgrade to 3.3.5, tracked in bug #1355947, may help fix this.

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 5.1 → 6.0
Revision history for this message
Dima Shulyak (dshulyak) wrote :

After the new version of RabbitMQ (3.3.5) is installed, this problem should not occur.

If no heartbeat is received, RabbitMQ now handles it and closes the connection:

=WARNING REPORT==== 19-Aug-2014::14:37:51 ===
STOMP detected missed client heartbeat(s) on connection 172.17.42.1:60217 -> 172.17.0.12:61613, closing it

=INFO REPORT==== 19-Aug-2014::14:37:51 ===
closing STOMP connection <0.1628.0> (172.17.42.1:60217 -> 172.17.0.12:61613)

After this happened, the MCollective client successfully reconnected and the deployment continued (a sketch of the retry pattern follows the log excerpt below):

2014-08-19T14:37:11 debug: [431] Retry #1 to run mcollective agent on nodes: '6'
2014-08-19T14:37:54 debug: [431] Retry #2 to run mcollective agent on nodes: '6'
2014-08-19T14:37:54 debug: [431] ee649aa6-4468-48ac-a0b9-0c0e1f7dc3d0: MC agent 'puppetd', method 'last_run_summary', results: {:sender=>"6", :statuscode=>0, :statusmsg=>"O
K", :data=>{:idling=>0, :status=>"running", :runtime=>1408455474, :stopped=>0, :resources=>{"total"=>0, "restarted"=>0, "out_of_sync"=>0, "failed"=>1, "changed"=>0}, :lastrun=>0, :version=>nil, :output=>"Currently running; last completed run 1408455474 seconds ago", :time=>{"last_run"=>0}, :changes=>nil, :running=>1, :enabled=>1, :events=>nil}}

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

We already have RabbitMQ 3.3.5 in master, so I'm setting this bug's status to Fix Committed.

no longer affects: fuel/6.0.x