heat engine can not report status after db restart

Bug #1464188 reported by Ethan Lynn
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Heat
Fix Released
Medium
Ethan Lynn
Kilo
Fix Released
Undecided
Unassigned

Bug Description

After stoping an db and then restart it, heat service-list always shows engine status as down, unless heat-engine restart.

2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall Traceback (most recent call last):
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/opt/stack/heat/heat/openstack/common/loopingcall.py", line 81, in _inner
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall self.f(*self.args, **self.kw)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall return f(*args, **kwargs)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/opt/stack/heat/heat/engine/service.py", line 1611, in service_manage_report
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall dict(deleted_at=None))
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/opt/stack/heat/heat/objects/service.py", line 74, in update_by_id
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall db_api.service_update(context, service_id, values))
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/opt/stack/heat/heat/db/api.py", line 362, in service_update
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall return IMPL.service_update(context, service_id, values)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/opt/stack/heat/heat/db/sqlalchemy/api.py", line 981, in service_update
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall service = service_get(context, service_id)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/opt/stack/heat/heat/db/sqlalchemy/api.py", line 999, in service_get
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall result = model_query(context, models.Service).get(service_id)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 840, in get
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall return loading.load_on_ident(self, key)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/loading.py", line 235, in load_on_ident
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall return q.one()
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2398, in one
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall ret = list(self)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2441, in __iter__
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall return self._execute_and_instances(context)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2454, in _execute_and_instances
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall close_with_result=True)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2445, in _connection_from_session
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall **kw)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 880, in connection
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall execution_options=execution_options)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 887, in _connection_for_bind
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall conn = engine.contextual_connect(**kw)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/compat/handle_error.py", line 104, in contextual_connect
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall **kwargs)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 85, in __init__
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall self.dispatch.engine_connect(self, _branch)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/event/attr.py", line 260, in __call__
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall fn(*args, **kw)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/session.py", line 345, in _connect_ping_listener
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall connection.scalar(select([1]))
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 771, in scalar
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall return self.execute(object, *multiparams, **params).scalar()
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 841, in execute
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall return meth(self, multiparams, params)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 322, in _execute_on_connection
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall return connection._execute_clauseelement(self, multiparams, params)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 938, in _execute_clauseelement
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall compiled_sql, distilled_params
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1005, in _execute_context
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall None, None)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/compat/handle_error.py", line 152, in _handle_dbapi_exception
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall e, self.dialect, self.engine)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/compat/handle_error.py", line 218, in _handle_dbapi_exception_noconnection
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall six.reraise(type(newraise), newraise, exc_info[2])
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 999, in _execute_context
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall conn = self._revalidate_connection()
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/compat/handle_error.py", line 164, in _revalidate_connection
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall _connection=self)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/compat/handle_error.py", line 119, in raw_connection
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall self.pool.unique_connection, _connection)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/compat/handle_error.py", line 115, in _wrap_pool_connect
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall six.reraise(*sys.exc_info())
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/compat/handle_error.py", line 109, in _wrap_pool_connect
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall return fn()
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 280, in unique_connection
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall return _ConnectionFairy._checkout(self)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 645, in _checkout
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall fairy = _ConnectionRecord.checkout(pool)
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 442, in checkout
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall dbapi_connection = rec.get_connection()
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 526, in get_connection
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall self.connection = self.__connect()
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 539, in __connect
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall connection = self.__pool._creator()
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/compat/handle_error.py", line 94, in connect
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall raise original_exception
2015-06-11 15:56:15.542 TRACE heat.openstack.common.loopingcall DBConnectionError: (OperationalError) (2003, 'Can\'t connect to MySQL server on \'127.0.0.1\' (111 "Connection refused")') None None

[elynn@localhost ~]$ heat service-list
+-----------------------+-------------+--------------------------------------+-----------------------+--------+----------------------------+--------+
| hostname | binary | engine_id | host | topic | updated_at | status |
+-----------------------+-------------+--------------------------------------+-----------------------+--------+----------------------------+--------+
| localhost.localdomain | heat-engine | 1aec1861-ca56-4adc-8037-e87a81c98a40 | localhost.localdomain | engine | 2015-06-11T08:24:06.000000 | down |
| localhost.localdomain | heat-engine | 4a9b1cd9-a113-47b6-af23-9c5ec33ddedd | localhost.localdomain | engine | 2015-06-11T08:23:41.000000 | down |
| localhost.localdomain | heat-engine | 8f3f2fc3-b990-497c-bf53-6eafa88681b1 | localhost.localdomain | engine | 2015-06-11T08:23:41.000000 | down |
| localhost.localdomain | heat-engine | a518dff1-eddc-4c6e-bff3-1c3d91d43a73 | localhost.localdomain | engine | 2015-06-11T08:23:41.000000 | down |
| localhost.localdomain | heat-engine | abfededf-7938-436f-ae91-c038e19cda9c | localhost.localdomain | engine | 2015-06-11T08:23:41.000000 | down |
| localhost.localdomain | heat-engine | b931cdc2-0289-4bcf-82ba-6191c4c8e33c | localhost.localdomain | engine | 2015-06-11T08:24:06.000000 | down |
| localhost.localdomain | heat-engine | c22a77a4-6bf0-4c27-95e8-1a4b52e8964e | localhost.localdomain | engine | 2015-06-11T08:24:06.000000 | down |
| localhost.localdomain | heat-engine | c9dcc079-4c98-41f5-b48c-103df8c75886 | localhost.localdomain | engine | 2015-06-11T08:24:06.000000 | down |
+-----------------------+-------------+--------------------------------------+-----------------------+--------+----------------------------+--------+

Although engine status is down, heat stack-list and stack-create works fine. so it's only a bug related service_update function.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/190647

Changed in heat:
assignee: nobody → Ethan Lynn (ethanlynn)
status: New → In Progress
Changed in heat:
importance: Undecided → Medium
Ethan Lynn (ethanlynn)
tags: added: kilo-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/190647
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=4e21b371d85289548c04f50f5614ede70ff1a21f
Submitter: Jenkins
Branch: master

commit 4e21b371d85289548c04f50f5614ede70ff1a21f
Author: Ethan Lynn <email address hidden>
Date: Thu Jun 11 20:09:26 2015 +0800

    Catch exceptions in service status report

    If db throw any exceptions during service status report, service
    will not report its status forever. This patch catch errors during
    status report and log them in to log files.

    Change-Id: I6775b3446caffabd3135d5e24098ab0ee31975ac
    Closes-Bug: #1464188

Changed in heat:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/190935

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/kilo)

Reviewed: https://review.openstack.org/190935
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=8919a3474e12e94e6a5651cc85be4634922f8fe0
Submitter: Jenkins
Branch: stable/kilo

commit 8919a3474e12e94e6a5651cc85be4634922f8fe0
Author: Ethan Lynn <email address hidden>
Date: Thu Jun 11 20:09:26 2015 +0800

    Catch exceptions in service status report

    If db throw any exceptions during service status report, service
    will not report its status forever. This patch catch errors during
    status report and log them in to log files.

    Change-Id: I6775b3446caffabd3135d5e24098ab0ee31975ac
    Closes-Bug: #1464188
    (cherry picked from commit 4e21b371d85289548c04f50f5614ede70ff1a21f)

tags: added: in-stable-kilo
Thierry Carrez (ttx)
Changed in heat:
milestone: none → liberty-1
status: Fix Committed → Fix Released
tags: removed: kilo-backport-potential
Thierry Carrez (ttx)
Changed in heat:
milestone: liberty-1 → 5.0.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.