Wrong symlinks for "bak" directories cause logrotate failures

Bug #1428150 reported by Bogdan Dobrelya
This bug affects 2 people
Affects             Status         Importance  Assigned to               Milestone
Fuel for OpenStack  Fix Committed  High        Sebastian Kalinowski
5.0.x               Won't Fix      High        Fuel Python (Deprecated)
5.1.x               Won't Fix      High        Fuel Python (Deprecated)
6.0.x               Won't Fix      High        Fuel Python (Deprecated)

Bug Description

From original bug https://bugs.launchpad.net/fuel/+bug/1427197

When /var/log/remote contains "*.bak" directories, logrotate may fail with a "File exists" error and stop rotating all log files under the affected /var/log/remote/<symlink> paths, where <symlink> points to a node directory that also has a ".bak"-ed copy.

An example https://launchpadlibrarian.net/199353405/logrotate-bug.txt
Related bug https://review.openstack.org/#/c/138397

Björn Pettersson provided the root cause:
The duplicate symlinks should actually point at the .bak directory and
not where it used to point.

> lrwxrwxrwx 1 root root 29 Feb 2 13:02 /var/log/remote/172.31.2.77 -> node-84.domain.tld
> lrwxrwxrwx 1 root root 29 Feb 2 19:32 /var/log/remote/172.31.2.91 -> node-84.domain.tld

[root@fuel-juno remote]# ssh -q node-84 ifconfig | grep 172.31
          inet addr:172.31.2.91 Bcast:172.31.3.255 Mask:255.255.252.0

The .77 symlink should point to this one:
> drwxr-xr-x 4 root root 4096 Feb 2 19:17 /var/log/remote/node-88.domain.tld.bak

Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
milestone: none → 6.1
importance: Undecided → Medium
Bogdan Dobrelya (bogdando) wrote :

Raised to High, as this symlink issue affects cloud operations in the long term.

summary: - Symlinks for "bak" directories cause logrotate failures
+ Wrong symlinks for "bak" directories cause logrotate failures
Changed in fuel:
status: New → Confirmed
description: updated
Changed in fuel:
importance: Medium → High
tags: added: logging
tags: added: to-be-covered-by-tests
Bogdan Dobrelya (bogdando) wrote :

I reassigned this bug to the Nailgun team, as it seems the duplicate symlinks are created by the nailgun backend after an environment reset/recreate.

Roger Törnström (roger-tornstrom) wrote :

FYI, in case it makes any difference under the hood.

In the case that led to this bug report (https://bugs.launchpad.net/fuel/+bug/1427197), the environment was not reset. We deployed controllers and discovered that Telemetry node(s) could not be deployed from the GUI afterwards, so the controllers were deleted and added again.

Bogdan Dobrelya (bogdando) wrote :

I reproduced this bug again with ISO #153.
It is enough to remove a node from the environment and add another one. There is a chance that the new node receives the same name that was given to the removed node. In that case, the symlink created for the removed node will point to the wrong directory: http://paste.openstack.org/show/189207/

/var/log/remote/10.108.0.3 -> node-1.test.domain.local
/var/log/remote/10.108.0.7 -> node-1.test.domain.local

But the first one should point to the node-1.test.domain.local.bak dir instead

Bogdan Dobrelya (bogdando) wrote :

And here is a test case for this bug:
1) Remove a primary-controller node from the environment, and deploy
2) Add a new controller, and deploy
3) Repeat steps 1-2 until one of the old node names is reused for a new node
4) Verify the output of ls -ld /var/log/remote/* | awk '{ print $11 }' | sort | uniq -d
It should print nothing (test passed); otherwise it prints the names of the affected nodes.
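
The check in step 4 could also be scripted for an automated test. The following is only a minimal sketch, not code from nailgun or fuel-qa: the function name duplicate_symlink_targets is made up for illustration, and it assumes the layout shown in this report (symlinks named after IP addresses that point to node directories in the same directory).

```python
import os
from collections import defaultdict


def duplicate_symlink_targets(remote_dir):
    """Return {target: [link names]} for targets shared by several symlinks.

    Hypothetical helper for illustration; mirrors the
    `ls -ld ... | awk | sort | uniq -d` check from this report.
    """
    links_by_target = defaultdict(list)
    for entry in sorted(os.listdir(remote_dir)):
        path = os.path.join(remote_dir, entry)
        if os.path.islink(path):
            # readlink returns the raw target, e.g. "node-1.test.domain.local"
            links_by_target[os.readlink(path)].append(entry)
    # Keep only targets that more than one symlink points to
    return {target: links
            for target, links in links_by_target.items()
            if len(links) > 1}
```

Calling duplicate_symlink_targets("/var/log/remote") on the master node should return an empty dict when the test passes. Unlike the shell pipeline, it also reports which symlinks share each target.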

Maciej Kwiek (maciej-iai) wrote :

It takes ages for me to deploy anything. I need to wait until we have our own office server for tests to even reproduce this bug :/

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/6.1.x
Dmitry Pyzhov (dpyzhov)
tags: added: feature-logging
removed: logging
Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/7.0.x
Ivan Kliuk (ivankliuk) wrote :
Bogdan Dobrelya (bogdando) wrote :

@Ivan, do you have this bug in progress? If so, could you please update the status?

Ihor Kalnytskyi (ikalnytskyi) wrote :

@Bogdan, Ivan is on vacation. I think someone else should work on this.

Dmitry Pyzhov (dpyzhov)
tags: added: tricky
tags: added: logrotate
Alexander Bozhenko (alexbozhenko) wrote :

It will probably be addressed if we do not have .bak directories at all:
https://bugs.launchpad.net/fuel/+bug/1428825

Alexander Bozhenko (alexbozhenko) wrote :

Probably, we can address this issue:
 error: error creating output file /var/log/remote/172.31.2.82/mongod.27017.log.1: File exists
By using:
dateformat -%Y%m%d-%s

So, rotated files will look like:
filename.log-20150412-1428814861.gz

Since the Unix timestamp in seconds is unique for each rotation, no 'File exists' errors should occur.
This needs to be tested.
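
To illustrate why the proposed format avoids name clashes, here is a small sketch of how such a rotated name is composed. It is not logrotate code: rotated_name is a hypothetical helper, the date is computed in UTC for reproducibility (logrotate itself uses the system's local time), and in logrotate the dateformat directive only takes effect together with dateext.

```python
from datetime import datetime, timezone


def rotated_name(log_name, rotated_at, compress=True):
    """Build a dateext-style file name for the "-%Y%m%d-%s" format.

    Hypothetical helper for illustration. `rotated_at` is the rotation
    time as a Unix timestamp in seconds.
    """
    day = datetime.fromtimestamp(rotated_at, tz=timezone.utc).strftime("%Y%m%d")
    # "%s" is expanded by hand: it is the epoch time in whole seconds
    suffix = "-{}-{}".format(day, int(rotated_at))
    return log_name + suffix + (".gz" if compress else "")


print(rotated_name("filename.log", 1428814861))
# prints: filename.log-20150412-1428814861.gz
```

Two rotations get different names as long as they do not happen within the same second. This only works around the name clash; the duplicate symlinks would still be there.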

Bogdan Dobrelya (bogdando) wrote :

IMHO, prepare_syslog_dir() should be fixed so that it does not create duplicate symlinks pointing to the same directory, AND it must be covered by tests; there are none yet for this method.
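
One possible shape of such a fix, sketched under assumptions: when a node directory is renamed to its ".bak" copy, every symlink that still names the old directory is repointed to the ".bak" one. The helper backup_node_dir below is hypothetical and is not the actual prepare_syslog_dir() code; it assumes relative symlink targets as in the listings above, and that no ".bak" directory with content already exists (os.rename would fail in that case).

```python
import os


def backup_node_dir(remote_dir, node_name):
    """Rename <node_name> to <node_name>.bak and repoint its symlinks.

    Hypothetical helper for illustration, not nailgun code.
    """
    bak_name = node_name + ".bak"
    os.rename(os.path.join(remote_dir, node_name),
              os.path.join(remote_dir, bak_name))
    for entry in os.listdir(remote_dir):
        link = os.path.join(remote_dir, entry)
        # Only touch symlinks that still name the directory just renamed
        if os.path.islink(link) and os.readlink(link) == node_name:
            os.remove(link)
            os.symlink(bak_name, link)
```

With this, the old IP symlink follows the logs into the ".bak" directory, so a new node that reuses the name gets the only symlink pointing to the fresh directory. A unit test could build this layout in a temporary directory and assert on os.readlink() for both symlinks.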

Kamil Sambor (ksambor) wrote :

This bug will be fixed by the patchset for https://bugs.launchpad.net/fuel/+bug/1428825, because it also removes logs for deleted nodes, so the situation described in this bug shouldn't occur.

Bogdan Dobrelya (bogdando) wrote :

@Kamil, this bug is not about environment deletion. It is about removing and adding nodes in existing environments. Please confirm that the fix for https://bugs.launchpad.net/fuel/+bug/1428825 would resolve this one as well. Until then, I have removed the duplicate status. Thank you!

Kamil Sambor (ksambor) wrote :

I talked with Sylwester, and we will remove old logs when a node is removed from the cluster.

Bogdan Dobrelya (bogdando) wrote :

Reproduced with ISO #414
[root@nailgun ~]# ls -ld /var/log/remote/* | awk '{ print $11 }' | sort | uniq -d
node-3.test.domain.local
node-4.test.domain.local

As a result, logrotate fails with the "File exists" error mentioned above.

How to reproduce:
1) Deploy 1 controller
2) Add 2 controllers and deploy
3) Add 2 more controllers and deploy
4) Remove 2 controllers and deploy
5) Reset env and repeat steps 1-4 several times

Łukasz Oleś (loles) wrote :

Please provide a snapshot; this looks like a bug in the previous patch.

Bogdan Dobrelya (bogdando) wrote :

I don't have this environment anymore, sorry

Bogdan Dobrelya (bogdando) wrote :

@Lukasz, the scale labs are permanently affected by this issue due to the specific flow they use to add/remove nodes and reset envs, so feel free to investigate. You can contact Leontiy Istomin to get the environment.

Kamil Sambor (ksambor) wrote :

The lab at http://172.16.44.10/ runs an old ISO (build number: 317) without the fix for this problem. We still need to reproduce this bug on a newer ISO that includes the fix (ISO > 383). I changed the status back to Incomplete.

Leontiy Istomin (listomin) wrote :

6.1-425. I deployed an env, deleted it, then created and deployed a new env.

[root@fuel ~]# ls -ld /var/log/remote/* | awk '{ print $11 }' | sort | uniq -d

[root@fuel ~]#

It seems it works fine. Should I check something else to be sure?

Dmitry Pyzhov (dpyzhov) wrote :

Looks like this bug was fixed by this bugfix: https://bugs.launchpad.net/fuel/+bug/1428825

Leontiy Istomin (listomin) wrote :

Reproduced the issue with the 6.1-511 build:
http://paste.openstack.org/show/278338/
[root@fuel ~]# du -hs /var/log/docker-logs/remote/node-39.domain.tld/* | grep G
5.4G /var/log/docker-logs/remote/node-39.domain.tld/keystone-all.log
7.4G /var/log/docker-logs/remote/node-39.domain.tld/neutron-server.log
1.4G /var/log/docker-logs/remote/node-39.domain.tld/nova-api.log

Bogdan Dobrelya (bogdando) wrote :

@Leontiy, and what was the output of
ls -ld /var/log/remote/* | awk '{ print $11 }' | sort | uniq -d
?

Leontiy Istomin (listomin) wrote :

@Bogdan, you can find requested info here: http://paste.openstack.org/show/278338/

Bogdan Dobrelya (bogdando) wrote :

Yes, thank you, didn't notice. This is the case, indeed.

Sebastian Kalinowski (prmtl) wrote :

The fix is ready; now I need to test it to be sure that this issue will not happen when resetting an env. It should be done tomorrow morning (CEST).

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/190965

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/190965
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=f39b95b6dc993ba1224ddfab3d1e8ea0848f87d7
Submitter: Jenkins
Branch: master

commit f39b95b6dc993ba1224ddfab3d1e8ea0848f87d7
Author: Sebastian Kalinowski <email address hidden>
Date: Fri Jun 12 09:36:24 2015 +0200

    Remove old logs when resetting environment

    If logs are left, they cause failing of logratate since
    there are hanging symlinks left.
    It was fixed for removing nodes and fix for resetting must be added.

    Change-Id: I047bf4161cf1f76b30bf7a0f5b6d13207d314019
    Closes-Bug: #1428150

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/6.1)

Fix proposed to branch: stable/6.1
Review: https://review.openstack.org/191001

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/6.1)

Reviewed: https://review.openstack.org/191001
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=86af2b9bf3b9998e576bc5e8c8c7f02cfb49b492
Submitter: Jenkins
Branch: stable/6.1

commit 86af2b9bf3b9998e576bc5e8c8c7f02cfb49b492
Author: Sebastian Kalinowski <email address hidden>
Date: Fri Jun 12 09:36:24 2015 +0200

    Remove old logs when resetting environment

    If logs are left, they cause failing of logratate since
    there are hanging symlinks left.
    It was fixed for removing nodes and fix for resetting must be added.

    Change-Id: I047bf4161cf1f76b30bf7a0f5b6d13207d314019
    Closes-Bug: #1428150
    (cherry picked from commit f39b95b6dc993ba1224ddfab3d1e8ea0848f87d7)

tags: added: long-haul-testing
Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/6.1.x
Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 6.0-updates as we have no delivery channel for Fuel fixes in 6.0
