sshfs mount locked up after suspend

Bug #388419 reported by Bogdan Butnaru
This bug affects 36 people
Affects: sshfs-fuse (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

Binary package hint: sshfs

Hello!

I have a server on which most of my media resides, and I'm using SSHFS to mount its drives on my desktop computer. The commands I use are like this (tanelorn is the server):

sshfs -o reconnect,transform_symlinks,allow_other,nonempty,fsname='tanelorn:/mnt/corum' tanelorn:/mnt/corum /media/corum

This works nicely in general. However, strange things happen after a suspend: For instance, usually I have Amarok playing in the background; I pause it before I suspend.

If I resume playing after the computer wakes up, Amarok will keep playing for a while, then it freezes. (Presumably, when its buffers empty.) For some reason, the SSHFS mountpoints are in a kind of “frozen” state. If I try to use them in any other application (open in Nautilus, ls in terminal), that app freezes too. Unmount doesn't work (it says the mounts are in use), and I have to kill all SSHFS processes. When I do that, sometimes the apps will wake up, sometimes they remain frozen; today Amarok turned unkillable (kill -9 didn't do anything), but while I wrote this report it disappeared.

After I kill SSHFS I can simply remount and things work OK until the next suspend.

Another weird thing: For a while I've used afuse to do the SSHFS mounting. (I've stopped because it interferes with the Nautilus Trash can.) The strange thing is that after a suspend Amarok behaved similarly—played whatever it had in the buffer until it exhausted it—but then it skipped to the next track with no lock-up.
---
Architecture: amd64
DistroRelease: Ubuntu 10.04
EcryptfsInUse: Yes
NonfreeKernelModules: nvidia
Package: sshfs 2.2-1build1
PackageArchitecture: amd64
ProcEnviron:
 LANGUAGE=en_US:en
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-21.32-generic 2.6.32.11+drm33.2
Tags: lucid
Uname: Linux 2.6.32-21-generic x86_64
UserGroups: adm admin audio cdrom dialout floppy fuse lpadmin netdev plugdev sambashare scanner staff video

Revision history for this message
Zach Dwiel (zdwiel) wrote :

This also happens to me. Is there any way I can provide more debug information so we can figure out what is going on?

Thanks!

zach

Revision history for this message
Charlie Kravetz (cjkgeek) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. We are sorry that we do not always have the capacity to look at all reported bugs in a timely manner. There have been many changes in Ubuntu since you reported the bug, and your problem may have been fixed by some of the updates. If you could test the current Ubuntu development version, this would help us a lot. If you can test it and it is still an issue, we would appreciate it if you could upload updated logs by running apport-collect 388419, along with any other logs that are relevant to this particular issue.

Changed in sshfs-fuse (Ubuntu):
status: New → Incomplete
Revision history for this message
Eric Drechsel (ericdrex) wrote :

Still present in Lucid beta. This issue is easily reproducible on any system with suspend (just mount a share with sshfs and suspend). I searched for an upstream issue and found this:

http://sourceforge.net/mailarchive/forum.php?<email address hidden>&forum_name=fuse-sshfs

Revision history for this message
Bogdan Butnaru (bogdanb) wrote :

Moving to package “sshfs”, which is the current name in Lucid (it seems it was renamed from “sshfs-fuse” at some point). The bug is still manifest, will attach the apport info right away.

Changed in sshfs-fuse (Ubuntu):
status: Incomplete → New
tags: added: apport-collected
description: updated
Revision history for this message
Bogdan Butnaru (bogdanb) wrote : Dependencies.txt

apport information

description: updated
Revision history for this message
Bogdan Butnaru (bogdanb) wrote :

apport information

Revision history for this message
Bogdan Butnaru (bogdanb) wrote :

I’m sorry, something went wrong on the way:

bogdanb@mabelode:~$ apport-collect 388419
Package sshfs-fuse not installed and no hook available, ignoring

Run like that, it didn’t work, presumably because apport was looking for “sshfs-fuse”, which is the name of the source package, instead of “sshfs”, which is the binary. Should this be reported as a bug against apport?

I then ran “apport-collect --package=sshfs 388419” instead. Something went wrong the first time, so I ran it twice; the second time it seems to have worked.

Revision history for this message
Charlie Kravetz (cjkgeek) wrote :

Thanks for testing this and for the additional documentation. Since this bug has enough information provided for a developer to begin work, I'm going to mark it as confirmed and let them handle it from here. Thanks for taking the time to make Ubuntu better!

Changed in sshfs-fuse (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Paul Heidelman (phissure) wrote :

I just want to confirm that I am having this problem as well.

I use rhythmbox to play music off a sshfs mounted drive (in /media/).
This happens when I forget to close rhythmbox and unmount my sshfs before Suspending.
I cannot unmount the drive, cannot kill my frozen rhythmbox or nautilus (even with -9 and sudo), and must restart the system (I just tried the OP's 'killall sshfs -9', and this fixes it as well).

Fully updated 10.04 on a Dell Latitude laptop.

Revision history for this message
Luke Tidd (lukeisgreat) wrote :

I don't think this is a bug; I don't think there is an easy way for a remotely mounted file system to survive losing network connectivity, but I would love to see a clever solution.

Revision history for this message
axx (axx) wrote :

I get a similar situation with SSHFS and Banshee.

killall sshfs -9 and remounting the share solves it.

And to answer Luke Tidd, I don't remember NFS doing the same after resume. Which actually quite impressed me.

Revision history for this message
Bogdan Butnaru (bogdanb) wrote :

Here’s a work-around for others experiencing this issue, which should help until we solve the problem.

The trick is to automatically unmount sshfs before the network connection goes down, and re-mount it afterwards. (The network goes down right before suspend, and is brought up after resume.)

Download the file I’m attaching to this message.

First of all, open the file in a text-editor and customize it for your needs. WARNING! If you don’t understand what goes on in there and how to customize it, don’t continue! Ask someone knowledgeable to help you first (show them this message). The script as given is configured for my system, it will NOT work on yours. Make sure you understand the comments, not just what to change; the effects might not be what you want, especially for laptops.

Then make the file executable and test it manually (see below). To make it work automatically you need to put it in /etc/network/if-down.d/ and /etc/network/if-up.d/ (put it in one directory, then make a symlink to it in the other; its name doesn’t matter). It should work right away; that is, once the file is correctly customized and installed you can suspend, with no need to reboot first.

You can test it manually by running (as root) commands like below. There’s a line you can edit in the file that directs output to the file of your choice.

# IFACE=eth0 PHASE=down ./sshfs
# IFACE=eth0 PHASE=up ./sshfs
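
For readers who cannot get the attachment, a hook of the kind described above might look roughly like the sketch below. This is NOT the attached script, only a minimal illustration: the REMOTE and MOUNTPOINT defaults are borrowed from the original report, the function names are made up, and the PHASE values "down" and "up" simply mirror the manual test commands above.

```shell
#!/bin/sh
# Hypothetical sketch of a network hook; NOT the script attached to this bug.
# REMOTE and MOUNTPOINT default to the values from the original report
# and must be changed for any other system.
REMOTE="${REMOTE:-tanelorn:/mnt/corum}"
MOUNTPOINT="${MOUNTPOINT:-/media/corum}"

# Unmount before the network goes away. The -z (lazy) flag detaches the
# mount point even if an application still has files open on it.
sshfs_down() {
    fusermount -u -z "$MOUNTPOINT"
}

# Mount again once the network is back.
sshfs_up() {
    sshfs -o reconnect,transform_symlinks,allow_other "$REMOTE" "$MOUNTPOINT"
}

# Dispatch on PHASE as in the manual test commands ("down" / "up").
# With PHASE unset or set to anything else, nothing happens.
case "${PHASE:-}" in
    down) sshfs_down ;;
    up)   sshfs_up ;;
esac
```

Whether a lazy unmount is acceptable depends on what is using the mount; applications with open files will see errors once the mount is detached.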

Note that with this method I personally don’t need to kill sshfs processes. However, depending on what applications use the mount point(s) during suspend, they might get stuck. You can edit the file to add “kill -9” commands if you need this.

(In my case, things get stuck sometimes during normal use, not during suspend/resume; I have a similar script on my panel that unmounts, kills sshfs, then remounts everything, just for this case.)

Contact me directly if you need help with this script, don’t spam this bug report.

Revision history for this message
crazy ivan (x-floc) wrote :

Hi, just an update on Ubuntu 10.10:
* The issue is still not fixed and leads to severe system hangs.
* Bogdan's workaround is not supported anymore since the networking daemon moved to upstart completely.

/etc/network/if-down.d/ is only processed if you explicitly mention the interface in /etc/network/interfaces (cf. the outdated information in https://wiki.ubuntu.com/OnNetworkConnectionRunScript). In this case NetworkManager will show "device not managed" instead of managing the connection.

* My simple approach was to create /usr/lib/pm-utils/sleep.d/54ssh, make it executable and enter:

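#!/bin/sh
# pm-utils passes the power event as the first argument.
case "$1" in
        hibernate|suspend)
                # Kill every ssh client, including the ones backing sshfs mounts.
                pkill -x ssh
                ;;
        thaw|resume)
                # Re-run the network "up" hook to remount.
                IFACE=eth0 PHASE=up /etc/network/if-up.d/sshfs
                ;;
esac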

I execute 'pkill -x ssh' since it gets rid of sshfs and kills all open ssh sessions, which otherwise give me 'Broken pipe' messages that I don't need.

This takes care of the system after suspend/hibernate. When a network connection is lost (thank god I have LAN), the whole system still crashes.

This solution did not emerge immediately and is still not complete:

* The "up" script does not really work, since it asks to
"Enter passphrase for key '/root/.ssh/id_rsa'" or '/home/user/.ssh/id_rsa' respectively, even though I generated a specific key for root via 'ssh-keygen' and added it via 'ssh-copy-id -i'. If I enter the passphrase it passes fine, but then this is not automated.

* My attempts at creating an /etc/init/sshfs script have been fruitless, since upstart does not seem to have a network_down or suspend event, and a job set on the "stopping network-manager" event does not seem to be invoked during suspend. FYI some related links on upstart:
introduction: http://www.linux.com/learn/tutorials/404619-manage-system-startup-and-boot-processes-on-linux-with-upstart
http://tech.zhenhua.info/2010/12/ubuntu-init-script.html
http://upstart.ubuntu.com/wiki/Stanzas and "man initctl"

In my opinion the whole System V vs. upstart business is a terrible mess, with most tools still using the old syntax and some slowly moving to upstart, which in my opinion is poorly documented and very painful to debug (ever wondered what the "console" is supposed to do??).

Revision history for this message
crazy ivan (x-floc) wrote :

Also found the issue on http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=565229 and
http://sourceforge.net/mailarchive/forum.php?thread_name=E1Nz5rE-0006Yj-EW%40pomaz-ex.szeredi.hu&forum_name=fuse-sshfs.

Had me rebooting and losing work two times this week already:
1. The first time, the connected server was under heavy load and thus slow to respond.
2. The second time, the cleaning service disconnected the LAN switch power supply.

Both times the system froze!

I'm thinking about a script to monitor connectivity and kill the process as fast as possible, as I found on the Debian thread:
"The only solution I've found is to kill -9 the sshfs userspace daemon and then to fusermount -uz the mountpoint. "

I mean this package is supposed to be "maintained by canonical"...
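
The recovery quoted from the Debian thread (kill -9 the sshfs daemon, then fusermount -uz the mount point), combined with the connectivity check mentioned above, could be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, the ping flags are those of Linux iputils, and the 2-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: force-release a hung sshfs mount.
# Note: pkill -x sshfs kills EVERY sshfs process on the machine,
# not only the one serving the given mount point.
release_sshfs() {
    mountpoint="$1"
    pkill -9 -x sshfs
    # -u unmounts, -z detaches lazily even if the mount point is busy.
    fusermount -u -z "$mountpoint"
}

# Returns success if the host answers a single ping within 2 seconds
# (the timeout is an assumption; -W is the iputils reply-timeout flag).
host_reachable() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# One watchdog pass: release the mount if the server is unreachable.
watch_once() {
    host="$1"
    mountpoint="$2"
    if ! host_reachable "$host"; then
        release_sshfs "$mountpoint"
    fi
}
```

A watchdog would call something like `watch_once tanelorn /media/corum` periodically, for example from cron or a sleep loop. A single lost ping is a crude signal, so a real script would probably want several failed attempts before killing anything.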

Revision history for this message
Kent Asplund (hoglet) wrote :

I am using sshfs together with autofs in 11.10 and am seeing the same problem.

Workaround:
Created a script "sshfs" in /etc/network/if-down.d containing:
#!/bin/sh
killall sshfs
sleep 1s
killall -9 sshfs

Created a script "autofs" in /etc/network/if-up.d containing:
#!/bin/sh
restart autofs

Revision history for this message
DJ (ke7mbz) wrote :

I experience this problem for both NFS and SSHFS. If I try to suspend when connected to either, suspend fails. Also, nautilus hangs if the interface goes down while connected to the remote FS.

Placing the script shown in http://ubuntuforums.org/archive/index.php/t-430312.html in /etc/network/if-down.d/ worked almost flawlessly in 11.04, but fails in 11.10.

I did make a few alterations to the script. I actually didn't have a problem with NFS in 11.04, but that has changed in 11.10. Therefore, I added nfs and nfs4 to the umount script.

I tried placing the script in sleep.d/, but that locks my laptop mid-suspend, and I have to do a hard shutdown. The script seems to execute too late in the process.

So how can you get the remote FS to umount before suspending, so it doesn't interfere with the suspend process?

Also, my nfs client in 11.10 is defaulting to nfsv4, which has caused additional problems. It seems premature to make v4 the default.

Revision history for this message
DJ (ke7mbz) wrote :

I should clarify: the scripts in if-down.d/ don't even execute on suspend in 11.10. I put a test script there that simply logs activity, and it never ran.

Kent, it seems that your script is executing, so that's curious.

Revision history for this message
dronus (paul-geisler) wrote :

Still there in 14.04. A very trusty bug :-)

Also applies to SMB shares in the same manner. After resuming, nautilus hangs.

The hang may even occur on opening local folders, or desktop item properties, if some sftp or smb mount is dangling after resume.

Revision history for this message
sfxdude (sfxdude1) wrote :

Can confirm that this bug still exists. Upon killing the ssh process any ls commands waiting give 'cannot access x: Input/output error'

Would be nice to have this 7 year old bug fixed!

Revision history for this message
allan999 (allan-laal) wrote :

Still present in Ubuntu 16.04 LTS.
