zsys automatic snapshots just eat up all drive space and never get cleaned up

Bug #1895943 reported by Tessa
This bug affects 12 people
Affects: zsys (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

The zsys automatic ZFS snapshots use up an incredible amount of disk space by snapshotting things they shouldn't (like users' home directories) on every package install. As well, there seems to be no cleanup done whatsoever. A month or so after installing my new desktop, my 1TB SSD was down to ~9% free disk space. I ran `zsysctl service gc -a`, and nothing changed. I manually deleted all automatic snapshots, and I was back to 453 GB free.

I'm not sure what the intention of this behaviour is, but it doesn't seem configurable, and it seems absolutely broken on the face of it. Certainly unusable for any normal system usage.

Please correct zsys to have configurable snapshot settings, and have gc do... something. Anything.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: zsys 0.4.7
ProcVersionSignature: Ubuntu 5.4.0-47.51-lowlatency 5.4.55
Uname: Linux 5.4.0-47-lowlatency x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu27.8
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Thu Sep 17 00:49:54 2020
InstallationDate: Installed on 2020-07-15 (64 days ago)
InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Release amd64 (20200423)
ProcKernelCmdLine: BOOT_IMAGE=/BOOT/ubuntu_13qyk7@/vmlinuz-5.4.0-47-lowlatency root=ZFS=rpool/ROOT/ubuntu_13qyk7 ro intel_iommu=on nvidia-drm.modeset=1 quiet splash vt.handoff=1
RelatedPackageVersions:
 zfs-initramfs 0.8.3-1ubuntu12.4
 zfsutils-linux 0.8.3-1ubuntu12.4
SourcePackage: zsys
UpgradeStatus: No upgrade log present (probably fresh install)
ZFSImportedPools:
 NAME   SIZE   ALLOC  FREE   CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
 bpool  1.88G  217M   1.66G  -        -         1%    11%  1.00x  ONLINE  -
 rpool  920G   439G   481G   -        -         11%   47%  1.00x  ONLINE  -
ZFSListcache-bpool:
 bpool /boot off on on off on off on off - none
 bpool/BOOT none off on on off on off on off - none
 bpool/BOOT/ubuntu_13qyk7 /boot on on on off on off on off - none

Didier Roche-Tolomelli (didrocks) wrote :

Thanks for reporting this bug and helping to make Ubuntu better.

ZSys is supposed to take user snapshots at the same time as system snapshots. This is done on purpose, to ensure that a system revert always returns you to a coherent state. An application upgrade may change its data format and migrate your database. If we didn't snapshot USERDATA, then reverting the system would leave, for instance, an older Thunderbird unable to open the new database schema (it can't know about it), and your system snapshot would be useless.

There is thus no point in snapshotting only the system. I suggest that you read this blog post, which explains this in a little more detail: https://didrocks.fr/2020/05/28/zfs-focus-on-ubuntu-20.04-lts-zsys-general-principle-on-state-management/.

It seems, though, that a lot of data (0.5T?) was generated or copied and then removed on your machine. Since 0.4.7, the GC by default removes everything that is more than one month old, but if your threshold is higher, I understand that this can be a bottleneck.

The good news is that this is configurable and explained at https://didrocks.fr/2020/06/04/zfs-focus-on-ubuntu-20.04-lts-zsys-state-collection/. Note that the default policy changed, as stated above, so that we mitigate extreme use cases like yours until free-space GC pressure is implemented.
Another option is to use persistent datasets, which are excluded from snapshots, as explained in https://didrocks.fr/2020/06/16/zfs-focus-on-ubuntu-20.04-lts-zsys-dataset-layout/
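
To make the retention discussion concrete, here is a minimal Python sketch of an age-plus-count policy: the newest `keep_last` states are always kept, and of the remainder only those older than `max_age_days` are selected for removal. This is an illustration only, not the actual zsys GC algorithm (that one is described in the linked post). `State`, `select_for_removal`, `keep_last` and `max_age_days` are hypothetical names, and the sample states are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class State:
    """A saved state (hypothetical model, not a zsys type)."""
    name: str
    created: datetime


def select_for_removal(states, now, keep_last=20, max_age_days=30):
    """Return the states an age-plus-count policy would remove.

    The newest `keep_last` states are always kept; of the rest, only
    those created more than `max_age_days` before `now` are selected.
    """
    newest_first = sorted(states, key=lambda s: s.created, reverse=True)
    cutoff = now - timedelta(days=max_age_days)
    candidates = newest_first[keep_last:]
    return [s for s in candidates if s.created < cutoff]


# Made-up history: one state every 4 days, going back 64 days.
now = datetime(2020, 9, 17)
states = [State(f"state_{i}", now - timedelta(days=4 * i)) for i in range(17)]

# Keeping only 2 recent states and a 30-day window selects state_8..state_16.
for state in select_for_removal(states, now, keep_last=2, max_age_days=30):
    print(state.name, state.created.date())
```

With `keep_last=20`, the same 17 states would all be kept regardless of age, which shows how a count-based floor can outweigh an age limit on a machine with few states.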

Tessa (unit3) wrote :

Alright, I've read through some of these design docs. It's unfortunate they're buried in blog posts and not just laid out on an Ubuntu site about zsys and ZFS usage on Ubuntu, but at least now I know where they are.

More to the point, all I've done is watch some movies and download some games on Steam, and within a month my system had unusably low disk space. So I'd definitely dispute your claim that my use case is "extreme". My use case is standard desktop usage, and not accounting for very normal usage in your design before shipping it with the OS is honestly unbelievable. If I were running server workloads with highly changing datasets, this would be way worse. I typically see > 1TB / week of writes on low usage servers, let alone the sometimes multiple TBs/hour on high usage systems, so I honestly can't imagine who this system design is for.

I'll try playing with a manually installed zsys.conf file and see if I can get this down to something usable. Maybe if I keep just 2 old states instead of 20, that'd be more viable. I'd definitely recommend that the zsys.conf usage be documented in the package, either with a man page or a README under /usr/share/doc, since these implementation details seem pretty well hidden.

Tessa (unit3) wrote :

As well, I'd highly recommend a zsysctl command to purge up to X old states, so that an admin can manually get disk usage under control if necessary.

Nathan Nye (nathannye) wrote :

Identical experience to Tessa. Normal desktop usage with regular updates; within exactly 5 months I exhausted the storage capacity of my 2 TB drive (!!). Something is wrong here; older snapshots must have a sane rotation default.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zsys (Ubuntu):
status: New → Confirmed
max (macsym69) wrote :

Hello, same situation for me... Up to the point where I needed to boot into recovery mode to delete some states (it was so full I couldn't boot in normal mode).

André Stein (andre-stein-1985) wrote :

I can report the same after just a few months of working with ZFS. I now get this error message when I try to do an apt upgrade:

ERROR couldn't save system state: Minimum free space to take a snapshot and preserve ZFS performance is 20%.

I am using ZFS on Ubuntu 21.10 on a private notebook that I use _occasionally_, meaning at most once per week, without a lot of data-intensive stuff going on there.

I agree with other comments that this ZFS default behaviour gives a bad impression of an otherwise great filesystem. The idea of snapshots is a good one, but it should have much saner defaults that also take into account how much space they take. With that experience I would never enable ZFS with Ubuntu on a production-relevant box.
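
For reference, the 20% figure in the error above is a ratio of free to total pool space. The Python sketch below assumes the check simply compares free/size against a fixed percentage; the function names and the exact comparison are assumptions, not zsys's code. The pool figures are the rpool values from the ZFSImportedPools output in the bug description.

```python
def free_percent(free, size):
    """Free pool space as a percentage of total pool size (same units)."""
    return 100.0 * free / size


def can_snapshot(free, size, min_free_percent=20.0):
    """True if free space meets the assumed minimum for taking a snapshot."""
    return free_percent(free, size) >= min_free_percent


# rpool after the reporter's manual cleanup: 481G free of 920G.
print(round(free_percent(481, 920), 1))  # 52.3
print(can_snapshot(481, 920))            # True

# Roughly 9% free, as before the cleanup: below the 20% minimum.
print(can_snapshot(0.09 * 920, 920))     # False
```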

Tavin Cole (tavin) wrote :

I'm running into the same errors as the previous commenter. I really have to agree that the default retention is excessive.

Also, the solution of copy-pasting a config file from a blog post to override the defaults does not feel safe. I would expect the software to follow the convention of automatically installing a config file reflecting the defaults, which I could then modify.

Finally, manual cleanup is needlessly cumbersome. You have to figure out and run 3 different commands to clean up an old state for the system, root, and your local user. It should be possible to run a single command to delete them all. And maybe there should be a helper command to delete multiple old states by count or age.

Thanks for your consideration.
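
A helper of the kind requested above could be approximated outside zsys. The Python sketch below is an illustration under assumptions, not a zsys feature: it assumes automatic snapshots are named `<dataset>@autozsys_<id>`, that the system and user snapshots of one state share the same `<id>`, and that the input has the form printed by `zfs list -H -p -t snapshot -o name,creation` (tab-separated, creation time as a Unix timestamp). `plan_cleanup` and its parameters are hypothetical names, and the user dataset in the sample is made up. The function only returns `zfs destroy` commands for review; it runs nothing.

```python
from collections import defaultdict

PREFIX = "autozsys_"  # assumed naming of automatic zsys snapshots


def plan_cleanup(zfs_list_output, now_epoch, max_age_days=30):
    """Group automatic snapshots by state id and return destroy commands.

    A state is selected only if its newest snapshot is older than
    `max_age_days`, so no state is left half-deleted.
    """
    groups = defaultdict(list)  # snapshot suffix -> [(full name, creation)]
    for line in zfs_list_output.splitlines():
        if not line.strip():
            continue
        name, creation = line.split("\t")
        _dataset, _, snap = name.partition("@")
        if snap.startswith(PREFIX):
            groups[snap].append((name, int(creation)))

    cutoff = now_epoch - max_age_days * 86400
    commands = []
    # Walk states from oldest to newest (by their newest snapshot).
    ordered = sorted(groups.values(), key=lambda m: max(c for _, c in m))
    for members in ordered:
        if max(c for _, c in members) < cutoff:
            commands.extend(f"zfs destroy {name}" for name, _ in sorted(members))
    return commands


# Made-up sample input; the manual snapshot and the recent state are kept.
sample = "\n".join([
    "rpool/ROOT/ubuntu_13qyk7@autozsys_aaaaaa\t1595000000",
    "rpool/USERDATA/user_example@autozsys_aaaaaa\t1595000001",
    "rpool/ROOT/ubuntu_13qyk7@autozsys_bbbbbb\t1600000000",
    "rpool/ROOT/ubuntu_13qyk7@manual_backup\t1590000000",
])
for command in plan_cleanup(sample, now_epoch=1600300000):
    print(command)
```

Destroying snapshots directly may not be equivalent to removing a state through zsysctl, so the printed commands are best treated as a list to review rather than something to pipe straight into a shell.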

Tavin Cole (tavin) wrote :

Oh and btw, why not automatically delete the oldest state(s) instead of producing this error?

Tessa (unit3) wrote :

Over two years later, and this basic problem for many users hasn't been addressed at all. For anyone coming across this aging ticket, just purge zsys from your system and use third-party snapshotting tools. The Arch wiki, as always, is a good source of suggestions (https://wiki.archlinux.org/title/ZFS#Automatic_snapshots).

Tavin Cole (tavin) wrote :

My apt upgrade failed (again) today because there wasn't enough space to make the initramfs.

Why aren't the oldest snapshots thrown away to make room?

There is absolutely no way I'm going to roll back to a snapshot from over a month ago.

Can we at least get a maintainer to comment on the roadmap here?
I remind you there is no /etc/zsys.conf to modify.

Thanks.

Tessa (unit3) wrote :

It's been 2.5 years, none of the bugs see any attention, and zsys is abandonware. I'd recommend folks do what we're doing: purge zsys from your systems post-install, and install a third-party tool that works better for precisely what you need. And then plan the switch to Debian.

inf3rno (laszlo-janszky) wrote :

I just ran out of space. These Ubuntu defaults are funny...
