Use of os/exec in juju is problematic in resource limited environments.

Bug #1516676 reported by Eric Snow
This bug affects 5 people
Affects: Canonical Juju
Status: Fix Released
Importance: Low
Assigned to: Nicholas Skaggs

Bug Description

See lp:1382556.

Go uses fork+exec to execute commands in subprocesses. The fork means a duplication of Juju's memory usage for a moment or two, which can be a problem depending on the available memory on the host and the kernel's overcommit setting.
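
For illustration, here is a minimal sketch of the pattern involved (lsblk is just a stand-in, chosen because it appears in the disk-manager errors quoted later in this report; this is not meant to be the exact Juju call site):

    package main

    import (
        "fmt"
        "os/exec"
    )

    func main() {
        // Under the hood this is fork (clone) + exec: the parent's memory
        // usage is duplicated for a moment, and under strict overcommit a
        // large parent such as jujud can fail here with
        // "fork/exec /bin/lsblk: cannot allocate memory".
        out, err := exec.Command("lsblk").Output()
        if err != nil {
            fmt.Println("lsblk failed:", err)
            return
        }
        fmt.Printf("%s", out)
    }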

One possible solution is to use a "forker" process that serves an RPC API (likely over a socket). Then we could run commands using that API rather than directly/indirectly through os/exec. See https://lists.ubuntu.com/archives/juju-dev/2015-June/004566.html.
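
A very rough sketch of what such a forker could look like (the names Forker, ExecRequest and the socket path are made up for illustration, not part of any existing Juju API): a small helper process listens on a unix socket and runs commands on behalf of jujud, so the expensive fork happens in the small helper rather than in the large jujud process.

    package main

    import (
        "log"
        "net"
        "net/rpc"
        "os/exec"
    )

    // ExecRequest and ExecResult are hypothetical wire types for this sketch.
    type ExecRequest struct {
        Path string
        Args []string
    }

    type ExecResult struct {
        CombinedOutput []byte
        Error          string
    }

    // Forker runs commands on behalf of a much larger caller (e.g. jujud);
    // the transient fork cost is paid by this small helper process instead.
    type Forker struct{}

    func (*Forker) Exec(req ExecRequest, res *ExecResult) error {
        out, err := exec.Command(req.Path, req.Args...).CombinedOutput()
        res.CombinedOutput = out
        if err != nil {
            res.Error = err.Error()
        }
        return nil
    }

    func main() {
        if err := rpc.Register(&Forker{}); err != nil {
            log.Fatal(err)
        }
        l, err := net.Listen("unix", "/run/juju-forker.sock") // hypothetical socket path
        if err != nil {
            log.Fatal(err)
        }
        defer l.Close()
        rpc.Accept(l) // serve Forker.Exec calls until the listener is closed
    }

The large process would then dial the socket (rpc.Dial("unix", ...)) and call Forker.Exec instead of invoking exec.Command itself, so only the tiny helper ever pays the fork cost.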

tags: added: tech-debt
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.0-alpha1
Changed in juju-core:
milestone: 2.0-alpha1 → 2.0-beta1
Curtis Hovey (sinzui)
tags: added: run
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta1 → 2.0-beta2
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta2 → 2.0-beta3
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta3 → 2.0-beta4
Changed in juju-core:
milestone: 2.0-beta4 → 2.1.0
affects: juju-core → juju
Changed in juju:
milestone: 2.1.0 → none
milestone: none → 2.1.0
Revision history for this message
Anastasia (anastasia-macmood) wrote :

As this is a tech-debt item, I am lowering its priority and removing from the milestone.

Changed in juju:
importance: High → Low
milestone: 2.1.0 → none
Revision history for this message
Junien Fridrick (axino) wrote :

Hi,

juju2 controllers, which can host more than one model, are far more likely to hit this bug, especially when mongodb is configured with default memory settings (LP#1671466). Unless you're using _very_ large nodes to host your controllers, you will be in a "resource limited" environment as soon as you actually start using your controller.

Given the numerous results obtained when grepping for exec.Command in the source code, I believe this bug should be triaged as High importance and not considered tech debt. I'll defer to your opinion if you still think this is not the case, though.

Thank you

tags: added: canonical-is
removed: tech-debt
Changed in juju:
importance: Low → High
Felipe Reyes (freyes)
tags: added: sts
Revision history for this message
Felipe Reyes (freyes) wrote :

We have a juju 2.1.3 controller handling ~500 machines; jujud is using 105G and the controller has 60G free, which is more than enough to work properly until jujud decides to fork/exec. I agree with Junien that this bug should be classified as high priority, and hopefully targeted at 2.3, because the only workaround is to restart jujud, and in a 500-machine environment (~1500 agents) that triggers a massive wave of config-changed hooks, putting a lot of pressure on the controller.

1a6f9258-9688-40a8-88b5-a7f8260b18b4: machine-0 2017-07-25 16:30:00 WARNING juju.apiserver.machine machiner.go:200 not updating network config for container "1/lxd/3"
1a6f9258-9688-40a8-88b5-a7f8260b18b4: machine-0 2017-07-25 16:30:00 WARNING juju.apiserver.machine machiner.go:200 not updating network config for container "2/lxd/3"
1a6f9258-9688-40a8-88b5-a7f8260b18b4: machine-0 2017-07-25 16:30:02 ERROR juju.worker.dependency engine.go:547 "disk-manager" manifold worker returned unexpected error: cannot list block devices: lsblk failed: fork/exec /bin/lsblk: cannot allocate memory
1a6f9258-9688-40a8-88b5-a7f8260b18b4: machine-0 2017-07-25 16:30:09 ERROR juju.worker.dependency engine.go:547 "disk-manager" manifold worker returned unexpected error: cannot list block devices: lsblk failed: fork/exec /bin/lsblk: cannot allocate memory
1a6f9258-9688-40a8-88b5-a7f8260b18b4: machine-0 2017-07-25 16:30:14 WARNING juju.apiserver.machine machiner.go:200 not updating network config for container "0/lxd/1"
1a6f9258-9688-40a8-88b5-a7f8260b18b4: machine-0 2017-07-25 16:30:15 ERROR juju.worker.dependency engine.go:547 "disk-manager" manifold worker returned unexpected error: cannot list block devices: lsblk failed: fork/exec /bin/lsblk: cannot allocate memory
1a6f9258-9688-40a8-88b5-a7f8260b18b4: machine-0 2017-07-25 16:30:22 ERROR juju.worker.dependency engine.go:547 "disk-manager" manifold worker returned unexpected error: cannot list block devices: lsblk failed: fork/exec /bin/lsblk: cannot allocate memory
1a6f9258-9688-40a8-88b5-a7f8260b18b4: machine-0 2017-07-25 16:30:28 ERROR juju.worker.dependency engine.go:547 "disk-manager" manifold worker returned unexpected error: cannot list block devices: lsblk failed: fork/exec /bin/lsblk: cannot allocate memory
1a6f9258-9688-40a8-88b5-a7f8260b18b4: machine-0 2017-07-25 16:30:35 ERROR juju.worker.dependency engine.go:547 "disk-manager" manifold worker returned unexpected error: cannot list block devices: lsblk failed: fork/exec /bin/lsblk: cannot allocate memory
1a6f9258-9688-40a8-88b5-a7f8260b18b4: machine-0 2017-07-25 16:30:42 ERROR juju.worker.dependency engine.go:547 "disk-manager" manifold worker returned unexpected error: cannot list block devices: lsblk failed: fork/exec /bin/lsblk: cannot allocate memory
1a6f9258-9688-40a8-88b5-a7f8260b18b4: machine-0 2017-07-25 16:30:48 ERROR juju.worker.dependency engine.go:547 "disk-manager" manifold worker returned unexpected error: cannot list block devices: lsblk failed: fork/exec /bin/lsblk: cannot allocate memory
1a6f9258-9688-40a8-88b5-a7f8260b18b4: machine-0 2017-07-25 16:30:55 ERROR juju.worker.dependency engine.go:547 "disk-manager" manifold worker returned unexpected error: cannot list block...


Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Felipe Reyes (freyes) and Junien Fridrick (axino),

2.1.3 controllers and models should be upgraded to 2.2.2 at least.

The problem is that the fork/exec failure is tripped by high memory usage caused by a few lingering memory leaks in 2.1.3. These leaks are not immediately apparent and tend to strike longer-running environments rather than short-lived ones.

The good news is that these memory leaks are known to be fixed in 2.2.2.

As for fork/exec, this is the default behavior of Go's os/exec. All we can do to avoid tripping it in Juju is to use memory wisely and not leak it :D

As for this report, it's dedicated to a specific tech-debt item. I am lowering its priority back to the original Low.
If you do encounter issues with 2.2.2, I think that will become a different bug, one dealing with the observable memory patterns rather than the standard library used... Keep us posted, and let's decide what to do if you still observe fork/exec-related issues while running 2.2.2.

Changed in juju:
importance: High → Low
Revision history for this message
William Grant (wgrant) wrote :

This bug is still High or Critical; Juju still easily uses a gigabyte of RAM on tiny controllers with just a couple of dozen machines.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@William Grant (wgrant),

Which version of Juju are you referring to?

Revision history for this message
William Grant (wgrant) wrote : Re: [Bug 1516676] Re: Use of os/exec in juju is problematic in resource limited environments.

On 26/07/17 13:37, Anastasia wrote:
> @William Grant (wgrant),
>
> Which version of Juju are you referring to?

2.2.2

Revision history for this message
Tim Penhey (thumper) wrote :

@William, we really shouldn't be running controllers on tiny machines for a start.

I'll talk with those who know better, but fork/exec is the fundamental way the golang os/exec package works. Changing this is non-trivial.

Revision history for this message
Joel Sing (jsing) wrote :

I'd recommend deferring until Go 1.9 (due August 1st 2017): it contains a change that makes use of CLONE_VM and CLONE_VFORK to avoid copying the parent's page tables. This should reduce os/exec latency as well as memory requirements:

https://tip.golang.org/doc/go1.9

https://go-review.googlesource.com/c/37439/

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

With 1.9 we get a replacement of the CoW behavior of clone(2) with only the SIGCHLD flag by clone(2) with SIGCHLD|CLONE_VFORK|CLONE_VM (i.e. no page-table copying is done, as in this "for" loop in dup_mmap: http://elixir.free-electrons.com/linux/v4.13.2/source/kernel/fork.c#L630), which should solve:
* the overcommit issues
* the fork + exec* latency problem for processes with a large memory footprint (essentially a replacement of O(n) with O(1), as we only need to initialize a fresh mm_struct on exec*, see http://elixir.free-electrons.com/linux/v4.13.2/source/fs/exec.c#L1759, without traversing the old one)

Caveats:
* this change is implemented only for amd64 as of now - other architectures still use the old implementation https://github.com/golang/go/blob/release-branch.go1.9/src/syscall/exec_linux.go#L154-L160
* with CLONE_VFORK the parent OS thread (not the whole process) stays blocked until the child process execs and gets a new mm_struct (roughly, its own fresh virtual address space). Because a "raw syscall" is used, the Go runtime's logic for blocking system calls is not triggered: due to CLONE_VFORK, this region of code blocks in the parent thread, https://github.com/golang/go/blob/release-branch.go1.9/src/syscall/exec_linux.go#L154-L409, whereas normal non-raw Syscall invocations call runtime.entersyscall (https://github.com/golang/go/blob/release-branch.go1.9/src/syscall/asm_linux_amd64.s#L12-L18) and set the _Psyscall status on a processor (P) in the Go runtime (https://github.com/golang/go/blob/release-branch.go1.9/src/runtime/proc.go#L2509). I wonder how this affects programs in general, given that using a raw syscall is the same as telling the Go runtime there is no blocking syscall here, while the parent thread is in fact blocked for some time.

https://github.com/golang/go/issues/5717#issuecomment-66081134
https://github.com/golang/go/issues/5717#issuecomment-66081132

I think there may be some edge cases where the kernel is executing the clone syscall's code and receives, for example, a ton of network-related interrupts to process (which leads to nested interrupt handler execution, depending on the number of cores), but this is not something we can control; we'll just get a little more delay in that case while the parent OS thread is blocked.

Either way, we just need to flip the switch to 1.9 and mark this as fixed, for amd64 at least. No code changes are needed to start using this.

The "forker" process idea is nice but requires us to maintain a separate entity and RPC for process execution which is an additional burden compared to the above.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

Moving to go 1.9 might help here. PR against develop (heading into 2.3-beta2): https://github.com/juju/juju/pull/7903

Changed in juju:
status: Triaged → In Progress
assignee: nobody → Nicholas Skaggs (nskaggs)
Revision history for this message
Nicholas Skaggs (nskaggs) wrote :

Packaging for the client is going to need work to ensure it's built with go-1.9 as well.

Changed in juju:
status: In Progress → Fix Committed
milestone: none → 2.3-beta2
Changed in juju:
status: Fix Committed → Fix Released