There is at least one fundamental bug in start.c's signal_handler, as should
be fixed by the below. However, this alone did not fix it for me, so more is wrong. There is a minimal testcase at http://people.canonical.com/~serge/signalfd.c which originally reproduced this bug, then was fixed by using the waitpid loop as below.
The monitor process adds a signalfd to the set of fds it watches
with epoll. It calls signal_handler() when a signal is found in
the sigfd. That returns 1 if the container init was found to
be the exiting process.
The flaw in reasoning (pointed out by Andy Whitcroft - thanks!)
here is that if two children have exited, we assume we will get
two sigchilds - in fact we may only get one. So when we get one,
we need to reap all the children we have and check if any is the
container init.
ret = read(fd, &siginfo, sizeof(siginfo));
@@ -188,16 +188,16 @@ static int signal_handler(int fd, void *data,
return 0;
}
- /* more robustness, protect ourself from a SIGCHLD sent
- * by a process different from the container init
+ /*
+ * wait for any and all children which are ours which are reapable,
+ * since if >1 children have exited, we'll only get one sigchld.
*/
- if (siginfo.ssi_pid != *pid) {
- WARN("invalid pid for SIGCHLD");
- return 0;
- }
+ while ((ret = waitpid(-1, &status, WNOHANG)) > 0)
+ if (ret == *pid)
+ retval = 1; // we reaped the container init
There is at least one fundamental bug in start.c's signal_handler, as should people. canonical. com/~serge/ signalfd. c which originally reproduced this bug, then was fixed by using the waitpid loop as below.
be fixed by the below. However, this alone did not fix it for me, so more is wrong. There is a minimal testcase at http://
From 7333b77759794f5 420ea6898494073 a28cac445f Mon Sep 17 00:00:00 2001 signal_ handler: fix wrong assumption about
From: Serge Hallyn <email address hidden>
Date: Wed, 3 Jul 2013 14:13:08 -0500
Subject: [PATCH 1/1] start.c:
sigchld
The monitor process adds a signalfd to the set of fds it watches
with epoll. It calls signal_handler() when a signal is found in
the sigfd. That returns 1 if the container init was found to
be the exiting process.
The flaw in reasoning (pointed out by Andy Whitcroft - thanks!)
here is that if two children have exited, we assume we will get
two sigchilds - in fact we may only get one. So when we get one,
we need to reap all the children we have and check if any is the
container init.
Signed-off-by: Serge Hallyn <email address hidden>
---
src/lxc/start.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/src/lxc/start.c b/src/lxc/start.c
index 8c8af9c..3fa50ad 100644
--- a/src/lxc/start.c
+++ b/src/lxc/start.c
@@ -162,7 +162,7 @@ static int signal_handler(int fd, void *data,
struct lxc_epoll_descr *descr)
{
struct signalfd_siginfo siginfo;
- int ret;
+ int ret, status, retval = 0;
pid_t *pid = data;
ret = read(fd, &siginfo, sizeof(siginfo));
@@ -188,16 +188,16 @@ static int signal_handler(int fd, void *data,
return 0;
}
- /* more robustness, protect ourself from a SIGCHLD sent
- * by a process different from the container init
+ /*
+ * wait for any and all children which are ours which are reapable,
+ * since if >1 children have exited, we'll only get one sigchld.
*/
- if (siginfo.ssi_pid != *pid) {
- WARN("invalid pid for SIGCHLD");
- return 0;
- }
+ while ((ret = waitpid(-1, &status, WNOHANG)) > 0)
+ if (ret == *pid)
+ retval = 1; // we reaped the container init
DEBUG("container init process exited");
- return 1;
+ return retval;
}
int lxc_set_state(const char *name, struct lxc_handler *handler, lxc_state_t state)
--
1.8.1.2