Comment 5 for bug 1893650

Revision history for this message
rew (r-e-wolff) wrote :

(earlier this week:) To debug this further I started chromium under "strace -f -o /tmp/..." : Must be run as root. Hmm.

Today I tried something different. I used strace on ALL 33 chrome processes. Many did something every second or so, just one was in a tight loop. This is also the process that continues to use about 30% of one core.

I used "task manager" from Chrome to try to figure out what chrome would call this process itself. I expected to find "browser", but the top CPU-users according to chrome were using only about 2% of CPU time, so it is not listed on the "task manager" list. :-/

The process that is using CPU time, turns out to be the same as the one that is causing the network traffic: Just stopping that one process causes the network traffic to stop.

What I don't understand is how the calls:

poll([{fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=30, events=POLLIN}, {fd=31, events=POLLIN}, {fd=41, events=POLLIN}], 5, 0) = 1 ([{fd=12, revents=POLLIN}])
recvmsg(30, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
read(12, "!", 2) = 1
write(28, "\0", 1) = 1
recvmsg(30, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(31, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(31, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(31, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=30, events=POLLIN}, {fd=31, events=POLLIN}, {fd=41, events=POLLIN}], 5, 0) = 0 (Timeout)
recvmsg(30, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(30, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(31, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(31, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(31, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)

end up causing NFS traffic. It is about 50/50 that I see a timeout and "data available on fd 12"
When that data is available and read (one character) then often one character is written to fd 30 (more than 50% of the cases, but not always).

The three filedescriptors are a pipe and two sockets. I don't know how to find out where they go.

lr-x------ 1 wolff wolff 64 Feb 20 13:40 12 -> 'pipe:[42472991]'
lrwx------ 1 wolff wolff 64 Feb 20 13:40 30 -> 'socket:[42471337]'
lrwx------ 1 wolff wolff 64 Feb 20 13:40 31 -> 'socket:[42470374]'

I see 4637 poll calls each 10 seconds, 2270 read (12...) calls (49%) and 1537 (33%) writes to fd 30. With about 23000 total EAGAIN calls to recvmsg. The total of 23000 corresponds to the number of network packets in a period of 10 seconds...

So my conclusion is that

recvmsg(30, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(30, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(31, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(31, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(31, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)

is causing NFS(?!) network traffic.

OK. Found it.... That process has many tasks and pipe 12/13 is used for inter-task communciation. Some of the other tasks are spinning on statting and/or opening some files:

/home/wolff/snap/chromium/common/chromium/Default/databases/Databases.db-wal
/home/wolff/snap/chromium/common/chromium/Default/databases/Databases.db-journal
/home/wolff/snap/chromium/common/chromium/Default/File System/primary.origin

These files don't exist. (such a "does not exist" is not cached by NFS, by design).