Here is an analysis of the kernel core dump captured for this issue : crash> sys KERNEL: vmlinux-3.2.0-38-generic DUMPFILE: VmCore CPUS: 24 DATE: Wed Sep 18 22:34:35 2013 UPTIME: 1 days, 11:33:14 LOAD AVERAGE: 2.04, 2.09, 2.16 TASKS: 6656 NODENAME: ddb-mongo41 RELEASE: 3.2.0-38-generic VERSION: #61-Ubuntu SMP Tue Feb 19 12:18:21 UTC 2013 MACHINE: x86_64 (2533 Mhz) MEMORY: 47.9 GB PANIC: "[127932.907100] kernel BUG at /build/buildd/linux-3.2.0/mm/mempolicy.c:1638!" PID: 21767 COMMAND: "mongod" TASK: ffff8800874bdc00 [THREAD_INFO: ffff880544d8a000] CPU: 12 STATE: EXIT_DEAD (PANIC) Analysis ======== This is the backtrace of the panic task : crash> bt PID: 21767 TASK: ffff8800874bdc00 CPU: 12 COMMAND: "mongod" #0 [ffff880657cc3820] machine_kexec at ffffffff810393da #1 [ffff880657cc3890] crash_kexec at ffffffff810b53f8 #2 [ffff880657cc3960] oops_end at ffffffff8165e528 #3 [ffff880657cc3990] die at ffffffff810178d8 #4 [ffff880657cc39c0] do_trap at ffffffff8165de94 #5 [ffff880657cc3a20] do_invalid_op at ffffffff81014f65 #6 [ffff880657cc3ac0] invalid_op at ffffffff8166796b [exception RIP: slab_node+0x2e] RIP: ffffffff8115a66e RSP: ffff880657cc3b70 RFLAGS: 00010097 RAX: 0000000000000000 RBX: ffff880657802c00 RCX: 00000000e62f6aef RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff880abf18a288 RBP: ffff880657cc3b80 R8: 0000000000000001 R9: 0000000100100010 R10: 0000000000000000 R11: 0000000000000022 R12: 0000000000000002 R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000020 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0 #8 [ffff880657cc3c18] __slab_alloc at ffffffff816498cf #9 [ffff880657cc3cc8] __kmalloc_node_track_caller at ffffffff81166f07 #10 [ffff880657cc3d38] __alloc_skb at ffffffff815364c8 #11 [ffff880657cc3d88] __netdev_alloc_skb at ffffffff81536b14 #12 [ffff880657cc3da8] enic_rq_alloc_buf at ffffffffa005484c [enic] #13 [ffff880657cc3e08] enic_poll_msix at ffffffffa00559ff [enic] #14 [ffff880657cc3e58] net_rx_action at ffffffff81545274 #15 [ffff880657cc3ec8] __do_softirq at ffffffff8106f5f8 #16 [ffff880657cc3f38] call_softirq at ffffffff81667bec #17 [ffff880657cc3f50] do_softirq at ffffffff81016305 #18 [ffff880657cc3f70] irq_exit at ffffffff8106f9de #19 [ffff880657cc3f80] do_IRQ at ffffffff816684a3 --- --- #20 [ffff880544d8bd48] ret_from_intr at ffffffff8165d82e [exception RIP: __slab_free+0x2e1] RIP: ffffffff81649467 RSP: ffff880544d8bdf8 RFLAGS: 00000202 RAX: 0000000000000001 RBX: ffffffffff0a0210 RCX: 0000000180aa00a9 RDX: 0000000180aa00aa RSI: ffffea002afc6201 RDI: ffff880657806200 RBP: ffff880544d8bea8 R8: 0000000000000001 R9: 0000000000000000 R10: ffff8800874be020 R11: ffff8800874be030 R12: ffff880544d8be33 R13: 000000000000000d R14: ffffffff81191895 R15: ffff880544d8bdb8 ORIG_RAX: ffffffffffffff54 CS: 0010 SS: 0018 #21 [ffff880544d8be30] __change_pid at ffffffff81087dca #22 [ffff880544d8beb0] kmem_cache_free at ffffffff81163634 #23 [ffff880544d8bef0] __mpol_put at ffffffff81159937 #24 [ffff880544d8bf00] do_exit at ffffffff8106c75c #25 [ffff880544d8bf70] sys_exit at ffffffff8106caf7 #26 [ffff880544d8bf80] system_call_fastpath at ffffffff81665982 RIP: 00007f6f476b8f37 RSP: 00007f68cbcfdbb0 RFLAGS: 00000202 RAX: 000000000000003c RBX: ffffffff81665982 RCX: ffffffffffffffff RDX: 00007f68cbcfe700 RSI: 00007f6f478c9250 RDI: 0000000000000000 RBP: 0000000000000000 R8: 00007f68cbcfe700 R9: 00007f68e82a0370 R10: 000000007fffffff R11: 0000000000000246 R12: ffffffff8106caf7 R13: ffff880544d8bf78 R14: 0000000000000003 R15: 00007f68f8744a10 ORIG_RAX: 000000000000003c CS: 0033 SS: 002b And the register information : [127932.906804] ------------[ cut here ]------------ [127932.907100] [127932.907502] invalid opcode: 0000 [#1] SMP [127932.907774] CPU 12 [127932.907901] Modules linked in: vesafb joydev iomemory_vsl(P) i7core_edac ioatdma edac_core dca tpm_tis acpi_power_meter mac_hid lp parport usbhid hid usb_storage mptsas enic mptscsih mptbase scsi_transport_sas [127932.909246] [127932.909340] Pid: 21767, comm: mongod Tainted: P O 3.2.0-38-generic #61-Ubuntu Cisco Systems Inc UCSC-BSE-SFF-C200/UCSC-BSE-SFF-C 200 [127932.910150] RIP: 0010:[] [] slab_node+0x2e/0xa0 [127932.910614] RSP: 0018:ffff880657cc3b70 EFLAGS: 00010097 [127932.910921] RAX: 0000000000000000 RBX: ffff880657802c00 RCX: 00000000e62f6aef [127932.911344] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff880abf18a288 [127932.911763] RBP: ffff880657cc3b80 R08: 0000000000000001 R09: 0000000100100010 [127932.912184] R10: 0000000000000000 R11: 0000000000000022 R12: 0000000000000002 [127932.912625] R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000020 [127932.913126] FS: 0000000000000000(0000) GS:ffff880657cc0000(0000) knlGS:0000000000000000 [127932.913621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [127932.913986] CR2: 00007fb8e6d01218 CR3: 00000006468d2000 CR4: 00000000000006e0 [127932.936937] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [127932.960466] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [127932.984295] Process mongod (pid: 21767, threadinfo ffff880544d8a000, task ffff8800874bdc00) [127933.008400] Stack: [127933.019992] ffffffff8188ee40 ffff880657cd6720 ffff880657cc3c10 ffffffff816496a0 [127933.043734] ffff880657801e80 ffff8800874bdc00 ffff8800874bdc00 0000000077f9e84e [127933.068114] ffff880657cd6700 0000000000000008 ffff880657cc3bf0 ffff8800874bdc00 [127933.092225] Call Trace: [127933.103955] The current task is actually in the process of exiting : sys_exit -> do_exit The execution of __mpol_put() is in the process of destructing the mempolicy structure of the current task : #ifdef CONFIG_NUMA task_lock(tsk); mpol_put(tsk->mempolicy); tsk->mempolicy = NULL; task_unlock(tsk); #endif crash> task | grep mempolicy mempolicy = 0xffff880abf18a288, crash> * mempolicy 0xffff880abf18a288 struct mempolicy { refcnt = { counter = 0x0 }, mode = 0x0, flags = 0x0, v = { preferred_node = 0x3, nodes = { bits = {0x3} } }, w = { cpuset_mems_allowed = { bits = {0x3} }, user_nodemask = { bits = {0x3} } } } /* Slow path of a mpol destructor. */ void __mpol_put(struct mempolicy *p) { if (!atomic_dec_and_test(&p->refcnt)) return; kmem_cache_free(policy_cache, p); } The p->refcnt has already been decreased and kmem_cache_free is being performed to release the mempolicy element : The IRQ is triggered while __slab_free() is being executed : #20 [ffff880544d8bd48] ret_from_intr at ffffffff8165d82e [exception RIP: __slab_free+0x2e1] The slab where the eement is being freed is : RDI: ffff880657806200 crash> kmem -s 0xffff880abf18a288 CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE ffff880657806200 numa_policy 24 41373 43180 254 4k SLAB MEMORY NODE TOTAL ALLOCATED FREE ffffea002afc6280 ffff880abf18a000 1 169 169 0 FREE / [ALLOCATED] [ffff880abf18a288] which is indeed the slab where the mempolicy is __slab_free+0x2e1 is at the very end of __slab_free() : Looking at the RIP where __slab_free should return, we can see that __slab_free was about to return when the IRQ came in : crash> dis -l __slab_free .... 0xffffffff81649457 <__slab_free+0x2d1>: call 0xffffffff8165d590 <_raw_spin_unlock_irqrestore> /build/buildd/linux-3.2.0/mm/slub.c: 2517 0xffffffff8164945c <__slab_free+0x2d6>: mov rsi,r12 0xffffffff8164945f <__slab_free+0x2d9>: mov rdi,r13 0xffffffff81649462 <__slab_free+0x2dc>: call 0xffffffff81162f90 /build/buildd/linux-3.2.0/mm/slub.c: 2518 0xffffffff81649467 <__slab_free+0x2e1>: add rsp,0x88 spin_unlock_irqrestore(&n->list_lock, flags); stat(s, FREE_SLAB); discard_slab(s, page); } So most probably the mempolicy page has now been freed. The IRQ processing now happens and goes through many functions until it needs to allocate a new slab element for an skb (network buffer) : [exception RIP: slab_node+0x2e] RIP: ffffffff8115a66e RSP: ffff880657cc3b70 RFLAGS: 00010097 RAX: 0000000000000000 RBX: ffff880657802c00 RCX: 00000000e62f6aef RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff880abf18a288 RBP: ffff880657cc3b80 R8: 0000000000000001 R9: 0000000100100010 R10: 0000000000000000 R11: 0000000000000022 R12: 0000000000000002 R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000020 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0 #8 [ffff880657cc3c18] __slab_alloc at ffffffff816498cf #9 [ffff880657cc3cc8] __kmalloc_node_track_caller at ffffffff81166f07 #10 [ffff880657cc3d38] __alloc_skb at ffffffff815364c8 __slab_alloc (from slab_alloc) gets called with a kmem_cache structure retrieved from get_slab() crash> bt -F #7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0 ffff880657cc3b90: [kmem_cache_node] [task_struct] ffff880657cc3ba0: [task_struct] 0000000077f9e84e ffff880657cc3bb0: ffff880657cd6700 0000000000000008 ffff880657cc3bc0: ffff880657cc3bf0 [task_struct] ffff880657cc3bd0: [kmalloc-2048] [kmalloc-8192] ffff880657cc3be0: 0000000000000008 ffff880657cd6700 ffff880657cc3bf0: 0000000000000000 ffff880657802c00 ffff880657cc3c00: 00000000ffffffff 0000000000000282 ffff880657cc3c10: ffff880657cc3cc0 __slab_alloc+0x14b #8 [ffff880657cc3c18] __slab_alloc at ffffffff816498cf crash> bt -f #7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0 ffff880657cc3b90: ffff880657801e80 ffff8800874bdc00 ffff880657cc3ba0: ffff8800874bdc00 0000000077f9e84e ffff880657cc3bb0: ffff880657cd6700 0000000000000008 ffff880657cc3bc0: ffff880657cc3bf0 ffff8800874bdc00 ffff880657cc3bd0: ffff880877f9e84e ffff880c462e0000 ffff880657cc3be0: 0000000000000008 ffff880657cd6700 ffff880657cc3bf0: 0000000000000000 ffff880657802c00 ffff880657cc3c00: 00000000ffffffff 0000000000000282 ffff880657cc3c10: ffff880657cc3cc0 ffffffff816498cf ffff880657cc3b90: [kmem_cache_node] = ffff880657801e80 But while executing get_any_partial which is only NUMA specific #IFDEF code, it does the following : zonelist = node_zonelist(slab_node(current->mempolicy), flags); current being the current task pointer, it tries does the slab_node() on the mempolicy that was recently freed prior to the IRQ hitting after discard_slab This mempolicy has now been freed, which most probably explains why the value of mode is 0x0 : crash> * mempolicy.mode ffff880abf18a288 mode = 0x0 The behavior for slab_node when mempolicy.mode = 0x0 MPOL_DEFAULT is : default: BUG(); which triggers the panic.