Lenny’s Xen Kernel 2.6.26 Causes DomU Freezes

With the release of Debian 5.0 “Lenny” as stable, I upgraded my servers and installed its Xen 3.2.1 and kernel 2.6.26 Xen packages for my DomUs. After that, one of my DomUs kept freezing: 100% CPU and no response on the console whatsoever. I found various people reporting similar problems, but no hints towards a solution.
If you experience similar freezes, downgrade your DomU to the 2.6.18 kernel from Etch: add the Etch repository to your APT sources in both Dom0 and your DomU, install linux-image-2.6.18-6-xen-686 (or -amd64 for 64-bit systems), and modify your Xen configuration to use the old kernel. You can keep the rest upgraded to Lenny.
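
The last step, pointing the DomU at the old kernel, might look roughly like the fragment below. Xen DomU configuration files use Python syntax. The file names are assumptions derived from the package name and the usual Debian layout under /boot, so check what is actually installed in Dom0 before copying them.

```python
# Fragment of a DomU config file under /etc/xen/ (the guest's own file).
# Both paths are assumptions based on the package name
# linux-image-2.6.18-6-xen-686; verify them with `ls /boot` in Dom0.
kernel = "/boot/vmlinuz-2.6.18-6-xen-686"
ramdisk = "/boot/initrd.img-2.6.18-6-xen-686"
```

On a 64-bit system the suffix would presumably be -xen-amd64 instead. The DomU has to be shut down and created again for the change to take effect.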

9 Responses

  1. jc says:

    Having the same problem, thanks for reporting this.
    Jc

  2. Teleyinex says:

    Thanks for your comments. I tried this installation today and ran into the problems you describe. Thanks for the info 😉

  3. Any known fix yet? Stock Linux 2.6.30 kernels freeze as well…

  4. Aaron says:

    Any update on this problem?
    I don’t know a fix, but I have a workaround – or at least in my case it works:
    Limit the Dom0 to one cpu (“(dom0-cpus 1)”) and also every DomU (“vcpus = 1”). Until the kernel update yesterday I had an uptime of approx. 50 days and no problems.
    But unfortunately there’s another bug to consider if you want to use this workaround:
    http://old.nabble.com/Domain-status-after-shutdown-command:—-s—td15565767.html
    I just wanted to write this because the Etch kernel won’t get security updates much longer.
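
As a rough sketch of Aaron's workaround: the DomU half goes into the guest's config file, which uses Python syntax. The Dom0 half is not Python; it is shown only as a comment, and the file name given for it is an assumption about a typical xend setup.

```python
# DomU config fragment: limit this guest to a single virtual CPU.
vcpus = 1

# Dom0 side (an S-expression, not Python). It belongs in xend's own
# configuration, commonly /etc/xen/xend-config.sxp (assumed path):
#   (dom0-cpus 1)
```

Both xend and the guest would need a restart before the limits apply.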

  5. Moritz says:

    Thank you for your information. Unfortunately, I need multiple cores inside my DomUs. Did you try limiting the Dom0 only? Is that even possible?
    I guess you have tried the latest Xen+Lenny kernels? I cannot believe they still haven’t fixed this.
    The second issue – problems with DomU shutdown – on the other hand isn’t as bad; I never shut down DomUs. 😉

  6. Aaron says:

    Yes, at first I tried limiting the Dom0 only – but that didn’t solve it for me.
    I got the idea from this bug report:
    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=524571
    I have the latest (official) packages for Debian Lenny installed, but I have been using the workaround for about 50 days now and have no problem with limiting each DomU to one CPU, so I didn’t test whether the bug is still there.
    And I agree, the second issue isn’t really a problem 😉

  7. Pasi Kärkkäinen says:

    Many people are seeing this bug, me included.
    Do you guys have a reliable/fast way to reproduce this?
    What should be done is to get a stack trace for each guest vcpu to see what’s happening (going wrong), so it can be debugged and fixed.
    In dom0, like this:
    /usr/lib/xen/bin/xenctx -s System.map-2.6.26-2-xen-686 <domid> <vcpu>
    Repeat that command for each guest vcpu.
    The first vcpu is number 0, next is number 1 etc.
    If you have a 64-bit dom0, xenctx might be under /usr/lib64/.
    The System.map file should be the actual correct System.map for the guest kernel.

  8. Code78 says:

    I have the same problem. I’ve updated the Dom0 kernel to 2.6.26-2-xen-amd64, and the guests run the 2.6.26-2-686-bigmem kernel.
    The guests (mainly the webserver) crash randomly 3-8 times a day, usually when there’s a bit more load.
    The only option is to destroy the running webserver and restart it (xm destroy & create).
    If I attach a console to a crashed guest, there’s a CPU soft lockup; once I was lucky enough to get a full dump:
    [ 4264.683334] BUG: soft lockup - CPU#3 stuck for 83s! [swapper:0]
    [ 4264.683334] Modules linked in: ipv6 loop evdev xen_netfront pcspkr ext3 jbd mbcache xen_blkfront thermal_sys
    [ 4264.683334]
    [ 4264.683334] Pid: 0, comm: swapper Not tainted (2.6.26-2-686-bigmem #1)
    [ 4264.683334] EIP: 0061:[] EFLAGS: 00000246 CPU: 3
    [ 4264.683334] EIP is at _stext+0x3a7/0x1000
    [ 4264.683334] EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 00175d28
    [ 4264.683334] ESI: 00000003 EDI: 00000000 EBP: 00000000 ESP: ed049fa0
    [ 4264.683334] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
    [ 4264.683334] CR0: 8005003b CR2: b771a000 CR3: 29186000 CR4: 00000660
    [ 4264.683334] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 4264.683334] DR6: ffff0ff0 DR7: 00000400
    [ 4264.683334] [] xen_safe_halt+0xd/0x17
    [ 4264.683334] [] xen_idle+0x0/0x3a
    [ 4264.683334] [] xen_idle+0x2b/0x3a
    [ 4264.683334] [] cpu_idle+0xb0/0xd0
    [ 4264.683334] =======================
    [ 4264.683337] BUG: soft lockup - CPU#4 stuck for 83s! [swapper:0]
    [ 4264.683337] Modules linked in: ipv6 loop evdev xen_netfront pcspkr ext3 jbd mbcache xen_blkfront thermal_sys
    [ 4264.683337]
    [ 4264.683337] Pid: 0, comm: swapper Not tainted (2.6.26-2-686-bigmem #1)
    [ 4264.683337] EIP: 0061:[] EFLAGS: 00000246 CPU: 4
    [ 4264.683337] EIP is at _stext+0x227/0x1000
    [ 4264.683337] EAX: 00030002 EBX: 00000000 ECX: 00000000 EDX: 00000201
    [ 4264.683337] ESI: 00000004 EDI: 00000000 EBP: 00000000 ESP: ed04bf88
    [ 4264.683337] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
    [ 4264.683337] CR0: 8005003b CR2: b6c9d000 CR3: 259e3000 CR4: 00000660
    [ 4264.683337] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 4264.683337] DR6: ffff0ff0 DR7: 00000400
    [ 4264.683337] [] force_evtchn_callback+0xa/0xc
    [ 4264.683337] [] tick_nohz_restart_sched_tick+0x12d/0x134
    [ 4264.683337] [] xen_idle+0x0/0x3a
    [ 4264.683337] [] cpu_idle+0xc6/0xd0
    [ 4264.683337] =======================
    [ 4595.526344] BUG: soft lockup - CPU#2 stuck for 71s! [swapper:0]
    [ 4595.526344] Modules linked in: ipv6 loop evdev xen_netfront pcspkr ext3 jbd mbcache xen_blkfront thermal_sys
    [ 4595.526344]
    [ 4595.526344] Pid: 0, comm: swapper Not tainted (2.6.26-2-686-bigmem #1)
    [ 4595.526344] EIP: 0061:[] EFLAGS: 00000246 CPU: 2
    [ 4595.526344] EIP is at _stext+0x3a7/0x1000
    [ 4595.526344] EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 00175d28
    [ 4595.526344] ESI: 00000002 EDI: 00000000 EBP: 00000000 ESP: ed047fa0
    [ 4595.526344] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
    [ 4595.526344] CR0: 8005003b CR2: b6c9d000 CR3: 261a9000 CR4: 00000660
    [ 4595.526344] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 4595.526344] DR6: ffff0ff0 DR7: 00000400
    [ 4595.526344] [] xen_safe_halt+0xd/0x17
    [ 4595.526344] [] xen_idle+0x0/0x3a
    [ 4595.526344] [] xen_idle+0x2b/0x3a
    [ 4595.526344] [] cpu_idle+0xb0/0xd0
    [ 4595.526344] =======================
    [ 4595.526469] BUG: soft lockup - CPU#3 stuck for 71s! [rsyslogd:2636]
    [ 4595.526469] Modules linked in: ipv6 loop evdev xen_netfront pcspkr ext3 jbd mbcache xen_blkfront thermal_sys
    [ 4595.526469]
    [ 4595.526469] Pid: 2636, comm: rsyslogd Not tainted (2.6.26-2-686-bigmem #1)
    [ 4595.526469] EIP: 0061:[] EFLAGS: 00000246 CPU: 3
    [ 4595.526469] EIP is at do_get_write_access+0x5c/0x331 [jbd]
    [ 4595.526469] EAX: 00000000 EBX: ecdb0b00 ECX: 00000000 EDX: e444efa8
    [ 4595.526469] ESI: ec8042c0 EDI: ecdb0b00 EBP: e444efa8 ESP: ea115ca0
    [ 4595.526469] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
    [ 4595.526469] CR0: 8005003b CR2: b5eb8000 CR3: 2c041000 CR4: 00000660
    [ 4595.526469] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 4595.526469] DR6: ffff0ff0 DR7: 00000400
    [ 4595.526469] [] ? __ext3_get_inode_loc+0xcf/0x26c [ext3]
    [ 4595.526469] [] ? journal_get_write_access+0x18/0x26 [jbd]
    [ 4595.526469] [] ? __ext3_journal_get_write_access+0x13/0x32 [ext3]
    [ 4595.526469] [] ? ext3_reserve_inode_write+0x2d/0x5d [ext3]
    [ 4595.526469] [] ? ext3_mark_inode_dirty+0x11/0x27 [ext3]
    [ 4595.526469] [] ? ext3_dirty_inode+0x50/0x63 [ext3]
    [ 4595.526469] [] ? __mark_inode_dirty+0x21/0x12a
    [ 4595.526469] [] ? ext3_generic_write_end+0x5d/0x64 [ext3]
    [ 4595.526469] [] ? ext3_ordered_write_end+0xb2/0x103 [ext3]
    [ 4595.526469] [] ? generic_file_buffered_write+0x13c/0x553
    [ 4595.526469] [] ? cap_inode_need_killpriv+0x25/0x35
    [ 4595.526469] [] ? security_inode_need_killpriv+0xc/0xd
    [ 4595.526469] [] ? remove_suid+0x15/0x44
    [ 4595.526469] [] ? __generic_file_aio_write_nolock+0x468/0x4cb
    [ 4595.526469] [] ? generic_file_aio_write+0x52/0xa9
    [ 4595.526469] [] ? ext3_file_write+0x19/0x83 [ext3]
    [ 4595.526469] [] ? do_sync_write+0xbf/0x100
    [ 4595.526469] [] ? get_runstate_snapshot+0x3d/0x4b
    [ 4595.526469] [] ? autoremove_wake_function+0x0/0x2d
    [ 4595.526469] [] ? _spin_unlock_irqrestore+0xd/0x10
    [ 4595.526469] [] ? hrtick_set+0x7a/0xd8
    [ 4595.526469] [] ? schedule+0x63b/0x66d
    [ 4595.526469] [] ? security_file_permission+0xc/0xd
    [ 4595.526469] [] ? do_sync_write+0x0/0x100
    [ 4595.526469] [] ? vfs_write+0x83/0x120
    [ 4595.526469] [] ? sys_write+0x3c/0x63
    [ 4595.526469] [] ? syscall_call+0x7/0xb
    [ 4595.526469] =======================
    [ 4595.526740] BUG: soft lockup - CPU#4 stuck for 71s! [ksoftirqd/4:16]
    [ 4595.526752] Modules linked in: ipv6 loop evdev xen_netfront pcspkr ext3 jbd mbcache xen_blkfront thermal_sys
    [ 4595.526752]
    [ 4595.526752] Pid: 16, comm: ksoftirqd/4 Not tainted (2.6.26-2-686-bigmem #1)
    [ 4595.526752] EIP: 0061:[] EFLAGS: 00000246 CPU: 4
    [ 4595.526752] EIP is at _stext+0x227/0x1000
    [ 4595.526752] EAX: 00030002 EBX: 00000000 ECX: 00000000 EDX: ed03e460
    [ 4595.526752] ESI: ec5bbc80 EDI: ed03e460 EBP: 00000000 ESP: ed093f60
    [ 4595.526752] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
    [ 4595.526752] CR0: 8005003b CR2: b6c9d000 CR3: 2638d000 CR4: 00000660
    [ 4595.526752] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 4595.526752] DR6: ffff0ff0 DR7: 00000400
    [ 4595.526752] [] force_evtchn_callback+0xa/0xc
    [ 4595.526752] [] finish_task_switch+0x25/0x99
    [ 4595.526752] [] schedule+0x60a/0x66d
    [ 4595.526752] [] __rcu_process_callbacks+0xd7/0x154
    [ 4595.526752] [] __do_softirq+0x8b/0xd3
    [ 4595.526752] [] ksoftirqd+0x0/0xa6
    [ 4595.526752] [] ksoftirqd+0x23/0xa6
    [ 4595.526752] [] kthread+0x38/0x5d
    [ 4595.526752] [] kthread+0x0/0x5d
    [ 4595.526752] [] kernel_thread_helper+0x7/0x10
    [ 4595.526752] =======================
    [ 4763.846926] BUG: soft lockup - CPU#1 stuck for 113s! [swapper:0]
    [ 4763.846929] Modules linked in: ipv6 loop evdev xen_netfront pcspkr ext3 jbd mbcache xen_blkfront thermal_sys
    [ 4763.846929]
    [ 4763.846929] Pid: 0, comm: swapper Not tainted (2.6.26-2-686-bigmem #1)
    [ 4763.846929] EIP: 0061:[] EFLAGS: 00000246 CPU: 1
    [ 4763.846929] EIP is at _stext+0x3a7/0x1000
    [ 4763.846929] EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 00175d28
    [ 4763.846929] ESI: 00000001 EDI: 00000000 EBP: 00000000 ESP: ed045fa0
    [ 4763.846929] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
    [ 4763.846929] CR0: 8005003b CR2: b6c9d000 CR3: 266a0000 CR4: 00000660
    [ 4763.846929] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 4763.846929] DR6: ffff0ff0 DR7: 00000400
    [ 4763.846929] [] xen_safe_halt+0xd/0x17
    [ 4763.846929] [] xen_idle+0x0/0x3a
    [ 4763.846929] [] xen_idle+0x2b/0x3a
    [ 4763.846929] [] cpu_idle+0xb0/0xd0
    [ 4763.846929] =======================
    [ 4763.851004] BUG: soft lockup - CPU#3 stuck for 113s! [rsyslogd:2636]
    [ 4763.851004] Modules linked in: ipv6 loop evdev xen_netfront pcspkr ext3 jbd mbcache xen_blkfront thermal_sys
    [ 4763.851004]
    [ 4763.851004] Pid: 2636, comm: rsyslogd Not tainted (2.6.26-2-686-bigmem #1)
    [ 4763.851004] EIP: 0073:[<080675a2>] EFLAGS: 00000206 CPU: 3
    [ 4763.851004] EIP is at 0x80675a2
    [ 4763.851004] EAX: 09f0d548 EBX: 09f0d548 ECX: 09ef72d8 EDX: b753d03a
    [ 4763.851004] ESI: 00000000 EDI: b753d03a EBP: b753d008 ESP: b753cfa0
    [ 4763.851004] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
    [ 4763.851004] CR0: 8005003b CR2: b6c9d000 CR3: 2c041000 CR4: 00000660
    [ 4763.851004] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 4763.851004] DR6: ffff0ff0 DR7: 00000400
    [ 4763.851004] =======================
    [ 4763.851005] BUG: soft lockup - CPU#4 stuck for 113s! [swapper:0]
    [ 4763.851005] Modules linked in: ipv6 loop evdev xen_netfront pcspkr ext3 jbd mbcache xen_blkfront thermal_sys
    [ 4763.851005]
    [ 4763.851005] Pid: 0, comm: swapper Not tainted (2.6.26-2-686-bigmem #1)
    [ 4763.851005] EIP: 0061:[] EFLAGS: 00000206 CPU: 4
    [ 4763.851005] EIP is at xen_irq_disable+0x6/0xb
    [ 4763.851005] EAX: f5612100 EBX: c326e020 ECX: 00000200 EDX: c326e020
    [ 4763.851005] ESI: c326e020 EDI: e9c15740 EBP: ed03e460 ESP: ed04bf3c
    [ 4763.851005] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
    [ 4763.851005] CR0: 8005003b CR2: b6c9d000 CR3: 29c90000 CR4: 00000660
    [ 4763.851005] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 4763.851005] DR6: ffff0ff0 DR7: 00000400
    [ 4763.851005] [] ? _spin_lock_irqsave+0x16/0x2f
    [ 4763.851005] [] ? hrtick_set+0x40/0xd8
    [ 4763.851005] [] ? schedule+0x63b/0x66d
    [ 4763.851005] [] ? ktime_get+0xd/0x21
    [ 4763.851005] [] ? tick_nohz_stop_idle+0x19/0x45
    [ 4763.851005] [] ? tick_nohz_restart_sched_tick+0x12d/0x134
    [ 4763.851006] [] ? xen_idle+0x0/0x3a
    [ 4763.851006] [] ? cpu_idle+0xcb/0xd0
    [ 4763.851006] =======================
    [ 4996.739489] BUG: soft lockup - CPU#1 stuck for 63s! [sendmail:2640]
    [ 4996.739506] Modules linked in: ipv6 loop evdev xen_netfront pcspkr ext3 jbd mbcache xen_blkfront thermal_sys
    [ 4996.739506]
    [ 4996.739506] Pid: 2640, comm: sendmail Not tainted (2.6.26-2-686-bigmem #1)
    [ 4996.739506] EIP: 0061:[] EFLAGS: 00000293 CPU: 1
    [ 4996.739506] EIP is at prio_tree_insert+0x150/0x1e9
    [ 4996.739506] EAX: 0000013a EBX: ec8b87c0 ECX: ec8b87c0 EDX: 00000139
    [ 4996.739506] ESI: ec8b881c EDI: ecdd8570 EBP: e95ea53c ESP: ea115e60
    [ 4996.739506] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
    [ 4996.739506] CR0: 8005003b CR2: b77b52a0 CR3: 2a136000 CR4: 00000660
    [ 4996.739506] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 4996.739506] DR6: ffff0ff0 DR7: 00000400
    [ 4996.739506] [] ? vma_prio_tree_insert+0x17/0x2a
    [ 4996.739506] [] ? vma_adjust+0x1b8/0x3ab
    [ 4996.739506] [] ? kmem_cache_alloc+0x53/0x87
    [ 4996.739506] [] ? split_vma+0xc3/0xd3
    [ 4996.739506] [] ? do_munmap+0xb4/0x1ba
    [ 4996.739506] [] ? mmap_region+0x6f/0x392
    [ 4996.739506] [] ? arch_get_unmapped_area_topdown+0x0/0x120
    [ 4996.739506] [] ? do_mmap_pgoff+0x25d/0x2b0
    [ 4996.739506] [] ? sys_mmap_pgoff+0x9b/0xc2
    [ 4996.739506] [] ? syscall_call+0x7/0xb
    [ 4996.739506] =======================
    [ 4996.743606] BUG: soft lockup - CPU#2 stuck for 63s! [rsyslogd:2641]
    [ 4996.743606] Modules linked in: ipv6 loop evdev xen_netfront pcspkr ext3 jbd mbcache xen_blkfront thermal_sys
    [ 4996.743606]
    [ 4996.743606] Pid: 2641, comm: rsyslogd Not tainted (2.6.26-2-686-bigmem #1)
    [ 4996.743606] EIP: 0061:[] EFLAGS: 00000202 CPU: 2
    [ 4996.743606] EIP is at __brelse+0x4/0x25
    [ 4996.743606] EAX: ecdc46a0 EBX: ecdb0ac8 ECX: 00000000 EDX: 02e80000
    [ 4996.743606] ESI: ea139c64 EDI: c3255760 EBP: 00000008 ESP: ea139c34
    [ 4996.743606] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
    [ 4996.743606] CR0: 8005003b CR2: 0809f020 CR3: 2c041000 CR4: 00000660
    [ 4996.743606] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 4996.743606] DR6: ffff0ff0 DR7: 00000400
    [ 4996.743606] [] ? __find_get_block+0x16e/0x180
    [ 4996.743606] [] ? __getblk+0x27/0x24e
    [ 4996.743606] [] ? do_get_write_access+0x2f8/0x331 [jbd]
    [ 4996.743606] [] ? __ext3_get_inode_loc+0xcf/0x26c [ext3]
    [ 4996.743606] [] ? ext3_reserve_inode_write+0x19/0x5d [ext3]
    [ 4996.743606] [] ? ext3_mark_inode_dirty+0x11/0x27 [ext3]
    [ 4996.743606] [] ? ext3_dirty_inode+0x50/0x63 [ext3]
    [ 4996.743606] [] ? __mark_inode_dirty+0x21/0x12a
    [ 4996.743606] [] ? ext3_generic_write_end+0x5d/0x64 [ext3]
    [ 4996.743606] [] ? ext3_ordered_write_end+0xb2/0x103 [ext3]
    [ 4996.743606] [] ? generic_file_buffered_write+0x13c/0x553
    [ 4996.743606] [] ? cap_inode_need_killpriv+0x25/0x35
    [ 4996.743606] [] ? security_inode_need_killpriv+0xc/0xd
    [ 4996.743606] [] ? remove_suid+0x15/0x44
    [ 4996.743606] [] ? __generic_file_aio_write_nolock+0x468/0x4cb
    [ 4996.743606] [] ? generic_file_aio_write+0x52/0xa9
    [ 4996.743606] [] ? ext3_file_write+0x19/0x83 [ext3]
    [ 4996.743606] [] ? do_sync_write+0xbf/0x100
    [ 4996.743606] [] ? get_runstate_snapshot+0x3d/0x4b
    [ 4996.743606] [] ? autoremove_wake_function+0x0/0x2d
    [ 4996.743606] [] ? _spin_unlock_irqrestore+0xd/0x10
    [ 4996.743606] [] ? hrtick_set+0x7a/0xd8
    [ 4996.743606] [] ? schedule+0x63b/0x66d
    [ 4996.743606] [] ? security_file_permission+0xc/0xd
    [ 4996.743606] [] ? do_sync_write+0x0/0x100
    [ 4996.743606] [] ? vfs_write+0x83/0x120
    [ 4996.743606] [] ? sys_write+0x3c/0x63
    [ 4996.743606] [] ? syscall_call+0x7/0xb
    [ 4996.743606] =======================
    [ 4996.747880] BUG: soft lockup - CPU#3 stuck for 64s! [apache2:2042]
    [ 4996.747880] Modules linked in: ipv6 loop evdev xen_netfront pcspkr ext3 jbd mbcache xen_blkfront thermal_sys
    [ 4996.747880]
    [ 4996.747880] Pid: 2042, comm: apache2 Not tainted (2.6.26-2-686-bigmem #1)
    [ 4996.747880] EIP: 0073:[] EFLAGS: 00000202 CPU: 3
    [ 4996.747880] EIP is at 0xb68abca3
    [ 4996.747880] EAX: 000005e7 EBX: b6b22dcc ECX: 000005e7 EDX: 00000ec4
    [ 4996.747880] ESI: 0000000a EDI: 0000000a EBP: bfb51108 ESP: bfb51090
    [ 4996.747880] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
    [ 4996.747880] CR0: 80050033 CR2: b6c9d000 CR3: 27dc7000 CR4: 00000660
    [ 4996.747880] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 4996.747880] DR6: ffff0ff0 DR7: 00000400
    [ 4996.747880] =======================
    [ 4996.751947] BUG: soft lockup - CPU#4 stuck for 64s! [swapper:0]
    [ 4996.751947] Modules linked in: ipv6 loop evdev xen_netfront pcspkr ext3 jbd mbcache xen_blkfront thermal_sys
    [ 4996.751947]
    [ 4996.751947] Pid: 0, comm: swapper Not tainted (2.6.26-2-686-bigmem #1)
    [ 4996.751947] EIP: 0061:[] EFLAGS: 00000246 CPU: 4
    [ 4996.751947] EIP is at _stext+0x227/0x1000
    [ 4996.751947] EAX: 00030002 EBX: 00000000 ECX: 00000000 EDX: ed0824e0
    [ 4996.751947] ESI: ed14ee40 EDI: ed0824e0 EBP: 00000001 ESP: ed04bf3c
    [ 4996.751947] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
    [ 4996.751947] CR0: 8005003b CR2: b6c9d000 CR3: 28982000 CR4: 00000660
    [ 4996.751947] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 4996.751947] DR6: ffff0ff0 DR7: 00000400
    [ 4996.751947] [] ? force_evtchn_callback+0xa/0xc
    [ 4996.751947] [] ? finish_task_switch+0x25/0x99
    [ 4996.751947] [] ? schedule+0x60a/0x66d
    [ 4996.751947] [] ? ktime_get+0xd/0x21
    [ 4996.751947] [] ? tick_nohz_stop_idle+0x19/0x45
    [ 4996.751947] [] ? tick_nohz_restart_sched_tick+0x12d/0x134
    [ 4996.751947] [] ? xen_idle+0x0/0x3a
    [ 4996.751947] [] ? cpu_idle+0xcb/0xd0
    [ 4996.751947] =======================
    This is kind of annoying in a production environment. I had to write a script that monitors the state of the DomU and automatically destroys and re-creates it when a crash occurs.
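
Code78 does not say how the script decides that a guest has hung, so the following is only a minimal sketch under assumptions: it treats a failed TCP connection to the webserver port as a sign of trouble and, after a few consecutive failures, runs the xm destroy and xm create pair mentioned in the comment. The function names, the probe method, the thresholds, and the example values are all hypothetical.

```python
import socket
import subprocess
import time


def guest_responds(host, port=80, timeout=5.0):
    """Return True if a TCP connection to the guest succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_commands(name, cfgfile):
    """The destroy-and-create pair from the comment, as argument lists."""
    return [["xm", "destroy", name], ["xm", "create", cfgfile]]


def watch(name, cfgfile, host, max_failures=3, interval=30,
          probe=guest_responds, run=subprocess.call, sleep=time.sleep,
          rounds=None):
    """Restart the guest after max_failures consecutive failed probes.

    rounds limits the number of probes (None means loop forever).
    Returns the number of restarts performed.
    """
    failures = 0
    restarts = 0
    done = 0
    while rounds is None or done < rounds:
        done += 1
        if probe(host):
            failures = 0
        else:
            failures += 1
        if failures >= max_failures:
            for cmd in restart_commands(name, cfgfile):
                run(cmd)
            restarts += 1
            failures = 0  # give the freshly created guest time to boot
        sleep(interval)
    return restarts


# Example call with made-up values, to be run as root in Dom0:
# watch("webserver", "/etc/xen/webserver.cfg", "192.0.2.10")
```

Requiring several failed probes in a row avoids restarting the guest over a single dropped connection. Note that an automatic restart throws away the hung guest's state, so it gets in the way of collecting stack traces from it.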

  9. Pasi Karkkainen says:

    So you’re able to reproduce it easily. Good.
    Make sure you have:
    on_crash="preserve"
    set in the domU’s cfgfile under /etc/xen/.
    Then when the domU crashes, run this command for each domU vcpu:
    /usr/lib/xen/bin/xenctx -s System.map-domUkernelversion <domid> <vcpu>
    If you’re running a 64-bit dom0, then xenctx might be under “/usr/lib64/”.
    You need to use the System.map file for the *exact* kernel version running in the domU.
    Please post those stack traces somewhere, for each vcpu.
    That should help debugging the problem.
    I can forward that stuff to the xen-devel mailing list if you don’t want to do it yourself (it would be easier if you did, though).
    Thanks!
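
Running xenctx once per vcpu by hand gets tedious on a guest with several vcpus. A small helper along the following lines could generate and run the commands. It is only a sketch: it assumes xenctx takes the domain ID and the vcpu number as positional arguments after its options, and the function names and example values are made up.

```python
import subprocess

XENCTX = "/usr/lib/xen/bin/xenctx"  # may live under /usr/lib64/ on 64-bit dom0


def xenctx_commands(domid, vcpus, system_map, xenctx=XENCTX):
    """Build one xenctx invocation per guest vcpu, numbered from 0."""
    return [
        [xenctx, "-s", system_map, str(domid), str(vcpu)]
        for vcpu in range(vcpus)
    ]


def collect_traces(domid, vcpus, system_map, run=subprocess.check_output):
    """Run xenctx for every vcpu and return {vcpu number: raw output}."""
    commands = xenctx_commands(domid, vcpus, system_map)
    return {vcpu: run(cmd) for vcpu, cmd in enumerate(commands)}


# Example with made-up values: domain ID 5 with 4 vcpus.
# The System.map must match the exact kernel running in the domU.
# traces = collect_traces(5, 4, "System.map-2.6.26-2-xen-686")
```

The domain ID is the number shown by xm list for the hung guest.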
