general protection fault: 0000 [#1] SMP

Sridhar
Posts: 1
Joined: 2018/10/18 06:09:44

general protection fault: 0000 [#1] SMP

Post by Sridhar » 2018/10/18 06:22:07

Hi,

We are running CentOS 7 in a VM. During peak traffic hours the VM freezes and does not recover. We installed kdump and found that the kernel panics with "general protection fault: 0000 [#1] SMP".
Is this a known issue, and is there a fix or workaround available?
The runq output also shows the swapper process running on all CPUs.

***************************************************************************************************************
Crash 1:
=======
KERNEL: /usr/lib/debug/lib/modules/3.10.0-514.26.2.el7.x86_64/vmlinux
DUMPFILE: vmcore
CPUS: 32
DATE: Mon Oct 15 23:34:59 2018
UPTIME: 15 days, 05:29:30
LOAD AVERAGE: 3.68, 3.99, 4.13
TASKS: 2946
NODENAME: spprod-vwa-dccn-02-71-beta-11
RELEASE: 3.10.0-514.26.2.el7.x86_64
VERSION: #1 SMP Tue Jul 4 15:04:05 UTC 2017
MACHINE: x86_64 (2294 Mhz)
MEMORY: 40 GB
PANIC: "general protection fault: 0000 [#1] SMP "
PID: 0
COMMAND: "swapper/2"
TASK: ffff88016f9f2f10 (1 of 32) [THREAD_INFO: ffff88016fa40000]
CPU: 2
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 0 TASK: ffff88016f9f2f10 CPU: 2 COMMAND: "swapper/2"
#0 [ffff88016fa43b60] machine_kexec at ffffffff81059beb
#1 [ffff88016fa43bc0] __crash_kexec at ffffffff81105822
#2 [ffff88016fa43c90] crash_kexec at ffffffff81105910
#3 [ffff88016fa43ca8] oops_end at ffffffff81690008
#4 [ffff88016fa43cd0] die at ffffffff8102e93b
#5 [ffff88016fa43d00] do_general_protection at ffffffff8168f8fe
#6 [ffff88016fa43d30] general_protection at ffffffff8168f1a8
[exception RIP: get_next_timer_interrupt+440]
RIP: ffffffff810995a8 RSP: ffff88016fa43de0 RFLAGS: 00010007
RAX: 4e4266190bb7f9af RBX: 0004acafb1ea01c0 RCX: 00000000000000b1
RDX: 000000018e687cb0 RSI: ffff88016fb513f8 RDI: 00000000014e687d
RBP: ffff88016fa43e30 R8: 0000000000000000 R9: 000000000000003d
R10: 000000000000003d R11: ffff88016fb51028 R12: 000000014e687cb0
R13: ffff88016fb50000 R14: ffff88016fa43de8 R15: ffff88016fa43e00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff88016fa43e38] tick_nohz_stop_sched_tick at ffffffff810f3958
#8 [ffff88016fa43e90] __tick_nohz_idle_enter at ffffffff810f3afe
#9 [ffff88016fa43ec0] tick_nohz_idle_enter at ffffffff810f402d
#10 [ffff88016fa43ed0] cpu_startup_entry at ffffffff810e8153
#11 [ffff88016fa43f28] start_secondary at ffffffff8104f0da
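
A general protection fault with error code 0000 on x86_64 is often raised when an instruction dereferences a non-canonical address, so one quick sanity check on a register dump like the one above is to see which values could not be valid pointers. The following is a minimal sketch, assuming 48-bit virtual addresses (4-level paging); the helper names are hypothetical and the register values are copied from the Crash 1 dump. Not every register holds a pointer, so a flagged value only matters if the faulting instruction actually dereferences it, which the backtrace alone does not show; disassembling around get_next_timer_interrupt+440 with crash's dis command would be needed to confirm.

```python
# Sketch: flag register values that are not canonical x86_64 addresses.
# Assumption: 48-bit virtual addresses (4-level paging).
# is_canonical() and non_canonical() are hypothetical helper names.

def is_canonical(value: int) -> bool:
    """Return True if bits 63..47 are all zeros or all ones."""
    top = (value >> 47) & 0x1FFFF  # the 17 uppermost bits
    return top in (0, 0x1FFFF)


def non_canonical(regs: dict) -> list:
    """Return the sorted names of registers holding non-canonical values."""
    return sorted(name for name, value in regs.items() if not is_canonical(value))


# Register values copied from the Crash 1 exception frame.
CRASH1_REGS = {
    "RAX": 0x4E4266190BB7F9AF,
    "RBX": 0x0004ACAFB1EA01C0,
    "RCX": 0x00000000000000B1,
    "RDX": 0x000000018E687CB0,
    "RSI": 0xFFFF88016FB513F8,
    "RDI": 0x00000000014E687D,
    "R11": 0xFFFF88016FB51028,
    "R12": 0x000000014E687CB0,
    "R13": 0xFFFF88016FB50000,
}

if __name__ == "__main__":
    for name in non_canonical(CRASH1_REGS):
        print(f"{name} = {CRASH1_REGS[name]:#018x} is not a canonical address")
```

With these values the check flags RAX and RBX; the kernel pointers in RSI, R11 and R13 pass.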

*********************************************************************************************************
Crash 2:
=======
KERNEL: /usr/lib/debug/lib/modules/3.10.0-514.26.2.el7.x86_64/vmlinux
DUMPFILE: vmcore
CPUS: 32
DATE: Sat Oct 13 21:43:31 2018
UPTIME: 21 days, 23:46:48
LOAD AVERAGE: 4.46, 4.39, 4.27
TASKS: 2982
NODENAME: spprod-vwa-dccn-02-69-beta-11
RELEASE: 3.10.0-514.26.2.el7.x86_64
VERSION: #1 SMP Tue Jul 4 15:04:05 UTC 2017
MACHINE: x86_64 (2294 Mhz)
MEMORY: 40 GB
PANIC: "general protection fault: 0000 [#1] SMP "
PID: 0
COMMAND: "swapper/0"
TASK: ffffffff819c5460 (1 of 32) [THREAD_INFO: ffffffff819b0000]
CPU: 0
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 0 TASK: ffffffff819c5460 CPU: 0 COMMAND: "swapper/0"
#0 [ffff880a16e03950] machine_kexec at ffffffff81059beb
#1 [ffff880a16e039b0] __crash_kexec at ffffffff81105822
#2 [ffff880a16e03a80] crash_kexec at ffffffff81105910
#3 [ffff880a16e03a98] oops_end at ffffffff81690008
#4 [ffff880a16e03ac0] die at ffffffff8102e93b
#5 [ffff880a16e03af0] do_general_protection at ffffffff8168f8fe
#6 [ffff880a16e03b20] general_protection at ffffffff8168f1a8
[exception RIP: arp_process+527]
RIP: ffffffff815e5bef RSP: ffff880a16e03bd0 RFLAGS: 00010286
RAX: 0100040600080100 RBX: ffff880a00f1dc00 RCX: 0000000000000002
RDX: 00000001713b3e7d RSI: ffff880a1374fa30 RDI: ffff880a1374fb78
RBP: ffff880a16e03c78 R8: ffff880a1374fa00 R9: ffff880a103a8456
R10: ffffffff81aa0e80 R11: 0000000000000001 R12: ffff880a103a844e
R13: ffff880a126aa000 R14: ffff880a13692e00 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff880a16e03c20] enqueue_pushable_task at ffffffff810d44ad
#8 [ffff880a16e03c80] arp_rcv at ffffffff815e62a5
#9 [ffff880a16e03ce8] __netif_receive_skb_core at ffffffff81570162
#10 [ffff880a16e03d50] __netif_receive_skb at ffffffff815703f8
#11 [ffff880a16e03d70] netif_receive_skb_internal at ffffffff81570480
#12 [ffff880a16e03da0] napi_gro_receive at ffffffff81571598
#13 [ffff880a16e03dc8] virtnet_poll at ffffffffa02f99d5 [virtio_net]
#14 [ffff880a16e03e38] net_rx_action at ffffffff81570c20
#15 [ffff880a16e03eb8] __do_softirq at ffffffff8108f63f
#16 [ffff880a16e03f28] call_softirq at ffffffff8169929c
#17 [ffff880a16e03f40] do_softirq at ffffffff8102d365
#18 [ffff880a16e03f60] irq_exit at ffffffff8108f9d5
#19 [ffff880a16e03f78] do_IRQ at ffffffff81699e38
--- <IRQ stack> ---
#20 [ffffffff819b3df8] ret_from_intr at ffffffff8168eeed
[exception RIP: native_safe_halt+6]
RIP: ffffffff81060fe6 RSP: ffffffff819b3ea0 RFLAGS: 00000286
RAX: 00000000ffffffed RBX: 0006c00c18a6f780 RCX: 0100000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
RBP: ffffffff819b3ea0 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000104 R12: 0000000000000000
R13: 0006c00c18a6f780 R14: ffff880a16e0fde0 R15: 1a3b1fbf72b64278
ORIG_RAX: ffffffffffffff8e CS: 0010 SS: 0018
#21 [ffffffff819b3ea8] default_idle at ffffffff810347ff
#22 [ffffffff819b3ec8] arch_cpu_idle at ffffffff81035146
#23 [ffffffff819b3ed8] cpu_startup_entry at ffffffff810e82f5
#24 [ffffffff819b3f30] rest_init at ffffffff81675607
#25 [ffffffff819b3f40] start_kernel at ffffffff81b0e05a
#26 [ffffffff819b3f88] x86_64_start_reservations at ffffffff81b0d5ee
#27 [ffffffff819b3f98] x86_64_start_kernel at ffffffff81b0d742
*****************************************************************************************************************
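
One detail in the Crash 2 dump may be worth a closer look: the RAX value at the arp_process fault, 0100040600080100, is not a canonical address, and when its bytes are laid out in memory order they match the fixed part of an ARP header. The sketch below only demonstrates that byte-level reading; it assumes a little-endian register dump, and the function name is hypothetical. It does not prove the cause, but it is consistent with packet data ending up where the code expected a pointer.

```python
# Sketch: reinterpret a 64-bit register value as the first 8 bytes of an
# ARP header (hardware type, protocol type, address lengths, opcode).
# Assumption: x86_64 is little-endian, so the register's low byte comes
# first in memory. decode_as_arp_header() is a hypothetical helper name.
import struct


def decode_as_arp_header(reg_value: int) -> dict:
    raw = struct.pack("<Q", reg_value)  # bytes in memory order
    # ARP header fields are in network (big-endian) byte order.
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", raw)
    return {"htype": htype, "ptype": ptype, "hlen": hlen, "plen": plen, "oper": oper}


CRASH2_RAX = 0x0100040600080100  # copied from the Crash 2 exception frame

if __name__ == "__main__":
    for field, value in decode_as_arp_header(CRASH2_RAX).items():
        print(f"{field} = {value:#06x}")
```

The decoded fields are hardware type 1 (Ethernet), protocol type 0x0800 (IPv4), hardware address length 6, protocol address length 4 and opcode 1 (request), which is what an ordinary IPv4-over-Ethernet ARP request starts with.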

avij
Retired Moderator
Posts: 3046
Joined: 2010/12/01 19:25:52
Location: Helsinki, Finland

Re: general protection fault: 0000 [#1] SMP

Post by avij » 2018/10/18 07:03:03

As a first step you should try a "yum update". That kernel of yours is more than a year old, and there are tons of fixes between that and the current one, perhaps also one that fixes this problem.