General support questions
-
xxteknolustxx
- Posts: 15
- Joined: 2019/11/27 14:58:11
Post
by xxteknolustxx » 2019/11/27 15:01:46
Hello,
I'm running a dedicated CentOS 7 server (kernel 3.10.0-1062.4.3.el7.x86_64) and I randomly get crashes with the stack below.
I have a bunch of NTFS drives that are constantly being read from, although it's not clear to me whether this is a problem with the FUSE driver or with CentOS. I have the crash dump if anyone's interested.
[125564.354211] [<ffffffff9023a50a>] __mem_cgroup_uncharge_common+0xba/0x2f0
[125564.354215] [<ffffffff9023e912>] mem_cgroup_uncharge_cache_page+0x12/0x20
[125564.354221] [<ffffffff901cd7da>] invalidate_inode_pages2_range+0x33a/0x460
[125564.354225] [<ffffffff901cd917>] invalidate_inode_pages2+0x17/0x20
[125564.354232] [<ffffffffc072b151>] fuse_finish_open+0xa1/0x110 [fuse]
[125564.354237] [<ffffffffc072b28e>] fuse_open_common+0xce/0xe0 [fuse]
[125564.354241] [<ffffffffc072b2b0>] fuse_open+0x10/0x20 [fuse]
[125564.354245] [<ffffffff90247826>] do_dentry_open+0x216/0x2c0
[125564.354251] [<ffffffff903039d2>] ? security_inode_permission+0x22/0x30
[125564.354258] [<ffffffffc072b2a0>] ? fuse_open_common+0xe0/0xe0 [fuse]
[125564.354261] [<ffffffff9024796a>] vfs_open+0x5a/0xb0
[125564.354266] [<ffffffff90255f6a>] ? may_open+0x5a/0x120
[125564.354269] [<ffffffff90258696>] do_last+0x1f6/0x1290
[125564.354272] [<ffffffff9025b52d>] path_openat+0xcd/0x5a0
[125564.354275] [<ffffffff9025cd72>] ? user_path_at_empty+0x72/0xc0
[125564.354278] [<ffffffff90246aea>] ? __check_object_size+0x1ca/0x250
[125564.354281] [<ffffffff9025ce9d>] do_filp_open+0x4d/0xb0
[125564.354286] [<ffffffff9026a977>] ? __alloc_fd+0x47/0x170
[125564.354289] [<ffffffff90248df4>] do_sys_open+0x124/0x220
[125564.354292] [<ffffffff90248f24>] SyS_openat+0x14/0x20
[125564.354298] [<ffffffff9078cede>] system_call_fastpath+0x25/0x2a
Last edited by
xxteknolustxx on 2019/11/28 05:51:10, edited 1 time in total.
-
xxteknolustxx
- Posts: 15
- Joined: 2019/11/27 14:58:11
Post
by xxteknolustxx » 2019/11/28 05:50:47
It crashed again, this time with a completely different stack:
#7 [ffff8ce675ea3970] isolate_migratepages_range at ffffffff85be4fb7
#8 [ffff8ce675ea3a30] compact_zone at ffffffff85be5586
#9 [ffff8ce675ea3a80] compact_zone_order at ffffffff85be57ac
#10 [ffff8ce675ea3b20] try_to_compact_pages at ffffffff85be5b71
#11 [ffff8ce675ea3b80] __alloc_pages_direct_compact at ffffffff861750c5
#12 [ffff8ce675ea3be0] __alloc_pages_nodemask at ffffffff85bc7ecc
#13 [ffff8ce675ea3d10] alloc_pages_current at ffffffff85c16298
#14 [ffff8ce675ea3d58] __get_free_pages at ffffffff85bc228e
#15 [ffff8ce675ea3d68] nf_ct_alloc_hashtable at ffffffffc06c550e [nf_conntrack]
#16 [ffff8ce675ea3d98] nf_conntrack_init_net at ffffffffc06c8dee [nf_conntrack]
#17 [ffff8ce675ea3db8] nf_conntrack_pernet_init at ffffffffc06c96c4 [nf_conntrack]
#18 [ffff8ce675ea3dd8] ops_init at ffffffff86043054
#19 [ffff8ce675ea3e18] setup_net at ffffffff86043203
#20 [ffff8ce675ea3e60] copy_net_ns at ffffffff860439a5
#21 [ffff8ce675ea3e90] create_new_namespaces at ffffffff85acb469
#22 [ffff8ce675ea3ec8] unshare_nsproxy_namespaces at ffffffff85acb6aa
#23 [ffff8ce675ea3ef8] sys_unshare at ffffffff85a9ae8b
#24 [ffff8ce675ea3f50] system_call_fastpath at ffffffff8618cede
-
tunk
- Posts: 1206
- Joined: 2017/02/22 15:08:17
Post
by tunk » 2019/11/28 10:56:57
Hardware problem? Maybe run a memtest for a day, reseat all cables, cards, modules, etc.
-
xxteknolustxx
- Posts: 15
- Joined: 2019/11/27 14:58:11
Post
by xxteknolustxx » 2019/12/02 14:44:51
I ran a full memtest (outside the OS) and got no errors. I reseated everything as well.
I reinstalled CentOS 7 from scratch (see exact version below) and I'm still getting random crashes. The only thing I'm doing is copying files (using rsync) from multiple NTFS disks to ext4 disks, and the crashes are fairly random.
3.10.0-1062.4.3.el7.x86_64
centos-release-7-7.1908.0.el7.centos.x86_64
KERNEL: /usr/lib/debug/usr/lib/modules/3.10.0-1062.4.3.el7.x86_64/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 24
DATE: Sun Dec 1 15:39:02 2019
UPTIME: 22:03:31
LOAD AVERAGE: 1.33, 0.32, 0.15
TASKS: 359
NODENAME: dedicated-server
RELEASE: 3.10.0-1062.4.3.el7.x86_64
VERSION: #1 SMP Wed Nov 13 23:58:53 UTC 2019
MACHINE: x86_64 (3493 Mhz)
MEMORY: 63.9 GB
PANIC: "general protection fault: 0000 [#1] SMP "
PID: 150
COMMAND: "kswapd1"
TASK: ffff8b3468a71070 [THREAD_INFO: ffff8b4330eb0000]
CPU: 21
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 150 TASK: ffff8b3468a71070 CPU: 21 COMMAND: "kswapd1"
#0 [ffff8b4330eb3698] machine_kexec at ffffffff86865b24
#1 [ffff8b4330eb36f8] __crash_kexec at ffffffff86921ab2
#2 [ffff8b4330eb37c8] crash_kexec at ffffffff86921ba0
#3 [ffff8b4330eb37e0] oops_end at ffffffff86f84798
#4 [ffff8b4330eb3808] die at ffffffff86830a7b
#5 [ffff8b4330eb3838] do_general_protection at ffffffff86f84092
#6 [ffff8b4330eb3870] general_protection at ffffffff86f83718
[exception RIP: mem_cgroup_charge_statistics+14]
RIP: ffffffff86a3859e RSP: ffff8b4330eb3920 RFLAGS: 00010246
RAX: ffffffffffffffff RBX: ffff8b347ec8df50 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: ffffe4d944037d40 RDI: f7ff8b347f4491f0
RBP: ffff8b4330eb3920 R8: 0000000000000000 R9: 0000000000000046
R10: 0000000000000230 R11: 0000000000000006 R12: ffffe4d944037d40
R13: 0000000000000000 R14: 0000000000000001 R15: f7ff8b347f449000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff8b4330eb3928] __mem_cgroup_uncharge_common at ffffffff86a3a50a
#8 [ffff8b4330eb3968] mem_cgroup_uncharge_cache_page at ffffffff86a3e912
#9 [ffff8b4330eb3978] delete_from_page_cache at ffffffff869be3f4
#10 [ffff8b4330eb39a0] truncate_inode_page at ffffffff869cda2b
#11 [ffff8b4330eb39c0] truncate_inode_pages_range at ffffffff869cdc7a
#12 [ffff8b4330eb3b18] truncate_inode_pages_final at ffffffff869ce24f
#13 [ffff8b4330eb3b38] fuse_evict_inode at ffffffffc10446f9 [fuse]
#14 [ffff8b4330eb3b50] evict at ffffffff86a67cb4
#15 [ffff8b4330eb3b78] iput at ffffffff86a680dc
#16 [ffff8b4330eb3ba8] __dentry_kill at ffffffff86a62de8
#17 [ffff8b4330eb3bd0] shrink_dentry_list at ffffffff86a632fe
#18 [ffff8b4330eb3c18] prune_dcache_sb at ffffffff86a65228
#19 [ffff8b4330eb3c78] prune_super at ffffffff86a4cbc0
#20 [ffff8b4330eb3ca8] shrink_slab at ffffffff869d0e05
#21 [ffff8b4330eb3d48] balance_pgdat at ffffffff869d4b38
#22 [ffff8b4330eb3e20] kswapd at ffffffff869d4ee3
#23 [ffff8b4330eb3ec8] kthread at ffffffff868c61f1
#24 [ffff8b4330eb3f50] ret_from_fork_nospec_begin at ffffffff86f8cd24
crash: inconsistent active task indications for CPU 7:
runqueue: ffff93332921a0e0 "swapper/7" (default)
current_task: ffff9341ea229070 "mount.ntfs"
KERNEL: /usr/lib/debug/usr/lib/modules/3.10.0-1062.4.3.el7.x86_64/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 24
DATE: Mon Dec 2 09:16:00 2019
UPTIME: 17:36:07
LOAD AVERAGE: 5.19, 5.53, 5.87
TASKS: 364
NODENAME: dedicated-server
RELEASE: 3.10.0-1062.4.3.el7.x86_64
VERSION: #1 SMP Wed Nov 13 23:58:53 UTC 2019
MACHINE: x86_64 (3492 Mhz)
MEMORY: 63.9 GB
PANIC: "general protection fault: 0000 [#1] SMP "
PID: 150
COMMAND: "kswapd1"
TASK: ffff933328a69070 [THREAD_INFO: ffff9341f0a9c000]
CPU: 22
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 150 TASK: ffff933328a69070 CPU: 22 COMMAND: "kswapd1"
#0 [ffff9341f0a9f6c8] machine_kexec at ffffffff9fe65b24
#1 [ffff9341f0a9f728] __crash_kexec at ffffffff9ff21ab2
#2 [ffff9341f0a9f7f8] crash_kexec at ffffffff9ff21ba0
#3 [ffff9341f0a9f810] oops_end at ffffffffa0584798
#4 [ffff9341f0a9f838] die at ffffffff9fe30a7b
#5 [ffff9341f0a9f868] do_general_protection at ffffffffa0584092
#6 [ffff9341f0a9f8a0] general_protection at ffffffffa0583718
[exception RIP: mem_cgroup_charge_statistics+14]
RIP: ffffffffa003859e RSP: ffff9341f0a9f958 RFLAGS: 00010246
RAX: ffffffffffffffff RBX: ffff93333ec8b940 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: fffff98b0402e500 RDI: f7ff93333f4491f0
RBP: ffff9341f0a9f958 R8: 0000000000000000 R9: 0000000000000046
R10: 0000000000000230 R11: 0000000000000006 R12: fffff98b0402e500
R13: 0000000000000000 R14: 0000000000000001 R15: f7ff93333f449000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff9341f0a9f960] __mem_cgroup_uncharge_common at ffffffffa003a50a
#8 [ffff9341f0a9f9a0] mem_cgroup_uncharge_cache_page at ffffffffa003e912
#9 [ffff9341f0a9f9b0] __remove_mapping at ffffffff9ffd0a3a
#10 [ffff9341f0a9f9f0] shrink_page_list at ffffffff9ffd1b45
#11 [ffff9341f0a9fb28] shrink_inactive_list at ffffffff9ffd29d6
#12 [ffff9341f0a9fbf0] shrink_lruvec at ffffffff9ffd34d5
#13 [ffff9341f0a9fcf0] shrink_zone at ffffffff9ffd3906
#14 [ffff9341f0a9fd48] balance_pgdat at ffffffff9ffd4b13
#15 [ffff9341f0a9fe20] kswapd at ffffffff9ffd4ee3
#16 [ffff9341f0a9fec8] kthread at ffffffff9fec61f1
#17 [ffff9341f0a9ff50] ret_from_fork_nospec_begin at ffffffffa058cd24
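One detail in both dumps above: RDI and R15 start with f7ff rather than ffff. The following is a minimal sketch, not a diagnosis. It checks whether a register value is a canonical x86_64 address (assuming 48-bit virtual addressing, where bits 63..47 must all be equal) and lists the bits that differ from the ffff-prefixed value. The helper names are made up for illustration, and treating the ffff-prefixed value as the intended pointer is purely an assumption.

```python
# Sketch: check register values from the dumps for canonical form.
# Assumes x86_64 with 48-bit virtual addresses (4-level paging).

def is_canonical(addr: int) -> bool:
    """True if bits 63..47 are all zeros or all ones."""
    top = addr >> 47  # the 17 uppermost bits of a 64-bit value
    return top == 0 or top == 0x1FFFF

def differing_bits(a: int, b: int) -> list:
    """Bit positions where a and b differ, lowest first."""
    diff = a ^ b
    return [i for i in range(64) if (diff >> i) & 1]

rdi = 0xf7ff8b347f4491f0    # RDI from the first kswapd1 dump
guess = 0xffff8b347f4491f0  # hypothetical intended pointer (assumption)

print(is_canonical(rdi))
print(is_canonical(guess))
print(differing_bits(rdi, guess))
```

A memory access through a non-canonical address raises a general protection fault rather than a page fault, which would fit the PANIC line. The dump alone does not show what changed the pointer.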
-
TrevorH
- Site Admin
- Posts: 33216
- Joined: 2009/09/24 10:40:56
- Location: Brighton, UK
Post
by TrevorH » 2019/12/02 15:20:51
Looking at that lot, I'd suspect ntfs-3g. You know the idea is not that you use NTFS every day with it; it's to get you access so you can copy data off it and onto a supported Linux-native filesystem. It's not a CentOS-supported filesystem and, going through FUSE, it's not going to be particularly performant either.
-
xxteknolustxx
- Posts: 15
- Joined: 2019/11/27 14:58:11
Post
by xxteknolustxx » 2019/12/02 15:40:42
Yeah, that's why I'm moving everything off NTFS.
If it persists after I finish copying all my data, I'll report back.
Appreciate the response
Thanks
-
xxteknolustxx
- Posts: 15
- Joined: 2019/11/27 14:58:11
Post
by xxteknolustxx » 2019/12/05 05:11:35
I've removed all the NTFS disks and replaced them with ext4 disks (ran fsck on all of them, all OK).
Still crashing with the same stack trace. I can't look at it in the crash utility since I don't see the debuginfo package for 3.10.0-1062.7.1.el7.x86_64 yet.
[ 7346.107737] Modules linked in: xt_nat veth fuse ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink xt_addrtype br_netfilter ext4 mbcache jbd2 overlay(T) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter arc4 iwlmvm snd_hda_codec_hdmi mac80211 iwlwifi edac_mce_amd snd_hda_codec_realtek kvm_amd snd_hda_codec_generic kvm snd_hda_intel snd_hda_codec irqbypass crc32_pclmul snd_hda_core snd_hwdep ghash_clmulni_intel snd_seq
[ 7346.108121] aesni_intel snd_seq_device lrw gf128mul snd_pcm glue_helper ablk_helper cryptd btusb btrtl btbcm btintel snd_timer pcspkr bluetooth snd cfg80211 k10temp i2c_piix4 ccp soundcore rfkill sg gpio_amdpt pinctrl_amd pcc_cpufreq i2c_designware_platform i2c_designware_core acpi_cpufreq ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic uas usb_storage nouveau video drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci igb crct10dif_pclmul libahci crct10dif_common crc32c_intel libata nvme serio_raw ptp drm_panel_orientation_quirks nvme_core pps_core mxm_wmi dca i2c_algo_bit sdhci_acpi iosf_mbi sdhci mmc_core wmi dm_mirror dm_region_hash dm_log dm_mod
[ 7346.108501] CPU: 11 PID: 150 Comm: kswapd1 Kdump: loaded Tainted: G ------------ T 3.10.0-1062.7.1.el7.x86_64 #1
[ 7346.108743] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.30 08/14/2018
[ 7346.108982] task: ffff990bfe6b62a0 ti: ffff990bf0ed4000 task.ti: ffff990bf0ed4000
[ 7346.109215] RIP: 0010:[<ffffffffbb438f0e>] [<ffffffffbb438f0e>] mem_cgroup_charge_statistics.isra.20+0xe/0x60
[ 7346.109461] RSP: 0018:ffff990bf0ed7958 EFLAGS: 00010246
[ 7346.109583] RAX: ffffffffffffffff RBX: ffff98fd3ec8b940 RCX: 00000000ffffffff
[ 7346.109709] RDX: 0000000000000000 RSI: fffff8a80402e500 RDI: f7ff98fc366571f0
[ 7346.109836] RBP: ffff990bf0ed7958 R08: 0000000000000000 R09: 0000000000000046
[ 7346.109962] R10: 0000000000000230 R11: 0000000000000006 R12: fffff8a80402e500
[ 7346.110088] R13: 0000000000000000 R14: 0000000000000001 R15: f7ff98fc36657000
[ 7346.110215] FS: 00007f44bbdce700(0000) GS:ffff990bfcb40000(0000) knlGS:0000000000000000
[ 7346.110449] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7346.110572] CR2: 00007f7c4c0c7b18 CR3: 00000005ada10000 CR4: 00000000003407e0
[ 7346.110699] Call Trace:
[ 7346.110817] [<ffffffffbb43ae7a>] __mem_cgroup_uncharge_common+0xba/0x2f0
[ 7346.110944] [<ffffffffbb43f282>] mem_cgroup_uncharge_cache_page+0x12/0x20
[ 7346.111072] [<ffffffffbb3d13aa>] __remove_mapping+0xaa/0x180
[ 7346.111197] [<ffffffffbb400c70>] ? page_get_anon_vma+0xa0/0xa0
[ 7346.111322] [<ffffffffbb3d24b5>] shrink_page_list+0x3c5/0xc30
[ 7346.111447] [<ffffffffbb3d3346>] shrink_inactive_list+0x1c6/0x5d0
[ 7346.111573] [<ffffffffbb3d3e45>] shrink_lruvec+0x385/0x740
[ 7346.111706] [<ffffffffbb3d4276>] shrink_zone+0x76/0x1a0
[ 7346.111839] [<ffffffffbb3d5483>] balance_pgdat+0x383/0x5e0
[ 7346.111973] [<ffffffffbb3d5853>] kswapd+0x173/0x440
[ 7346.112106] [<ffffffffbb2c72e0>] ? wake_up_atomic_t+0x30/0x30
[ 7346.112240] [<ffffffffbb3d56e0>] ? balance_pgdat+0x5e0/0x5e0
[ 7346.112373] [<ffffffffbb2c61f1>] kthread+0xd1/0xe0
[ 7346.112504] [<ffffffffbb2c6120>] ? insert_kthread_work+0x40/0x40
[ 7346.112640] [<ffffffffbb98dd24>] ret_from_fork_nospec_begin+0xe/0x21
[ 7346.112775] [<ffffffffbb2c6120>] ? insert_kthread_work+0x40/0x40
[ 7346.112908] Code: 75 eb 48 81 c6 08 02 00 00 49 01 d0 48 81 fe 20 08 00 00 75 c8 4d 85 c0 0f 95 c0 5d c3 66 66 66 66 90 55 48 89 e5 84 d2 48 63 c1 <48> 8b 17 74 2d 65 48 01 42 08 48 8b 16 80 e6 40 74 08 48 8b 17
[ 7346.113512] RIP [<ffffffffbb438f0e>] mem_cgroup_charge_statistics.isra.20+0xe/0x60
[ 7346.113767] RSP <ffff990bf0ed7958>
-
dunch
- Posts: 66
- Joined: 2018/11/07 13:48:53
- Location: Yorkshire
Post
by dunch » 2019/12/05 11:00:14
[ 7346.108501] CPU: 11 PID: 150 Comm: kswapd1 Kdump: loaded Tainted: G ------------ T 3.10.0-1062.7.1.el7.x86_64 #1
[ 7346.108743] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.30 08/14/2018
What's the dodgy driver?
-
TrevorH
- Site Admin
- Posts: 33216
- Joined: 2009/09/24 10:40:56
- Location: Brighton, UK
Post
by TrevorH » 2019/12/05 11:36:11
https://access.redhat.com/solutions/40594 lists the taint codes for the RHEL kernels. G means all loaded modules were GPL. T is a Red Hat extension and means:
Technology Preview code was loaded; cf. Technology Preview features support scope description. Refer to "TECH PREVIEW:" kernel log entry for details.
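To find which module set a taint flag from the log itself: in the "Modules linked in:" line, a tainting module carries its flag letters in parentheses after its name. Below is a rough sketch that assumes the name(FLAGS) format; flagged_modules is only an illustrative helper, not part of any existing tool, and the sample line is the start of the module list posted earlier in the thread.

```python
import re

def flagged_modules(log_line: str) -> dict:
    """Map module name -> flag letters for entries like 'overlay(T)'."""
    # Keep only the text after the 'Modules linked in:' label.
    _, _, modules = log_line.partition("Modules linked in:")
    # A flagged entry is a module name immediately followed by (LETTERS).
    return dict(re.findall(r"(\w+)\(([A-Z]+)\)", modules))

# First part of the module list from the oops posted earlier (truncated).
line = ("[ 7346.107737] Modules linked in: xt_nat veth fuse ipt_MASQUERADE "
        "nf_nat_masquerade_ipv4 nf_conntrack_netlink xt_addrtype "
        "br_netfilter ext4 mbcache jbd2 overlay(T) ip6t_rpfilter")

print(flagged_modules(line))
```

On that line the only flagged entry is overlay(T).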
-
TrevorH
- Site Admin
- Posts: 33216
- Joined: 2009/09/24 10:40:56
- Location: Brighton, UK
Post
by TrevorH » 2019/12/05 11:40:08
How busy is this system? Almost all of the crashes you've posted so far appear to show it's under significant memory pressure.
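A quick way to put a number on that is to look at /proc/meminfo while the copy is running. The following is only a sketch: the helper names are invented, the sample text uses made-up values rather than data from this server, and it assumes the kernel reports a MemAvailable field.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[name.strip()] = int(fields[0])  # values are in kB where a unit is given
    return info

def available_fraction(info: dict) -> float:
    """Share of total memory the kernel estimates is available."""
    return info["MemAvailable"] / info["MemTotal"]

# Made-up sample values, for illustration only.
sample = """MemTotal:       64000000 kB
MemFree:         1000000 kB
MemAvailable:   16000000 kB
HugePages_Total:       0
"""

info = parse_meminfo(sample)
print(f"available: {available_fraction(info):.0%}")

# On the real machine, read the live file instead of the sample:
#     with open("/proc/meminfo") as f:
#         info = parse_meminfo(f.read())
```

A single reading says little; sampling it every few seconds during the rsync run gives a better picture.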