Server hangs

The server’s been hanging intermittently for the past few weeks. It’s not clear what’s wrong: there’s nothing interesting in /var/log/messages, Red Hat kernels should be sufficiently stable nowadays, and I keep up to date on patches. It’s probably hardware, something along the lines of bad RAM or a flaky motherboard. There might be a dead fan in the computer; I have to swing by and take a look at the back of the thing. The server room is cool, though I suppose a bad fan in the wrong place will make the rest of the machine heat up. I don’t believe it’s a bad hard drive, or I would have seen something in the logs by now. When I have time, I’ll swap the memory or maybe move the drives to another machine.

But since I’m somewhat short of time this month to do these necessary repairs, there might be intermittent outages.

Update:
Correction. I did get an Oops last night. It was just a few hours before the last entry in messages. Looking further, I got a similar Oops before the last crash, again some time before the last entry. Here’s the most recent one:

Mar 31 01:12:53 fincher kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000104
Mar 31 01:12:53 fincher kernel: printing eip:
Mar 31 01:12:53 fincher kernel: c0148833
Mar 31 01:12:53 fincher kernel: *pde = 00000000
Mar 31 01:12:53 fincher kernel: Oops: 0000
Mar 31 01:12:53 fincher kernel: sd_mod scsi_mod 3c59x ipt_REJECT ipt_limit ipt_LOG ipt_state ip_conntrack iptable_filter ip_tables ext3 jbd
Mar 31 01:12:53 fincher kernel: CPU: 0
Mar 31 01:12:53 fincher kernel: EIP: 0010:[] Not tainted
Mar 31 01:12:53 fincher kernel: EFLAGS: 00010256
Mar 31 01:12:53 fincher kernel:
Mar 31 01:12:53 fincher kernel: EIP is at prune_icache [kernel] 0x93 (2.4.18-27.7.x)
Mar 31 01:12:53 fincher kernel: eax: 00000000 ebx: 00000000 ecx: 00000000 edx: c11ae000
Mar 31 01:12:53 fincher kernel: esi: fffffff8 edi: 00000000 ebp: 0000a80a esp: c11affa0
Mar 31 01:12:53 fincher kernel: ds: 0018 es: 0018 ss: 0018
Mar 31 01:12:53 fincher kernel: Process kswapd (pid: 4, stackpage=c11af000)
Mar 31 01:12:53 fincher kernel: Stack: c11ae000 00000395 c232ddf8 c0de7c48 c02da360 00000000 00000000 66666667
Mar 31 01:12:53 fincher kernel: c0148900 0000217f c012f5a8 00000006 000001f0 c15230b0 000001f0 c7fd1f9c
Mar 31 01:12:53 fincher kernel: 00010f00 c7fd1f9c c0105000 0008e000 c0106eaa 00000000 c012f2c0 c7fd0000
Mar 31 01:12:53 fincher kernel: Call Trace: [] shrink_icache_memory [kernel] 0x20 (0xc11affc0))
Mar 31 01:12:53 fincher kernel: [] kswapd [kernel] 0x2e8 (0xc11affc8))
Mar 31 01:12:53 fincher kernel: [] stext [kernel] 0x0 (0xc11affe8))
Mar 31 01:12:53 fincher kernel: [] arch_kernel_thread [kernel] 0x26 (0xc11afff0))
Mar 31 01:12:53 fincher kernel: [] kswapd [kernel] 0x0 (0xc11afff8))
Mar 31 01:12:53 fincher kernel:
Mar 31 01:12:53 fincher kernel:
Mar 31 01:12:53 fincher kernel: Code: 8b 86 0c 01 00 00 a9 38 00 00 00 8b 7f 04 75 5a 0b 86 cc 00
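
One small sanity check on the dump above is whether the reported fault address follows from the registers. The sketch below is only that, a sketch: it assumes the Code line starts at the faulting instruction, and that the first six bytes (8b 86 0c 01 00 00) decode as "mov 0x10c(%esi),%eax". The constant names are made up for illustration; the values are copied from the log.

```python
# Values copied from the Oops above. The instruction decoding is an
# assumption: 8b 86 0c 01 00 00 read as "mov 0x10c(%esi),%eax".
ESI = 0xFFFFFFF8            # esi from the register dump
DISPLACEMENT = 0x0000010C   # bytes 0c 01 00 00 from the Code line, little-endian
FAULT_ADDRESS = 0x00000104  # "virtual address 00000104"


def effective_address(base: int, disp: int) -> int:
    """Return base + displacement, wrapped to 32 bits as on i386."""
    return (base + disp) & 0xFFFFFFFF


computed = effective_address(ESI, DISPLACEMENT)
print(f"computed {computed:08x}, reported {FAULT_ADDRESS:08x}")
```

Under that assumption the two addresses agree: esi is -8 as a signed value, so the load lands just above address zero. That fits the "NULL pointer dereference" wording, but on its own it doesn't say whether the cause is bad RAM or something else.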

Time for Google. Time to look at memtest86. Alternatively, I can just swap out the RAM and see if that fixes it.
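
Since the earlier Oops only turned up on a second look through messages, a small filter for Oops lines might save some scrolling next time. This is a rough sketch, not anything official: the function name find_oops_lines is made up, and the pattern is based only on the wording of the Oops above, so other kernels may phrase these lines differently.

```python
import re
from typing import Iterable, List

# Pattern based on the two telltale lines in the Oops above (an assumption:
# other kernel versions may word these messages differently).
OOPS_PATTERN = re.compile(r"kernel: (Oops: |Unable to handle kernel )")


def find_oops_lines(lines: Iterable[str]) -> List[str]:
    """Return the syslog lines that look like the start of a kernel Oops."""
    return [line.rstrip("\n") for line in lines if OOPS_PATTERN.search(line)]


# A few lines from the log above, used here as sample input.
sample = [
    "Mar 31 01:12:53 fincher kernel: Unable to handle kernel NULL pointer "
    "dereference at virtual address 00000104\n",
    "Mar 31 01:12:53 fincher kernel: printing eip:\n",
    "Mar 31 01:12:53 fincher kernel: Oops: 0000\n",
]

for hit in find_oops_lines(sample):
    print(hit)

# On the server itself, the same function could be fed the real log:
#     with open("/var/log/messages") as log:
#         for hit in find_oops_lines(log):
#             print(hit)
```

The timestamps on the matching lines can then be compared against the last entry before each hang.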
