Re: 6.2-RELEASE-p4/SMP: fault on nofault entry

From: Ebbe Hjorth (none@info--ebbehjorth.dk.lh.bsd-dk.dk)
Date: Mon 17 Sep 2007 - 10:13:19 CEST


Date: Mon, 17 Sep 2007 10:13:19 +0200 (Romance Daylight Time)
Subject: Re: 6.2-RELEASE-p4/SMP: fault on nofault entry
From: "Ebbe Hjorth" <none@info--ebbehjorth.dk.lh.bsd-dk.dk>
To: bsd-dk@bsd-dk.dk


> OK. I'm pretty much at a loss - the next step will be to replace the
> hardware. But troubleshooting is slow going when you have to wait 14
> days for the fault to show up :-)
>
> I've considered checking whether rebooting every Sunday night helps
> (= the Windows solution)
>

Then I'd rather look at the hardware ;)

>
> Best regards
> Gert Lynge
>
> -----Original Message-----
> From: owner-bsd-dk@hobbes.bsd-dk.dk [mailto:owner-bsd-dk@hobbes.bsd-dk.dk]
> On behalf of Ebbe Hjorth
> Sent: 17 September 2007 09:30
> To: bsd-dk@bsd-dk.dk
> Subject: Re: 6.2-RELEASE-p4/SMP: fault on nofault entry
>
>
> Hi,
>
> Okay, I've just seen something similar on a virtual server, where the
> virtualization software turned out to be the problem.
>
> Good luck with the rest of the bug hunt; sorry I couldn't help!
>
>
> -> Ebbe
>
>> Hi Ebbe
>>
>> Dedicated hardware (i.e. not a virtual server)...
>>
>> Best regards
>> Gert Lynge
>>
>> -----Original Message-----
>> From: owner-bsd-dk@hobbes.bsd-dk.dk [mailto:owner-bsd-dk@hobbes.bsd-dk.dk]
>> On behalf of Ebbe Hjorth
>> Sent: 17 September 2007 09:10
>> To: bsd-dk@bsd-dk.dk
>> Subject: Re: 6.2-RELEASE-p4/SMP: fault on nofault entry
>>
>>
>> Are you running it virtualized or directly on the hardware?
>>
>>
>> -> Ebbe
>>
>>
>>> Hi again, list
>>>
>>> I've now had a new panic - with a bit more info, but still no dump
>>> file:
>>> ---
>>> Fatal trap 12: page fault while in kernel mode
>>> cpuid = 2; apic id = 2
>>> Fault virtual address = 0x5a
>>> Fault code = supervisor read, page not present
>>> Instruction pointer = 0x20:0xc07e40a3
>>> Stack pointer = 0x28:0xe6a35b74
>>> Frame pointer = 0x28:0xe6a35c40
>>> Code segment = base 0x0, limit 0xfffff, type 0x1b
>>> = DPL 0, pres 1, def32 1, gran 1
>>> Processor eflags = interrupt enabled, resume, IOPL = 0
>>> Current process = 44 (pagezero)
>>> Trap number = 12
>>> Panic: page fault
>>> cpuid = 2
>>> Uptime 17d22h47m51s
>>> Dumping 2048 MB (2 chunks)
>>> Chunk 0: 1MB (150 pages)ipfw: xxxxxxxx
>>> ipfw: xxxxxxxxxx
>>> ---
>>> (the above was copied by hand from the server's console)
>>>
>>> This time I did NOT have xcache and HTTP accept filters enabled - so I
>>> think we can rule those out.
>>> The server is a Supermicro with ECC RAM and an Intel Core 2 Quad Q6700.
>>>
>>> Does anyone have a hint as to what might be wrong - and whether it is
>>> software or hardware related?
>>> It is a 6.2 box that is fully up to date via "freebsd-update", and I
>>> regularly (manually) run portsnap/portupgrade on it. It does not have
>>> the very latest apache22 release, though, since I wanted to get this
>>> fault sorted out before I change too much...
>>>
>>> Incidentally, is it normal for ipfw to keep writing to the console after
>>> a panic?
>>> (the second ipfw line appeared while I was copying down the console
>>> output - at least 10 minutes after the panic)
>>>
>>> Best regards
>>> Gert Lynge
>>>
>>> -----Original Message-----
>>> From: Gert Lynge [mailto:gert@lynge.org]
>>> Sent: 5 September 2007 09:06
>>> To: 'bsd-dk@bsd-dk.dk'
>>> Subject: 6.2-RELEASE-p4/SMP: fault on nofault entry
>>>
>>>
>>> Hi list
>>>
>>> Every now and then I get this:
>>> panic: vm_fault: fault on nofault entry, addr: e92cf000
>>> cpuid = 3
>>> Uptime: 10d0h24m25s
>>> Dumping 2046 MB (2 chunks)
>>> chunk 0: 1MB (150 pages) ... Ok
>>> chunk 1: 2046MB (523744 pages)_
>>> (I'm not sure the address above is the same every time, since I'm not
>>> always the one who gets to reboot the server)
>>>
>>> ...and that is where the server dies - i.e. it hangs and gets no
>>> further. A few times it has also reset itself - but in those cases I
>>> have no idea what was on the console.
>>> After a reboot there is nothing in /var/crash (even though I have
>>> configured dumpdev).
>>> It appears to be load related (a moderately busy Apache/MySQL server) -
>>> and it can happen anywhere from several times a day to about once a
>>> week or every two weeks.
>>> It goes most haywire when Apache uses HTTP accept filters, so since the
>>> server is in production I have turned them off for now...
>>> There is _NOTHING_ in the log files that smells of an error - and even
>>> the IPMI card's log is empty.
>>>
>>> Any ideas?
>>> Does it smell like hardware or software?
>>>
>>> PS: I am slightly suspicious of xcache (PHP accelerator), so I have
>>> just tried turning that off as well... And strictly speaking I haven't
>>> had a reboot since.
>>>
>>> PPS: I regularly run freebsd-update/portsnap/portupgrade (especially
>>> because I have this problem), so the kernel and ports should be up to
>>> date.
>>>
>>> Best regards
>>> Gert Lynge
>>> ---
>>> ws# uname -a
>>> FreeBSD x.x.x 6.2-RELEASE-p4 FreeBSD 6.2-RELEASE-p4 #0: Thu Apr 26 17:55:55 UTC 2007 root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/SMP i386
>>> ---
>>> ws# cat /etc/rc.conf
>>> defaultrouter="x.x.x.x"
>>> font8x14="cp865-8x14"
>>> font8x16="cp865-8x16"
>>> font8x8="cp865-8x8"
>>> hostname="x.x.x"
>>> ifconfig_em1="inet x.x.x.x netmask x.x.x.x"
>>> saver="daemon"
>>> usbd_enable="YES"
>>> keymap="danish.cp865"
>>> keyrate="fast"
>>> sshd_enable="YES"
>>> firewall_enable="YES"
>>> firewall_type="x"
>>> firewall_logging="YES"
>>> ntpd_enable="YES"
>>> ntpd_sync_on_start="YES"
>>> mysql_enable="YES"
>>> apache22_enable="YES"
>>> #apache22_http_accept_enable="YES"
>>> inetd_enable="YES"
>>> clear_tmp_enable="YES"
>>> #log_in_vain="1"
>>> sendmail_enable="YES"
>>> rsyncd_enable="YES"
>>> syslogd_flags="-a x.x.x.x/x:*"
>>> clamav_freshclam_enable="YES"
>>> local_startup="/usr/local/etc/rc.d"
>>> dumpdev="/dev/ar0s1b"
>>> ---
>>> ws# cat /var/run/dmesg.boot
>>> Copyright (c) 1992-2007 The FreeBSD Project.
>>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>>> The Regents of the University of California. All rights reserved.
>>> FreeBSD is a registered trademark of The FreeBSD Foundation.
>>> FreeBSD 6.2-RELEASE-p4 #0: Thu Apr 26 17:55:55 UTC 2007
>>> root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/SMP
>>> Timecounter "i8254" frequency 1193182 Hz quality 0
>>> CPU: Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz (2660.01-MHz 686-class CPU)
>>> Origin = "GenuineIntel" Id = 0x6fb Stepping = 11
>>> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>>> Features2=0xe3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,<b9>,CX16,<b14>,<b15>>
>>> AMD Features=0x20000000<LM>
>>> AMD Features2=0x1<LAHF>
>>> Cores per package: 4
>>> real memory = 2146304000 (2046 MB)
>>> avail memory = 2095165440 (1998 MB)
>>> ACPI APIC Table: <PTLTD APIC >
>>> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
>>> cpu0 (BSP): APIC ID: 0
>>> cpu1 (AP): APIC ID: 1
>>> cpu2 (AP): APIC ID: 2
>>> cpu3 (AP): APIC ID: 3
>>> ioapic0 <Version 2.0> irqs 0-23 on motherboard
>>> ioapic1 <Version 2.0> irqs 24-47 on motherboard
>>> kbd1 at kbdmux0
>>> ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
>>> acpi0: <PTLTD RSDT> on motherboard
>>> acpi0: Power Button (fixed)
>>> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
>>> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
>>> cpu0: <ACPI CPU> on acpi0
>>> cpu1: <ACPI CPU> on acpi0
>>> cpu2: <ACPI CPU> on acpi0
>>> cpu3: <ACPI CPU> on acpi0
>>> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
>>> pci0: <ACPI PCI bus> on pcib0
>>> pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
>>> pci1: <ACPI PCI bus> on pcib1
>>> pcib2: <ACPI PCI-PCI bridge> irq 17 at device 28.0 on pci0
>>> pci9: <ACPI PCI bus> on pcib2
>>> pcib3: <ACPI PCI-PCI bridge> at device 0.0 on pci9
>>> pci10: <ACPI PCI bus> on pcib3
>>> pci9: <base peripheral, interrupt controller> at device 0.1 (no driver attached)
>>> pcib4: <ACPI PCI-PCI bridge> irq 17 at device 28.4 on pci0
>>> pci13: <ACPI PCI bus> on pcib4
>>> em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0x4000-0x401f mem 0xe0200000-0xe021ffff irq 16 at device 0.0 on pci13
>>> em0: Ethernet address: 00:30:48:8d:1f:5e
>>> pcib5: <ACPI PCI-PCI bridge> irq 16 at device 28.5 on pci0
>>> pci14: <ACPI PCI bus> on pcib5
>>> em1: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0x5000-0x501f mem 0xe0300000-0xe031ffff irq 17 at device 0.0 on pci14
>>> em1: Ethernet address: 00:30:48:8d:1f:5f
>>> uhci0: <UHCI (generic) USB controller> port 0x3000-0x301f irq 23 at device 29.0 on pci0
>>> uhci0: [GIANT-LOCKED]
>>> usb0: <UHCI (generic) USB controller> on uhci0
>>> usb0: USB revision 1.0
>>> uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
>>> uhub0: 2 ports with 2 removable, self powered
>>> uhci1: <UHCI (generic) USB controller> port 0x3020-0x303f irq 19 at device 29.1 on pci0
>>> uhci1: [GIANT-LOCKED]
>>> usb1: <UHCI (generic) USB controller> on uhci1
>>> usb1: USB revision 1.0
>>> uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
>>> uhub1: 2 ports with 2 removable, self powered
>>> uhci2: <UHCI (generic) USB controller> port 0x3040-0x305f irq 18 at device 29.2 on pci0
>>> uhci2: [GIANT-LOCKED]
>>> usb2: <UHCI (generic) USB controller> on uhci2
>>> usb2: USB revision 1.0
>>> uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
>>> uhub2: 2 ports with 2 removable, self powered
>>> uhci3: <UHCI (generic) USB controller> port 0x3060-0x307f irq 16 at device 29.3 on pci0
>>> uhci3: [GIANT-LOCKED]
>>> usb3: <UHCI (generic) USB controller> on uhci3
>>> usb3: USB revision 1.0
>>> uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
>>> uhub3: 2 ports with 2 removable, self powered
>>> ehci0: <Intel 82801GB/R (ICH7) USB 2.0 controller> mem 0xe0000000-0xe00003ff irq 23 at device 29.7 on pci0
>>> ehci0: [GIANT-LOCKED]
>>> usb4: EHCI version 1.0
>>> usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
>>> usb4: <Intel 82801GB/R (ICH7) USB 2.0 controller> on ehci0
>>> usb4: USB revision 2.0
>>> uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
>>> uhub4: 8 ports with 8 removable, self powered
>>> pcib6: <ACPI PCI-PCI bridge> at device 30.0 on pci0
>>> pci15: <ACPI PCI bus> on pcib6
>>> pci15: <display, VGA> at device 0.0 (no driver attached)
>>> isab0: <PCI-ISA bridge> at device 31.0 on pci0
>>> isa0: <ISA bus> on isab0
>>> atapci0: <Intel ICH7 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x30a0-0x30af at device 31.1 on pci0
>>> ata0: <ATA channel 0> on atapci0
>>> ata1: <ATA channel 1> on atapci0
>>> atapci1: <Intel ICH7 SATA300 controller> port 0x30e8-0x30ef,0x30dc-0x30df,0x30e0-0x30e7,0x30d8-0x30db,0x30b0-0x30bf mem 0xe0000400-0xe00007ff irq 19 at device 31.2 on pci0
>>> atapci1: AHCI Version 01.10 controller with 4 ports detected
>>> ata2: <ATA channel 0> on atapci1
>>> ata3: <ATA channel 1> on atapci1
>>> ata4: <ATA channel 2> on atapci1
>>> ata5: <ATA channel 3> on atapci1
>>> pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
>>> acpi_button0: <Power Button> on acpi0
>>> atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
>>> atkbd0: <AT Keyboard> irq 1 on atkbdc0
>>> kbd0 at atkbd0
>>> atkbd0: [GIANT-LOCKED]
>>> psm0: <PS/2 Mouse> irq 12 on atkbdc0
>>> psm0: [GIANT-LOCKED]
>>> psm0: model IntelliMouse Explorer, device ID 4
>>> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
>>> sio0: type 16550A
>>> sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
>>> sio1: type 16550A
>>> fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
>>> fdc0: [FAST]
>>> ppc0: <ECP parallel printer port> port 0x378-0x37f,0x778-0x77f irq 7 drq 3 on acpi0
>>> ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
>>> ppc0: FIFO with 16/16/9 bytes threshold
>>> ppbus0: <Parallel port bus> on ppc0
>>> plip0: <PLIP network interface> on ppbus0
>>> lpt0: <Printer> on ppbus0
>>> lpt0: Interrupt-driven port
>>> ppi0: <Parallel I/O> on ppbus0
>>> pmtimer0 on isa0
>>> ipmi0: <IPMI System Interface> on isa0
>>> ipmi0: KCS mode found at io 0xca8 alignment 0x4 on isa
>>> orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xcb000-0xcf7ff,0xcf800-0xd07ff,0xd0800-0xd17ff on isa0
>>> sc0: <System console> at flags 0x100 on isa0
>>> sc0: VGA <16 virtual consoles, flags=0x300>
>>> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
>>> Timecounters tick every 1.000 msec
>>> acd0: CDROM <CD-224E-N/1.AA> at ata0-master UDMA33
>>> ad4: 76319MB <Seagate ST380811AS 3.AAE> at ata2-master SATA300
>>> ad6: 76319MB <Seagate ST380811AS 3.AAE> at ata3-master SATA300
>>> ipmi0: IPMI device rev. 0, firmware rev. 2.2, version 2.0
>>> ipmi0: Number of channels 4
>>> ipmi0: Attached watchdog
>>> ar0: 76316MB <Intel MatrixRAID RAID1> status: READY
>>> ar0: disk0 READY (master) using ad4 at ata2-master
>>> ar0: disk1 READY (mirror) using ad6 at ata3-master
>>> SMP: AP CPU #1 Launched!
>>> SMP: AP CPU #2 Launched!
>>> SMP: AP CPU #3 Launched!
>>> Trying to mount root from ufs:/dev/ar0s1a
>>> WARNING: / was not properly dismounted
>>> WARNING: /tmp was not properly dismounted
>>> WARNING: /usr was not properly dismounted
>>> WARNING: /var was not properly dismounted
>>> ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to deny, logging disabled

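A note on reading the second panic in the thread: the fault virtual
address (0x5a) lies inside the first page of memory, which usually
suggests that kernel code dereferenced a NULL pointer plus a small offset.
That alone does not settle the hardware-versus-software question, but it
is a quick check to make on every new panic. The following is only a
rough Python sketch, assuming the console text has been typed into a
string in the same "key = value" layout as the transcript; the names
parse_trap and looks_like_null_deref are made up for illustration.

```python
PAGE_SIZE = 4096  # page size on i386

# Excerpt of the hand-copied panic text from the thread.
SAMPLE = """\
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 2
Fault virtual address = 0x5a
Fault code = supervisor read, page not present
Instruction pointer = 0x20:0xc07e40a3
Current process = 44 (pagezero)
Trap number = 12
"""


def parse_trap(text):
    """Collect 'key = value' lines into a dict with lower-cased keys."""
    fields = {}
    for line in text.splitlines():
        # Tolerate mail quoting such as '>>> ' in front of each line.
        line = line.lstrip("> ").strip()
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip().lower()
        if key:  # skip continuation lines that start with '='
            fields[key] = value.strip()
    return fields


def looks_like_null_deref(fields):
    """True if the fault address falls within the first page."""
    raw = fields.get("fault virtual address")
    if raw is None:
        return False
    return int(raw, 16) < PAGE_SIZE


if __name__ == "__main__":
    trap = parse_trap(SAMPLE)
    print(trap["current process"], looks_like_null_deref(trap))
```

Keeping the parsed fields from each panic makes it easier to compare
whether the fault address, CPU and process repeat from one crash to the
next.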

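Even without a dump file, the instruction pointer from the second panic
(0x20:0xc07e40a3) can in principle be mapped to a kernel function, as
long as the kernel binary that was running is still on disk. One
possible approach, sketched here under assumptions, is to take the
output of "nm -n /boot/kernel/kernel" (symbols sorted by address) and
find the last symbol at or below the faulting address. The symbol names
and addresses in SAMPLE_NM are hypothetical placeholders, NOT taken from
the real 6.2-RELEASE-p4 kernel, and the helper names are illustrative.

```python
import bisect

# Hypothetical "nm -n" output; real kernel symbols will differ.
SAMPLE_NM = """\
c07e3f00 T example_func_a
c07e4080 T example_func_b
c07e4200 T example_func_c
"""


def parse_nm(nm_output):
    """Parse '<hex addr> <type> <name>' lines into sorted (addr, name) pairs."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols and blank lines have no address
        addr, _kind, name = parts
        try:
            symbols.append((int(addr, 16), name))
        except ValueError:
            continue  # not a hex address, ignore the line
    symbols.sort()
    return symbols


def nearest_symbol(symbols, ip):
    """Return (name, offset) for the last symbol at or below ip, else None."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, ip) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, ip - addr


if __name__ == "__main__":
    syms = parse_nm(SAMPLE_NM)
    hit = nearest_symbol(syms, 0xC07E40A3)
    if hit is not None:
        print("%s+0x%x" % hit)
```

If the same function shows up across several panics, that leans towards
a software bug; addresses that jump around between unrelated functions
are more typical of failing hardware, though neither pattern is proof.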

This archive was generated by hypermail 2b30 : Sun 30 Sep 2007 - 23:00:03 CEST