loading screen of death

Aug 25, 2004 Zer0cool 666 link
While I was in the Deneb system, every time I tried to go to sector e-7 it stayed on the loading screen, no matter how many times I tried to get there.
Aug 25, 2004 paedric link
Did you see a loading bar, or was it just the splash screen? If you just saw the splash screen, then the sector has crashed. If you saw a loading bar and it hung after maxing out, then there is a misbehaving storm in that sector. (These have been my experiences, anyway.)
Aug 25, 2004 a1k0n link
I wrote a response to this before, but then my browser crashed. Whee.

I spent some time tracking this down. The sector doesn't crash; it just doesn't start properly. Something very bizarre happens when it first fork()s: it seems to inherit a file descriptor of a completely different sector, and then during its login process it is totally confused because it sees people flying around instead of the server asking it which sector it's supposed to be handling. I'm not sure if it's a Linux bug (we're using 2.4.22 with some crazy NPTL patch... maybe that's causing problems) or just some obscure, hard-to-reproduce bug in our own code.
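
For anyone unfamiliar with the mechanism: a child created by fork() starts with copies of every descriptor the parent had open, so a freshly forked process can end up holding a connection it never opened unless it closes it explicitly. The following is only an illustrative sketch in Python, not the actual server code; the pipe standing in for "another sector's connection" and the name fork_and_probe are assumptions made for the example.

```python
import os


def fork_and_probe(close_unrelated_first):
    """Fork a child and return what it reads from a descriptor the parent
    opened for an unrelated purpose (a hypothetical stand-in for another
    sector's connection)."""
    # Descriptor the child has no business touching.
    other_r, other_w = os.pipe()
    os.write(other_w, b"traffic for sector A")

    # Separate pipe the child uses to report back to the parent.
    report_r, report_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child process: it inherited copies of all four descriptors.
        try:
            os.close(report_r)
            if close_unrelated_first:
                # Defensive cleanup: drop everything that is not ours.
                os.close(other_r)
                os.close(other_w)
            try:
                result = os.read(other_r, 64)
            except OSError:
                # The descriptor was closed, so nothing leaks through.
                result = b"closed"
            os.write(report_w, result)
        finally:
            # Leave immediately without running the parent's cleanup code.
            os._exit(0)

    # Parent process: closing its own copies does not affect the child's.
    os.close(report_w)
    os.close(other_r)
    os.close(other_w)
    seen = os.read(report_r, 64)
    os.waitpid(pid, 0)
    os.close(report_r)
    return seen
```

Calling fork_and_probe(False) returns the bytes that were meant for the other consumer, while fork_and_probe(True) returns b"closed" because the child dropped the inherited descriptors before doing anything else. Whether this is what actually happens on the sector servers is not established in this thread.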

Right now, we're more concerned with client releases since we need to make a client release candidate very shortly; we have plenty of room for server debugging during the beta. So we may not have time to fix this right now; sorry it's taking so long.
Aug 25, 2004 roguelazer link
a1k0n? The way I read it, NPTL was 2.6 series -only- or very, very bad things could happen. Maybe putting a very new process management method on an unsupported kernel could be a cause of your process management problems? just a thought... Heck, I don't even use NPTL on my bleeding-edge system, it breaks totem.
Aug 25, 2004 a1k0n link
Well, don't ask me. I didn't compile this kernel. It's a stock Fedora installation; we threw it on there last time the box ate itself just to get back up and running. For the next version I think we're gonna run sectors on the Linux 2.6.5 boxes, which we bought for that specific purpose (but we haven't needed them because the load is... 0.00001).
Aug 26, 2004 mr_spuck link
Transcode had a few issues with NPTL too. I never experienced them myself, though.
IIRC a workaround was to set LD_ASSUME_KERNEL=2.4.19 or even 2.2.5. But be careful: it's possible that this breaks some applications.

EDIT: more information about this variable is here: http://people.redhat.com/drepper/assumekernel.html
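
A minimal sketch of applying that workaround to a single program rather than system-wide, assuming the program is launched from a Python wrapper. The helper names env_with_assumed_kernel and run_with_assumed_kernel are made up for this example; LD_ASSUME_KERNEL itself is the variable mentioned above, and whether the dynamic loader honours a given value depends on the glibc installed.

```python
import os
import subprocess


def env_with_assumed_kernel(version, base=None):
    """Return a copy of the environment with LD_ASSUME_KERNEL set.

    The original mapping is left untouched, so only the launched
    program sees the override.
    """
    env = dict(os.environ if base is None else base)
    env["LD_ASSUME_KERNEL"] = version
    return env


def run_with_assumed_kernel(cmd, version="2.4.19"):
    """Run cmd (a list of arguments) with the override applied.

    Hypothetical wrapper: a loader that rejects the value may refuse
    to start the program at all, so check the return code.
    """
    return subprocess.run(cmd, env=env_with_assumed_kernel(version), check=False)
```

Scoping the variable to one process keeps the rest of the system on its normal threading library, which limits the risk of breaking other applications.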

EDIT2:
> we haven't needed them because the load is... 0.00001
heh sounds like you guys are prepared for the stress test :)
Aug 26, 2004 roguelazer link
Nah, the load's just so low because all the sectors crashed. :P
Aug 26, 2004 Celkan link
That was a low blow, rogue. Real low.

Not to mention mean. :P
Aug 26, 2004 Turmoyl link
I feel that it's worth mentioning that Fedora is a horrible platform to host a test on as it is nothing but a perpetual beta test itself.

The entire idea of Fedora is a testing platform for packages before they get pushed into RedHat's paid-for versions.

Just like any beta product you should never expect stability nor normal behavior out of it. ;-)
Aug 26, 2004 roguelazer link
Yeah, use gentoo instead. Then when it eats itself, restore time is only 48 hours! :p Actually, I use gentoo, but I am trying to be fair. I recommend either debian or gentoo + tar. tar as in make a clone of the system and restore it in 10 minutes.
Aug 26, 2004 a1k0n link
Yeah, we didn't realize how much Fedora sucked at the time. Oh well. Debian would have been a better choice, but we had a Fedora boot CD handy for some strange reason.

For most of our servers though we're using a hand-rolled super-minimal installation which was originally compiled with gentoo (so basically it's similar to your "gentoo + tar" suggestion, except it's on a CD). It boots off a CD, grabs an IP from our host server, formats the disk, mounts /usr with the libraries and compilers and other necessary goodies, downloads the server files it needs to run, and then runs them. The upshot is we can just buy more of these boxes, plop a CD in, turn them on, and poof, instant cluster expansion. Then if a disk dies, it's pretty easy to reinstall! Also, we're veering pretty far off topic! Go me!
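
The boot sequence described above can be pictured as an ordered list of steps where a failure stops the run. This is only a rough sketch: the step names paraphrase the post, and the provision function and the actions mapping are hypothetical placeholders, not the real boot scripts.

```python
# Step names paraphrase the post; nothing here touches real hardware.
BOOT_STEPS = [
    "get an IP from the host server",
    "format the disk",
    "mount /usr",
    "download the server files",
    "run the server",
]


def provision(actions, steps=BOOT_STEPS):
    """Run each step's action in order and stop at the first failure.

    actions maps a step name to a callable returning True on success.
    Returns the list of steps that completed.
    """
    completed = []
    for step in steps:
        if not actions[step]():
            break
        completed.append(step)
    return completed
```

Because every box runs the same list from the same CD, a replacement machine goes through exactly the steps the original did.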
Aug 27, 2004 Zer0cool 666 link
Well, whenever I get the load screen it fully loads and then nothing happens. After getting it a fair number of times, I've noticed it only happens when I hit a storm.
Aug 28, 2004 andreas link
Use White Box Enterprise Linux. It's a rebuild of RHEL3 and it is rock solid.
Aug 29, 2004 RelayeR link
I'm sorry... I really don't mean to drag this on topic any more than it is, but Andy... I sent you a verbose 6 errors log on this problem for Pyronis n6 with storm (I still haven't been able to reproduce the jump *out* bug that Elder God reported, but there are 4 or 5 attempts to get *into* n6 in the file).

Get to it when you can.