Scheduled Downtime

Sep 07, 2007 incarnate link
We're scheduling a game downtime from 6-7pm CST today, to do maintenance on our cluster servers. Hopefully the downtime will actually be rather shorter than an hour, but we're blocking out that window just in case.

I would have scheduled the outage earlier, but we weren't sure when the upgrade would be possible, and it looks like today is a good day. Hope this doesn't inconvenience anyone too much. Thanks.

EDIT: To clarify, this will only impact the game, and shouldn't affect the website at all.
Sep 07, 2007 upper case link
Might I enquire about which aspect of the game this will impact and in what way?
Sep 08, 2007 voiceofra link
@upper case - ummm..."scheduling a game downtime" usually means you won't be able to play because, well, the game server is, ummmm...down.
Sep 08, 2007 MSKanaka link
upper case was asking what parts of the game were being fixed/updated/changed by the maintenance.
Sep 08, 2007 incarnate link
The game actually didn't go down because of this at all. Although our ISP (and our ISP's ISP) have had some problems. Cogent had some issues at Chicago earlier today, and I'm not sure what happened around midnight.

This scheduled outage was to do some maintenance on our cluster of game servers. But we didn't end up going as far as we originally intended, choosing to just migrate one machine and do further testing, completing the migration next week.

Part of this migration is to get all the machines running the same version of the same operating system (FreeBSD 6.2). Part of it is to start them booting off of the network (PXE) instead of self-booting cdroms. Plus we're adding machines to the overall cluster to increase capacity. Plus more extensive monitoring of each machine. So on and so forth.
Sep 08, 2007 yodaofborg link
The VO Server runs off live cds? NO WAI!
Sep 08, 2007 davejohn link
I love the phrase "server migration". Brings to mind an image of a vee formation of beige boxes with wings flying south for the winter...

Anyway, good luck with all that, inc.
Sep 08, 2007 upper case link
Er... did you really mean to say the servers are booted off CDs? As in boot volume?

I understand FreeBSD is lean and all but you can't really avoid page faults even for system resources. Loading that from a slow medium like a CD would really impact on performance.

How do you avoid this? Is it worth it to save... what... 600 megs?

I'm not arguing your system architecture. I'm just curious (and puzzled).

(And quite frankly I'm eager to register again...)

Edit:
voiceofra: No shit, Sherlock! ;-)
Sep 08, 2007 incarnate link
No no, we ran off of a specialized CD that Andy made. It's a liveCD in the sense that it does boot off of the CD with a fully encompassed OS, but then it makes a filesystem (including swap) and installs all the critical stuff on disk. All the game-related functionality is pulled off of the network onto the harddrive. It only loads off of the CD at boot-time (which, for all these machines, was at least months ago, maybe years) and occasionally if someone logs into them and uses a program that wasn't installed to disk. Generally the game is compiled on one of them, and then the binaries are pushed to the rest via rsync.
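As a rough illustration of that last push step (a sketch only, not the actual deployment script), the function below builds one rsync command per cluster machine. The function name, paths, and host names are hypothetical; only the `rsync -az SRC HOST:DEST` form is standard rsync usage.

```python
from typing import List


def rsync_push_commands(build_dir: str, dest_dir: str, hosts: List[str]) -> List[List[str]]:
    """Build one rsync argv per target host. Nothing is executed here."""
    # A trailing slash on the source tells rsync to copy the directory's
    # contents rather than the directory itself.
    src = build_dir.rstrip("/") + "/"
    # -a preserves permissions and timestamps, -z compresses in transit.
    return [["rsync", "-az", src, f"{host}:{dest_dir}"] for host in hosts]


if __name__ == "__main__":
    # Hypothetical paths and host names, for illustration only.
    for cmd in rsync_push_commands("/build/bin", "/srv/game/bin", ["node1", "node2"]):
        print(" ".join(cmd))
```

Each returned list could be handed to `subprocess.run(cmd, check=True)` to perform the copy.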

PXEboot is a lot more elegant, since we can change things more dynamically and update more easily, but we're still working through some aspects of it. The entire OS is installed on the HD, no different from a regular install, we're just using the network for booting and installation.

Keep in mind that the game doesn't really need a fully fledged OS in most respects. We need a running kernel, swap, memory, ethernet, and the ability to run our software (which I think is actually statically linked even, at present). That said, it's a lot more convenient to have stuff on them.. like a compiler, GDB, etc, in case stuff goes wrong or needs to be debugged, etc.

We originally did the bootable-CD thing because it was a simple way for us to roll N number of servers, and bring up new ones quickly (buy a new machine, throw it in a rack, stick a CD in it). PXEboot is even simpler (buy machine, plug in ethernet, turn on). These "cluster" servers that sit behind the actual core server are responsible for running individual sectors, which scale across the available cluster based on load and available memory (sectors spawn and die based on user activity, if users fly into them, etc). So, easy expansion in the event of increased load is important. Load like.. more users, or more intensive sector activity (like capship battles).
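To picture how a sector might be placed on the cluster by load and available memory, here is a minimal sketch. The `Node` fields, the memory figures, and the tie-breaking rule are all assumptions made for illustration; the real scheduler is not described in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_mem_mb: int   # memory still available for new sectors (assumed unit)
    load: float        # e.g. a load average; lower is better


def pick_node(nodes: List[Node], sector_mem_mb: int) -> Optional[Node]:
    """Choose the least-loaded node that has room for the sector."""
    candidates = [n for n in nodes if n.free_mem_mb >= sector_mem_mb]
    if not candidates:
        return None  # cluster is full: time to plug in another machine
    # Prefer low load; break ties by preferring more free memory.
    return min(candidates, key=lambda n: (n.load, -n.free_mem_mb))


def spawn_sector(nodes: List[Node], sector_mem_mb: int) -> Optional[Node]:
    """Reserve memory for a sector on the chosen node and return that node."""
    node = pick_node(nodes, sector_mem_mb)
    if node is not None:
        node.free_mem_mb -= sector_mem_mb
    return node
```

With a scheme like this, adding capacity just means appending another `Node` to the list, which is the "easy expansion" property described above.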
Sep 09, 2007 mdaniel link
interesting... so if I fly through lots of rarely used sectors I will get the servers going good hehe...
Sep 09, 2007 themnemonic link
incarnate: just a wild guess... is the bots' state saved when you shut down a sector?

Sometimes when I jump into an empty sector I feel like it has just been turned on, because all the collector bots are flying together from one position toward the asteroids to mine them.

I wonder if this might be why I had a hard time finding Orun's core units in one sector. The mission help said it should be easier to find them on bots that had been mining for some time. But actually those bots were all brand new because of the sector boot-up... I remember I destroyed all of them and got zero core units. Coincidence?
Sep 09, 2007 incarnate link
themnemonic: Yes, your guess is correct, and that's also something we've been discussing recently. We "virtually" handle the activities of bots when sectors are offline, but it isn't done very accurately. We may move towards doing this more accurately, so sector states are more seamless, which also would potentially allow us to shut down idle sectors more quickly and result in better scalability. Right now, for instance, sectors continue to "live" for 5 minutes after a player leaves, before shutting down. Part of this is for better handling of combined bot/player activity, like participation in an Escort mission.. we want the bot state to be as accurate as possible, so we have a reasonable timetable for the bots to proceed. If we make our virtual states more accurate, then we can shorten the idle timeout to only a few seconds, which will result in less server overhead for the kind of "users flying through rarely used sectors" that mdaniel mentions. We could separately prioritize the timeouts of "important" sectors (major stations, wormholes), to minimize startup times in those common cases.

Anyway, it is an issue we're looking at. I'm not sure how we'll proceed, but it's another aspect of "back-end" reworking that would make the game better as a whole.
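To make the idle-timeout idea concrete, here is a minimal sketch of sectors that stay alive for a while after the last player leaves, with per-sector overrides for "important" sectors. The class and method names are hypothetical and this is not the game's actual code; only the 5-minute default comes from the description above.

```python
from typing import Dict, List, Optional

DEFAULT_IDLE_TIMEOUT = 300.0  # seconds; the 5 minutes mentioned above


class SectorPool:
    def __init__(self, timeouts: Optional[Dict[str, float]] = None):
        self.timeouts = timeouts or {}            # per-sector overrides (assumed)
        self.players: Dict[str, int] = {}         # running sector -> player count
        self.empty_since: Dict[str, float] = {}   # sector -> time last player left

    def enter(self, sector: str) -> None:
        """A player arrives; the sector starts if needed and is no longer idle."""
        self.players[sector] = self.players.get(sector, 0) + 1
        self.empty_since.pop(sector, None)

    def leave(self, sector: str, now: float) -> None:
        """A player departs; start the idle clock if the sector is now empty."""
        self.players[sector] -= 1
        if self.players[sector] == 0:
            self.empty_since[sector] = now

    def reap(self, now: float) -> List[str]:
        """Shut down and return every sector whose idle timeout has expired."""
        expired = [s for s, t in self.empty_since.items()
                   if now - t >= self.timeouts.get(s, DEFAULT_IDLE_TIMEOUT)]
        for s in expired:
            del self.players[s]
            del self.empty_since[s]
        return expired
```

Shortening the default to a few seconds while giving stations and wormholes a longer override would be a one-line change in a design like this, provided the offline bot state is accurate enough to restart from.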
Sep 09, 2007 csgno1 link
I imagine that once VO has thousands of players online, all the sectors with stations, cool stuff, and wormholes will stay active due to constant use. Would not the sector time out issue then become less important? As long as the system is sized to keep all the popular sectors running well at once.
Sep 09, 2007 upper case link
In theory, thousands of players online means thousands of paying members too, which translates to actual income which means added servers to handle the load which means less sector offloading anyhow because users manage to keep sectors alive through their activity.

For now, things just get offloaded and loaded as are needed.
Sep 09, 2007 themnemonic link
Thank you for your answer, incarnate! I'm pleased I got that right.