| FastMail Forum All posts relating to FastMail.FM should go here: suggestions, comments, requests for help, complaints, technical issues etc. |
#1 |
|
Senior Member
Join Date: Mar 2003
Location: UK
Posts: 168
|
I do not have the greatest understanding of IT so if this is an obvious question forgive me!
Why am I seeing so many server outages on FM? FM offers a great service but the number of outages is an unsatisfactory point. I do not seem to remember any before the accounts were transferred to the new servers a few months ago (I am certainly not saying this is the reason, but making an observation based on my own experience). I remember two yesterday and now there is another. If I understood why this happens it might alleviate the urge to beat my head on the desk which I currently have. Would someone be kind enough to explain in non-technojargon why this happens, why it seems to happen quite often (user in UK and N Europe) and what might be done to reduce the problem? Larry |
|
|
|
|
|
#2 | |||||
|
Junior Member
Join Date: Oct 2002
Location: Dundee, Scotland
Posts: 25
|
Re: Mechanics of a Server Outage
Quote:
Quote:
Quote:
Quote:
Quote:
You're welcome, Richard |
|||||
|
|
|
|
|
#3 | |
|
Cornerstone of the Community
Join Date: Apr 2002
Location: UK
Posts: 590
|
Re: Re: Mechanics of a Server Outage
Quote:
|
|
|
|
|
|
|
#4 | |
|
Junior Member
Join Date: Oct 2002
Location: Dundee, Scotland
Posts: 25
|
Re: Re: Re: Mechanics of a Server Outage
Quote:
BTW, my reply was kind of tongue-in-cheek, and shouldn't be taken too seriously... Richard |
|
|
|
|
|
|
#5 | |
|
Junior Member
Join Date: Oct 2002
Location: Dundee, Scotland
Posts: 25
|
A reply devoid of humour
Quote:
My point about hardware failure still stands. Hard drives, fans, etc. all have a limited lifespan and, although the manufacturer can give an *average* lifespan, the average does not mean the device will not fail 5 minutes after installation. I believe MailSnare had a hard drive failure not so long ago. It's the way things are... Buying good quality stuff, ensuring the environment is appropriate, etc. helps to reduce the failure rate. Also, every part of the hardware and software was designed and written by humans. If software is written by other humans, we will not necessarily know where the imperfections are. We will only find out when the software crashes or behaves in a way that was not predicted. The only way this can be improved is by good design, rigorous testing, and fixing of the bugs. I'm afraid that your view that downtime happens "quite often" is a perception. Whether 3 faults occur within 3 days or are spread out over a month, there were still 3 faults. We cannot predict when faults will occur. Faults being bunched together does not *necessarily* mean that there is a deeper problem (although that may be the case). It may simply be a case of Bad Luck(tm). Richard |
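A rough sketch of the point about average lifespans. It assumes a constant failure rate (an exponential lifetime model), which is a textbook simplification and not something measured on FastMail's hardware. The 500,000-hour average and the function name are made up for illustration.

```python
import math


def prob_failure_within(hours: float, mean_life_hours: float) -> float:
    """Chance that a device fails within `hours` of installation.

    Assumes a constant failure rate, so lifetimes follow an exponential
    distribution whose mean is `mean_life_hours`.
    """
    return 1.0 - math.exp(-hours / mean_life_hours)


# Hypothetical drive with an average lifespan of 500,000 hours.
MEAN_LIFE_HOURS = 500_000.0

for label, hours in [("first day", 24), ("first month", 24 * 30), ("first year", 24 * 365)]:
    p = prob_failure_within(hours, MEAN_LIFE_HOURS)
    print(f"Chance of failure within the {label}: {p:.4%}")
```

The chance of an early failure is small under this model but never zero, which is all that "average lifespan" promises.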
|
|
|
|
|
|
#6 |
|
Member
Join Date: Aug 2002
Posts: 45
|
Also, more often than not I only become aware of an outage when I read about it on the forum, and I check my email a lot.
I'd guess that just about every small hiccup gets reported publicly here, together with the FM team's response. As long as the response is fast and appropriate I wouldn't worry too much. With a lot of providers you never know about the outages unless you discover them yourself, and even then you don't always know if the problem was on your side or at the server. |
|
|
|
|
|
#7 |
|
Senior Member
Join Date: Mar 2003
Location: UK
Posts: 168
|
Richard
I found your humorous reply to my post very amusing; I was barely able to contain myself. I do not know of another e-mail service with the same level of communication and support as offered by FM. As for 100% uptime I wouldn’t have a clue, but if you know what the uptime of other mail providers is then would you tell me, as it would be beneficial to have stats rather than a perception-based reply, which as you pointed out does not really cut the mustard. I think that the reason I wished to beat my head on the desk was more due to frustration at not being able to access my mail than to a deep-rooted psychological problem, but if the latter really is the case then my perception may indeed be impaired. I completely accept your point that machines break at unpredictable times and that human beings are not perfect. However, I do not feel that this explains why I used the most common e-mail provider for several years and never encountered an ‘outage’, whereas there have been quite a number in FM recently. I would have also mentioned outages I have seen before this time, but to be honest I can't remember exactly how many there were or when they occurred. I can tell you there have been others, and out of the web-sites that I regularly use I feel that FM is down more than other major sites. I think that your comment about up-time should be clarified. There are two types of downtime: planned and unplanned. To say that the total uptime is satisfactory may be true in terms of the total time. However, if the downtime is made up of short bursts of unplanned outage, this is more disruptive to the user and more costly to the company than planned downtime of which users are notified in advance. I think that reducing the frequency of these short outages should take priority over the numbers published by the bean-counters about total outage time.
If FM's total outage time is really low, the possibility that this is because problems are cured after they occur rather than before the event should be entertained. I for one would be willing to accept more frequent planned outages if this allowed unplanned downtime to be reduced. Larry |
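To illustrate the planned versus unplanned distinction, here is a minimal sketch with two invented months of outages. The `Outage` record, the `summarize` helper and all the figures are hypothetical, not FastMail data.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float
    planned: bool


def summarize(outages):
    """Report total downtime alongside the number of unplanned interruptions."""
    unplanned = [o for o in outages if not o.planned]
    return {
        "total_minutes": sum(o.minutes for o in outages),
        "unplanned_count": len(unplanned),
        "unplanned_minutes": sum(o.minutes for o in unplanned),
    }


# Hypothetical month A: one announced 60-minute maintenance window.
month_a = [Outage(60, planned=True)]
# Hypothetical month B: six unannounced 10-minute outages.
month_b = [Outage(10, planned=False) for _ in range(6)]

print("Month A:", summarize(month_a))
print("Month B:", summarize(month_b))
```

Both months have the same total downtime, so a single uptime percentage cannot tell them apart; only the count of unplanned interruptions does.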
|
|
|
|
|
#8 |
|
Ultimate Contributor
Join Date: Dec 2001
Location: Canada.
Posts: 10,355
|
Having worked for more than twenty years on computerized building control systems I would like to add my two cents' worth to this thread. Computerized systems for all applications are similar in a way; they all rely on a central computer (or a number of central computers) to get the job done. In the controls industry, systems that are deemed essential are always designed with built-in redundancy. This redundancy usually consists of a number of standby (or slave) machines that are "always" communicating with a master machine. Any computer that can take down a whole network will be configured to have some kind of backup. Data between machines on a master/slave network is synchronized 24/7. A smart box will switch machines if the online machine fails (for any reason) and an alarm will be sent out. Before a new system is put into service, redundancy is extensively tested; this happens when a system is first commissioned and on a monthly basis after that. Yes, hardware can break down, but one failure (on one machine) does not have to take down a whole network of machines. Building control systems and email systems are similar in a way. They are a bunch of computers that are networked together that talk to the real world. So, how does this work in a real world application? Well, it works very well. I frequently make calls to buildings where central computers have failed, but systems will still be online and working and will have seen no interruption of service. Having said this, I would like to add that these systems are very expensive and it might not be possible to sell a system configured this way and remain competitive. --david
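The master/standby arrangement described above can be sketched roughly as follows. This is a toy model only: the `Machine` and `FailoverSwitch` classes are hypothetical names, health is a simple flag rather than a real heartbeat, and data synchronization between machines is left out.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    healthy: bool = True


class FailoverSwitch:
    """Toy 'smart box': keeps one machine online and switches when it fails."""

    def __init__(self, machines):
        self.machines = list(machines)
        self.active = self.machines[0]  # the first machine starts as master
        self.alarms = []

    def check(self):
        """Return the machine serving requests, switching over if needed."""
        if self.active.healthy:
            return self.active
        for candidate in self.machines:
            if candidate.healthy:
                self.alarms.append(
                    f"{self.active.name} failed; switched to {candidate.name}"
                )
                self.active = candidate
                return candidate
        self.alarms.append("all machines failed; service is down")
        raise RuntimeError("no healthy machine available")


master = Machine("master")
standby = Machine("standby")
switch = FailoverSwitch([master, standby])

master.healthy = False          # simulate a hardware failure on the master
serving = switch.check()        # the standby takes over
print(serving.name, switch.alarms)
```

Service continues as long as at least one machine is healthy, and every switch leaves an alarm behind for someone to act on.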
|
|
|
|
|
|
#9 |
|
Cornerstone of the Community
Join Date: Sep 2002
Location: SF, CA
Posts: 700
|
One thing I have noticed when people say they can't access fastmail is that often FM is not really down, but is unreachable from their network location.
The internet is HUGE, and vastly complex. The worldwide routing tables (which we carry in our router) are enormous. So, routes go down, and you can't access parts of the internet, including FM. So, that's one problem which I bet is common. The other one, as others have mentioned, is software and hardware failure. From the notes that Rob, Jeremy, Onno, etc. post, I don't feel like FM has a lot of hardware failure. Maybe they do. That leaves software. Yes, some of these problems can be mitigated by server redundancy. Here at my college, we have our mail server (iPlanet) clustered across 3 Suns to ensure availability, but that takes over $100,000 worth of equipment and knowledge of Sun Cluster software. Basically, my point is this. I bet FM has fewer problems than most other providers given what they offer. You only don't see it with places like Hotmail because they have some ungodly number of servers load balanced behind an Arrowpoint or something, so you never see one go down. FM I am sure doesn't have that kind of money, so you will see a failure of software or hardware affect users. What's my point?
1. These failures come in clusters, so we often don't see any for months
2. They aren't major in terms of data loss (yes, if you rely on it for business email it's important to YOU, of course!)
3. They really are short.
4. They aren't common
5. They are to be expected.
If you want near 100% uptime, then you need to be going out and buying a Sun Cluster ($50K), an admin to support it ($50K/yr), and service contracts for those machines ($15K/yr). |
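As a back-of-the-envelope sketch of why a pool of load-balanced servers hides individual failures: if each server is up a fraction `a` of the time and failures are independent (a strong assumption, since shared software bugs or network faults hit every server at once), the chance that at least one of `n` servers is up is `1 - (1 - a)^n`. The figures below are illustrative only.

```python
def pooled_availability(single: float, servers: int) -> float:
    """Chance that at least one of `servers` machines is up.

    Assumes each machine is independently available a fraction `single`
    of the time; correlated failures would make the real figure lower.
    """
    return 1.0 - (1.0 - single) ** servers


for n in (1, 2, 3):
    print(f"{n} server(s) at 99.9% each: {pooled_availability(0.999, n):.7%}")
```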
|
|
|
|
|
#10 |
|
Cornerstone of the Community
Join Date: Dec 2002
Location: Boston
Posts: 611
|
My 2 cents: I think a lot of us would like FastMail to have 99.999% scheduled uptime, or about 5 minutes of downtime a year. As I understand it, "five nines" is the threshold demanded for most server applications.
Regarding Fastmail, FM's mail-queue and ISP seem to have a 99.999+% uptime -- incoming mail almost never bounces. However, outgoing mail has issues (spamcop), and server uptime seems more like 99.9%, which is fine for government work but too low for business power users. I'm not a computer reliability engineer, but my guess is that it's unrealistic to expect FM to maintain 99.999% uptime when there are frequent updates/upgrades to the software/hardware, and a quickly growing userbase. It's a tradeoff. What can be done about this? The options I can think of are:
1) tolerate the 99.9% uptime
2) host critical accounts on a dedicated server
3) host noncritical accounts on a beta server
4) give in and pay spamcop the $1000 or whatever it takes to keep FM off their blacklist |
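For reference, the arithmetic behind those percentages, as a small sketch. It assumes a 365-day year and counts all downtime together; the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Minutes of downtime per year implied by a given uptime percentage."""
    return (1.0 - uptime_percent / 100.0) * MINUTES_PER_YEAR


for pct in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% uptime allows about {minutes:.1f} minutes ({minutes / 60:.2f} hours) of downtime a year")
```

Under these assumptions, 99.999% works out to a little over 5 minutes a year, while 99.9% allows nearly 9 hours.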
|
|
|
|
|
#11 |
|
Senior Member
Join Date: Mar 2003
Location: UK
Posts: 168
|
Thanks for the replies guys. They help me to understand the situation. As so often is the case frustration is caused through lack of knowledge.
For me at least these comments are good enough to explain the situation and what the solutions are, and to let everyone decide for themselves whether they are worth implementing. Larry |
|
|
|
|
|
#12 |
|
Administrator
Join Date: Aug 2001
Location: UK
Posts: 3,118
|
Here's my $0.02 on the (infrequent - this must be stressed) outages.
A) People often start 5-6 threads on the same outage, not all of which necessarily get munged into a single thread if they start to veer off along different conversational directions. So you have to be very careful to distinguish e.g. 3 threads chatting about 1 outage from 3 threads chatting about 3 separate outages. B) What has happened a few times in the past (and I think that the recent outage is similar) is that something breaks that has never broken before! The Fastmail team rush to fix the problem, patch the bug or what-have-you... and they're generally on the job very quickly, I can tell you! Unfortunately, as with most very complex software installations, there are all kinds of dependencies within the Fastmail code, and what can happen in practice is that a fix in one part of the code exposes or creates a new vulnerability (in a kind of domino knock-on effect) that takes the server down again within a few hours. So it's only when the FM tech folks have navigated their way through all the knock-on issues that the server will stabilize itself again for a period of weeks or months. The above two observations, taken together, go a long way towards explaining why, on the rare occasion when Fastmail "fails", it either fails a number of times in relatively quick succession or appears to have done so upon a casual perusal of the threads on this Forum. |
|
|
|
|
|
#13 |
|
Cornerstone of the Community
Join Date: Dec 2002
Location: Boston
Posts: 611
|
E-
How about making a sticky FM outages thread? |
|
|
|
|
|
#14 | |
|
Cornerstone of the Community
Join Date: Sep 2002
Location: SF, CA
Posts: 700
|
Quote:
|
|
|
|
|
|
|
#15 | |
|
Administrator
Join Date: Aug 2001
Location: UK
Posts: 3,118
|
Quote:
Most of the time, the most recent outages thread is buried about 5 pages into the site (because outages are infrequent) and that's how it should be. |
|
|
|
|