
Moscow goes off-line (updates)

Jonah and I were up again last night ...

[Graph: MSK-IX daily traffic]
Overnight the MSK-IX (Moscow Internet eXchange) lost power, and most Russian (.RU) networks dropped out of full view. [The publication cited, MosNews, is run by right-wing expat American businessmen with an ax to grind, and the head of UES Russia (the power distribution system), Anatoly Chubais, is a rival.]

As you can see from the Moscow exchange point's daily graph, they got clobbered. The cascading failure affected Moscow and towns as far as 200 km south of the city. By 1 pm, 95% of people had been evacuated from the Metro. Power is being restored in phases, and hospitals are expected to be running by 3 pm Moscow time. The outage hit the southern half of Moscow and some parts of the northern half. Many traffic signals were down, and militia officers were directing traffic manually. The space flight centre in Korolev is functioning normally.

Interfax reports that Russian prosecutors on Wednesday opened a criminal case against the management of power monopoly Unified Energy System (UES) after a major power outage in Moscow.

[Graph: MSK-IX daily traffic, several hours later]
Several hours later, steady state appears to have been reached. There was a drop-out 14 hours before the event and some drop-outs 4 hours after it. Via NANOG.
Update: Moscow Times, RIA Novosti.

[Graph: MSK-IX daily traffic, later update]
Update: The technical root of the failure has been blamed on equipment dating from 1958-1962 that had not been kept repaired and up to date. This is reminiscent of the Comair systems failure, the 9/11 problems with emergency-responder preparedness, the Challenger O-ring failure and the Columbia foam incident.

Michael Dillon writes to NANOG this morning:

Finally, a bit more info, found in this Russian article (link).

According to the director of a well-known but unnamed Russian telecoms company, there were no diesel generators at MSK-IX. It had three external power feeds, which all failed at once because of the cascading failure. The UPS systems lasted between half an hour and two hours. He says they have learned the lesson that they need to build several distributed, technically independent exchanges, even within the capital, Moscow.

Some background on the power failure. It started with a fire in old equipment, which caused a major power station to shed load and shut down in the middle of the night. As the sun rose and Moscow's power demand grew, this triggered a cascading failure that spread 200 km south. However, it did not affect most of the northern half of the city. Nor did it affect the military, which switched to its own generators. This is rather important, considering that this "military" controls roughly half of the world's serious nuclear weapons arsenal. The military also brought out portable generators to support hospitals, so it would appear that not all hospitals had independent backup power.

In southern Moscow, much of the cellular telephone service also failed. In one of the regional towns, a chemical factory released a cloud of nitrogen oxides, which caused the population to panic and begin evacuating, because in that town even the landlines had failed.

After a lot of work, most power stations were back online this morning. Only 400 apartment buildings remained without power, compared to thousands yesterday. The damaged station where the fire occurred is still not functioning, and some backup power generation is still in place. The metro is running, but some suburban electric train lines are still shut down.

All in all, this was a remarkable event. The causes were identified very quickly, and recovery from the outage was just as quick. Yet the country's major Internet exchange was shown to be remarkably short-sighted.
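Why didn't three power feeds keep MSK-IX up? A toy model makes the point. This is a minimal Python sketch, not anything from the MSK-IX reports: the probabilities (`P_FEED`, `P_GRID`) and the outage length are hypothetical numbers chosen only for illustration. It shows that adding feeds shrinks the independent-failure risk geometrically. A single shared cascade, though, puts a floor under that risk that extra feeds cannot remove. Without generators, a UPS only rides out outages shorter than its runtime.

```python
"""Toy model of common-mode power failure at an exchange point.

All numbers are hypothetical illustrations, not measured MSK-IX values.
"""


def p_all_feeds_down_independent(p_feed: float, n_feeds: int) -> float:
    """Chance that every feed is down at once, assuming feeds fail independently."""
    return p_feed ** n_feeds


def p_all_feeds_down_common_mode(p_feed: float, n_feeds: int, p_grid: float) -> float:
    """Chance that every feed is down when a shared grid cascade (probability
    p_grid) takes out all feeds together; otherwise feeds fail independently."""
    return p_grid + (1.0 - p_grid) * p_feed ** n_feeds


def exchange_stays_up(outage_hours: float, ups_hours: float, has_generator: bool) -> bool:
    """UPS bridges only outages shorter than its runtime; a generator
    (assumed here to start and keep running) covers longer ones."""
    return has_generator or outage_hours <= ups_hours


if __name__ == "__main__":
    P_FEED = 0.01   # hypothetical chance a single feed is down
    P_GRID = 0.001  # hypothetical chance of a grid-wide cascade
    for n in (1, 2, 3):
        indep = p_all_feeds_down_independent(P_FEED, n)
        shared = p_all_feeds_down_common_mode(P_FEED, n, P_GRID)
        print(f"{n} feed(s): independent {indep:.2e}, with common mode {shared:.2e}")
    # Hypothetical 6-hour outage against the 2-hour best-case UPS runtime reported above.
    print("no generator:", exchange_stays_up(6.0, 2.0, has_generator=False))
    print("with generator:", exchange_stays_up(6.0, 2.0, has_generator=True))
```

With these made-up inputs, going from one feed to three cuts the independent risk from about 1e-2 to 1e-6. The common-mode case stays near `P_GRID` no matter how many feeds are added. That is the argument for technically independent exchanges rather than more feeds into one building.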

Note: Even a geographically distributed system can fail. Last hurricane season showed this with the double hit on government data facilities: one on the Gulf Coast (National Finance Center, New Orleans) and its "backup" in Northern Virginia (Equinix Ashburn F) (link). Also, unlike in John Ashcroft's US, in Russia it is still legal to publish outage data. In fact, not publishing outage data is a crime.


we're using {mt v4.x || wp v2.x || drupal v6.x}, {mysql v 5.x || postgresql v8.x}, perl v5.8.8, php v5.2.5, python2.5.2 and apache v2.x, all running on freebsd-releng_7, on one of four ixsystems, housed in the usawebhost colo space in portland maine. everything is minded by ebw. all work by mb williams and eric brunner-williams are © wampum.