What do Sandy, Netflix and CIMI have in common?

Watching Hurricane Sandy from a safe distance reminded me that our online culture is vulnerable to a long list of threats: floods, windstorms, wildfires, and cyber-attacks can all turn our wonderful handhelds, laptops, and desktops into useless bricks of plastic and steel.

Disasters are inevitable, but planning and preparedness can reduce or avoid the damage.


In IT, disaster planning and response are called “failover” and “disaster recovery.” When a system fails over, alternate resources automatically replace compromised resources. Backup generators that take over when the power grid fails are a familiar form of failover. Disaster recovery is the execution of plans to restore service quickly when a disaster occurs. Laptop and desktop users exercise a form of disaster recovery when they restore data from backups after a hard drive fails.
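To make the failover idea concrete, here is a minimal sketch in Python. It tries a list of resources in order and uses the first one that responds, much as a backup generator takes over from the grid. The function and resource names are illustrative assumptions, not part of any real product.

```python
from typing import Callable, Sequence


def call_with_failover(resources: Sequence[Callable[[], str]]) -> str:
    """Try each resource in order and return the first successful result."""
    errors = []
    for resource in resources:
        try:
            return resource()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all {len(resources)} resources failed: {errors}")


def grid_power() -> str:
    # Hypothetical primary resource that is currently down.
    raise ConnectionError("power grid is down")


def backup_generator() -> str:
    # Hypothetical alternate resource that takes over automatically.
    return "power from backup generator"


result = call_with_failover([grid_power, backup_generator])
print(result)
```

The caller never needs to know which resource answered, which is the point of failover: the switch is automatic.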


The cloud presents both new opportunities for failover and disaster recovery and new challenges when disaster strikes. Enterprises today have plans to fail over to a cloud if their data center is compromised. Many individuals already benefit from cloud backup services that automatically back up their disks and restore them in a few clicks.


Sandy


But what happens when the cloud is the compromised resource? This can happen. Hurricane Sandy brought down a number of data centers in its path. Popular websites like Huffington Post and Gawker went offline when their providers failed. Given that New York is a center for Internet content, it surprised me that more sites were not affected.


Sandy was bad, but it could be worse if a massive cloud provider like AWS (Amazon Web Services) or Google suddenly went out of service. The big providers take pains to reduce their vulnerability. Their services are distributed; they avoid single points of failure by locating data centers all over the world. A disaster like Sandy might take down one center, but other centers in distant locations can take over the load. In addition, part of data center planning is choosing sites where disasters are less likely, away from flood zones and close to power supplies.


Enterprise Disaster Preparations


Despite providers' best efforts, sometimes services are degraded or unavailable, and enterprises must cope with the problem themselves.


One way they can protect themselves is with Availability Zones (AZs) offered by cloud providers like AWS and Google. AZs are blocks of cloud resources that are unlikely to fail simultaneously; exactly how AZs are implemented depends on the provider. AWS also groups its AZs into Regions, each representing a geographic area like the western United States or Ireland, and offers Elastic Load Balancing to spread traffic across the AZs within a Region. By judiciously distributing cloud deployments across different Regions and AZs, consumers can exercise some control over their vulnerability to provider failure.
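As a rough illustration of why distribution matters, the sketch below places instances round-robin across zones and then computes the share of instances that would be lost if the busiest zone failed. The zone names are made-up labels rather than real provider identifiers, and round-robin is only one simple placement policy I am assuming for the example.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_count: int, zones: list[str]) -> list[str]:
    """Assign instances to zones round-robin; returns the zone of each instance."""
    if not zones:
        raise ValueError("need at least one zone")
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(instance_count)]


def worst_case_loss(placement: list[str]) -> float:
    """Fraction of instances lost if the most heavily used zone fails."""
    if not placement:
        return 0.0
    return max(Counter(placement).values()) / len(placement)


# Hypothetical zone labels, not real provider identifiers.
zones = ["west-zone-a", "west-zone-b", "ireland-zone-a"]
placement = spread_across_zones(6, zones)
print(Counter(placement))
print(worst_case_loss(placement))
```

With everything in a single zone the worst-case loss is the whole deployment; spread evenly over three zones, it drops to a third.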


Case Example: Netflix


However, it takes more than careful deployment for consumers of cloud services to avoid service interruptions.


Netflix is a good example. It depends on AWS to provide its popular on-demand streaming media service to its customers. If AWS is not available, Netflix customers get error messages instead of movies, and Netflix loses both immediate revenue for the unavailable content and future revenue from customers who can easily switch to a competitor when Netflix does not deliver. No surprise that Netflix has invested in failover and disaster recovery to minimize these consequences.


Netflix has thus far successfully minimized the effects of AWS outages with automated tools. One technique it uses is “zone evacuation,” in which it rapidly moves services from one zone to another. Using this technique and others, Netflix was able to dodge the bullet when AWS suffered an outage on October 22. Fortunately, AWS stayed up through Sandy, but Netflix had prepared for the worst.
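I do not know the details of Netflix's tooling, so the following is only a toy sketch of what zone evacuation means in principle: every service in the failed zone is reassigned to whichever healthy zone currently carries the least load. The zone names, service names, and the least-loaded policy are all assumptions made for illustration.

```python
def evacuate_zone(deployment: dict[str, list[str]], failed_zone: str) -> dict[str, list[str]]:
    """Return a new deployment with the failed zone's services moved elsewhere."""
    survivors = {zone: list(services) for zone, services in deployment.items()
                 if zone != failed_zone}
    if not survivors:
        raise RuntimeError("no healthy zone left to evacuate to")
    for service in deployment.get(failed_zone, []):
        # Send each displaced service to the currently least-loaded healthy zone.
        target = min(survivors, key=lambda zone: len(survivors[zone]))
        survivors[target].append(service)
    return survivors


# Hypothetical zones and services.
deployment = {
    "zone-a": ["api-1", "stream-1"],
    "zone-b": ["api-2"],
    "zone-c": ["stream-2"],
}
after = evacuate_zone(deployment, "zone-a")
print(after)
```

A production system would also have to move traffic, data, and state, which is where most of the real difficulty lies.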


Standards in the Mix


Of course, zone evacuation will not help when an entire provider fails. This is unlikely, but I can think of at least two situations that might cause a provider-wide disaster. First, we have heard about cyber warfare lately. A single cloud provider with tightly connected, similar operating software in all its data centers is vulnerable to an attack that could bring down the entire system in a short time. Second, legal action or fiat could abruptly shut down a provider. I admit that both of these approach the limits of plausibility, but before last week, few imagined that a windstorm could fill New York subways with seawater.


Is there a response to the total failure of a cloud provider? Can providers be evacuated the way Netflix evacuates zones? The answer is yes. With interoperable standard management interfaces, such as CIMI (Cloud Infrastructure Management Interface) or OCCI (Open Cloud Computing Interface), cloud consumers like Netflix can consider automated movement of deployments from one cloud provider to another using technology similar to that used to move them from one AZ to another. As standard management interfaces are adopted, computing will progress toward a rock-solid universal cloud infrastructure that will become the foundation of the next revolution in information technology.
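The value of a common interface can be sketched in a few lines. In the example below, the `CloudProvider` protocol is a hypothetical stand-in for a standard management interface; its methods are my own invention and are not the CIMI or OCCI API. Because every provider exposes the same calls, the consumer can recreate its recorded machines on whichever provider is still healthy.

```python
from typing import Protocol


class CloudProvider(Protocol):
    """Hypothetical common management interface; NOT the CIMI or OCCI API."""
    name: str

    def is_healthy(self) -> bool: ...
    def create_machine(self, spec: str) -> None: ...
    def list_machines(self) -> list[str]: ...


class InMemoryProvider:
    """Toy provider used only to exercise the sketch."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.machines: list[str] = []

    def is_healthy(self) -> bool:
        return self.healthy

    def create_machine(self, spec: str) -> None:
        self.machines.append(spec)

    def list_machines(self) -> list[str]:
        return list(self.machines)


def redeploy(specs: list[str], providers: list[CloudProvider]) -> str:
    """Recreate the recorded machines on the first healthy provider; return its name."""
    for provider in providers:
        if provider.is_healthy():
            for spec in specs:
                provider.create_machine(spec)
            return provider.name
    raise RuntimeError("no healthy provider available")


# The consumer keeps its own record of what it runs (hypothetical specs).
specs = ["web-server", "database"]
provider_a = InMemoryProvider("provider-a", healthy=False)  # simulated total failure
provider_b = InMemoryProvider("provider-b")
chosen = redeploy(specs, [provider_a, provider_b])
print(chosen, provider_b.list_machines())
```

Note that the sketch works from the consumer's own record of its deployment, since a provider that has failed completely cannot be asked what it was running.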


For a more detailed description of CIMI and OCCI, see my book Cloud Standards.


*Public domain image courtesy of Laurie Williams via PublicDomainPictures.net.

Written by

Marvin Waschke

Marv Waschke is a senior principal architect at CA Technologies. He has represented CA Technologies…
