January 2010 Archives

We define hybrid cloud storage as utilization of private cloud storage at an enterprise data center, or a private cloud hosted by an IT service provider with some combination of additional IT service provider-based public and/or private cloud storage.  

In a recent post, Cloud Storage for the Enterprise - Part 1:  The Private Cloud, we covered the definition and requirements of cloud storage as an enterprise solution, and as a technology deployed within enterprise-owned data centers (or at least within their co-location racks and cages).  Fundamentally, a private cloud is also a non-multi-tenant cloud (i.e., used by only one entity or related parties within an enterprise or a public sector agency) that is behind the firewall(s).  An additional solution that many enterprises are contemplating is the hybrid cloud, and we will look at the aspects of that solution in this post.

Before we begin our investigation of hybrid cloud, let's review some of the basics.  The following diagram reviews the differences between public and private clouds:

public_private_clouds.gif
Figure 1.   Comparison of public and private cloud

Many enterprises are beginning their cloud evaluation with a "private cloud."  I extend the definition of private cloud to be a "single tenant" cloud, as some enterprises may choose to use a single tenant cloud hosted at a service provider rather than hosting their cloud within their own data centers.  In the following diagram, we show two private clouds in two data centers, connected via policy-based replication.  This provides the assurance of backup and disaster recovery that many enterprises require.  A third location could easily be added for even higher levels of backup and disaster recovery.

pvate_cloud_entpse.gif
Figure 2.   Private cloud inside an enterprise.

The growth of storage is driving increased costs, and enterprises are continuously searching for more cost-effective ways to manage this growing data.  The primary difference between hybrid cloud and private cloud is the extension of service provider-oriented low cost cloud storage to the enterprise.  The service provider-based cloud may be a private cloud (single tenant) or a public cloud (multi-tenant).  There are several implementations of hybrid cloud, and some examples are included below.  The service provider cloud may enable enterprises to leverage the volume efficiencies of the service providers to realize additional savings.

A hybrid cloud provides a way of securely using service provider-based cloud storage in combination with enterprise clouds.  Another implementation could be use of single tenant service provider-based private clouds at multiple locations. 

Some examples of hybrid clouds are offered for your consideration, although not every potential approach is covered herein:

hybd_cloud.gif
Figure 3.  Hybrid cloud variation 1: private cloud inside an enterprise affiliated with a public cloud via a service provider.

hybd_cloud2.gif
Figure 4.  Hybrid cloud variation 2: private cloud inside an enterprise with affiliated private cloud via a service provider.


hybd_cloud3.gif
Figure 5.  Hybrid cloud variation 3: private clouds at a service provider with multiple clouds.

Since the primary motivation for hybrid cloud is economics, let's begin the discussion with an understanding of the economics of cloud storage and then extend that discussion to the hybrid cloud environment. 

The primary cost components of cloud storage include:

1.    Data center occupancy - leased (co-location) or owned and depreciated.
2.    Data center environmental - utilities, cooling, heating, etc.
3.    Storage hardware (leased expense or capital requirements & associated depreciation).
4.    File system and storage management (may be bundled in the storage hardware).
5.    Cloud enablement or platform (discrete or bundled with the storage system).
6.    Systems management and operational overhead.
7.    Backup and disaster recovery.
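
As a rough illustration of how the seven components above might be rolled up for comparison, here is a minimal Python sketch.  The class name, field names and every figure used with it are hypothetical and chosen only for illustration; they are not taken from any real deployment.

```python
from dataclasses import dataclass, fields


@dataclass
class MonthlyStorageCosts:
    """Monthly cost per component, in a single currency (hypothetical values)."""
    occupancy: float        # 1. data center space, leased or depreciated
    environmental: float    # 2. utilities, cooling, heating
    hardware: float         # 3. storage hardware lease or depreciation
    file_system: float      # 4. file system and storage management (0 if bundled)
    cloud_platform: float   # 5. cloud enablement or platform (0 if bundled)
    operations: float       # 6. systems management and operational overhead
    backup_dr: float        # 7. backup and disaster recovery

    def total(self) -> float:
        # Sum every component field declared on the dataclass.
        return sum(getattr(self, f.name) for f in fields(self))

    def per_usable_tb(self, usable_tb: float) -> float:
        # Normalize so differently sized deployments can be compared.
        if usable_tb <= 0:
            raise ValueError("usable_tb must be positive")
        return self.total() / usable_tb
```

Running the same roll-up for an in-house deployment and for a service provider quote, each normalized per usable terabyte, gives one simple basis for the comparison discussed next.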

While it can be argued that the economics at a large scale enterprise are very similar to those at a service provider, listed below are some of the most common reasons enterprises do turn to service providers for their technology solutions:

1.    Capital conservation.
2.    Distraction associated with infrastructure management.
3.    Desire to outsource functions that are required but not associated with core competency (focus dilution).
4.    Poor history of infrastructure management.
5.    Specific issues, for example, out of data center space and not projecting long term needs to add additional data centers, or unable to expand existing data centers and no desire for an additional site.
6.    Redundancy of networks available in data centers that may not be available in the enterprise without assuming additional costs.

Whatever the reason, service providers can solve these problems.  In each of the three hybrid cloud scenarios, there are cost and security tradeoffs that each cloud use case must weigh.  For example, in hybrid cloud variation #1, the economics can be quite appealing, but there are significant security concerns.  One approach to mitigating these concerns is to encrypt an object before it is replicated to a public cloud.

Understanding where key functionality is applied in your cloud stack is critical for successful implementation and highly dependent on the cloud and storage subsystem technology, cloud interoperability capabilities, and data use case.  Critical technologies that provide benefits are: de-duplication, compression, encryption for data at rest and data in motion, geo location, geo replication, tagging and search capabilities, and cloud access methods.  I will address underlying cloud technology requirements for the enterprise in my next post.

Cloud Use Case Definitions:

Data Archiving - Storing data for retention management requirements (such requirements may be internally generated, or associated with regulatory and compliance needs).  Archive data must be highly secure, highly reliable over the archive period, and easily searchable.  Archive data is generally encrypted, compressed and stored in a proprietary format.  Access to the data is usually very infrequent, and thus typical enterprises have leveraged slower access, cheaper tape media or redundant NAS to control costs.  Typical data issues associated with archiving are maintaining the archive and eliminating what is known as bit rot, in which data becomes corrupt when it is stored on the same media for long periods of time and not accessed.

Data Backup - Storing data as a replacement copy in the event the original copy is somehow damaged or lost due to user error, system failure, or as a result of a disaster scenario.  Backup data may or may not need to be highly secure or easily searchable, but must be available for quick restore when needed.  This data is also generally encrypted, compressed and stored in a proprietary format.  Access to the data is more frequent than with archive data and can be at any level of the organization.  A single file, user, server, site, or the entire enterprise could potentially need to be restored to proper service, and backup data must support these highly variable access needs.

Data Access - Storing data in its original format for access by users or other applications.  This type of data is frequently accessed and is the superset of the data that comprise backup and archive data.  Access takes precedence over security: the data needs to be easily and quickly searchable and retrievable by users and applications, and thus highly available.  Typical issues with access data are the need for fast accessibility of frequently used data balanced against the overall cost associated with storing all the data.  Enterprises often implement tier strategies to stage data on progressively lower cost media based on frequency of access.
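
To make the idea of a tier strategy concrete, here is a minimal Python sketch that picks a storage tier from how long an object has gone unaccessed.  The tier names and the 30-day and 180-day thresholds are hypothetical; real policies vary by enterprise and by storage product.

```python
from datetime import datetime, timedelta

# Hypothetical tiers, ordered from fastest and most expensive to slowest and
# cheapest. Each entry is (tier name, maximum idle time allowed in that tier).
TIER_THRESHOLDS = [
    ("primary", timedelta(days=30)),
    ("nearline", timedelta(days=180)),
]
COLDEST_TIER = "archive"


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Return the first tier whose idle limit the object still satisfies."""
    idle = now - last_accessed
    for tier, max_idle in TIER_THRESHOLDS:
        if idle <= max_idle:
            return tier
    # Anything idle longer than every threshold goes to the cheapest tier.
    return COLDEST_TIER
```

A periodic job could call `choose_tier` for each object and move those whose tier has changed, so frequently used data stays on fast media while rarely used data migrates to cheaper storage.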

hybd_cloud_eq.gif
 Figure 6. Hybrid enterprise use case cloud technology requirements.

Hybrid cloud storage, which we have loosely defined as utilization of private cloud storage at an enterprise data center, or a private cloud hosted by an IT service provider with some combination of additional IT service provider-based public and/or private cloud storage, offers an approach that allows use case, economics and security to prevail when selecting the appropriate approach.  Implementation will also be driven by the technological capabilities of the three building blocks of cloud storage: the cloud abstraction layer, the file/object system choice, and the storage subsystem hardware.

So, our discussion of hybrid cloud storage has likely demonstrated at least one significant additional aspect, and that is complexity.  Starting with use case definition and security requirements, combined with a clear understanding of the unique issues within each enterprise that affect cost, you can map a clear path to the cloud technology and selection of one or more cloud service providers.  Finally, the trusted service provider continues to be another significant requirement for exploitation of hybrid cloud.

  1. Security will continue to be a big issue for the cloud, and, unfortunately, there will be at least one event this next year that is disruptive to Cloud Storage adoption, be it data loss or unauthorized data access.  Security will be an even more important point of evaluation for the use of specific Cloud Storage service offerings.  The “trusted service provider” becomes a requirement when selecting a cloud offering.

  2. Cloud Storage will be characterized by a single word, “more”!  More adoption, more cloud storage offerings by more IT service providers, more variation in cloud capabilities, and more worries and concerns about the cloud.

  3. The intersection of enhanced mobile devices with better wireless bandwidth will be combined with Cloud Storage to create exciting new work/life blended digital life applications. The user experience is of paramount importance.

  4. Cloud Storage will see extraordinary adoption as a solution for backup, archiving and for policy-based geo-replication for disaster recovery.

If you're accessing your data anytime, anywhere in the cloud, location shouldn't matter, right?

As it turns out, it does. There are several reasons why it matters where your cloud storage is located:

Legal & Regulatory Policy: How do companies ensure they are archiving and protecting business data to comply with electronic data laws?  According to BCS, for example, no matter what data storage and security strategy an organization uses, IT decision makers should consider these six key questions:

  1. Will content be stored and remain unaltered over the required retention time frame?
  2. How will this technology stay updated to ensure long-term availability of records?
  3. Does this technology enable the organization to retrieve data quickly enough to respond to a legal request within the stipulated deadline?
  4. Can this technology grow with the business and meet regulatory requirements?
  5. Can this technology be used with other content-generating applications?
  6. How will this data storage architecture address litigation and discovery challenges?

Add to this the effect of country and international compliance regimes and you understand why companies need to determine which data storage regulations affect them and require compliance.  Since the cloud is so new, I can safely wager that the data storage laws of most countries do not yet have a statute for the cloud.  Thus, physical data storage laws will still apply, and your cloud storage may have to be located in-country.  This is possible through geo-location and geo-replication.

Performance: To reduce network latency, cloud storage and the applications that access it should be as close together as possible, even in the cloud, and they need to be close to the end-user.  Thus, New York-based users who use NY-based applications should have their storage in a cloud in the NY area as well.

Backup & Replication: Cloud-based backup and recovery makes sense as well. Having multiple instances of your data replicated by geography is a key function for distributed datacenter replication, and shows potential for rapid growth. 

So, at Mezeo, we see three ways to think about cloud storage and geographic options and how to improve the distribution of data across geographically distributed data networks:

Geo-Location: Locating stored objects close to where they will be used.  This gives faster access via the closest cloud storage instance using data center peering (and also allows you to define where you store your data/objects).

Geo-Replication: Replication through policies, with uninterrupted access to content.

Single Namespace: Providing a single means of access to stored objects regardless of where the objects are located.
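
To show how these three ideas fit together, here is a minimal Python sketch of a single namespace that maps one logical path to whichever sites hold a copy and resolves a read to the caller's most preferred site.  The class and method names are hypothetical and do not describe Mezeo's actual interfaces.

```python
class SingleNamespace:
    """Maps one logical path to every site holding a copy (illustrative only)."""

    def __init__(self):
        # logical path -> set of site names that hold a copy of the object
        self._sites_by_path = {}

    def record_copy(self, path, site):
        """Note that a copy of `path` exists at `site` (geo-replication result)."""
        self._sites_by_path.setdefault(path, set()).add(site)

    def resolve(self, path, preferred_sites):
        """Return the caller's most preferred site that holds the object.

        `preferred_sites` is ordered closest first (geo-location). If none of
        them holds a copy, fall back to any holder, chosen deterministically.
        """
        holders = self._sites_by_path.get(path)
        if not holders:
            raise KeyError(path)
        for site in preferred_sites:
            if site in holders:
                return site
        return sorted(holders)[0]
```

The caller always asks for the same path; where the bytes actually come from is decided by the namespace, which is the point of separating naming from location.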
 
Geographic placement supports creation of an object in a specific cloud storage instance.  At Mezeo, our replication policy allows for the specification of the locations of the replicas.  For example, the policy indicates "create the object in New York, LA, and Houston."  If an object is created in New York, it will be replicated to LA and Houston.  If it is created in Houston, it will be replicated to New York and LA.
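
The New York, LA and Houston example can be expressed in a few lines of Python.  This is only a sketch of the rule as described here; the function name and the list-based policy representation are assumptions, not Mezeo's actual policy format.

```python
def replication_targets(policy_sites, created_at):
    """Return the sites an object must be copied to, given where it was created.

    `policy_sites` lists every site named in the replication policy;
    the object already exists at `created_at`, so that site is excluded.
    """
    if created_at not in policy_sites:
        raise ValueError(f"{created_at!r} is not part of the replication policy")
    return [site for site in policy_sites if site != created_at]
```

With the policy `["New York", "LA", "Houston"]`, an object created in New York is replicated to LA and Houston, and one created in Houston is replicated to New York and LA.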

Some storage vendors support replication as a component of their disaster recovery recommendations.  If your selected storage vendor offers this option, then the storage solution could ensure there are at least two copies of every object in every instance of Mezeo's cloud storage.  Recovery in the case of disaster with this approach would be handled by the storage vendor's solution. 

By considering a combination of replication provided by storage vendors and replication provided by Mezeo, a service provider could offer a highly differentiated service.  Your customers would be assured of recovery in the case of any possible failure, from a single disk failure to a catastrophic data center loss.  Mezeo works with our service providers to determine the benefits of various replication options and the impact as you design your SLA level(s).

Policies are assigned in the onboarding/provisioning process and may be updated if requirements change.  Policy updates can also handle special situations: for example, if a particular data center has a catastrophic outage, the policies associated with replication to the Mezeo instance in that data center can be modified.
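
As a sketch of what such a policy modification might look like, the Python function below removes a failed site from a replication policy and optionally swaps in a substitute.  The function name, the list-based policy format and the substitute site are hypothetical and used only for illustration.

```python
def policy_after_outage(policy_sites, failed_site, substitute=None):
    """Return a new policy with `failed_site` removed.

    If `substitute` is given and the failed site was actually part of the
    policy, the substitute is appended so the replica count is preserved.
    The original list is left unmodified.
    """
    remaining = [site for site in policy_sites if site != failed_site]
    was_in_policy = failed_site in policy_sites
    if substitute is not None and was_in_policy and substitute not in remaining:
        remaining.append(substitute)
    return remaining
```

For example, `policy_after_outage(["New York", "LA", "Houston"], "Houston", substitute="Chicago")` yields a policy of New York, LA and Chicago, while omitting the substitute simply drops Houston until it returns to service.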

About this Archive

This page is an archive of entries from January 2010 listed from newest to oldest.