Recently in Cloud Services Category

Is OpenStack "Off the Rack"?

On July 19, 2010, Rackspace led the announcement of OpenStack, with the goal of creating an open source cloud software solution for use on industry-standard hardware.  The initial releases contemplate solutions for both cloud compute and object storage.  While these are the first two releases, they are separate offerings.  Remember, cloud storage is not just the storage target for cloud computing; it is one potential storage target for cloud computing, and is in and of itself a standalone cloud offering of programmable storage.

Now, I have purposely used a term from the clothing industry, "off the rack", to spend a moment looking at a framework for evaluating the opportunities this may present.  With dress shirts, you can buy off the rack, semi custom, or custom, each with a unique value proposition based on fit, choice and cost.   Interestingly enough, this may be a good lens through which to consider the possibilities of OpenStack, and in particular, OpenStack Object Storage.

Rackspace has made no secret of its motivations for leading this initiative, and its desire to focus on "fanatical" service as its key differentiator versus the fundamental technology on which the service is based.  Fair enough, and so the question becomes, is the rapidly emerging and immature cloud marketplace already "mature" enough to seek homeostasis?  (Homeostasis is the property of a system, either open or closed, that regulates its internal environment and tends to maintain a stable, constant condition.)  Have enough models and innovations, from startups, academia, open source movements and large tech companies, been tested in the marketplace to the extent that we can already race to the common denominator?  Perhaps now is a good time to start, as long as you are willing to acknowledge that the desired results are a good ways off.

Before we jump off into "Off the Rack" software, a quick look back at open source is helpful.  For more reading on the open source software industry a good introduction is The Cathedral and the Bazaar. Six things are particularly interesting: 

  1. An open source alternative can emerge as a follow on to a successful commercial technology and can become pervasive versus the commercial offerings it succeeded (LINUX versus UNIX is the reference case here).
  2. This approach can also produce a big success that occupies a niche rather than pervasively replacing the earlier commercial offerings (MySQL versus Oracle, IBM and Microsoft in the relational database space).
  3. An open source effort can also emerge earlier in a technology cycle and come of age as a pervasive solution (Apache Web Server comes to mind here).
  4. Open source generally requires very careful cultivation of the community of developers, with active interest by academia (and partnering with NASA is part of the formula here).  Commercially sponsored open source efforts are becoming more common, although they have not yet proven to be the typical "breeding ground" for great open source successes.  Eucalyptus, with its roots at the University of California, Santa Barbara, seems to be taking a more traditional route.
  5. Open source is not necessarily reflective of rapid commercial opportunities for success.  Eucalyptus is obviously beginning to maneuver towards a repeat of the commercialization model.  OpenStack is taking the approach most favored by other open source successes like Apache.  A couple of good reads here are this article from BusinessWeek and this. See also Derrick Harris' post over at GigaOm.
  6. There are also hundreds of thousands of open source projects that had mixed success or languished altogether. A quick look at SourceForge (an open source project hosting site) shows nearly a quarter million hosted projects. How many of these have languished or had little impact on the market?

So, the first issue is that there will exist for some time to come a real question as to the adoption potential of OpenStack.   I believe that adoption is driven by applicability to need.  In a moment we will address a serious issue that OpenStack Object Storage must overcome to be successful, and that, if left unaddressed, will confine it to a niche market.  My views are very much directed at the Object Storage offering, versus the compute offering, which I believe exists in a different space and as a different type of solution.  With this backdrop, let's have a look at the cloud storage marketplace today, and use the analogy of off the rack, semi custom and custom:

  • Off the Rack:  implement as is, one size fits all, each with unique approaches for performance, scalability, bit integrity, may or may not provide geo services.
  • Semi Custom:  Select from storage types (DAS, SAN, NAS, JBOD), shared or distributed file systems and object systems, mix and match storage for different SLA and cost/usage patterns on the same infrastructure, multiple APIs, meta data and catalog abstracted from storage layer, geo services.
  • Custom:  Generally a service only offering and not available as deployable infrastructure, specifics will vary widely based on service provider offering strategy.

Infrastructure  Type          Comments
Eucalyptus      Off the Rack  Limited S3 APIs
OpenStack       Off the Rack  CloudFiles APIs
Scality         Off the Rack  S3 APIs
Mezeo           Semi Custom   Mezeo ReST APIs and S3 APIs
NetApp          Off the Rack  Bycast APIs, NetApp storage
EMC Atmos       Off the Rack  Atmos ReST APIs, EMC storage

Service                                       Type          Comments
Amazon S3                                     Custom        S3 APIs
Microsoft Azure                               Custom        Windows centric
Rackspace                                     Off the Rack  Is the basis for OpenStack
Nirvanix                                      Custom        SOAP APIs, multi node
Google                                        Custom        Offers S3 APIs
AT&T Synaptic                                 Off the Rack  Based on EMC Atmos
OpSource, SoftLayer, Layered Tech and others  Custom        Based on Mezeo

As you can see from the summary above, there exist as many views of what constitutes either a cloud storage service or a desirable cloud storage deployable infrastructure as there are service providers and vendors.  Note that a semi custom infrastructure results in a "custom" service as implemented.  "Off the rack" results in very similar services by those who utilize the same infrastructure unless they make their own major additions.  Any offering can be differentiated by service, and the degree and quality of service is critical to customer satisfaction and plays a strong role in value creation.

The OpenStack announcement as it regards Object Store and its approach to cloud storage seems to view cloud storage infrastructure as highly akin to an operating system (or at least a "hypervisor") and more similar to a selection of LINUX or Windows than that of an application or middleware layer.  While I agree that cloud compute is very close to this model, cloud storage is a service oriented architecture, with programmability for new applications that can tolerate Internet latency because of Web Services (like ReST APIs). The industry constantly overlooks this key point as it is consumed with the low cost, pay for use and thin provisioning capabilities of this storage tier.  Solutions for thin provisioning and low cost have been available far longer than cloud storage. Further, pay for use is more of a business decision than a technology. 

In the earliest days of cloud storage, there existed initial confusion that cloud storage was defined by cost, scalability, pay for use, and thin provisioning only and not programmable access (usually via ReST APIs).  ParaScale paid a huge price for not understanding that cloud storage requires Web services (like ReST API) access.  Now, with OpenStack Object Store, we see a follow on case of this same perspective, but with basic APIs for Put, Get and List.   Yes, it provides for Internet access via ReST APIs, but the focus continues to be primarily cost based versus new application enablement based.  It could be argued that the open source approach will provide for the appropriate additions of "advanced services" to be added.  However, even the use of the platform by NASA is more focused on cost of storage than on advanced functionality because NASA stores much more data than almost any institution or enterprise in the world.

I think Savio Rodrigues states this view very well in his post:

"Select products based on business needs, not license alone: It's also interesting to note that very few enterprises are in NASA's position with regards to size of IT investment and skills in-house. While NASA engineers were ready and willing to contribute new features into the Eucalyptus open source community, few companies have the skills or governance to consider allowing their developers to contribute to open source projects.  Summary trend number 7 from the 2010 Eclipse survey results highlighted this issue.

To suggest that NASA's buying or IT decision making patterns represents much more than the top 1 percent of IT buyers would be a stretch."

The overwhelming majority of enterprises would rather pay a vendor to deliver, maintain, support and enhance their private cloud software infrastructure than place that burden on internal IT staff. Whether the enterprise is paying for a closed source commercial product, a commercial product based on an open core product, or a subscription to an open source product, the product selection decision will be made based on business requirements much broader than 'is the product open source or not?' "

Keep in mind that cloud storage is a stand alone service associated with application delivery over the Internet and also associated with low cost, pay for use, scalable storage resources.  Social media applications and many Web based applications exploit these capabilities; for example publishing a file to a URL and significant tagging of files.

This view of cloud storage as nothing more than cost and volume-based ignores its extraordinary importance as a service-oriented architecture for new application enablement.  I believe both views are equally important and need to be equally served.  Will OpenStack, with its pervasive cost focus, be able to drive its community to this additional view of needed contributions of advanced services for cloud storage?  Lydia Leong of Gartner Group provides an interesting view of the open source community issues associated with this in her post:

"At the same time, open sourcing is not necessarily a way to software success. Rackspace has a whole host of new challenges that it will have to meet. First, it must ensure that the roadmap of the new project aligns sufficiently with its own needs, since it has decided that it will use the project's public codebase for its own service. Second, it now has to manage and just as importantly, lead, an open-source community, getting useful commits from outside contributors and managing the commit process. (Rackspace and NASA have formed a board for governance of the project, on which they have multiple seats but are in the minority.) Third, as with all such things, there are potential code-quality issues, the impact of which become significantly magnified when running operations at massive scale."

One last comment on this business of vendor lock in and cloud storage APIs (another focus of the OpenStack announcement).  I would submit that while a specific set of APIs has the potential to create vendor lock in, this is a much smaller problem than what is experienced in other technologies.  If you are really worried about it, you probably have never actually written a ReST API call.  Such calls can be written in many languages, and we have seen cases where applications that run on S3 run unchanged on Mezeo.  Others need very minor modifications, and still others are excited to take advantage of some of the unique Mezeo services.  It just is not a problem, and this is much more related to FUD (fear, uncertainty and doubt) and marketing zealotry than it is associated with technological reality.  The APIs of choice will shake out, and it is far too early to say if it will be S3, OpenStack, CDMI or a combination of all of these, and others, as yet unforeseen.  (At Mezeo, we have never believed there will be one winner, and instead focused on architecture to enable easy and effective delivery of whichever APIs stand the test of time.)
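To give a sense of how small a storage ReST call is, here is a minimal sketch in Python.  The endpoint URLs, the container and object names, and the X-Example-Token header are all hypothetical and do not reflect the actual S3, CloudFiles or Mezeo APIs; the requests are built but never sent.

```python
from urllib.request import Request


def build_put(base_url: str, container: str, name: str, data: bytes, token: str) -> Request:
    """Build (but do not send) an HTTP PUT that would store one object."""
    url = f"{base_url}/{container}/{name}"
    return Request(
        url,
        data=data,
        method="PUT",
        headers={
            "X-Example-Token": token,  # hypothetical auth header
            "Content-Type": "application/octet-stream",
        },
    )


# Hypothetical providers: in this sketch only the base URL differs.
PROVIDERS = {
    "provider_a": "https://storage-a.example.com/v1",
    "provider_b": "https://storage-b.example.com/v1",
}

built = {
    provider: build_put(base, "reports", "q3.pdf", b"file bytes", token="secret")
    for provider, base in PROVIDERS.items()
}

for provider, req in built.items():
    print(provider, req.get_method(), req.full_url)
```

Real APIs also differ in authentication and metadata conventions, so treat this as an illustration of the scale of the effort rather than a porting recipe.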

The interesting view that seems to be missing here is that marketplace competition by service providers already serves to drive down the price of cloud storage, so a commoditized stack embraced by most is unlikely to yield extraordinary incremental savings.  At the same time, while the competitive market conspires to drive cloud storage costs ever lower, the need to differentiate, and to deliver solutions as well as programmable storage that enables multiple new and exciting types of applications, will rapidly replace the pure cost and scale focus of current cloud storage offerings.  Sometimes, the "new" application is simply an existing one enabled in the cloud, to produce the same result at a lower cost!  This requires significant cloud storage functionality in order to make this easy and productive.  Amazon continues to prove this with their many additions and capabilities which differentiate their service.  Mezeo sees much the same view on the part of our customers.  The focus is on what cloud storage can do, what problems it will solve, what business opportunities it creates and what new applications it can enable, and all of these views assume it will be competitively priced.

Cloud storage represents significant opportunities for institutions, the enterprise (see my recent post on the business case for enterprise cloud storage) and for the IT service provider.  Cloud storage is substantially different from cloud compute, and requires that you understand this difference in order to effectively evaluate the impact of this announcement, as well as your next steps.
There is no doubt that every enterprise has devoted some time and energy to evaluating how cloud technologies can best be put to work in their ongoing pursuit of cost reduction and to a lesser extent for potential improved service levels particularly around rapid provisioning of compute and storage resources.  Mezeo has recently begun to work with various enterprises, and I want to share some of the opportunities that appear to align strongly with these two goals.

In terms of cost, most enterprises are experiencing continued and significant growth in unstructured data.  As they look at the cost of this growth, it is more than just physical storage, data center occupancy, bandwidth utilization and power and the accompanying management demands; it is also the backup and disaster recovery requirements and the ability to quickly satisfy users who need more storage in order to execute whatever tasks and jobs they have.  Against this backdrop, the drumbeat of Amazon S3 and other public storage clouds advertises storage at costs that are generally below the internal "advertised" cost of the typical Fortune 500 company.  What gives?

First, cents/GB/month is only the tip of the iceberg, and bandwidth along with access charges gives a more realistic cost appraisal.  Next, real and legitimate concerns about data security exist (will someone gain unauthorized access, by accident or via an attack, to company data stored in a multi-tenant public storage cloud?).  Also, data integrity concerns are well founded (will the bits I store be returned, and will they be backed up and appropriate DR measures taken?).  Finally, can I absolutely trust the service provider to execute to the extent deemed necessary, and if they do, can they really save me any real money versus the assumed risk profile?  Private cloud computing is an appropriate strategy for addressing these issues.  
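The "tip of the iceberg" point can be made with a little arithmetic.  The sketch below is illustrative only: the price list, the usage figures and the names Rates and monthly_cost are assumptions invented for this example, not any provider's actual pricing.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical price list; every number used below is an assumption."""
    storage_per_gb_month: float  # dollars per GB stored per month
    transfer_out_per_gb: float   # dollars per GB downloaded
    per_1000_requests: float     # dollars per 1,000 API requests


def monthly_cost(rates: Rates, stored_gb: float, transfer_out_gb: float, requests: int) -> dict:
    """Break a monthly bill into storage, bandwidth and access charges."""
    storage = stored_gb * rates.storage_per_gb_month
    transfer = transfer_out_gb * rates.transfer_out_per_gb
    access = requests / 1000 * rates.per_1000_requests
    return {
        "storage": storage,
        "transfer": transfer,
        "access": access,
        "total": storage + transfer + access,
    }


rates = Rates(storage_per_gb_month=0.15, transfer_out_per_gb=0.10, per_1000_requests=0.01)
bill = monthly_cost(rates, stored_gb=10_000, transfer_out_gb=4_000, requests=5_000_000)

# Effective cost per stored GB once bandwidth and access are included.
effective_per_gb = bill["total"] / 10_000
print(bill)
print(f"advertised: {rates.storage_per_gb_month:.3f}  effective: {effective_per_gb:.3f}")
```

With these made-up numbers the effective rate per stored GB comes out noticeably higher than the advertised one, and the gap widens as data is read back more often.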

Not all unstructured data is a candidate for the latency of cloud storage as delivered from an IT service provider via the Internet.  So, while some tiers of data may be appropriate for a cloud storage service, it is a subset of the enterprise unstructured data requirement and not a lower cost panacea.  Hopefully, CIOs can easily make this case with their peers in senior management, although it may sometimes seem like they are making an excuse for keeping control and not exploiting new technologies.

Question one surrounds the cost proposition, and our analysis suggests that, even at sub petabyte initial cloud sizes, the enterprise can deliver economics for in-house cloud storage that compare very favorably.  In fact, the in-house cost may even be lower than what is available from a service provider.  The Mezeo team comes from both a hosting and a cloud storage background, and this just reinforces our view that the cost proposition for private cloud storage has favorable economics.  However, if you are being forced to allocate capital for data center build outs, or you are otherwise CAPEX constrained, the hosted public cloud economics can be quite appealing.  Since service providers require positive margins, their margin further drives up the cost of cloud storage as hosted at a public service provider.

The case for improved user satisfaction is similar, regardless of public versus private, because the cloud gives users the capabilities they want.  First, with rapid provisioning of pay-as-you-go low cost cloud storage, the end user gets what they need when they need it via a frictionless interface.   Second, several benefits drive end user demand for cloud storage, including avoidance of workstation storage upgrades, one solution for file sharing and collaboration, new capabilities and applications that exploit file search, tagging and publishing to a public URL, and the ability to access your storage anytime, anywhere and on any device.  Third, the solution is also ideal for implementing a workstation backup solution with sync.  It is not hard to see why end users would find all of these capabilities appealing.

Cloud storage clients, gateways and edge devices are also beginning to appear, and can solve many different issues.  For example, a client gives the end user access to multiple cloud storage accounts at multiple providers.  Why not replace that tape backup operation at a remote location with an iSCSI interface directly to a storage cloud, for a scheduled backup without local user intervention (get rid of the tape backup of your local file server, forever)?  Speaking of file servers, multiple solutions for replacing or even displacing file servers are coming to the market.  The savings from removing an entire layer of infrastructure are quite compelling.

New applications, including use of social media, may require file publishing.  Cloud storage allows you to store training videos, and make them easily available to every end user in the company.  Tagging and search offer new application capabilities, and new opportunities to support existing compliance requirements.  Secure file sharing, versus file publishing, may be a significant requirement as you work with customers and business partners.  Partner, customer and employee portals can reach new levels of capability with API accessible cloud storage, as the availability and the management of information is delivered via the cloud.

Our observation is that the early adopters have begun the move to cloud storage.  Why?  Simply, enterprise private cloud storage allows you to gain many of the benefits and set aside the security and data integrity concerns of public cloud storage.  At the same time, data tiering and private and public cloud solutions will drive "hybrid" cloud approaches that will allow the enterprise to exploit the best of both worlds.  In an upcoming post, we will offer up some tools to examine the cost and the benefits of cloud storage for the enterprise.
It is a little difficult to discuss an article like What's a Hybrid Cloud and Where Can I get One? without at least agreeing upon some sort of definition.  We've already heard many of these definitions, but I'm not sure they're good enough.  Note: we did try to define the term hybrid earlier, as part of our Cloud Storage Maturity Model.

Well, what is it? 

First, let's look at the textbook definition of the word hybrid:

A hybrid is the combination of two or more different things, aimed at achieving a particular objective or goal.
A "hybrid car" has both an electric motor and an internal combustion engine, and the combination of the two serves to propel your automobile while providing a more efficient use of fuel.  So, a hybrid cloud is a combination of a public and a private cloud, aimed at providing a common cloud computing experience.

But for what purpose?  Hybrid computing clouds provide cloud computing that delivers the appropriate offerings with provisioning, pay-as-you-go for relatively limitless capacity, and improved security, and some would say at a lower cost than an internal cloud. Hybrid clouds can and do offer the opportunity to provide baseline processing within your own facilities, and use service providers for peak requirements.  By doing this, they can lower the cost versus private cloud computing. 

I've seen some hybrid cloud definitions that include edge or gateway devices, but I do not think that is definitive for hybrid cloud.  Now, with this definition, we can sort out what a hybrid cloud actually delivers.  In general, the argument that a multi-tenant public cloud is lower cost (on an absolute cost basis) than a private cloud is hogwash, in my experience.  I have seen examples of all of these, and in the case of a large enterprise, they may very well run private clouds for their own use that cost less than what they can buy the resources for on an open market basis.  (Now, before the switchboards light up with capex versus opex and idle resource arguments, I want to assure you that even taking these issues into account, the theory holds water).  This still raises the question of what purpose hybrid cloud serves.

In its most general case, the business value of hybrid cloud lies in its ability to bridge the gap between baseline computing and peak computing, assuming all things are equal or if not equal, at least acceptable (in terms of security and other incremental costs associated with hybrid cloud).  Otherwise, why go to the trouble?

There are other examples that are associated with backup and disaster recovery versus cost that also can be of high value with a hybrid approach, particularly if you only have one data center.  I store my backup locally, in case I need to do a speedy recovery.  I store an encrypted copy remotely, at a service provider, for DR purposes.  Voila!  Low cost, secure, multiple requirements solved.  Hybrid, it's a beautiful thing.

The hybrid cloud can also allow you to "bridge the gap" if you are in a data center bind, i.e. out of space or between build-outs.  This is a special case of bridging the gap.

Where can you get this, now? 

This is exactly our game plan (at Mezeo), and working with backup and archive providers, as well as Mezeo-based cloud storage service providers, and Mezeo private storage clouds for the enterprise, we deliver this solution today - in a matter of days and weeks, not months!
Cloud Storage Strategy interviewed Gladinet co-founder Jerry Huang on cloud desktops, cloud gateways, and his company's business model. 

[NOTE: Gladinet is a customer of Mezeo Software.]

How does Gladinet position itself as the "desktop in the cloud?" What does that mean?
Actually we position ourselves as "a cloud on the desktop" instead of "a desktop in the cloud". The "desktop in the cloud" is more of an EC2 use case; you have a virtual machine in the cloud and use the Remote Desktop Protocol to access it.
 
"Cloud on the Desktop" is different. We view the PC as important infrastructure in this picture, because PC performance and functionality continue to improve, while broadband gets faster and cloud services leverage economies of scale, driving the price down or the SLA up. We see local storage growing side by side with cloud storage. We view the desktop as a feature rich portal where cloud storage and services live side by side with local storage and applications. The desktop provides an important platform for these services to interact with each other.
 
How do you define the term Cloud Gateway? What is Gladinet's contribution to this space?
A cloud gateway is a piece of software or an appliance that facilitates connectivity between the end user's PC and cloud services.
 
Gladinet's CloudAFS (Cloud Attached File Server) has cloud gateway capability. It can help native CIFS/NFS clients (on an end user's PC) to connect through AFS and reach out to the cloud services. It can also help individual Cloud Desktops to reach out. Another important part of AFS is identity management. When you have a group of users with Windows identities, the ID management is part of the functionality of a gateway.
 
In our view, the Cloud Gateway is different from the Cloud Desktop Client that sits directly on the user's PC. While the desktop client serves one single user and one single PC, the Gateway serves a group of users and a group of PCs.

For the IT folks, how do you attach the Cloud to your existing IT infrastructure instead of migrating existing IT Infrastructure to the Cloud? How does this mitigate the risk and lower costs?
Different stages may have different usage patterns. We view the current stage (2009-2010) as an early stage of cloud storage adoption. If you tell a CIO now to throw away existing IT infrastructure and migrate to the cloud, it may not sell. If you tell a CIO to keep the existing IT infrastructure and expand it with the advantages that the cloud has, it may be easier to get adoption.  So we aligned our product and marketing messaging around attaching and expanding IT infrastructure in a non-disruptive way.  The picture we were painting is that you install CloudAFS and you then expand your existing file server with Cloud Storage. The existing file server still runs, still providing file shares to existing users. Yet, the file server is backed up by the tier 2 cloud storage and the cloud storage may replace tape backup.

However, if we were in 2013 or 2014 and looking back to this stage, we could view this expansion of local IT infrastructure with Cloud as the starting stage of migration. When people start to experience the mixed environment of tier 1 (local) and tier 2 (cloud), they can see and experience how to best take advantage of both and can drive up cloud storage usage.
 
Mitigating the risk comes from a non-disruptive addition to the file server capacity. Lower cost can come from different places, like replacing tape backup.
 
How does Gladinet's business model give it a leg up over the competition? 
An analogy could be made with the start of the PC makers. At the beginning, there were many PC makers. IBM/Compaq/HP/Dell were the big ones, and there were also Packard Bell and other small ones. A successful business model then could be to create a component that all the PC makers can use instead of focusing on only a few.

Today, there are many cloud storage vendors, mostly in the US. Clones from Germany, Japan and other countries are coming as well. We believe creating a component that every cloud storage vendor can use to help cloud storage sales is more useful than focusing on just a couple of the big ones.
Cloud storage is already showing signs of Phase Two (see our post on the cloud storage maturity model), as a new set of solutions arrives in the marketplace.  These solutions are referred to as cloud gateways, on ramps, cloud clients, edge devices and other exotic names.

For ease of discussion, let's use "cloud client" to describe a solution that is on a single user device (workstation, PDA, tablet) and "cloud gateway" or just "gateway" for a solution that is delivered on a server or router for many users.  Whether they are a client or a gateway, some store a "blob" of data, and some store "chunks" of data that are parts of the original object.  Others store the actual object.  What's the difference and is it important? Should you consider it in your cloud gateway use plans?

What is a blob?  A blob can start as either a single object or a collection of objects, for example, all of the files on a single server, or a VM image.  Then, you do something to it in the client/gateway device that requires it to be brought back through the original client/gateway to be returned to a useful state.  Examples include de-duplication and compression followed by encryption prior to transmission of the object to the cloud (I call this D/C/E).  The result is a "blob" of data, an object that is minimized in size, and must be retrieved by the application that created it in order to be useful again. 
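A minimal sketch of the blob idea follows, assuming a simple scheme of whole-file de-duplication by content hash followed by zlib compression.  The function names pack_blob and unpack_blob and the sample files are invented for illustration, and the encryption step is left out because the Python standard library does not include a cipher; no particular vendor's format is implied.

```python
import hashlib
import json
import zlib


def pack_blob(files: dict) -> bytes:
    """De-duplicate and compress a set of files into one opaque blob."""
    unique = {}    # content hash -> hex-encoded content, stored once
    manifest = {}  # file name -> content hash
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        unique.setdefault(digest, data.hex())
        manifest[name] = digest
    payload = json.dumps({"manifest": manifest, "unique": unique}).encode("utf-8")
    # A real gateway would also encrypt at this point; that step is omitted here.
    return zlib.compress(payload)


def unpack_blob(blob: bytes) -> dict:
    """Reverse pack_blob; only code that knows the format can do this."""
    doc = json.loads(zlib.decompress(blob))
    return {
        name: bytes.fromhex(doc["unique"][digest])
        for name, digest in doc["manifest"].items()
    }


files = {
    "a.txt": b"quarterly report " * 200,
    "copy_of_a.txt": b"quarterly report " * 200,  # duplicate content
    "b.txt": b"meeting notes " * 50,
}
blob = pack_blob(files)
restored = unpack_blob(blob)
print(len(blob), "bytes in blob versus", sum(len(d) for d in files.values()), "bytes of input")
```

The blob is smaller than the input, but nothing in the cloud can read a.txt directly; the files only become useful again after a trip back through unpack_blob.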

A chunk is part of an object, and the original object must be re-assembled by the gateway that parsed it in the first place. Some gateways store blobs.  Some store the object in chunks.  Finally, some store the actual object with its original file type, intact.  These may be workstation clients, or interface solutions that allow for a CIFS or iSCSI (today, TwinStrata is an example of the iSCSI capability) attached device to store in the cloud.  There are trade-offs and advantages associated with each approach, and your cloud storage use case and objective must be carefully analyzed in order to determine the applicability of the gateway to your business requirement.

Now, let's consider D/C/E.  This provides savings in addition to the savings associated with cloud storage.  D and C give you a smaller object size, so your bandwidth cost is lower, and your overall storage cost is lower.  When there is a change to the stored objects, chunks allow you to send only the changed part of the object, reducing bandwidth and potentially improving performance.  Encrypting, or chunking, or both, may improve security and relieve you of the costs and management associated with other security approaches.
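The bandwidth saving from chunks can be sketched as follows.  This assumes fixed-size chunks compared by SHA-256 hash; the chunk size, the function names and the sample data are illustrative assumptions, and real gateways may chunk quite differently.

```python
import hashlib

CHUNK_SIZE = 4  # tiny, for illustration only; real chunk sizes are far larger


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Cut an object into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_hashes(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Hash each chunk so chunks can be compared without storing them twice."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in split_chunks(data, size)]


def changed_chunks(old: bytes, new: bytes, size: int = CHUNK_SIZE) -> dict:
    """Return {chunk index: chunk bytes} for chunks of `new` that differ from `old`."""
    old_hashes = chunk_hashes(old, size)
    result = {}
    for index, chunk in enumerate(split_chunks(new, size)):
        digest = hashlib.sha256(chunk).hexdigest()
        if index >= len(old_hashes) or old_hashes[index] != digest:
            result[index] = chunk
    return result


old = b"AAAABBBBCCCCDDDD"
new = b"AAAABBBBXXXXDDDD"
delta = changed_chunks(old, new)
print(delta)  # {2: b'XXXX'}
```

Only one of the four chunks has to be sent.  Fixed-size chunking is the simplest choice, but an insertion near the start of an object shifts every later chunk, so it works best for in-place edits.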

So, blobs and chunks sound pretty good, providing better security and lower costs.  What's the catch?  First, storage clouds are great places to provide anytime and anywhere access to your data, from multiple devices.  If you have to go back to a gateway to get the original version of the object, that flexibility may be very limited or non-existent.  Clouds are also a great place for sharing and collaboration, which is not in play if the object in the cloud is not in a useful form.  Finally, vendors are not giving gateway solutions away - we must ask what they cost, and are they worth it?

As usual, the answer is, it depends.  What services can I get from the cloud? And what services can I get from the gateway?

An example that is getting a lot of attention is file server replacement, or even better, file server displacement.  I get less excited about replacing a file server with another server that is a policy driven cache, because I still have this layer of technology in place.  However, if you can displace most of your file servers, then the potential for significant cost savings becomes obvious.

I tend to look at single user clients as very interesting on ramps to the cloud.  A client, using some modest amount of workstation storage as a cache, can deliver most of the benefits of a file server.  Companies like Gladinet, SMEStorage, GoodReader, Mezeo and others have very interesting cloud clients.  You will still need a few file servers if you need to provide a place for very large files.  Interestingly enough, those very large files are often rich media (like training videos), and streaming them to a reader on the client from the cloud is often good enough.  Another cloud client capability we expect to see will allow the end-user to store files and move them across multiple storage providers - from private to public and vice-versa, for example.  This functionality could also be in a server-based gateway.

Another cloud client capability might be to give encryption capability to the end user and let them decide if they want to encrypt the file themselves.   Or, use a cloud that provides user selectable encryption.  Give your end users or customers the power of choice, the freedom of access anytime and anywhere, the ability to get the amount of storage they need when they need it (what Gartner calls "reservationless", and kudos to them, great term).  Don't tie users to a "home base" gateway that does not store their object in its original format, or at least give them a choice.  All that being said, we are seeing that some mix of clients for file server displacement, and file server replacement gateways may ultimately be the appropriate solution.

Backup and archive is a different story, and here a gateway can make a lot of sense.  First, there is quite a bit of local housekeeping associated with these solutions, and the solution can decide if utilizing the cloud for some or all of the files makes sense. Speed of restore is a major consideration for a backup, and may drive local versus cloud based storage solutions.  Further, the need for a disaster recovery site, or to archive, can often be a cloud use case.  Companies like Zmanda and CommVault are very active in cloud based backup solutions.  What if you have applications that do not speak REST APIs, like a legacy backup solution?  There are gateways that can attach these legacy applications to the cloud, for example, TwinStrata.

Special purpose gateways can also solve an immediate problem.  Blue Thread offers a cloud storage interface for SharePoint.  The marketplace is rapidly developing a portfolio of cloud storage gateways and clients, as well as backup and archive solutions and all have their own unique perspective on cloud use.  Examples include StorSimple, Cirtas, Gladinet (who also makes clients), and EntropySoft.  Venture capital companies are deploying significant capital for these sorts of solutions.  Each of these solution providers sees a clear path to adding significant value to cloud storage solution delivery.

Cloud storage requires significant use case consideration to evaluate the functionality required, both in the cloud and in the gateway or client, and where the application or user can best exploit the functionality.  After all, cloud storage is also about empowering the end user with the storage they need, when they need it, at a favorable price, and providing advanced functionality, like publishing and sharing.

At Mezeo, we have both a deployable cloud infrastructure, and clients.  That causes us to look at where the best place to put the functionality is.  That creates a slightly different perspective, and we think it creates very useful products.  On the other hand, nothing gets us more excited than the thought of more solutions that drive cloud storage adoption and usefulness.  For this reason, we are rolling out a new marketing and certification program, Mezeo Ready™.

With Mezeo Ready™, service provider public storage clouds can easily identify their offering as being "Ready" for use by Mezeo Ready clients or gateways, and backup and archive solutions.  Users of these products can pick one of many trusted service providers hosting Mezeo Ready cloud storage solutions.  This cloud storage on ramp and cloud storage provider "ecosystem" ultimately delivers valuable solutions to customers and is a big part of Mezeo's vision for the cloud storage market.

So, more to come on Mezeo Ready, we are nearing the official announcement of the program, and will extend it to storage providers and file system providers who work with Mezeo to deliver storage clouds, both private and public.  Other solutions, like billing and provisioning systems will also be in the Mezeo Ready™ program.  The changes the cloud is delivering are new and useful, and deliver real value to the institutions and businesses that are embracing them.  The ecosystem is critical to the value delivery chain, and key to providing unique, desirable solutions.

According to a recent Gartner press release, 20% of businesses will own no IT assets by 2012:

Several interrelated trends are driving the movement toward decreased IT hardware assets, such as virtualization, cloud-enabled services, and employees running personal desktops and notebook systems on corporate networks.

The need for computing hardware, either in a data center or on an employee’s desk, will not go away. However, if the ownership of hardware shifts to third parties, then there will be major shifts throughout every facet of the IT hardware industry. For example, enterprise IT budgets will either be shrunk or reallocated to more-strategic projects; enterprise IT staff will either be reduced or reskilled to meet new requirements, and/or hardware distribution will have to change radically to meet the requirements of the new IT hardware buying points.
This is a bold statement. If we believe Gartner, it means that we are at the beginning of an explosion in cloud-based services managed by trusted providers on behalf of the enterprise. Of course not all businesses will choose this path, but a substantial number of industries can and will. As I blogged about earlier, the message from the CFO office is clear. We will see adoption rates rise dramatically as the benefits of cloud services become more obvious to business leaders.

A second point of interest is the prediction that by 2012, India-centric IT services companies will represent 20 percent of the leading cloud aggregators in the market (through cloud service offerings).

Here’s the take-away:

Gartner is seeing India-centric IT services companies leveraging established market positions and levels of trust to explore nonlinear revenue growth models (which are not directly correlated to labor-based growth) and working on interesting research and development (R&D) efforts, especially in the area of cloud computing. The collective work from India-centric vendors represents an important segment of the market’s cloud aggregators, which will offer cloud-enabled outsourcing options (also known as cloud services).
We are witnessing examples of what GE innovation consultant Vijay Govindarajan calls reverse innovation in IT. Natarajan Chandrasekaran, the CEO of Tata Consultancy Services, notes:

I’ve seen the new cloud-based computing models for applications and processes gaining currency in emerging markets. Rural cooperative banks and small and medium businesses in India are actually far ahead of their western counterparts in adopting these models. In fact, companies from emerging markets, buoyed by strong domestic revenues and revival in growth, have been making adjustments to their global strategies and fine-tuning their investments in order to be part of the recovery process in the west and build on their global expansion plans.
As the enterprise embraces the cloud, they’ll need a maturity model to help them on their journey. My next post will explore what the maturity model for cloud storage looks like. 

We define hybrid cloud storage as utilization of private cloud storage at an enterprise data center, or a private cloud hosted by an IT service provider with some combination of additional IT service provider-based public and/or private cloud storage.  

In a recent post, Cloud Storage for the Enterprise - Part 1:  The Private Cloud, we covered the definition and requirements of cloud storage as an enterprise solution, and as a technology deployed within enterprise-owned data centers (or at least within their co-location racks and cages).  Fundamentally, a private cloud is also a non multi-tenant cloud (i.e., used by only one entity or related parties within an enterprise or a public sector agency) that is behind the firewall(s).  An additional solution that many enterprises are contemplating is the hybrid cloud, and we will look at the aspects of that solution in this post.

Before we begin our investigation of hybrid cloud, let's review some of the basics.  The following diagram reviews the differences between public and private clouds:

public_private_clouds.gif
Figure 1.   Comparison of public and private cloud

Many enterprises are beginning their cloud evaluation with a "private cloud."  I extend the definition of private cloud to be a "single tenant" cloud, as some enterprises may choose to use a single tenant cloud hosted at a service provider, versus hosting their cloud within their own data centers.  In the following diagram, we show two private clouds, connected via policy-based replication in two data centers.  This provides the assurance of backup and disaster recovery that many enterprises require.  A third location could easily be added for even higher levels of backup and disaster recovery.

pvate_cloud_entpse.gif
Figure 2.   Private cloud inside an enterprise.

The growth of storage is driving increased costs, and the enterprise is on a continuous search to improve the way they can cost-effectively manage this growing data.  The primary difference between hybrid cloud and private cloud is the extension of service provider-oriented low cost cloud storage to the enterprise.  The service provider based cloud may be a private cloud (single tenant) or a public cloud (multi-tenant).  There are several implementations of hybrid cloud, and several examples are included.   The service provider cloud may enable enterprises to leverage the volume efficiencies of the service providers to realize additional savings. 

A hybrid cloud provides a way of securely using service provider-based cloud storage in combination with enterprise clouds.  Another implementation could be use of single tenant service provider-based private clouds at multiple locations. 

Some examples of hybrid clouds are offered for your consideration, although not every potential approach is covered herein:

hybd_cloud.gif
Figure 3.  Hybrid cloud variation 1: private cloud inside
an enterprise affiliated with a public cloud via a service provider.

hybd_cloud2.gif
Figure 4.  Hybrid cloud variation 2: private cloud inside
an enterprise with affiliated private cloud via a service provider.


hybd_cloud3.gif
Figure 5. Hybrid cloud variation 3: Private clouds at a
service provider with multiple clouds.

Since the primary motivation for hybrid cloud is economics, let's begin the discussion with an understanding of the economics of cloud storage and then extend that discussion to the hybrid cloud environment. 

The primary cost components of cloud storage include:

1.    Data center occupancy - leased (co-location) or owned and depreciated.
2.    Data center environmental - utilities, cooling, heating, etc.
3.    Storage hardware (leased expense or capital requirements & associated depreciation).
4.    File system and storage management (may be bundled in the storage hardware).
5.    Cloud enablement or platform (discrete or bundled with the storage system).
6.    Systems management and operational overhead.
7.    Backup and disaster recovery.

While it can be argued that the economics at a large scale enterprise are very similar to those at a service provider, listed below are some of the most common reasons enterprises do turn to service providers for their technology solutions:

1.    Capital conservation.
2.    Distraction associated with infrastructure management.
3.    Desire to outsource functions that are required but not associated with core competency (focus dilution).
4.    Poor history of infrastructure management.
5.    Specific issues, for example, out of data center space and not projecting long term needs to add additional data centers, or unable to expand existing data centers and no desire for an additional site.
6.    Redundancy of networks available in data centers that may not be available in the enterprise without assuming additional costs.

Whatever the reason, service providers can solve these problems.  In each of the three hybrid cloud scenarios, there are costs and security tradeoffs that each cloud use-case will consider.  For example, in hybrid cloud variation #1, the economics can be quite appealing, but there are significant security concerns.  One approach to mitigating these concerns is to encrypt an object before replicating it to a public cloud.

Understanding where key functionality is applied in your cloud stack is critical for successful implementation and highly dependent on the cloud and storage subsystem technology, cloud interoperability capabilities, and data use case.  Critical technologies that provide benefits are: de-duplication, compression, encryption for data at rest and data in motion, geo location, geo replication, tagging and search capabilities, and cloud access methods.  I will address underlying cloud technology requirements for the enterprise in my next post.

Cloud Use Case Definitions:

Data Archiving - Storing data for retention management requirements (such requirements may be internally generated, or associated with regulatory and compliance needs).  Archive data must be highly secure, highly reliable over the archive period, and easily searchable.  Archive data is generally encrypted, compressed and stored in a proprietary format. Access to the data is usually very infrequent and thus typical enterprises have leveraged slower access, cheaper tape media or redundant NAS to control costs.  Typical data issues associated with archiving are maintaining the archive and eliminating what is known as bit rot of the data, which is where data becomes corrupt if stored in the same media for long periods of time and not accessed.

Data Backup - Storing data as a replacement copy in the event the original copy is somehow damaged or lost due to user error, system failure, or as a result of a disaster scenario.  Backup data may or may not need to be highly secure or easily searchable, but must be available for quick restore when needed.  This data is also generally encrypted, compressed and stored in a proprietary format. Access to the data is more frequent than with archive data and can be at any level of the organization.  A single file, user, server, site, or the entire enterprise could potentially need to be restored to proper service and backup data must support these highly variable access needs.

Data Access - Storing data in its original format for access by users or other applications.  This type of data is frequently accessed and is the superset of the data that comprise backup and archive data.  Access takes precedence over security: the data needs to be easily and quickly searchable and retrievable by users and applications, and thus highly available.  Typical issues with access data are the need for fast accessibility of frequently used data balanced against the overall cost associated with storing all the data.  Enterprises often implement tier strategies to stage data in progressively lower cost media based on frequency of access.
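A tier strategy of this kind can be sketched in a few lines of Python.  The tier names and age thresholds below are purely illustrative assumptions; each enterprise would set its own based on its access patterns and media costs.

```python
from datetime import datetime, timedelta

# Hypothetical tiers, ordered from most to least expensive media.
# Each entry is (maximum time since last access, tier name).
TIERS = [
    (timedelta(days=30), "fast disk"),
    (timedelta(days=365), "low cost cloud storage"),
]
ARCHIVE_TIER = "archive"  # everything older than the last threshold


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Pick the cheapest tier whose age threshold still covers this object."""
    age = now - last_access
    for max_age, tier in TIERS:
        if age <= max_age:
            return tier
    return ARCHIVE_TIER


now = datetime(2010, 6, 1)
print(choose_tier(datetime(2010, 5, 20), now))  # fast disk
print(choose_tier(datetime(2009, 12, 1), now))  # low cost cloud storage
print(choose_tier(datetime(2008, 1, 1), now))   # archive
```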

hybd_cloud_eq.gif
 Figure 6. Hybrid enterprise use case cloud technology requirements.

Hybrid cloud storage, which we have loosely defined as utilization of private cloud storage at an enterprise data center, or a private cloud hosted by an IT service provider with some combination of additional IT service provider-based public and/or private cloud storage, offers an approach that allows use case, economics and security to prevail when selecting the appropriate approach.  Implementation will also be driven by the technological capabilities of the three building blocks of cloud storage, the cloud abstraction layer, file/object system choice and storage subsystem hardware.

So, our discussion of hybrid cloud storage has likely demonstrated at least one significant additional aspect, and that is complexity.  Starting with use case definition and security requirements, combined with a clear understanding of the unique issues within each enterprise that affect cost, you can map a clear path to the cloud technology and selection of one or more cloud service providers.  Finally, the trusted service provider continues to be another significant requirement for exploitation of hybrid cloud.

  1. Security will continue to be a big issue for the cloud, and, unfortunately, there will be at least one event this next year that is disruptive to Cloud Storage adoption, be it data loss or unauthorized data access.  Security will be an even more important point of evaluation for the use of specific Cloud Storage service offerings. The “trusted service provider” becomes a requirement when selecting a cloud offering.

  2. Cloud Storage will be characterized by a single word, “more”!  More adoption, more cloud storage offerings by more IT service providers, more variation in cloud capabilities, and more worries and concerns about the cloud.

  3. The intersection of enhanced mobile devices with better wireless bandwidth will be combined with Cloud Storage to create exciting new work/life blended digital life applications. The user experience is of paramount importance.

  4. Cloud Storage will see extraordinary adoption as a solution for backup, archiving and for policy-based georeplication for disaster recovery.
If you're accessing your data anytime, anywhere in the cloud, location shouldn't matter, right?

As it turns out, it does. There are several reasons why it matters where your cloud storage is located:

Legal & Regulatory Policy: How do companies ensure they are archiving and protecting business data to comply with  electronic data laws? According to BCS for example, no matter what data storage and security strategy an organization uses, IT decision makers should consider these six key questions:

  1. Will content be stored and remain unaltered over the required retention time frame?
  2. How will this technology stay updated to ensure long-term availability of records?
  3. Does this technology enable the organization to retrieve data quickly enough to respond to a legal request within the stipulated deadline?
  4. Can this technology grow with the business and meet regulatory requirements?
  5. Can this technology be used with other content generating applications?
  6. How will this data storage architecture address litigation and discovery challenges?
Add to this the effect of country and international compliance regimes and you understand why companies need to determine which data storage regulations affect them and require compliance.  Since the cloud is so new, I can safely wager that the data storage laws of most countries will not yet have a statute for the cloud. Thus, physical data storage laws will still apply.  So your cloud storage may have to be located in-country. This is possible through geo-location and geo-replication.

Performance: To reduce network latency, cloud storage and the applications that access it should be as close together as possible, even in the cloud, and they need to be close to the end-user.  Thus New York-based users who use NY-based applications should have their storage in a cloud in the NY area as well. 

Backup & Replication: Cloud-based backup and recovery makes sense as well. Having multiple instances of your data replicated by geography is a key function for distributed datacenter replication, and shows potential for rapid growth. 

So, at Mezeo, we see three ways to think about cloud storage and geographic options and how to improve the distribution of data across geographically distributed data networks:

Geo-Location: Locating stored objects close to where they will be used, for faster access via the closest cloud storage instance using data center peering (this also allows you to define where you store your data/objects).

Geo-Replication: Replication through policies, with uninterrupted access to content.

Single Namespace: Providing a single means of access to stored objects regardless of where the objects are located.
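As a rough illustration of how geo-location and a single namespace fit together, the Python sketch below resolves one logical object name to the closest instance holding a copy.  The latency figures, region names and `resolve` function are hypothetical and do not describe Mezeo's implementation.

```python
# Hypothetical round-trip times (ms) from a user's region to each cloud instance.
LATENCY_MS = {
    "new-york-user": {"New York": 5, "Houston": 40, "LA": 70},
    "la-user": {"New York": 70, "Houston": 35, "LA": 5},
}

# Single namespace: one logical name, regardless of where the copies live.
NAMESPACE = {
    "/reports/q4.pdf": ["New York", "Houston"],
}


def resolve(path: str, user_region: str) -> str:
    """Return the instance holding `path` that is closest to the user."""
    locations = NAMESPACE[path]
    return min(locations, key=lambda loc: LATENCY_MS[user_region][loc])


print(resolve("/reports/q4.pdf", "new-york-user"))  # New York
print(resolve("/reports/q4.pdf", "la-user"))        # Houston (no copy in LA)
```

The caller only ever uses the logical path; which data center answers is decided behind the namespace.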
 
Geographic placement supports creation of an object in a specific cloud storage instance.  At Mezeo, our replication policy allows for the specification of the locations of the replicas.  For example, the policy indicates "create the object in New York, LA, and Houston."  If an object is created in New York, it will be replicated to LA and Houston.  If it is created in Houston, it will be replicated to New York and LA.
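The New York, LA and Houston example can be written as a small Python sketch.  The policy is modeled here as a plain list of locations, and `replication_targets` is a hypothetical name; this is an assumption for illustration, not the format Mezeo actually uses.

```python
# The policy from the example above: the object should exist in all three sites.
POLICY_LOCATIONS = ["New York", "LA", "Houston"]


def replication_targets(created_in: str, policy=POLICY_LOCATIONS) -> list:
    """Return the sites an object must be copied to, given where it was created."""
    if created_in not in policy:
        raise ValueError(f"{created_in!r} is not a location in this policy")
    return [site for site in policy if site != created_in]


print(replication_targets("New York"))  # ['LA', 'Houston']
print(replication_targets("Houston"))   # ['New York', 'LA']
```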

Some storage vendors support replication as a component of their disaster recovery recommendations.  If your selected storage vendor offers this option, then the storage solution could ensure there are at least two copies of every object in every instance of Mezeo's cloud storage.  Recovery in the case of disaster with this approach would be handled by the storage vendor's solution. 

By considering a combination of replication provided by storage vendors and replication provided by Mezeo, a service provider could offer a highly differentiated service.  Your customers would be assured of recovery in the case of any possible failure, from a single disk failure to a catastrophic data center loss.  Mezeo works with our service providers to determine the benefits of various replication options and the impact as you design your SLA level(s).

Policies are assigned in the onboarding/provisioning process and may be updated if requirements change.  There are also special situations for policy updates: for example, if a particular data center has a catastrophic outage, the policies associated with replication to the Mezeo instance in that data center can be modified.
As we enter 2010, I am going to focus on a series of articles to define the cloud storage opportunity and the business issues for the enterprise.  First, there are some "universal truths" that we need to better understand and define. 

The growth in unstructured data will continue, unabated.  We all know and understand that.  The issue is how to manage this phenomenon, while operating with the assumption that the growth will likely accelerate.  Since the growth is driving increased costs, the enterprise is on a continuous search to improve the way they can cost-effectively manage this growing data.  

Data may exist on removable media, on PCs and PDAs, on various servers within the organization, at data centers, at remote facilities, and potentially at various outsourced service providers.  The data may range from employee personal information (and even personal information from the employees' associates) that is not associated with the needs of the business to non-confidential and confidential business information, some of which may be highly critical.  Disparate policies will need to be applied to the data ranging from no control to extreme control.   Of course, there will be multiple versions of files, adding to the total storage and further exacerbating the challenges of management.

There are many potential solutions to the problem as stated above, and most of them involve some sort of additional controls, policies and restrictions that control the proliferation of data and make it more orderly and secure.  These solutions are then combined with additional focus on reducing storage costs by staying aligned with new storage technology (which continues to reduce costs of storage), and the cycle repeats, endlessly.  In each cycle, trade-offs associated with costs, availability, security, access, restrictions occur, and rarely is there a "perfect" solution.

Is cloud storage a possible solution to the issues as surfaced above?  Is it a discontinuity, a departure, from the "business as usual" cycles associated with ongoing, incremental and continuous storage improvements when new technologies are introduced as they can be accommodated?  

Let's start with discussing cloud storage and its various capabilities.  Note that we are talking about a storage cloud that is housed at the enterprise data center, not a storage service provider.

(1) First, centralize the storage problem:

Cloud Storage addresses the necessary size and scale of unstructured data growth in the enterprise.  Generally, highly scalable file systems, including newer object based systems, provide the ability to manage incredibly large numbers of objects (objects of all sizes) in an efficient fashion.  This is combined with low cost commodity storage devices and servers.  Then a centralized storage pool is ready for use.  It is generally easy to add additional storage to this pool, and both backup and disaster recovery schemes are in place.  So, the first well known method of problem solving that cloud storage utilizes is "centralization."  Let's get a solution in place that we know can scale to the size of the data needs of the enterprise.
 
(2) Second, make it easy to use:

You can't use it if you can't get it, and this is where the topic of "thin provisioning" emerges.  Thin provisioning just means that it is easy to get a storage account (whether I am an individual user or an application / server) and I can get it quickly, no matter how much I need (in theory).  Further, as my storage needs increase, it is easy to get more - quickly.  There are also issues that surround the notion of thin provisioning, like accounting for storage, managing growth, and billing for it.
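Here is a minimal Python sketch of that idea: accounts reserve nothing up front, consume space from a shared pool as they write, and are billed on what they actually use.  The `StoragePool` class, its method names and the price are hypothetical.

```python
class StoragePool:
    """A shared pool where accounts consume space on demand, with no reservation."""

    def __init__(self, capacity_gb: float):
        self.capacity_gb = capacity_gb
        self.used_gb = {}  # account name -> GB actually consumed

    def write(self, account: str, gb: float) -> None:
        """Record new usage; accounts are created implicitly on first write."""
        if sum(self.used_gb.values()) + gb > self.capacity_gb:
            raise RuntimeError("pool exhausted; add storage to the pool")
        self.used_gb[account] = self.used_gb.get(account, 0) + gb

    def monthly_bill(self, account: str, price_per_gb: float) -> float:
        """Accounting and billing are based on usage, not on a reservation."""
        return self.used_gb.get(account, 0) * price_per_gb


pool = StoragePool(capacity_gb=100)
pool.write("alice", 10)        # an individual user
pool.write("app-server", 50)   # an application / server
pool.write("alice", 5)         # needs grow, and more is available immediately
print(pool.used_gb["alice"])   # 15
```

The accounting and growth management issues mentioned above show up in `monthly_bill` and in the pool-exhausted check: someone still has to watch total usage and add capacity in time.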

Access is another big topic that surrounds ease of use. The enterprise has multiple needs here.  Legacy applications, utilizing file access methods like CIFS or NFS, will want to utilize the storage cloud.  New applications, written to REST Web services APIs, will also want to coexist.   Finally, individual users will want access from all their device types, including PCs (Windows and Mac, Linux), the Web, and PDAs.  All of this access manifests itself in interesting ways, including identity management of the credentials associated with using the service, bandwidth requirements for accessing the service from many diverse locations, and geo location of data (i.e., if you have several locations where the cloud data is kept, how do you decide which location to use?).

(3) Third, sync your files to the cloud:

Now that you have cloud storage, you ought to think about backup and sync to the cloud.  These two applications are different but somewhat linked.  Sync to the cloud can be used for both cloud loading (getting the data from the device to the cloud, in a background way so that the latency will not be a problem) as well as keeping a current copy in the cloud, but using the local copy on your device (the best of both worlds).  Since your most current copy is in the cloud, it is your backup copy.  Sync is also a solution for keeping files "synchronized" between devices and the cloud, so you always have an authoritative source of your file stored in the cloud.  Of course, all this is based on having cloud access from any device, anywhere (see number two, above).
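The cloud loading half of sync can be sketched as follows.  This Python example is one-way (device to cloud) only and compares SHA-256 digests to decide what to upload; the `plan_upload` function and the in-memory dictionaries standing in for the device and the cloud are assumptions made for illustration.

```python
import hashlib


def digest(data: bytes) -> str:
    """Fingerprint file content so copies can be compared without transferring them."""
    return hashlib.sha256(data).hexdigest()


def plan_upload(local_files: dict, cloud_digests: dict) -> list:
    """Return the names of local files that are missing or stale in the cloud.

    local_files:   file name -> content on the device
    cloud_digests: file name -> digest of the copy currently in the cloud
    """
    return sorted(name for name, data in local_files.items()
                  if cloud_digests.get(name) != digest(data))


local = {"plan.doc": b"version 2", "notes.txt": b"unchanged", "new.xls": b"fresh"}
cloud = {"plan.doc": digest(b"version 1"), "notes.txt": digest(b"unchanged")}
print(plan_upload(local, cloud))  # ['new.xls', 'plan.doc']
```

A background task could run this plan periodically, so the user keeps working on the local copy while the cloud copy catches up.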

(4) Fourth, create new, higher impact applications with programmable storage:

Programmable (using HTTP, SOAP or REST APIs) access to storage is the next big revolution in storage.  Tagging, sharing, collaboration, easy search, easy and secure access and multiple views make creating new, high impact applications easier than before.  Take advantage of new functionality that is easily delivered.  Create applications that rely on your data and data that is external to the enterprise.  Develop these applications quickly and at lower cost.  If all you want is cheaper storage, you may be able to get by without a cloud, but without this capability you are missing the revolution that is upon us.
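To give a feel for what programmable storage looks like to a developer, the Python sketch below builds (but does not send) a REST-style request that stores an object along with some tags.  The endpoint URL, the `X-Object-Tags` header and the `build_put_request` function are all hypothetical; real cloud storage APIs differ in their details.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://storage.example.com/v1/objects"


def build_put_request(name: str, data: bytes, tags: list) -> urllib.request.Request:
    """Build (but do not send) a PUT request that stores an object with tags."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{name}",
        data=data,
        method="PUT",
        headers={
            "Content-Type": "application/octet-stream",
            "X-Object-Tags": json.dumps(tags),  # hypothetical metadata header
        },
    )


req = build_put_request("training-video.mp4", b"video bytes", ["training", "video"])
print(req.get_method(), req.full_url)
```

The point is that storing, tagging and later searching for an object become ordinary web requests that any application can issue.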

(5) Fifth, secure your cloud:

In my own survey of the industry, security is the major issue on the minds of the IT department evaluating cloud storage for the enterprise.  Several different aspects of security come into play.  Many of these issues are most often associated with using a multi-tenant storage cloud from a storage service provider. Nevertheless, four major security issues prevail before we even begin to consider the issues of going to the cloud at a service provider.

The four issues are:  physical security, unauthorized access, data loss (disaster or device failure related) and bit rot (a subset of data loss, granted).   All of these issues are no different than what you face with your traditional shared storage solutions and most of the solutions are similar.  Your current IT physical security solutions apply to an enterprise hosted cloud.   The identity management policies and practices associated with creating and maintaining account credentials address unauthorized access, just as they do with your current data management practices. Encryption can provide additional protection from unauthorized access. As a matter of fact, the security issues are already in play with your current storage methodology, so nothing new here, unless you move to a service provider hosted cloud (more on this later).

(6) Sixth, lower the cost of storage:

Cloud storage delivers the benefits as discussed in items one through four above, while requiring similar security to current storage activities.  How does it address costs?  First, cloud storage solutions generally allow for using commodity hardware, very scalable file systems, and highly automated provisioning and management solutions.  So, the hardware price equation of differentiation and premium pricing is disrupted.  True, the software doesn't come cheap, but remember that the public cloud storage services are "making the market" and the combination of commodity hardware, environmentals, and enabling software (file system, management and middleware from one or more suppliers) is meeting the external marketplace pricing.  Here is a simple model you should use (all figures expressed in dollars/GB/month):

Commodity Hardware depreciation                      $ .02
Environmentals (data center, power and cooling)        .02
Management (primarily people resources)                .02
Enabling Software                                      .03
Other                                                  .01

Total costs:                                         $ .10 (10 cents/GB/Month)
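The same model can be expressed as a short Python sketch, which makes it easy to plug in your own numbers.  The line items and figures are the ones above; the 50,000 GB volume in the usage example is just an assumed workload.

```python
from decimal import Decimal

# Cost per GB per month, in dollars, taken from the model above.
COST_PER_GB_MONTH = {
    "Commodity hardware depreciation": Decimal("0.02"),
    "Environmentals (data center, power and cooling)": Decimal("0.02"),
    "Management (primarily people resources)": Decimal("0.02"),
    "Enabling software": Decimal("0.03"),
    "Other": Decimal("0.01"),
}

TOTAL_PER_GB_MONTH = sum(COST_PER_GB_MONTH.values())  # Decimal('0.10')


def monthly_cost(stored_gb: int) -> Decimal:
    """Total monthly cost in dollars for a given amount of stored data."""
    return TOTAL_PER_GB_MONTH * stored_gb


print(TOTAL_PER_GB_MONTH)    # 0.10
print(monthly_cost(50_000))  # 5000.00 dollars per month for an assumed 50,000 GB
```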

This represents a significant saving for a solution that provides all the capabilities that cloud storage delivers.  What's the catch?  Well, not every type of application and use case for unstructured data is ideally served by cloud storage.  However, many are, and the exceptions should be dealt with as one offs.  The real catch is not taking advantage of this new technology, and all the opportunities it offers, for lowering cost while delivering improved capabilities to end users and applications around the enterprise.

My next post will discuss hybrid, private and public cloud storage offerings, and where savings and security can drive significant benefits for enterprises who take advantage of the cloud storage offerings of service providers.
