As we enter 2010, I am going to focus on a series of articles to define the cloud storage opportunity and the business issues for the enterprise. First, there are some "universal truths" that we need to better understand and define.
The growth in unstructured data will continue unabated. We all know and understand that. The issue is how to manage this phenomenon while assuming that the growth will likely accelerate. Since the growth is driving increased costs, enterprises are continuously searching for more cost-effective ways to manage this growing data.
Data may exist on removable media, on PCs and PDAs, on various servers within the organization, at data centers, at remote facilities, and potentially at various outsourced service providers. The data may range from employees' personal information (and even personal information belonging to the employees' associates) that has nothing to do with the needs of the business, to non-confidential and confidential business information, some of which may be highly critical. Disparate policies will need to be applied to the data, ranging from no control to extreme control. Of course, multiple versions of files will exist, adding to the total storage and further exacerbating the challenges of management.
There are many potential solutions to the problem as stated above, and most of them involve some sort of additional controls, policies and restrictions that limit the proliferation of data and make it more orderly and secure. These solutions are then combined with additional focus on reducing storage costs by staying aligned with new storage technology (which continues to reduce the cost of storage), and the cycle repeats, endlessly. In each cycle, trade-offs are made among cost, availability, security, access and restrictions, and rarely is there a "perfect" solution.
Is cloud storage a possible solution to the issues surfaced above? Is it a discontinuity, a departure from the "business as usual" cycle of ongoing, incremental storage improvements, in which new technologies are introduced only as fast as they can be accommodated?
Let's start with discussing cloud storage and its various capabilities. Note that we are talking about a storage cloud that is housed at the enterprise data center, not a storage service provider.
(1) First, centralize the storage problem:
Cloud storage addresses the size and scale of unstructured data growth in the enterprise. Generally, highly scalable file systems, including newer object-based systems, provide the ability to manage incredibly large numbers of objects (of all sizes) efficiently. These are combined with low-cost commodity storage devices and servers, and a centralized storage pool is then ready for use. It is generally easy to add storage to this pool, and both backup and disaster recovery schemes are in place. So, the first well-known method of problem solving that cloud storage uses is "centralization." Let's get a solution in place that we know can scale to the size of the data needs of the enterprise.
(2) Second, make it easy to use:
You can't use it if you can't get it, and this is where the topic of "thin provisioning" emerges. As I use the term here, thin provisioning just means that it is easy to get a storage account (whether I am an individual user or an application / server) and I can get it quickly, no matter how much I need (in theory). Further, as my storage needs increase, it is easy to get more, quickly. Issues such as accounting for storage, managing growth and billing for it also surround the notion of thin provisioning.
Access is another big topic that surrounds ease of use. The enterprise has multiple needs here. Legacy applications, which use file access methods like CIFS or NFS, will want to use the storage cloud. New applications, written to REST Web services APIs, will also want to coexist. Finally, individual users will want access from all their device types, including PCs (Windows, Mac and Linux), the Web, and PDAs. All of this access raises further issues, including identity management of the credentials associated with using the service, bandwidth requirements for accessing the service from many diverse locations, and geo location of data (i.e., if you have several locations where the cloud data is kept, how do you decide which location to use?).
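One common way to make quick, on-demand accounts possible is to promise capacity without reserving it up front, and to consume physical storage only as data is actually written. Here is a minimal sketch of that idea in Python. The `ThinPool` class, its method names and the figures in the usage note are hypothetical and exist only for illustration; they are not taken from any particular product.

```python
class ThinPool:
    """Toy model of a thin-provisioned storage pool (illustrative only)."""

    def __init__(self, physical_gb):
        self.physical_gb = physical_gb  # capacity actually installed
        self.quota_gb = {}              # account -> capacity promised
        self.used_gb = {}               # account -> capacity actually written

    def create_account(self, name, quota_gb):
        # Granting a quota reserves nothing, which is why it can be instant.
        self.quota_gb[name] = quota_gb
        self.used_gb[name] = 0

    def total_promised(self):
        return sum(self.quota_gb.values())

    def total_used(self):
        return sum(self.used_gb.values())

    def write(self, name, size_gb):
        # Physical capacity is consumed only when data is written.
        if self.used_gb[name] + size_gb > self.quota_gb[name]:
            raise ValueError("account quota exceeded")
        if self.total_used() + size_gb > self.physical_gb:
            raise RuntimeError("pool is out of physical capacity; add storage")
        self.used_gb[name] += size_gb

    def usage_report(self):
        # Per-account usage is the raw input for accounting and billing.
        return dict(self.used_gb)
```

In this sketch a 100 GB pool can hold two accounts with 80 GB quotas each, because the promises (160 GB) only have to be backed by hardware as usage approaches the installed capacity. That is also why accounting and growth management matter: someone has to watch `total_used()` and add storage before writes start to fail.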
(3) Third, sync your files to the cloud:
Now that you have cloud storage, you ought to think about backup and sync to the cloud. These two applications are different but linked. Sync to the cloud can be used both for cloud loading (moving data from the device to the cloud in the background, so that latency is not a problem) and for keeping a current copy in the cloud while you continue to work with the local copy on your device (the best of both worlds). Because your most current copy is in the cloud, it also serves as your backup copy. Sync also keeps files synchronized between devices and the cloud, so you always have an authoritative copy of each file stored in the cloud. Of course, all this depends on having cloud access from any device, anywhere (see number two, above).
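To make the sync idea concrete, here is a minimal sketch of a one-way sync from a device to the cloud that uploads only files that are new or whose content has changed. It assumes, purely for illustration, that both the device and the cloud can be modeled as Python dictionaries mapping file names to bytes; the function names are hypothetical, and a real sync client would also have to handle deletions, conflicts and network failures.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return a SHA-256 fingerprint of a file's content."""
    return hashlib.sha256(data).hexdigest()


def sync_to_cloud(local: dict, cloud: dict) -> list:
    """Copy new or changed files from `local` into `cloud`.

    Both arguments map file names to bytes. Returns the names that were
    uploaded, in sorted order, so unchanged files cost no bandwidth.
    """
    uploaded = []
    for name, data in sorted(local.items()):
        unchanged = (
            name in cloud
            and content_digest(cloud[name]) == content_digest(data)
        )
        if not unchanged:
            cloud[name] = data
            uploaded.append(name)
    return uploaded
```

Running the function a second time with nothing changed uploads nothing, which is what lets sync run quietly in the background while the cloud copy stays current.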
(4) Fourth, create new, higher impact applications with programmable storage:
Programmable access to storage (over HTTP, using SOAP or REST APIs) is the next big revolution in storage. Tagging, sharing, collaboration, easy search, easy and secure access, and multiple views make creating new, high-impact applications easier than before. Take advantage of new functionality that is easily delivered. Create applications that rely on your data and on data that is external to the enterprise. Develop these applications quickly and at lower cost. If all you want is cheaper storage, you may be able to get by without a cloud, but without this capability you are missing the revolution that is upon us.
(5) Fifth, secure your cloud:
In my own survey of the industry, security is the major issue on the minds of the IT department evaluating cloud storage for the enterprise. Several different aspects of security come into play. Many of these issues are most often associated with using a multi-tenant storage cloud from a storage service provider. Nevertheless, four major security issues prevail before we even begin to consider the issues of going to the cloud at a service provider.
The four issues are: physical security, unauthorized access, data loss (disaster or device failure related) and bit rot (a subset of data loss, granted). None of these issues is any different from what you face with your traditional shared storage solutions, and most of the solutions are similar. Your current IT physical security solutions apply to an enterprise-hosted cloud. The identity management policies and practices associated with creating and maintaining account credentials address unauthorized access, just as they do with your current data management practices. Encryption can provide additional protection from unauthorized access. As a matter of fact, these security issues are already in play with your current storage methodology, so there is nothing new here, unless you move to a service-provider-hosted cloud (more on this later).
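Bit rot is the least familiar of the four, so a small illustration may help. One widely used defense is to record a checksum when an object is stored and to re-verify it periodically (often called scrubbing). The sketch below assumes a toy object store modeled as Python dictionaries; the function names are hypothetical and not taken from any specific storage system.

```python
import hashlib


def object_checksum(data: bytes) -> str:
    """Return a SHA-256 checksum of an object's content."""
    return hashlib.sha256(data).hexdigest()


def store_object(objects: dict, checksums: dict, key: str, data: bytes) -> None:
    """Store the object and remember the checksum it had when written."""
    objects[key] = data
    checksums[key] = object_checksum(data)


def scrub(objects: dict, checksums: dict) -> list:
    """Return the keys whose content no longer matches its stored checksum."""
    return sorted(
        key for key, data in objects.items()
        if object_checksum(data) != checksums[key]
    )
```

Any key returned by `scrub` would then be repaired from a replica or a backup copy, which is why bit rot is best thought of as a subset of the data loss problem.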
(6) Sixth, lower the cost of storage:
Cloud storage delivers the benefits discussed in items one through four above, while requiring security similar to that of current storage activities. How does it address costs? First, cloud storage solutions generally allow for using commodity hardware, very scalable file systems, and highly automated provisioning and management solutions. So, the hardware price equation of differentiation and premium pricing is disrupted. True, the software doesn't come cheap, but remember that the public cloud storage services are "making the market," and the combination of commodity hardware, environmentals, and enabling software (file system, management and middleware from one or more suppliers) is meeting the external marketplace pricing. Here is a simple model you should use (all figures expressed in dollars per GB per month):
Commodity hardware depreciation: $0.02
Environmentals (data center, power and cooling): $0.02
Management (primarily people resources): $0.02
Enabling software: $0.03
Other: $0.01
Total cost: $0.10 (10 cents/GB/month)
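The arithmetic behind the model can be checked, and scaled to a given capacity, with a few lines of Python. This is only a sketch: it treats the figures above as dollars per GB per month (so that they sum to 10 cents), and the 500,000 GB capacity in the usage note is a made-up example, not a figure from this article.

```python
from decimal import Decimal

# Cost components from the model above, in dollars per GB per month.
COST_PER_GB_MONTH = {
    "Commodity hardware depreciation": Decimal("0.02"),
    "Environmentals (data center, power and cooling)": Decimal("0.02"),
    "Management (primarily people resources)": Decimal("0.02"),
    "Enabling software": Decimal("0.03"),
    "Other": Decimal("0.01"),
}


def total_per_gb_month(costs=COST_PER_GB_MONTH):
    """Sum the components to get the all-in cost per GB per month."""
    return sum(costs.values(), Decimal("0"))


def monthly_cost(capacity_gb, costs=COST_PER_GB_MONTH):
    """Monthly cost in dollars for a given stored capacity in GB."""
    return total_per_gb_month(costs) * Decimal(capacity_gb)


if __name__ == "__main__":
    print(total_per_gb_month())    # all-in dollars per GB per month
    print(monthly_cost(500_000))   # hypothetical 500,000 GB pool
```

At 10 cents per GB per month, a hypothetical 500,000 GB pool would cost $50,000 per month. Swapping in your own component figures shows quickly which line item dominates.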
This represents a significant saving for a solution that provides all the capabilities that cloud storage delivers. What's the catch? Well, not every type of application and use case for unstructured data is ideally served by cloud storage. However, many are, and the exceptions should be dealt with as one-offs. The real catch is not taking advantage of this new technology, and all the opportunities it offers, for lowering cost while delivering improved capabilities to end users and applications around the enterprise.
My next post will discuss hybrid, private and public cloud storage offerings, and where savings and security can drive significant benefits for enterprises who take advantage of the cloud storage offerings of service providers.