In my previous post I presented ClouDedup, our solution for deduplication over encrypted data. I’m now going to talk about how ClouDedup can be deployed to address a very common use case: backup.
Doing regular backups is a strongly recommended practice, even though people very often don’t realize how important a backup can be. In some cases, having a fresh backup from which your data can be restored can really save you a lot of time. For a company, a backup can make it possible to recover data after an incident or, even worse, a disaster, and thus save a lot of money.
That said, nowadays there exist a huge number of backup solutions. Some of them make use of deduplication in order to store duplicate data only once and achieve storage space savings close to 95%.
As I said in my previous post, achieving deduplication and confidentiality at the same time is tricky because deduplication and encryption conflict with each other. Indeed, the main requirement for deduplication is to be able to compare two data segments (either files or blocks) and determine whether they are identical. On the other hand, encryption usually generates indistinguishable ciphertexts: the same data encrypted by two users with different keys yields two ciphertexts that look unrelated. It’s clear that we can’t apply standard encryption and still achieve deduplication. We definitely need to adapt encryption in order to keep deduplication possible.
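To make the comparison requirement concrete, here is a minimal Python sketch of block-level deduplication on plaintext data. It is only an illustration, not ClouDedup code: the function name deduplicate, the use of SHA-256 fingerprints and the sample blocks are all assumptions made for this example.

```python
import hashlib


def deduplicate(blocks):
    """Store each distinct block once, keyed by its SHA-256 fingerprint.

    Returns (store, recipe): `store` maps fingerprint -> block, and `recipe`
    is the ordered list of fingerprints needed to rebuild the original data.
    """
    store = {}
    recipe = []
    for block in blocks:
        fingerprint = hashlib.sha256(block).hexdigest()
        # Identical blocks share a fingerprint, so only the first copy is kept.
        store.setdefault(fingerprint, block)
        recipe.append(fingerprint)
    return store, recipe


# Hypothetical sample data: four blocks, three of which are identical.
blocks = [b"A" * 1024, b"B" * 1024, b"A" * 1024, b"A" * 1024]
store, recipe = deduplicate(blocks)
print(len(blocks), "blocks written,", len(store), "blocks stored")
# prints "4 blocks written, 2 blocks stored"
```

The comparison only works because identical blocks produce identical fingerprints. If each block were first encrypted under a different user key, the fingerprints would no longer match and nothing would be deduplicated.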
Recently, convergent encryption has been proposed as a solution to the above-mentioned problem. Bitcasa currently provides a storage solution which claims to combine deduplication and confidentiality by making use of convergent encryption. Unfortunately, it has been shown here and here (you can find more details in my previous post) that convergent encryption isn’t suitable for providing full data confidentiality. For instance, if data are encrypted with just convergent encryption, the cloud provider can easily verify whether you stored a given file or not!
Well, fortunately there is a solution to this apparently unsolvable issue: ClouDedup. You can find all the details of the solution here. In this post we will focus on how it can be easily deployed.
Let’s take as an example an enterprise with a number of employees (users). They store their data on premises and periodically create a backup which will be stored in the Cloud or even locally. As we all know, backups can be huge and their size constantly increases over time. That’s why there is a strong need for optimization techniques that aim to minimize the storage space.
Employees have to install a client on their machine which will take care of forwarding their storage requests to ClouDedup. The client is responsible for splitting a file into blocks, encrypting data with convergent encryption, storing the key of the first block of each file and finally building the requests to ClouDedup.
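As a rough idea of what the client does, here is a minimal Python sketch of these steps. It is not the actual ClouDedup client: the names split_into_blocks and prepare_upload, the 4 KB block size, the shape of the request and the encrypt_block callback are all assumptions made for this example. How the keys of the remaining blocks are handled is not covered in this post and is deliberately left out.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for this example


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Cut the file content into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def prepare_upload(file_id: str, data: bytes, encrypt_block):
    """Return (first_block_key, request) for one file.

    `encrypt_block(key, block)` is supplied by the caller and stands in for
    the convergent encryption step; the request layout is hypothetical.
    """
    blocks = split_into_blocks(data)
    # Convergent keys: each key is the hash of the block it protects.
    keys = [hashlib.sha256(block).digest() for block in blocks]
    request = {
        "file_id": file_id,
        "blocks": [encrypt_block(k, b) for k, b in zip(keys, blocks)],
    }
    # The client keeps the key of the first block of the file.
    first_block_key = keys[0] if keys else None
    return first_block_key, request


def placeholder_cipher(key: bytes, block: bytes) -> bytes:
    # Placeholder that does NOT encrypt; a real cipher would go here.
    return block


first_key, request = prepare_upload("report.txt", b"x" * 10000, placeholder_cipher)
print(len(request["blocks"]))  # 3
```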
This can be achieved with a dedicated program (e.g. a command line interface, a graphical interface, a web interface, etc.) or might be totally transparent to users by providing a filesystem abstraction module such as FUSE.
Currently, we are working on a program (written in Python) with a command line interface, but we also plan to develop a graphical/web user interface.
The gateway can be deployed within the enterprise perimeter on a dedicated machine or a virtual machine. If we want to achieve more security and make sure that the secret key used by the gateway cannot be stolen or leaked, we can employ a hardware security module such as Luna SA, which is provided by SafeNet.
If we don’t want to deploy any additional component on premises and we feel comfortable with the idea of trusting an external service provider (SP), we might decide to rely on an SP for the features of the gateway.
The location in which the metadata manager (MM) can be deployed depends on whether we want to rely on an SP or not. If so, the MM can be deployed remotely (in the Cloud); otherwise we can deploy it on premises.
Finally, the last component of ClouDedup is the storage (SP, here meaning the storage provider). Our solution is storage agnostic and completely transparent to the storage provider. That’s possible because we don’t ask the storage provider to provide any exotic function. The only operations we need are storing, retrieving and deleting blocks, which from the provider’s point of view can be considered small files.
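In other words, any backend able to offer three operations would do. A minimal Python sketch of such an interface could look like the following; the class and method names are hypothetical, and the in-memory dictionary merely stands in for a real storage provider.

```python
class InMemoryBlockStore:
    """Stand-in for a storage provider: blocks are just small named blobs."""

    def __init__(self):
        self._blocks = {}

    def put(self, block_id: str, data: bytes) -> None:
        # Store (or overwrite) a block under the given identifier.
        self._blocks[block_id] = data

    def get(self, block_id: str) -> bytes:
        # Retrieve a block; raises KeyError if the identifier is unknown.
        return self._blocks[block_id]

    def delete(self, block_id: str) -> None:
        # Remove a block; deleting an unknown identifier is a no-op.
        self._blocks.pop(block_id, None)


store = InMemoryBlockStore()
store.put("block-0001", b"encrypted bytes")
print(store.get("block-0001"))  # b'encrypted bytes'
store.delete("block-0001")
```

Swapping this class for one that talks to a cloud object store or a local disk would not change anything for the rest of the system, which is the point of being storage agnostic.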
Thanks to all these components, employees can easily back up their data (either manually or through an automatic script).
It wouldn’t be difficult to integrate such an architecture with an existing system for incremental backups. In this case, ClouDedup would allow you to save even more storage space.
We are glad to announce that we are halfway through the implementation of ClouDedup, which will be available within the first quarter of 2014.
In the meantime, your comments and questions are welcome!
If you are interested in ClouDedup, please fill in the form below or sign up here.