In an earlier article I talked about data ownership – or lack thereof – at a low, technical level. There are three principal technical actors: the physical custodian, the logical custodian, and the data originator. This article deals with the problem (for the data originator) to limit the powers the physical custodian has. As the owner of the physical equipment that hosts the data, the physical custodian can perform a number of undesired actions with the data he hosts, specifically: (i) copy and distribute it and (ii) disable physical access to it. In many cases, both actions are not desired by the data originator or consumer.
As a first step towards limiting the physical custodians powers, it is important to make sure that the physical custodian (PC) is not also a logical custodian (LC). By this I mean the following: the PC has access to the physical equipment that hosts the data, as well as the transport infrastructure to get access to it. By denying the PC the role of the logical custodian, he may ultimately host data, but will not be able to use or interpret the data in a meaningful way. An obvious way to achieve this, is to encrypt the data and make sure that the PC does not get access to the key. For most practical purposes, this addresses action (i).
But even if the PC cannot access the data he hosts, he still has the “power of the plug”: if the PC cuts that connection to the network, or switches of the data equipment, all access to data is lost. In order to be able to address this problem, one can use the following scheme:
Data is stored in some atomic units like files, that can be represented as a data stream.
The data stream is encrypted; keys are not stored with the data.
The encrypted stream is chunked into at least two chunks of identical size. The number of chunks is arbitrary.
At least one parity chunk is computed – think RAID 5 or 6.
The chunks are stored on different data services. This could be a traditional data service, but also other services such as a mail service or a blog service could be used to store the chunks. The table linking the different chunks is stored separate from the data.
The effect of creating such a “Redundant Array of Independent Services” (RAIS) is obvious: not only can the physical custodians not access the data since it is encrypted and they only have a portion. Also, since there is at least one parity chunk, if one provider decides to “pull the plug”, the lost data can be reconstructed from the remaining chunks. As an additional protection, users might want to mirror individual chunks on different services as well, thus improving availability.
The obvious open questions are crypto key and chunk table management, especially since these become high-value targets. Master key techniques and independent RAIS systems can address some of these issues through best practices.