On data ownership

Data ownership is a rather nasty topic. At a legal level, we have many rights related to data that we create or that is about us: privacy regulations, intellectual property rights, copyrights, trademarks, and so on are all aspects of how society attributes ownership to immaterial goods. This practice has been in place since at least the early 19th century, but even then there were critics, among them Thomas Jefferson and James Madison.

With the advent of digitized storage, reproduction of immaterial data has become cheap and lossless. This has a significant impact on industry: the entertainment industry, for example, is currently facing the consequences of this highly disruptive technological advance, and has yet to redesign its business model to accommodate the paradigm shift.

But this change goes far beyond the entertainment industry or any specific market: by now, most people have started to realize that data they release about themselves will be reproduced, indexed, and made available via third-party search engines. Once the cat is out of the bag, it is too late to restrict distribution.

This leads me to believe that we need to rethink the concept of data ownership, at least at a technology level: it does not make much sense to claim ownership of data if one has no means of asserting this ownership effectively. Judicial processes are too slow and too closely bound to physical objects. As a result, only a small portion of data ownership infractions are dealt with by courts, and effective enforcement on a global scale is practically impossible.

It therefore seems appropriate to me to abandon the concept of data ownership on a technical level altogether – and replace it with concepts that are better suited to how information systems are designed in the 21st century:

  • A physical custodian of data has access to and control over the physical object where the data is stored. In many cases this will effectively be a system administrator who takes care of the computer and hard drives where the data is stored. It also makes sense to consider the organization that employs the system administrator(s) to be a physical custodian. The physical custodian has significant control over the data, since they can simply “pull the plug” and make the data unavailable.
  • A logical custodian can access and modify the data, and can also grant the logical custodian role to other entities. This grant cannot be reversed: once an entity has access to data, the data can be copied to other physical systems and reused. While in many cases a physical custodian is also a logical custodian, there are important exceptions: in multi-level security systems or environments where data at rest is encrypted, the physical custodian might not have meaningful access to the data.
  • The data originator is the entity that created the data. While origin may be an important factor to determine authority or validity of the data, it does not guarantee either.
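The three roles above can be sketched as a small data model. This is only an illustration under my own assumptions: the class name `DataItem`, its fields, and the method `grant_logical` are hypothetical and not taken from any existing system. The point it tries to show is that the model has a way to grant the logical custodian role but, deliberately, no way to revoke it.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """Hypothetical record that tracks data by role instead of by owner."""

    originator: str  # entity that created the data
    physical_custodians: set[str] = field(default_factory=set)
    logical_custodians: set[str] = field(default_factory=set)

    def grant_logical(self, granter: str, grantee: str) -> None:
        """Extend the logical custodian role; only a current holder may do so."""
        if granter not in self.logical_custodians:
            raise PermissionError(f"{granter} is not a logical custodian")
        self.logical_custodians.add(grantee)

    # There is intentionally no revoke method: once an entity has had
    # access, a copy may exist elsewhere, so the grant is modeled as permanent.


# Usage: the hosting company stores the (encrypted) data but cannot read it.
item = DataItem(
    originator="alice",
    physical_custodians={"hosting-co"},
    logical_custodians={"alice"},
)
item.grant_logical("alice", "bob")
```

Note that the originator is recorded but carries no special privilege in this sketch, which matches the observation that origin alone guarantees neither authority nor validity.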

Anything beyond these roles cannot – at least with current technology – be properly modeled without relying on concepts beyond the realm of technology. Nevertheless, even these limited roles can be used to model interesting scenarios. For example, a distributed storage system that stores encrypted and chunked data with parity (in effect, RAID 5 or 6 across different services rather than disks) can practically eliminate the role of the physical custodian.
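To make the storage example a little more concrete, here is a minimal sketch of the parity part only. It assumes the data has already been encrypted by some other means, uses a single XOR parity chunk (RAID 5 style, not the dual parity of RAID 6), and all function names are my own. Each returned chunk would be handed to a different storage service.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(ciphertext: bytes, n_data: int) -> list[bytes]:
    """Split into n_data equal chunks and append one XOR parity chunk."""
    chunk_len = -(-len(ciphertext) // n_data)  # ceiling division
    padded = ciphertext.ljust(chunk_len * n_data, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(n_data)]
    return chunks + [reduce(xor_bytes, chunks)]


def recover(chunks: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing chunk (None) from the remaining ones."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild only one missing chunk")
    result = list(chunks)
    if missing:
        present = [c for c in chunks if c is not None]
        result[missing[0]] = reduce(xor_bytes, present)
    return result


# Usage: one service "pulls the plug", the data is still recoverable.
blob = bytes(range(10))            # stands in for already-encrypted data
stored = split_with_parity(blob, 3)
damaged = list(stored)
damaged[1] = None                  # chunk held by the failed service
restored = recover(damaged)
original = b"".join(restored[:3])[:len(blob)]  # strip the zero padding
```

With this layout, any single service holds only an unreadable fragment and its disappearance can be tolerated, which is the sense in which the physical custodian loses most of its leverage. The original length has to be remembered separately so that the padding can be removed.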

Higher-level technologies (such as DRM or multi-party encryption) may succeed to some extent in restricting the significant control that a logical custodian has. Beyond that, only external mechanisms (such as system certification, trust models, or judicial redress procedures) can limit the logical custodian.
