We all know that there is little agreement on the definition of identity, digital identity, identity (meta) system(s), user centricity, etc. There are probably as many definitions of these terms out there as there are actors in this market. While many of these definitions are somewhat similar, there is still a significant semantic gap between how the various “clans” use them and which terms are the “correct” ones: just think about the debates around “relying party” versus “service provider” versus “consumer”, or “user agent” versus “browser”.
Traditional Digital Identity
I would like to go back to one of the more common definitions of digital identity. For some time now, I have been operating on the notion that a digital identity is – essentially – a collection of attributes. For example, your digital identity probably has attributes such as name(s), addresses, phone numbers, email addresses, etc. The collection of these attributes – accessible in a machine-processable form – constitutes a lot of knowledge about you.
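The attribute-collection view can be sketched very directly as a record; the attribute names and values below are purely illustrative, not taken from any standard schema:

```python
# A minimal sketch of the "identity as a collection of attributes" view.
# All names and values are illustrative.

digital_identity = {
    "name": "Alice Example",
    "email": "alice@example.org",
    "phone": "+1-555-0100",
    "address": "1 Example Street, Springfield",
}

# "Machine-processable" means any party holding this record can
# query, index, or aggregate it.
print(digital_identity["email"])
```

The very simplicity of this structure is what makes it both useful and, as discussed below, a privacy risk: it is trivially easy to copy, merge, and resell.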
On Identifiers and System Specific Attributes
While not central to the theme of this article, it should still be noted that the user name (or, more generally, the identifier) is itself just another attribute that might change. To limit the number of attributes, an identity system might also decide to use an existing attribute that can be taken to be sufficiently unique (e.g. an email address) as the user name/identifier.
In addition to the identifier, there may be further attributes that arise through the use of a particular identity system. These can be system-internal attributes guaranteeing uniqueness (e.g. GUIDs) or pseudonymous identifiers used with individual relying parties.
All these additional identifiers might be random. Yet, through their usage in the identity system, they are tightly coupled to a particular digital identity, and they should be treated with the same importance and privacy awareness as any other personally identifiable attribute.
In many cases, this collection is accompanied by some cryptographic keying material, often in the form of a public/private key pair. As the ‘owner’ of this digital identity, you typically have access to the private key and you can use it in transactions to prove that it is you (i.e. the ‘owner’ of the digital identity and its keying material) who participated in this transaction.
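The “prove that it is you” step can be illustrated in a few lines. Real identity systems use asymmetric key pairs (e.g. RSA or Ed25519); the symmetric HMAC below is only a standard-library stand-in to show the shape of the sign-and-verify interaction, and all key and message values are made up:

```python
import hashlib
import hmac

# Stand-in for the owner's private keying material (real systems would use
# an asymmetric private key, so verifiers never hold this secret).
private_key = b"alice-secret-key"
transaction = b"transfer 10 credits to bob"

# The owner signs the transaction with the key material...
signature = hmac.new(private_key, transaction, hashlib.sha256).hexdigest()

# ...and a verifier checks that whoever signed holds the key.
def verify(key: bytes, message: bytes, sig: str) -> bool:
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)

print(verify(private_key, transaction, signature))  # True
print(verify(b"not-the-key", transaction, signature))  # False
```

With an actual public/private key pair, the verification step would use only the public half, which is what lets arbitrary relying parties check the proof without being able to impersonate the owner.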
Depending on the context of this digital identity (some people might want to call this context an identity system, federation, or identity meta-system), you can create statements about your collection of attributes that do not necessarily contain all the information about your digital identity, but only a subset: for example, you might be able to create a statement about your email address and name and nothing else. Or it might be handy to create a statement about the fact that you are over 21, without disclosing your actual age or birth date.
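Both kinds of partial statement can be sketched as simple derivations over the attribute collection. The helper names (`statement`, `over_21`) and the sample data are hypothetical; a real system would additionally sign such statements so a relying party can trust them:

```python
from datetime import date

# Illustrative attribute collection (made-up data).
identity = {
    "name": "Alice Example",
    "email": "alice@example.org",
    "birth_date": date(1980, 4, 1),
}

def statement(identity: dict, fields: list) -> dict:
    """Disclose only the requested subset of attributes."""
    return {k: identity[k] for k in fields}

def over_21(identity: dict, today: date) -> bool:
    """A derived claim: disclose a boolean, not the birth date itself."""
    b = identity["birth_date"]
    age = today.year - b.year - ((today.month, today.day) < (b.month, b.day))
    return age >= 21

print(statement(identity, ["name", "email"]))  # no birth_date disclosed
print(over_21(identity, today=date(2024, 1, 1)))  # True
```

The second function is the interesting one: the relying party learns only that the predicate holds, which is exactly the kind of minimal disclosure the attribute-collection model makes possible.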
Overall, this concept of a digital identity was – and still is – quite useful in many cases. It has a lot of built-in flexibility and can be applied to a very large number of problems.
The problem with this view bubbles up to the surface as soon as we become concerned about the privacy of the different actors in this definition. It is quite clear that within the world of this definition, privacy breaches are quite easy: as soon as parts of a digital identity become known, these parts (or attributes) can be collected in databases and sold to interested parties. This has already resulted in the massive disruption of email by spammers. Going forward, it is all too easy to imagine a world in which private data collectors or nosy governments accumulate more and more attributes and information about a person’s digital identity.
Identity By Relation
I am starting to think about identity (and in particular digital identity) in a more dynamic way:
A digital identity is a collection of relations to (i) itself, (ii) other digital identities, (iii) external entities. These relations can, but do not have to, be decorated with one or more attributes.
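One way to make this definition concrete is to model an identity as a set of edges, each optionally carrying attributes. The representation and all names below are an illustrative sketch of the relational view, not a proposed format:

```python
# Each relation is (subject, relation, object, attributes).
# Attributes are optional decoration, per the definition above.
relations = [
    ("alice", "same_as",     "alice",     {}),                   # relation to itself
    ("alice", "knows",       "bob",       {}),                   # to another identity
    ("alice", "customer_of", "acme-shop", {"pseudonym": "u-7f3a"}),  # to an external entity
]

def relations_of(subject: str) -> list:
    """Collect the edges that make up one digital identity."""
    return [r for r in relations if r[0] == subject]

print(len(relations_of("alice")))  # 3
```

Note that nothing forces these edges to live in one place: each relying party could hold only the edges that concern it, which is the decentralization point made next.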
One of the benefits of this definition is that it becomes intuitively clear that a single digital identity is not necessarily stored in a single place, but much more commonly in a number of different places. This decentralization is a crucial building block for creating a world with strong privacy, by segregating as much data as possible by design. At the end of the day, it will be (almost – see below) exclusively the ‘owner’ of a particular digital identity who is capable of correlating across the different storage locations of that digital identity.
With such a definition in mind, you can gather a lot of data about someone by using their identity web services, but much of it may be very ephemeral (e.g., their current geolocation or presence status). As such, it is actually closer to their real ‘in-the-world’ identity.
Correlating Through Auditing
One might argue that this separation of identity data will in turn weaken the capability to effectively correlate information about a given digital identity for legitimate purposes, in particular when it comes to requirements such as “proof of source” or “non-repudiation”. These concerns can be overcome by auditing: while different storage locations are typically not capable of correlating, a concerted action (e.g. based on a court warrant or subpoena) can evaluate audit trails and construct a comprehensive image of a digital identity.
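The auditing argument can be sketched as a join over per-location audit trails: no single store can reconstruct the identity, but a concerted evaluation (e.g. under a warrant) can merge trails on a shared transaction reference. The log layout and all data below are hypothetical:

```python
from collections import defaultdict

# Two independent storage locations, each keeping its own audit trail.
# Neither log alone links the email to the shipping address.
store_a_log = [
    {"txn": "t-1001", "attribute": "email", "value": "alice@example.org"},
]
store_b_log = [
    {"txn": "t-1001", "attribute": "shipping_address", "value": "1 Example St."},
]

def correlate(*logs):
    """Concerted evaluation: join audit entries on the transaction reference."""
    merged = defaultdict(dict)
    for log in logs:
        for entry in log:
            merged[entry["txn"]][entry["attribute"]] = entry["value"]
    return dict(merged)

print(correlate(store_a_log, store_b_log)["t-1001"])
```

The key property is that `correlate` requires access to *all* the trails at once, which is exactly what the legal process gates.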
 More precisely: A component of the identity system can create such statements about the attributes of your digital identity on your behalf. This could be your identity provider, some active user agent, or another service separate from the identity provider.
 Actually, from experience: probably all participants in electronic commerce, or even simple electronic communication, have had some of their digital identity disclosed to parties that should not have it, e.g. spammers or worse. Frequently, this happens through the sale of such information to marketers.
 This scenario applies to loosely coupled, internet-scale identity systems. In more tightly coupled systems (e.g. in internal business applications or cross-enterprise collaborations) there are usually tight governance models that regulate how data is being handled through contracts and laws.