Simplifying legalese

One of the issues that crops up frequently at data protection conferences and forums is the inherent complexity of legalese, which prevents Users from understanding the Terms & Conditions* they are signing up to when they join a social network, use an app or visit a website. This has led to Users being uninterested in understanding the T&Cs. It has had a huge impact on Data Protection Policies, since Users were ignorant of how their data was being used and, more importantly, there was no clarity on who actually “owned” the data.

While it was clear that a new framework was needed to make Users aware, or at the very least to get them interested enough to explore further, the prime issue was “How”. In this regard, one of the speakers showed a video by The Scott Trust, which owns the Guardian Media Group. Unfortunately, I am unable to find the video online at the moment (will share the URL once I do). The video informed the viewer that all data collected on the site was used to generate relevant content for the User, without sharing any information with any third party that might be affiliated with The Guardian. It was quite well made and did a good job of summarising The Guardian’s T&Cs in under 2 minutes (I think).

However, I see two problems with this approach. First, as another speaker on the panel highlighted, most Users won’t bother to watch the video or even visit the page that contains it. The second problem, and to my mind the major concern, is that a User who does watch the video is likely to mistake its simplified version of the Terms of Service for ALL the Terms & Conditions, when in fact it is just a summary. That might be a problem.

Reasons for Complexity

There might be two reasons (readers are encouraged to suggest more) for the complexity of legal documents such as Terms of Service, EULAs, etc. –

  1. The organisations are trying really hard to be very clear about where they stand regarding the terms of usage of their products/services. Whether one agrees with these terms is a matter of debate, and one that demands resolution.
  2. Organisations know that a lot of information can be buried in Terms of Service because Users don’t usually read them, and can therefore write deceptive Terms of Service/EULAs, etc. Contrary to popular belief, it is not the big companies who usually engage in such practices (mostly because they know they are always under public scrutiny and cannot get away with it).

Either way, it is of supreme importance that Users be made aware of the implications of the Terms and Conditions. It is equally important for organisations to realise that it is in their own interest for Users to understand their Terms of Service, since the absence of complexity makes way for a healthier and more trustworthy relationship between the institution/s and the user/s. Case in point: when Amazon decided to pull George Orwell’s 1984 from its shelves owing to copyright infringement issues (it turns out it had been doing the same for other books too, such as Animal Farm, Twilight, books by Ayn Rand and some books from the Harry Potter series), users complained. Among those complaints, possibly the best illustration of the confusion arising out of such ambiguities is an exchange between two users –

User#1: “What ticked me off is that I got a refund out of the blue and my book just disappeared out of my archive. I emailed Amazon for an answer as to what was going on and they said there was a “problem” with the book, nothing more specific. I’m sorry, when you delete my private property – refund or not – without my permission, I expect a better explanation than that. And, BTW – Pirated books showing up on Amazon – not MY problem – hire more people to check them BEFORE you sell them to me. I call BS on the “sometimes publishers pull their titles” lame excuse someone else got too.

I like the B&N analogy above – but I liken it to a B&N clerk coming to my house when I’m not home, taking a book I bought from my bookshelf and leaving cash in its place. It’s a violation of my property and this is a perfect example of why people (rightly) hate DRM.”

User#2 (in response to User#1): “You don’t buy a Kindle book from Amazon. You buy a license to download it. I will bet that if you read all the fine print in the terms of service, you will see that Amazon says they can remove (or rescind, or revoke, or whatever the legal term is) the license if the book in question has been put up in violation of the copyright.

If you buy something that turns out to be stolen, it can be confiscated and returned to the legal owner with no compensation to you. You could try to get your money back from the vendor, but that would be something you would have to pursue yourself; the police wouldn’t do anything about it.

Consider how many posts there have been here where people rant and rave because Amazon doesn’t do enough to help owners of lost or stolen Kindles get them back. Now there are complaints because Amazon does make the effort to get stolen (and that’s what unauthorized books are) books “returned” to the copyright holders. Talk about a no-win situation.”

Matters were not helped when Drew Herdener, a spokesperson for Amazon, gave the following statement to reporters: “We are changing our systems so that in the future we will not remove books from customers’ devices in these circumstances.”

The closing phrase, “in these circumstances” (emphasis mine), adds further complexity, since it can technically mean that Amazon may still remove a book under different circumstances. Alongside intangible reputational damage, Amazon suffered real economic damage: it had to pay $150,000 to a plaintiff’s lawyer and an undisclosed sum of money to the plaintiff and the co-plaintiff.

What’s clear is that this issue (and there are other examples) arose from complexity in the terms of engagement between Amazon and its users, which led to significant financial and reputational damage for the company.

Now that the need for a clearer set of Terms & Conditions is acknowledged among the Users as well as the Organisations, the prime question is – HOW? In other words, how do we implement measures to remove the complexity arising from the Terms & Conditions?

The hard answer to this question is – There’s no panacea for this issue.

That being said, we can always attempt to minimise the issues arising out of such complexities by making the terms more succinct and more responsive to Users’ hopes and expectations. For that, one solution that I have in mind is what I would call a Terms of Service Commons, or ToS Commons for short. The main aim of ToS Commons would be to create a middle ground between Organisations and Users, by attempting the following –

  • Develop a “generic Terms of Service (gToS)”, which would encapsulate a general “philosophy” applicable to all websites. A good analogy might be to think in terms of the generic features of social media websites. For example, Twitter, Facebook, and LinkedIn are all social media websites with different models and features. Yet they offer certain common functionalities, such as private messaging, referring to one’s contacts by hyperlinks, sharing of photographs, etc. Thus, if one were to implement a gToS, a starting point would be to understand the common features that describe all social media websites (please note that I am using social media only as an example; the gToS could vary with the type of website). This can be termed a “Read Once – Apply Always” type of document.
  • Anything that is specific to these websites’ business model (for example, Sponsored Tweets are specific to Twitter) can be covered by “specific Terms of Service (sToS)”.
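
To make the split between gToS and sToS concrete, here is a minimal sketch of how the two layers might be combined for a single website. Everything in it is an assumption made for illustration: the class names, the `effective_terms` function, the clause identifiers and the clause wording are placeholders, not text from any real Terms of Service. The `already_read` argument is one possible way to model the “Read Once – Apply Always” idea.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Clause:
    clause_id: str
    summary: str  # placeholder wording, not taken from any real ToS


@dataclass
class Site:
    name: str
    site_type: str                                      # e.g. "social_media"
    specific_terms: list = field(default_factory=list)  # the site's sToS


# Hypothetical gToS registry: one shared set of clauses per type of website.
GENERIC_TERMS = {
    "social_media": [
        Clause("g-messaging", "Placeholder clause on private messaging."),
        Clause("g-photos", "Placeholder clause on sharing of photographs."),
    ],
}


def effective_terms(site, already_read=frozenset()):
    """Return (all clauses in force, clauses the User has not read yet)."""
    in_force = GENERIC_TERMS.get(site.site_type, []) + site.specific_terms
    unread = [c for c in in_force if c.clause_id not in already_read]
    return in_force, unread


twitter = Site(
    "Twitter",
    "social_media",
    [Clause("s-sponsored", "Placeholder clause on Sponsored Tweets.")],
)

# A User who has already read the social media gToS elsewhere
# only has the site-specific clause left to read.
in_force, unread = effective_terms(
    twitter, already_read={"g-messaging", "g-photos"}
)
print([c.clause_id for c in unread])  # ['s-sponsored']
```

The point of the sketch is only that the reading burden for each new site shrinks to its sToS once the gToS has been read; how clauses would actually be drafted and agreed is the open question discussed in the rest of this article.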

In all of this, I believe an interesting approach would be a bottom-up one that seeks to understand the expectations of the User base. Wikipedia is a good inspiration for how one could build a huge corpus that reflects the knowledge of its users. I am not saying all User recommendations need be accepted, but if a certain idea attains a “critical mass”, it should be incorporated into the larger “philosophy” of the gToS.

Of course, there are challenges with implementing this approach. A couple that I can think of are listed below (readers are encouraged to suggest more, which is one of the prime aims of the ToS Commons in the first place!) –

  • Reaching a consensus among the masses on a particularly contentious issue is quite difficult. A Standard Operating Procedure on mediation needs to be developed to facilitate smooth dispute resolution.
  • One of the main aims of the gToS will be to shorten the overall Terms of Service one reads on a website. However, as I mentioned previously, the gToS itself will depend a lot on the “type” of the website. This is where it might get tricky: categorisation of websites. Is Facebook a social media site (where people communicate with their friends) or an e-commerce site (where people buy stuff)? A possible solution would be to organise the ToS Commons into categories of “function”, that is, Messaging, Photo Sharing, E-Commerce (a very broad term that would need to be defined VERY clearly).
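
The function-based categorisation in the last point can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `GTOS_BY_FUNCTION` table and the `gtos_for` helper are hypothetical, and the clause texts are placeholders. A website is described by the set of functions it offers rather than by a single “type”.

```python
# Hypothetical gToS modules keyed by function rather than by website type.
GTOS_BY_FUNCTION = {
    "messaging": ["placeholder clause on private messages"],
    "photo_sharing": ["placeholder clause on shared photographs"],
    "e_commerce": ["placeholder clause on purchases"],
}


def gtos_for(functions):
    """Collect the generic clauses for every function a site offers."""
    unknown = sorted(set(functions) - set(GTOS_BY_FUNCTION))
    if unknown:
        # A function without an agreed module cannot be covered by the gToS.
        raise KeyError(f"no gToS module defined for: {unknown}")
    return {name: GTOS_BY_FUNCTION[name] for name in sorted(functions)}


# A site that is both "social" and "commercial" simply lists both functions,
# so the question of which single category it belongs to never arises.
facebook_functions = {"messaging", "photo_sharing", "e_commerce"}
print(list(gtos_for(facebook_functions)))
# ['e_commerce', 'messaging', 'photo_sharing']
```

Under this design the hard part moves from classifying websites to defining each function precisely, which is exactly why a term like E-Commerce would need a very clear definition.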

Last but not least, the prime issue that arises is that of legitimacy. In other words, how is such an initiative likely to gain the trust and/or acceptance of everyone? The most plausible (although far from perfect) answer might be the creation of a self-regulatory consortium consisting of all organisations conducting any form of interaction online, economic or otherwise. While this has its downsides (the consortium becoming lopsided in favour of the bigger players, a few organisations making rules for the rest of the world, etc.), it has one major upside: it might be the first time that a (somewhat) concrete Bill of Rights for the internet could be created and implemented through a gToS.

As a concluding remark, I would like to point out that while this proposal is insufficient (I don’t think it’s ever possible to create a “sufficient” Terms of Service, since new terms will create newer issues, and so on), I have attempted to tackle the issue from an institutional perspective. There is another side to it: the Users’. Human beings are subject to cognitive limitations, and numerous issues arise out of these even within the existing set of frameworks. To address this, ToS Commons can be extended to understand and address those limitations as well, through analysis of Users’ feedback. In a nutshell, there is no one way to simplify legalese, since its complexity can be considered a “necessary evil”. Any attempt to simplify it could end up oversimplifying the Terms of Service (as is the case with the Guardian). Therefore, a balance needs to be maintained. ToS Commons could be an interesting, and perhaps crucial, step in resolving this quandary.

*For the purposes of this article, the terms “Terms & Conditions”, “Terms of Service”, “EULAs” and “Terms of Usage” are used interchangeably. Each of them, whether used individually or in groups, acts as an umbrella term for all the aforementioned legal instruments.



Active and Passive Internet of Things


Podcast for the article (Please note that there are some differences between the podcast and the article below, although a majority of the content remains the same. Also, the article explains the models created below and summarises their assumptions and limitations. The podcast deals more with the general idea of Passive and Active Data Collection in the Internet of Things).

“The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.” – Mark Weiser

The Internet of Things is a step in this very direction. And like all things new and mysterious, it has its fair share of utopian and dystopian soothsayers, with a near certainty that neither of their deterministic predictions will come completely to fruition. What is interesting, however, is the common basis on which both viewpoints rest: an increasing reliance on data generated by machines as opposed to humans. This is where I feel there is a dire need for policy measures, even before the IoT infrastructure becomes ubiquitous.

In this regard, I believe that going forward, data needs to be divided into two categories, based on the source of generation. It needs to be noted, however, that the focus should not be on who is generating the data, but on how the data is being generated. This distinction is crucial because even in M2M communication, the root message (that is, the original data) is created by the User.

The two types of data classification are as follows –

Active Data – Active Data is the kind of data that is generated with the active consent of the User, in the sense that the User consciously generates the data. This can be thought of as akin to User Generated Content on Facebook, Twitter, LinkedIn, or any other social media. While the nitty-gritty of the Terms & Conditions of these sites can be argued (e.g. the “fine print”, the opt-in/opt-out debate, etc.), it is safe to assume that Users generate most of the content consciously while actively consenting to the T&C.

Passive Data – When it comes to the Internet of Things (or indeed, as some companies like to call it, The Internet of Everything), the increasing trend will be towards data generated by machines. However, this is not where the point of contention starts; it starts from how this data is generated. And the answer to this question is the subconscious behaviour of the Users. Allow me to explain. I am quite restless by nature and take a break from sitting in a chair every 10 – 15 minutes (imagine sitting through an entire 1-hour lecture!). Now, this is something that I do subconsciously. An ordinary chair that is not connected to the IoT would not pick up this trait of mine. However, a chair that is wired to the larger IoT infrastructure and shares my behavioural data with it can generate different insights for the third parties who are constantly monitoring my movements: Is he feeling uncomfortable? Is the ergonomics of the chair not optimal for this kind of User? The insights can be varied and at times conflicting, thereby probably leading to less than optimal results. That might be a problem.

I am not saying that the generation of subconscious behavioural data is necessarily bad. What will separate good usage from bad is the context in which the data is used (imagine having a heart attack in the middle of the street; one would agree that subconscious behavioural data collection would be extremely helpful in such a case!). Thus, what will be crucial from a policy perspective is the choice between ex-post and ex-ante evidence, and an understanding of the contexts in which one should be preferred over the other.

The larger IoT infrastructure is a ‘Complex System’ in the sense that it is likely to exhibit ‘Strong Emergence’: the development of behaviour at the system level that cannot be understood or described in terms of the component subsystems (Cave, 2011). IoT is foreseen primarily as making this world a more efficient place by reducing the reliance on human agency for the inessential and mundane aspects of day-to-day life, thereby allowing us to be more in control of the things that really matter to us. But whether such a vision will be implemented in anything close to this form will depend mainly on policies that allow us to take a step back, understand the nature of data and cross-link it with the context in which it is generated. In this regard, the ‘strong emergence’ feature of IoT might compel policymakers to contextualise policies in an ex-post rather than an ex-ante manner, with the focus being more on principles than on rules.


  1. Internet of Things and Data Collection – Active and Passive Internet of Things
  2. Internet of Things and Data Collection – Active and Passive Data under Conditions of Regulation

Model Assumptions

  1. Device_C represents those devices (or groups of devices) to which we consciously feed in data. E.g. Mobile Phones, Laptops, etc.
  2. Device_Sx (where ‘x’ is a numeric suffix) represents those devices (or groups of devices) which monitor our subconscious data. E.g. any device that is connected to the IoT infrastructure, such as a chair.
  3. Device_S1 and Device_S2 are assumed to be alternatives to each other. This means that the User can use either Device_S1 OR Device_S2, but not both.
  4. All behavioural data has been taken for the average civilian population from the website of the Bureau of Labor Statistics.
  5. The numbers on the Y-Axis of the graphs do not mean anything in themselves since the numeric data taken is largely an assumption. However, what is important to be observed is the ratio between the amount of Active and Passive Data collected.
  6. The data generated by the User and collected by the devices is in bits.
  7. For the purpose of this model, I introduce a new unit of inferred information. I call it ‘info.’. This is NOT equal to the number of bits generated. It can be thought of as the unit for the amount of inferences or insights that can be generated from the bits of data.
  8. This model is a microcosm of the entire IoT infrastructure, representing a User and a finite collection of devices with which they might interact and which might interact among themselves.
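
The assumptions above can be turned into a small calculation. The following is a rough sketch and not the model behind the graphs: the collection rates, the hours of use and the conversion from bits to ‘info.’ are all invented placeholders (they are not Bureau of Labor Statistics figures), and treating ‘info.’ as a fixed yield per bit is my simplifying assumption for illustration. The names `Device`, `daily_totals`, `bits_per_hour` and `info_per_bit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    kind: str             # "active" (Device_C) or "passive" (Device_Sx)
    bits_per_hour: float  # placeholder collection rate
    info_per_bit: float   # placeholder yield of 'info.' per bit (assumed linear)


def daily_totals(usage_hours, devices):
    """Sum bits and 'info.' per kind of data for one User over one day."""
    totals = {
        "active": {"bits": 0.0, "info": 0.0},
        "passive": {"bits": 0.0, "info": 0.0},
    }
    for dev in devices:
        bits = dev.bits_per_hour * usage_hours.get(dev.name, 0.0)
        totals[dev.kind]["bits"] += bits
        totals[dev.kind]["info"] += bits * dev.info_per_bit
    return totals


device_c = Device("Device_C", "active", bits_per_hour=1000.0, info_per_bit=0.25)
device_s1 = Device("Device_S1", "passive", bits_per_hour=2000.0, info_per_bit=0.5)

# Assumption 3: the User uses Device_S1 OR Device_S2, so only one is included.
hours = {"Device_C": 4.0, "Device_S1": 8.0}  # placeholder hours of use per day
totals = daily_totals(hours, [device_c, device_s1])

# Assumption 5: absolute values carry no meaning; only the ratio matters.
ratio = totals["passive"]["bits"] / totals["active"]["bits"]
print(ratio)  # 4.0
```

Swapping Device_S1 for a hypothetical Device_S2 with a different rate changes the ratio but not the structure of the calculation, which mirrors the either/or assumption between the two passive devices.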