Google has released a tool, to much media and legal interest, which allows the public to see what requests governments make for information about users and, in particular, what requests were made to "take down" or censor content altogether. We therefore have one of the first reliable indices of the extent of global government censorship of online content as laundered through private online intermediaries.
This, for example, is the data currently disclosed, for the last six months, for the UK:
1343 data requests
48 removal requests, for a total of 232 items
62.5% of removal requests fully or partially complied with
- Blogger
  - 1 court order to remove content
  - 1 item requested to be removed
- Video
  - 3 court orders to remove content
  - 32 items requested to be removed
- Groups
  - 1 court order to remove content
  - 1 item requested to be removed
- Web Search
  - 8 court orders to remove content
  - 144 items requested to be removed
- YouTube
  - 6 court orders to remove content
  - 29 non-court-order requests to remove content
  - 54 items requested to be removed
And this is the data for Germany:
668 data requests
124 removal requests, for a total of 1407 items
94.3% of removal requests fully or partially complied with
- Blogger
  - 8 court orders to remove content
  - 11 items requested to be removed
- Video
  - 1 court order to remove content
  - 2 items requested to be removed
- Google Suggest
  - 2 court orders to remove content
  - 3 items requested to be removed
- Web Search
  - 47 court orders to remove content
  - 1 non-court-order request to remove content
  - 1094 items requested to be removed
- Book Search
  - 2 court orders to remove content
  - 2 items requested to be removed
- YouTube
  - 17 court orders to remove content
  - 46 non-court-order requests to remove content
  - 295 items requested to be removed
And for the US:
4287 data requests
128 removal requests, for a total of 678 items
82.8% of removal requests fully or partially complied with
- AdWords
  - 1 court order to remove content
  - 1 item requested to be removed
- Blogger
  - 8 court orders to remove content
  - 45 items requested to be removed
- Geo (except Street View)
  - 2 court orders to remove content
  - 2 items requested to be removed
- Video
  - 1 court order to remove content
  - 1 item requested to be removed
- Groups
  - 7 court orders to remove content
  - 394 items requested to be removed
- Web Search
  - 30 court orders to remove content
  - 2 non-court-order requests to remove content
  - 66 items requested to be removed
- YouTube
  - 31 court orders to remove content
  - 46 non-court-order requests to remove content
  - 169 items requested to be removed
There is an enormous wealth of data here to take in. I was asked to comment on it for the BBC at a time when I had not yet had a chance to examine it in any depth, so this is an attempt to give a slightly more reflective response. Not that I'm in any way reneging on my first gut response: this is a tremendous and courageous step for Google to take, and it deserves applause. It should be a model for the field and, as Danah Boyd and others have already said on Twitter, it raises serious questions of corporate social responsibility if Facebook, the various large ISPs and other platforms do not now follow suit and provide some form of similar disclosure. If Google can do it, why not the rest?
Pangloss has some appreciation of the difficulty of this step for a service provider. Some years back I attempted a small-scale survey of notice and take down practices in the UK only, requesting data from a variety of hosts and ISPs: large and small, household names and niche enterprises, major industry players and non-profit organisations. It quickly became clear that this was an impossible task to conduct at any methodologically sound level. Though many of the managers, IT folk and sysadmins we spoke to were sympathetic to the need for public research into private, non-transparent censorship, nearly all were constrained from disclosing details by "business imperatives", or had no such details to hand in any reliable or useful format, which often came to the same thing. (Keeping such data takes time and labour: why bother when only trouble can arise from doing so? See below.)
The fact is that the prevalent industry view is that there are only negative consequences for ISPs and hosts in being transparent in this area. If they reveal that they remove content (or block it) or give data about users, they are vilified by both users and the press as censors or tools of the police state. They also worry about publicly taking on responsibility for the acts disclosed - editorial responsibility of a kind, which could involve all sorts of legal risk, including tipping off, breach of contract and libel of the authors of content removed or blocked. It is a no-win game. This is especially true around two areas: child pornography, where any attempt after notice to investigate a take down or block request may involve the host in presumptive liability for possession or distribution itself; and intercept and record requests in the UK under the Regulation of Investigatory Powers Act 2000, where (inter alia) s 19 may make it a criminal offence even to disclose that the government has asked for certain kinds of interception of communications.
Now imagine these legal risks and uncertainties, coupled with the possibility of a PR disaster and potential heavy-handed government pressure, multiplied by every legal jurisdiction for which Google has disclosed data. That gives you some idea of the act of faith being undertaken here.
Google of course have their own agendas here: they are not exactly saints. The good global PR this may accrue among the chattering (or twittering) classes will help them in their various current wars against, inter alia, the DP authorities of Europe over Google Street View, the Italian state over Google Video and the US content industry over YouTube. But it remains true, as they say, that "greater transparency will give citizens insight into these kinds of actions taken by their governments".
Criticisms
The legal risks I discuss above also partly explain some of the failings of the tool so far, some of which have already been cogently pointed out by Chris Soghoian. Notably, it is not yet granular enough, something Google themselves have acknowledged. We have numbers for data requests made (ie for information about Google users), for takedown requests, and for which services were affected (Blogger, YouTube etc). We have some idea that Google sometimes received a court order before disclosing or blocking, and sometimes didn't, but we do not know how often they gave in specifically to the latter - only that it is claimed such requests were granted solely where Google's own abuse policies were breached, eg on Blogger.
Crucially, we do not know, for the UK say, whether these requests were made under RIPA, the Communications Act s 127, more generic policing and investigation powers, or something else. Or how many related to terror material or pro-Islamic websites, and how many to scam or spam sites, illegal pharma shops or adult porn sites, say. Or even to defamation (which is apparently responsible for a high number of the requests in Germany, according to the FAQ). Defamation is an odd one here because it is a private law matter, not a criminal one, in the UK at least (some states do have criminal defamation, but it is fairly rarely tried); but it leads to court orders to remove content and disclose IDs, and Google, slightly confusingly, say they count these court orders in with the "governmental" stats. (They don't, however, include court orders for take down of copyright material, since these almost all come from private parties - and, pragmatically, would probably overwhelm the figures.)
(Another important point buried in the FAQ is that these figures don't include removals for child pornography, since Google's systems don't distinguish, they say, between requests received from government and from private parties - so, eg, all the take downs and blockings ordered by the IWF in the UK are presumably not included. This also means that the already high figures for Brazilian government requests for take down on Orkut are in reality probably a lot higher (?), since Orkut is renowned as a haven for hosting child porn.)
Splitting up requests and takedowns by type of content is critical to understanding the validity of state action, and the more data we get on this in future, the better. Once requests and removals are divided up by type (and by legitimate authority), we could also find out what percentage of take down requests in each category were acceded to, still without Google needing to disclose at the possibly dodgy level of individual requests - and also which were acceded to with or without a court order.
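To make that concrete, here is a minimal sketch (in Python, my choice, not anything Google provides) of the calculation that would become possible with per-category figures. The category names and counts below are purely hypothetical placeholders, not numbers taken from the tool.

```python
# Hypothetical illustration only: Google's tool does not yet publish removal
# requests broken down by content category or legal basis, so the category
# names and counts below are invented placeholders.
hypothetical_uk_removals = {
    # category: (requests received, requests fully or partially complied with)
    "defamation (court order)": (10, 9),
    "terror-related material (no court order)": (5, 1),
    "abuse policy breaches on Blogger": (8, 8),
}

for category, (received, complied) in hypothetical_uk_removals.items():
    rate = 100 * complied / received
    print(f"{category}: {complied}/{received} acceded to ({rate:.0f}%)")
```

The point is that this level of aggregation would still say nothing about any individual request.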
Global comparisons and free speech
Looking at the data on a global comparative basis will be a daunting but fascinating task for commentators in the future, especially as the data grows over time. It is noticeable even from just the 3 countries quoted above that it is really, really hard to make simplistic comparisons. (This is why few if any commentators yesterday were being dragged into easy condemnations and quick league-table comparisons.)
For example, the UK government made a lot of user data requests (a helluva lot if correlated to population, actually - the US has roughly five times the population of the UK but made barely more than three times as many requests; Germany is around a third bigger than the UK by population but made about 50% fewer requests). By that measure, the UK is the most interrogatory government in Europe.
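As a rough check on that per-capita claim, the arithmetic is below, using the data request counts quoted above; the population figures are approximate 2010 estimates and are my own assumption, not part of Google's disclosure.

```python
# Data request counts are those quoted above for the last six months.
# Population figures are rough 2010 estimates (my assumption).
data_requests = {"UK": 1343, "Germany": 668, "US": 4287}
population_millions = {"UK": 62, "Germany": 82, "US": 307}

for country, requests in data_requests.items():
    per_million = requests / population_millions[country]
    print(f"{country}: {requests} requests, about {per_million:.1f} per million people")
# Roughly 21.7 per million for the UK, 8.1 for Germany and 14.0 for the US,
# which is what makes the UK look the most interrogatory of the three.
```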
But Germany, by contrast, made more requests for take down of content than the UK - and got 94% of its requests accepted, compared to 62% of the UK's. What does this say about the claim to validity of the UK requests overall? Are our LEAs more willing to try it on than Germany's, or was their paperwork just more flawed? Do we try to get more take down without court orders, and Google thus tells us to bog off more often? Do we actually censor less content than Germany, or do we just fail to bundle lots of items into one efficient takedown request, asking instead in a trickle of little ones? Needs further citation, as they say.
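One crude way of probing that last "bundle versus trickle" question is to look at how many items each removal request covered on average; a sketch, using only the figures already quoted above:

```python
# Removal request and item counts quoted above (last six months).
removals = {
    "UK":      {"requests": 48,  "items": 232,  "complied_pct": 62.5},
    "Germany": {"requests": 124, "items": 1407, "complied_pct": 94.3},
}

for country, figures in removals.items():
    items_per_request = figures["items"] / figures["requests"]
    print(f"{country}: {items_per_request:.1f} items per removal request, "
          f"{figures['complied_pct']}% of requests complied with")
# Germany's requests bundle roughly 11 items each against the UK's roughly 5,
# consistent with fewer but larger takedown demands rather than a trickle.
```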
Google do interestingly say in the useful FAQ that the number of global requests for removal of speech on "pure" political grounds was "small". Of course, one country's politics is another's law. So approximately 11% of the German removal requests related to pro-Nazi content or content advocating denial of the Holocaust, both of which are illegal under German law - but which would be seen as covered by free speech in, say, the US.
Non governmental disclosure and take down requests
Finally, of course, these figures say nothing about requests for removal of content or disclosure of identities made by private bodies (except in the odd case of defamation court orders, noted above) - notably, perhaps, requests made for take down on grounds of copyright infringement. There will be a lot of these, and it would really help to know more about them. As recent stories have shown, copyright can be used to suppress free speech too, and not just by governments.
Finally finally... quis custodiet ipse Google?
...a reader on Twitter said to me: yes, it's great, but why should we believe Google's figures? He has a point. Independent audit of these figures would help. But it is difficult to know, without technical info from an insider (hello Trev!), how far this is technically possible, given the need for information capture on such a huge scale to be automated. (At least if we had the categories of requests broken down by legal justification, we could conceivably check them against any official governmental stats - so, eg, in the UK, checking RIPA requests against the official figures? - though I doubt those currently disclose enough detail, and certainly not who the requests were made against. (A. Nope! Surprise - see the 2009 Interception of Communications Commissioner's report, eg para 3.8.))
4 comments:
Very informative posting - thanks Lilian. More info though please on use of (c) to suppress info - mentioned earlier this evening on twitter. Congrats on strath job. Charles Lovatt
I'd be interested to know how many of these figures relate to material taken down as a result of orders within family court proceedings, for example material identifying children or parties involved in proceedings or discussing the detail of cases.
@Lucy. Good point. I'd like to know that too. I wasn't actually sure if courts did already try to order take down for anonymity from Google (it seems mildly futile if faced with determined disclosure) - are you aware of such orders?
Well, I've got knowledge of cases where injunctive orders have been made against the press and the world at large for removal of material posted on the internet concerning children involved in care proceedings, but no knowledge of how effective the injunction was (or indeed how it was served and upon whom other than via the press association). Also some involvement in children cases which have been the subject of inappropriate commentary on facebook - but removed by agreement early on in proceedings. Again, no verification of whether the material remains publicly available (can you even delete old status updates?).
No direct experience of orders against google or other web or social media organisations.
In practice where there is inappropriate publication of material there is usually no thought through approach to who is publishing or facilitating the publication of material connected to children act proceedings and who are the proper respondents. Usually orders are against parties, extended family or press - but in fact the world is so much more complicated now. As you say though, injunctions don't really have the kind of practical reach that is likely to be effective against social media and the internet generally - and by the time it comes to the attention of the court or the parties it is usually a bit of a futile exercise as you say. The family court system has more than enough to chew on without having to tackle this issue.
There is a lot of material 'out there' which is quite squarely in breach of the rules of court or specific injunctions, or clearly in breach of the privacy rights of one or more parties or the children in the case. However, there is a lot of tutting, a lot of shoulder shrugging, a lot of head in the sand but not much application of law.
ps not sure which profile I logged in with last time I commented familoo = lucy reed