Friday, January 27, 2006

Google Print gets a Clear Field?

A US district court has ruled in Field v Google that Google's cache
feature, which allows users to access copies of web pages made when they
were viewed or "spidered" by Google robots, does not breach copyright in
those web pages. The matter had never been decided in the US courts
before. The case was brought by author and lawyer Blake Field, who had taken
exception to Google's caching of about 50 stories he had posted on his website. He
brought an action for copyright infringement, arguing that the Google cache
feature allowed web users to access copies of his copyrighted material
without his authorisation. The court disagreed.

The court had three bases for its decision. First, if anyone was
breaching copyright when the cached copy was accessed, it was not Google but
whoever made that cached page request. Google was merely "passive in this
process". Secondly, it was shown that Field knew how to disable the caching
feature, using either the "do not archive" metatag inserted in a page's HTML
code or a robots.txt file placed on the website, either of which tells
Google's spiders not to cache the pages concerned. Field could have used that facility, but chose not to.
As such, he was personally barred from claiming copyright infringement
against Google.
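
For readers who have never seen the opt-out machinery the court relied on, here is a minimal sketch in Python of how a well-behaved robot might check a site's wishes. The robots.txt contents, the /stories/ path and the example.com URLs are hypothetical illustrations of mine, not facts from the case; the check uses Python's standard urllib.robotparser module, and the "noarchive" value is shown only as a string, since it lives in a page's HTML rather than in robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might place at the root of a website.
# It asks Googlebot to stay out of /stories/ and leaves other robots free.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /stories/

User-agent: *
Disallow:
"""

# Per-page alternative: a meta tag placed in the <head> of the HTML,
# asking search engines not to offer a cached copy of that page.
NOARCHIVE_META = '<meta name="robots" content="noarchive">'


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "http://example.com/stories/one.html"),
        ("Googlebot", "http://example.com/about.html"),
        ("OtherBot", "http://example.com/stories/one.html"),
    ]:
        print(agent, url, may_crawl(ROBOTS_TXT, agent, url))
```

Under these made-up rules the check comes back negative for Googlebot on the story pages and positive in the other two cases. The point for the court's purposes is simply that the opt-out is a couple of lines of text that the site owner controls.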

Finally, and most crucially, the use Google made of the material was fair
use, said the Court. The four tests usually applied to determine if a use
is "fair use" are:

(1) the purpose and character of the use, including whether such use is of a
commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the
work as a whole; and
(4) the effect of the use upon the potential market for or value of the
copyrighted work.

Applying the usual US jurisprudence, the court found that Google's use was fair
because, crucially, it was both transformative and socially valuable.

"Because Google serves different and socially important purposes in offering
access to copyrighted works through 'Cached' links and does not merely
supersede the objectives of the original creations, the Court concludes that
Google's alleged copying and distribution of Field's web pages containing
copyrighted works was transformative."

This means the court accepted that making copies in cache, as part of the creation of a
database for a search engine, was something very different from, say, making
copies so as to sell pirate copies to the author's potential audience.
Google were not using their cache copies for any commercial purposes which
interfered with the revenues the author would make from them or could
reasonably be anticipated to make. Nor could Google's "socially important"
purpose, to create a comprehensive freely available search database,
including historic records of altered pages, be accomplished without using
caches of the whole page rather than extracts; so the fact that the whole
rather than parts were copied was not fatal to the claim of fair use.

The court also found that Google did gain the benefit of the "safe harbor"
defence under the Digital Millennium Copyright Act, s 512(b), which
provides a defence to service providers for the "intermediate and temporary
storage of material on a system or network controlled or operated by or for
the service provider" where the storage is carried out by an "automatic
technical process". There had been doubt in the past as to whether this was
intended to cover "long term" cache storage of the sort Google use - around
14 to 20 days' storage. The court found this was indeed temporary, since a similar period of 14 days' caching had been found legitimate in Ellison v Robertson 357 F.3d 1072, 1081 (9th Cir. 2004).

As OUT-Law note, this ruling
could hardly be more helpful to Google in its ongoing Google Print dispute.
The Google Print project, just like ordinary Google caching, involves the
automated making of full copies of pages of books, scanned in as electronic text, with the intent of making a search index from them which can then deliver limited sections of the books scanned. When book publishers complained that this infringed their rights to control the making of copies, Google responded that the publishers had the ability to opt out of scanning. However, under pressure, Google reversed their practice on this and asked publishers to explicitly "opt in" to Google Print, rather than leaving the onus on them to "opt out". This of course makes the project of
potentially much lower social value, as well as leaving out "orphan works"
whose copyright holders are unknown.

A court, albeit only a District Court, now seems to have validated Google's
original "opt-out" approach. Not only that, but it has clarified that
scanning in full text, as opposed to merely extracts of texts, can be
acceptable fair use. Finally, it has apparently rebutted the damning
argument that Google Print cannot be fair use because it disrupts future
revenues, in the form of as yet uncommenced efforts by publishers to provide
or license similar revenue-generating book-scanning search engines.

Although I am in favour of Google Print as a project (what academic isn't?),
this all seems just a tad too good to be true. For example, in relation to the fair use criteria, Google can hardly claim with a straight face to make no commercial revenue out of providing either cached page links or Google Print in its full glory. Their revenue comes from AdWords, and these sell because so many million people
use Google to search - something providing Google Print can only enhance. This point was raised by Field, but brushed aside: "The fact that Google is a commercial operation is of only minor relevance in the fair use analysis."

Field's works also had little or no commercial value per se. The court
found: "There is no evidence of any market for Field's works. Field makes
the works available to the public for free in their entirety, and admits that he has
never received any compensation from selling or licensing them."

The situation was, therefore, rather different from, say, Oxford University
Press complaining about the scanning and distribution of parts of their
money-making textbooks or encyclopaedias. The court also found that:

"there is no evidence before the Court of any market for licensing search
engines the right to allow access to Web pages through "Cached" links, or
evidence that one is likely to develop."

But this is probably by now not at all true of large scale book scanning operations - it is obvious that the major publishers, stung by the Google and subsequent Yahoo! etc activity, are getting their asses in gear on this one, and that a future search-and-pay-per-view licensed market, run by each publisher or by consortia of publishers, can well be imagined.

As for the safe harbor, the court's application of the DMCA caching provision to Google
is right in technical detail but, in terms of purpose, deeply suspect.
The caching safe harbor of the DMCA (just like its equivalent in the EU, the
EC E Commerce Directive (ECD) Art 13) was intended to protect the common practice
of making highly temporary local copies of multiply-accessed web pages, to
reduce transmission times to local users making page requests, and to reduce
overall Internet congestion. The Google cache serves at least one very
different purpose: to make copies of web pages available to users for some time even when the page has moved or been removed (perhaps deliberately to avoid search). Furthermore, since Google spiders periodically return to unprotected pages to refresh the cache, the cache storage of an unaltered page can be seen as permanent, or at least as not "temporary", since it may effectively persist for a much longer period than the 14-20 day cycle cited in court. (I note with some amusement that in my first post on Google Print months ago I was already quizzical about whether Google could take advantage of the caching safe harbors.)

The court seem, indeed, to have gone further in their first finding, by
deeming Google "passive" in the process of making and transmitting a copy to
the user who makes a page request from a Google cache page link. To this
author, that sounds a lot like a finding that Google is not even actively
caching under s 512(b) but is instead a "mere conduit" (as we Europeans call
it - see EC ECD, Art 12) - or as stated under s 512(a) of the DMCA, someone
who only provides "transmission, routing, provision of connections or
storage through a system or network controlled or operated by the service
provider." If Google, albeit by automated technologies, initiate the making of cached copies for their own purposes, not for the needs of end users,
they are not, in my view, being passive "mere conduits" and it is misleading
of the court, for whatever well meant purposes, to make that analogy.

In any case, when we come to Google Print, the intentional and active nature of the
copying, even by automated means, becomes even more obvious. Furthermore,
scanned copies of books will, one assumes, be available indefinitely, so it would be
unreasonable for the caching safe harbor to apply (nor would the hosting safe harbor in either DMCA or ECD be appropriate, since while the content is supplied by a third party, the copying - and potential copyright infringement - is undertaken by Google).

So to sum up: good news for Google on fair use, and very good news indeed on
"opt out" as opposed to "opt in". Watch this space, as I keep saying. Your
humble blogger will be chairing a debate on Google Print at WWW 2006 in sunny Edinburgh - I am looking
forward to it.
