If you need a paywall, but you also need Google to love you, you have a problem
Thu 15 Oct 2015

Since I don’t speak German, I have no interest in penetrating the paywall that German digital publisher Axel Springer flung around the Bild news website this week, presumably in angry retaliation for its legal defeat in Cologne over the legitimacy of Adblock Plus two weeks earlier.
Nonetheless I am always curious as to how paywalls are actually implemented when the open nature of the internet militates against them so much, and so I occasionally experiment with paywall and anti-adblocking mechanisms to see how resilient they are. Frequently I find that they can be overcome either with the very adblocking software that they are often seeking to defeat, or with no additional software of any kind beyond the browser that is used to view them.
In the case of Bild, I took a look at the JavaScripts that were being served along with the site, but even by blocking them selectively I wasn’t able to gain access whilst keeping adblocking software active. It transpires that I had blocked the wrong scripts in the wrong order, as commenter Viking Rule noted: “I blocked core.js, and two other extra…js scripts – (use * = extra,*.js) – did the trick for me.” [My link]
The Wall Street Journal’s paywall, forbidding as it may seem, is far from a decisive block to the non-subscriber, who in fact needs absolutely no software at all to skirt around it. It’s enough to copy the headline of the article that you want to read…
…paste the headline into Google and follow the inevitable link to the article, which will now be displayed without any obfuscation:
Major digital publishers, all of whom also want good placement as sources for Google News, cannot afford to send search engine spiders to content-free pages intended to rope in new subscribers, and so if the referrer for the page request is ‘Google’, you’re escorted promptly past all the bouncers. Since this is an inconvenient way of browsing the content in a publication, the procedure can also be automated with the Referrer Control plugin for Chrome, which lets you set the referrer header on a per-site basis.
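To make the weakness concrete, here is a minimal sketch of what a referrer-based gate might look like on the server side. It is not any publisher's actual code: the allow-list, the function names and the subscriber flag are all assumptions made for illustration. The point is only that the decision rests on a header the client supplies.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; no publisher's real rules are documented here.
TRUSTED_REFERRER_HOSTS = {"google.com"}


def referrer_host(referer_header):
    """Return the lower-cased host of a Referer header value, or '' if absent."""
    if not referer_header:
        return ""
    return (urlparse(referer_header).hostname or "").lower()


def serves_full_article(referer_header, is_subscriber=False):
    """Decide whether to send the full article for this request."""
    if is_subscriber:
        return True
    host = referrer_host(referer_header)
    # Accept the trusted host itself or any of its subdomains.
    return any(
        host == trusted or host.endswith("." + trusted)
        for trusted in TRUSTED_REFERRER_HOSTS
    )
```

Because the server cannot verify where a request really came from, a check of this kind is only as trustworthy as the header it reads.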
Back in the UK, the paywall mechanism for The Spectator is based on cookies counting the number of pages you have visited. Once you’ve read five free magazine articles, clearing any cookies that the site sets will restore full articles.
You can use the preference controls in any modern web browser to find and remove cookies set by a particular domain, without need of any third-party add-ons and without the inconvenience of deleting all your cookies. Interestingly the ‘Google trick’ which works for the WSJ does not work with The Spectator, which is entirely reliant on local cookie settings for one particular browser, and ignores referrer information for this purpose.
Additionally any site which similarly uses cookies to implement a paywall can be circumvented simply by opening a private browsing window. Another ubiquitous feature of modern browsers, Private Browsing will set any new cookies necessary to use the sites visited, but will not read any that are already saved locally, nor remember any of the cookies it sets during the session. Therefore it’s enough to open a new private browsing session once the article limit is reached. If you prefer, you can ‘Save the articles for later’ via app functionality on mobile devices which takes the core text into an offline version with a referrer header that no publisher can ignore.
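A cookie-metered paywall of the kind described for The Spectator can be modelled in a few lines. This is a sketch under stated assumptions, not the site's real implementation: the cookie name and function names are invented, and only the limit of five free articles comes from the behaviour observed above. It shows why the count lives entirely in the reader's hands: the server knows nothing beyond what the browser sends back.

```python
FREE_ARTICLES = 5                 # the free limit observed above
COUNTER_COOKIE = "articles_read"  # hypothetical cookie name


def handle_article_request(cookies):
    """Return (show_full_article, cookies_to_set) for one page view.

    `cookies` is the dict of cookies the browser sent with the request.
    """
    try:
        seen = int(cookies.get(COUNTER_COOKIE, "0"))
    except ValueError:
        seen = 0  # treat a corrupted counter as a fresh visitor
    if seen >= FREE_ARTICLES:
        return False, {}
    return True, {COUNTER_COOKIE: str(seen + 1)}


def simulate_browser(page_views):
    """Replay `page_views` requests from one browser with one cookie jar."""
    jar = {}
    results = []
    for _ in range(page_views):
        allowed, to_set = handle_article_request(jar)
        jar.update(to_set)
        results.append(allowed)
    return results
```

In this model `simulate_browser(7)` allows the first five views and refuses the last two, while any request arriving with an empty cookie dict is treated as a brand-new reader.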
The Los Angeles Times paywall follows the same technique as Bild, in that it uses JavaScript to detect previous hits on the site from your browser. However that information about previous visits is not being stored in cookies, but more likely in browser session variables, HTML5 storage or even IP logging – though the latter would be tremendously expensive in terms of network traffic and would be subject to failure when multiple viewers are browsing a site from a common IP, for instance via an office network or a shared VPN IP.
It doesn’t really matter; using AdBlock’s ‘Blockable items’ list to ban select JavaScripts from the trbas.com domain restores the articles, as does rewriting two CSS rules in the site’s styling and making them permanent via local stylesheets – which is possible in all modern browsers without the need for extensions.
The New York Times plugged up a simple URL hack that let users bypass its paywall in 2013, but left many routes behind, including the aforementioned Private Browsing or ‘Incognito’ method, along with arriving at the link via Twitter or Google Search. For some time the NYCLean bookmarklet circumvented the updates, but it no longer functions.
However one currently only needs to block any scripts from http://www.nytimes.com/svc/* via any AdBlocking software in order to prevent the network back and forth that institutes the NYT’s paywall.
The future of the internet paywall
A scenic jaunt around any number of similar publishers will turn up similar easy tricks. And since many of the paywalls can be got around without so much as opening up a preference panel in a web browser, one has to wonder if any of it really constitutes ‘hacking’. Publishers seem to be relying on the increasingly non-technical nature of a new generation of readers, who have been empowered by the smartphone revolution at the expense of gaining the knowledge necessary to remain in control of their own digital destinies. And perhaps the publishers are right.
There are many online entities, most notably scientific research communities, which successfully institute paywalls simply because they only make summations of the core content available to search engines; no amount of fiddling with CSS or JavaScript can reveal material which is genuinely inaccessible without network authorisation.
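The difference is easy to sketch. In the hypothetical handler below (the `Article` type and `render_for` function are illustrative names, not any real system's API), the full body is simply never placed in the response for an unauthorised requester, so there is nothing in the page for CSS or JavaScript tinkering to uncover.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    summary: str
    body: str


def render_for(article, is_authorised):
    """Return only the fields this requester is allowed to receive."""
    payload = {"title": article.title, "summary": article.summary}
    if is_authorised:
        # The body leaves the server only after authorisation succeeds.
        payload["body"] = article.body
    return payload
```

A search engine spider and an anonymous reader both receive the title and summary; only an authorised session ever receives the body.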
Creating paywall systems based on the viewer’s IP address is a terrible risk, since some of your most important readers are going to be sitting in an office sharing an IP with other demographically ‘desirable’ viewers. In one publishing company that I worked for, an IP-based paywall implementation went badly wrong when the permitted views-per-month allowance was counted per IP and blazed through, preventing any of the journalists creating the site from actually reading it.
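The failure mode can be reproduced with a toy meter. Everything here is an assumption for illustration (the class name, the allowance of ten views, the example addresses); it is not the system that company used. It only demonstrates that a quota keyed on IP address is shared by everyone behind that address.

```python
from collections import Counter

MONTHLY_VIEWS_PER_IP = 10  # hypothetical allowance


class IpMeter:
    """Counts article views per IP address against a fixed allowance."""

    def __init__(self, limit=MONTHLY_VIEWS_PER_IP):
        self.limit = limit
        self.views = Counter()

    def allow(self, ip):
        """Record one view for `ip` and report whether it is permitted."""
        if self.views[ip] >= self.limit:
            return False
        self.views[ip] += 1
        return True
```

If five colleagues behind one office address each open two articles, the allowance of ten is spent, and the next person in the building is locked out without ever having visited the site.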
The fundamental tension in this sphere remains between Google, which is committed to an ‘open’ web where monetisation occurs via peripheral advertising, and the publishing industry, which is currently being forced to use some of the most ineffective available tools to implement magazine-style paywalls without completely disappearing from the social and internet news scene.
Everyone involved is looking at the problem from their own perspective. Apple is happy to institute content-blocking because consumers want it and Apple makes money from hardware, not from facilitating advertising; Google is happy to maintain ‘open’ news networks which make it extremely difficult for the larger publishers to retain the kind of resources which let them distinguish their journalistic efforts from the blogging world, or to restore lost print advertising revenue with ambient ad placements; and publishers are looking to restore consumer lock-in in a network environment of story-led consumers who have completely abandoned the concept.
It could be that the solution, too democratic to ever be acceptable to commercial concerns, is to create universal advertising systems that go where the traffic is – motivating large and small publishers alike to do their best work. Something like Google’s Contributor, perhaps.