The Stack Archive

Why didn’t PDF die like Flash?

Mon 7 Nov 2016

The British government’s Accessibility department has just published the results of a six-week online survey, quizzing users of assistive technology about what aspects of government publishing might need addressing. Many of the users, according to the section’s blog, find the government’s widespread use of the semi-open Adobe PDF format ‘hard to use’, asking for alternative content in HTML. The government is considering these complaints, but civil and municipal retrenchment into PDF-dependence does seem to make change unlikely…

In 2007 Greg Pisocky, Business Development Manager for Adobe Systems, said “There’s not a federal agency that does not use PDF… Acrobat software and Adobe PDF are key technologies in some capacity at all branches and levels of government, the military and virtually every agency.” At that time it was estimated that PDF represented 9.2% of all content available online.

The situation is relatively unchanged since then; in a period where HTML5 and the smartphone revolution have changed or massively affected practically every other aspect of how we interact with the digital world, the PDF, amazingly, isn’t facing a serious threat from any contender.

PDF has always been there; it pre-dates the internet* and grew up with it. It’s that older guy in the office who never retires, who watches the newcomers come and go in a quiet spirit of tenured philosophy.

It’s also the core of governmental information infrastructure (certainly in the west), despite being the digital equivalent of a Nokia 3310 in the smartphone age.

Ironically, this aged, print-centric, tree-consuming format is pretty much the only way you can read the latest EU and British government reports about how ‘digital’ is transforming society, and how we must embrace new technologies – and conserve the environment.

Strangely similar histories

The Portable Document Format dates back to the early 1990s, its PostScript foundations to the 1980s, and their roots to the mid-1970s; so it may be difficult for anyone who has grown up with PDF as an online resource (which is to say, a large share of internet users) to understand why the format has such a Teflon reputation, when it shares so many of the same characteristics as the much-maligned – and very much dying – Flash format, which Adobe acquired along with Macromedia in 2005.

To wit: the full-fat Adobe Reader browser plugin (which, like Flash, is intended to leverage the ‘lightness’ of the vector format) is a deceptively large install, a notorious CPU hog, and likewise sits grafted onto the browser interface in a sandbox which has been a popular target of hackers for decades.

Furthermore the two plugins, both originally intended to provide rich media functionality at a time when there were few alternatives, are implemented natively in Chrome (with some pending limitations for Flash), in low-fat versions that trim off all those bells and whistles that Adobe added over the years in the hope that either Flash or PDF might become a ‘micro-internet’ in their own right.

Similarly, most of the feats that PDF or Flash are asked to perform can now be handled by more recent technologies such as CSS3 and HTML5 functionality, at a fraction of the operational overhead.

Yet Adobe’s own gradual mothballing of the proprietary Flash plugin and its closed binary output format has not been mirrored with PDF.

‘Digital’ as in ‘watch’

In 2001 the PDF format seemed to have reached a perfect digital apotheosis when Apple’s own PDF-centric Quartz rendering engine became an integral component in its new OS X operating system. After OS X the PDF became a ubiquitous resource with Apple desktop products, and the OS can still output the format natively from any printing dialogue (except, ironically, Adobe’s CS suite, which hijacks the print dialogue and provides its own, more complicated PDF generation routines).

A year later the Nielsen Norman consulting group concluded that ‘Forcing users to browse PDF documents makes your website’s usability about 300% worse relative to HTML pages’, advising that while PDF solved many problems in printing documents consistently, it had no other worth in an online context.

With the maturing of open web standards and HTML5, heavy criticism of Flash – based on its opacity, weight, proprietary format and security issues – seems to have made Adobe nervous enough about the new ‘open’ trend to release PDF as an open format to ISO in 2008. The apparent hope was that wider adoption through third-party software and workflows would re-cement PDF’s place in the document ecosystem in the face of lighter upcoming XML-based formats, as well as other competing formats capable of providing PDF’s most desirable features in a less ‘1980s’ manner.

(Nonetheless, Adobe kept several components of PDF reserved for itself, including the Adobe XML Forms Architecture (XFA) and Acrobat’s JavaScript extension.)

Yet despite this late surge of communal concern, the PDF, designed from the ground up as a print format back when ‘digital’ meant ‘watch’, is as locked and closed, for the end user, as any document format could possibly be – against the trend of the time, and in a world which prints less than before, and should probably be printing even less than that.

Your grandfather’s document format

Yes, PDF really is old enough to be your grandfather’s document format. If it weren’t for its cryptographic capabilities, and the fact that you can lock a PDF with a password, and sign it digitally (thus making PDF a portable analogue of the inelastic email format), it is hard to believe that it would have survived this far into the 21st century.

My own complaints about PDF crop up most when browsing scientific documents, and also white papers and long, graphically ornate reports.

In the first case: scientists love big margins, which makes reading their PDFs inconvenient on the desktop and practically impossible on a smartphone, since text reflow is almost never selected when the document is created, to save a few kilobytes at export. This means that you couldn’t reformat the page even if you could change the margins since each sentence would break away from the one next to it.

It also means, should I need a hard copy, that I have to print out twenty pages of margin for 7-8 pages of text.

In the latter case, those highly elaborate and over-designed corporate reports and white papers, with their vast expanses of colour and rich blacks – I dare not print them, lest the ink cartridge give out before we even get past the index pages.

These elaborate PDFs either use bitmaps (which makes the document absurdly large) or vector (which often causes enough screen draw delay to boil an egg by, making navigation problematic); but either way, they induce that ‘PowerPoint glaze’ in the user, and make content difficult to copy out or even select; and since there’s no stylesheet, no division between content and appearance, there’s no way of simplifying the information. It’s all locked.

Ironically, Adobe’s own native PDF processing software is called ‘Distiller’.

Bad form

You’re thinking forms, right? PDF forms, where the format finally abandons its chronically locked state and (sometimes) lets the user insert and save information, without the need to print. Forms must be why governments (which are known to have a fondness for forms) are keeping PDF in the document chain.

Not really. Despite the UK government’s adoption of PDF as an ‘open’ viewing format, it recommends the rather lighter OpenDocument Format for actual collaboration. To my recollection, I have not yet encountered an ODF file in years of scanning PDF-stuffed government RSS feeds. If you want to know more about the OpenDocument Format, you can download a document about the latest specification from the Organization for the Advancement of Structured Information Standards (OASIS).

It’s a PDF document.

The 800lb document format

I admit I am not going to win this one. One can argue whether PDF is ‘good enough’ or ‘fit for purpose’, but no-one can dispute that it’s ‘what there is’ – a zero-thought, outmoded default that will persist in the face of better-suited, newer formats – because of the millions it would cost to re-tool ancient PDF workflows across multiple private and government institutions, combined with trepidation about whether Google would index the new format or not.

Nothing will change – the Library of Congress is married to PDF as an archival format, and the UK is conflicted, but apparently also resigned to it.

The format attempts to update; it grafts on features to its bloated specification (in fact there are now fifteen PDF versions and sub-versions, and counting), such as ‘tagging’, which was introduced with PDF 1.4 in 2001 in an attempt to address the kind of accessibility issues just now highlighted in the UK report. But using these features takes effort, and often they are ignored for logistical reasons.

Flash also grafted accessibility features onto its opaque binary format in the years before its critical decline, and nobody used them either.

A great deal of the problem with current PDF usage is the lack of will, time and money to leverage even these grafted-on, ‘afterthought’ features: a substantial proportion of government-supplied documents are merely scans of snail-mail, without an OCR component; just TIFFs wrapped in an A4/ANSI A container – as if to ensure that the image never exits the print world. And in terms of accessibility, an absolute dead end.
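For the curious, here is a rough sketch of how one might spot those two failure modes by machine: a scan with no text layer, and a file that never declares itself tagged. It is a heuristic only, written in Python for illustration. The function names are my own invention, and it simply searches the raw bytes for the standard PDF markers `/Subtype /Image`, `/Font` and `/MarkInfo ... /Marked true`. Files that keep those dictionaries inside compressed object streams would defeat it, so a real audit would need a proper PDF parser.

```python
import re


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Heuristic: the file embeds image XObjects but no font resources.

    Assumption: the relevant dictionaries are stored uncompressed, so the
    markers are visible in the raw bytes. This is not true of every PDF.
    """
    has_image = re.search(rb"/Subtype\s*/Image", pdf_bytes) is not None
    has_font = re.search(rb"/Font\b", pdf_bytes) is not None
    return has_image and not has_font


def claims_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic: a /MarkInfo dictionary with /Marked true is visible.

    A file can carry this flag and still be badly tagged; the check only
    shows whether the author bothered to declare tagging at all.
    """
    pattern = rb"/MarkInfo\s*<<[^>]*/Marked\s+true"
    return re.search(pattern, pdf_bytes) is not None
```

Reading a file with `open(path, "rb").read()` and passing the result to both functions gives a quick first pass over a folder of documents; anything flagged as image-only is a candidate for OCR before it is published.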

But the PDF ecosystem is no withering vine, and so long as it allows publishers to embed fonts and keep the end-user a mere spectator; and so long as PDF sits in high-end customised information systems which are far too expensive to retool, atop an ever-growing archive mountain which will likely never be updated to a more modern format, granddad seems set for an extraordinarily long life.


* Inasmuch as PDF is a wrapper for PostScript.
