RAISR: Is Google’s AI-driven image resizing algorithm ‘dishonest’?
Tue 15 Nov 2016
Google has released the fruits of new research into upscaling low-resolution images using machine learning to ‘fill in’ the missing details. Compared to the hoary standards Photoshop users have been used to for over twenty years, the results are quite impressive. But in a climate where the camera lies a lot more than it used to, how ‘real’ is the result?
Opinion
It would be nice for us fans of Charles Dickens if some scientist could put The Mystery Of Edwin Drood into a novel chemical solution and ‘grow’ the missing second half of the book. However, we have to resign ourselves to the fact that the great man died during the writing of it, and that his substantial plans for its conclusion died with him. The data is irretrievable, and if we want to amuse ourselves with guessing, that’s fun; but it isn’t Dickens.
For some reason, however, there’s a general public consciousness that deep within the resolution of a photo is some greater resolution that we haven’t been smart enough to find yet. It’s a misconception that has been promulgated by Hollywood for at least fifty years – including the 240p CCTV footage that CSI operatives can magically enhance to DNA level. At least in Blade Runner, such granular zooming is presented in a science-fiction context.
Those of us who have been using Photoshop for decades and have had to resize images upwards have ended up Googling the five options the program presents us with during a resize; namely Nearest Neighbor, Bilinear, Bicubic, Bicubic Smoother, and Bicubic Sharper (the latter two being more recent innovations of the CS suite). Intuition tells us that if any of them were really any good, there wouldn’t even be five options – the program would just default to the one that actually works. But none of them really do, because you can’t get a quart out of a pint pot.
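To make the pint-pot point concrete, here is a minimal sketch of the two simplest methods on that list, written from their textbook definitions. It is not Photoshop's code: the function names, the 2x2 test image and the 'align corners' sampling convention are all assumptions made for illustration.

```python
def upscale_nearest(img, factor):
    """Nearest neighbour: repeat each source pixel factor x factor times."""
    h, w = len(img), len(img[0])
    return [[img[y // factor][x // factor] for x in range(w * factor)]
            for y in range(h * factor)]


def upscale_bilinear(img, factor):
    """Bilinear: each output pixel is a weighted average of up to four
    neighbouring source pixels (corner pixels map to corner pixels)."""
    h, w = len(img), len(img[0])
    out_h, out_w = h * factor, w * factor
    out = []
    for y in range(out_h):
        # Map the output row back to a fractional source row.
        sy = y * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            sx = x * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# Hypothetical 2x2 greyscale image, upscaled to 4x4 both ways.
small = [[0, 100], [100, 200]]
nearest = upscale_nearest(small, 2)
bilinear = upscale_bilinear(small, 2)

for row in bilinear:
    print([round(v, 1) for v in row])
```

Nearest Neighbor only copies pixels, and bilinear only blends them, so every value in either result lies between the darkest and brightest pixels that were already in the source. The output is bigger, but nothing in it was not already implied by the original four numbers.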
Google believes otherwise, although it’s not easy to see how the new algorithm from Google Research could be integrated as a sixth option in Photoshop. RAISR (Rapid and Accurate Image Super Resolution) uses machine learning to develop ‘routes’ from low- to higher-resolution versions of an originally small image, based on sampling the differences between smaller and (genuinely) higher-resolution versions of the images in a training set.
‘The suggested algorithm requires a relatively small set of training images to produce a low-complexity mechanism for increasing the resolution of any arbitrary image not seen before. The core idea behind RAISR is to learn a mapping from LR images to their HR versions. The mapping is done by filtering a “cheap” upscaled version of the LR image with a set of filters, which are designed to minimize the Euclidean distance between the input and ground-truth images.’
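The idea in that passage can be sketched in a few lines, under heavy simplifying assumptions: one-dimensional 'images', a single three-tap filter rather than a set of filters, and three made-up training signals. Every name below is hypothetical and none of it is Google's code; it only shows what 'learn a filter that minimizes the Euclidean distance to the ground truth' means in practice.

```python
def cheap_upscale(lr):
    """The 'cheap' step: double a 1-D signal by linear interpolation."""
    out = []
    for i, v in enumerate(lr):
        nxt = lr[min(i + 1, len(lr) - 1)]
        out.extend([v, (v + nxt) / 2])
    return out


def patches(signal, taps=3):
    """One sliding window per sample, replicating the edge values."""
    r, n = taps // 2, len(signal)
    return [[signal[min(max(i + k, 0), n - 1)] for k in range(-r, r + 1)]
            for i in range(n)]


def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def learn_filter(pairs, taps=3):
    """Least squares: pick the filter h that minimises the summed squared
    difference between filtered cheap upscales and the true HR samples."""
    ata = [[0.0] * taps for _ in range(taps)]
    atb = [0.0] * taps
    for lr, hr in pairs:
        for patch, target in zip(patches(cheap_upscale(lr), taps), hr):
            for i in range(taps):
                atb[i] += patch[i] * target
                for j in range(taps):
                    ata[i][j] += patch[i] * patch[j]
    return solve(ata, atb)


def apply_filter(signal, h):
    return [sum(p * w for p, w in zip(patch, h))
            for patch in patches(signal, len(h))]


def sq_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up 'ground truth' signals; the LR version keeps every second sample.
hr_set = [[0, 0, 0, 0, 10, 10, 10, 10],
          [0, 1, 4, 9, 16, 25, 36, 49],
          [5, 3, 8, 1, 9, 2, 7, 4]]
pairs = [(hr[::2], hr) for hr in hr_set]

h = learn_filter(pairs)
err_cheap = sum(sq_error(cheap_upscale(lr), hr) for lr, hr in pairs)
err_learned = sum(sq_error(apply_filter(cheap_upscale(lr), h), hr)
                  for lr, hr in pairs)

# Upscaling a signal the filter has never seen: the result is shaped by
# what the training set looked like, not by anything recovered from the input.
unseen_lr = [2, 6, 6, 1]
guess = apply_filter(cheap_upscale(unseen_lr), h)
print(h, err_cheap, err_learned, guess)
```

On the training signals the learned filter cannot do worse than the cheap upscale alone, since 'leave the pixel as it is' is one of the filters it could have chosen. On the unseen signal there is no ground truth to check against, which is exactly the situation a real low-res photo is in.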
In addition to the RAISR paper, Google has released supplementary examples of the differences between original low-res photos, and upscaling results achieved by conventional algorithms and by RAISR’s machine-informed approach:
Results vary across the kinds and qualities of initial samples but actually seem most impressive in resolving indistinct text, as in the above example. Some of the source images are not discrete photos, but small sections of high-resolution originals, and here the apparent improvement is less marked:
What is notable across all the examples is how none of them feature human faces, surely the prime subject of photography. But this example of the ‘reconstruction’ of a lower-res cat image indicates the level of interference and inference that could make AI-driven photo reconstruction controversial:
It seems here that RAISR has added fur detail – and it invites speculation as to what the algorithm would consider an accurate reconstruction of a person’s skin or other significant details.
AI-driven upscaling results might offend an actor’s vanity, but perhaps the most worrying possibility for image reconstruction is in cases where even broad detail is indistinct in the original, but could potentially be ‘interpreted’ – almost as an act of invention:
Forensic image enhancement is an established field, but it sits uneasily alongside a budget-driven reluctance to replace low-res CCTV infrastructure with the higher-resolution capture that is acknowledged to be necessary, despite the potential expenditure on cameras, streaming networks and storage. AI-driven enhancement risks being used as a compromise between technological progress and budgetary constraints.
Courts place significant trust in the testimony of digital imaging technicians who have worked up details from low-resolution imagery; faith in the fidelity of these ‘enhanced’ images routinely convicts defendants. Google Research is taking a questionable stance by continuing to propagate the idea that images contain some kind of abstract ‘DNA’, and that there might be some reliable photographic equivalent of polymerase chain reaction which could find deeper truth in low-res images than either the money spent on the equipment or the age of the equipment will allow. If we want 4K conclusions, we need full-frame 4K sources, not segments thereof.