The Stack Archive

Science goes in search of your lost files

Wed 17 Feb 2016

A Chinese scientist has addressed a problem which has vexed UI developers and end-users alike ever since domestic computers became capacious enough to store a large number of items – how to find a file that you haven’t seen for a long time, but now need.

A Ranking Algorithm for Re-finding [PDF], by Gangli Liu of the Department of Computer Science at Tsinghua University, posits a wizard-based interrogation routine wherein the user answers a number of key questions that may help the algorithm to narrow down the possibilities.

The first set of criteria applied in these questions adheres to more traditional OS-based methods of finding local files, such as metadata-related queries about the author, path, keywords or file-name – pretty much the digital equivalent of being asked ‘Well, where did you last see it?’. However, since such specific data is likely to identify the file, it’s worth a go if the user’s memory can be jogged on any of those aspects simply by being asked.

The second approach is more original and speculative, asking the user if they can remember when they last saw the file, whether they printed it, whether they discussed it with anyone else, whether the file (assuming a document) contains images, and where it came from originally, amongst other queries.
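To make the idea concrete, here is a minimal sketch in Python of how answers to such questions might be used to rank candidate files. It is not the algorithm from Liu's paper: the `FileRecord` fields, the `score` and `rank` functions and the one-point-per-match scoring are all assumptions made for illustration, and a question the user cannot answer is simply skipped.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical log entry for one file; the field names are illustrative."""
    path: str
    author: str = ""
    printed: bool = False
    has_images: bool = False
    keywords: set = field(default_factory=set)


def score(record, answers):
    """Count how many of the user's recalled facts agree with the record."""
    points = 0
    for key, recalled in answers.items():
        if recalled is None:
            # The user could not remember this detail, so it tells us nothing.
            continue
        actual = getattr(record, key)
        if key == "keywords":
            # One point for each recalled keyword the file really has.
            points += len(actual & recalled)
        elif actual == recalled:
            points += 1
    return points


def rank(records, answers):
    """Return the records ordered from best match to worst."""
    return sorted(records, key=lambda r: score(r, answers), reverse=True)


if __name__ == "__main__":
    records = [
        FileRecord("a.doc", author="Li", printed=True),
        FileRecord("b.doc", author="Wang", has_images=True, keywords={"budget"}),
    ]
    answers = {
        "author": None,  # not remembered
        "printed": False,
        "has_images": True,
        "keywords": {"budget", "2015"},
    }
    for record in rank(records, answers):
        print(record.path, score(record, answers))
```

Even a crude tally like this shows why vague recollections are worth asking about: no single answer identifies the file, but several weak matches together can push it towards the top of the list.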

A full use of the deeper and more abstruse investigation method would involve the gathering of metadata which is not currently available as standard, such as understanding what rough percentage of a document a user has ever read, and also currently unavailable connections between already extant metadata, such as a link between a file’s author and the same person in the user’s contacts.

Windows, OS X and Linux all provide continuously updating indexing services, of varying kinds, to which additional parameters could be added, though possibly at some compute cost, depending on the nature of the facets to be monitored and logged.

Searching a completely unindexed file-system is one of the most intensive tasks a device can undertake, since every file found needs to be evaluated for content and in some way read by the computer before it can be eliminated or identified as a positive search result. Large source-based (non-binary) installations, usually of open source software, can add thousands or even hundreds of thousands of files to the local count if the enquiry needs to be opened up beyond the ‘documents’ folder and other familiar and likely locations for the missing file.
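The cost is easy to see in a rough Python sketch of an unindexed content search, which has no choice but to open and read every file beneath the starting folder. The function name `brute_force_search` is hypothetical, and reading each file whole into memory is a simplification that a real tool would avoid.

```python
import os


def brute_force_search(root, needle):
    """Yield paths under root whose raw bytes contain needle.

    With no index to consult, every file has to be opened and read
    before it can be ruled in or out.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if needle in handle.read():
                        yield path
            except OSError:
                # Unreadable files (permissions, broken links) are skipped.
                continue


if __name__ == "__main__":
    # Search the current folder for files containing the word "invoice".
    for hit in brute_force_search(".", b"invoice"):
        print(hit)
```

The work grows with the total number and size of files under `root`, which is why widening the search from a documents folder to a whole disk is so expensive.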

