IBM’s Cognitive Storage decides the value – and the destination – of data
Tue 5 Apr 2016
New research published today in the Institute of Electrical and Electronics Engineers’ journal Computer outlines a new project from IBM called Cognitive Storage. The project’s ambit is to determine how important data is, and what priority it should be given, in terms of secure accessibility in data centres and other computing environments.
It’s an important challenge in data storage, since low-latency, highly-responsive and secure data streams utilise the most expensive computing resources, in terms of hardware, working hours and actual energy expenditure. Thus IBM’s initiative, led by Zurich-based IBM researcher Giovanni Cherubini, seeks to teach computers the difference between ‘memories’ and ‘information’:
‘[If] 1,000 employees are accessing the same files every day, the value of that data set should be very high, just like a priceless Van Gogh. A cognitive storage system would learn this and store those files on fast media like flash. In addition, the system would automatically back up these files multiple times. Lastly, the files may want to have extra security so they cannot be accessed without authorization.’
IBM researchers tested the prototype system on 177 million files belonging to seven users, assigning each file a data rank of 1, 2 or 3 based on metadata that includes group ID, user ID, creation date/time, file extension and permissions, file size and directory location. The paper claims a data value assignment accuracy of almost 100% for the smaller class set in the trials.
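To make the idea of ranking files from metadata concrete, here is a minimal sketch in Python. It is not IBM's method: the article does not say which learning technique the prototype uses, so a small categorical naive Bayes classifier stands in for it. The `FileMeta` record, the `features` helper, the `NaiveBayesRanker` class and the sample paths are all hypothetical, and creation date/time is left out for brevity.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical record for some of the metadata fields the article lists."""
    group_id: int
    user_id: int
    extension: str
    permissions: str
    size_bytes: int
    directory: str


def features(meta):
    """Turn metadata into categorical features; size is bucketed by digit count."""
    size_bucket = len(str(max(meta.size_bytes, 1)))
    return (
        ("group", meta.group_id),
        ("user", meta.user_id),
        ("ext", meta.extension.lower()),
        ("perm", meta.permissions),
        ("size", size_bucket),
        ("dir", meta.directory),
    )


class NaiveBayesRanker:
    """Stand-in classifier (an assumption, not the method from the paper)."""

    def __init__(self):
        self.class_counts = Counter()               # rank -> number of files
        self.feature_counts = defaultdict(Counter)  # rank -> feature -> count

    def fit(self, labelled):
        """Learn from (FileMeta, rank) pairs where rank is 1, 2 or 3."""
        for meta, rank in labelled:
            self.class_counts[rank] += 1
            for feature in features(meta):
                self.feature_counts[rank][feature] += 1
        return self

    def predict(self, meta):
        """Return the rank with the highest smoothed log-score."""
        total = sum(self.class_counts.values())
        best_rank, best_score = None, -math.inf
        for rank, n in self.class_counts.items():
            score = math.log(n / total)
            for feature in features(meta):
                seen = self.feature_counts[rank][feature]
                # Add-one smoothing so an unseen value never zeroes out a rank.
                score += math.log((seen + 1) / (n + 2))
            if score > best_score:
                best_rank, best_score = rank, score
        return best_rank


training = [
    (FileMeta(100, 1, ".fits", "rw-r-----", 5_000_000, "/projects/survey"), 1),
    (FileMeta(100, 4, ".fits", "rw-r-----", 6_000_000, "/projects/survey"), 1),
    (FileMeta(100, 3, ".log", "rw-r--r--", 80_000, "/projects/survey/logs"), 2),
    (FileMeta(200, 2, ".tmp", "rw-rw-rw-", 1_200, "/scratch"), 3),
]
ranker = NaiveBayesRanker().fit(training)
new_file = FileMeta(100, 1, ".fits", "rw-r-----", 7_000_000, "/projects/survey")
predicted_rank = ranker.predict(new_file)
```

With this toy training set, `new_file` shares its group, extension, permissions and directory with the rank-1 examples, so it is scored as rank 1. A real system would need far more training data and a held-out set to measure the kind of accuracy the paper reports.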
Though Cognitive Storage’s methods and algorithms amount, in part, to a ‘popularity contest’ for data, the system also needs to account for data governance and regulation, which may mandate how apparently unimportant (even never-accessed) data must be treated. The researchers cite the example of tax records, which may rarely be read but can be deleted to free up resources only after seven years.
However, safeguards are necessary to prevent infrequently accessed material from being excessively de-prioritised, and Cognitive Storage provides user metatags to ‘stick’ apparently low-rated information into higher-level access streams.
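The interplay between a learned rank, a retention rule and a user 'sticky' tag can be sketched as a small placement function. Everything here is an illustrative assumption rather than IBM's design: the article does not say which rank is the most valuable (rank 1 is assumed to be), and the tier names, backup counts, `Placement` record and `place` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical mappings; rank 1 is assumed to be the most valuable.
TIER_BY_RANK = {1: "flash", 2: "disk", 3: "tape"}
BACKUPS_BY_RANK = {1: 3, 2: 2, 3: 1}


@dataclass(frozen=True)
class Placement:
    tier: str
    backup_copies: int
    may_delete: bool


def place(
    learned_rank: int,
    pinned_rank: Optional[int] = None,
    retain_until: Optional[date] = None,
    today: Optional[date] = None,
) -> Placement:
    """Combine the learned rank with a user pin and a retention deadline."""
    today = today or date.today()
    # A user tag can only raise priority (a lower number), never lower it.
    rank = learned_rank if pinned_rank is None else min(learned_rank, pinned_rank)
    # Regulation overrides popularity: no deletion before the deadline.
    under_retention = retain_until is not None and today < retain_until
    return Placement(
        tier=TIER_BY_RANK[rank],
        backup_copies=BACKUPS_BY_RANK[rank],
        may_delete=(rank == 3 and not under_retention),
    )


# A never-accessed tax record, held until a hypothetical seven-year deadline.
tax_record = place(3, retain_until=date(2023, 4, 5), today=date(2020, 1, 1))
# The same low learned rank, but pinned by a user to the top tier.
pinned = place(3, pinned_rank=1)
```

In this sketch the tax record sits on the cheapest tier but cannot be deleted until the deadline passes, while the pinned file is moved to flash regardless of how rarely it is accessed.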
The system, not currently a commercial product, was inspired by IBM’s work with Astron on DOME, a research project for the Square Kilometre Array (SKA) radio telescope. The project draws in data at a rate of a petabyte a day, and has a (fairly literal) signal-to-noise ratio problem which requires intense analysis in order to assign data value and extract workable result-sets.