White House summons AI community to mine dataset of 29,000 coronavirus research papers
Written by James Orme Tue 17 Mar 2020

Microsoft, NLM, CZI, Allen Institute for AI, and Georgetown University assemble mammoth collection at the request of US government
A dataset of over 29,000 scientific articles related to the coronavirus family has been shared publicly to help the scientific and medical community better understand Covid-19 and its related viruses.
The machine-readable collection was collated so AI technologies, specifically text and data mining tools, could digest the scientific literature for insights on how Covid-19 can be tackled.
The resource was requested by the White House Office of Science and Technology Policy, which described the CORD-19 (COVID-19 Open Research Dataset) as the “most extensive machine-readable Coronavirus literature collection available for data and text mining to date”.
Georgetown University coordinated the collaborative effort with Microsoft, the National Library of Medicine (NLM), the Chan Zuckerberg Initiative (CZI) and the Allen Institute for AI.
Microsoft’s academic curation tools identified and assembled scientific research and results from around the world, CZI provided access to pre-publication papers, and NLM did the same for its extensive academic archive. Once the resource was brought together, the Allen Institute for AI transformed the content into a machine-readable form.
The CORD-19 resource can be found on the Allen Institute’s Semantic Scholar website (SemanticScholar.org) and will be updated as new research is published.
The White House Office of Science and Technology Policy called on AI experts to create natural language processing tools that can scour the dataset for insights. Their efforts will be hosted on Google Cloud’s Kaggle, a machine learning and data science platform that makes AI tools available to more than 4 million data scientists worldwide.
“It’s difficult for people to manually go through more than 20,000 articles and synthesize their findings,” said Anthony Goldbloom, Co-Founder and CEO at Kaggle.
“Recent advances in technology can be helpful here. We’re putting machine-readable versions of these articles in front of our community of more than 4 million data scientists. Our hope is that AI can be used to help find answers to a key set of questions about COVID-19,” he added.
Michael Kratsios, US Chief Technology Officer at the White House, thanked the organisations involved and called on the US research community to “put artificial intelligence technologies to work”.
“Decisive action from America’s science and technology enterprise is critical to prevent, detect, treat, and develop solutions to COVID-19,” he said.
Dr. Eric Horvitz, Chief Scientific Officer at Microsoft, said it was important that companies, governments and scientists come together to help fight the pandemic.
“The COVID-19 literature resource and challenge will stimulate efforts that can accelerate the path to solutions on COVID-19,” he said.