Microsoft AI researchers accidentally leak terabytes of sensitive data on GitHub
Written by Rebecca Uffindell Thu 21 Sep 2023

Microsoft AI researchers have accidentally leaked tens of terabytes of sensitive data. The incident occurred while the team was publishing a bucket of open-source training data on GitHub.
Discovered by cloud security startup Wiz, the leaked data comprised 38 terabytes of sensitive information, including personal backups from two Microsoft employees’ computers, service passwords, secret keys, and over 30,000 internal Microsoft Teams messages from hundreds of employees.
“AI unlocks huge potential for tech companies… However, as data scientists and engineers race to bring new AI solutions to production, the massive amounts of data they handle require additional security checks and safeguards,” Ami Luttwak, Wiz co-founder and CTO, told TechCrunch.
Wiz said it discovered a GitHub repository linked to Microsoft’s AI research unit while investigating accidental exposure of cloud-hosted data.
Those who accessed the GitHub repository were directed to download the models from an Azure Storage URL. Wiz found the URL unintentionally granted access to the entire storage account. This resulted in the accidental exposure of additional data.
The URL had been inadvertently exposing this data since 2020. It was configured with incorrect permissions, granting ‘full control’ access instead of ‘read-only’ access. This error meant anyone with the URL could not only view the data but also potentially manipulate or overwrite it.
Wiz stated the Azure storage account was not directly exposed. Instead, Microsoft AI developers included a shared access signature (SAS) token in the URL, which was overly permissive. SAS tokens in Azure enable the creation of shareable links for accessing data in Azure Storage accounts.
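Conceptually, a SAS token encodes its claims (permissions, expiry) in the clear alongside an HMAC-SHA256 signature computed with the storage account key, which the server recomputes to verify the token. The sketch below is a simplified illustration of that signing scheme using only the Python standard library; it is not the real Azure string-to-sign format (which covers resource paths, protocol, API version, and more), and in practice tokens would be minted with the Azure SDK rather than by hand. All function and variable names here are illustrative.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

def make_sas_token(account_key_b64: str, permissions: str, expiry: datetime) -> str:
    """Sign a simplified string-to-sign; real Azure SAS covers many more fields."""
    string_to_sign = f"{permissions}\n{expiry.isoformat()}"
    key = base64.b64decode(account_key_b64)
    sig = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(sig).decode("utf-8")
    # The token carries its claims in the clear plus the signature; the server
    # recomputes the HMAC to check that the claims were not tampered with.
    return f"sp={permissions}&se={expiry.isoformat()}&sig={signature}"

def verify_sas_token(account_key_b64: str, token: str) -> bool:
    """Recompute the signature from the token's claims and check the expiry."""
    fields = dict(part.split("=", 1) for part in token.split("&"))
    expected = make_sas_token(
        account_key_b64, fields["sp"], datetime.fromisoformat(fields["se"])
    )
    unexpired = datetime.fromisoformat(fields["se"]) > datetime.now(timezone.utc)
    return hmac.compare_digest(expected, token) and unexpired

# Mint a read-only ("r"), one-hour token -- the narrow scope Wiz recommends,
# in contrast to the long-lived, fully permissive token in the leaked URL.
key = base64.b64encode(b"demo-account-key").decode()
token = make_sas_token(key, "r", datetime.now(timezone.utc) + timedelta(hours=1))
print(verify_sas_token(key, token))  # True while unexpired and unmodified
```

Because the claims are covered by the signature, an attacker cannot quietly upgrade a read-only token to write access: changing `sp=r` to `sp=rw` invalidates the HMAC. The leaked token's problem was the opposite one — it was *issued* with full-control permissions and a distant expiry, so no tampering was needed.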
“With many development teams needing to manipulate massive amounts of data, share it with their peers or collaborate on public open source projects, cases like Microsoft’s are increasingly hard to monitor and avoid,” said Luttwak.
How did Microsoft respond to reports of leaked data?
Wiz shared its findings with Microsoft on June 22, and Microsoft promptly revoked the SAS token on June 24. Microsoft concluded its investigation into potential organisational impact by August 16.
The Microsoft Security Response Center stated that this issue did not expose any customer data or put any other internal services at risk.
In response to Wiz’s research, Microsoft expanded GitHub’s secret scanning service to flag SAS tokens with excessively permissive expirations or privileges.
This enhancement complements the ongoing monitoring of public open source code for potential plaintext exposure of credentials and secrets.
Last year, Microsoft found a security flaw in ChromeOS that allowed hackers to remotely trigger a denial-of-service (DoS) or achieve remote code execution (RCE) by interfering with audio metadata.
In 2020, researchers at Check Point identified two major security flaws in Azure that allowed hackers to access sensitive data on on-premises machines running Azure, and to take over Azure servers in the cloud.