Thirteen genetic sequences – isolated from people infected with COVID-19 in the early days of the epidemic in China – were mysteriously deleted from an online database last year but have now been recovered.
Jesse Bloom, a computational biologist and specialist in viral evolution at the Fred Hutchinson Cancer Research Center, found that the sequences were removed from an online database at the request of scientists in Wuhan, China. But with some internet sleuthing, he was able to recover copies of the data stored on Google Cloud.
The sequences do not fundamentally change scientists’ understanding of the origins of COVID-19 – including the fraught question of whether the coronavirus spread naturally from animals to humans or escaped in a lab accident. But their deletion adds to concerns that secrecy from the Chinese government has hampered international efforts to understand how COVID-19 emerged.
Bloom’s results were published in a preprint paper, not yet reviewed by other scientists, that was released Tuesday. He told BuzzFeed News that he thinks the deletion is consistent with an attempt to hide the sequences.
Bloom learned about the deleted data after reading a paper from a team led by Carlos Farkas at the University of Manitoba in Canada on some of the earliest genetic sequences of SARS-CoV-2. Farkas’ paper described sequences taken from hospital outpatient clinics in a project by researchers in Wuhan who were developing diagnostic tests for the virus. But when Bloom tried to download the sequences from the Sequence Read Archive (SRA), an online database operated by the US National Institutes of Health, he was given error messages showing that they had been removed.
Bloom realized that copies of SRA data were also kept on servers operated by Google, and was able to puzzle out the URLs where the missing sequences could be found in the cloud. In this way, he recovered 13 genetic sequences that may help answer questions about how the coronavirus evolved and where it came from.
Bloom found that the deleted sequences, like other sequences collected at later dates outside the city, were more similar to bat coronaviruses — presumed to be the ancestors of the virus that causes COVID-19 — than to sequences linked to the Huanan Seafood Market in Wuhan. This adds to previous suggestions that the seafood market may have been an early victim of COVID-19, rather than the place where the coronavirus first jumped from animals to humans.
“This is a very interesting study done by Dr. Bloom, and in my opinion the analysis is absolutely correct,” Farkas told BuzzFeed News by email. Scott Gottlieb, the former head of the Food and Drug Administration, also praised the findings on Twitter.
But some scientists were less impressed. “It really doesn’t add anything to the debate over the origins,” Robert Garry of Tulane University in New Orleans told BuzzFeed News by email. Garry argued that the Huanan market or other markets in Wuhan could still be the source of COVID-19.
Bloom is one of 18 scientists who in May published a letter criticizing the WHO-China study on the origins of SARS-CoV-2. The scientists argued that the WHO-China report failed to give “balanced consideration” to the competing ideas that the coronavirus spread naturally from animals to humans or escaped from a laboratory – a theory the report deemed “highly unlikely”. After the publication of the WHO-China report, the United States and 13 other governments complained that the study lacked access to complete, original data and samples.
The deleted virus sequences were first uploaded to the SRA in early March 2020, around the time researchers led by Yan Li and Tiangang Liu of Wuhan University published an initial version of a paper describing their work using genetic sequencing to diagnose COVID-19. Just days earlier, China’s State Council had ordered that all papers related to COVID-19 must be approved centrally.
The sequences were then pulled from the SRA in June, around the time the final version of the paper appeared in a scientific journal. According to the National Institutes of Health, the authors requested that the sequences be removed. “The applicant indicated that the sequence information had been updated, submitted to another database, and wanted the data removed from the SRA to avoid version control issues,” NIH spokesperson Amanda Fine told BuzzFeed News by email.
However, it is not clear if the sequences were published online in another database.
“There is no reasonable scientific reason for the deletion,” Bloom wrote in the preprint, arguing that the sequences were likely “deleted to hide their existence.” This, he wrote, indicates “a less-than-pure effort to track the early spread of the epidemic”.
Although the sequences were deleted, Garry noted that the key genetic mutations they contained were still listed in a table in the final paper from the Wuhan team. “Jesse Bloom has not found something entirely new that is not part of the scientific literature,” Garry told BuzzFeed News, accusing Bloom of writing his preprint “in a flamboyant, unscientific and unnecessary manner.”
Bloom wrote to the Wuhan researchers asking why the sequences were deleted, but received no response. Likewise, Li and Liu did not immediately respond to an inquiry from BuzzFeed News.
This isn’t the first time scientists have raised concerns about the removal of data that might help answer questions about the origins of COVID-19. The main database of coronavirus sequences maintained by the Wuhan Institute of Virology – the focus of speculation about a possible “lab leak” of the virus – was taken offline in September 2019. When members of the WHO-China team assessing the origins of the epidemic visited the institute in February, they were told that the database, which reportedly contained data on more than 22,000 coronavirus samples and sequence records, was removed after repeated hacking attempts.