Syntegra and the National Institutes of Health (NIH) have announced a collaboration to validate Syntegra’s AI-enabled synthetic data technology to advance the understanding of and care for COVID-19. Using Syntegra’s novel synthetic data engine, the NIH will be able to offer far less restricted access to the largest available repository of patient-level COVID-19 electronic medical records, immediately increasing the reach and use of this data to drive COVID-19 insights and laying the groundwork to accelerate data access for life science researchers in other key areas of disease understanding and drug and device development.
Syntegra’s synthetic data engine will be a key component of the National COVID Cohort Collaborative (N3C), validating the generation of a non-identifiable synthetic version of the entire dataset, representing 2.7m+ screened individuals, including over 413,000 COVID-19 positive patients, and 2.6B rows of data. This innovative public-private collaboration includes over 70 contributing healthcare organizations. The Bill & Melinda Gates Foundation, through its COVID-19 Therapeutics Accelerator, is supporting the collaboration between Syntegra and the NIH. Syntegra has also engaged with the Food and Drug Administration (FDA) to evaluate the role of synthetic data in regulatory decisions, for COVID-19 and beyond.
“The promise of ‘big-data’ and precision medicine won’t be fulfilled unless we can share data siloed throughout the healthcare system, while guaranteeing patient privacy. With the COVID-19 pandemic, there has never been a time when rapid, low-burden access to patient-level data, at scale, was more urgent,” says Michael D. Lesh MD, founder and CEO of Syntegra. “Our novel AI technology produces a brand new dataset that accurately represents all of the statistical patterns in the underlying health records. But since no individual is represented in the synthetic data, it is impossible to disclose confidential information. We are proud of our partnership with the BMGF, the NIH, and the FDA, as Syntegra becomes the default standard for validated synthetic data sharing.”
While the rollout of approved vaccines will play a major role in the fight against COVID-19, much work remains in understanding the disease and caring for the millions living with its long-term consequences. The rapid data access that Syntegra enables for physicians, scientists, and researchers will help accelerate key focus areas for the N3C, such as racial and ethnic disparities in spread and risk, predictors of hospitalization, long-term adverse effects, and the impact of COVID-19 on hospitals. The deeper research and understanding enabled by Syntegra’s synthetic N3C data will continue long beyond the eventual control of the COVID-19 pandemic itself, including potential use in other areas of medical exploration.
The partnership with the Gates Foundation and the NIH is an important step in driving widespread adoption of Syntegra’s capabilities to generate synthetic versions of entire datasets, rather than single-question-based cohorts, with full statistical fidelity and strict, validated privacy. It offers the potential for a seismic shift in data interchange. Unburdened access to high-quality data, including physician notes and genomics, will offer benefits across the spectrum of healthcare, from clinical decision support in the hospital to the full lifecycle of drug development, and for health conditions beyond COVID-19.
About Syntegra
Life science researchers have difficulty generating new insights from the petabytes of electronic medical record data gathered each year because of the ethical and legal mandate that personal health information be protected. Syntegra accelerates innovation in healthcare by making data easily shareable without compromising patient privacy. Using state-of-the-art machine learning, Syntegra converts medical record data into individual-level synthetic data that is “realistic but not real” to serve the needs of multiple stakeholders, including large health systems, life science companies, insurance providers, data scientists, and clinical research organizations.