About the Data Commons

The coronavirus has turned the world upside down, and we know that research is the only way to turn it right side up. Stony Brook Medicine clinicians and researchers are leading exciting studies that are addressing many aspects of this new virus, from onset through recovery and prevention.

Under the direction of Dean Kenneth Kaushansky, the Renaissance School of Medicine at Stony Brook University has developed the Stony Brook Medicine COVID-19 Data Commons, which supports integrated management, query and analysis of clinical, radiology, pathology, spatial and molecular data. The clinical data captures all available information about COVID-19 patients' symptoms, past medical history, family history, clinical course, treatment and response, as well as data elements relating to patient demographics and comorbidities. The radiology data includes all imaging studies obtained during each patient's treatment, including CT and chest X-ray images, along with computationally derived data products.

This highly collaborative, interdisciplinary effort, led by Sandeep Mallipattu, MD; Sharon Nachman, MD; Joel Saltz, MD, PhD; and Mary Saltz, MD, brought together more than 50 people from across Stony Brook's East and West Campuses. This diverse group included the entire Biomedical Informatics (BMI) team, Clinical Informatics fellows, Preventive Medicine residents, adjunct faculty from the Departments of Surgery, Medicine, Anesthesiology, Preventive Medicine, Radiology and Family Medicine, students from Computer Science and Public Health, and interns from the Division of Applied Informatics. Supervised by residents and fellows, more than 20 medical students spent six weeks abstracting information from patient charts.

The Stony Brook Medicine IT Department was instrumental in gathering the appropriate data, which data scientists then carefully curated. In addition to routine data, all radiology reports and clinical notes were brought into the Data Commons so that they can be analyzed with natural language processing. All radiology images were also imported, leveraging Stony Brook's imaging informatics expertise. This intense, hands-on effort produced a viable data commons within a month of the project's start. The Stony Brook Medicine COVID-19 Data Commons is a single, robust source of truth for COVID-19 data at Stony Brook, providing a cutting-edge repository to support COVID-19 research both at Stony Brook and nationally.

The currently available data integrates information from the electronic health record, the clinical data warehouse, images, and hand-abstracted chart review data stored in REDCap. Identified images with linked reports are also available. A pipeline to de-identify images has been developed; about 600 cases are currently available for research, and that number grows each day. To date, the effort has concentrated on hospitalized COVID-19-positive patients, but it will expand to include outpatients. Going forward, the Data Commons will also be able to import genomic and genetic data. A curated data source spanning multiple data types gives clinicians and researchers a powerful tool for learning more about COVID-19 at both the individual and population levels, with the goal of improving patient care and facilitating research.