What role can data science play in confronting the COVID-19 pandemic and social injustice?
A panel of experts from Duke University, the State of North Carolina, and the technology industry shared data sources and approaches to mitigate these public health crises during an online research forum attended by more than 675 people and hosted by the Duke University School of Medicine in late June. Participants represented 98 Duke organizations and 79 external groups. The presentations are available online at the forum website.
Michael Pencina, PhD, vice dean for data science and information technology in the School of Medicine and a professor in the Department of Biostatistics and Bioinformatics, organized the event.
“Data scientists need to engage in the social context of communities and demonstrate commitment, humility and advance planning,” said Pencina. “The voice of the people makes for stronger research together.”
Throughout the day, North Carolina government officials described how datasets on COVID-19 case counts, virus tracing, testing, hospital surges, and supplies of personal protective equipment (PPE) have been vital in setting policies for shutting down and reopening communities and workplaces.
These datasets have also given decision-makers a clearer picture of the demographics of people infected with COVID-19 and have guided strategies to slow the spread of the virus, especially in the most vulnerable populations. For example, disproportionate numbers of people from the Black and Latinx communities have been infected, said Jessie Tenenbaum, PhD, chief data officer for the North Carolina Department of Health and Human Services and an assistant professor of biostatistics and bioinformatics at Duke.
“What we’ve seen is that COVID-19 is not impacting North Carolina populations evenly,” said Tenenbaum. “In the African American population, that population as part of the state is about 22 percent, but we’re seeing that population is 25 percent of [COVID] cases, and even more in terms of deaths. Thirty-four percent of the deaths we’re seeing from COVID are in the African American population. Similarly, the Hispanic population is only about nine percent of the state, and yet we’re seeing that 45 percent of cases are in the Hispanic population.”
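One way to read the figures Tenenbaum cited is to divide each group's share of cases or deaths by its share of the state population; a ratio above 1 means the group is overrepresented. The short Python sketch below does that arithmetic using only the percentages quoted above. The function name `representation_ratio` and the dictionary labels are illustrative choices, not part of any state dataset or tool.

```python
def representation_ratio(share_of_outcome, share_of_population):
    """Ratio of a group's share of an outcome to its share of the population."""
    return share_of_outcome / share_of_population


# (share of outcome, share of state population), in percent, as quoted in the article
figures = {
    "African American, cases": (25, 22),
    "African American, deaths": (34, 22),
    "Hispanic, cases": (45, 9),
}

# Round to two decimals for readability
ratios = {
    label: round(representation_ratio(outcome, population), 2)
    for label, (outcome, population) in figures.items()
}

for label, ratio in ratios.items():
    print(f"{label}: {ratio}x population share")
```

On these numbers, the Hispanic share of cases is five times the group's share of the population, while the African American share of deaths is roughly one and a half times its population share.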
Speakers also discussed the importance of collecting reliable data and the barriers to doing so, including inconsistencies in the amount of data that each city provides to the state and mistrust of health systems in many minority communities.
“As data scientists and social behaviorists, it’s important we understand how mistrust of health care systems can impact the quality of data we’re collecting. If people aren’t trusting the data collectors, it can impact the quality of the data we are receiving, data that we need to make informed decisions,” said Michelle Laws, PhD, assistant director of consumer policy and community engagement with the NC Division of Mental Health, Developmental Disabilities, and Substance Abuse Services.
Later in the day, the conversation shifted to scientists and innovators in academia and private industry who have developed tools to track the COVID-19 response. These tools include a modeling database that tracks daily use of the resources needed to treat COVID-19 patients, such as ventilators and PPE, in the Duke University Health System. The model is refreshed every morning to provide the most up-to-date data to inform operational workflow, said Ben Goldstein, PhD, associate professor of biostatistics and bioinformatics.
Another talk focused on a new study called CovIdentify, which seeks to collect biological information such as heart rate, sleep and activity from wearable devices to help determine whether a person has been infected with COVID-19.
“Our goal is to let people know they are infected before they realize it based on physiological changes in their body collected by this wearable device,” said Jessilyn Dunn, PhD, assistant professor of biomedical engineering at Duke. Data shows that the virus is often transmitted by people who have not developed symptoms and are not aware they are infected.
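The general idea of spotting physiological changes against a person's own baseline can be illustrated with a minimal Python sketch. This is not the CovIdentify study's method, which the article does not describe. It assumes hypothetical daily resting heart rate readings and flags any day that sits well above the average of the preceding week; the function name, the seven-day window, and the threshold are all illustrative assumptions.

```python
from statistics import mean, stdev


def flag_elevated_days(resting_hr, baseline_days=7, z_threshold=3.0):
    """Return indices of days whose reading is far above the prior-days baseline.

    Hypothetical illustration only: each day is compared with the mean and
    standard deviation of the preceding `baseline_days` readings.
    """
    flagged = []
    for day in range(baseline_days, len(resting_hr)):
        baseline = resting_hr[day - baseline_days:day]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            continue  # a perfectly flat baseline gives no scale for comparison
        z_score = (resting_hr[day] - mu) / sigma
        if z_score > z_threshold:
            flagged.append(day)
    return flagged


# Made-up readings (beats per minute): stable for nine days, then a jump
readings = [60, 61, 59, 60, 62, 60, 61, 60, 61, 72, 74]
print(flag_elevated_days(readings))
```

With these made-up readings, only the first day of the jump (index 9) is flagged; the following day is not, because the jump itself has already inflated the rolling baseline. A real system would need to handle that, along with noise from exercise, illness other than COVID-19, and device error.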
Another presenter, from Amazon Web Services (AWS), shared information about a new service available to the academic research community: AWS Data Exchange. Launched in late 2019, the service allows customers to search for, subscribe to, and use third-party data in the cloud, said Fred Lee, MD, MPH, global business development leader for healthcare and life sciences at AWS Data Exchange.
The service, said Lee, was created to help customers, whether academic researchers, state and federal governments, consumer goods organizations, or pharmaceutical companies, find and use data that other customers have made freely available. The result is a more efficient, searchable data exchange that does not require shipping hard drives from one customer to another.
“At the end of February, we started getting a spike in demand. A lot of our customers who were listing data started contributing free COVID-19 data sets, publishers such as Foursquare, Change Healthcare, and many others,” said Lee. New subscribers in academic research and other sectors began to find and use that data, which ranges from foot traffic to grocery stores, pharmacies, urgent care centers and restaurants during COVID-19 (Foursquare) to weekly unemployment claims data from the Department of Labor (Rearc). Lee encouraged Duke researchers to take advantage of the available data sets.
In another talk, a representative from Change Healthcare highlighted the data that the healthcare technology company has collected on COVID-19 patients, and how Duke data scientists might leverage this data (which is de-identified to protect patient privacy) to answer important research questions about the pandemic.
The company currently has data on tests, diagnoses, hospitalization and mortality for half a million COVID patients, and the data is collected daily and updated weekly for researchers, said Tim Suther, senior vice president and general manager of data solutions at Change Healthcare.
“Change enables a wealth of COVID-related insights, to understand disease progression, intervention effectiveness and knock-on effects of the healthcare system,” said Suther. He encouraged researchers interested in partnering with the company to contact Pencina.
“We absolutely need to be drivers of solution and we will be,” said Mary E. Klotman, MD, dean of the Duke University School of Medicine, in her remarks during the forum. “One of the key tools we have is data science. Data science needs to play an active role in the process of healing. Reliable data and rigorous research are essential and we need to use them in managing the COVID-19 pandemic as well as understanding the deep roots of racial injustice.”