BRCA Journal

journal entry

Mar 14


How Will "All of Us" Manage and Secure the Health Data of One Million Participants?

"Please step back behind the sign," said the receptionist at the doctor’s office. Another person was being helped and I waited, obliviously, a bit too close behind her. I mumbled an apology and stepped back. Sure enough, there was a sign that read, "For the privacy of the patient ahead of you, please wait here." I was embarrassed, of course, because I was acting as if I was at the checkout line at the grocery store. People weren’t buying cucumbers here, though; they were being given (or told) private and delicate information about their health. Information they have the right, under federal law, to have protected and determine with whom it will be shared. And it’s this very issue of donating one’s health information to big data research cohorts that raises legitimate concerns of confidentiality and security. How can we ensure our data may be shared easily between researchers yet still remain anonymous, reliable, and secure? In other words, where would a virtual privacy sign live and what should it say?

In the previous two posts of this big data and precision medicine series, we explored how and why the Precision Medicine Initiative (PMI) of the National Institutes of Health (NIH) has turned to us, the public, to help medicine become more personalized. The PMI’s "All of Us," a prospective research cohort of one million engaged participants, is the largest and most comprehensive of its kind in United States history. It aims to collect health information ranging from Electronic Health Records (EHRs) to genetic data. Researchers from different fields and institutions will be able to access, compare, and study this data in order to better understand how to target treatments to an individual’s health background. Collecting, securing, and preserving the integrity of this huge volume of personal information while maintaining its utility and accessibility, however, hinges on rigorous technical, regulatory, and ethical policies. These policies start with establishing standard protocols and systems to collect and store the data in a usable and reliable manner.

Creating the Sign or Technical Strategies

Technically, the PMI is working on disentangling two main challenges: 1) the sheer volume and variety of data to collect and store; and 2) the methodology to collect and organize it. EHRs, for instance, are not standardized across all health care providers in the United States. Providers tend to organize them in various ways and in different types of software. Collecting EHRs is also not a one-time deal; this information will keep changing and accumulating throughout the participant’s life and involvement in the study. Adding to the complexity, participants may have multiple EHRs to begin with, and/or switch health care providers during the study. All this heterogeneous and ever-evolving EHR data needs to be accommodated in a standardized fashion, using a secure server that can handle extraordinary amounts of dynamic information. This largely depends on developing robust data processing methods to mine and organize the various forms of EHRs and the information they provide. Other sources of data, such as imaging and diagnostic devices, are also not standardized. They store data in different types of software, so it cannot be universally retrieved and analyzed. To solve this problem, the research community is engineering open-source or publicly available software tools that can liberate data from different device software. The PMI also recognizes the need for storage space for the vast amount of data, especially high-throughput data like genomics. The main hurdle here, however, is to create and maintain an efficient data storage system that allows researchers to easily access, transfer, and analyze data.
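To make the standardization problem a little more concrete, here is a minimal sketch in Python of what "accommodating heterogeneous EHRs in a standardized fashion" might look like. Everything in it is an assumption made for illustration: the two provider formats, the field names, and the `CommonRecord` schema are invented here and are not taken from the PMI or from any real EHR software.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical shared schema that every provider format is mapped into."""
    participant_id: str
    recorded_on: date
    measurement: str
    value: float
    unit: str


def from_provider_a(row: dict) -> CommonRecord:
    # Hypothetical provider A: US-style dates, weight stored in pounds.
    return CommonRecord(
        participant_id=row["patient"],
        recorded_on=datetime.strptime(row["date"], "%m/%d/%Y").date(),
        measurement="weight",
        value=round(row["weight_lb"] * 0.45359237, 1),  # pounds -> kilograms
        unit="kg",
    )


def from_provider_b(row: dict) -> CommonRecord:
    # Hypothetical provider B: ISO dates, weight already in kilograms (as text).
    return CommonRecord(
        participant_id=row["id"],
        recorded_on=date.fromisoformat(row["measured"]),
        measurement="weight",
        value=float(row["kg"]),
        unit="kg",
    )


def merge_history(records: list) -> list:
    """Combine records from several providers into one timeline, oldest first."""
    return sorted(records, key=lambda record: record.recorded_on)


# One participant who switched providers during the study.
history = merge_history([
    from_provider_b({"id": "P-001", "measured": "2017-09-30", "kg": "69.5"}),
    from_provider_a({"patient": "P-001", "date": "03/14/2017", "weight_lb": 150}),
])

for record in history:
    print(record.recorded_on, record.value, record.unit)
```

The design choice worth noticing is that each provider gets its own small translation function, while everything downstream only ever sees the common format. A new provider, or a participant switching providers mid-study, then means adding one more translator rather than reworking the analysis.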

Enforcing the Sign or Regulatory Strategies

As a federal research institution that does not deal with billing and transactions for health services or insurance coverage, the NIH is not considered a covered entity under the Health Insurance Portability and Accountability Act (HIPAA). Instead, the PMI and “All of Us” have drafted their own policies and regulations to protect the interests and privacy of participants. The governing body of the PMI, for example, has to follow strict rules, which include: 1) requiring participant representation and active collaboration at all levels of the study; 2) upholding accountability and responsible data management; and 3) abiding by all applicable laws and regulations concerning privacy and research with humans. These could be federal policies for the responsible conduct of research in humans, such as Informed Consent protocols, or oversight implemented by research institutions, such as review by an Institutional Review Board. They will be used to regulate how the data is collected, stored, and accessed. The PMI will also assess and respect the preferences of participants and the extent of their desired involvement in the study. Furthermore, through Certificates of Confidentiality, the program can protect its participants from legal demands to reveal information that could identify them.

Respecting the Sign or Ethical Strategies

Perhaps the main concern of participants is the privacy and security of their health information. Fortunately, researchers have a vested interest in keeping this data secure; any security breach undermines the integrity and reliability of the data, leaving researchers unable to trust the validity of their results. That said, how will data be easily accessed and analyzed by authorized researchers and still remain protected from hackers? The PMI has set up multiple safeguards to protect the participants from such transgressions.

For starters, names, dates of birth, and other identifying information will be encrypted, i.e., replaced by randomized codes. The information will also be stored on physically and virtually locked servers and will be accessible only through multi-factor user authentication. Only key personnel and researchers will have the credentials to access the dataset of anonymous health information at any given time. And before gaining clearance, researchers and personnel will have to undergo rigorous ethical and technical training to handle the data properly and prevent compromise. In the event of a data breach, the PMI will notify the participants.
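For the curious, replacing identifying details with randomized codes (a technique often called pseudonymization) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Pseudonymizer` class, its method names, and the record fields are hypothetical, and nothing here describes how the PMI actually implements its safeguards.

```python
import secrets


class Pseudonymizer:
    """Illustrative only: swaps identifying fields for a random code."""

    IDENTIFYING_FIELDS = ("name", "date_of_birth")

    def __init__(self):
        # Lookup table linking each real identity to its code. In practice a
        # table like this would need to live apart from the research data,
        # behind its own locks (an assumption of this sketch).
        self._codes = {}

    def code_for(self, name: str, date_of_birth: str) -> str:
        key = (name, date_of_birth)
        if key not in self._codes:
            # 8 random bytes rendered as 16 hexadecimal characters.
            self._codes[key] = secrets.token_hex(8)
        return self._codes[key]

    def strip_identity(self, record: dict) -> dict:
        """Return a copy of the record with identifiers replaced by a code."""
        code = self.code_for(record["name"], record["date_of_birth"])
        cleaned = {
            field: value
            for field, value in record.items()
            if field not in self.IDENTIFYING_FIELDS
        }
        cleaned["participant_code"] = code
        return cleaned


pseudonymizer = Pseudonymizer()
visit_one = pseudonymizer.strip_identity(
    {"name": "Jane Doe", "date_of_birth": "1980-01-01", "blood_pressure": "120/80"}
)
visit_two = pseudonymizer.strip_identity(
    {"name": "Jane Doe", "date_of_birth": "1980-01-01", "blood_pressure": "118/79"}
)

# Both visits carry the same code, so a researcher can follow one participant
# over time without ever seeing a name or a birthday.
print(visit_one["participant_code"] == visit_two["participant_code"])
```

Because the code is random rather than derived from the name, it cannot be reversed by someone who only holds the research dataset; re-identifying a participant requires the separately guarded lookup table.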

Another ethical concern is participants’ use and ownership of their donated health information. As NIH Director Dr. Francis S. Collins so succinctly states, the PMI views “All of Us” participants as engaged volunteers and partners. They will have the right to access their own health information, see which studies are using their data, and learn what the findings of these studies might be. Being a partner and not a mere test subject is particularly important and valuable to participants who carry disease-related mutations, such as BRCA mutations. BRCA mutation carriers will be able to track their information and learn of any new discoveries that have implications for their health and prospects. On the other hand, partners who do not know they carry a BRCA mutation will be able to find out this information through the program. It is important to note that “All of Us” stresses it won’t diagnose participants or replace their health care providers. It does recognize, however, its ethical duty to inform participants if a documented disease mutation pops up in their genome.


When it comes to the logistics of big data, the PMI is aiming to strike a delicate balance between the utility and security of donated information. Most of the groundwork to achieve this balance has been laid, whether in the technical, regulatory, or ethical realm. Researchers are engineering technical tools and methods that can safely and effectively handle the type and amount of health information that needs processing. The NIH is also working with policy-makers to draft and solidify regulations to protect the rights of program participants. And, alongside the PMI, it’s taking several privacy and data security measures and safeguards to ensure the confidentiality of participants and the integrity of their information.

Right now, "All of Us" is in its beta testing phase and will officially launch in spring 2018. Enrolling a million volunteers and collecting their health information will take time and collaboration with local health care providers. Research using the cohort will be able to start well before we hit that million mark but, as expected, it will be years until we see the resulting advances and treatments. It will all be worth it, though, for our health, the health of our families, and the health of our communities.

So, in addition to a healthy dose of schadenfreude at my embarrassing stories, I hope this series has given you some insights into the big world of big data and how we as individuals with nothing but our human condition can contribute to a medical revolution.

Author Bio

Rabab is a postdoctoral scholar at the University of California, San Francisco, working on how newborn nerve cells travel and mature in the developing brain. When not at the bench, she is an avid STEM advocate and a science communicator who contributed to science blogs throughout her graduate studies at the Albert Einstein College of Medicine. In her free time, she likes to read articles on Flipboard and dance the Lindy Hop.