Protecting Brain Data with Federated Learning

Emily Einhorn
June 12, 2020

It’s no secret that our personal data is collected and used to develop machine learning models. The resulting algorithms can be used by private companies, research organizations, and public institutions to make predictions or decisions based on patterns from our data. This holds true within the field of neurotechnology, where increasing troves of brain data are being amassed through brain-computer interfaces (BCIs) and other neurotechnology devices. 

The widespread use of personal data to train machine learning models has spurred much discussion about data privacy. As data protection becomes an increasing concern, researchers have focused more on developing new model-training methods that emphasize privacy and security. One of the tools that is becoming prominent within this sphere is federated learning. Given the sensitive nature of brain data, federated learning could be particularly useful in the field of neurotechnology.

Federated Learning

Federated learning is a method popularized by Google for training machine learning models across many devices without pooling the underlying data in one place. Machine learning is a subfield of artificial intelligence in which mathematical algorithms consume large amounts of data that then inform a model's predictions and decisions. For the best results, training data needs to be compiled from a wide range of sources, whether that's personal devices like smartphones or the local servers of an organization, like health records at a hospital. Typically, this data is copied from its various sources and pooled into a centralized database that the developers use as the training set for their machine learning model. This can be problematic because it gives developers—particularly those operating under less regulated circumstances—full access to analyze and share that data. Such practices could potentially expose personal identities or breach sensitive information.

Unlike standard methods of machine learning, federated learning lets that same user data train the predictive model while the data remains local to the individual device or server. Each local entity downloads the current model and its associated algorithm, then trains the model on the user data held on that device or server. Only the resulting model update is sent back to the developer’s centralized server or cloud, where it is combined with updates from other devices to improve the shared predictive algorithm. The developer’s model is therefore able to benefit from the data on each device without the developer ever being granted access to the data itself.
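The round trip described above can be sketched with federated averaging (FedAvg), the aggregation scheme Google introduced for this setting. The toy linear model and the function names below (`local_update`, `federated_round`) are illustrative assumptions, not any particular library's API:

```python
import random

def local_update(w, data, lr=0.01, epochs=5):
    """Train the shared weight on one client's private data.
    The data never leaves this function -- only the updated
    weight is returned to the server."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of squared error
            w -= lr * grad
    return w

def federated_round(w_global, clients):
    """One round: every client trains locally on its own data,
    then the server averages the returned weights."""
    updates = [local_update(w_global, data) for data in clients]
    return sum(updates) / len(updates)

# Four clients each hold private samples of the relation y = 3x (plus noise).
random.seed(0)
clients = [
    [(x, 3 * x + random.gauss(0, 0.1)) for x in (0.5, 1.0, 1.5)]
    for _ in range(4)
]

w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
# w is now close to the true slope of 3, learned without the
# server ever seeing any client's raw (x, y) pairs.
```

The key property is visible in the code: the server's loop only ever touches returned weights, never the `clients` data itself.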

Federated Learning and Brain Data 

When it comes to neurotechnology and brain data, federated learning could be a valuable tool that would allow researchers, clinicians, and companies to benefit from the volumes of brain data collected through BCI devices while minimizing data privacy concerns. BCIs are seeing increasingly widespread use, and many of the companies behind them aggregate the resulting brain data in their own internal databases. That data can be manipulated and sold, a poor privacy practice for something as intimate as brain data. Federated learning could provide an avenue for this data to inform models shared with medical researchers, clinicians, or even other companies without giving them full possession of the data.

Rather than migrating BCI data over to a centralized server to train a machine learning model, developers could use federated learning to train locally on many BCIs and send only the resulting model updates back to the centralized server. This could be applied in all sorts of beneficial ways. For example, machine learning models could help diagnose neurological diseases by consuming large quantities of brain data that help the algorithm predict neurological abnormalities. Rather than relying on a single hospital’s health records or publicly available case records to train such a model, a research group could devise an algorithm that queries many hospitals’ servers, medical devices, and other relevant databases, all without removing the data from those local servers or devices. This could allow the predictive algorithm to become a diagnostic “expert,” as it would integrate heaps of case information into its model without gaining full access to the data itself or infringing upon patient privacy.
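As a hypothetical sketch of the aggregation step in that hospital scenario, a coordinating research server might weight each hospital's returned update by how many patient records it trained on, as in the original FedAvg formulation. The hospitals, record counts, and weight vectors below are invented for illustration:

```python
def aggregate(global_w, hospital_updates):
    """Combine locally trained weight vectors into a new global model.
    hospital_updates: list of (trained_weights, num_records) pairs --
    only weights travel to the server, never patient records."""
    total = sum(n for _, n in hospital_updates)
    return [
        sum(w[i] * n / total for w, n in hospital_updates)
        for i in range(len(global_w))
    ]

# Three hypothetical hospitals return weight vectors trained on
# 100, 300, and 600 local patient records respectively.
updates = [([0.9, 2.1], 100), ([1.1, 1.9], 300), ([1.0, 2.0], 600)]
new_w = aggregate([1.0, 2.0], updates)
# new_w is the record-count-weighted average of the three updates,
# so larger hospitals pull the global model proportionally harder.
```

Weighting by record count keeps a small clinic's noisy update from dominating a model largely shaped by bigger datasets, while still letting every site contribute.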

Brain data collected from BCIs can be used to devise multitudes of beneficial machine learning and AI tools. While privacy must be a crucial factor within this innovation, protecting data can be accomplished through means besides restricting data use outright. Given the benefits that brain data could have for clinical care and scientific advancement, we should be looking for ways to use brain data safely. New tools like federated learning do just that, creating ways to preserve privacy while allowing innovation to thrive.