DOI: 10.1093/bioinformatics/btad531 ISSN:

Accessibility of covariance information creates vulnerability in Federated Learning frameworks

Manuel Huth, Jonas Arruda, Roy Gusinow, Lorenzo Contento, Evelina Tacconelli, Jan Hasenauer
  • Computational Mathematics
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Molecular Biology
  • Biochemistry
  • Statistics and Probability

Abstract

Motivation

Federated Learning (FL) is gaining traction in various fields as it enables integrative data analysis without sharing sensitive data, such as in healthcare. However, the risk of data leakage caused by malicious attacks must be considered. In this study, we introduce a novel attack algorithm that relies on being able to compute sample means, sample covariances, and construct known linearly independent vectors on the data owner side.

Results

We show that these basic functionalities, which are available in several established FL frameworks, are sufficient to reconstruct privacy-protected data. Additionally, the attack algorithm is robust to defense strategies that involve adding random noise. We demonstrate the limitations of existing frameworks and propose potential defense strategies analyzing the implications of using differential privacy. The novel insights presented in this study will aid in the improvement of FL frameworks.

Availability and Implementation

The code examples are provided at GitHub (https://github.com/manuhuth/Data-Leakage-From-Covariances.git). The CNSIM1 data set which we used in the manuscript is available within the DSData R package (https://github.com/datashield/DSData/tree/main/data).

Supplementary information

Mathematical proves and further information are available online.

More from our Archive