Publications

Privacy - Preserving Clustering

AuthorKeller, Hannah; Möllering, Helen; Schneider, Thomas; Yalame, Hossein
Date2021
TypeConference Proceedings
AbstractClustering is an unsupervised machine learning technique that divides a data set into groups. The manifold applications of clustering range from segmenting a market using consumer preferences to grouping photos of diseased organs in medical imaging. These scenarios inevitably require sensitive data such as business secrets or personal health records. We consider multiple mutually distrusting parties, e.g., corporations or hospitals that do not want to or are not permitted to share their clients’ or patients’ data due to legislation such as GDPR or HIPAA or to protect their market position and business secrets. Nevertheless, these parties wish to cluster their joint data, since a larger data pool usually results in a better result providing more insights. A growing body of research proposes privacy-preserving variants of clus- tering algorithms. These algorithms operate in a semi-honest security model, which assumes that all parties honestly follow the protocol. However, exist- ing implementations are impractical for real-world usage due to immense computational costs or privacy-compromising data leakage during the cluster- ing process. To the best of our knowledge, the few clustering schemes that fully preserve privacy focus on the simple and efficient k-means algorithm; however, the effectiveness of k-means depends on many assumptions. For example, the number of clusters must be known in advance, and the input data cannot include nominal variables or outliers. These assumptions often do not hold for real-world clustering applications, limiting k-means’ practicality. In our work, we explore the suitability of advanced clustering algorithms for efficient crypto-oriented clustering. We identify affinity propagation to be the most promising algorithm candidate. Affinity propagation and k-means have comparable computational complexities of O(n2), and affinity propagation provides more flexibility in terms of tolerance to outliers, compatibility with var- ious data types, and the selection of tunable parameters. Furthermore, affinity propagation involves only operations that can be implemented relatively effi- ciently using secure computation techniques, such as addition and comparison. The algorithm was also successfully applied for privacy-relevant use cases, e.g., the detection of genes from chromosomal data. Based on our insights, we design the first privacy-preserving affinity propa- gation clustering protocol using secure computation techniques. Moreover, our protocol was implemented and benchmarked with the SPDZ framework and enables private clustering in a malicious security model for the first time, providing protection from strong adversaries that deviate from the protocol.
Conference32. Kryptotag - crypto day matters
Urlhttps://tubiblio.ulb.tu-darmstadt.de/id/eprint/126231