test
Search publications, data, projects and authors

Thesis

English

ID: <

10670/1.xnn5sj

>

Where these data come from
Secure Distributed MapReduce Protocols : How to have privacy-preserving cloud applications?

Abstract

In the age of social networks and connected objects, many and diverse data are produced at every moment. The analysis of these data has led to a new science called "Big Data". To best handle this constant flow of data, new calculation methods have emerged.This thesis focuses on cryptography applied to processing of large volumes of data, with the aim of protection of user data. In particular, we focus on securing algorithms using the distributed computing MapReduce paradigm to perform a number of primitives (or algorithms) essential for data processing, ranging from the calculation of graph metrics (e.g. PageRank) to SQL queries (i.e. set intersection, aggregation, natural join).In the first part of this thesis, we discuss the multiplication of matrices. We first describe a standard and secure matrix multiplication for the MapReduce architecture that is based on the Paillier’s additive encryption scheme to guarantee the confidentiality of the data. The proposed algorithms correspond to a specific security hypothesis: collusion or not of MapReduce cluster nodes, the general security model being honest-but-curious. The aim is to protect the confidentiality of both matrices, as well as the final result, and this for all participants (matrix owners, calculation nodes, user wishing to compute the result). On the other hand, we also use the matrix multiplication algorithm of Strassen-Winograd, whose asymptotic complexity is O(n^log2(7)) or about O(n^2.81) which is an improvement compared to the standard matrix multiplication. A new version of this algorithm adapted to the MapReduce paradigm is proposed. The safety assumption adopted here is limited to the non-collusion between the cloud and the end user. The version uses the Paillier’s encryption scheme.The second part of this thesis focuses on data protection when relational algebra operations are delegated to a public cloud server using the MapReduce paradigm. In particular, we present a secureintersection solution that allows a cloud user to obtain the intersection of n > 1 relations belonging to n data owners. In this solution, all data owners share a key and a selected data owner sharesa key with each of the remaining keys. Therefore, while this specific data owner stores n keys, the other owners only store two keys. The encryption of the real relation tuple consists in combining the use of asymmetric encryption with a pseudo-random function. Once the data is stored in the cloud, each reducer is assigned a specific relation. If there are n different elements, XOR operations are performed. The proposed solution is very effective. Next, we describe the variants of grouping and aggregation operations that preserve confidentiality in terms of performance and security. The proposed solutions combine the use of pseudo-random functions with the use of homomorphic encryption for COUNT, SUM and AVG operations and order preserving encryption for MIN and MAX operations. Finally, we offer secure versions of two protocols (cascade and hypercube) adapted to the MapReduce paradigm. The solutions consist in using pseudo-random functions to perform equality checks and thus allow joining operations when common components are detected. All the solutions described above are evaluated and their security proven.

Your Feedback

Please give us your feedback and help us make GoTriple better.
Fill in our satisfaction questionnaire and tell us what you like about GoTriple!