Talk title: Large Scale Privacy Preserving Data Integration
Speaker: Fausto Giunchiglia
The theory says that big data analytics will allow us to understand and predict the evolution of almost any world phenomenon. The practice tells us that, at the moment, there are at least two hidden complexities which substantially limit the full exploitation of analytics, machine learning and AI in general. The first is that, with a few exceptions (e.g., Google, WeChat or Facebook), the big data which are needed, for instance to train our models, do not exist. In practice they must be constructed by integrating data which come from multiple heterogeneous data silos, and the cost of this operation is very high, growing exponentially with the size of the data. The second is that every experiment we have done provides evidence that the amount of information which can be extracted from data is very high, far beyond our initial expectations. This raises many issues when these data include, as is often the case, personal data.
The goal of this talk is to describe a general data integration methodology, and its supporting tools, which make it possible to radically cut the cost of data integration while, at the same time, being privacy-aware by design. We will also describe how it has been applied in a case study in the health domain in a GDPR-compliant way, as required by European legislation.