Geometric Data Perturbation for
Privacy-preserving Data Classification
Keke Chen and Ling Liu
This project investigates a data-perturbation approach based on random
geometric transformations for privacy-preserving data classification. The goal
of this perturbation approach is two-fold: preserving the utility of the data
for classification modeling, and preserving the privacy of the data. To achieve
the first goal, we observe that many classification models rely on the
geometric properties of datasets, which can be preserved by geometric
transformations. We prove that three types of well-known classifiers deliver
the same (or very similar) performance over the geometrically perturbed dataset
as over the original dataset. As a result, this perturbation approach incurs
almost no loss of accuracy for these three popular classification methods. To
reach the second goal, we propose a multi-column privacy model to address the
problem of evaluating privacy quality for multidimensional perturbation, and
develop an attack-resilient perturbation optimization method. Using the
proposed privacy metric, we analyze three types of inference attacks: naive
estimation, ICA-based reconstruction, and distribution-based attacks. Based on
the attack analysis, a randomized optimization method is developed to optimize
the perturbation. Our initial experiments show that this approach can provide a
high privacy guarantee while preserving accuracy for the discussed classifiers.
More related geometric transformations will be investigated to meet the
requirements of different privacy-preserving mining tasks and models.
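The idea that a geometric transformation keeps the properties classifiers rely on can be illustrated with a small sketch. It assumes the perturbation is a random orthogonal transform (a rotation, possibly combined with a reflection) followed by a random translation. The function names and toy data are hypothetical and are not taken from the project's MATLAB code. Any additional noise component and the privacy optimization step are left out.

```python
import math
import random


def random_orthogonal(d, rng):
    """Build a random d x d orthogonal matrix via Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the rows accepted so far.
        for q in rows:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip (nearly) dependent draws
            rows.append([a / norm for a in v])
    return rows


def perturb(records, R, t):
    """Apply y = R x + t to every record (each record is a list of d floats)."""
    return [
        [sum(r * x_j for r, x_j in zip(row, x)) + t_i for row, t_i in zip(R, t)]
        for x in records
    ]


def dist(a, b):
    """Euclidean distance between two records."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


rng = random.Random(0)  # fixed seed so the run is repeatable
d = 3
data = [[rng.random() for _ in range(d)] for _ in range(5)]  # toy dataset
R = random_orthogonal(d, rng)
t = [rng.random() for _ in range(d)]
perturbed = perturb(data, R, t)

# Largest change in any pairwise distance caused by the perturbation.
max_gap = max(
    abs(dist(data[i], data[j]) - dist(perturbed[i], perturbed[j]))
    for i in range(len(data))
    for j in range(i + 1, len(data))
)
print(f"largest change in pairwise distance: {max_gap:.2e}")
```

Under these assumptions the pairwise Euclidean distances change only by floating-point rounding error, which is why a classifier that depends only on such distances would be expected to behave the same on the perturbed data.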
MATLAB code.
Note: add the whole directory, including its subdirectories, to the MATLAB
path. The main function is adv_geo_pert1().
Representative papers: