Big Data Set Privacy Preserving through Sensitive Attribute-based Grouping
Attacks on database privacy are increasing because of the high value of the private information stored in big data sets. The public's privacy is under threat as adversaries continually compromise popular targets such as bank accounts. We observe that existing models such as K-anonymity group records based on quasi-identifiers, which substantially harms data utility. Motivated by this, we propose a sensitive attribute-based privacy model. Our model is an early work that groups records based on sensitive attributes instead of the quasi-identifiers commonly used in existing models. Random shuffling is used to maximize the information entropy inside a group while the marginal distribution remains the same before and after shuffling; our method therefore maintains better data utility than existing models. We have conducted extensive experiments, which confirm that our model achieves a satisfactory privacy level without sacrificing data utility while guaranteeing higher efficiency.
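As a rough illustration of the idea of grouping on a sensitive attribute and shuffling within each group, the following minimal Python sketch shows one possible reading of it. It is not the authors' algorithm: the function name `group_and_shuffle`, the fixed `group_size`, the sort-then-partition grouping rule, and the toy fields `zip` and `disease` are all assumptions made for illustration. The sketch only demonstrates that permuting values inside a group leaves the marginal distribution of the sensitive attribute unchanged.

```python
import random
from collections import Counter


def group_and_shuffle(records, sensitive_key, group_size, seed=None):
    """Hypothetical sketch: form groups on the sensitive attribute,
    then randomly permute the sensitive values inside each group."""
    rng = random.Random(seed)
    # Assumption: groups are formed by sorting on the sensitive attribute
    # and cutting the sorted list into consecutive chunks.
    ordered = sorted(records, key=lambda r: r[sensitive_key])
    result = []
    for start in range(0, len(ordered), group_size):
        group = ordered[start:start + group_size]
        values = [r[sensitive_key] for r in group]
        # A permutation keeps the multiset of values, so the marginal
        # distribution of the sensitive attribute is preserved.
        rng.shuffle(values)
        for rec, value in zip(group, values):
            new_rec = dict(rec)  # copy so the input is not modified
            new_rec[sensitive_key] = value
            result.append(new_rec)
    return result


# Toy data (invented for illustration only).
records = [
    {"zip": "10001", "disease": "flu"},
    {"zip": "10002", "disease": "asthma"},
    {"zip": "10003", "disease": "flu"},
    {"zip": "10004", "disease": "diabetes"},
]
shuffled = group_and_shuffle(records, "disease", group_size=2, seed=0)
before = Counter(r["disease"] for r in records)
after = Counter(r["disease"] for r in shuffled)
print(before == after)
```

The comparison of the two `Counter` objects checks the marginal-distribution property directly; how groups should actually be chosen to maximize entropy is left open here, since the abstract does not specify it.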