
Differentially Private Learning // Learning Differential Privacy

This is a repo to record my experiments in differentially private learning.

Differential privacy is a mathematical definition of privacy that enables sharing useful information derived from sensitive datasets. It is a property of a mechanism or algorithm applied to a dataset, not a property of the dataset itself (as it is with techniques like anonymization).
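For reference, the standard (ε, δ) formulation (not specific to this repo): a randomized mechanism M is (ε, δ)-differentially private if, for every pair of datasets D and D′ differing in a single record and every set of outcomes S,

    Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ

where ε bounds the privacy loss and δ is the probability with which that bound may fail.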

Machine learning algorithms are routinely applied to sensitive data, such as the financial records used to train automated loan-decision models. There is strong incentive to make such learning differentially private, because trained models can leak information about their training data; a commonly cited example is the recovery of recognizable training images from a facial recognition API [Fredrikson et al. 2015].

Machine learning problems are widely solved using iterative optimization to minimize a loss function, and these iterative methods can be made differentially private. In this repo, we implement differentially private stochastic gradient descent (DP-SGD) to train a logistic regression model. The algorithm (shown below) was first introduced by Abadi et al. 2016 in Deep Learning with Differential Privacy. The idea is to clip each per-example gradient to a bounded norm (clipping is also a commonly used regularization technique) and then add Gaussian noise to the aggregated gradients.

[Figure: the DP-SGD algorithm from Abadi et al. 2016]
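For concreteness, here is a minimal NumPy sketch of a single DP-SGD step for logistic regression. The function name and default values are illustrative, not this repo's API; `C` is the clipping norm and `sigma` the noise multiplier from the algorithm above.

```python
import numpy as np

def dp_sgd_step(w, X_batch, y_batch, lr=0.1, C=1.0, sigma=1.0, rng=None):
    """One noisy gradient step of logistic regression; labels y in {0, 1}."""
    if rng is None:
        rng = np.random.default_rng()
    clipped = []
    for x, y in zip(X_batch, y_batch):
        p = 1.0 / (1.0 + np.exp(-x @ w))  # predicted probability
        g = (p - y) * x                   # per-example gradient of the log loss
        clipped.append(g / max(1.0, np.linalg.norm(g) / C))  # clip L2 norm to C
    # Sum the clipped gradients, add Gaussian noise calibrated to the clipping
    # bound C and noise multiplier sigma, then average over the batch.
    noisy = np.sum(clipped, axis=0) + rng.normal(0.0, sigma * C, size=w.shape)
    return w - lr * noisy / len(X_batch)
```

The key departure from ordinary SGD is that clipping happens per example, so each individual's contribution to the update is bounded before noise is added.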

A key component of such algorithms is tracking the privacy loss accumulated across iterations. The basic and advanced composition theorems lead to loose bounds that grow quickly with the number of steps. Here, we incorporate information about the algorithm itself, such as the fact that each step trains on only a random batch of the data (privacy amplification by subsampling). Further, we use the moments accountant, in which the privacy loss is treated as a random variable and its higher moments are used to obtain a tighter bound. Overall, given δ, the probability of failure of the differentially private mechanism, we can compute an upper bound ε on the privacy loss.
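As a rough illustration of why the choice of accountant matters (this sketch uses the standard basic and advanced composition theorems, not the moments accountant itself), compare how ε accumulates over many (ε, δ)-DP steps:

```python
import math

def basic_composition(eps, k):
    # k-fold basic composition: (k*eps, k*delta)-DP
    return k * eps

def advanced_composition(eps, k, delta_prime=1e-5):
    # Dwork-Rothblum-Vadhan advanced composition:
    # (eps', k*delta + delta_prime)-DP with eps' as below
    return (math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
            + k * eps * (math.exp(eps) - 1))

eps, k = 0.1, 1000
print(basic_composition(eps, k))     # 100.0
print(advanced_composition(eps, k))  # ~25.7
```

The moments accountant of Abadi et al. tightens this further by tracking the moments of the privacy loss random variable across subsampled Gaussian steps.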

Results on a synthetic dataset and on a breast cancer dataset can be seen in notebooks/logistic_regression-syn_data.ipynb and notebooks/logistic_regression-breast_cancer.ipynb.

Note that some implementations of differentially private logistic regression are already available, such as the one provided by IBM's diffprivlib, which implements the vector mechanism for objective perturbation described by Chaudhuri et al. A quick exploration of this library is also provided in this repo under ibm-diffprivlib.
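For comparison, a quick sketch of how diffprivlib's DP logistic regression might be used on breast cancer data; the `epsilon` and `data_norm` values below are illustrative assumptions, not tuned choices.

```python
# Requires: pip install diffprivlib scikit-learn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from diffprivlib.models import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# data_norm bounds the L2 norm of each input row; diffprivlib warns if
# it is omitted, since the bound is needed for the privacy guarantee.
clf = LogisticRegression(epsilon=1.0, data_norm=4000.0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```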

Forthcoming work includes differentially private selection of hyperparameters following the methodology of Liu and Talwar 2019.
