
Differentially Private Learning // Learning Differential Privacy

This is a repo to record my experiments in differentially private learning.

Differential privacy is a mathematical definition of privacy that enables sharing useful information derived from sensitive datasets. It is a property of a mechanism or algorithm applied to a dataset, not a property of the dataset itself (as it is with techniques like anonymization).
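For reference, the standard (ε, δ) formulation (not specific to this repo): a randomized mechanism M is (ε, δ)-differentially private if, for every pair of datasets D and D′ differing in a single record and every set of outcomes S,

    Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ

where ε bounds the privacy loss and δ is the probability with which that bound may fail.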

Machine learning algorithms are routinely applied to sensitive data, such as the financial records used to train automated loan-decision models. There is strong incentive to make such learning differentially private, because trained models can leak information about their training data; a commonly cited example is the recovery of recognizable training images from a facial recognition API [Fredrikson et al. 2015].

Machine learning problems are widely solved using iterative optimization to minimize a loss function, and these iterative methods can be made differentially private. In this repo, we implement differentially private stochastic gradient descent (DP-SGD) to train a logistic regression model. The algorithm (shown below) was first introduced by Abadi et al. 2016 in Deep Learning with Differential Privacy. The idea is to clip each per-example gradient to a bounded norm (clipping is also a commonly used regularization technique) and then add Gaussian noise to the aggregated gradients.

[Figure: the DP-SGD algorithm from Abadi et al. 2016]
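For concreteness, here is a minimal NumPy sketch of a single DP-SGD step for logistic regression. The function name and default values are illustrative, not this repo's API; `C` is the clipping norm and `sigma` the noise multiplier from the algorithm above.

```python
import numpy as np

def dp_sgd_step(w, X_batch, y_batch, lr=0.1, C=1.0, sigma=1.0, rng=None):
    """One noisy gradient step of logistic regression; labels y in {0, 1}."""
    if rng is None:
        rng = np.random.default_rng()
    clipped = []
    for x, y in zip(X_batch, y_batch):
        p = 1.0 / (1.0 + np.exp(-x @ w))  # predicted probability
        g = (p - y) * x                   # per-example gradient of the log loss
        clipped.append(g / max(1.0, np.linalg.norm(g) / C))  # clip L2 norm to C
    # Sum the clipped gradients, add Gaussian noise calibrated to the clipping
    # bound C and noise multiplier sigma, then average over the batch.
    noisy = np.sum(clipped, axis=0) + rng.normal(0.0, sigma * C, size=w.shape)
    return w - lr * noisy / len(X_batch)
```

The key departure from ordinary SGD is that clipping happens per example, so each individual's contribution to the update is bounded before noise is added.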

A key component of such algorithms is tracking the privacy loss accumulated across iterations. The basic and advanced composition theorems lead to loose bounds that grow quickly with the number of steps. Here, we incorporate information about the algorithm itself, such as the fact that each step trains on only a random batch of the data (privacy amplification by subsampling). Further, we use the moments accountant, in which the privacy loss is treated as a random variable and its higher moments are used to obtain a tighter bound. Overall, given δ, the probability of failure of the differentially private mechanism, we can compute an upper bound ε on the privacy loss.
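As a rough illustration of why the choice of accountant matters (this sketch uses the standard basic and advanced composition theorems, not the moments accountant itself), compare how ε accumulates over many (ε, δ)-DP steps:

```python
import math

def basic_composition(eps, k):
    # k-fold basic composition: (k*eps, k*delta)-DP
    return k * eps

def advanced_composition(eps, k, delta_prime=1e-5):
    # Dwork-Rothblum-Vadhan advanced composition:
    # (eps', k*delta + delta_prime)-DP with eps' as below
    return (math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
            + k * eps * (math.exp(eps) - 1))

eps, k = 0.1, 1000
print(basic_composition(eps, k))     # 100.0
print(advanced_composition(eps, k))  # ~25.7
```

The moments accountant of Abadi et al. tightens this further by tracking the moments of the privacy loss random variable across subsampled Gaussian steps.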

Results on a synthetic dataset and on a breast cancer dataset can be seen in notebooks/logistic_regression-syn_data.ipynb and notebooks/logistic_regression-breast_cancer.ipynb.

Note that some implementations of differentially private logistic regression are already available, such as the one provided by IBM's diffprivlib, which implements the vector mechanism for objective perturbation described by Chaudhuri et al. A quick exploration of this library is also provided in this repo under ibm-diffprivlib.
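For comparison, a quick sketch of how diffprivlib's DP logistic regression might be used on breast cancer data; the `epsilon` and `data_norm` values below are illustrative assumptions, not tuned choices.

```python
# Requires: pip install diffprivlib scikit-learn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from diffprivlib.models import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# data_norm bounds the L2 norm of each input row; diffprivlib warns if
# it is omitted, since the bound is needed for the privacy guarantee.
clf = LogisticRegression(epsilon=1.0, data_norm=4000.0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```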

Forthcoming work includes differentially private selection of hyperparameters following the methodology of Liu and Talwar 2019.
