Repo of projects using machine learning, statistical modeling, and deep learning in data science

SiyuWu528/DataScience

Machine Learning, Statistical Modeling, and Deep Learning Notebooks

Description

This repository is a curated collection of Jupyter notebooks designed to demonstrate and document the processes of machine learning, statistical modeling, and deep learning. The notebooks serve as a comprehensive guide that covers the theory and practical application of data science techniques in different scenarios. Each notebook is self-contained and focuses on a specific topic within the data science workflow, from data preparation and visualization to advanced model training and evaluation.

Key features of this project:

  • Modularity: Each notebook is designed to be independent and focuses on a single aspect of data science.
  • Transferability: While the examples provided are self-contained, the methods and code snippets are designed to be easily transferable to other data science projects.
  • Best Practices: Emphasizes data science best practices, including code readability, data exploration, and effective visualization techniques.
  • Reproducibility: With detailed comments and explanations, other users can follow along and reproduce the results or adapt the workflows to their own datasets.

Installation

To set up a local development environment:

  1. Clone the repository:
git clone https://github.com/SiyuWu528/data_science.git
  2. Navigate to the cloned repository (git names the folder after the last part of the clone URL):
cd data_science
  3. Create a virtual environment (recommended) to keep the project's dependencies separate from your global Python environment.

For virtualenv:

virtualenv env
source env/bin/activate  # On Windows use `env\Scripts\activate`

For conda environments:

conda create --name ds-notebooks python=3.8
conda activate ds-notebooks
  4. Start Jupyter Notebook or JupyterLab:
jupyter notebook

or

jupyter lab
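
Before running any notebook, it can help to confirm that Jupyter is using the isolated environment rather than the global interpreter. The following is a minimal sketch, not part of the repository: the function name `describe_environment` is hypothetical, and the check relies on two assumptions: venv/virtualenv environments make `sys.prefix` differ from `sys.base_prefix`, and conda exports `CONDA_DEFAULT_ENV` while an environment is active (including `base`).

```python
import os
import sys


def describe_environment() -> str:
    """Return a short label for the kind of Python environment in use.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Inside a venv/virtualenv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    if sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return f"virtual environment: {sys.prefix}"
    # Assumption: conda sets CONDA_DEFAULT_ENV when an environment is active.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda environment: {conda_env}"
    return "global interpreter (no isolated environment detected)"


if __name__ == "__main__":
    print(describe_environment())
```

Running this in the first cell of a notebook shows which interpreter the kernel actually uses, which may differ from the shell that launched Jupyter.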

Usage

The repository is organized into several notebooks, each focusing on a distinct topic within machine learning, statistical modeling, or deep learning. The topic appears as the last part of each notebook's file name.

To begin:

  1. Launch Jupyter Notebook or JupyterLab.
  2. Navigate to the notebooks/ directory.
  3. Open the notebook of your choice.
  4. Read through the explanations and run each cell sequentially to understand how the code works.
  5. Feel free to modify the code to experiment with different datasets or parameters.
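
To pick a notebook without opening the file browser, the topics can be listed from the file names. This is a rough sketch under stated assumptions: it assumes the notebooks live in a `notebooks/` folder relative to the current directory and that the parts of a file name are separated by underscores. The README does not state the separator, so `separator` is a guess, and both function names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def notebook_topic(path: Path, separator: str = "_") -> str:
    """Return the last separator-delimited part of a notebook's file name.

    Assumption: the topic is the final part of the name, split on `separator`.
    """
    return path.stem.split(separator)[-1]


def list_notebooks(folder: str = "notebooks") -> List[Tuple[str, str]]:
    """Return (file name, topic) pairs for every .ipynb file in `folder`."""
    root = Path(folder)
    if not root.is_dir():
        # Nothing to list if the folder is missing from the current directory.
        return []
    return [(p.name, notebook_topic(p)) for p in sorted(root.glob("*.ipynb"))]


if __name__ == "__main__":
    for name, topic in list_notebooks():
        print(f"{topic:<30} {name}")
```

If the actual naming convention uses a different separator, pass it explicitly, for example `notebook_topic(path, separator="-")`.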

Structure

Here is an overview of the project structure:

data/: This folder contains datasets used across the notebooks.

notebooks/: Jupyter notebooks are organized here, with a clear naming convention for ease of navigation.
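
Because the notebooks and the datasets sit in sibling folders, a notebook needs a path to `data/` that does not depend on where Jupyter was started. The sketch below shows one possible way to do this, assuming the layout described above (`data/` and `notebooks/` under a common root). The helper names `find_repo_root` and `data_path` and the file name `example.csv` are hypothetical and do not come from the repository.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_repo_root(start: Path, markers: Sequence[str] = ("data", "notebooks")) -> Path:
    """Walk upward from `start` to the first folder containing every marker folder."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if all((candidate / marker).is_dir() for marker in markers):
            return candidate
    raise FileNotFoundError(f"No folder containing {list(markers)} above {start}")


def data_path(file_name: str, start: Optional[Path] = None) -> Path:
    """Build the path to a file in data/, starting the search at `start` or the cwd."""
    root = find_repo_root(start if start is not None else Path.cwd())
    return root / "data" / file_name


if __name__ == "__main__":
    try:
        # "example.csv" is a placeholder, not a file known to exist in the repo.
        print(data_path("example.csv"))
    except FileNotFoundError as err:
        print(err)
```

Inside a notebook, `data_path("example.csv")` can then be passed to whichever loader the notebook uses, regardless of the directory the kernel was launched from.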

License

This project is released under the MIT License.

Contact

If you have any questions or want to reach out regarding the project, please contact:

Siyu Wu - Project Lead - SiyuWu528

Acknowledgements

Thomas Jefferson Lab and the Penn State Berks Physics Department provide data and support for the generative models (GANs) as part of Dr. Alex Prokudin's ongoing project at the intersection of nuclear physics and deep learning. Most of the machine learning models were built for homework and projects in the 'Datamining 1' class taught by Dr. Lin Lu, using data provided by her class. Some of the deep learning and statistical models were built for homework and projects in the 'Datamining 2' class taught by Dr. Aron Laska, using data provided by his class.
