Supervised machine learning project repository template

This template repository provides an organizational structure for quickly setting up predictive modeling (supervised machine learning) projects. The directory structure accommodates projects that focus mainly on model prototyping (i.e., exploration and assessment) rather than on building a production-level data product. For new repositories created from this template, replace this README with README_template.md.

# Directory tree

{project name}
├── README.md
├── README_template.md
├── data
│   ├── processed
│   │   └── dataset_0001
│   ├── raw
│   │   └── dataset_0001
│   └── tmp
│       └── dataset_0001
├── models
│   ├── 0001
│   │   └── README.md
│   └── README.md
├── notebooks
│   ├── 1_preprocessing
│   ├── 2_eda
│   ├── 3_modeling
│   └── 4_reports
├── results
│   ├── 1_preprocessing
│   │   ├── 0001
│   │   │   ├── 0001
│   │   │   │   ├── figures
│   │   │   │   ├── serialized
│   │   │   │   └── tables
│   │   │   └── README.md
│   │   └── README.md
│   ├── 2_eda
│   │   ├── 0001
│   │   │   ├── 0001
│   │   │   │   ├── figures
│   │   │   │   ├── serialized
│   │   │   │   └── tables
│   │   │   └── README.md
│   │   └── README.md
│   ├── 3_modeling_and_inference
│   │   ├── 0001
│   │   │   ├── 0001
│   │   │   │   ├── figures
│   │   │   │   ├── serialized
│   │   │   │   └── tables
│   │   │   └── README.md
│   │   └── README.md
│   └── README.md
├── src
│   ├── bash
│   ├── python
│   └── r
└── tests
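
As a rough illustration, the layout in the tree above could be recreated programmatically. The sketch below is not part of the template: `create_skeleton` is a hypothetical helper, it assumes the `dataset_0001` entries are directories, and it creates only folders (the README.md files are left out).

```python
from pathlib import Path

# Stage names copied from the directory tree above.
NOTEBOOK_STAGES = ["1_preprocessing", "2_eda", "3_modeling", "4_reports"]
RESULT_STAGES = ["1_preprocessing", "2_eda", "3_modeling_and_inference"]


def create_skeleton(root):
    """Create the template's folders under `root` (hypothetical helper)."""
    root = Path(root)
    dirs = []
    # Assumption: dataset_0001 is a directory holding that dataset's files.
    for kind in ("processed", "raw", "tmp"):
        dirs.append(root / "data" / kind / "dataset_0001")
    dirs.append(root / "models" / "0001")
    for stage in NOTEBOOK_STAGES:
        dirs.append(root / "notebooks" / stage)
    # results/<stage>/<first ID>/<second ID>/<output kind>
    for stage in RESULT_STAGES:
        for leaf in ("figures", "serialized", "tables"):
            dirs.append(root / "results" / stage / "0001" / "0001" / leaf)
    for lang in ("bash", "python", "r"):
        dirs.append(root / "src" / lang)
    dirs.append(root / "tests")
    for d in dirs:
        # exist_ok=True makes repeated calls harmless.
        d.mkdir(parents=True, exist_ok=True)
    return dirs
```

Calling `create_skeleton("my_project")` a second time leaves existing folders untouched, so it can also be used to fill in folders that are missing.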

# Subdirectory descriptions

data: Contains the data used for model training and evaluation. Original data files are retrieved and stored in raw. Intermediate files produced during processing are held in tmp, and the final files used for model prototyping are stored in processed. Nothing should depend on files in tmp, so they can be deleted as needed.
models: Contains trained models that are ready to be loaded into a programming environment and applied to new data. These files will mainly be serialized data objects such as R .rds files or Python pickle files.
notebooks: Contains interactive notebooks (e.g., Jupyter) for the various stages of a predictive modeling project. Data processing is logged in 1_preprocessing, exploratory data analysis is logged in 2_eda, model building and evaluation are logged in 3_modeling, and deliverable reports summarizing data characteristics, model performance, etc. are stored in 4_reports.
results: Contains visualizations and tables for deliverable reports, as well as serialized results files that can be loaded into programming environments for further use. The top level follows the same organizational logic as notebooks/. The subdirectories further divide results by type of analysis and then by analysis instance, using a sequential numbering system.
src: Contains source code and utility scripts used in data preparation and analysis.
tests: Contains unit tests for the code maintained in the src folder.
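
For the Python case, storing a trained model under models/ and loading it back might look like the sketch below. It uses the standard library `pickle` module; `save_model`, `load_model`, and the default file name `model.pkl` are hypothetical names chosen for illustration, not something the template prescribes.

```python
import pickle
from pathlib import Path


def save_model(model, models_dir, model_id, filename="model.pkl"):
    """Pickle `model` into <models_dir>/<model_id>/<filename> and return the path."""
    target_dir = Path(models_dir) / model_id
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / filename
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    return path


def load_model(path):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

Any picklable object works here, so a fitted estimator could be saved with `save_model(fitted, "models", "0001")` and restored later with `load_model`.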

Note: Model prototyping typically generates multiple iterations of a processed dataset, model, or analysis. The directory structure accounts for this with numeric subdirectories that group the files belonging to a specific dataset, model, analysis type, or analysis iteration. In the directory tree, these subdirectories are numbered incrementally with a 4-digit ID, which can accommodate up to 9999 analysis types or variations.
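
One way to pick the next ID in this scheme is sketched below. `next_id` is a hypothetical helper, and it assumes that IDs are assigned by taking the highest existing zero-padded directory name and adding one.

```python
from pathlib import Path


def next_id(parent, width=4):
    """Return the next zero-padded ID for numeric subdirectories of `parent`."""
    parent = Path(parent)
    existing = []
    if parent.is_dir():
        for entry in parent.iterdir():
            # Only fixed-width numeric directory names count as IDs,
            # so files such as README.md are ignored.
            if entry.is_dir() and len(entry.name) == width and entry.name.isdecimal():
                existing.append(int(entry.name))
    candidate = max(existing, default=0) + 1
    if candidate >= 10 ** width:
        raise ValueError(f"no {width}-digit IDs left under {parent}")
    return f"{candidate:0{width}d}"
```

For example, if models/ already holds 0001 and 0002, `next_id("models")` returns `"0003"`; in an empty or missing folder it returns `"0001"`.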

bryancquach/supervised_ml_project_template
