add new workflows to readme #547

Merged
merged 1 commit into from
Sep 28, 2023
125 changes: 60 additions & 65 deletions README.md
@@ -13,52 +13,29 @@ This textbook is offered under
the [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License](https://creativecommons.org/licenses/by-nc-sa/4.0/).
See [the license file](LICENSE.md) for more information.

## Contributing
## Development

Primary development in this repository happens on the `main` branch. If you
want to contribute to the book, please branch off of `main` and make a pull
request into `main`.
### Setup

The `production` branch contains the source material for the live, publicly
viewable HTML book. The website is served from the `gh-pages` branch, which is
automatically built from the `production` branch.
Building the book requires Docker (installation instructions: https://docs.docker.com/get-docker/).

### Update build environment
You can update the build environment for the book by making changes to
`Dockerfile` in the root of the repository in the `main` branch. If you push
any changes to the `Dockerfile` on the `main` branch, GitHub will trigger a
rebuild of the docker image, push it to DockerHub, and update the
`build_html.sh` and `build_pdf.sh` scripts with the new image tag.
### Build locally

### Update public html
You can update the live, publicly viewable HTML book by making changes to any
`*.Rmd` file, or any file in the `img/` or `data/` folders in the `production`
branch. If you push any changes to these files/folders on the `production`
branch, GitHub will trigger a rebuild of the public HTML site and push it to
the `gh-pages` branch.
You must have at least 8GB of RAM (and ideally more like 16GB RAM) to build the book.

## Building the book locally

In order to build the book, you need to install [Docker](https://docs.docker.com/get-docker/).
You must have at least **8GB of RAM** (and ideally at least 16GB RAM) to build the book.

To build the **html version** of the book, navigate to the repository root folder and run
You can build the HTML version of the book on your own machine by running
```
./build_html.sh
```
from the command line. This command automatically spawns a docker container
with the `ubcdsci/intro-to-ds` image, renders the book within the container,
and then stops the container. The book HTML files will be located in the `docs/` folder
after the build completes. If you did not already have the `ubcdsci/intro-to-ds` image pulled,
the script will automatically pull the image from DockerHub.

To build the **PDF version** of the book, instead run
```
in the root directory of this repository. The book can be viewed in your browser by opening the `docs/index.html` file.

You can build the PDF version of the book on your own machine by running
```
./build_pdf.sh
```
This command again spawns a docker container and renders the PDF version of the book inside the container.
in the root directory of this repository. The book can be viewed in a PDF reader by opening `docs/latex/python.pdf`.

### Working with RStudio (HTML only)
#### Working with RStudio (HTML only)

If you want to edit the source material and build the book using RStudio, navigate to the repository root and run
```
@@ -72,9 +49,40 @@ bookdown::render_book('index.Rmd', 'bookdown::gitbook')
```
When you are done working, make sure to type `docker-compose down` to shut down the container.

### Contributing

Primary development in this repository happens on the `main` branch. If you want to contribute to the book,
please branch off of `main` and make a pull request into `main`. You cannot commit directly to `main`.

The `production` branch contains the source material corresponding to the current publicly-viewable version of the book website.

The `gh-pages` branch serves the current book website at https://datasciencebook.ca.

### Workflows

#### Book deployment

You can update the live, publicly viewable HTML book by making changes to the `source/` folder in the `production` branch (e.g. by merging `main` into `production`).
GitHub will trigger a rebuild of the public HTML site, and store the built book in the root folder of the `gh-pages` branch.

#### `main` deploy previews

Any commit to `source/**` on the `main` branch (from a merged PR) will trigger a rebuild of the development preview site served at `https://datasciencebook.ca/dev`.
The built preview book will be stored in the `dev/` folder on the `gh-pages` branch.

#### PR deploy previews

Any PR to `source/` will trigger a build of a PR preview site at `https://datasciencebook.ca/pull###`, where `###` is the number of the pull request.
The built preview book will be stored in the `pull###/` folder on the `gh-pages` branch.
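
As a minimal sketch of the naming scheme above, the preview URL for a pull request can be derived from its number. The helper name `preview_url` is hypothetical and is not part of the repository's workflows; only the `https://datasciencebook.ca/pull###` pattern comes from this README.

```shell
# Hypothetical helper (not part of the repository): print the PR preview URL
# following the https://datasciencebook.ca/pull### pattern described above.
preview_url() {
  # Reject empty or non-numeric input; $1 is the pull request number.
  case "$1" in
    ''|*[!0-9]*) echo "usage: preview_url <PR number>" >&2; return 1 ;;
  esac
  printf 'https://datasciencebook.ca/pull%s\n' "$1"
}

preview_url 547
```

The same number names the folder on the `gh-pages` branch, so the preview for PR 547 would be stored in `pull547/`.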

#### Build environment updates

Any PR to `Dockerfile` will trigger a rebuild of the docker image, push it to DockerHub, and update the image tags in the `build_html.sh` and `build_pdf.sh` scripts on the PR automatically.
This new build environment will be used for the PR deploy preview mentioned above.

## Style Guide

#### General
### General
- **80 character line limit!** This is necessary to make git diffs useful
- numbers in text should be English words ("four common mistakes" not "4 common mistakes") unless there are units (40km, not forty km)
- use Oxford commas ("a, b, and c" not "a, b and c")
@@ -90,7 +98,7 @@ When you are done working, make sure to type `docker-compose down` to shut down
There are likely exceptions to this rule though.
- Book titles in the text should be typeset in italics (e.g. *R for Data Science*)
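
The 80 character line limit from the list above can be checked mechanically. The following is a rough sketch, assuming a POSIX `awk` is available; the helper name `long_lines` is hypothetical and not part of the repository.

```shell
# Hypothetical helper (not part of the repository): list every line longer
# than 80 characters as file:line:length.
long_lines() {
  awk 'length($0) > 80 { printf "%s:%d:%d\n", FILENAME, FNR, length($0) }' "$@"
}
```

It could be run as `long_lines source/*.Rmd`, assuming the chapter files live in `source/`. Whether `length` counts bytes or characters depends on the `awk` implementation and locale, so lines with non-ASCII text may be over-reported.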

#### Code blocks
### Code blocks
- Use the knitr label format `##-[name with only alphanumeric + hyphens]` where
the `##` is the 2-digit chapter number, e.g. `03-test-name` for a label `test-name` in chapter 3
- Make sure to get syntax highlighting by specifying the language in each code block:
@@ -115,7 +123,7 @@ When you are done working, make sure to type `docker-compose down` to shut down
- use `slice`, `slice_min`, `slice_max` (not `top_n`)
- just `pull(colname)`, don't `select` first
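
The knitr label format from the start of this section (two-digit chapter number, a hyphen, then a name of alphanumerics and hyphens) can be expressed as a regular expression. This is only a sketch; the helper name `is_valid_label` is hypothetical, and the pattern is one reading of the rule above.

```shell
# Hypothetical helper (not part of the repository): succeed only if the
# label looks like NN-name, where name contains alphanumerics and hyphens.
is_valid_label() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{2}-[A-Za-z0-9-]+$'
}

# Try a few labels; only the first follows the format.
for label in 03-test-name test-name 3-test-name 03-test_name; do
  if is_valid_label "$label"; then
    echo "ok:  $label"
  else
    echo "bad: $label"
  fi
done
```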

#### Section headings
### Section headings
- All (sub)section headings should be sentence case ("Loading a tabular data set", not "Loading a Tabular Data Set")
- Make sure that subsections occur in 1-step hierarchies (no subsubsection directly below subsection, for example)
- Make sure that `{-}` is used wherever unnumbered headings are required
@@ -126,11 +134,11 @@ bookdown::gitbook:
toc_depth: 2
```

#### Learning objectives
### Learning objectives
- when saying that students will do things in code, always say "in R"
- "you will be able to" (not "students will be able to", "the reader will be able to")

#### Captions
### Captions
- captions should be sentence formatted and end with a period
- If you have special characters (particularly underscores, quotation marks, plus signs, other LaTeX math symbols) make sure to separate
the caption out of the code chunk like so
@@ -143,10 +151,10 @@ bookdown::gitbook:
\`\`\`
```

#### Equations
### Equations
- make sure all equations get capitalized labels ("Equation \\@ref(blah)", not "equation below" or "equation above")

#### Figures
### Figures
- make sure all figures get (capitalized) labels ("Figure \\@ref(blah)", not "figure below" or "figure above")
- make sure all figures get captions
- specify image widths of pngs and jpegs in terms of linewidth percent
@@ -160,21 +168,21 @@ for plots we create in R use `fig.width` and `fig.height`.
- Fig size for bar charts should be: `fig.width=5, fig.height=3` (figs 1.7 & 1.8 are exceptions, so that we can read the axis labels)
- cropping width for syntax diagrams is 1625 (done using `image_crop`)

#### Tables
### Tables
- make sure all tables get capitalized labels ("Table \\@ref(blah)", not "table below" or "table above")
- make sure all tables get captions
- make sure the row + column spacing is reasonable
- Do not put links in table captions; they break PDF rendering
- Do not put underscores in table captions; they break PDF rendering

#### Note boxes
### Note boxes
- note boxes should be typeset as quote boxes using `>` and start with **Note:**

#### Bibliography
### Bibliography
- do not put "et al" or "and others"; always use the full list of authors, BibTeX will choose how to abbreviate
- read https://trevorcampbell.me/html/bibtex.html and make sure our bib follows this convention

#### Naming conventions
### Naming conventions
- K-means (not $K$-\*, K means, Kmeans)
- K-nearest neighbors (not $K$-\*, K nearest neighbors, K nearest neighbor, use US spelling neighbor not neighbour). Note that "K-nearest neighbor" is not the singular form; "K-nearest neighbors" is
- K-NN (not $K$-\*, KNN, K NN, $K$NN, K-nn)
@@ -191,27 +199,27 @@ for plots we create in R use `fig.width` and `fig.height`.
- numerical variable (not quantitative variable)
- categorical variable (not class variable)

#### Punctuation
### Punctuation
- emdashes should have no surrounding spaces. `This kind of typesetting—which is awesome—is correct!` and `Typesetting with spaces around em-dashes — which is bad — is not correct`
- make sure `\index` commands don't break punctuation spacing. E.g. `This is an item \index{item}; it is good` will typeset with an erroneous space after item, i.e. `This is an item ; it is good`

#### Common typos to check for
### Common typos to check for
- RMPSE: should be RMSPE
- boostrap: should be bootstrap
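
A quick way to look for the two typos above is a `grep` over the source files. This is a minimal sketch; the helper name `find_typos` is hypothetical, and `-H` (always print the file name) is assumed to be supported, as it is in GNU and BSD `grep`.

```shell
# Hypothetical helper (not part of the repository): report the known typos
# as file:line:text. Exits non-zero when nothing is found.
find_typos() {
  grep -HnE 'RMPSE|boostrap' "$@"
}
```

It could be run as `find_typos source/*.Rmd`, assuming the chapter files live in `source/`.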

#### Use American spelling
### Use American spelling
Generally the book uses American spelling. Some common British vs American and Canadian vs American gotchas:
- o vs ou: neighbor and color (not neighbour and colour)
- single vs double ell: labeling and labeled (not labelling and labelled)
- z vs s: summarize (not summarise)
- c vs s: defense (not defence)
- er vs re: center (not centre)
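
The British spellings listed above can be searched for in the same spirit. This is a sketch only: the helper name `find_british_spellings` is hypothetical, the pattern covers just the examples in this list, and matches inside quoted text or code will need manual review.

```shell
# Hypothetical helper (not part of the repository): report lines containing
# the British spellings listed above as file:line:text.
find_british_spellings() {
  grep -HnE 'neighbour|colour|labelling|labelled|summarise|defence|centre' "$@"
}
```

Because the pattern is not anchored to whole words, it also catches variants such as "neighbours" and "summarised".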

#### Whitespace
### Whitespace
We need a line of whitespace before and after code fences (code surrounded by three backticks above and below). This is for readability,
and it is essential for figure captions.

#### PDF Output
### PDF Output
These are the absolute last steps when rendering the PDF output:
- Look for and fix bad line breaks (e.g. with only one word on the next line, orphans, and widows)
- Look for and fix bad line wraps in code and text
@@ -226,19 +234,6 @@ These are absolute last steps when rendering the PDF output:
sense in the hardcopy book version (i.e. nothing like "click this"). Many links appear in the additional resources: make sure the
text-replacement of the URL contains enough information for someone to find the resource (without being able to click the link)

#### HTML Output
### HTML Output
- Look for broken references (I *think* these end up as `??`)
- Look for uncentered images

## Updating the textbook data
Data sets are collected and curated by `data/retrieve_data.ipynb`. To run that notebook in the Docker container type the following in the terminal:

```
docker run --rm -it -p 8888:8888 -v "$PWD":/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds jupyter notebook --ip=0.0.0.0 --allow-root
```

## Repository Organization / Important Files
- The files `index.Rmd` and `##-name.Rmd` are [R Markdown](https://rmarkdown.rstudio.com/) chapter contents to be parsed by [Bookdown](https://bookdown.org/)
- `_bookdown.yml` sets the output directory (`docs/`) and default chapter name
- `img/` contains custom images to be used in the text; note this is not all of the images as some are generated by R code when compiling
- `data/` stores datasets processed during compile