Add TODOs to Dataset Files #103

Open: wants to merge 5 commits into base: master
2 changes: 2 additions & 0 deletions man/ChromHMMiterations.Rd
@@ -12,6 +12,8 @@
\usage{data("ChromHMMiterations")}
\format{
Named list of 3 data.frames: metrics, transition, emission.

TODO add codebook
}
\source{
https://github.com/tdhock/ChromHMM-viz/blob/master/iterations.R
4 changes: 4 additions & 0 deletions man/FluView.Rd
@@ -6,6 +6,10 @@
}
\description{
Data about Flu outbreaks.

TODO need source

TODO need codebook
}
\usage{data("FluView")}
\format{
2 changes: 2 additions & 0 deletions man/FunctionalPruning.Rd
@@ -9,6 +9,8 @@
}
\usage{data("FunctionalPruning")}
\format{
TODO add codebook

a named list of 4 data.frames
}
\source{
5 changes: 4 additions & 1 deletion man/PeakConsistency.Rd
@@ -10,7 +10,10 @@
the peak.
}
\usage{data("PeakConsistency")}
\format{A list of four data.frames: model, truth, signal, guess.
\format{
TODO add codebook

A list of four data.frames: model, truth, signal, guess.
}
\source{
https://github.com/tdhock/PeakSegJoint-paper/blob/master/figure-consistency.R
2 changes: 2 additions & 0 deletions man/TestROC.Rd
@@ -9,6 +9,8 @@ Five peak detection models were evaluated.
}
\usage{data("TestROC")}
\format{
TODO add codebook

A list of two data frames.
}
\source{
2 changes: 2 additions & 0 deletions man/UStornadoes.Rd
@@ -9,6 +9,8 @@ Tornadoes in the United States from 1950 to 2012
}
\usage{data(UStornadoes)}
\format{
TODO add codebook

A data frame with 41620 observations on the following 32 variables.
\describe{
\item{\code{fips}}{a numeric vector}
2 changes: 2 additions & 0 deletions man/VariantModels.Rd
@@ -5,6 +5,8 @@
Error rates of supervised learning methods for variant calling
}
\description{
TODO add codebook

Several supervised machine learning models were applied to the binary
classification task of predicting True Positive or False Positive
variants, using several filtering scores as input.
2 changes: 2 additions & 0 deletions man/WorldBank.Rd
@@ -9,6 +9,8 @@
}
\usage{data(WorldBank)}
\format{
TODO add codebook

A data frame with 11342 observations on the following 15 variables.
\describe{
\item{\code{iso2c}}{a character vector}
2 changes: 2 additions & 0 deletions man/climate.Rd
@@ -13,6 +13,8 @@ Temperatures are given in degrees celsius (original data had Kelvin).
}
\usage{data(climate)}
\format{
TODO add codebook

A data frame with 41472 observations on the following 16 variables.
\describe{
\item{\code{time}}{a numeric vector}
2 changes: 2 additions & 0 deletions man/compare.Rd
@@ -15,6 +15,8 @@ level curves of the learned functions.
}
\usage{data(compare)}
\format{
TODO add codebook

List of 4 data.frames: error contains the test error of the learned
models, bayes contains the Bayes classification error of the latent
ranking function applied to the test data, rank contains the ranking
2 changes: 2 additions & 0 deletions man/generation.loci.Rd
@@ -10,6 +10,8 @@ Evolution simulation
}
\usage{data(generation.loci)}
\format{
TODO add codebook

A data frame with 120000 observations on the following 4 variables.
\describe{
\item{\code{locus}}{a numeric vector}
4 changes: 4 additions & 0 deletions man/intreg.Rd
@@ -10,6 +10,10 @@ observed several noisy piecewise constant signals, and we have
weak labels about how many change-points occur in several regions. Max
margin interval regression is an algorithm that uses this information to
learn a penalty function for accurate change-point detection.

TODO need source

TODO need codebook
}
\usage{data(intreg)}
\format{There are 7 related data.frames: signals contains the noisy
2 changes: 2 additions & 0 deletions man/malaria.Rd
@@ -14,6 +14,8 @@
}
\usage{data("malaria")}
\format{
TODO add codebook

List of 8 data.frames:
$ error.variants :'data.frame': 18800 obs. of 19 variables:
$ regions :'data.frame': 14 obs. of 6 variables:
2 changes: 2 additions & 0 deletions man/mixtureKNN.Rd
@@ -9,6 +9,8 @@
}
\usage{data("mixtureKNN")}
\format{
TODO add codebook

Named list of 9 data.frames.
}
\source{
5 changes: 4 additions & 1 deletion man/montreal.bikes.Rd
@@ -8,7 +8,10 @@
Data on montreal bike usage, paths, and accidents, 2009-2014.
}
\usage{data("montreal.bikes")}
\format{A named list of four data.frames. counter.counts contains dates
\format{
TODO add codebook

A named list of four data.frames. counter.counts contains dates
and counts of bikes at several locations in
Montreal. counter.locations has the geographical coordinates of the
counter locations. path.locations has the geographical coordinates of
2 changes: 2 additions & 0 deletions man/pirates.Rd
@@ -10,6 +10,8 @@
}
\usage{data(pirates)}
\format{
TODO add codebook

A data frame with 6636 observations on the following 14 variables.
\describe{
\item{\code{Reference}}{factor}
2 changes: 2 additions & 0 deletions man/presidential.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions man/prior.Rd
@@ -10,6 +10,8 @@ data sets.
}
\usage{data(prior)}
\format{
TODO add codebook

List of 2 data.frames: accuracy contains the mean and standard error of
the performance measures (sqErr and accuracy), data.set.info contains
meta-data about the dimension and number of positive and negative
2 changes: 2 additions & 0 deletions man/prostateLasso.Rd
@@ -9,6 +9,8 @@
}
\usage{data("prostateLasso")}
\format{
TODO add codebook

A list of 4 data.frames: path for the piecewise linear coefficient
path, residuals for the prediction error of every model at every data
point, models with one row per regularization parameter, error for
2 changes: 2 additions & 0 deletions man/seals.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

54 changes: 54 additions & 0 deletions man/todo.md
@@ -0,0 +1,54 @@
## About

This file explains why TODOs are scattered across some of
the `.Rd` files in `man/`. There are 2 kinds of TODOs:

1. `TODO need source`
2. `TODO add codebook` (spelled `TODO need codebook` in `FluView.Rd` and `intreg.Rd`)

The 1st appears in only 2 files: `FluView.Rd` and `intreg.Rd`.
It means that the source of the dataset is not documented.

The 2nd appears in every file that contains a TODO.
It means that an explanation of each variable (column) is
missing. Since variable names are often truncated or
not obvious, a brief explanation of each variable
helps the reader use the dataset.
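
As a rough illustration of what a missing codebook looks like in
practice, the Python sketch below flags `\item` entries whose
description only restates the R type, as in
`\item{\code{fips}}{a numeric vector}`. The function name, the two
regexes, and the `price` entry in the sample are made up for this
example; they are assumptions, not part of the package.

```python
import re

# Matches entries of the form \item{\code{name}}{description}.
ITEM = re.compile(r"\\item\{\\code\{([^{}]+)\}\}\{([^{}]*)\}")

# Descriptions that only restate the R type. These patterns are an
# assumption based on text seen in these files ("a numeric vector",
# "a character vector", "factor").
TYPE_ONLY = re.compile(
    r"^(an? )?((numeric|character|logical|integer) vector|factor)$"
)


def undocumented_columns(rd_text):
    """Return names of columns whose description is only an R type."""
    return [
        name
        for name, desc in ITEM.findall(rd_text)
        if TYPE_ONLY.match(desc.strip())
    ]


# Hypothetical sample: the first two entries mimic the placeholder style,
# the third shows what a codebook entry could look like.
sample = r"""
\describe{
  \item{\code{fips}}{a numeric vector}
  \item{\code{Reference}}{factor}
  \item{\code{price}}{price in US dollars}
}
"""

print(undocumented_columns(sample))
```

An entry like the `price` one is not flagged because its description
says what the variable means, which is all a good-enough codebook
needs.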

An example of a dataset with a good-enough codebook is
`diamonds.Rd`. It gives enough information to make the
dataset usable. Good enough is probably good enough: I
reckon most people don't use Animint2 for its datasets.

Sorry in advance if some of this writing seems condescending.
You probably know all of this already. I write like this because
I forget things a lot, and documentation helps future me.


## Files with TODOs

Here's the list of files that have at least one TODO
in them:

- ChromHMMiterations
- climate
- compare
- FluView
- FunctionalPruning
- generation.loci
- intreg
- malaria
- mixtureKNN
- montreal.bikes
- PeakConsistency
- pirates
- presidential
- prior
- prostateLasso
- seals
- TestROC
- UStornadoes
- VariantModels
- vervet
- WorldBank
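
The list above was written by hand, so it can drift as TODOs get
resolved. A minimal sketch for regenerating it, assuming the script is
run from the package root and that every TODO line mentions either
`source` or `codebook` (the function names are hypothetical):

```python
import re
from pathlib import Path

# A TODO line that mentions "source" or "codebook", e.g.
# "TODO add codebook" or "TODO need source".
TODO = re.compile(r"^\s*TODO\b.*\b(source|codebook)\b", re.MULTILINE)


def todo_kinds(rd_text):
    """Return the sorted, de-duplicated TODO kinds found in one file."""
    return sorted(set(TODO.findall(rd_text)))


def files_with_todos(man_dir):
    """Map dataset name to its TODO kinds for each .Rd file with a TODO."""
    result = {}
    for path in sorted(Path(man_dir).glob("*.Rd")):
        kinds = todo_kinds(path.read_text(encoding="utf-8"))
        if kinds:
            result[path.stem] = kinds
    return result


if __name__ == "__main__":
    # Prints one Markdown bullet per file, in the same style as the list.
    for name, kinds in files_with_todos("man").items():
        print(f"- {name}: {', '.join(kinds)}")
```

Files whose TODOs have all been resolved simply drop out of the
output, so the list can be pasted back here as is.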
2 changes: 2 additions & 0 deletions man/vervet.Rd
@@ -15,6 +15,8 @@ bacterium.
}
\usage{data(vervet)}
\format{
TODO add codebook

The format is a named list of data.frames: samples contains 64 rows
with sample-specific info, counts contains 1190208 rows with counts of
55mers observed in each sample, and monkeys contains 23 rows with