From e38b936ef5bc8df9cb3741b9165fa19a8e33284e Mon Sep 17 00:00:00 2001 From: ampurr <99299900+ampurr@users.noreply.github.com> Date: Thu, 10 Aug 2023 23:52:48 -0400 Subject: [PATCH 1/5] added 2 TODOs to 2 .Rd files in man/ TODO need source TODO need codebook --- man/FluView.Rd | 3 +++ man/intreg.Rd | 4 ++++ 2 files changed, 7 insertions(+) diff --git a/man/FluView.Rd b/man/FluView.Rd index a91c082a..ccfdb4f5 100644 --- a/man/FluView.Rd +++ b/man/FluView.Rd @@ -6,6 +6,9 @@ } \description{ Data about Flu outbreaks. + + TODO need source + TODO need codebook } \usage{data("FluView")} \format{ diff --git a/man/intreg.Rd b/man/intreg.Rd index 593c2768..ab05b363 100644 --- a/man/intreg.Rd +++ b/man/intreg.Rd @@ -10,6 +10,10 @@ observed several noisy piecewise constant signals, and we have weak labels about how many change-points occur in several regions. Max margin interval regression is an algorithm that uses this information to learn a penalty function for accurate change-point detection. + +TODO need source + +TODO need codebook } \usage{data(intreg)} \format{There are 7 related data.frames: signals contains the noisy From e917df1a21ecbb8a35b63e876812fd7ef0fbc223 Mon Sep 17 00:00:00 2001 From: ampurr <99299900+ampurr@users.noreply.github.com> Date: Thu, 10 Aug 2023 23:53:18 -0400 Subject: [PATCH 2/5] added space to FluView TODO --- man/FluView.Rd | 1 + 1 file changed, 1 insertion(+) diff --git a/man/FluView.Rd b/man/FluView.Rd index ccfdb4f5..8b784f00 100644 --- a/man/FluView.Rd +++ b/man/FluView.Rd @@ -8,6 +8,7 @@ Data about Flu outbreaks. 
TODO need source + TODO need codebook } \usage{data("FluView")} From f72500220a8ce0c3099809236771bc4a090c24ff Mon Sep 17 00:00:00 2001 From: ampurr <99299900+ampurr@users.noreply.github.com> Date: Fri, 11 Aug 2023 00:06:19 -0400 Subject: [PATCH 3/5] added 1 TODO to 19 .Rd files in man/ TODO add codebook --- man/ChromHMMiterations.Rd | 2 ++ man/FunctionalPruning.Rd | 2 ++ man/PeakConsistency.Rd | 5 ++++- man/TestROC.Rd | 2 ++ man/UStornadoes.Rd | 2 ++ man/VariantModels.Rd | 2 ++ man/WorldBank.Rd | 2 ++ man/climate.Rd | 2 ++ man/compare.Rd | 2 ++ man/generation.loci.Rd | 2 ++ man/malaria.Rd | 2 ++ man/mixtureKNN.Rd | 2 ++ man/montreal.bikes.Rd | 5 ++++- man/pirates.Rd | 2 ++ man/presidential.Rd | 2 ++ man/prior.Rd | 2 ++ man/prostateLasso.Rd | 2 ++ man/seals.Rd | 2 ++ man/vervet.Rd | 2 ++ 19 files changed, 42 insertions(+), 2 deletions(-) diff --git a/man/ChromHMMiterations.Rd b/man/ChromHMMiterations.Rd index 88279b9c..69bf20fa 100644 --- a/man/ChromHMMiterations.Rd +++ b/man/ChromHMMiterations.Rd @@ -12,6 +12,8 @@ \usage{data("ChromHMMiterations")} \format{ Named list of 3 data.frames: metrics, transition, emission. + + TODO add codebook } \source{ https://github.com/tdhock/ChromHMM-viz/blob/master/iterations.R diff --git a/man/FunctionalPruning.Rd b/man/FunctionalPruning.Rd index 610a53d1..4edd53c5 100644 --- a/man/FunctionalPruning.Rd +++ b/man/FunctionalPruning.Rd @@ -9,6 +9,8 @@ } \usage{data("FunctionalPruning")} \format{ + TODO add codebook + a named list of 4 data.frames } \source{ diff --git a/man/PeakConsistency.Rd b/man/PeakConsistency.Rd index ac9f2edb..f3185646 100644 --- a/man/PeakConsistency.Rd +++ b/man/PeakConsistency.Rd @@ -10,7 +10,10 @@ the peak. } \usage{data("PeakConsistency")} -\format{A list of four data.frames: model, truth, signal, guess. +\format{ +TODO add codebook + +A list of four data.frames: model, truth, signal, guess.
} \source{ https://github.com/tdhock/PeakSegJoint-paper/blob/master/figure-consistency.R diff --git a/man/TestROC.Rd b/man/TestROC.Rd index 8fb97dc4..388a012d 100644 --- a/man/TestROC.Rd +++ b/man/TestROC.Rd @@ -9,6 +9,8 @@ Five peak detection models were evaluated. } \usage{data("TestROC")} \format{ + TODO add codebook + A list of two data frames. } \source{ diff --git a/man/UStornadoes.Rd b/man/UStornadoes.Rd index 76d201c9..32883684 100644 --- a/man/UStornadoes.Rd +++ b/man/UStornadoes.Rd @@ -9,6 +9,8 @@ Tornadoes in the United States from 1950 to 2012 } \usage{data(UStornadoes)} \format{ + TODO add codebook + A data frame with 41620 observations on the following 32 variables. \describe{ \item{\code{fips}}{a numeric vector} diff --git a/man/VariantModels.Rd b/man/VariantModels.Rd index d4a8717d..81017816 100644 --- a/man/VariantModels.Rd +++ b/man/VariantModels.Rd @@ -5,6 +5,8 @@ Error rates of supervised learning methods for variant calling } \description{ + TODO add codebook + Several supervised machine learning models were applied to the binary classification task of predicting True Positive or False Positive variants, using several filtering scores as input. diff --git a/man/WorldBank.Rd b/man/WorldBank.Rd index f0f08135..be904dd8 100644 --- a/man/WorldBank.Rd +++ b/man/WorldBank.Rd @@ -9,6 +9,8 @@ } \usage{data(WorldBank)} \format{ + TODO add codebook + A data frame with 11342 observations on the following 15 variables. \describe{ \item{\code{iso2c}}{a character vector} diff --git a/man/climate.Rd b/man/climate.Rd index 23c21645..13a31047 100644 --- a/man/climate.Rd +++ b/man/climate.Rd @@ -13,6 +13,8 @@ Temperatures are given in degrees celsius (original data had Kelvin). } \usage{data(climate)} \format{ + TODO add codebook + A data frame with 41472 observations on the following 16 variables. 
\describe{ \item{\code{time}}{a numeric vector} diff --git a/man/compare.Rd b/man/compare.Rd index 3e12485b..c8f128ba 100644 --- a/man/compare.Rd +++ b/man/compare.Rd @@ -15,6 +15,8 @@ level curves of the learned functions. } \usage{data(compare)} \format{ +TODO add codebook + List of 4 data.frames: error contains the test error of the learned models, bayes contains the Bayes classification error of the latent ranking function applied to the test data, rank contains the ranking diff --git a/man/generation.loci.Rd b/man/generation.loci.Rd index 8bd51378..41f3a22e 100644 --- a/man/generation.loci.Rd +++ b/man/generation.loci.Rd @@ -10,6 +10,8 @@ Evolution simulation } \usage{data(generation.loci)} \format{ + TODO add codebook + A data frame with 120000 observations on the following 4 variables. \describe{ \item{\code{locus}}{a numeric vector} diff --git a/man/malaria.Rd b/man/malaria.Rd index 504ffb96..ba2d5300 100644 --- a/man/malaria.Rd +++ b/man/malaria.Rd @@ -14,6 +14,8 @@ } \usage{data("malaria")} \format{ +TODO add codebook + List of 8 data.frames: $ error.variants :'data.frame': 18800 obs. of 19 variables: $ regions :'data.frame': 14 obs. of 6 variables: diff --git a/man/mixtureKNN.Rd b/man/mixtureKNN.Rd index be371aef..a14cf819 100644 --- a/man/mixtureKNN.Rd +++ b/man/mixtureKNN.Rd @@ -9,6 +9,8 @@ } \usage{data("mixtureKNN")} \format{ + TODO add codebook + Named list of 9 data.frames. } \source{ diff --git a/man/montreal.bikes.Rd b/man/montreal.bikes.Rd index 8a10d220..7ba1a1e1 100644 --- a/man/montreal.bikes.Rd +++ b/man/montreal.bikes.Rd @@ -8,7 +8,10 @@ Data on montreal bike usage, paths, and accidents, 2009-2014. } \usage{data("montreal.bikes")} -\format{A named list of four data.frames. counter.counts contains dates +\format{ + TODO add codebook + + A named list of four data.frames. counter.counts contains dates and counts of bikes at several locations in Montreal. counter.locations has the geographical coordinates of the counter locations. 
path.locations has the geographical coordinates of diff --git a/man/pirates.Rd b/man/pirates.Rd index 3acaf740..0007a125 100644 --- a/man/pirates.Rd +++ b/man/pirates.Rd @@ -10,6 +10,8 @@ } \usage{data(pirates)} \format{ + TODO add codebook + A data frame with 6636 observations on the following 14 variables. \describe{ \item{\code{Reference}}{factor} diff --git a/man/presidential.Rd b/man/presidential.Rd index def5e07d..b8bc7137 100644 --- a/man/presidential.Rd +++ b/man/presidential.Rd @@ -5,6 +5,8 @@ \alias{presidential} \title{Terms of 11 presidents from Eisenhower to Obama.} \format{ +TODO add codebook + A data frame with 11 rows and 4 variables } \usage{ diff --git a/man/prior.Rd b/man/prior.Rd index bd0b5df7..d6bf68c1 100644 --- a/man/prior.Rd +++ b/man/prior.Rd @@ -10,6 +10,8 @@ data sets. } \usage{data(prior)} \format{ +TODO add codebook + List of 2 data.frames: accuracy contains the mean and standard error of the performance measures (sqErr and accuracy), data.set.info contains meta-data about the dimension and number of positive and negative diff --git a/man/prostateLasso.Rd b/man/prostateLasso.Rd index ddcbd055..66543989 100644 --- a/man/prostateLasso.Rd +++ b/man/prostateLasso.Rd @@ -9,6 +9,8 @@ } \usage{data("prostateLasso")} \format{ + TODO add codebook + A list of 4 data.frames: path for the piecewise linear coefficient path, residuals for the prediction error of every model at every data point, models with one row per regularization parameter, error for diff --git a/man/seals.Rd b/man/seals.Rd index daa780a2..71aef846 100644 --- a/man/seals.Rd +++ b/man/seals.Rd @@ -11,6 +11,8 @@ A data frame with 1155 rows and 4 variables seals } \description{ +TODO add codebook + This vector field was produced from the data described in Brillinger, D.R., Preisler, H.K., Ager, A.A. and Kie, J.G. "An exploratory data analysis (EDA) of the paths of moving animals". J. 
Statistical Planning and diff --git a/man/vervet.Rd b/man/vervet.Rd index c9c8c263..2608a8eb 100644 --- a/man/vervet.Rd +++ b/man/vervet.Rd @@ -15,6 +15,8 @@ bacterium. } \usage{data(vervet)} \format{ + TODO add codebook + The format is a named list of data.frames: samples contains 64 rows with sample-specific info, counts contains 1190208 rows with counts of 55mers observed in each sample, and monkeys contains 23 rows with From b0fc1b97d037e4e63afdfa69765444bafb59912c Mon Sep 17 00:00:00 2001 From: ampurr <99299900+ampurr@users.noreply.github.com> Date: Fri, 11 Aug 2023 00:28:55 -0400 Subject: [PATCH 4/5] todo file explains the TODO nomenclature --- man/todo.md | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) create mode 100644 man/todo.md diff --git a/man/todo.md b/man/todo.md new file mode 100644 index 00000000..7c98fca0 --- /dev/null +++ b/man/todo.md @@ -0,0 +1,53 @@ +## About + +This file explains why there are all these TODOs scattered +about some of the `.Rd` files in `man/`. There are 2 kinds +of TODOs: + +1. TODO need source +2. TODO add codebook + +The 1st appears only in 2 files: `FluView.Rd` and `intreg.Rd`. +It means that the source of the dataset is missing. + +The 2nd appears in all of the files that contain a TODO. +It means that an explanation for each column's variable is +missing. Since variable names are often either truncated or +not obvious, having a brief explanation of each variable +helps the reader use the dataset. + +A good example of a dataset with a good-enough codebook is +`diamonds.Rd`. It gives enough information to make the +dataset useable. Good-enough is probably good enough. I +reckon most people don't use Animint2 for its datasets. + +Sorry in advance if some of this writing seems condescending. +You probably know all of this already. I write like this cuz +I forget things a lot, and documentation helps future me.
+ + +## Files with TODOs + +Here's the list of files that have at least one TODO in it: + +- ChromHMMiterations +- climate +- compare +- FluView +- FunctionalPruning +- generation.loci +- intreg +- malaria +- mixtureKNN +- montreal.bikes +- PeakConsistency +- pirates +- presidential +- prior +- prostateLasso +- seals +- TestROC +- UStornadoes +- VariantModels +- vervet +- WorldBank \ No newline at end of file From 15fc6a818c7dc41e574ffcaeb1e8cefeb62a11bc Mon Sep 17 00:00:00 2001 From: ampurr <99299900+ampurr@users.noreply.github.com> Date: Fri, 11 Aug 2023 16:16:44 -0400 Subject: [PATCH 5/5] minor edits to writing in todo.md --- man/todo.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/man/todo.md b/man/todo.md index 7c98fca0..e7a64348 100644 --- a/man/todo.md +++ b/man/todo.md @@ -28,7 +28,8 @@ I forget things a lot, and documentation helps future me. ## Files with TODOs -Here's the list of files that have at least one TODO in it: +Here's the list of files that have at least one TODO +in them: - ChromHMMiterations - climate