Deploying to gh-pages from @ df55363 🚀

cambiotraining · Feb 26, 2024 · commit 4bb7fd1
Showing 32 changed files with 13,223 additions and 0 deletions.
diff --git a/.nojekyll b/.nojekyll
diff --git a/index.html b/index.html
diff --git a/materials/resampling-intro.html b/materials/resampling-intro.html
diff --git a/search.json b/search.json
@@ -0,0 +1,51 @@
+[
+  {
+    "objectID": "index.html#overview",
+    "href": "index.html#overview",
+    "title": "Resampling techniques",
+    "section": "Overview",
+    "text": "Overview\nInclude a one-paragraph summary of the course here.\n\n\n\n\n\n\nLearning Objectives\n\n\n\n\nList course learning objectives here.\nThese describe concepts the learners should grasp and techniques they should be able to use by the end of the course.\nYou can think of these as completing the phrase “after this course, the participant should be able to…”\nThey are not supposed to be as detailed as the learning objectives of each section, but more high-level.\n\n\n\n\nTarget Audience\nBrief description of target audience here.\n\n\nPrerequisites\nDetail any prerequisite skills needed to attend this course, with links to other relevant materials/courses if possible.\n\n\n\nExercises\nExercises in these materials are labelled according to their level of difficulty:\n\n\n\n\n\n\n\nLevel\nDescription\n\n\n\n\n  \nExercises in level 1 are simpler and designed to get you familiar with the concepts and syntax covered in the course.\n\n\n  \nExercises in level 2 combine different concepts and apply them to a given task.\n\n\n  \nExercises in level 3 require going beyond the concepts and syntax introduced to solve new problems."
+  },
+  {
+    "objectID": "index.html#authors",
+    "href": "index.html#authors",
+    "title": "Resampling techniques",
+    "section": "Authors",
+    "text": "Authors\n\nAbout the authors:\n\nAlexia Cardona  \nAffiliation: Bioinformatics Training Facility, University of Cambridge\nRoles: conceptualisation\nHugo Tavares  \nAffiliation: Bioinformatics Training Facility, University of Cambridge\nRoles: writing - original draft; conceptualisation; coding\nMartin van Rongen \nAffiliation: Bioinformatics Training Facility, University of Cambridge\nRoles: writing - review & editing; conceptualisation; coding"
+  },
+  {
+    "objectID": "index.html#citation",
+    "href": "index.html#citation",
+    "title": "Resampling techniques",
+    "section": "Citation",
+    "text": "Citation\n\nPlease cite these materials if:\n\nYou adapted or used any of them in your own teaching.\nThese materials were useful for your research work. For example, you can cite us in the methods section of your paper: “We carried out our analyses based on the recommendations in TODO.”.\n\nYou can cite these materials as:\n\nTODO\n\nOr in BibTeX format:\n@Misc{,\n  author = {},\n  title = {},\n  month = {},\n  year = {},\n  url = {},\n  doi = {}\n}"
+  },
+  {
+    "objectID": "index.html#acknowledgements",
+    "href": "index.html#acknowledgements",
+    "title": "Resampling techniques",
+    "section": "Acknowledgements",
+    "text": "Acknowledgements\n\n\nList any other sources of materials that were used.\nOr other people that may have advised during the material development (but are not authors)."
+  },
+  {
+    "objectID": "setup.html#data",
+    "href": "setup.html#data",
+    "title": "2  Data & Setup",
+    "section": "Data",
+    "text": "Data\nThe data used in these materials is provided as a zip file. Download and unzip the folder to your Desktop to follow along with the materials.\n\n  Download"
+  },
+  {
+    "objectID": "setup.html#software",
+    "href": "setup.html#software",
+    "title": "2  Data & Setup",
+    "section": "Software",
+    "text": "Software\n\nQuarto\nTo develop and render the course materials website, you will need to install Quarto:\n\nDownload and install Quarto (available for all major OS).\nIf you are developing materials using executable .qmd documents, it is recommended that you also install the extensions for your favourite IDE (e.g. RStudio, VS Code).\nIf you are developing materials using JupyterLab or Jupyter Notebooks, please install Jupytext.\n\nUse the paired notebook feature to have synchronised .ipynb/.qmd files. Only .qmd files should be pushed to the repository (.ipynb files have been added to .gitignore)."
+  },
+  {
+    "objectID": "materials/resampling-intro.html#background",
+    "href": "materials/resampling-intro.html#background",
+    "title": "3  Resampling techniques",
+    "section": "3.1 Background",
+    "text": "3.1 Background\nAll traditional statistical tests make use of various named distributions (normally the normal distribution, for parametric tests like the t-test or ANOVA) in order to work properly, or they require certain assumptions to be made about the parent distribution (such as the distribution being symmetric, for non-parametric tests like Wilcoxon). If these assumptions are met then traditional statistical tests are fine, but what can we do when we can’t assume normality or if the distribution of the data is just weird?\nResampling techniques are the tools that work here. They allow us to test hypotheses about our data using only the data itself (without appeal to any assumptions about the shape or form of the parent distribution). They are in some ways a much simpler approach to statistics, but because they rely on the ability to generate thousands or tens of thousands of random numbers very quickly, they simply weren’t considered practical before fast computers became widely available. Even now, they aren’t widely used because they require the user (you, in case you’d forgotten what’s going on at this time of day) to do more than click a button in a stats package or know the name of the test. These techniques require a mix of statistical knowledge and programming; a combination of skills that isn’t all that common! There are three broad areas of resampling methods (although they are all quite closely related):\n\nPermutation Methods\nBootstrapping\nCross-validation\n\nPermutation methods are what we will focus on in this practical and they allow us to carry out hypothesis testing.\nBootstrapping is a technique for estimating confidence intervals for parameter estimates. We effectively treat our dataset as if it were the parent distribution, draw samples from it with replacement and calculate the statistic of choice (usually the mean) for each of these resamples. If we repeat this process many times, we will eventually be able to construct a distribution for our sample statistic. This can be used to give us a confidence interval for our statistic.\nCross-validation is at the heart of modern machine learning approaches but existed long before machine learning became fashionable. You divide your dataset up into two sets: a training set that you use to fit your model and a testing set that you use to evaluate your model. This allows your model accuracy to be empirically measured. There are several variants of this technique (holdout, k-fold cross-validation, leave-one-out cross-validation (LOOCV), leave-p-out cross-validation (LpOCV) etc.), all of which do essentially the same thing; the main difference between them is a trade-off between the time they take to run and the reliability of the estimate.\nWe won’t cover bootstrapping or cross-validation in this practical, but feel free to read up on them."
+  }
+]
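
The permutation and bootstrap ideas described in the "3.1 Background" entry can be sketched in a few lines of Python. This is a minimal illustration, not code from the course materials: the function names `permutation_test` and `bootstrap_ci`, their parameters and the example measurements are all hypothetical, and the percentile interval shown is only one of several ways to build a bootstrap confidence interval.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper: shuffles the pooled data, re-splits it into two
    groups of the original sizes, and counts how often the shuffled
    difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # break any link between value and group label
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


def bootstrap_ci(sample, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean.

    Hypothetical helper: treats the sample as the parent distribution and
    resamples from it with replacement.
    """
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(
        statistics.mean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]


if __name__ == "__main__":
    # Made-up measurements, purely for illustration.
    treated = [10.1, 11.3, 12.2, 10.8, 11.9, 12.5]
    control = [8.2, 9.1, 7.9, 8.8, 9.4, 8.5]

    diff, p = permutation_test(treated, control)
    print(f"observed difference in means: {diff:.3f}, p-value: {p:.4f}")

    low, high = bootstrap_ci(treated)
    print(f"95% bootstrap CI for the treated mean: ({low:.2f}, {high:.2f})")
```

Both helpers take a `seed` so that results are reproducible from run to run; increasing `n_permutations` or `n_resamples` trades run time for a more stable estimate.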