RMarkdown for literate analyses #21

Closed
fgregg opened this issue Jun 26, 2019 · 11 comments

@fgregg
Member

fgregg commented Jun 26, 2019

Currently, we use Pweave for literate analyses. I would like to explore using RMarkdown instead.

Here are the advantages of RMarkdown:

  1. Pretty good code caching. One recurrent pain point in working with Pweave is that every time you want to update the results from one code block, all code blocks are rerun. For longer analyses with expensive queries, this can lengthen feedback cycles to many minutes.
  2. Very good editor support. RMarkdown is much, much more popular than Pweave, so text editors have much better support for it: Sublime, Emacs, and RStudio, to name a few.
  3. Generally much better supported and more widely used. RMarkdown is officially supported by RStudio, a major company in the R ecosystem (RMarkdown is to the R ecosystem as Jupyter notebooks are to Python).
  4. Easy to switch to other authoring formats, like LaTeX.
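
The caching advantage in point 1 comes from re-running only the chunks that changed. The Python sketch below illustrates that idea in isolation; it is not knitr's implementation. In knitr, caching is turned on per chunk with the `cache=TRUE` chunk option, and the real cache also takes chunk options into account and can track dependencies between chunks, which this sketch ignores. `ChunkCache` and the lambda "chunks" are hypothetical stand-ins for a real document.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# A hypothetical "chunk": its source text plus a zero-argument function
# that produces its result when executed.
Chunk = Tuple[str, Callable[[], object]]


class ChunkCache:
    """Cache chunk results keyed by a hash of each chunk's source text."""

    def __init__(self) -> None:
        self._results: Dict[str, object] = {}
        self.executions = 0  # how many chunks were actually (re)run

    @staticmethod
    def _key(source: str) -> str:
        return hashlib.sha256(source.encode("utf-8")).hexdigest()

    def render(self, chunks: List[Chunk]) -> List[object]:
        outputs = []
        for source, run in chunks:
            key = self._key(source)
            if key not in self._results:
                # Only new or edited chunks are executed; the rest are reused.
                self._results[key] = run()
                self.executions += 1
            outputs.append(self._results[key])
        return outputs


cache = ChunkCache()
doc = [("slow_query()", lambda: 42), ("plot(x)", lambda: "chart")]
first = cache.render(doc)   # runs both chunks

doc[1] = ("plot(x, col='red')", lambda: "red chart")
second = cache.render(doc)  # reruns only the edited second chunk
```

Because the key is only each chunk's own source, editing an upstream chunk would not invalidate downstream chunks that use its result; knitr's dependency options exist to cover that case.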

Disadvantages of RMarkdown:

  1. It's in R, which is not part of our current stack.

Actually, that's the only disadvantage versus Pweave I can think of. It's a big one, though.

Some things that mitigate this disadvantage:

  1. You can actually write Python (or even other languages) in the code blocks. You still need some R to get things off the ground, but it's pretty minimal. Code caching only partially works with non-R blocks (see "cache engine for knitr", rstudio/reticulate#167).
  2. We are not in love with pandas as a data analysis option and have been considering R as a replacement.
@jeancochrane
Contributor

Looks good to me! Some things I'll be interested in hearing more about once you wrap up your R&D:

  • What's the setup for previews like? Do you get live reloading? Is it tightly coupled to your editor? If so, what are the implications for our stack recommendation? (E.g., could we provide a containerized template for bootstrapping an RMarkdown analysis repo, or will we have to maintain detailed setup docs for the different editors that we use?)
    • If RMarkdown requires an editor integration, are all the editors we use well-supported? By my count we currently use these editors on our team:
      • Sublime
      • Atom
      • VSCode
      • Emacs
      • Any others?
  • Are there services we can use to easily share dynamic RMarkdown documents (internally or externally), or does the reader have to either A) have an RMarkdown environment locally or B) read a static document, like a PDF?
  • You mentioned that RMarkdown is the Jupyter Notebook of the R ecosystem. What's the advantage of RMarkdown over Jupyter Notebooks? Should we investigate Jupyter Notebooks too, or are the wins of R over pandas so great that it's worth ignoring them?

Excited to see where this goes!

@fgregg
Member Author

fgregg commented Jun 27, 2019

  1. You can manually recompile or set up a service to watch and recompile.
  2. RStudio does provide some nice facilities for working with RMarkdown, but in my experience, working with RMarkdown in Emacs is already better than working with Pweave.
  3. We could definitely containerize, including the watch-and-compile bit.
  4. I can't speak to the quality of Sublime, Atom, and VS Code support, but the Emacs support is excellent. There seems to be actively developed support for Sublime (link in original), and for VS Code and Atom: https://github.com/REditorSupport
  5. There's no current equivalent to Jupyter Notebooks Online (that I know of). You can compile the Rmd to a Markdown file and host that on GitHub or GitLab.
  6. I prefer both Pweave and RMarkdown to Jupyter notebooks for two main reasons. First, the "weave" paradigm is for generating reports, and we have successfully used it to generate beautiful documents to share with clients. This is not the use case for Jupyter notebooks, which are intended to be exploratory notebooks. Second, "weave" files are simple text files, so you get good git diffs and reviewable changes. The Jupyter notebook format is almost a binary format and doesn't produce easy diffs.
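
Point 1 mentions a service that watches the source and recompiles it. Below is a minimal Python sketch of such a watcher, assuming R and the rmarkdown package are installed and `Rscript` is on the PATH; the function names and the `analysis.Rmd` file name are hypothetical. It polls the file's modification time and shells out to `rmarkdown::render` whenever the file is saved.

```python
import json
import os
import subprocess
import time
from typing import List, Optional


def render_command(path: str) -> List[str]:
    """Build an Rscript call that knits one .Rmd file with rmarkdown::render."""
    # json.dumps yields a double-quoted, escaped string that R also parses.
    return ["Rscript", "-e", f"rmarkdown::render({json.dumps(path)})"]


def has_changed(path: str, last_mtime: Optional[float]) -> bool:
    """True if the file was never seen or its modification time differs."""
    return last_mtime is None or os.path.getmtime(path) != last_mtime


def watch(path: str, interval: float = 1.0) -> None:
    """Poll `path` and re-render it after every save (runs until interrupted)."""
    last_mtime: Optional[float] = None
    while True:
        if has_changed(path, last_mtime):
            last_mtime = os.path.getmtime(path)
            # check=False: a failed knit is reported but does not stop the loop.
            subprocess.run(render_command(path), check=False)
        time.sleep(interval)
```

Calling `watch("analysis.Rmd")`, for example as a container's entrypoint (point 3), would re-knit the document on each save. Polling keeps the sketch dependency-free; a file-system-event library such as watchdog would avoid the polling delay.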

@fgregg changed the title from "[WIP] RMarkdown for literate analyses" to "RMarkdown for literate analyses" on Jun 27, 2019
@hancush
Member

hancush commented Jun 28, 2019

appreciate you taking this on, @fgregg, and thanks for teeing off the discussion of r markdown v. jupyter, @jeancochrane.

to me, the very worst thing about pweave is crappy editor support. i'd be very happy if this change made writing analysis code easier, even if we miss out on caching during compilation by writing python code.

so, happy for you to proceed!

@hancush
Member

hancush commented Jun 28, 2019

addendum: please include a discussion of any idiosyncrasies of writing python code in r markdown.

@jeancochrane
Contributor

How's this coming along @fgregg?

@hancush
Member

hancush commented Sep 16, 2019

in use for the courts project, by datamade as well as by journalists on the project.

@hancush
Member

hancush commented Jun 2, 2020

I've started using R Markdown for the courts project. I polled a couple of R users in my network, and both recommended RStudio. So far, it seems pretty straightforward. My favorite feature is that you can write SQL blocks, and they're cached with no problems. I've only just begun working with ggplot for charting, but it also seems more sensible and aesthetically nicer out of the box than matplotlib.

@hancush
Member

hancush commented Jun 2, 2020

Also watched this video on using the shiny runtime for interactive R markdown documents. Really exciting stuff, in particular for EDA we want to share with clients/elsewhere, or even for mocking up applications.

@fgregg
Member Author

fgregg commented Aug 1, 2020

Hi @jeancochrane and @hancush,

I'm a little bit unsure what the next step of this is. I've done a few projects with RMarkdown, which I could share.

Reviewing CONTRIBUTING.md, is the next step to do a write-up, or to add reflections on this issue?

Thanks so much for your guidance.

@hancush
Member

hancush commented Aug 3, 2020

@fgregg Looks like you've finished steps 1, 2, and 5. At this point, I'd like to see a comparison to existing tools (you can port the answers you've already provided in this issue to a Markdown document), a recommendation of adoption, and a summary of helpful tips and resources for learning, submitted as a PR against this repo. (This would be a combination of steps 3 and 6.)

@jeancochrane's adoption artifacts on Gatsby are a great example of all of these documents: #12. The discussion on their PR also provides a good example of the kinds of questions we expect to answer in a stack change of this kind.

@fgregg
Member Author

fgregg commented Sep 8, 2020

I created the comparison-to-existing-tools doc; I still have to write the recommendation for adoption. #111

@hancush hancush closed this as completed Mar 1, 2021