research documents for rMarkdown #111

Merged
merged 10 commits into from
Mar 1, 2021

Member

@fgregg fgregg commented Sep 8, 2020

Overview

This PR contains research documents for rMarkdown.

Handles #21

Testing Instructions

  • Read the documents and evaluate?

@fgregg fgregg marked this pull request as draft September 8, 2020 01:17
@fgregg fgregg changed the title [WIP] research documents for rMarkdown research documents for rMarkdown Sep 26, 2020
@fgregg fgregg marked this pull request as ready for review September 26, 2020 00:39
Member Author

fgregg commented Sep 26, 2020

@jeancochrane @hancush, i think this is ready for review and discussion.

Contributor

@jeancochrane jeancochrane left a comment

These docs make a strong case for RMarkdown over our existing toolkit for data analysis. Thanks for sharing!

Reading through the docs, it strikes me that the relevant change recommended here isn't exclusively swapping RMarkdown in for Pweave but rather changing up our whole workflow and tooling for reproducible data analysis. I think that effort makes sense, but it has two implications for me:

  1. We'll need to plan to do a major edit of https://github.com/datamade/data-analysis-guidelines and bring it into this repo, ideally with some templates (I hear Courts has some cookiecutter templates already?)
  2. It would be more appropriate for these docs to live in whatever subdirectory the data analysis docs do, e.g. something like data-analysis/ instead of rmarkdown/

We can do 2 immediately in this PR but I think 1 will be a big task that could take several cycles to accomplish. I expect it'll be hard for you to execute alone given your capacity constraints. As part of pulling this in, we should make a plan for how that work is going to be tracked and delegated, since updated documentation is going to be key to the success of the adoption of this workflow.

rmarkdown/research/comparisons-with-existing-tools.md (Outdated)
Comment on lines 21 to 29
rMarkdown has better editor support than Pweave. For the following editors, rMarkdown is as good and usually better
than support for Pweave, if there any Pweave support exists.

* [sublime](https://packagecontrol.io/packages/knitr)
* [emacs](https://ess.r-project.org/)
* [atom](http://www.goring.org/resources/atom_and_r.html)
* [vscode](https://marketplace.visualstudio.com/items?itemName=Ikuyadeu.r)

rMarkdown also has its own IDE, [RStudio](https://rstudio.com/)
Contributor

It's great to hear that RMarkdown has such wide support. Our existing data analysis guidelines make a strong recommendation on which editor to use, though, and I've heard @hancush express the belief that RStudio is really good and we should recommend it. What do you think?

Member Author

added this recommendation to the recommendation doc.

rmarkdown/research/comparisons-with-existing-tools.md (Outdated)

## Proof of concept and pilot

RMarkdown has been the tool of choice for authoring reports in the Courts project. DataMade staff familiar with Pweave have picked it up quickly and journalists without a deep background in programming have also been able to use it successfully (within the RStudio environment).
Contributor

It'd be great if we could link out to the relevant project here.

Member Author

these are not going to be accessible to all staff, let alone public folks. unfortunately.

Member

Pilots are useful in evaluating the tool, as well as for providing an example for future use. If we can't link to the project, could we host a clone of the cookiecutter as a basis for future analysis? It'd be ideal to add that in this repository, under docker/templates/r-markdown or something like that.


RMarkdown's interleaving of text and code adds another layer to interact with code. As such, we advise that staff not be introduced to RMarkdown until they are familiar with the programming language they will be using in the report. If the report will depend on SQL code, the developer should be familiar with how to write and debug SQL code in the terminal or by writing SQL scripts.

If something is not working within an RMarkdown file, it's very useful to be able to work on the code in a familiar environment in order to narrow the possible considerations while debugging.
Contributor

Out of curiosity, can you drop a debugger in a Python block in an RMarkdown file?

Member Author

no, not really.
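
Editorial sketch, not part of the PR: since a debugger can't readily be dropped into a chunk, one way to follow the advice in the passage under review is to lift the chunk's code into a plain script and exercise it there. The example below assumes a SQL chunk; the `cases` table, its columns, and the helper names are hypothetical, and an in-memory SQLite database stands in for whatever database a real report would query.

```python
import sqlite3

# Hypothetical query lifted out of a report chunk; the table and columns are made up.
QUERY = """
    SELECT judge, COUNT(*) AS n_cases
    FROM cases
    GROUP BY judge
    ORDER BY n_cases DESC, judge
"""


def sample_db() -> sqlite3.Connection:
    """Create a tiny in-memory database to run the query against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, judge TEXT)")
    conn.executemany(
        "INSERT INTO cases (judge) VALUES (?)",
        [("Smith",), ("Smith",), ("Jones",)],
    )
    return conn


def run_query(conn: sqlite3.Connection) -> list:
    """Run the query on its own so errors surface outside the report."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    rows = run_query(sample_db())
    # breakpoint()  # uncomment to inspect `rows`; a debugger works in a plain script
    print(rows)
```

Once the query behaves as expected in isolation, it can be pasted back into the report chunk unchanged.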

Member Author

fgregg commented Dec 2, 2020

> These docs make a strong case for RMarkdown over our existing toolkit for data analysis. Thanks for sharing!
>
> Reading through the docs, it strikes me that the relevant change recommended here isn't exclusively swapping RMarkdown in for Pweave but rather changing up our whole workflow and tooling for reproducible data analysis. I think that effort makes sense, but it has two implications for me:
>
> 1. We'll need to plan to do a major edit of https://github.com/datamade/data-analysis-guidelines and bring it into this repo, ideally with some templates (I hear Courts has some cookiecutter templates already?)
>
> 2. It would be more appropriate for these docs to live in whatever subdirectory the data analysis docs do, e.g. something like `data-analysis/` instead of `rmarkdown/`
>
> We can do 2 immediately in this PR but I think 1 will be a big task that could take several cycles to accomplish. I expect it'll be hard for you to execute alone given your capacity constraints. As part of pulling this in, we should make a plan for how that work is going to be tracked and delegated, since updated documentation is going to be key to the success of the adoption of this workflow.

I think this makes sense. I propose that we bring in this PR (once I resolve some of the inline comments), and then I can open an issue on https://github.com/datamade/data-analysis-guidelines to track the changes that need to be made there.

how does that sound @hancush ?

Member

@hancush hancush left a comment

Thanks for this, @fgregg. A couple of broad comments:

  • Is it rMarkdown or RMarkdown? In either case, could you standardize throughout?
  • We aren't consistent about organizing docs, e.g., some top-level directories are about tools, while others are about topic areas. As we expand this repository, I'm starting to prefer the topic area approach. Would you mind rehoming this, as Jean suggested, to a data-analysis directory?

Re: our existing docs (and related to a data-analysis directory), since the majority of our data analysis docs pertain to Pweave and because that repo hasn't really grown legs in the same way data making has, I think I'd prefer to archive that repo with a pointer to how-to and add documentation on our revised practices here. What do you think?

@@ -0,0 +1,54 @@
# Comparing rMarkdown with existing tools

How does rMarkdown compare with existing tools in DataMade's stack or possible alternatives.
Member

Suggested change
How does rMarkdown compare with existing tools in DataMade's stack or possible alternatives.
How does rMarkdown compare with existing tools in DataMade's stack or possible alternatives?


The main advantage of Pweave is that it is Python.

While rMarkdown does allow for Python code chunks, there is typically some setup code and that does need to be done in R. With Pweave, it's all Python.
Member

Could you qualify the R setup a bit more, e.g., include a code block with an example setup? IIRC, it's pretty minimal, and an example could help to illuminate that.
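
Editorial sketch, not part of the PR: to give a sense of how small the R setup can be, the Python snippet below assembles a minimal R Markdown document as text. It assumes the usual arrangement in which knitr runs Python chunks through the `reticulate` package; the interpreter path, the report title, and the `minimal_rmd` helper are hypothetical, and a real project may need additional setup.

```python
# Build a minimal R Markdown document as text, to show how little R is needed
# before Python chunks can run. The interpreter path is a hypothetical example.
FENCE = "`" * 3  # built programmatically so this listing holds no literal chunk fences


def minimal_rmd(python_path: str = "/usr/bin/python3") -> str:
    """Return the text of an .Rmd file with one R setup chunk and one Python chunk."""
    lines = [
        "---",
        'title: "Example report"',
        "output: html_document",
        "---",
        "",
        # The only R in the document: load reticulate and choose an interpreter.
        f"{FENCE}{{r setup, include=FALSE}}",
        "library(reticulate)",
        f'use_python("{python_path}")',
        FENCE,
        "",
        # Everything after the setup chunk can be plain Python.
        f"{FENCE}{{python}}",
        "total = sum(range(10))",
        "print(total)",
        FENCE,
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(minimal_rmd())
```

In this sketch the R portion is two lines inside the `setup` chunk; the rest of the document is Markdown and Python.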


That is really the only advantage.
Member

Suggested change
That is really the only advantage.
That is really the only advantage of Pweave.

Like rMarkdown, Pweave requires an additional runtime beyond standard Python. rMarkdown requires R and Pweave requires
[IPython](https://ipython.org/).

Pweave is not actively maintained, and has not been updated
Member

Could you add a link to the repo here?

in three years.

rMarkdown has better editor support than Pweave. For the following editors, rMarkdown is as good and usually better
than support for Pweave, if there any Pweave support exists.
Member

Suggested change
than support for Pweave, if there any Pweave support exists.
than support for Pweave, if any Pweave support exists.


rMarkdown also has its own IDE, [RStudio](https://rstudio.com/)

Beyond active devlopment and editor support, Pweave is missing many features compared to rMarkdown. Of greatest consequence are 1. chunk specific caching and support for 2. multiple languages, particularly SQL.
Member

Suggested change
Beyond active devlopment and editor support, Pweave is missing many features compared to rMarkdown. Of greatest consequence are 1. chunk specific caching and support for 2. multiple languages, particularly SQL.
Beyond active development and editor support, Pweave is missing many features compared to rMarkdown. Of greatest consequence are chunk specific caching and support for multiple languages, particularly SQL.
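
Editorial sketch, not part of the PR: for readers unfamiliar with chunk-specific caching, the idea is that each chunk's result is stored separately and reused until that chunk changes, so editing one chunk does not force every other chunk to rerun. The Python below is a simplified illustration of that idea and not knitr's implementation, which tracks more than the chunk's source text; `ChunkCache` and `slow_eval` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict


class ChunkCache:
    """Cache chunk results keyed by a hash of the chunk's source text."""

    def __init__(self) -> None:
        self._results: Dict[str, object] = {}
        self.runs = 0  # counts real executions, to make cache hits visible

    def run(self, source: str, execute: Callable[[str], object]) -> object:
        """Execute the chunk only if this exact source has not been seen before."""
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._results:
            self.runs += 1
            self._results[key] = execute(source)
        return self._results[key]


def slow_eval(source: str) -> object:
    """Stand-in for an expensive chunk; here it just evaluates an expression."""
    return eval(source)


if __name__ == "__main__":
    cache = ChunkCache()
    cache.run("sum(range(1000))", slow_eval)  # executed
    cache.run("sum(range(1000))", slow_eval)  # unchanged, so served from the cache
    cache.run("sum(range(2000))", slow_eval)  # edited, so executed again
    print(cache.runs)
```

Only the edited chunk is recomputed, which is the property that matters when one chunk in a report holds a slow query or model fit.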

Comment on lines 5 to 7
1. The report is for a client
2. When the report contains graphs or statistics.
3. When we use code to generate the graphs or statistics. If we are doing an quick analysis in Excel, because that is what a client needs, then a literate research report would not be useful approach.
Member

Suggested change
1. The report is for a client
2. When the report contains graphs or statistics.
3. When we use code to generate the graphs or statistics. If we are doing an quick analysis in Excel, because that is what a client needs, then a literate research report would not be useful approach.
1. The report is for a client.
2. The report contains graphs or statistics.
3. We use code to generate the graphs or statistics. If we are doing a quick analysis in Excel, because that is what a client needs, then a literate research report would not be a useful approach.


Member

hancush commented Dec 3, 2020

Follow up Q: Does RMarkdown obviate the need for us to learn LaTeX????

Member

hancush commented Dec 7, 2020

Update: We chatted out loud at R&D. We are going to archive the data analysis guidelines and maintain our revised docs, including these artifacts, in how-to/data-analysis. So, action items:

Member Author

fgregg commented Feb 1, 2021

I think this responds to your requested changes?

@fgregg fgregg requested a review from hancush February 1, 2021 17:08
Member

hancush commented Feb 11, 2021

It's a new day at DataMade! Thank you, @fgregg.

@hancush hancush merged commit a104298 into master Mar 1, 2021
@hancush hancush deleted the rmarkdown branch July 14, 2021 15:02