Cody's AI debugging assignment #22

Open
wants to merge 3 commits into
base: main
274 changes: 274 additions & 0 deletions Cody_ai_models.Rmd
@@ -0,0 +1,274 @@
---
title: "ai_model_assignment"
author: "Cody Wilkinson"
date: "`r Sys.Date()`"
output: github_document
editor_options:
  markdown:
    wrap: 72
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = FALSE)
```

For this assignment, copy and paste both **your question to the chat
bot** as well as **the chat bot's response** below each assignment
question.

1) Ask an AI chat bot (like ChatGPT or Claude) what the following chunk
of code does:

```{r}
library(palmerpenguins)

data("penguins")

step_one <- function(x){
  x <- as.character(x)
}

species_list <- unlist(lapply(penguins$species, step_one))
adelie_indices <- which(species_list == "Adelie")
adelie_penguins <- penguins[adelie_indices, ]
```

**Question posed to chat bot:** What does this R code do?

**Chat bot response:** The provided R code is intended to perform the
following actions:

Load the "palmerpenguins" library: This code begins by loading the
"palmerpenguins" library, which provides a dataset of penguin
measurements. The library is loaded using the library(palmerpenguins)
command.

Load the "penguins" dataset: The code then loads the "penguins" dataset
from the "palmerpenguins" library using the data("penguins") command.
This dataset contains various measurements and information about
different penguin species.

Define the step_one function: The step_one function is created. This
function takes a single argument, x, and its purpose is to convert the
input into character data type using as.character(x).

Apply the step_one function to the "species" column: The lapply function
is used to apply the step_one function to each element in the "species"
column of the "penguins" dataset. This essentially converts the species
names to character data type.

Create a list of species: The unlist function is then used to unlist the
result of the lapply operation, creating a single list containing the
character species names for each penguin in the dataset.

Find indices of "Adelie" penguins: The which function is employed to
identify the indices in the list of species where the species name is
"Adelie." These indices correspond to "Adelie" penguins in the dataset.

Extract "Adelie" penguins: Finally, the code extracts the "Adelie"
penguins from the original "penguins" dataset by using the identified
indices. The resulting dataset, adelie_penguins, contains only the
penguins of the "Adelie" species.

In summary, this R code is designed to extract and create a new dataset
containing penguins of the "Adelie" species from the "palmerpenguins"
dataset, while ensuring that the "species" column is treated as
character data.

2. Ask the AI model to simplify the code in question 1.

**Question to chat bot:** Simplify the following code in R.

**Chat bot response:**

library(palmerpenguins)

data("penguins")

adelie_penguins \<- penguins[penguins\$species == "Adelie", ]

##In this simplified version, we directly subset the "penguins" dataset
to extract rows where the "species" column is equal to "Adelie" without
the need for a custom function or creating an intermediate list.

3. & 4. Is the AI model's response to Question 2 in the {tidyverse}
style (e.g., does the response use any {tidyverse} functions, the
`%>%` pipe, etc.)?

ChatGPT's response was not in the tidyverse style.

**Question to chat bot:**

is this response: library(palmerpenguins)

data("penguins")

adelie_penguins \<- penguins[penguins\$species == "Adelie", ]
in the tidyverse style?

**Chat bot response:**

The code you provided is not in the tidyverse style, but it's written in
base R. The tidyverse is a collection of R packages, including dplyr,
ggplot2, and others, that follow a consistent and unified style for data
manipulation and visualization. Here's the equivalent code in the
tidyverse style using the dplyr package:

library(palmerpenguins)

library(dplyr)

data("penguins")

adelie_penguins \<- penguins %\>% filter(species == "Adelie")

In the tidyverse style, the **`filter`** function from the dplyr package
is used for filtering rows based on specified conditions, and the pipe
operator **`%>%`** is used to chain operations together in a more
readable and expressive manner.
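
One way to confirm that the base R and tidyverse versions select the
same rows is to compare them directly. This is only a minimal sketch;
the object names `adelie_base` and `adelie_tidy` are made up here for
illustration and do not appear in the chat bot's answers.

```r
library(palmerpenguins)
library(dplyr)

# Base R subsetting, as in the chat bot's first simplification
adelie_base <- penguins[penguins$species == "Adelie", ]

# dplyr version, as in the chat bot's tidyverse answer
adelie_tidy <- penguins %>% filter(species == "Adelie")

# Both checks should print TRUE if the two approaches agree
print(nrow(adelie_base) == nrow(adelie_tidy))
print(all(adelie_tidy$species == "Adelie"))
```

The two approaches agree here because `species` has no missing values.
On a column that does contain `NA`, base `[` keeps rows where the
condition is `NA` (as all-`NA` rows), while `filter()` drops them.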

5. Does the code that ChatGPT put together for you work? If it doesn't,
ask the chat bot to fix it. Does it work now?

```{r}
library(palmerpenguins)

library(dplyr)

data("penguins")

adelie_penguins <- penguins %>% filter(species == "Adelie")
```

```{r}
View(adelie_penguins)
```

5. The answer works.

6. Ask the chat bot if there is a significant difference in bill length
across Adelie penguin sexes in the {palmerpenguins} data set (using
R).

**Question posed to chat bot:**

in R, is there a significant difference in bill length across Adelie
penguin sexes in the {palmerpenguins} data set

**Chat bot response:**

To determine if there is a significant difference in bill length across
Adelie penguin sexes in the **`palmerpenguins`** dataset, you can
perform a statistical test, such as a t-test, to compare the means of
the bill lengths between male and female Adelie penguins. Here's how you
can do it in R:

library(palmerpenguins)

data("penguins")

\# Filter the data to include only Adelie penguins

adelie_penguins \<- penguins[penguins\$species == "Adelie", ]

\# Subset the data for males and females

adelie_males \<- adelie_penguins[adelie_penguins\$sex == "Male", ]

adelie_females \<- adelie_penguins[adelie_penguins\$sex == "Female", ]

\# Perform a t-test to compare bill lengths between male and female
Adelie penguins

t_test_result \<- t.test(adelie_males\$bill_length_mm,
adelie_females\$bill_length_mm)

\# Print the results

print(t_test_result)

The **`t.test`** function is used to perform a two-sample t-test,
comparing the bill lengths between male and female Adelie penguins. The
result will provide a p-value, which you can use to determine whether
there is a significant difference in bill length between the two groups.
If the p-value is below a chosen significance level (e.g., 0.05), you
can conclude that there is a significant difference.

7. Does the code produced work? (*Copy and paste the code into the
RStudio console to run it.*) If not, what was wrong with its answer?

```{r}
library(palmerpenguins)
data("penguins")
# Filter the data to include only Adelie penguins
adelie_penguins <- penguins[penguins$species == "Adelie", ]
# Subset the data for males and females
adelie_males <- adelie_penguins[adelie_penguins$sex == "Male", ]
adelie_females <- adelie_penguins[adelie_penguins$sex == "Female", ]
# Perform a t-test to compare bill lengths between male and female Adelie penguins
t_test_result <- t.test(adelie_males$bill_length_mm, adelie_females$bill_length_mm)
# Print the results
print(t_test_result)
```

Error in t.test.default(adelie_males\$bill_length_mm,
adelie_females\$bill_length_mm) :
not enough 'x' observations

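I got this error message. The cause is that the `sex` column in
`penguins` uses lowercase levels (`"female"` and `"male"`), so comparing
against `"Male"` and `"Female"` matches no rows and `t.test()` is left
with no non-missing bill lengths.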
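
A version that should run is sketched below. It assumes the lowercase
`sex` levels that {palmerpenguins} ships with, and it drops penguins
with no recorded sex before testing. The object name `adelie_known_sex`
is only illustrative.

```r
library(palmerpenguins)

# Keep Adelie penguins whose sex was recorded
adelie_known_sex <- penguins[penguins$species == "Adelie" &
                               !is.na(penguins$sex), ]

# The factor levels are lowercase, which is why "Male"/"Female" matched nothing
print(levels(adelie_known_sex$sex))

# Welch two-sample t-test of bill length by sex
t_test_result <- t.test(bill_length_mm ~ sex, data = adelie_known_sex)
print(t_test_result)
```

The formula interface (`bill_length_mm ~ sex`) avoids building separate
male and female data frames, so there is no string to mistype.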

8. Did the chat bot's response include an assessment of whether the
Adelie penguins data met the assumptions of the statistical test
used in the response? (e.g., test for normality or equal variances)?
If not, pose the question again to test for the assumptions and
paste the new question/response below.

```{r}
library(palmerpenguins)
data("penguins")

# Filter the data to include only Adelie penguins
adelie_penguins <- penguins[penguins$species == "Adelie", ]

# Check the levels of the "sex" factor variable
print(levels(adelie_penguins$sex))

# Perform a t-test to compare bill lengths between male and female Adelie penguins
if ("Male" %in% levels(adelie_penguins$sex) && "Female" %in% levels(adelie_penguins$sex)) {
  adelie_males <- adelie_penguins[adelie_penguins$sex == "Male", ]
  adelie_females <- adelie_penguins[adelie_penguins$sex == "Female", ]

  t_test_result <- t.test(as.numeric(adelie_males$bill_length_mm), as.numeric(adelie_females$bill_length_mm))
  # Print the results
  print(t_test_result)
} else {
  cat("Both male and female categories are not present in the data.\n")
}
```

**Question to chat bot:**

I asked many follow-up questions. I got error messages that the chat
bot could not resolve.

**Chat bot response:**

```{r}
library(palmerpenguins)
library(dplyr)

data("penguins")

adelie_penguins <- penguins %>%
  filter(species == "Adelie") %>%
  group_by(sex) %>%
  mutate(sex = as.character(sex)) %>%
  summarise(mean_bill_length = mean(bill_length_mm), n = n())

t_test_result <- t.test(adelie_penguins$mean_bill_length ~ adelie_penguins$sex)

# Print the results
print(adelie_penguins)
print(t_test_result)

```

I got a bunch of weird and fun responses from ChatGPT as it tried to
make this t-test work.
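
For reference, one way the assumption checks asked about in question 8
could look is sketched below. This is not the chat bot's output. It
assumes Shapiro-Wilk for normality and an F test for equal variances,
which are common choices but not the only ones, and the object names
are illustrative. Unlike the last chat bot attempt, it runs the t-test
on the individual measurements: `summarise()` collapses the data to one
row per sex, which leaves nothing for a t-test to compare.

```r
library(palmerpenguins)

# Keep Adelie rows with a recorded sex and bill length
adelie <- subset(penguins,
                 species == "Adelie" & !is.na(sex) & !is.na(bill_length_mm))

# Normality within each sex (Shapiro-Wilk), one test per group
bills_by_sex <- split(adelie$bill_length_mm, adelie$sex)
normality <- lapply(bills_by_sex, shapiro.test)

# Equal variances between the sexes (F test)
variance_check <- var.test(bill_length_mm ~ sex, data = adelie)

# Welch two-sample t-test on the individual measurements
t_test_result <- t.test(bill_length_mm ~ sex, data = adelie)

print(normality)
print(variance_check)
print(t_test_result)
```

`t.test()` defaults to the Welch test, which does not assume equal
variances, so the F test mainly matters if `var.equal = TRUE` is wanted.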
73 changes: 73 additions & 0 deletions lessons/Cody_debugging_assignment.Rmd
@@ -0,0 +1,73 @@
---
title: "Code Review/Debugging Assignment"
author: Cody Wilkinson
date: "`r Sys.Date()`"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
```

## The Assignment:

*Tip: View this in the visual editor in RStudio*

You want to re-create the HTML document in the `lessons/` folder in this repo titled `code_debugging_output.html`. Below is the workflow to do so, ***however*** you will notice if you attempt to render this R Markdown document as is it does NOT reproduce the `code_debugging_output.html` file, and won't even render until you fix some of the mistakes/missing elements. Therefore, this workflow is not reproducible! Your task is to fix the below coding workflow so that it renders and **exactly** matches the output of `code_debugging_output.html` (*including* whether or not certain code chunks and/or outputs are shown).

There are a total of **5 fixes** you need to make to the workflow below. To complete this assignment you must ***1)*** Identify the 5 errors and specify what they are below (*4 pts each*) and ***2)*** fix the errors and render this entire .Rmd as an HTML doc where the content below matches that of the `code_debugging_output.html` (*5 pts*).

**List here the 5 errors you had to fix in this workflow:**

1. We need to load the {palmerpenguins} package with `library(palmerpenguins)` so that the `penguins` data set is available.

2. We need to load the tidyverse with `library(tidyverse)`. I also made the ggplot easier to read by adding color.

3. "Adelei" misspells "Adelie."

4. The "adelie" object does not exist because it was not created anywhere previously. I created an object called "Adelie."

5. The chunk was trying to plot an object that did not exist. Once I created it and renamed the object we pipe from, it worked.

## The workflow you need to fix begins here:

In this workflow, we want to explore the Palmer Penguins data set collected at the Palmer Station LTER in Antarctica. This data is stored in the {palmerpenguins} data package.

We read in the data and initially explore its structure:

```{r}
library(palmerpenguins)
summary(penguins)
library(tidyverse)
View(penguins)
```

Let's first explore body mass among all the penguin species with a histogram:

```{r warning=FALSE}
ggplot(data = penguins) +
  geom_histogram(aes(x = body_mass_g))
```

Now we want to visualize the relationship between body mass and bill length *among species*:

```{r warning=FALSE}
penguins %>%
  ggplot() +
  geom_point(aes(x = body_mass_g, y = bill_length_mm, color = species))
```

Finally, let's create a new data frame called `adelie` that contains data for just the *Adelie* penguin species, and then visualize the differences in that species' body mass among islands.

```{r}
Adelie <- penguins %>%
  filter(species == 'Adelie')
```

```{r warning=FALSE}
Adelie %>%
  ggplot() +
  geom_boxplot(aes(x = island, y = body_mass_g))
```

This figure suggests *Adelie* individuals on the Dream island are slightly smaller than those on the Biscoe and Torgersen islands.
493 changes: 493 additions & 0 deletions lessons/Cody_debugging_assignment.html

Large diffs are not rendered by default.