Scientific names are built upon GBIF codes, but they don't always match the Soort column #207

Open
PietrH opened this issue Aug 19, 2024 · 2 comments

@PietrH
Member

PietrH commented Aug 19, 2024

Currently, the species in the Darwin Core occurrence output is determined from the GBIF_Code column in the raw data, not from the Soort column.

However, if we look up the Soort values recorded for the GBIF code of Rattus norvegicus (2439261), we also get rabbits and chickens, which makes me think that GBIF_Code isn't 100% reliable.

filter(raw_data, GBIF_Code == 2439261) %>% count(Soort, sort = TRUE)
Soort                        n
Bruine rat bak/buis          87074
Kippen                       8
Andere (soort vermelden):    3
Konijnen                     1
Steenmarter                  1

However, the Soort column isn't necessarily more reliable either, especially the "Andere (soort vermelden):" ("Other (specify species):") value.
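To see how widespread the mismatch is, the same lookup could be run for every code at once. The following is only a minimal sketch: the `raw_data` here is a toy stand-in with made-up rows (only the column names `GBIF_Code` and `Soort`, and the code 2439261, come from this issue), and `code_conflicts`, `records`, `n_soort` and `share` are hypothetical names.

```r
library(dplyr)

# Toy stand-in for raw_data: the rows are invented for illustration.
# 9999999 is a made-up placeholder, not a real GBIF taxonKey.
raw_data <- data.frame(
  GBIF_Code = c(2439261, 2439261, 2439261, 2439261, 9999999),
  Soort = c("Bruine rat bak/buis", "Bruine rat bak/buis",
            "Kippen", "Steenmarter", "Steenmarter"),
  stringsAsFactors = FALSE
)

# For every GBIF_Code, list the Soort values recorded under it and keep
# only the codes that are paired with more than one Soort value.
code_conflicts <- raw_data %>%
  count(GBIF_Code, Soort, name = "records") %>%
  group_by(GBIF_Code) %>%
  mutate(
    n_soort = n_distinct(Soort),       # distinct Soort values per code
    share = records / sum(records)     # fraction of the code's records
  ) %>%
  ungroup() %>%
  filter(n_soort > 1) %>%
  arrange(GBIF_Code, desc(records))

print(code_conflicts)
```

Codes where one Soort value holds nearly all records probably contain a handful of entry errors; codes with an even split would need a closer look.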

@damianooldoni I remember you mentioning you had trouble with this in the past, do you maybe have a bit more context? What do you think we should do?

@damianooldoni
Contributor

As mentioned in #24, I would map them manually. In other words, I would not use the GBIF codes they provide. They started adding them at the request of @timadriaens some years ago. The reason was that their data were not yet on GBIF at the time, and having a GBIF backbone taxonKey made it practical to join RATO data with other data sources.

@timadriaens

Indeed, manual mapping seems the way to go; they do not have the technical and species knowledge to always be aware of the correct codes.
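A manual mapping could be sketched roughly as below. This is an illustration under assumptions, not the project's actual mapping: `species_lookup`, `scientificName`, `mapped` and `unmapped` are hypothetical names, the `raw_data` rows are invented, and the only pairing taken from this issue is "Bruine rat bak/buis" with Rattus norvegicus.

```r
library(dplyr)

# Hypothetical, hand-maintained lookup table from Soort to scientific name.
# Only this one pairing is supported by the issue; further rows would have
# to be added and checked by someone with species knowledge.
species_lookup <- data.frame(
  Soort = c("Bruine rat bak/buis"),
  scientificName = c("Rattus norvegicus"),
  stringsAsFactors = FALSE
)

# Toy stand-in for raw_data (invented rows, column names from the issue).
raw_data <- data.frame(
  Soort = c("Bruine rat bak/buis", "Kippen", "Andere (soort vermelden):"),
  GBIF_Code = c(2439261, 2439261, 2439261),
  stringsAsFactors = FALSE
)

# Derive the species from Soort instead of from the provided GBIF_Code.
mapped <- raw_data %>%
  left_join(species_lookup, by = "Soort")

# Soort values without a manual mapping yet, most frequent first,
# so they can be reviewed and added to the lookup table.
unmapped <- mapped %>%
  filter(is.na(scientificName)) %>%
  count(Soort, sort = TRUE)

print(unmapped)
```

Keeping the lookup as a small table makes the unmapped values explicit, including the free-text "Andere (soort vermelden):" records, which would otherwise silently inherit whatever GBIF code was entered.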
