finalize data model #1

Open
egonw opened this issue Jan 29, 2021 · 9 comments

Comments

@egonw
Member

egonw commented Jan 29, 2021

At this moment it is not yet clear to me what the data model is going to be. These things come to mind:

  • the identifier that died
  • when the identifier died
  • type of identifier: dead, zombie, ghost
  • next of kin
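
The four fields listed above could be captured in a small record type. This is only a sketch of one possible shape, not the project's actual model: the class and field names (`DeadIdentifier`, `died_on`, `death_type`, `next_of_kin`) are hypothetical, the `EX:` identifiers are made up, and the thread does not define what distinguishes dead, zombie, and ghost, so the enum just carries the three labels.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class DeathType(Enum):
    """The three labels named in the list; their exact semantics are not defined here."""
    DEAD = "dead"
    ZOMBIE = "zombie"
    GHOST = "ghost"


@dataclass(frozen=True)
class DeadIdentifier:
    """Hypothetical record for one retired identifier."""
    identifier: str                    # the identifier that died
    died_on: Optional[date]            # when it died, if known
    death_type: DeathType              # dead, zombie, or ghost
    next_of_kin: Optional[str] = None  # replacement identifier, if any


# Made-up example values, for illustration only.
record = DeadIdentifier("EX:0001", date(2021, 1, 29), DeathType.DEAD, next_of_kin="EX:0002")
orphan = DeadIdentifier("EX:0003", None, DeathType.GHOST)
```

Making `next_of_kin` optional reflects that an identifier can die without any successor.
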
@Chris-Evelo

I think you would also want to know why the identifier died. Things like "gene turns out not to be coding" or "gene turns out to be two separate genes" are really different things. In both cases we might actually want to keep a record even though the identifier was removed from the most recent database, since you could still do things like map variants that were mapped to the area that is no longer a gene to the chromosome location, or decide that gene expression for a cluster should belong to one of two genes (possibly deciding which one based on the reporter sequence).

@egonw
Member Author

egonw commented Feb 1, 2021

Yes, indeed. Once we have one or more data sources that mention this, we can start modelling this. WikiPathways has similar things, like "merged content into". Ensembl does not seem to provide this information.

@cthoyt
Collaborator

cthoyt commented Oct 24, 2022

It would be nice to add the curator's ORCID identifier to keep track of attributions (or, when content is imported, use something like the Wikidata identifier for the database itself).

@matentzn

@egonw can I take a look at the current overall data model? This all seems very related to the ontology-metadata and mapping-metadata efforts we are trying to reconcile across the board.

@cthoyt
Collaborator

cthoyt commented Oct 25, 2022

@matentzn the data model is currently 3 columns in a TSV file - see https://github.com/bridgedb/tiwid/blob/main/data/hgnc.symbol.tsv as an example
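
For illustration, a three-column TSV like this could be read with Python's standard `csv` module. This is a sketch under assumptions: the column order (identifier, date, next of kin), the handling of `#`-prefixed header lines, and the sample rows are all hypothetical and not taken from the linked file.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical sample rows; the real file's columns and values may differ.
SAMPLE_TSV = "EX1\t2020-01-01\tEX2\nEX3\t2021-06-15\t\n"


def read_rows(handle) -> List[Dict[str, Optional[str]]]:
    """Parse rows of (identifier, date, next of kin); empty cells become None."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and lines assumed to be comments or headers.
        if not fields or fields[0].startswith("#"):
            continue
        # Pad short rows so unpacking always sees exactly three values.
        identifier, when, next_of_kin = (fields + ["", "", ""])[:3]
        rows.append({
            "identifier": identifier,
            "when": when or None,
            "next_of_kin": next_of_kin or None,
        })
    return rows


rows = read_rows(io.StringIO(SAMPLE_TSV))
```

The second sample row has no next of kin, which the parser turns into `None` rather than an empty string.
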

@matentzn

Thank you @cthoyt! I thought this was related to the broader BridgeDb data models, not just tiwid, but it makes sense now.

@egonw
Member Author

egonw commented Oct 26, 2022

@matentzn, let's start a discussion here: bridgedb/BridgeDb#216

@egonw
Member Author

egonw commented Oct 26, 2022

Oh, and with SSSOM in mind: this format may have mappings, but these are optional, because there is not necessarily a clear mapping.

@matentzn

Ok, got it :) I remember looking at BridgeDb in the past, and talking to @Chris-Evelo about adopting SSSOM for any mappings, but I did not really follow up on that!
