wip for better csw-dcat output for data.gouv.fr #288

Draft
wants to merge 5 commits into
base: georchestra-gn4.2.x

landryb commented Feb 2, 2024

See georchestra/georchestra#4182. This PR exists mostly to make the diff of https://github.com/georchestra/georchestra/files/14127940/tpl-rdf.xsl.txt readable; all the work is from @jeanpommier.

geOrchestra/geonetwork checklist

  • PR only involves cherry-picked commits from upstream.
  • PR contains custom code which will soon be available in an upstream release and can be overridden => mention core-geonetwork version if possible.
  • PR contains custom geOrchestra code, which needs to be verified during future migrations.

<!-- SIB addon-->
<xsl:if test="(gmd:description/gco:CharacterString)[1]!=''">
<dct:description>
<xsl:value-of select="(gmd:description/gco:CharacterString)[1]"/>

Note for $self: this fixes the "Description des données non renseignée" ("data description not provided") warning on harvested data. The value comes from the attached resource's description.


<!-- SIB addon-->
<!-- <dcat:Dataset rdf:about="{$resourcePrefix}/datasets/{iso19139:getResourceCode(../../.)}"> -->
<dcat:Dataset rdf:about="{$resourcePrefix}/{iso19139:getResourceCode(../../.)}">

/datasets doesn't seem to map to anything under resources in GeoNetwork?

group-by="gco:CharacterString|gmx:Anchor">
<xsl:text>&#xa;</xsl:text> <!-- linebreak -->
<xsl:value-of select="normalize-space(../gmd:thesaurusName/*/gmd:title)"/> : <xsl:value-of select="normalize-space(gco:CharacterString|gmx:Anchor)"/><xsl:text>.</xsl:text>
</xsl:for-each-group>

After discussion with @jeanpommier, this section might not be needed?

landryb commented Mar 19, 2024

The complete section about legalConstraints should be checked with @MaelREBOUX, to confirm that we use the right method to fill in the licensing info.

landryb commented Jun 13, 2024

Regarding the licence mapping, I did a bunch more tests. With what we have now in the PR, if a metadata record contains this:

<gmd:MD_LegalConstraints>
  <gmd:accessConstraints>
    <gmd:MD_RestrictionCode codeList="...#MD_RestrictionCode" codeListValue="otherRestrictions"/>
  </gmd:accessConstraints>
  <gmd:otherConstraints>
    <gco:CharacterString>LO/OL</gco:CharacterString>
  </gmd:otherConstraints>
</gmd:MD_LegalConstraints>

we do end up with <dct:license>LO/OL</dct:license> in the dcat output, and if we have

<gmd:MD_LegalConstraints>
  <gmd:useConstraints>
    <gmd:MD_RestrictionCode codeList="...#MD_RestrictionCode" codeListValue="otherRestrictions"/>
  </gmd:useConstraints>
  <gmd:otherConstraints>
    <gco:CharacterString>LO/OL</gco:CharacterString>
  </gmd:otherConstraints>
</gmd:MD_LegalConstraints>

we end up with <dct:accessRights>LO/OL</dct:accessRights> in the dcat output. We can of course use both accessConstraints and useConstraints to get both dct:accessRights and dct:license, but from https://doc.data.gouv.fr/moissonnage/dcat/#jeu-de-donn%C3%A9es I think dct:license is what matters first for udata.
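To make the two cases above easier to compare, here is a minimal Python sketch of the mapping as observed in these tests. It is not the XSL from this PR: the `DCAT_PROPERTY` table, the `map_legal_constraints` helper and the `SAMPLE` record are made up for illustration, and the table simply mirrors what is reported above.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Mapping as reported in the comment above (an observation, not read from the XSL).
DCAT_PROPERTY = {
    "accessConstraints": "dct:license",
    "useConstraints": "dct:accessRights",
}


def map_legal_constraints(xml_text):
    """Return (dcat_property, value) pairs for one MD_LegalConstraints block."""
    root = ET.fromstring(xml_text)
    # Free-text values carried by gmd:otherConstraints.
    values = [
        (el.text or "").strip()
        for el in root.findall("gmd:otherConstraints/gco:CharacterString", NS)
    ]
    results = []
    for kind, prop in DCAT_PROPERTY.items():
        codes = root.findall(f"gmd:{kind}/gmd:MD_RestrictionCode", NS)
        # Only the otherRestrictions code points at the free-text field.
        if any(c.get("codeListValue") == "otherRestrictions" for c in codes):
            results.extend((prop, v) for v in values if v)
    return results


# Hypothetical record, same shape as the second fragment above.
SAMPLE = """
<gmd:MD_LegalConstraints
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:useConstraints>
    <gmd:MD_RestrictionCode codeList="...#MD_RestrictionCode" codeListValue="otherRestrictions"/>
  </gmd:useConstraints>
  <gmd:otherConstraints>
    <gco:CharacterString>LO/OL</gco:CharacterString>
  </gmd:otherConstraints>
</gmd:MD_LegalConstraints>
"""

print(map_legal_constraints(SAMPLE))  # [('dct:accessRights', 'LO/OL')]
```

A record carrying both an accessConstraints and a useConstraints element with otherRestrictions would yield both properties with the same free-text value in this sketch.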

I'm not a specialist, so I can't judge which one makes more sense. What matters, from my reading of https://guides.data.gouv.fr/guide-data.gouv.fr/moissonnage/les-differents-types-de-moissonneurs#detection-des-licences-par-le-moissonnage, is that if we have a value (LO/OL here) that matches one of the id, title, url or alternates listed in https://www.data.gouv.fr/api/1/datasets/licenses/, then udata will recognize it.
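As a rough illustration of that matching rule, here is a simplified Python sketch. It is not udata's code: the `match_license` helper, the exact-match strategy and the sample entry are assumptions (the real harvester may be more lenient), and the `alternate_titles` / `alternate_urls` field names should be checked against the licenses API response.

```python
def normalize(text):
    """Collapse whitespace and ignore case so 'lo/ol ' equals 'LO/OL'."""
    return " ".join(text.split()).casefold()


def match_license(value, licenses):
    """Return the first licence whose id, title, url or alternates equal value."""
    needle = normalize(value)
    for lic in licenses:
        candidates = [lic.get("id"), lic.get("title"), lic.get("url")]
        candidates += lic.get("alternate_titles") or []
        candidates += lic.get("alternate_urls") or []
        if any(c and normalize(c) == needle for c in candidates):
            return lic
    return None


# Hypothetical entry for illustration only, not copied from the real API.
LICENSES = [
    {
        "id": "example-open-licence",
        "title": "Licence Ouverte / Open Licence",
        "url": "https://example.org/open-licence",
        "alternate_titles": ["LO/OL"],
        "alternate_urls": [],
    }
]

match = match_license("lo/ol", LICENSES)
print(match["title"] if match else "not recognized")
```

The practical consequence is the same as in the comment: whatever ends up in the licence element has to be one of the strings udata already knows about, otherwise the dataset is harvested without a recognized licence.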

see for example: https://demo.georchestra.org/datahub/dataset/faa240f5-90a8-4e1f-b8da-39c913bed9df
as dcat: https://demo.georchestra.org/geonetwork/csw-dcat-singlemd/eng/csw?SERVICE=CSW&VERSION=2.0.2&REQUEST=GetRecordById&outputSchema=http://www.w3.org/ns/dcat%23&id=faa240f5-90a8-4e1f-b8da-39c913bed9df
and finally, as harvested by udata: https://demo.data.gouv.fr/fr/datasets/fiche-parfaite-pour-data-gouv/ - note that the licence is properly recognized as Licence Ouverte / Open Licence
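For checking other records the same way, the DCAT URL above can be rebuilt from the endpoint and the record id. A small sketch, where `dcat_record_url` is a hypothetical helper; the `%23` in the original URL is the escaped `#` of the DCAT namespace, which `urlencode` handles (it also escapes `:` and `/`, which is equivalent).

```python
from urllib.parse import urlencode


def dcat_record_url(csw_endpoint, record_id):
    """Build a CSW GetRecordById URL asking for the DCAT output schema."""
    params = {
        "SERVICE": "CSW",
        "VERSION": "2.0.2",
        "REQUEST": "GetRecordById",
        # The '#' must be percent-encoded, or it is treated as a URL fragment.
        "outputSchema": "http://www.w3.org/ns/dcat#",
        "id": record_id,
    }
    return f"{csw_endpoint}?{urlencode(params)}"


url = dcat_record_url(
    "https://demo.georchestra.org/geonetwork/csw-dcat-singlemd/eng/csw",
    "faa240f5-90a8-4e1f-b8da-39c913bed9df",
)
print(url)
```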

As for the XSL voodoo, I don't really understand the indirection through otherRestrictions/otherConstraints. Is it only there so that the value can come from a free-form field instead of a select box in GeoNetwork?

E.g. if I select license for access constraints, I get this in the ISO record:

<gmd:accessConstraints>
  <gmd:MD_RestrictionCode codeList="...#MD_RestrictionCode" codeListValue="license"/>
</gmd:accessConstraints>

which translates to an empty <dct:accessRights/>, because the MD_RestrictionCode element has no child in the XML: the value only exists in its codeListValue attribute.
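The empty output described above is easy to reproduce outside of GeoNetwork. A minimal Python sketch, assuming the fragment is completed with its namespace declaration and closing tag so that it parses on its own:

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"

# Same fragment as above, made well-formed for the parser.
FRAGMENT = f"""
<gmd:accessConstraints xmlns:gmd="{GMD}">
  <gmd:MD_RestrictionCode codeList="...#MD_RestrictionCode" codeListValue="license"/>
</gmd:accessConstraints>
"""

code = ET.fromstring(FRAGMENT).find(f"{{{GMD}}}MD_RestrictionCode")

# What a text-based value-of would see: the element has no text content.
text_value = (code.text or "").strip()
# Where the information actually lives: the codeListValue attribute.
attr_value = code.get("codeListValue")

print(repr(text_value))  # ''
print(repr(attr_value))  # 'license'
```

So any template that copies the element's text content gets an empty string here, while the code itself is only reachable through the attribute.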

landryb commented Jun 26, 2024

to note for the license mapping, ecolab has this: ecolabdata/ecospheres-core-geonetwork@6724d09
