Configuration: Basic Indexing of transcripts in Solr

kim pham edited this page May 30, 2017 · 8 revisions

Whether your transcripts are stored as transcript XML or WebVTT, you must configure Solr to index them. The following steps cover basic indexing of transcripts in Solr and are meant as a guide. The example paths are based on those in the development VM created by islandora_vagrant; paths on your system may differ depending on your Islandora environment. Because Islandora setups vary, please seek the assistance of your local system administrator and Solr expert.

  1. Copy the XSLT files xsl/or_transcript_solr.xslt and xsl/vtt_solr.xslt provided with Islandora Solution Pack Oral Histories into the islandora_transforms directory:
 > cd /var/lib/tomcat7/webapps/fedoragsearch/WEB-INF/classes/fgsconfigFinal/index/FgsIndex/islandora_transforms
 > cp /var/www/drupal/sites/all/modules/islandora_solution_pack_oralhistories/xsl/or_transcript_solr.xslt ./
 > cp /var/www/drupal/sites/all/modules/islandora_solution_pack_oralhistories/xsl/vtt_solr.xslt ./
  1. Edit foxmlToSolr.xslt, located in /var/lib/tomcat7/webapps/fedoragsearch/WEB-INF/classes/fgsconfigFinal/index/FgsIndex/. Add the following lines after the last of the existing <xsl:include/> elements:
<!-- begin: islandora_solution_pack_oralhistories setup -->
<xsl:include href="/var/lib/tomcat7/webapps/fedoragsearch/WEB-INF/classes/fgsconfigFinal/index/FgsIndex/islandora_transforms/or_transcript_solr.xslt"/>
<xsl:include href="/var/lib/tomcat7/webapps/fedoragsearch/WEB-INF/classes/fgsconfigFinal/index/FgsIndex/islandora_transforms/vtt_solr.xslt"/>
<!-- end: islandora_solution_pack_oralhistories setup -->
  1. You must also add a test to foxmlToSolr.xslt so that the generic datastream handling skips the TRANSCRIPT datastream when its MIME type is text/vtt. The test is:
<!-- begin: islandora_solution_pack_oralhistories setup -->
<xsl:when test="@CONTROL_GROUP='M' and @ID='TRANSCRIPT' and foxml:datastreamVersion[last()][@MIMETYPE='text/vtt']">  </xsl:when>
<!-- end: islandora_solution_pack_oralhistories setup -->

Locate the existing test shown below and place the new test immediately above it:

<xsl:when test="@CONTROL_GROUP='M' and foxml:datastreamVersion[last() and not(starts-with(@MIMETYPE, 'image') or starts-with(@MIMETYPE, 'audio') or starts-with(@MIMETYPE, 'video') or @MIMETYPE = 'application/pdf')]">
<!-- TODO: should do something about mime type filtering text/plain should use the getDatastreamText extension because document will only work for xml docs xml files should use the document function other mimetypes should not be being sent will this let us not use the content variable? -->

<xsl:apply-templates select="foxml:datastreamVersion[last()]">

<xsl:with-param name="content" select="java:ca.discoverygarden.gsearch_extensions.XMLStringUtils.escapeForXML(normalize-space(exts:getDatastreamText($PID, $REPOSITORYNAME, @ID, $FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH, $TRUSTSTOREPASS)))"/>

</xsl:apply-templates>
</xsl:choose>
</xsl:when>

After adding the new test, save and close the file.
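The edits to foxmlToSolr.xslt are easy to get subtly wrong, because the order of the xsl:when tests matters. The following is a minimal shell sketch for checking them, assuming the include paths and test attributes match the ones shown above. `check_foxml_edits` is a hypothetical helper name, not something shipped with Islandora or GSearch.

```shell
# Sketch: sanity-check an edited foxmlToSolr.xslt.
# check_foxml_edits is a hypothetical helper; it only looks for the
# strings from the steps above and compares their line numbers.
check_foxml_edits() {
    file="$1"

    # Both transform files must be included.
    grep -qF 'islandora_transforms/or_transcript_solr.xslt' "$file" \
        || { echo "missing include: or_transcript_solr.xslt"; return 1; }
    grep -qF 'islandora_transforms/vtt_solr.xslt' "$file" \
        || { echo "missing include: vtt_solr.xslt"; return 1; }

    # Line number of the new TRANSCRIPT test.
    new_line=$(grep -nF "@ID='TRANSCRIPT'" "$file" | head -n 1 | cut -d: -f1)
    # Line number of the existing generic managed-datastream test.
    old_line=$(grep -nF "starts-with(@MIMETYPE, 'image')" "$file" | head -n 1 | cut -d: -f1)

    [ -n "$new_line" ] || { echo "missing TRANSCRIPT test"; return 1; }
    [ -n "$old_line" ] || { echo "existing test not found"; return 1; }

    # The new test must appear before the existing one.
    [ "$new_line" -lt "$old_line" ] \
        || { echo "TRANSCRIPT test must come before the existing test"; return 1; }

    echo "ok"
}
```

On the islandora_vagrant VM this could be run as `check_foxml_edits /var/lib/tomcat7/webapps/fedoragsearch/WEB-INF/classes/fgsconfigFinal/index/FgsIndex/foxmlToSolr.xslt`. It does not validate the XSLT itself. If xmllint happens to be installed, `xmllint --noout` on the same file reports whether it is still well-formed XML.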

  1. Open /usr/local/solr/collection1/conf/schema.xml and add the following dynamicField after the last of the existing dynamicField elements:
<!-- begin: islandora_solution_pack_oralhistories setup -->
<dynamicField name="or_*" type="text" indexed="true" stored="true" multiValued="true"/>
<!-- end: islandora_solution_pack_oralhistories setup -->
  1. Restart Tomcat:
> sudo service tomcat7 stop
> sudo service tomcat7 start

Note that you will still need to configure the Islandora Solr module to search the newly indexed fields.
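To find out which field names are available to configure, it can help to list the fields in the index that match the or_* pattern. The sketch below assumes Solr's Luke request handler is enabled and reachable at the URL shown in the comment, which is an assumption about your setup. `list_fields_with_prefix` is a hypothetical helper name. Fields only appear after at least one oral history object has been indexed or re-indexed following the restart.

```shell
# Sketch: list quoted names starting with a prefix from a Solr Luke
# JSON response read on stdin. This is a crude text match, not a JSON
# parser, so it reports any quoted token that begins with the prefix.
# list_fields_with_prefix is a hypothetical helper name.
list_fields_with_prefix() {
    prefix="$1"
    # Extract tokens such as "or_something", strip the quotes, de-duplicate.
    grep -o "\"${prefix}[A-Za-z0-9_.]*\"" | tr -d '"' | sort -u
}

# Example (URL is an assumption; adjust host, port, and core to your setup):
#   curl -s "http://localhost:8080/solr/admin/luke?numTerms=0&wt=json" \
#       | list_fields_with_prefix or_
```

If the command prints nothing, either no transcripts have been indexed yet or the XSLT and schema changes have not taken effect.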