GeoBlacklight Sidecar Images

Store local copies of remote imagery in GeoBlacklight.

Description

This GeoBlacklight plugin captures remote images from geographic web services and saves them locally. It borrows the SolrDocumentSidecar concept from Spotlight: an ActiveRecord-based "sidecar" record is paired with each non-ActiveRecord SolrDocument. This allows us to use Active Storage to attach images to our Solr documents.

Requirements

Suggested

  • Background Job Processor

Sidekiq is an excellent choice if you need an opinion.

Installation

Existing GeoBlacklight Instance

GeoBlacklight v4 with Aardvark metadata: add the gem to your Gemfile.

gem "geoblacklight_sidecar_images", "~> 1.0"

GeoBlacklight v3 with GBL v1.0 metadata: add the gem to your Gemfile.

gem "geoblacklight_sidecar_images", "~> 0.9.1", "< 1.0"

Run the generator.

$ bin/rails generate geoblacklight_sidecar_images:install

Run the database migration.

$ bin/rails db:migrate

Complete any necessary Active Storage setup steps, for example:

  1. Add a config/storage.yml file:

     local:
       service: Disk
       root: <%= Rails.root.join("storage") %>

  2. Add config/environments declarations, development.rb for example:

     # Store uploaded files on the local file system (see config/storage.yml for options)
     config.active_storage.service = :local

New GeoBlacklight Instance

Create a new GeoBlacklight instance with the GBLSI code

$ rails new app-name -m https://raw.githubusercontent.com/geoblacklight/geoblacklight_sidecar_images/develop/template.rb

Ingest Test Documents

# Run your GBL instance
bundle exec rake geoblacklight:server
# Index the GBL test fixtures
bundle exec rake gblsci:sample_data:seed

Rake tasks

Harvest images

Harvest all images

Spawns background jobs to harvest images for all documents in your Solr index.

bundle exec rake gblsci:images:harvest_all

Harvest an individual image

Allows you to add images one document id at a time. Pass a DOC_ID env var.

DOC_ID='stanford-cz128vq0535' bundle exec rake gblsci:images:harvest_doc_id

Harvest all incomplete states

Reattempt image harvesting for all non-successful state objects.

bundle exec rake gblsci:images:harvest_retry
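
As a rough illustration of what "non-successful" means here, the plain-Ruby sketch below picks retry candidates from a hash of document ids and states. The helper name `retry_candidates`, the constant `RETRYABLE_STATES`, and the id `example-doc-3` are hypothetical, and the assumption that every state other than "succeeded" is retried may not match the task's actual rules.

```ruby
# ASSUMPTION: every state other than "succeeded" is eligible for a retry.
# The real rake task may apply different rules (for example for "placeheld").
RETRYABLE_STATES = %w[initialized queued processing failed placeheld].freeze

# Hypothetical helper: given { doc_id => state }, return the ids to re-harvest.
def retry_candidates(states_by_doc_id)
  states_by_doc_id.select { |_id, state| RETRYABLE_STATES.include?(state) }.keys
end

states = {
  "stanford-cz128vq0535" => "succeeded",
  "stanford-cg357zz0321" => "placeheld",
  "example-doc-3" => "failed" # hypothetical id
}

puts retry_candidates(states).inspect
```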

Check image states

bundle exec rake gblsci:images:harvest_states

We use a state machine library to track success/failure of our harvest tasks. The states we track are:

  • initialized - SolrDocumentSidecar created, no harvest attempt run
  • queued - Harvest attempt queued as background job
  • processing - Harvest attempt at work
  • succeeded - Harvest was successful, image attached
  • failed - Harvest failed, no image attached, error logged
  • placeheld - Harvest was not successful, placeholder imagery will be used

# Class-level query: all sidecars in a given state
SolrDocumentSidecar.in_state(:succeeded) => [#<SolrDocumentSidecar:0x0000000170697960 ... ]

# Instance-level calls on a single document's sidecar
document.sidecar.image.attached? => false
document.sidecar.image_state.current_state => "placeheld"
document.sidecar.image_state.last_transition => #<SidecarImageTransition id: 207, to_state: "placeheld", metadata: {"solr_doc_id"=>"stanford-cg357zz0321", "solr_version"=>1616509329754554368, "placeheld"=>true, "viewer_protocol"=>"wms", "image_url"=>"http://geowebservices-restricted.stanford.edu/geoserver/wms/reflect?&FORMAT=image%2Fpng&TRANSPARENT=TRUE&LAYERS=druid:cg357zz0321&WIDTH=300&HEIGHT=300", "service_url"=>"http://geowebservices-restricted.stanford.edu/geoserver/wms/reflect?&FORMAT=image%2Fpng&TRANSPARENT=TRUE&LAYERS=druid:cg357zz0321&WIDTH=300&HEIGHT=300", "gblsi_thumbnail_uri"=>false, "error"=>"Faraday::Error::ConnectionFailed"},...>
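
To make the state list above more concrete, here is a minimal plain-Ruby model of a harvest moving through those states. The class name `HarvestStateSketch` is hypothetical and the transition table is an assumption made for illustration; the plugin's own state machine may allow different moves.

```ruby
# A plain-Ruby sketch of the harvest states listed above.
# ASSUMPTION: the transition table is illustrative, not the plugin's actual rules.
class HarvestStateSketch
  STATES = %i[initialized queued processing succeeded failed placeheld].freeze

  # Hypothetical allowed transitions (from => [to, ...]).
  TRANSITIONS = {
    initialized: %i[queued processing],
    queued: %i[processing],
    processing: %i[succeeded failed placeheld],
    failed: %i[queued processing],
    placeheld: %i[queued processing],
    succeeded: %i[queued processing]
  }.freeze

  attr_reader :current_state, :history

  def initialize
    @current_state = :initialized
    @history = []
  end

  def can_transition_to?(state)
    TRANSITIONS.fetch(current_state).include?(state)
  end

  # Record each transition with a metadata hash, loosely mirroring the
  # metadata visible on last_transition in the console output above.
  def transition_to!(state, metadata = {})
    raise ArgumentError, "unknown state: #{state}" unless STATES.include?(state)
    raise "cannot go from #{current_state} to #{state}" unless can_transition_to?(state)

    @history << { from: current_state, to: state, metadata: metadata }
    @current_state = state
  end

  def last_transition
    history.last
  end
end

harvest = HarvestStateSketch.new
harvest.transition_to!(:queued)
harvest.transition_to!(:processing)
harvest.transition_to!(:placeheld, "error" => "Faraday::Error::ConnectionFailed")

puts harvest.current_state
puts harvest.last_transition[:metadata]["error"]
```

Keeping the transitions in a single table makes it easy to see which states can be retried and which moves are rejected.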

Destroy images

Remove everything

Remove all sidecar objects and attached images

bundle exec rake gblsci:images:harvest_purge_all

Remove orphaned AR objects

Remove all sidecar objects and attached images for AR objects without a corresponding Solr document

bundle exec rake gblsci:images:harvest_purge_orphans
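
Conceptually, an orphan is a sidecar whose document id no longer appears in Solr. The sketch below shows that comparison in plain Ruby; the helper name `orphaned_sidecar_ids` and the id `withdrawn-doc-1` are hypothetical, and the real task works against the database and Solr index rather than arrays.

```ruby
require "set"

# Hypothetical helper: ids that have a sidecar but no Solr document.
def orphaned_sidecar_ids(sidecar_doc_ids, solr_doc_ids)
  in_solr = solr_doc_ids.to_set
  sidecar_doc_ids.reject { |id| in_solr.include?(id) }
end

sidecar_ids = ["stanford-cz128vq0535", "stanford-cg357zz0321", "withdrawn-doc-1"]
solr_ids = ["stanford-cz128vq0535", "stanford-cg357zz0321"]

puts orphaned_sidecar_ids(sidecar_ids, solr_ids).inspect
```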

Remove a batch

Remove sidecar objects and attached images via a CSV file of document ids

bundle exec rake gblsci:images:harvest_destroy_batch

Troubleshooting

Harvest report

Generate a CSV file of sidecar objects and associated image state. Useful for debugging problem items.

bundle exec rake gblsci:images:harvest_report
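
For a sense of what such a report might contain, the sketch below builds a small CSV with Ruby's standard `csv` library. The column names (`doc_id`, `state`, `error`) and the helper `harvest_report_csv` are assumptions for illustration; the columns written by the actual task may differ.

```ruby
require "csv"

# Hypothetical helper: one CSV row per sidecar, with an assumed column set.
def harvest_report_csv(rows)
  CSV.generate do |csv|
    csv << %w[doc_id state error]
    rows.each { |row| csv << [row[:doc_id], row[:state], row[:error]] }
  end
end

rows = [
  { doc_id: "stanford-cz128vq0535", state: "succeeded", error: nil },
  { doc_id: "stanford-cg357zz0321", state: "placeheld", error: "Faraday::Error::ConnectionFailed" }
]

puts harvest_report_csv(rows)
```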

Failed state inspect

Prints details for failed state harvest objects to stdout

bundle exec rake gblsci:images:harvest_failed_state_inspect

Prioritize Solr Thumbnail Field URIs

If you add a thumbnail URI to your GeoBlacklight Solr documents...

Example Doc

{
  ...
  "dc_format_s":"TIFF",
  "dc_creator_sm":["Minnesota. Department of Highways."],
  "thumbnail_path_ss":"https://umedia.lib.umn.edu/sites/default/files/imagecache/square300/reference/562/image/jpeg/1089695.jpg",
  "dc_type_s":"Still image",
  ...
}

Then you can edit your GeoBlacklight settings.yml file to point at that Solr field (Settings.GBLSI_THUMBNAIL_FIELD). Any docs in your index that have a value for that field will harvest the image at that URI instead of trying to retrieve an image via IIIF or the other web services.

View customization

Use basic Active Storage patterns to display imagery in your application.

Example Methods

# Is there an image?
document.sidecar.image.attached?

# Can the image size be manipulated?
document.sidecar.image.variable?

# Example image_tag with resize
<%= image_tag document.sidecar.image.variant(resize_to_fit: [100, 100]), {class: 'media-object'} %>

Search results

This GBL plugin includes a custom catalog/_index_split_default.html.erb file. Look there for examples on calling the image method.

Show pages

Example for adding a thumbnail to the show page sidebar.

catalog/_show_sidebar.html.erb

<%# Add to end of file %>
<% if @document.sidecar.image.attached? %>
  <% if @document.sidecar.image.variable? %>
    <div class="card">
      <div class="card-header">Thumbnail</div>
      <div class="card-body">
        <%= image_tag @document.sidecar.image.variant(resize_to_fit: [200, 200]), {class: 'mr-3'} %>
      </div>
    </div>
  <% end %>
<% end %>
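
The nested checks in the template above could also be folded into a small view helper. The module below is a hypothetical sketch, not part of the plugin: the names `SidecarThumbnailSketch` and `sidecar_thumbnail` are assumptions, and it relies only on the `attached?`, `variable?`, and `variant` calls already shown in this README.

```ruby
# Hypothetical helper module; in a Rails app it could live under app/helpers.
module SidecarThumbnailSketch
  # Returns a resized variant when the image is attached and variable,
  # the original attachment when it is attached but not variable,
  # and nil when no image is attached.
  def sidecar_thumbnail(document, size: [200, 200])
    image = document.sidecar.image
    return nil unless image.attached?
    return image unless image.variable?

    image.variant(resize_to_fit: size)
  end
end
```

A template would then call the helper once and render an image_tag only when the result is not nil. Unlike the template above, this sketch falls back to the original attachment for non-variable images, which is a design choice you may not want.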

Development

# Run test suite
bundle exec rake ci

# Launch test app server
cd .internal_test_app/
bundle exec rake geoblacklight:server

# Load test fixtures
bundle exec rake gblsci:sample_data:seed

# Run harvest
bundle exec rake gblsci:images:harvest_all

# Tail image service log file
tail -f log/image_service_development.log

See Localhost Results