Nick Ruest edited this page Mar 30, 2016 · 9 revisions

Time/Place

This meeting is a hybrid teleconference and IRC chat. Anyone is welcome to join. Here is the info:

Attendees

  • Nick Ruest
  • Jared Whiklo
  • Diego Pino
  • Aaron Coburn
  • Melissa Anez ✨
  • Ed Fugikawa

Agenda

  1. Fedora Messaging SPI. See also sample event ontology
  2. Project Plan - list of minimum functional requirements for a first release would be a big help
  3. Recent PCDM threads
  4. CollectionService should allow you to get a collection
  5. PHP7, HHVM, high performance PHP.
  6. ... (feel free to add agenda items)

Minutes

irc log

Fedora Messaging SPI:

Aaron: Ongoing effort in Fedora community to create specifications for different components, including messaging (also CRUD, versioning, etc). For messaging SPI, there's an implementation with JMS, but we're talking about something a layer down. A couple of issues: 1. The way messages are structured, we're using JMS and they're header-only. Not every messaging protocol supports headers. 2. In those headers, we have values that look like they are from the Fedora ontology, but they are not from there. So we need to clean up what these messages need to be and how they are formatted, and define the packaging of the data so it's generalizable. One way to serialize the data is RDF. A. Soroka and Aaron have been talking about how to change this messaging so the data is contained in the body, not the header. Suspect that any actual implementation should use JSON-LD. Since CLAW group is the biggest user of messages, this question comes here.

Diego: Having it in the body is a good idea, JSON-LD is even better. Concerned about using an external source because without caching it will hit the Fedora side for each message. Do we have a way of discerning what type of events we're getting without parsing the full JSON-LD?

Aaron: Not a fan of RDF. Fine for modelling objects, but for messages, not sure of any RDF library that is built into a messaging system. So yes it's RDF (JSON-LD) so if you need the RDF you can get it - but it will just be simple JSON. Would like this to be how all of the toolbox pieces work. Whether the context file is provided inline or as http, it could go either way. Expects the clients of the messages would not be dereferencing the full JSON-LD.

Diego: We have been talking on the CLAW side about moving to JSON-LD as much as possible. Not a huge fan of RDF either.

Aaron: Has written a sample ontology for this (see agenda link) - which is already set to change. Many of the properties are already defined by PREMIS, so we won't duplicate. So the only properties we'll define are repository root and resource path (ideally). Looking at switching from properties header to resources header, which should make current code easier.

Diego: Does this include user-provided RDF types?

Aaron: Yes. If you have a PCDM type, for instance, the message will indicate that. The body of the message would contain a field called "type" holding one or more values, and those are the message (event) types. There would also be a field like "resource type" holding the types of the resource that was the target of the event.
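As a rough illustration of the body structure discussed here, the sketch below builds a hypothetical event message and reads its types as ordinary JSON, with no RDF library involved. The field names (`id`, `type`, `resourceType`), the context URL, and the type values are assumptions made for the example; the specification was still being drafted at the time of this call.

```python
import json

# Hypothetical message body. Field names and values are illustrative only;
# they are NOT taken from the Fedora messaging specification.
SAMPLE_BODY = json.dumps({
    "@context": "http://example.org/fedora-event-context.jsonld",
    "id": "http://localhost:8080/fcrepo/rest/books/1",
    "type": ["ResourceModification"],
    "resourceType": ["pcdm:Object", "ldp:Container"],
})


def as_list(value):
    """JSON-LD allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def event_summary(body):
    """Read event and resource types as plain JSON, ignoring @context."""
    msg = json.loads(body)
    return {
        "event_types": as_list(msg.get("type")),
        "resource_types": as_list(msg.get("resourceType")),
    }


summary = event_summary(SAMPLE_BODY)
is_pcdm_object = "pcdm:Object" in summary["resource_types"]
```

A consumer written this way never dereferences the context, which matches the expectation that most clients would treat the message as simple JSON.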

Diego: Is there a chance that we could have the transaction ID in the message body?

Aaron: For transaction ID, probably not. But if you are modifying the repo in the context of a transaction, the messages won't be emitted until after the transaction is completed. So the best way to do that would be to tag the client with some special string and filter on the user agent as it comes through in a message.
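The user-agent tagging idea could look something like the following minimal sketch. It assumes the message body exposes the client's user agent in a field named `userAgent`; that field name, the tag value, and the sample messages are hypothetical.

```python
import json

# Hypothetical marker a client would include in its User-Agent header
# while performing a batch of changes.
CLIENT_TAG = "claw-sync/batch-42"


def from_tagged_client(body, tag=CLIENT_TAG):
    """Return True if the message's (assumed) userAgent field carries the tag."""
    msg = json.loads(body)
    return tag in msg.get("userAgent", "")


# Illustrative messages only; real messages would carry more fields.
messages = [
    json.dumps({"id": "/rest/a", "userAgent": "curl/7.47.0"}),
    json.dumps({"id": "/rest/b",
                "userAgent": "python-requests/2.9 claw-sync/batch-42"}),
    json.dumps({"id": "/rest/c"}),
]

# Keep only the events produced by the tagged client.
tagged_ids = [json.loads(m)["id"] for m in messages if from_tagged_client(m)]
```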

Jared: Was there any discussion about having the object embedded in the messaging? Clearly not part of the API, but has there been any thought about whether it would be possible?

Aaron: Is writing the specification and wrote that it must not contain the content of the object, because the message has a certain set of information and the content of the resource has a certain set of information and you have to keep the two separate. Also, messages should be very small. Messages in Fedora can get huge, but that can throw a wrench into the works.

Jared: Depending on your distributedness, the amount of time passing between when the message is emitted and when it is consumed, the object could change, so that explanation makes sense.

Diego: What determines the time stamp?

Aaron: Not exactly sure. Soroka has a question in the spec to determine how we decide that. At present, it refers to the timestamp on the JCR event, which is probably pretty similar to when the resource was modified. I don't want to have a hard and fast rule that says it's exactly the same as when modified, but it should be close.

These changes won't be part of the next release (currently in code freeze), but the idea is that they would come in pretty quickly in the next release.

Ed: Can you speak more about the PREMIS ontology?

Aaron: It's from the LOC and has to do with preservation activities. At present, if you run Fedora with auditing on, it records events as PREMIS events. It has properties for things like the date and time when an event happened, who was the agent acting on the repo, etc. We're using it already internally and it's widely used. It seems to capture the kind of data we want to capture in these messages.

Ed: Can it be used to diagnose how things go wrong?

Aaron: You can build something for that. In principle you could do the same thing with a MySQL DB in the backend.

Nick: The Audit Triplestore is set up on the Vagrant build. This came up last year when we were looking at migration and noticed we need an audit function.

Diego: Do we have events from every CRUD action? For versioning?

Aaron: Tries to stay as far from versioning as possible. So probably it has that? But not sure. Probably not doing it exactly right and should be fixed.

Diego: On our side we'll have to make some changes on our Camel routes to use body instead of headers, but we've got lots of time since this isn't going into Fedora HEAD for a while.

Project plan:

Stakeholders who have funded CLAW would like a more clear vision of our plan. We have a document (see link in agenda). Melissa has done some work filling out the management side. The group discussed minimums for a Beta and Nick will go through the document to make a draft to share with the Board and Roadmap Committee, as a basis for asking them to articulate what they want to see in a plan.

Modelling Books:

Nick: PCDM mailing list has a lot of activity and they want our response. We can do a common one or respond individually.

Diego: Storing a book means viewing a book. Trying to model a book in PCDM and finding that while PCDM is a great structural idea of an ontology about how to build stuff, it's not fully ??? compliant. We have to deal with paged content sooner or later in Islandora. If Hydra's idea is to model things the way we're already doing them, then why not? Not sure how PCDM will help Hydra and Islandora to interact. Don't see how PCDM can be a common data model if PCDM is not a data model.

Nick: That was a discussion at LDCX. PCDM is all well and good, but we still have to implement, and if those implementations aren't compatible, what's the point? Working towards a TCK. We need to jump into the discussion on the PCDM list to talk about this for Islandora. The model Hydra is proposing resembles how we do books now, but maybe not how we should do them.

Diego: We need more community contributions, even just to the discussion. This is too hard to do with only 4 or 5 people participating. We can't get it all done.

Jared: That's why we may not be able to get it done, but not why we shouldn't do it. There's always more work than time, but if there's value there and we want to go that route, even if we take a step that's something. If we're not going to do it, we need to say so. Otherwise we'll waste a lot of time creating PCDM objects for no reason. We're using proxies for collections, which is good when an object is in multiple collections.

Diego: But that isn't true for pages. An example/idea. We won't share pages between multiple books. Could use ranges or sequences for handling pages. Removed the option of having an OBJ container - considers this a problem in the current stack, since we don't define what that is. Just a preservation master. So this example points to its preservation master. It's modelled like a book, but it's a generic paged content structure that aligns with IIIF. Example

Jared: What about jumping to a page? Can we add that?

Diego: It's completely possible. One thing about proxies that we haven't explored yet is that they can be more than just proxies; they can have metadata. We can re-use them and we can include info like page numbers. IIIF is pretty complex.
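To make the idea of proxies carrying metadata a little more concrete, here is a small sketch of ordered proxies that also hold a page label, which is enough to support jumping to a page. The dictionary keys loosely mirror `ore:proxyFor` and `iana:next`; the `page_label` property, the identifiers, and the helper names are hypothetical and are not part of PCDM or of the example discussed on the call.

```python
# Each proxy stands in for one page and links to the next proxy.
# "proxy_for" and "next" loosely mirror ore:proxyFor and iana:next;
# "page_label" is a hypothetical extra property stored on the proxy itself.
proxies = {
    "proxy1": {"proxy_for": "page-a", "next": "proxy2", "page_label": "i"},
    "proxy2": {"proxy_for": "page-b", "next": "proxy3", "page_label": "1"},
    "proxy3": {"proxy_for": "page-c", "next": None, "page_label": "2"},
}


def ordered_pages(proxies, first):
    """Walk the proxy chain from the first proxy, yielding (label, page)."""
    seen = set()  # guards against accidental cycles in the chain
    current = first
    while current is not None and current not in seen:
        seen.add(current)
        proxy = proxies[current]
        yield proxy["page_label"], proxy["proxy_for"]
        current = proxy["next"]


def jump_to(proxies, first, label):
    """Return the page whose proxy carries the given label, or None."""
    for page_label, page in ordered_pages(proxies, first):
        if page_label == label:
            return page
    return None


sequence = list(ordered_pages(proxies, "proxy1"))
```

Because the label lives on the proxy and not on the page, the same page resource could in principle be reused elsewhere with a different label, although the discussion above suggests pages would not be shared between books.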

Nick: Action wise, how do we want to proceed with the PCDM list? Reply and wait for correction if wrong? Throw out Diego's example?

Diego: Go for it. PCDM has meaning for CLAW as a group more than for us as individuals.

Ed: A drawing to illustrate how the pieces work together

Moved to next call:
