Submission Service

From APSRWiki


The 'Submission Service' translates and routes Submission Information Packages (SIP) from the workflow environments to DSpace or Fez+Fedora for automatic ingestion.



Overview

The Submission Service will be service-oriented. Initially it will map targeted source content from the RIFF workflow projects to NLA METS packages and route those packages to the available repositories.

The Submission Service will comprise a number of services by default, such as submission, transformation, submission reporting and notification, and routing. Each of these will be covered in more detail as design and prototyping move forward.

The objectives of this development are:

  • to make the Submission Services lightweight but flexible
  • not to be a burden for IT support staff (and other developers) to maintain
  • to use common and stable technologies
  • to be platform-independent
  • to deliver a usable application in APSR timeframes

Please keep these objectives in mind when considering the comments that follow.

Currently the service is planned to be based around Java servlet technology running under Tomcat (and Apache). At this stage it will be purely Java (primarily because it's fairly simple to write platform-independent Java, and it's all I know other than C and C++).

Initially I looked at Hibernate for abstracting the persistent store (a Postgres database by default) away from the data access API. However, getting the most out of it requires learning a new query language (HQL), and it also appears to require a one-to-one mapping of objects to tables. Most of the tutorial docs around also seem to generate the database schema from class definitions, which is a bit iffy. I'm also not keen on the size of the distribution, especially given the aim of keeping the Submission Service simple and accessible to the average Java developer (and IT support person, of course). I've now decided to look at the DAO (Data Access Object) pattern in conjunction with the DAO Abstract Factory pattern, so that others can create database-specific DAOs should the need arise.
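A minimal sketch of the DAO plus Abstract Factory arrangement described above, in plain Java. The names `Submission`, `SubmissionDAO`, `DAOFactory` and `InMemoryDAOFactory` are hypothetical and not part of the Submission Service; a Postgres-backed factory would implement the same `SubmissionDAO` interface using JDBC and is only indicated in a comment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain object: a submission record tracked by the service.
final class Submission {
    final String id;
    final String status;

    Submission(String id, String status) {
        this.id = id;
        this.status = status;
    }
}

// DAO interface: service code depends only on this, never on a concrete database.
interface SubmissionDAO {
    void save(Submission s);
    Optional<Submission> findById(String id);
}

// Abstract factory: one concrete factory per persistent store.
abstract class DAOFactory {
    static final int POSTGRES = 1;
    static final int IN_MEMORY = 2;

    abstract SubmissionDAO getSubmissionDAO();

    // Select a concrete factory; a deployment might read the choice from config.
    static DAOFactory getDAOFactory(int which) {
        switch (which) {
            case IN_MEMORY:
                return new InMemoryDAOFactory();
            // case POSTGRES: return new PostgresDAOFactory(); // hypothetical JDBC-backed factory, not shown
            default:
                throw new IllegalArgumentException("Unsupported store: " + which);
        }
    }
}

// In-memory implementation, handy for tests and demos.
class InMemoryDAOFactory extends DAOFactory {
    private final InMemorySubmissionDAO dao = new InMemorySubmissionDAO();

    @Override
    SubmissionDAO getSubmissionDAO() {
        return dao;
    }
}

class InMemorySubmissionDAO implements SubmissionDAO {
    private final Map<String, Submission> rows = new HashMap<>();

    @Override
    public void save(Submission s) {
        rows.put(s.id, s);
    }

    @Override
    public Optional<Submission> findById(String id) {
        return Optional.ofNullable(rows.get(id));
    }
}
```

Callers only ever see `DAOFactory` and `SubmissionDAO`, so supporting another database means adding one factory class and its DAOs rather than touching service code.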

Underlying the services will be a number of Java interfaces that allow transforms and more complex workflows to be configured on an as-needed basis. The user interface side will be completely configurable, so reporting and submission UIs can be developed in whatever context they are needed. The servlets will simply return information as an XML stream, and how that is processed is up to each institution. The focus of this project will be getting the framework right, using the RIFF projects as use cases. A simple UI will accompany the initial release (mainly for demonstration purposes); however, this is expected to be throwaway, since where and how the services are used will vary from institution to institution.
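As a rough illustration of servlets returning information as an XML stream, the sketch below builds a small status document with the JDK's DOM and `javax.xml.transform` APIs. The `<status>`/`<job>` element names and the `StatusXml` class are hypothetical; in a servlet, the resulting string would be written to the response with an XML content type, which is left out here so the example runs without a servlet container.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class StatusXml {
    private StatusXml() {}

    /** Builds <status><job name=".." state=".."/>...</status> from a job-name to state map. */
    static String render(Map<String, String> jobStates) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("status");
            doc.appendChild(root);
            for (Map.Entry<String, String> e : jobStates.entrySet()) {
                Element job = doc.createElement("job");
                job.setAttribute("name", e.getKey());
                job.setAttribute("state", e.getValue());
                root.appendChild(job);
            }
            // Serialize the DOM; the client decides how to present it (XSLT, JavaScript, etc.).
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build status XML", e);
        }
    }
}
```

Keeping the output as plain XML is what lets each institution put its own UI on top, as the paragraph above intends.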

I should note that jBPM was considered as a possible workflow solution; however, it requires JBoss and Hibernate and appears to be overkill at this point (for the same reasons I'm looking at DAO rather than Hibernate). I also briefly looked at Spring, but it's definitely overkill and would probably take a significant amount of time to come up to speed on. I also found myself wondering how I would map this development effort onto Spring, which raised doubts about how useful it would be to me.

Development Plan

The Quartz scheduler looks like exactly what we need. Current testing is exploring its workflow capabilities and developing a framework around Quartz targeted at APSR requirements. The current development plan is as follows:

Stage 1

Develop the submit and status services to a deployable demo level. This will provide a simple stateless job submission webapp (deployable as a Tomcat WAR file) for defining and running batch jobs. A number of simple default jobs will be included for retrieving, transforming and transferring content, as well as for simple notification. In addition, a simple config markup will be available for defining jobs, although the submit service will need to be invoked to reschedule any jobs should the application server fail.
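To make the Stage 1 restart behaviour concrete, here is a minimal sketch of a stateless job chain in plain Java, without Quartz. `Step` and `StatelessJobChain` are hypothetical names; step names such as retrieve, transform, transfer and notify would mirror the default jobs listed above. Because nothing is persisted between runs, a failed chain can only be retried from its first step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical unit of work (retrieve, transform, transfer, notify, ...).
@FunctionalInterface
interface Step {
    void run(StringBuilder log) throws Exception;
}

// Stateless chain: no progress survives a failure, so every run starts at step 0.
final class StatelessJobChain {
    private final List<String> names = new ArrayList<>();
    private final List<Step> steps = new ArrayList<>();

    StatelessJobChain add(String name, Step step) {
        names.add(name);
        steps.add(step);
        return this;
    }

    /** Runs all steps in order, logging each one; returns false at the first failure. */
    boolean runFromScratch(StringBuilder log) {
        for (int i = 0; i < steps.size(); i++) {
            try {
                steps.get(i).run(log);
                log.append(names.get(i)).append(';');
            } catch (Exception e) {
                log.append("FAILED:").append(names.get(i)).append(';');
                return false;
            }
        }
        return true;
    }
}
```

If the transfer step fails once, the retry repeats retrieve and transform before reaching transfer again; that is acceptable for batch jobs but is exactly what Stage 2 sets out to avoid.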

Stage 2

Develop stateful job submission. This will use the Quartz database backend to store job state and run-time arguments, so that if the webapp fails, job state is not lost and jobs restart automatically when the application server is restarted. It will also look at interruptible jobs (Stage 1 assumes jobs are restarted from scratch should they fail, which is OK in a batch context but not in a more dynamic one). This work is not likely to start until late April; as well as a more robust service, it will deliver any additional job classes developed under the Journal Workflow project. Stage 2 will thus deliver a robust and configurable job scheduling framework for submitting and exchanging repository objects.
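The difference Stage 2 aims for can be sketched by checkpointing progress after each step and resuming from the last checkpoint. In this sketch a hypothetical `JobStateStore` with an in-memory implementation stands in for the Quartz database job store; it does not use the Quartz API, and `ResumableJob` and `ResumableStep` are likewise hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical persistence hook; Stage 2 would back this with the Quartz database job store.
interface JobStateStore {
    int lastCompleted(String jobId);                 // -1 if no step has completed yet
    void markCompleted(String jobId, int stepIndex);
    void clear(String jobId);
}

final class InMemoryJobStateStore implements JobStateStore {
    private final Map<String, Integer> progress = new HashMap<>();

    public int lastCompleted(String jobId) { return progress.getOrDefault(jobId, -1); }
    public void markCompleted(String jobId, int stepIndex) { progress.put(jobId, stepIndex); }
    public void clear(String jobId) { progress.remove(jobId); }
}

@FunctionalInterface
interface ResumableStep {
    void run(List<String> log) throws Exception;
}

final class ResumableJob {
    private final String jobId;
    private final JobStateStore store;
    private final List<String> names = new ArrayList<>();
    private final List<ResumableStep> steps = new ArrayList<>();

    ResumableJob(String jobId, JobStateStore store) {
        this.jobId = jobId;
        this.store = store;
    }

    ResumableJob add(String name, ResumableStep step) {
        names.add(name);
        steps.add(step);
        return this;
    }

    /** Resumes after the last checkpointed step; returns true when the whole job finished. */
    boolean run(List<String> log) {
        for (int i = store.lastCompleted(jobId) + 1; i < steps.size(); i++) {
            try {
                steps.get(i).run(log);
            } catch (Exception e) {
                log.add("FAILED:" + names.get(i));
                return false;  // checkpoint stays at i - 1, so a restart retries step i
            }
            log.add(names.get(i));
            store.markCompleted(jobId, i);  // persist progress before moving on
        }
        store.clear(jobId);  // finished; a later resubmission starts fresh
        return true;
    }
}
```

With a persistent store behind `JobStateStore`, the same checkpoint would survive an application server restart, which is the property the Stage 2 paragraph describes.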

Services will be implemented as servlets and will interact with David Berriman's UI framework. The Jakarta Commons Configuration package will be used for job definitions, as it provides a completely flexible means of defining jobs as well as helper methods to parse the job definitions. Two Java interfaces will be available for people wishing to define their own jobs and job creation classes; some defaults will be provided.
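The two interfaces mentioned above might look something like the following sketch. `SubmissionJob`, `JobCreator`, `JobRegistry` and `TransferJobCreator` are hypothetical, and a plain `Map<String, String>` stands in for a parsed job definition; the real service would obtain these values through Jakarta Commons Configuration, whose API is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical job interface: what a user-defined job implements.
interface SubmissionJob {
    String describe();
    void execute() throws Exception;
}

// Hypothetical job-creation interface: turns a parsed job definition into a job.
interface JobCreator {
    SubmissionJob create(Map<String, String> definition);
}

// Looks up the creator for a definition's "type" key (key name is an assumption).
final class JobRegistry {
    private final Map<String, JobCreator> creators = new HashMap<>();

    void register(String type, JobCreator creator) {
        creators.put(type, creator);
    }

    SubmissionJob build(Map<String, String> definition) {
        String type = definition.get("type");
        JobCreator creator = creators.get(type);
        if (creator == null) {
            throw new IllegalArgumentException("No creator registered for job type: " + type);
        }
        return creator.create(definition);
    }
}

// Example default creator: a transfer job configured by "source" and "target" keys.
final class TransferJobCreator implements JobCreator {
    public SubmissionJob create(Map<String, String> definition) {
        final String source = definition.get("source");
        final String target = definition.get("target");
        return new SubmissionJob() {
            public String describe() {
                return "transfer " + source + " -> " + target;
            }

            public void execute() {
                // Copy the package from source to target (not shown in this sketch).
            }
        };
    }
}
```

Separating job creation from job execution means an institution can add a new job type by registering one creator, without changing how definitions are read or how jobs are scheduled.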

Current Status

For the most part, Stage 1 is complete, with samples for submitting OJS journal issues to DSpace and Fedora, although some modifications are required and the initial hacky test stylesheets will need to be rewritten from scratch once the OJS METS profile is complete. Further modifications are required to incorporate PREMIS metadata and to support Fedora RELS-EXT metadata, but all of this seems doable.

Stage 2 has started and the database side seems fine; we may not need any additional database information, although I need to understand more about the job chaining information when using the Quartz job store. I plan to start UI work with David in the week starting 23 April.

APSR-defined design constraints

The Submission Service must take into account the following design constraints:

  • developed as a standalone Web-application that can be accessed by other software applications using an API that supports Web service protocols;
  • interoperable with DSpace and Fez+Fedora;
  • interoperable with COSI software (Collections Service Registry, Benchmark Statistics, Obsolescence Notification Service) where relevant;
  • deployed as part of DSpace and Fez+Fedora installations;
  • extensible XSLT workflow-to-SIP translation libraries;
  • deployable within the proposed National federated AAA service (e.g. Shibboleth);
  • packaged and documented in a form suitable for release in the public domain (conditional on the identification of an IP license suitable for the higher education sector); and
  • deployable using common COSI Framework GUI components.
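One way to keep the XSLT workflow-to-SIP translations extensible is to register compiled stylesheets by source format, so that supporting a new workflow source means adding a stylesheet rather than code. The sketch below uses only the JDK's `javax.xml.transform` API; `SipTranslator` and format keys such as `ojs-issue` are hypothetical, and any stylesheet registered in practice would need to produce the actual NLA METS profile.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SipTranslator {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    // Compiled stylesheets keyed by source format (keys are hypothetical, e.g. "ojs-issue").
    private final Map<String, Templates> stylesheets = new HashMap<>();

    /** Adds or replaces the stylesheet for a source format. */
    void register(String sourceFormat, String xslt) {
        try {
            stylesheets.put(sourceFormat, factory.newTemplates(new StreamSource(new StringReader(xslt))));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Bad stylesheet for " + sourceFormat, e);
        }
    }

    /** Translates workflow XML into a SIP document using the registered stylesheet. */
    String toSip(String sourceFormat, String workflowXml) {
        Templates templates = stylesheets.get(sourceFormat);
        if (templates == null) {
            throw new IllegalArgumentException("No stylesheet for " + sourceFormat);
        }
        try {
            // Templates is thread-safe and shareable; a Transformer is not, so make one per call.
            Transformer tr = templates.newTransformer();
            tr.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            tr.transform(new StreamSource(new StringReader(workflowXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Translation failed for " + sourceFormat, e);
        }
    }
}
```

In a deployment the stylesheets would more likely be loaded from files at startup, which keeps the translation library extensible by dropping in new XSLT.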

Development Notes

Just some notes which may (or may not) be useful

  • Using CVS for version control on backed up server
  • Root Java package (at least initially) will be au.edu.apsr.riffs3
  • Sun JDK 1.6, Tomcat 6, Postgres 8.2.3. At least JDK 1.5 will be required to run the application; it *may* be possible to build using JDK 1.4, but it is safer to assume not
  • Quartz scheduler software (Open Source, Apache 2.0 License)