Parser scripts are instructions that describe how data from a particular source is to be harvested into Supplejack.

View the introductory screencast for a general overview of how parser scripts work inside Supplejack.

You can also work through the introductory tutorials, which take you step by step through creating your first parser scripts: Tutorial 1 and Tutorial 2.

What does a parser script look like?

class PublicAddressAndy < HarvesterCore::Rss::Base

  # Specify the location of the data

  base_url "http://publicaddress.net/system/6.rss"

  # Default mappings - the same attribute value for all records

  attributes :content_partner, :display_content_partner, default: "Public Address"
  attributes :category, default: "Podcasts"
  attributes :primary_collection, :display_collection, :collection, default: "Public Address Radio"
  attributes :usage, default: "All rights reserved"

  # Variable mappings - different attribute values for each record

  attributes :title, xpath: "/item/title"
  attributes :description, xpath: "item/description"
  attributes :landing_url, xpath: "item/link"
  attributes :display_date, xpath: "item/pubDate"
  attributes :date, xpath: "item/pubDate", date: true
  attributes :category, xpath: "item/category"

  attributes :internal_identifier do
    get(:landing_url).downcase
  end


end

The simple parser script above starts by identifying the source of the data to ingest (base_url). It then lists the data fields (attributes) to copy data into, first with default values that are the same for every record, then with values read from each record. Field names are based on the specific schema that has been set up for your Supplejack instance.

Supplejack is supported by a Parser DSL (Domain Specific Language) designed specifically for capturing and manipulating data. The Parser DSL provides rich functions for getting, namespacing, validating, transforming, and enriching your source data. You can use XPath expressions and regular expressions to get the data, and you can fall back to Ruby code if you need to do some heavy lifting.
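To make the idea of XPath, regex, and Ruby-based mappings concrete outside Supplejack, here is a minimal plain-Ruby sketch using REXML. It is not how Supplejack implements the DSL; the sample feed, the extract_record helper, and the year field are all hypothetical and exist only to illustrate the kind of work the mappings in the example above describe.

```ruby
require "rexml/document"

# Hypothetical sample feed, shaped like the RSS source in the example above.
SAMPLE_RSS = <<~XML
  <rss version="2.0">
    <channel>
      <item>
        <title>Episode One</title>
        <link>http://example.com/Podcasts/Episode-One</link>
        <pubDate>Mon, 06 Jan 2014 10:00:00 +1300</pubDate>
      </item>
    </channel>
  </rss>
XML

# Hypothetical helper (not part of Supplejack): builds one record hash
# from an <item> element, mirroring default and variable mappings.
def extract_record(item)
  landing_url  = item.elements["link"].text
  display_date = item.elements["pubDate"].text
  {
    category: "Podcasts",                      # default: same for every record
    title: item.elements["title"].text,        # variable: read via XPath
    landing_url: landing_url,
    display_date: display_date,
    year: display_date[/\b\d{4}\b/],           # regex: first four-digit number
    internal_identifier: landing_url.downcase  # plain Ruby transformation
  }
end

doc = REXML::Document.new(SAMPLE_RSS)
records = REXML::XPath.match(doc, "//item").map { |item| extract_record(item) }

puts records.first[:internal_identifier]
```

In a real parser script the DSL handles record selection and field extraction for you; the sketch only shows the three kinds of mapping (default, XPath, and Ruby block) side by side.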

The tutorials above are a great place to start learning how to build your first parser script. For more advanced needs, refer to the Parser DSL documentation and the example scripts.

Concept parser script

Below is an example of a parser for harvesting concepts. Make sure that your Manager is configured to harvest concepts.

class AucklandArtGalleryPeopleConcepts < SupplejackCommon::Xml::Base

  # Issues
  # http://49.50.242.36/search.do?view=detail&db=person&keyword=Katsukawa+Shunko

  base_url "file:////data/arts.xml"

  record_selector "//vernon"
  record_format :xml

  match_concepts :create_or_match

  attributes :internal_identifier, :landing_url do 
    compose("http://aucklandart.govt.nz/Person/", fetch("//person_id"))
  end

  attributes :type,                       default: "foaf:person"

  attributes :label,                      xpath: "//display_name"
  attributes :name,                       xpath: "//display_name"
  attributes :dateOfBirth do
    fetch("//birth_date").to_date
  end

  attributes :dateOfDeath do
    fetch("//death_date").to_date
  end

  attributes :givenName do
    fetch("//display_name").split(" ").first
  end

  attributes :familyName do
    fetch("//display_name").split(" ").select(:last)
  end

  attributes :gender do
    fetch("//person_type").split("|").select(2).downcase
  end

  attributes :sameAs do
    compose("http://www.aucklandartgallery.com/the-collection/browse-artists/", fetch("/person/@ext_id").first) # this page has a redirect
  end

  attributes :placeOfBirth do
    fetch("//birth_place")
  end

  attributes :placeOfDeath do
    fetch("//death_place")
  end

  reject_if do
    not (get(:dateOfBirth).present? and get(:dateOfDeath).present? and get(:familyName).present? and get(:givenName).present?)
  end 

end

The above example posts data to an API that has the concept schema configured.
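The reject_if rule in the concept parser drops any record that is missing a birth date, death date, family name, or given name. As a rough illustration of that filtering logic only, here is a plain-Ruby sketch that does not use Supplejack at all. The sample rows and the build_concept, parse_date, and reject? helpers are hypothetical, and the name splitting is a simplification of what the DSL does.

```ruby
require "date"

# Hypothetical source rows; the names and dates are made up for illustration.
PEOPLE = [
  { display_name: "Jane Example", birth_date: "1900-01-02", death_date: "1980-03-04" },
  { display_name: "Unknown",      birth_date: nil,          death_date: nil }
].freeze

# Fields that must all be present for a concept to be kept.
REQUIRED = %i[dateOfBirth dateOfDeath familyName givenName].freeze

# Returns a Date, or nil when the value is missing or unparseable.
def parse_date(value)
  Date.parse(value.to_s)
rescue ArgumentError
  nil
end

# Hypothetical helper: derives concept fields from one source row.
def build_concept(row)
  parts = row[:display_name].to_s.split(" ")
  {
    name: row[:display_name],
    givenName: parts.first,
    familyName: parts.last,
    dateOfBirth: parse_date(row[:birth_date]),
    dateOfDeath: parse_date(row[:death_date])
  }
end

# Mirrors the reject_if block: reject when any required field is blank.
def reject?(concept)
  REQUIRED.any? { |field| concept[field].nil? || concept[field].to_s.empty? }
end

kept = PEOPLE.map { |row| build_concept(row) }.reject { |concept| reject?(concept) }
puts kept.map { |concept| concept[:name] }.inspect
```

Only the first row survives, because the second has no birth or death date.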