Parser scripts are instructions that describe how data from a particular source is to be harvested into Supplejack.

View the introductory screencast for a general overview of how parser scripts work inside Supplejack.

You can also work through the introductory tutorials, which take you step by step through creating your first parser scripts: Tutorial 1, Tutorial 2.

What does a parser script look like?

class PublicAddressAndy < HarvesterCore::Rss::Base

  # Specify the location of the data
  
  base_url "http://publicaddress.net/system/6.rss"

  # Default mappings - the same attribute values for all records

  attributes :content_partner, :display_content_partner, default: "Public Address"
  attributes :category, default: "Podcasts"
  attributes :primary_collection, :display_collection, :collection, default: "Public Address Radio"
  attributes :usage, default: "All rights reserved"
  
  # Variable mappings - different attribute values for each record
  
  attributes :title, xpath: "/item/title"
  attributes :description, xpath: "item/description"
  attributes :landing_url, xpath: "item/link"
  attributes :display_date, xpath: "item/pubDate"
  attributes :date, xpath: "item/pubDate", date: true
  attributes :category, xpath: "item/pubDate"
  
  attributes :internal_identifier do
    get(:landing_url).downcase
  end

  
end

The simple parser script above starts by identifying the source of the data to ingest (base_url). It then lists the data fields (attributes) to copy data into. Field names are based on the specific schema that has been set up for your Supplejack instance.
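To make the difference between default, variable, and block mappings concrete, here is a minimal plain-Ruby sketch of what the script above amounts to for a single RSS item. It does not use Supplejack itself: the sample item, the DEFAULTS and XPATHS constants, and the build_record helper are all hypothetical names made up for illustration, and REXML stands in for whatever XML handling Supplejack actually performs.

```ruby
require "rexml/document"

# Hypothetical RSS item standing in for one record from the feed.
ITEM_XML = <<~XML
  <item>
    <title>Episode 12</title>
    <description>An interview.</description>
    <link>http://Example.org/Radio/Episode-12</link>
    <pubDate>Mon, 06 Jan 2014 09:00:00 +1200</pubDate>
  </item>
XML

# Default mappings: the same value for every record.
DEFAULTS = {
  content_partner: "Public Address",
  category: "Podcasts",
  usage: "All rights reserved"
}.freeze

# Variable mappings: an XPath evaluated against each record.
XPATHS = {
  title: "title",
  description: "description",
  landing_url: "link",
  display_date: "pubDate"
}.freeze

# Build one record hash from one <item> element (illustration only,
# not the Supplejack implementation).
def build_record(item_xml)
  item = REXML::Document.new(item_xml).root
  record = DEFAULTS.dup
  XPATHS.each do |field, xpath|
    node = REXML::XPath.first(item, xpath)
    record[field] = node&.text
  end
  # Block mapping: derive a value from a field that is already mapped.
  record[:internal_identifier] = record[:landing_url]&.downcase
  record
end

record = build_record(ITEM_XML)
puts record
```

Every record gets the same default values, the XPath fields vary per item, and the internal identifier is computed from the landing URL, mirroring the get(:landing_url).downcase block in the parser script.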

Supplejack is supported by a Parser DSL (Domain Specific Language) that has been designed specifically to support the capture and manipulation of data. The Parser DSL provides rich functions for getting, namespacing, validating, transforming, and enriching your source data. You can use XPath expressions and regular expressions to get the data, and fall back to Ruby code if you need to do some heavy lifting.
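As a rough illustration of the getting, transforming, and validating steps mentioned above, the following plain-Ruby sketch handles a single date string. It uses only the Ruby standard library, not the Parser DSL; the sample value and the variable names are assumptions made for this example.

```ruby
require "date"

# Hypothetical raw value, in the format an RSS pubDate element uses.
raw = "Mon, 06 Jan 2014 09:00:00 +1200"

# Get: pull the first four-digit number out with a regular expression.
year = raw[/\b(\d{4})\b/, 1]

# Transform: parse the RFC 2822 date string into a Date object.
date = Date.rfc2822(raw)

# Validate: accept the value only if the parsed date agrees with the regex.
valid = date.year.to_s == year

puts "year=#{year} date=#{date.iso8601} valid=#{valid}"
```

In a parser script, the same kind of logic would normally be expressed with the DSL's own functions; dropping to plain Ruby like this is the heavy-lifting fallback.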

The tutorials above are a great place to start when building your first parser script. For more advanced support, see the Parser DSL and the example scripts.

Concept parser script

Below is an example of a parser for harvesting concepts. Make sure that your Manager is configured to harvest concepts.

class AucklandArtGalleryPeopleConcepts < SupplejackCommon::Xml::Base
  
  # Issues
  # http://49.50.242.36/search.do?view=detail&db=person&keyword=Katsukawa+Shunko
  
  base_url "file:////data/arts.xml"

  record_selector "//vernon"
  record_format :xml
  
  match_concepts :create_or_match
  
  attributes :internal_identifier, :landing_url do 
    compose("http://aucklandart.govt.nz/Person/", fetch("//person_id"))
  end
  
  attributes :type,                       default: "foaf:person"
  
  attributes :label,                      xpath: "//display_name"
  attributes :name,                       xpath: "//display_name"
  attributes :dateOfBirth do
    fetch("//birth_date").to_date
  end
  
  attributes :dateOfDeath do
    fetch("//death_date").to_date
  end

  attributes :givenName do
    fetch("//display_name").split(" ").first
  end
  
  attributes :familyName do
    fetch("//display_name").split(" ").select(:last)
  end
  
  attributes :gender do
    fetch("//person_type").split("|").select(2).downcase
  end
  
  attributes :sameAs do
    compose("http://www.aucklandartgallery.com/the-collection/browse-artists/", fetch("/person/@ext_id").first) # this page has a redirect
  end
  
  attributes :placeOfBirth do
    fetch("//birth_place")
  end
  
  attributes :placeOfDeath do
    fetch("//death_place")
  end
  
  reject_if do
    not (get(:dateOfBirth).present? and get(:dateOfDeath).present? and get(:familyName).present? and get(:givenName).present?)
  end 

end

The above example will post data to an API that has the concept schema configured.
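The name splitting and the reject_if rule in the concept parser can be sketched in plain Ruby as follows. This is not the Supplejack implementation: the Person struct, the to_concept, parse_date, and reject? helpers, and the sample people are all hypothetical, and a nil check stands in for the present? test used in the script.

```ruby
require "date"

# Hypothetical stand-in for one source record.
Person = Struct.new(:display_name, :birth_date, :death_date, keyword_init: true)

# Return a Date, or nil when the value is missing or unparseable.
def parse_date(value)
  Date.parse(value)
rescue ArgumentError, TypeError
  nil
end

# Derive the concept fields that the reject rule inspects.
def to_concept(person)
  parts = person.display_name.to_s.split(" ")
  {
    givenName: parts.first,   # first word of the display name
    familyName: parts.last,   # last word of the display name
    dateOfBirth: parse_date(person.birth_date),
    dateOfDeath: parse_date(person.death_date)
  }
end

# Mirror of reject_if: reject unless all four fields are present.
def reject?(concept)
  %i[dateOfBirth dateOfDeath familyName givenName].any? { |key| concept[key].nil? }
end

# Made-up sample data; the second person has no death date.
people = [
  Person.new(display_name: "Jane Example", birth_date: "1900-02-03", death_date: "1980-04-05"),
  Person.new(display_name: "John Sample", birth_date: "1950-06-07", death_date: nil)
]

kept = people.map { |person| to_concept(person) }.reject { |concept| reject?(concept) }
puts kept.map { |concept| concept[:familyName] }.inspect
```

Only records with both dates and both name parts survive, which is the effect of the reject_if block: incomplete people are never posted as concepts.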