Enrichment is the process of augmenting a record's metadata, either by fetching information from an external resource or by performing data analysis on a collection of records.

The DSL allows the harvest operator to specify multiple enrichments for a parser config. Each enrichment can define one or more attributes.

Define a JSON enrichment

enrichment :google_placenames do
  requires :country do
    "New Zealand"
  end

  url "http://maps.google.com"
  format "json"

  attribute :locations do
    {
      latitude: path("place/lat"),
      longitude: path("place/lng")
    }
  end

  attribute :address do
    "Wellington, #{requirements[:country]}"
  end
end

Define an XML enrichment

enrichment :ndha_rights do
  requires :tap_id do
    primary[:identifier].first.to_s.gsub("tap:", "")
  end

  url "http://ndhadeliver.govt.nz/?recordId=#{requirements[:tap_id]}"
  format "xml"

  attribute :dc_rights, xpath: "//dc:rights"
end

DSL Enrichment Options


The type option lets the harvest operator specify an enrichment class to use for a specific enrichment. These are generally special-case enrichments; the only ones that currently exist are for Tapuhi.

enrichment :tapuhi_denormalization, type: "TapuhiDenormalize"

The following classes are currently supported:

* TapuhiDenormalize: This class denormalizes the authorities.
* TapuhiRelationships: This class builds the authority relationships from the relation field on a primary source.


The priority option lets the harvest operator specify the priority of an enrichment source.

enrichment :tapuhi_denormalization, priority: -1, type: "TapuhiDenormalize"

Negative numbers have a higher priority than positive numbers.
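As a rough illustration of that ordering, the sketch below sorts a list of sources so that the most negative priority comes first. The source hashes, the value 0 for the primary source, and the by_priority helper are assumptions made for this example; they are not part of the DSL, and the harvester may apply priorities differently.

```ruby
# Hypothetical source list; names and priority values are illustrative only.
SOURCES = [
  { name: :primary, priority: 0 },
  { name: :ndha_rights, priority: 1 },
  { name: :tapuhi_denormalization, priority: -1 }
].freeze

# Assumption: a smaller number means a higher priority, so sort ascending.
def by_priority(sources)
  sources.sort_by { |source| source[:priority] }
end

# The source with priority -1 is listed first.
puts by_priority(SOURCES).map { |source| source[:name] }.inspect
```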


Setting required_for_active_record to true creates the record in the API with status: partial until the record has a source named after this enrichment. More than one enrichment in a parser can set this option.

enrichment :tapuhi_denormalization, priority: -1, required_for_active_record: true do
  # enrichment definition goes here
end

Notes:

* Enrichments that contain a reject_if block will not write sources for rejected records. If such an enrichment is required through this option, rejected records will not become active.
* Although technically possible, using this option on reflective enrichments (those specifying a type: option) is likely to break them.
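The status rule described above can be sketched in plain Ruby. The record_status helper and the array-of-names representation are hypothetical; only the partial and active statuses come from the description above.

```ruby
# Hypothetical helper: a record stays "partial" until it has a source for
# every enrichment marked required_for_active_record.
def record_status(source_names, required_enrichments)
  missing = required_enrichments - source_names
  missing.empty? ? "active" : "partial"
end

# Illustrative list of required enrichment names.
REQUIRED_ENRICHMENTS = [:tapuhi_denormalization].freeze

puts record_status([:primary], REQUIRED_ENRICHMENTS)
puts record_status([:primary, :tapuhi_denormalization], REQUIRED_ENRICHMENTS)
```

In this model a record rejected by reject_if never gains the missing source name, which is why it would stay partial.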

DSL Enrichment Methods


The primary method gives the harvest operator access to the primary source's attributes through square-bracket notation, for example primary[:identifier].

It returns an AttributeValue object, which means the operator has access to the same DSL as when working within an attribute definition block.



The requires method allows the harvest operator to specify a value that is required in order to be able to perform the enrichment. It also stores the returned value so that it can be conveniently accessed later.

When any of the requires blocks returns an empty or nil value, the enrichment is skipped.

requires :tap_id do
  primary[:identifier].first.gsub("tap:", "")
end

It then makes the tap_id available through the requirements method.

url "http://ndhadeliver.org?recordId=#{requirements[:tap_id]}"
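The behaviour of requires and requirements can be modelled in a few lines of plain Ruby. This is a minimal sketch, not the harvester's implementation: SketchEnrichment, build_sketch and the Hash standing in for the primary source are all hypothetical.

```ruby
# Hypothetical stand-in for an enrichment; not the harvester's real class.
class SketchEnrichment
  attr_reader :primary, :requirements

  def initialize(primary)
    @primary = primary      # a plain Hash standing in for the primary source
    @requirements = {}      # values returned by requires blocks, keyed by name
    @skipped = false
  end

  # Run the block, remember its value under the given name, and flag the
  # enrichment as skipped when the value is nil or empty.
  def requires(name, &block)
    value = instance_eval(&block)
    @requirements[name] = value
    @skipped = true if value.nil? || (value.respond_to?(:empty?) && value.empty?)
    value
  end

  def skipped?
    @skipped
  end
end

# Build a sketch enrichment that requires a tap_id, as in the example above.
def build_sketch(primary_source)
  enrichment = SketchEnrichment.new(primary_source)
  enrichment.requires :tap_id do
    primary[:identifier].first.to_s.gsub("tap:", "")
  end
  enrichment
end

found = build_sketch(identifier: ["tap:1234"])
puts found.requirements[:tap_id]   # prints 1234
puts found.skipped?                # prints false

missing = build_sketch(identifier: [])
puts missing.skipped?              # prints true
```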


The url method specifies the location of the enrichment resource.

url "http://ndhadeliver.govt.nz/?recordId=#{primary[:identifier].first}"


The format method tells the enrichment how to parse the resource. There are three types of resource: XML, JSON and File.

XML and JSON resources work the same way as at the root of the record, while the File resource exposes a number of preset values about the file.
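For a JSON resource, one way to read an attribute path such as "place/lat" is as a list of nested keys. The sketch below assumes that interpretation; the response body and the value_at helper are invented for illustration, and the real path method may behave differently.

```ruby
require "json"

# Hypothetical response body; a real service's payload may differ.
RESPONSE = '{"place": {"lat": -41.29, "lng": 174.78}}'

# Assumption: a path such as "place/lat" names nested keys separated by "/".
def value_at(document, path)
  document.dig(*path.split("/"))
end

DOCUMENT = JSON.parse(RESPONSE)
puts value_at(DOCUMENT, "place/lat")
puts value_at(DOCUMENT, "place/lng")
```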

File Resource

The File resource exposes the following values:

- size
- height
- width
- mime_type
- extension
- url

The following is an example of how to build a thumbnail:

```ruby
enrichment :ndha_thumbnail do
  requires :tap_id do
    primary[:thumbnail_url].first.match(/IE[\d]+/)[0]
  end

  url "http://ndhadeliver.natlib.govt.nz/NLNZStreamGate/get?dps_pid=#{requirements[:tap_id]}"
  format "file"

  attribute :thumbnail do
    {
      file_size: get(:size).first,
      height: get(:height).first,
      width: get(:width).first,
      file_type: get(:mime_type).first,
      file_extension: get(:extension).first,
      url: get(:url).first
    }
  end
end
```