Modifiers
Modifiers are methods that provide an easy way to accomplish common tasks to manipulate or retrieve data in specific ways.
The modifiers below can be used in the attribute definition blocks.
Retrieve Value
There are two ways to extract data in order to manipulate it for a specific attribute. Both ways then allow to chain other modifiers to actually make the changes to the data.
Get
The get method retrieves the value of a attribute that has already been processed. (The attributes are always processed from top to bottom).
attribute :subjects, xpath: "//subject"
attribute :something_else do
get(:subjects)
end
Fetch
The fetch method is provided to have access to the the raw data.
XML based strategy (XML, RSS)
attribute :creator do
fetch("//author")
end
JSON based strategy (JSON)
attribute :creator do
fetch("author")
end
For the XML based strategy you can supply an optional sanitize_config argument to configure how HTML is stripped (by default all HTML is stripped)
With this you can specify
Tags to keep
attribute :creator do
# This will not strip BR tags from the fetched HTML, all other tags will be stripped
# the array parameter in the second argument is the XML namespaces argument, it must be
# passed because the sanitize_config is the third argument
fetch("//author", [], sanitize_config: {elements: ['br']})
end
Custom replacements for stripped tags
attribute :creator do
sanitize_config = {
whitespace_elements: {
# "before" is what the tag will be replaced with, "after" is what the nodes children will be replaced with (if any)
"br" => {before: ' \n', after: ''}
}
}
fetch("//author", [], sanitize_config: sanitize_config)
end
For more extensive documention refer to the documentation for the sanitization library used
For an overview of all the configuration parameters refer to the default sanitize config
Convinience Methods
The following set of methods allow the operator to inspect the values of a particular field. These methods are specially usefull when the operator needs to acomplish conditional
present?
Checks if there is any value at all. Returns true or false.
attribute :category, xpath: "//category" do
category = get(:category)
category += ["Images"] if get(:thumbnail_url).present?
category
end
includes?
Checks if the value provided is included in the set.
attribute :category, xpath: "//category" do
category = get(:category)
category += ["News"] if get(:dc_type).includes?("News")
category
end
to_a
Converts the AttributeValue object to a regular array.
attribute :category, xpath: "//category" do
get(:category).to_a
end
first
It returns the first value
attribute :category, xpath: "//category" do
get(:category).first
end
Modify Value
find_with(regexp)
It returns the first element that matches the regular expression.
attribute :identifier do
get(:identifier).find_with(/IE:\w/)
end
find_all_with(regexp)
It returns all the elements that matches the regular expression.
attribute :identifier do
get(:identifier).find_all_with(/IE:\w/)
end
find_without(regexp)
It returns the first element that doesn’t match the regular expression.
attribute :identifier do
get(:identifier).find_without(/http/)
end
find_all_without(regexp)
It returns all the elements that don’t match the regular expression.
attribute :identifier do
get(:identifier).find_all_without(/http/)
end
mapping(regexp, substitute_value)
It replaces all the values with the regular expression specified
attribute :enrichment_url do
get(:identifier).mapping(/.*handle.net(.*)/ => 'https://researchspace.auckland.ac.nz/handle\\1?show=full')
end
The mapping key value pairs can also be repeated to perform multiple substitutions on the same value. For exmaple:
attribute :enrichment_url do
get(:identifier).mapping(/width=[\d]{1,4}/ => 'width=520', /height=[\d]{1,4}/ => 'height=310')
end
To use captured values from the regular expression use \1 for the first captured value \2 for the second on so on.
select(start, end)
It allows you to select any element or range of elements within an array
Select the first element
attribute :creator do
get(:creator).select(:first)
end
Select the last element
attribute :creator do
get(:creator).select(:last)
end
Select from the 2nd element to the last
attribute :creator do
get(:creator).select(2, :last)
end
Select from the first element to the second to last
attribute :creator do
get(:creator).select(:first, -2)
end
add(value)
It appends a value to a attribute.
attribute :category, default: ["Newspapers"] do
get(:category).add("Images") if get(:thumbnail_url).present?
end
compose(value1, value2…)
It joins multiple values from potentially different places.
attribute :subjects do
compose(get(:tag), "New Zealand", get(:title))
end
concept_lookup(url)
It returns concept ids if the sameAs of the concept fragment contains given url
attribute :concept_ids do
concept_lookup("http://www.google.com")
end
to_date
It tries to parse the values into real dates.
attribute :date do
get(:display_date).to_date
end
split(split_value)
It will try and split all values by the value specified.
attribute :subject do
get(:subject).split(",")
end
The above code will convert a string in the following format “dogs, cats, puma” to [“dogs”, “ cats”, “ puma”]
attribute :subject do
get(:subject).split(/\d/)
end
The splitter can also split based on a regular expression.
truncate(length, omission=”…”)
It will truncate all values to the specified length. The omission defaults to three dots “…”, but a different string can be specified.
attribute :description do
get(:description).truncate(300)
end
It will truncate the string to 300 charachters.
attribute :description do
get(:description).truncate(10, "")
end
It will truncate the string to 10 characters and it won’t add anything at the end of the string.
downcase
It will downcase all the values
attribute :identifier do
get(:identifier).downcase
end