The harvester worker is a Rails application that uses Sidekiq to run the jobs involved in the harvesting and link checking process.

Harvesting jobs

These jobs use the harvester core gem to interpret a given parser, generate records from it, and post them to the Supplejack API (this is a simplified overview of the process).

Preview Job

A preview job is initiated from the manager when a parser is previewed. The worker returns a snapshot of how the records will look when created.

Link check job

This job checks individual records that have been visited recently (in the API) and suppresses records that are unavailable.

Source checking job

This job checks a few records from a collection and suppresses collections which are unavailable.

Job priorities

There are three priority levels for jobs in the Supplejack worker. Preview is the only critical-priority job, which is the highest level. The link check job has low priority. All other jobs are configured with default priority.
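
The priority levels above can be pictured as a lookup from job type to queue name. This is a minimal sketch in plain Ruby: the queue names mirror the three priorities described here, but the job type symbols and the queue_for helper are hypothetical and are not taken from the worker's code.

```ruby
# Hypothetical mapping of job types to queue names.
# Only jobs with a non-default priority need an entry.
QUEUE_BY_JOB_TYPE = {
  preview: "critical",   # highest priority
  link_check: "low"      # lowest priority
}.freeze

# Returns the queue name for a job type, falling back to "default".
def queue_for(job_type)
  QUEUE_BY_JOB_TYPE.fetch(job_type.to_sym, "default")
end

puts queue_for(:preview)     # critical
puts queue_for(:link_check)  # low
puts queue_for(:harvest)     # default
```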

Supplejack API integration

The Supplejack API harvester endpoints require an API key with harvester privileges, which is configured as the environment variable HARVESTER_API_KEY.

Sidekiq

Jobs are processed by Sidekiq, which runs as part of the worker. To start Sidekiq, run bundle exec sidekiq start. You can then view progress at http://WORKER_HOST/sidekiq.

If your Sidekiq worker jobs are not running correctly, you can add log output by following these instructions. The most likely cause is that the environment variables are not set up correctly for the cron script, so the workers are unable to find the Ruby gems installed on the system.
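
When diagnosing this kind of problem, it can help to log what the process actually sees. The sketch below is one possible approach, not part of the worker: the environment_report helper and the list of keys it inspects are assumptions chosen for illustration. Running it both from an interactive shell and from the cron script would show any differences.

```ruby
require "rubygems"

# Keys that commonly differ between an interactive shell and cron.
# This list is an assumption, not something the worker defines.
DIAGNOSTIC_KEYS = %w[PATH GEM_HOME GEM_PATH RAILS_ENV].freeze

# Builds log lines describing the Ruby environment the process sees.
# Accepts any Hash-like object so it can be exercised without touching ENV.
def environment_report(env = ENV)
  lines = ["ruby_version=#{RUBY_VERSION}"]
  DIAGNOSTIC_KEYS.each do |key|
    lines << "#{key}=#{env.fetch(key, '(not set)')}"
  end
  lines << "gem_paths=#{Gem.path.join(':')}"
  lines
end

environment_report.each { |line| puts line }
```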

Generate Worker User keys

From the Worker’s project root, create a user from the console:

rails c
 > User.create!
=> #<User _id: 53714f99531163b56c000001, authentication_token: "RhymLHa9xRQGU8gyAYXP">
 > User.last.authentication_token
=> "RhymLHa9xRQGU8gyAYXP"

Environment Configurations

# Example Supplejack Worker application.yml file

development:
  API_HOST: "http://localhost:3000"
  HARVESTER_API_KEY: <YOUR_HARVESTER_KEY>
  API_MONGOID_HOSTS: "localhost:27017"
  MANAGER_HOST: "http://localhost:3001"
  HARVESTER_CACHING_ENABLED: true
  AIRBRAKE_API_KEY: "abc123"
  LINK_CHECKING_ENABLED: "true"
  LINKCHECKER_RECIPIENTS: "test@example.com"
  WORKER_KEY: <YOUR_WORKER_KEY>

production:
  API_HOST: "http://api.example.com"
  HARVESTER_API_KEY: <YOUR_HARVESTER_KEY>
  API_MONGOID_HOSTS: "localhost:27017"
  MANAGER_HOST: "http://harvester.example.com"
  HARVESTER_CACHING_ENABLED: true
  AIRBRAKE_API_KEY: "abc123"
  LINK_CHECKING_ENABLED: "true"
  LINKCHECKER_RECIPIENTS: "test@example.com"
  WORKER_KEY: <YOUR_WORKER_KEY>
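
A quick way to catch an incomplete configuration is to check a file like the one above for missing keys before starting the worker. The following is a minimal sketch under stated assumptions: the missing_keys helper is hypothetical, and treating these five keys as required is a guess for illustration rather than a rule the worker enforces.

```ruby
require "yaml"

# Keys taken from the example application.yml above.
# Which of them are strictly required is an assumption.
REQUIRED_KEYS = %w[
  API_HOST HARVESTER_API_KEY API_MONGOID_HOSTS MANAGER_HOST WORKER_KEY
].freeze

# Returns the required keys that are absent or blank for one environment.
def missing_keys(yaml_text, environment)
  settings = YAML.safe_load(yaml_text).fetch(environment, {}) || {}
  REQUIRED_KEYS.select { |key| settings[key].to_s.strip.empty? }
end

sample = <<~CONFIG
  development:
    API_HOST: "http://localhost:3000"
    MANAGER_HOST: "http://localhost:3001"
CONFIG

puts missing_keys(sample, "development").inspect
```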

Sidekiq Dashboard Authentication

If you would like to put the Sidekiq dashboard behind authentication, create a Sidekiq initializer (config/initializers/sidekiq.rb) if you do not already have one.

Add this block of Ruby to the Sidekiq initializer:

require 'sidekiq'
require 'sidekiq/web'

# Replace the placeholder strings with your own credentials.
Sidekiq::Web.use(Rack::Auth::Basic) do |user, password|
  [user, password] == ["<set username here>", "<set password here>"]
end

Then restart the worker Rails server. When you next navigate to the ‘/sidekiq’ route, you will be prompted for a username and password.
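
A plain == comparison of credentials can, in principle, leak information through response timing. As a hedged sketch of one way to tighten the check, the helpers below compare values in constant time using only the standard library. The names secure_equal? and credentials_valid? are hypothetical, and the sample credentials are placeholders. Rack also provides Rack::Utils.secure_compare, which may be preferable inside the initializer.

```ruby
require "digest"

# Compares two strings without stopping at the first differing byte.
# Hashing first gives both sides the same length.
def secure_equal?(given, expected)
  a = Digest::SHA256.digest(given.to_s).bytes
  b = Digest::SHA256.digest(expected.to_s).bytes
  a.zip(b).reduce(0) { |diff, (x, y)| diff | (x ^ y) }.zero?
end

# Uses & rather than && so both comparisons always run.
def credentials_valid?(user, password, expected_user, expected_password)
  secure_equal?(user, expected_user) & secure_equal?(password, expected_password)
end

puts credentials_valid?("admin", "s3cret", "admin", "s3cret")  # true
puts credentials_valid?("admin", "wrong", "admin", "s3cret")   # false
```

The block passed to Rack::Auth::Basic could call such a helper with the submitted and expected values and return its result.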