fixed tests, added new dedup-specific jobs
added implementors for offline dedup person workflow
cleanup
updated sample project records
MapDocument implements a more general view of the pace model
added tests for author ids generation based on the datasource type
Updating tests: funding path ids include funder shortnames (#1379)
The OAI feed generates "enriched sets" for each content provider by applying a set of xpaths to records to determine whether they have been enriched. The xpaths are defined in the OAI configuration profile.
fetch only instancetype and hostedby from the instance attributes, adding url to external references
added configurable max number of rel/children to be expanded in each entity
Added dc:creator
updated to the new pace specs, cleanup
WT ids are uniform now
added dedup roots to csv export job, dedup index feed job, tests
added dedup configuration to the entities merging process
commented out test with big DOAJ dataset
Testing DOAJ for #1222#note-4
fundingtree is now escaped XML, no longer JSON.
sample records
reimplemented the fundingpath and context generation
updated packages
renamed test
added json size test
Updated configuration for testing
extended entities join configuration, added more tests
test record taken from HDFS
added FCT fundings as contexts
merged branch ProtoMapping
Added oaf:identifiers to record sample.
updated tests
cleanup & tests
added more fields in test record
revised tests
added serialization, tests
Refactored class that extracts fields from records. When we can't find an expected index from the configuration to check its repeatability, the field is indexed as repeatable and a counter is updated.
idScheme and idNamespace defined as part of the OAI configuration profile
Removed dependency on dnet-oai-utils to avoid inheriting unwanted jars such as cnr-rmi-api, cnr-service-common, spring, etc., which should not appear when running a job on the cluster. The needed classes have been copied and adapted so they no longer use spring.
oaf schema location passed as parameter by the workflow
Testing without depending on a running mdstore
small refactor
OAI feed map only job
fixed oaf to xml serialization
merged from branch 0.0.4
fixed IIS output escaping