Always use new records to test how much faster we go
Refactored class that extracts fields from records. When we can't find an expected index from the configuration to check its repeatability, the field is indexed as repeatable and a counter is updated.
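The fallback described above can be sketched as follows. This is a minimal illustration in Python, not the refactored class itself: the names `extract_fields`, `INDEX_CONFIG` and the counter key `missing_index` are hypothetical, and keeping only the first value for a non-repeatable field is an assumption.

```python
from collections import Counter

# Hypothetical configuration: index name -> whether the index is repeatable.
INDEX_CONFIG = {"title": False, "subject": True}


def extract_fields(record, index_config, counters):
    """Map each field of a record to a single value or a list of values.

    When a field has no entry in the configuration, it is treated as
    repeatable and a counter is incremented, as the commit message describes.
    """
    fields = {}
    for name, values in record.items():
        repeatable = index_config.get(name)
        if repeatable is None:
            # Expected index not found in the configuration:
            # fall back to repeatable and keep track of the miss.
            counters["missing_index"] += 1
            repeatable = True
        if repeatable:
            fields[name] = list(values)
        else:
            # Assumption: a non-repeatable field keeps only its first value.
            fields[name] = values[0] if values else None
    return fields


counters = Counter()
record = {"title": ["A title"], "subject": ["x", "y"], "unknown": ["z"]}
result = extract_fields(record, INDEX_CONFIG, counters)
```

Here `unknown` is not configured, so it is indexed as a list and `counters["missing_index"]` is incremented once.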
idScheme and idNamespace defined as part of the OAI configuration profile
Removed dependency on dnet-oai-utils to avoid inheriting unwanted jars such as cnr-rmi-api, cnr-service-common, spring, etc., which should not appear when running a job on the cluster. The needed classes have been copied and adapted so they no longer use Spring.
Extended dedup configuration, which now includes blacklists and algorithm parameters
Format, layout and interpretation are obtained from the collection name rather than being fixed.
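Deriving the three values from a collection name could look like the sketch below. The naming convention is an assumption: the commit does not say how the name is structured, so the `format-layout-interpretation` pattern, the `-` separator, the example name and the function `parse_collection_name` are all hypothetical.

```python
from typing import NamedTuple


class MetadataRef(NamedTuple):
    format: str
    layout: str
    interpretation: str


def parse_collection_name(name, sep="-"):
    """Split a collection name into format, layout and interpretation.

    Assumes (hypothetically) that the name has exactly three non-empty
    parts joined by `sep`; anything else is rejected.
    """
    parts = name.split(sep)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected collection name: {name!r}")
    return MetadataRef(*parts)


# Hypothetical collection name following the assumed convention.
ref = parse_collection_name("DMF-index-openaire")
```

Rejecting malformed names early keeps a job from silently running with a wrong format or layout.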
Removed protocolbuffers dependency from dnet-pace-core; Builders and Proto-specific tests moved to dnet-openaireplus-mapping-utils; adapted dnet-mapreduce-jobs accordingly
OAF schema location is now passed as a parameter by the workflow