added serialization, tests
instantiate one SAXReader for each call
fixed format-layout-interpretation concatenation; no longer fails when the fieldExtractor returns a null result
added json serialization, builds the matching key one time only
do not upsert sets here in the mapper: we shall delegate to a separate workflow to be run after the OAI feeding is completed.
early implementation of jar upload script
Always create new records, to test how much faster we go
Refactored class that extracts fields from records. When we can't find an expected index from the configuration to check its repeatability, the field is indexed as repeatable and a counter is updated.
idScheme and idNamespace defined as part of the OAI configuration profile
proto to pace mapping parses the whole entity
Removed dependency on dnet-oai-utils to avoid inheriting unwanted jars such as cnr-rmi-api, cnr-service-common, spring, etc., which should not appear when running a job on the cluster. Needed classes have been copied and adapted so they do not use spring anymore.
branch to adapt the proto to pace mapping
extended dedup configuration, which now includes blacklists and algorithm parameters
Format, layout and interpretation are obtained from the collection name rather than being fixed.
namespace cleanup
removed unused field <dri:repositoryId/>
removed protocolbuffers dependency from dnet-pace-core; Builders and Proto-specific tests moved to dnet-openaireplus-mapping-utils; adapted dnet-mapreduce-jobs
oaf schema location passed as parameter by the workflow
Testing without depending on a running mdstore
small refactor
OAI feed map-only job
fixed oaf to xml serialization
merged from branch 0.0.4
tests
inferred stuff will be expanded right after the main entity element
cleanup
updated test configuration
helper method to discover the type of the entity target of a relationship, used during the xml expansion
avoid emitting the relationships stored in rows containing a deleted metadata body
fixed relationship distribution
almost working workflows on hbase
dedup working
fixed version number
branch for 4.0.0
added early implementation of OAI feeding job (M/R)
fixed IIS output escaping
added support for one way relationships
SCRIPT_COMMENT: fixed deploy.info file for the module dnet-mapreduce-jobs
SCRIPT_COMMENT: Added deploy.info file to the module dnet-mapreduce-jobs