Testing with "max" attribute
using inverse rel to refer to the correct link descriptor, avoid npe
defined maximum number of relationships, configurable per relationship type [attribute 'max' in the EntityGrouperConfigurationDSResourceType]
FCT has level_0 only in funding tree
trying to debug
branching for beta and new fundingpaths and context
added more detailed counter about entity sub-type
several improvements
commenting test with big doaj dataset
different escaping
trying to catch any kind of exception
Testing DOAJ for #1222#note-4
added DedupSimilarityToActionsMapper and relative dependency
fundingtree is an escaped xml, not a json anymore.
sample records
reimplemented the fundingpath and context generation
updated packages
updated packages, codestyle
codestyle
OafMerger moved to mapping utils
renamed test
added json size test
saving disk space, less logging
Updated configuration for testing
extended entities join configuration, added more tests
test record taken from HDFS
discard persons in OAI feeding (#1107)
do not alter inferenceprovenance; codestyle
added FCT fundings as contexts
merged branch ProtoMapping
implemented retries
Added oaf:identifiers to record sample.
updated tests
cleanup & tests
added default bestlicense value, used when the record doesn't provide any
added more fields in test record
Moved counters from entity body to header.
provenance information parsed from element "about"; namespace-aware datacite mapping for oaf:language and oaf:dateaccepted; dedupBuildRoot doesn't write to WAL; removed unused claim_2_hbase.xsl; overall cleanup
added relationship/children counters
revised tests
Avoid generating the set '___' for "strange" set names, such as those in Cyrillic/Ukrainian. In those cases records are assigned to a default set, currently named "OTHER".
expanding provenanceaction classid
merge from branch newIndexFeed
fixing #783 (note-18)
extraInfo removed from CDATA block, expanding provenance action in inferred elements
fixed blacklist type
more logging. fixed entity type check
more logging
defined limit to the maximum number of counters
added serialization, tests
instantiate one SAXReader for each call
fixed format-layout-interpretation concatenation; doesn't fail when the fieldExtractor returns a null result
added json serialization, builds the matching key one time only
do not upsert sets here in the mapper: we shall delegate to a separate workflow to be run after the OAI feeding is completed.
Always use new records to test how much faster we go
Refactored class that extracts fields from records. When we can't find an expected index from the configuration to check its repeatability, the field is indexed as repeatable and a counter is updated.
idScheme and idNamespace defined as part of the OAI configuration profile
Removed dependency on dnet-oai-utils to avoid inheriting unwanted jars such as cnr-rmi-api, cnr-service-common, spring, etc., which should not appear when running a job on the cluster. The needed classes have been copied and adapted so they no longer use spring.
extended dedup configuration, now including blacklists and algorithm parameters
Format, layout and interpretation are obtained from the collection name rather than being fixed.
namespace cleanup
removed unused field <dri:repositoryId/>
removed protocolbuffers dependency from dnet-pace-core; Builders and Proto-specific tests moved to dnet-openaireplus-mapping-utils; adapted dnet-mapreduce-jobs
oaf schema location passed as parameter by the workflow
Testing without depending on a running mdstore
small refactor
OAI feed map-only job
fixed oaf to xml serialization
merged from branch 0.0.4
added early implementation of OAI feeding job (M/R)
fixed IIS output escaping
added support for one way relationships