using inverse rel to refer to the correct link descriptor, avoids NPE
defined maximum number of relationships, configurable per relationship type [attribute 'max' in the EntityGrouperConfigurationDSResourceType]
FCT has level_0 only in funding tree
trying to debug
Updated version and scm. Changed dependency to 3.0.3 of mapping-utils
branching for beta and new fundingpaths and context
Added dc:creator
csv export of the duplicates original ids
cleanup
updated to the new pace specs, cleanup
WT ids are uniform now
Using MongoClient instead of deprecated Mongo. Removed record status management, since records are always new anyway.
trying to perform explicit escaping
delegating to StringEscapeUtils
escaping also quotes and apostrophes
aggressive escaping
avoid infinite loop :)
updated data used for testing
fixing WT funding tree translation as contexts
adding empty solr docs to the rotten record set
integrated changes of r36247 from trunk
write skipped records into the rotten folder
using different counter names
Better to depend on the branch of mapping utils in this branch of mapreduce-jobs because of the last changes implemented by Claudio.
reverted to r35900
merging from trunk
added dedup roots to csv export job, dedup index feed job, tests
using proper logger
added dedup configuration to the entities merging process
We can use the most up-to-date version of mapping-utils here
Fixed scm and deploy.info
Distinguish publications from datasets when counting
added more detailed counter about entity sub-type
several improvements
Increment counter in case of no rows to keep track of records without body.
updated version to 0.0.6.3.1
including changes from r35769 of trunk to catch and fail on any exception
branch for code before the re-implementation of context and fundingpaths
raised version
commenting out test with big DOAJ dataset
different escaping
trying to catch any kind of exception
Testing DOAJ for #1222#note-4
added DedupSimilarityToActionsMapper and related dependency
increased version in scripts
updated the version of a dependency
fundingtree is an escaped xml, not a json anymore.
increased the minor version
sample records
reimplemented the fundingpath and context generation
updated packages
updated packages, codestyle
codestyle
OafMerger moved to mapping utils
temporary commit
offline dedup
added protobuf-java-format dependency
renamed test
added json size test
saving disk space, less logging
Updated configuration for testing
extended entities join configuration, added more tests
scripts using updated version 0.0.6.3
test record taken from HDFS
discard persons in OAI feeding (#1107)
do not alter inferenceprovenance; codestyle
Using released hadoop parent.
added FCT fundings as contexts
merged branch ProtoMapping
implemented retries
Added oaf:identifiers to record sample.
ignored iml file
updated tests
updated scripts
[maven-release-plugin] prepare for next development iteration
[maven-release-plugin] copy for tag dnet-mapreduce-jobs-0.0.5
[maven-release-plugin] prepare release dnet-mapreduce-jobs-0.0.5
removed extra scm tag
added scm
bumped version, updated parent: let's start to depend on releases
cleanup & tests
added default bestlicense value, used when the record doesn't provide one
added more fields in test record
Moved counters from entity body to header.
- provenance information parsed from element "about"
- namespace-aware datacite mapping for oaf:language and oaf:dateaccepted
- dedupBuildRoot doesn't write to WAL
- removed unused claim_2_hbase.xsl
- overall cleanup
added relationship/children counters
revised tests
Avoid the set name '___' generated from "strange" set names, such as Cyrillic/Ukrainian ones. In those cases records are assigned to a default set, currently named "OTHER".
expanding provenanceaction classid
merge from branch newIndexFeed
fixing #783 (note-18)
extraInfo removed from CDATA block, expanding provenance action in inferred elements
fixed dependencies
created tag folder for release
removed CDATA from extraInfo payloads