excluding dateoftransformation from metadata fields, it should be serialised only in the record header
import cleanup
reverting, we need fewer getters
added more getters
dedup experiments
added mapper class for hdfs actions
cleanup
added Mapper class PromoteActionSetFromHDFS
added anchorStats map-only job
added counter for DOIs
removing useless counters
using most recent dnet-pace-core features
fixed DedupDeleteRelMapper
do not export deleted entities
added utility methods to deal with strings rather than byte[]
sort merged ids
log the documents being compared before failing
introducing support for projects that don't provide a link to a specific fundingpath.
implemented job and workflow to export the openaire identifiers
log the number of items clustered on each key
do not consider deleted entities
updating to dnet-openaire-data-protos:3.5.0
updated to dnet-openaire-data-protos:3.5.0-SNAPSHOT
cleanup, extended tests to include new relationships and mapping profiles
counters
counter test
Back to revision r39888 and updated pom and sh files
added possibility to post-process the result stored in the index documents
use of external properties
added min distance algorithm, used to identify the connected components (dedup)
limit the job to institutional pubsrepository
counter labels
use of Text instead of ImmutableBytesWritable
reimplemented calculatePersonDistribution M/R job to consider only the results from pubsrepositories (not journals)
reuse the same outkey and outvalue objects
spring makes me lazy
added infospace dump mapper
added information space export job
updated to the new mongodb driver specs
Do not check the status of a record: we assume we have to insert it because the OAI store is built in refresh mode.
OAIStore with compressed bodies. Currently for beta only.
fixed tests, added new dedup specific jobs
added implementors for offline dedup person workflow