refactoring and change of counters
rollback wrong commit
fixing and testing propagation implementation
reducer for country propagation that writes on hdfs
cleanup pid types in order to make them valid attributes
added code for propagation of countries from institutional organization
master branch for deployments @ICM
why parse strings as Floats?
reverted to r52985. Test runs show we need to rely on the edgeIds produced by the connected components identification phase instead of the vertexIds
alignment to trunk version
avoid producing duplicated events by eliminating the roots from the comparison process
introduced mapping resulttype -> portal url
Fixed log class name
avoid collisions when hashing pids by value
cleaned up unused method, using setDurability in put operation
added mapper and hadoop job configuration file for importing Grid.AC organization data
integrating bulktag from trunk to beta branch
rule out invalid dates also on CrossRefToActions
rule out invalid dates on ScholixToActions
cleanup
produce 'supplement' subrel type in case of supplement relationships
simplified connected component application on the graph
adding a check to understand the bug that generates wrong relations
do not skip processing datasets in DedupBuildRootsMapper, improved error reporting in DedupBuildRootsReducer
do not push vertex ids in memory, process them on the fly
added jobs for predatory journal analysis
added invisible setup
refactored Action
fixed null element
Created CrossrefImportMapper
add CrossRefToAction
fixed mapping from scholix to openaire model
small fixes
changed key type
implemented mapper writing
added configuration
added Mapper to transform scholexplorer links into actionsets
deprecation: use setDurability instead of setWriteToWAL
introduced subType in pace wf configuration
adjusted ids export procedure
avoid emitting enrichment events when the similarity score is below the threshold
javadoc and test
indentation
pick the 1st instance to avoid collisions
improved behaviour of EventWrapperTest
Partial implementation of a unit test
Fixed the generation of eventIds
Workaround for CLARIN mining issue: #3670#note-29
expand author identifiers
generate ENRICH/MISSING/PID only when the publication didn
discover the invalid character from the exception details
mapper class that parses xml records
expand field distributionlocation in result's instances
Including Open Source among the licenses
Added counters for missing date of collection and transformation
Do not add to the BasicDBObject properties that are not listed as fields to index
splitAsList cannot be found when running on the cluster (dependency issues with guava?). Let's try to work around it.
OAI M/R jobs expect a new parameter that lists the date patterns to try 'services.publisher.oai.datepatterns'
We also have some dates as ISO DateTime with Zone...
All date fields are actually added as Date field on mongo, hopefully
Fixed bug when retrieving info about store indices for a given metadata format
added preliminary support for events regarding software
don't fail in case of missing context ids
force gson to serialise dates in a format that can be understood by ElasticSearch, updated elasticsearch-hadoop-mr lib to version 5.2.0
getting rid of ugly hacks
use getInvisible instead of hasInvisible
beta
Fixed date parsing in OAI
#3110 Support incremental harvesting: setting dateOfTransformation as datestamp whenever available
refactored broker events generation
integrated exportSummaryRecordsJob mapper from dnet40
using SolrServer (4.X)
exclude from the deduplication process results that aren't publications
do not index invisible records
added support for invisible records
skip weird cases in CC algo
fixing mapping for license vs accessright #3128, cleanup
getting rid of person entities
upgraded solr version to 6.6.0
some java8 refactorings, added more tests for the software entities mapping
integrated latest changes from dnet40
instead of excluding datasets from the deduplication process, we include only publications
implemented use of opt in/out rules for entity fields (#2557). Depending on specific solrj version (thus excluding cdh6.X versions)
codebase used to migrate to java8 the production system