# Date Author Comment
53589 30/10/2018 12:27 PM Claudio Atzori

export invalid xml records

53588 30/10/2018 12:26 PM Sandro La Bruzzo

refactored Action

53572 29/10/2018 10:12 AM Claudio Atzori

Map-only job that produces [openaireId, doi] pairs of records containing invalid characters

53565 26/10/2018 03:40 PM Sandro La Bruzzo

added parameter to filter only organizations in DOIBoostToAction

53556 25/10/2018 04:30 PM Sandro La Bruzzo

fixed problem of missing name in authors

53554 25/10/2018 12:26 PM Sandro La Bruzzo

merged beta branch to master

53518 18/10/2018 02:48 PM Claudio Atzori

introduced use of BlockProcessor

53421 09/10/2018 11:55 AM Miriam Baglioni

fixed issue when country information is not present for datasource

53419 08/10/2018 10:14 AM Miriam Baglioni

replaced the throwing of exceptions with counters

53410 05/10/2018 04:18 PM Miriam Baglioni

change parameter from ImmutableBytesWritable to Text

53409 05/10/2018 04:17 PM Miriam Baglioni

refactoring and change of counters

53407 05/10/2018 04:06 PM Claudio Atzori

rollback wrong commit

53386 04/10/2018 03:35 PM Claudio Atzori

fixing and testing propagation implementation

53383 04/10/2018 02:46 PM Miriam Baglioni

reducer for country propagation that writes on hdfs

53371 03/10/2018 10:22 AM Claudio Atzori

cleanup pid types in order to make them valid attributes

53369 02/10/2018 02:08 PM Miriam Baglioni

53362 02/10/2018 10:18 AM Miriam Baglioni

added code for propagation of countries from institutional organization

53340 01/10/2018 10:04 AM Claudio Atzori

master branch for deployments @ICM

53336 01/10/2018 09:26 AM Claudio Atzori

why parse strings as Floats?

53288 27/09/2018 01:48 PM Claudio Atzori

reverted to r52985. Test runs show we need to rely on the edgeIds produced by the connected components identification phase instead of the vertexIds

53279 26/09/2018 03:59 PM Miriam Baglioni

alignment to trunk version

53262 25/09/2018 05:41 PM Claudio Atzori

avoid producing duplicate events by eliminating the roots from the comparison process

53260 25/09/2018 03:26 PM Claudio Atzori

introduced mapping resulttype -> portal url

53213 21/09/2018 12:36 PM Alessia Bardi

Fixed log class name

53191 19/09/2018 05:46 PM Claudio Atzori

avoid collisions when hashing pids by value

53190 19/09/2018 05:33 PM Claudio Atzori

cleaned up unused method, using setDurability in put operation

53103 12/09/2018 03:03 PM Claudio Atzori

added mapper and hadoop job configuration file for importing Grid.AC organization data

53079 11/09/2018 06:31 PM Claudio Atzori

integrating bulktag from trunk to beta branch

53068 11/09/2018 03:27 PM Claudio Atzori

rule out invalid dates also on CrossRefToActions

53067 11/09/2018 03:22 PM Claudio Atzori

rule out invalid dates on ScholixToActions

53036 10/09/2018 10:17 AM Claudio Atzori

cleanup

53035 10/09/2018 10:16 AM Claudio Atzori

produce 'supplement' subrel type in case of supplement relationships

53025 05/09/2018 02:33 PM Claudio Atzori

simplified connected component application on the graph

52993 28/08/2018 05:06 PM Sandro La Bruzzo

added a check to investigate the bug that generates wrong relations

52985 27/08/2018 10:07 AM Claudio Atzori

do not skip processing datasets in DedupBuildRootsMapper, improved error reporting in DedupBuildRootsReducer

52984 27/08/2018 10:00 AM Claudio Atzori

do not push vertex ids in memory, process them on the fly

52960 08/08/2018 12:36 PM Claudio Atzori

added jobs for predatory journal analysis

52958 07/08/2018 06:15 PM Sandro La Bruzzo

added invisible setup

52957 07/08/2018 06:12 PM Sandro La Bruzzo

refactored Action

52956 07/08/2018 06:07 PM Sandro La Bruzzo

fixed null element

52955 07/08/2018 05:51 PM Sandro La Bruzzo

Created CrossrefImportMapper

52951 07/08/2018 05:30 PM Sandro La Bruzzo

add CrossRefToAction

52935 07/08/2018 11:29 AM Claudio Atzori

fixed mapping from scholix to openaire model

52931 07/08/2018 09:39 AM Claudio Atzori

small fixes

52930 06/08/2018 06:35 PM Sandro La Bruzzo

changed key type

52929 06/08/2018 06:29 PM Sandro La Bruzzo

changed key type

52916 06/08/2018 05:32 PM Sandro La Bruzzo

implemented mapper writing

52915 06/08/2018 04:52 PM Sandro La Bruzzo

added configuration

52912 06/08/2018 04:09 PM Sandro La Bruzzo

added Mapper to transform scholexplorer links into actionsets

52883 02/08/2018 04:25 PM Claudio Atzori

deprecation: use setDurability instead of setWriteToWAL

52878 02/08/2018 02:19 PM Claudio Atzori

introduced subType in pace wf configuration

52823 25/07/2018 04:10 PM Claudio Atzori

adjusted ids export procedure

52805 24/07/2018 05:22 PM Claudio Atzori

avoid emitting enrichment events when the similarity score is below the threshold

52804 24/07/2018 02:56 PM Claudio Atzori

avoid emitting enrichment events when the similarity score is below the threshold

52803 24/07/2018 02:53 PM Claudio Atzori

avoid emitting enrichment events when the similarity score is below the threshold

52802 24/07/2018 02:04 PM Claudio Atzori

javadoc and test

52801 24/07/2018 12:14 PM Claudio Atzori

indentation

52797 23/07/2018 04:10 PM Claudio Atzori

pick the 1st instance to avoid collisions

52777 20/07/2018 04:04 PM Claudio Atzori

improved behaviour of EventWrapperTest

52775 20/07/2018 03:07 PM Michele Artini

Partial implementation of a unit test

52765 18/07/2018 11:45 AM Michele Artini

Fixed the generation of eventIds

52751 13/07/2018 05:38 PM Alessia Bardi

Workaround for CLARIN mining issue: #3670#note-29

52524 18/06/2018 03:07 PM Claudio Atzori

expand author identifiers

52490 15/06/2018 11:25 AM Claudio Atzori

generate ENRICH/MISSING/PID only when the publication didn

52488 15/06/2018 11:20 AM Claudio Atzori

discover the invalid character from the exception details

52469 13/06/2018 03:05 PM Claudio Atzori

mapper class that parses xml records

52462 13/06/2018 11:43 AM Claudio Atzori

mapper class that parses xml records

52461 13/06/2018 11:42 AM Claudio Atzori

mapper class that parses xml records

52421 08/06/2018 05:57 PM Claudio Atzori

expand field distributionlocation in result's instances

52212 24/05/2018 06:14 PM Alessia Bardi

Including Open Source among the licenses

52114 21/05/2018 12:24 PM Alessia Bardi

Added counters for missing date of collection and transformation

52112 21/05/2018 12:19 PM Alessia Bardi

Do not add to the BasicDBObject properties that are not listed as fields to index

52111 21/05/2018 11:46 AM Alessia Bardi

splitAsList cannot be found when running on the cluster (dependency issues with guava?). Let's try to work around it.

52109 21/05/2018 11:21 AM Alessia Bardi

OAI M/R jobs expect a new parameter that lists the date patterns to try 'services.publisher.oai.datepatterns'

52078 17/05/2018 07:03 PM Alessia Bardi

We also have some dates as ISO DateTime with Zone...

52076 17/05/2018 05:15 PM Alessia Bardi

All date fields are actually added as Date fields on mongo, hopefully

52075 17/05/2018 05:14 PM Alessia Bardi

Fixed bug when retrieving info about store indices for a given metadata format

51451 23/03/2018 05:07 PM Claudio Atzori

added preliminary support for events regarding software

51221 14/03/2018 12:26 PM Claudio Atzori

don't fail in case of missing context ids

51006 02/03/2018 10:50 AM Claudio Atzori

force gson to serialise dates in a format that can be understood by ElasticSearch, updated elasticsearch-hadoop-mr lib to version 5.2.0

50270 10/01/2018 05:49 PM Claudio Atzori

getting rid of ugly hacks

50256 09/01/2018 10:02 AM Claudio Atzori

use getInvisible instead of hasInvisible

50236 03/01/2018 09:22 AM Claudio Atzori

beta

50157 18/12/2017 04:34 PM Alessia Bardi

Fixed date parsing in OAI

50153 18/12/2017 04:11 PM Claudio Atzori

cleanup

49891 14/11/2017 05:23 PM Alessia Bardi

#3110 Support incremental harvesting: setting dateOfTransformation as datestamp whenever available

49832 07/11/2017 09:58 AM Claudio Atzori

refactored broker events generation

49831 07/11/2017 09:58 AM Claudio Atzori

cleanup

49815 06/11/2017 10:09 AM Claudio Atzori

integrated exportSummaryRecordsJob mapper from dnet40

49565 19/10/2017 05:15 PM Claudio Atzori

using SolrServer (4.X)

49517 17/10/2017 03:08 PM Claudio Atzori

exclude from the deduplication process results that aren't publications

49516 17/10/2017 03:07 PM Claudio Atzori

do not index invisible records

49515 17/10/2017 03:07 PM Claudio Atzori

added support for invisible records

49218 03/10/2017 06:14 PM Claudio Atzori

skip weird cases in CC algo

49096 25/09/2017 05:32 PM Claudio Atzori

fixing mapping for license vs accessright #3128, cleanup

49029 20/09/2017 06:43 PM Claudio Atzori

getting rid of person entities

48892 08/09/2017 02:07 PM Claudio Atzori

upgraded solr version to 6.6.0

48697 25/07/2017 10:13 AM Claudio Atzori

some java8 refactorings, added more tests for the software entities mapping

48145 29/06/2017 10:15 AM Claudio Atzori

integrated latest changes from dnet40

47483 13/06/2017 02:51 PM Claudio Atzori

instead of excluding datasets from the deduplication process, we include only publications