# Date Author Comment
36859 03/05/2015 11:34 PM Alessia Bardi

FCT has level_0 only in funding tree

36855 01/05/2015 09:45 AM Claudio Atzori

trying to debug

36833 30/04/2015 12:34 PM Alessia Bardi

Updated version and scm. Changed dependency to 3.0.3 of mapping-utils

36832 30/04/2015 12:30 PM Alessia Bardi

branching for beta and new fundingpaths and context

36804 28/04/2015 06:21 PM Alessia Bardi

Added dc:creator

36796 28/04/2015 05:13 PM Claudio Atzori

csv export of the duplicates original ids

36735 27/04/2015 11:22 AM Claudio Atzori

cleanup

36670 23/04/2015 04:57 PM Claudio Atzori

updated to the new pace specs, cleanup

36650 23/04/2015 02:36 PM Alessia Bardi

WT ids are uniform now

36641 23/04/2015 11:33 AM Alessia Bardi

Using MongoClient instead of deprecated Mongo. Removed record status management, since records are always new anyway.

36583 22/04/2015 01:44 PM Claudio Atzori

trying to perform explicit escape

36570 22/04/2015 10:23 AM Claudio Atzori

delegating to StringEscapeUtils

36569 22/04/2015 10:14 AM Claudio Atzori

escaping also quotes and apostrophes

36568 21/04/2015 05:24 PM Claudio Atzori

aggressive escaping

36566 21/04/2015 05:07 PM Claudio Atzori

avoid infinite loop :)

36562 21/04/2015 02:11 PM Claudio Atzori

updated data used for testing

36561 21/04/2015 02:10 PM Claudio Atzori

fixing WT funding tree translation as contexts

36271 09/04/2015 04:30 PM Claudio Atzori

adding empty solr docs to the rotten record set

36270 09/04/2015 04:29 PM Alessia Bardi

integrated changes of r36247 from trunk

36269 09/04/2015 04:22 PM Alessia Bardi

write skipped records into the rotten folder

36247 09/04/2015 02:53 PM Claudio Atzori

using different counter names

36177 08/04/2015 11:36 AM Alessia Bardi

Better to depend on the branch of mapping utils in this branch of mapreduce-jobs because of the last changes implemented by Claudio.

36169 08/04/2015 11:22 AM Claudio Atzori

reverted to r35900

36168 08/04/2015 11:20 AM Claudio Atzori

merging from trunk

36164 08/04/2015 10:48 AM Claudio Atzori

added dedup roots to csv export job, dedup index feed job, tests

36158 08/04/2015 09:51 AM Claudio Atzori

using proper logger

36157 08/04/2015 09:50 AM Claudio Atzori

added dedup configuration to the entities merging process

36043 03/04/2015 06:41 PM Alessia Bardi

We can use the most up-to-date version of mapping-utils here

36042 03/04/2015 06:39 PM Alessia Bardi

Fixed scm and deploy.info

36041 03/04/2015 06:38 PM Alessia Bardi

Distinguish publications from datasets when counting

35981 03/04/2015 11:32 AM Claudio Atzori

added more detailed counter about entity sub-type

35975 03/04/2015 10:31 AM Claudio Atzori

several improvements

35917 02/04/2015 12:19 PM Alessia Bardi

Increment counter in case of no rows to keep track of records without body.

35900 01/04/2015 05:20 PM Alessia Bardi

updated version to 0.0.6.3.1

35899 01/04/2015 05:17 PM Alessia Bardi

including changes to catch and fail for any exception of r35769 of trunk

35898 01/04/2015 05:14 PM Alessia Bardi

branch for code before the re-implementation of context and fundingpaths

35897 01/04/2015 05:00 PM Alessia Bardi

raised version

35896 01/04/2015 04:55 PM Alessia Bardi

commenting test with big doaj dataset

35771 30/03/2015 11:57 AM Claudio Atzori

different escaping

35769 30/03/2015 11:46 AM Claudio Atzori

trying to catch any kind of exception

35746 27/03/2015 05:23 PM Alessia Bardi

Testing DOAJ for #1222#note-4

35476 18/03/2015 06:47 PM Claudio Atzori

added DedupSimilarityToActionsMapper and relative dependency

35452 18/03/2015 01:55 PM Michele Artini

increased version in scripts

35451 18/03/2015 01:07 PM Michele Artini

updated the version of a dependency

35442 18/03/2015 12:15 PM Alessia Bardi

fundingtree is an escaped xml, not a json anymore.

35439 18/03/2015 12:01 PM Michele Artini

increased a minor version

35196 09/03/2015 05:09 PM Michele Artini

sample records

35179 09/03/2015 02:39 PM Michele Artini

reimplemented the fundingpath and context generation

35135 05/03/2015 07:46 PM Claudio Atzori

updated packages

35133 05/03/2015 07:44 PM Claudio Atzori

updated packages, codestyle

35129 05/03/2015 07:38 PM Claudio Atzori

codestyle

35128 05/03/2015 07:34 PM Claudio Atzori

updated packages

35127 05/03/2015 07:31 PM Claudio Atzori

OafMerger moved to mapping utils

34901 27/02/2015 05:39 PM Claudio Atzori

temporary commit

34898 27/02/2015 05:37 PM Claudio Atzori

offline dedup

34602 19/02/2015 04:17 PM Claudio Atzori

added protobuf-java-format dependency

34600 19/02/2015 04:14 PM Claudio Atzori

renamed test

34599 19/02/2015 04:07 PM Claudio Atzori

added json size test

34536 16/02/2015 07:56 PM Claudio Atzori

saving disk space, less logging

34454 11/02/2015 07:31 PM Alessia Bardi

Updated configuration for testing

34439 11/02/2015 04:06 PM Claudio Atzori

extended entities join configuration, added more tests

34438 11/02/2015 03:49 PM Claudio Atzori

extended entities join configuration, added more tests

34387 10/02/2015 11:18 AM Alessia Bardi

scripts using updated version 0.0.6.3

34386 10/02/2015 11:16 AM Alessia Bardi

test record taken from HDFS

34374 09/02/2015 06:44 PM Alessia Bardi

34358 09/02/2015 12:05 PM Alessia Bardi

discard persons in OAI feeding (#1107)

34225 03/02/2015 11:50 AM Claudio Atzori

do not alter inferenceprovenance; codestyle

33811 09/01/2015 05:25 PM Alessia Bardi

Using released hadoop parent.

33382 12/12/2014 06:07 PM Claudio Atzori

added FCT fundings as contexts

33137 02/12/2014 04:33 PM Claudio Atzori

merged branch ProtoMapping

32832 17/11/2014 04:41 PM Claudio Atzori

implemented retries

32714 13/11/2014 03:20 PM Alessia Bardi

Added oaf:identifiers to record sample.

32328 07/11/2014 03:53 PM Claudio Atzori

ignored iml file

32094 03/11/2014 05:21 PM Claudio Atzori

updated tests

32008 31/10/2014 10:45 AM Claudio Atzori

updated scripts

32007 31/10/2014 10:36 AM Claudio Atzori

[maven-release-plugin] prepare for next development iteration

32006 31/10/2014 10:36 AM Claudio Atzori

[maven-release-plugin] copy for tag dnet-mapreduce-jobs-0.0.5

32005 31/10/2014 10:36 AM Claudio Atzori

[maven-release-plugin] prepare release dnet-mapreduce-jobs-0.0.5

32004 31/10/2014 10:35 AM Claudio Atzori

removed extra scm tag

32003 31/10/2014 10:31 AM Claudio Atzori

added scm

32002 31/10/2014 10:23 AM Claudio Atzori

[maven-release-plugin] prepare release dnet-mapreduce-jobs-0.0.5

31998 31/10/2014 10:19 AM Claudio Atzori

bumped version, updated parent: let's start to depend on releases

31997 31/10/2014 10:19 AM Claudio Atzori

cleanup & tests

31409 16/10/2014 05:42 PM Claudio Atzori

added default bestlicense value, used when the record doesn't provide any

31208 08/10/2014 03:25 PM Claudio Atzori

added more fields in test record

31186 07/10/2014 02:54 PM Alessia Bardi

Moved counters from entity body to header.

30969 01/10/2014 02:22 PM Claudio Atzori

- provenance information parsed from element "about"
- namespace aware datacite mapping for oaf:language and oaf:dateaccepted
- dedupBuildRoot doesn't write to WAL
- removed unused claim_2_hbase.xsl
- overall cleanup

30968 01/10/2014 02:15 PM Claudio Atzori

added relationship/children counters

30967 01/10/2014 02:14 PM Claudio Atzori

revised tests

30882 25/09/2014 05:42 PM Alessia Bardi

Avoid the set '___' generated for "strange" set names, such as those in Cyrillic/Ukrainian. In those cases records are assigned to a default set, currently named "OTHER".

30863 25/09/2014 12:40 PM Claudio Atzori

expanding provenanceaction classid

30835 23/09/2014 06:14 PM Claudio Atzori

merge from branch newIndexFeed

30834 23/09/2014 06:14 PM Claudio Atzori

merge from branch newIndexFeed

30833 23/09/2014 06:14 PM Claudio Atzori

fixing #783 (note-18)

30827 23/09/2014 03:46 PM Claudio Atzori

extraInfo removed from CDATA block, expanding provenance action in inferred elements

30751 18/09/2014 03:02 PM Alessia Bardi

fixed dependencies

30350 17/09/2014 11:05 AM Sandro La Bruzzo

created tag folder for release

30017 04/09/2014 04:08 PM Claudio Atzori

removed CDATA from extraInfo payloads

30005 04/09/2014 11:44 AM Claudio Atzori

using CloudSolrServer for parallel index feeding

29946 02/09/2014 05:23 PM Claudio Atzori

added branch name