/modules/dnet-mapreduce-jobs - Changes - D-Net - D-Net project tracking tool

dnet40/modules/dnet-mapreduce-jobs @ 36466

#			Date	Author	Comment
36271			09/04/2015 04:30 PM	Claudio Atzori	adding empty solr docs to the rotten record set
36270			09/04/2015 04:29 PM	Alessia Bardi	integrated changes of r36247 from trunk
36269			09/04/2015 04:22 PM	Alessia Bardi	write skipped records into the rotten folder
36247			09/04/2015 02:53 PM	Claudio Atzori	using different counter names
36177			08/04/2015 11:36 AM	Alessia Bardi	Better to depend on the branch of mapping utils in this branch of mapreduce-jobs because of the last changes implemented by Claudio.
36169			08/04/2015 11:22 AM	Claudio Atzori	reverted to r35900
36168			08/04/2015 11:20 AM	Claudio Atzori	merging from trunk
36164			08/04/2015 10:48 AM	Claudio Atzori	added dedup roots to csv export job, dedup index feed job, tests
36158			08/04/2015 09:51 AM	Claudio Atzori	using proper logger
36157			08/04/2015 09:50 AM	Claudio Atzori	added dedup configuration to the entities merging process
36043			03/04/2015 06:41 PM	Alessia Bardi	We can use the most up-to-date version of mapping-utils here
36042			03/04/2015 06:39 PM	Alessia Bardi	Fixed scm and deploy.info
36041			03/04/2015 06:38 PM	Alessia Bardi	Distinguish publications from datasets when counting
35981			03/04/2015 11:32 AM	Claudio Atzori	added more detailed counter about entity sub-type
35975			03/04/2015 10:31 AM	Claudio Atzori	several improvements
35917			02/04/2015 12:19 PM	Alessia Bardi	Increment counter in case of no rows to keep track of records without body.
35900			01/04/2015 05:20 PM	Alessia Bardi	updated version to 0.0.6.3.1
35899			01/04/2015 05:17 PM	Alessia Bardi	including changes to catch and fail for any exception of r35769 of trunk
35898			01/04/2015 05:14 PM	Alessia Bardi	branch for code before the re-implementation of context and fundingpaths
35897			01/04/2015 05:00 PM	Alessia Bardi	raised version
35896			01/04/2015 04:55 PM	Alessia Bardi	commenting test with big doaj dataset
35771			30/03/2015 11:57 AM	Claudio Atzori	different escaping
35769			30/03/2015 11:46 AM	Claudio Atzori	trying to catch any kind of exception
35746			27/03/2015 05:23 PM	Alessia Bardi	Testing DOAj for #1222#note-4
35476			18/03/2015 06:47 PM	Claudio Atzori	added DedupSimilarityToActionsMapper and relative dependency
35452			18/03/2015 01:55 PM	Michele Artini	increased version in scripts
35451			18/03/2015 01:07 PM	Michele Artini	updated the version of a dependency
35442			18/03/2015 12:15 PM	Alessia Bardi	fundingtree is an escaped xml, not a json anymore.
35439			18/03/2015 12:01 PM	Michele Artini	increased a minor virsion
35196			09/03/2015 05:09 PM	Michele Artini	sample records
35179			09/03/2015 02:39 PM	Michele Artini	reimplemented the fundingpath and context generation
35135			05/03/2015 07:46 PM	Claudio Atzori	updated packages
35133			05/03/2015 07:44 PM	Claudio Atzori	updated packages, codestyle
35129			05/03/2015 07:38 PM	Claudio Atzori	codestyle
35128			05/03/2015 07:34 PM	Claudio Atzori	updated packages
35127			05/03/2015 07:31 PM	Claudio Atzori	OafMerger moved to mapping utils
34901			27/02/2015 05:39 PM	Claudio Atzori	temporary commit
34898			27/02/2015 05:37 PM	Claudio Atzori	offline dedup
34602			19/02/2015 04:17 PM	Claudio Atzori	added protobuf-java-format dependency
34600			19/02/2015 04:14 PM	Claudio Atzori	renamed test
34599			19/02/2015 04:07 PM	Claudio Atzori	added json size test
34536			16/02/2015 07:56 PM	Claudio Atzori	saving disk space, less logging
34454			11/02/2015 07:31 PM	Alessia Bardi	Updated configuration for testing
34439			11/02/2015 04:06 PM	Claudio Atzori	extended entities join configuration, added more tests
34438			11/02/2015 03:49 PM	Claudio Atzori	extended entities join configuration, added more tests
34387			10/02/2015 11:18 AM	Alessia Bardi	scripts using updated version 0.0.6.3
34386			10/02/2015 11:16 AM	Alessia Bardi	test record took from HDFS
34374			09/02/2015 06:44 PM	Alessia Bardi
34358			09/02/2015 12:05 PM	Alessia Bardi	discard persons in OAI feeding (#1107)
34225			03/02/2015 11:50 AM	Claudio Atzori	do not alter inferenceprovenance; codestyle
33811			09/01/2015 05:25 PM	Alessia Bardi	Using released hadoop parent.
33382			12/12/2014 06:07 PM	Claudio Atzori	added FCT fundings as contexts
33137			02/12/2014 04:33 PM	Claudio Atzori	merged branch ProtoMapping
32832			17/11/2014 04:41 PM	Claudio Atzori	imlemented retries
32714			13/11/2014 03:20 PM	Alessia Bardi	Added oaf:identifiers to record sample.
32328			07/11/2014 03:53 PM	Claudio Atzori	ignored iml file
32094			03/11/2014 05:21 PM	Claudio Atzori	updated tests
32008			31/10/2014 10:45 AM	Claudio Atzori	updated scripts
32007			31/10/2014 10:36 AM	Claudio Atzori	[maven-release-plugin] prepare for next development iteration
32006			31/10/2014 10:36 AM	Claudio Atzori	[maven-release-plugin] copy for tag dnet-mapreduce-jobs-0.0.5
32005			31/10/2014 10:36 AM	Claudio Atzori	[maven-release-plugin] prepare release dnet-mapreduce-jobs-0.0.5
32004			31/10/2014 10:35 AM	Claudio Atzori	removed extra scm tag
32003			31/10/2014 10:31 AM	Claudio Atzori	added scm
32002			31/10/2014 10:23 AM	Claudio Atzori	[maven-release-plugin] prepare release dnet-mapreduce-jobs-0.0.5
31998			31/10/2014 10:19 AM	Claudio Atzori	bumped version, updated parent: let's start to depend on releases
31997			31/10/2014 10:19 AM	Claudio Atzori	cleanup & tests
31409			16/10/2014 05:42 PM	Claudio Atzori	added default bestlicense value. Used when the records doesn't provide any
31208			08/10/2014 03:25 PM	Claudio Atzori	added more fields in test record
31186			07/10/2014 02:54 PM	Alessia Bardi	Moved counters from entity body to header.
30969			01/10/2014 02:22 PM	Claudio Atzori	- provenance information parsed from element "about" - namespace aware datacite mapping for oaf:language and oaf:dateaccepted - dedupBuildRoot doesn't write to WAL - removed unused claim_2_hbase.xsl - overall cleanup
30968			01/10/2014 02:15 PM	Claudio Atzori	added relationship/children counters
30967			01/10/2014 02:14 PM	Claudio Atzori	revised tests
30882			25/09/2014 05:42 PM	Alessia Bardi	Avoiding set '___' generated when we have "strange" set names such as those in cyrillic/ukrain. In those cases records are assigned to a default set, currently named "OTHER".
30863			25/09/2014 12:40 PM	Claudio Atzori	expanding provenanceaction classid
30835			23/09/2014 06:14 PM	Claudio Atzori	merge from branch newIndexFeed
30834			23/09/2014 06:14 PM	Claudio Atzori	merge from branch newIndexFeed
30833			23/09/2014 06:14 PM	Claudio Atzori	fixing #783 (note-18)
30827			23/09/2014 03:46 PM	Claudio Atzori	extraInfo removed from CDATA block, expanding provenance action in inferred elements
30751			18/09/2014 03:02 PM	Alessia Bardi	fixed dependencies
30350			17/09/2014 11:05 AM	Sandro La Bruzzo	created tag folder for release
30017			04/09/2014 04:08 PM	Claudio Atzori	removed CDATA from extraInfo payloads
30005			04/09/2014 11:44 AM	Claudio Atzori	using CloudSolrServer for parallel index feeding
29946			02/09/2014 05:23 PM	Claudio Atzori	added branch name
29905			29/08/2014 10:57 AM	Claudio Atzori	updated branch version and build scripts
29903			29/08/2014 10:50 AM	Claudio Atzori	update branch with contributes from trunk and other branches
29860			26/08/2014 11:14 AM	Claudio Atzori	changed properties passed to index feed m/r job
29858			26/08/2014 10:32 AM	Claudio Atzori	fixed pom and scripts
29856			25/08/2014 06:15 PM	Claudio Atzori	updated index feed job to make use of the new shared solr lib
29850			25/08/2014 05:31 PM	Claudio Atzori	branch to test the new index feeding libs
29734			31/07/2014 02:28 PM	Claudio Atzori	fixed blacklist type
29732			31/07/2014 12:48 PM	Claudio Atzori	more logging. fixed entity type check
29707			30/07/2014 04:15 PM	Claudio Atzori	more logging
29702			30/07/2014 03:38 PM	Claudio Atzori	defined limit to the maximum number of counters
29657			29/07/2014 03:24 PM	Claudio Atzori	defined limit to the maximum number of counters
29043			11/07/2014 06:54 PM	Claudio Atzori	added serialization, tests
29042			11/07/2014 06:53 PM	Claudio Atzori	instantiate one SAXReader for each call
29041			11/07/2014 06:52 PM	Claudio Atzori	fixed format-layout-interpretation concatenation, doesn't fail when the fieldExtractor returns a null result
29040			11/07/2014 06:51 PM	Claudio Atzori	added json serialization, builds the matching key one time only
29036			11/07/2014 04:30 PM	Alessia Bardi	do not upsert sets here in the mapper: we shall delegate to a separate workflow to be run after the OAI feeding is completed.
29034			11/07/2014 03:49 PM	Claudio Atzori	early implementation of jar upload script