# Date Author Comment
39165 10/09/2015 06:22 PM Marek Horst

merging trunk changes with IIS-CDH-5.3.0 branch

39055 05/09/2015 09:14 PM Marek Horst

#1498 adding missing extracted metadata fields

39025 04/09/2015 03:42 PM Mateusz Kobos

Removing old code related to Protocol Buffers

A long time ago, we used Protocol Buffers encapsulated in sequence files as the format for data stores. This cleanup removes the code related to that functionality.

37980 26/06/2015 07:46 PM Marek Horst

#1395 WorkflowRuntimeParameters static fields cleanup, moving parameters to dedicated modules to prevent excessive icm-iis-common module modifications

37924 22/06/2015 06:45 PM Marek Horst

fixing test after cermine upgrade

37884 19/06/2015 04:38 PM Marek Horst

merging trunk changes with IIS-CDH-5.3.0 branch

37715 11/06/2015 11:22 AM Marek Horst

fixing typo

37714 10/06/2015 09:02 PM Marek Horst

making integration tests run on dedicated test cluster instead of embedded mini-oozie container

37713 10/06/2015 08:05 PM Marek Horst

removing obsolete avro-based workflow

37665 08/06/2015 04:42 PM Marek Horst

reverting example-1.pdf file removal which seems to be required by CermineMetadataExtractionTest

37660 08/06/2015 03:41 PM Marek Horst

fixing cermine integration test, changing PDF contents

37659 08/06/2015 03:40 PM Marek Horst

fixing cermine integration test, changing PDF contents

37348 20/05/2015 07:00 PM Marek Horst

#1330 icm-iis-metadataextraction and icm-iis-ingest-pmc modules cermine dependency upgraded to recently released 1.6 version

37343 20/05/2015 06:49 PM Marek Horst

#1329 adding affiliations field in ExtractedDocumentMetadata PMC schema. Metadata extraction code refactoring by extracting code responsible for building Affiliation avro records to AffiliationBuilder class and sharing it with pmc ingestion. Implementing affiliations ingestion functionality in PmcXmlHandler covered with unit tests. Adding affiliations field support in ingest pmc metadata transformer.

37143 11/05/2015 06:56 PM Marek Horst

merging trunk changes with IIS-CDH-5.3.0 branch

36394 15/04/2015 05:17 PM Marek Horst

#1240 raising mapred.task.timeout to 3600000 (1h) just in case any extremely complex PDF documents appear. All time-consuming documents will be registered in failure sink.

36366 14/04/2015 12:54 PM Marek Horst

#1277 upgrading cermine dependency to most recent 1.5 release

36337 13/04/2015 01:33 PM Marek Horst

#1257 raising oozie.action.max.output.data to 8192

36289 09/04/2015 07:10 PM Marek Horst

#1257 dropping schema generation related hacks in all map-reduce modules, switching to literal schema parameters

35986 03/04/2015 01:30 PM Marek Horst

#1248 persisting content url in supplementaryData to make it easier to find content causing failure

35937 02/04/2015 04:15 PM Marek Horst

#1248 bugfix renaming inputEntityId to inputObjectId after schema changes

35936 02/04/2015 04:01 PM Marek Horst

#1248 introducing failures sink datastore support in metadata extraction module

35832 30/03/2015 07:17 PM Marek Horst

#1240 extending mapred.task.timeout for metadata extraction to 30 minutes

35710 27/03/2015 09:45 AM Marek Horst

#1135 switching icm-iis-parent-container version to 1.0.1-SNAPSHOT in order to include workingDir related changes made in icm-iis-core

35701 27/03/2015 06:18 AM Mateusz Kobos

Removing usage of working_dir from Java workflow node.

35624 25/03/2015 11:52 AM Marek Horst

#1068 updating expected test output

35559 23/03/2015 05:50 PM Marek Horst

#1068 upgrading cermine version from 1.4 to 1.5-SNAPSHOT providing proper fix, propagating change to branch

35558 23/03/2015 05:48 PM Marek Horst

#1068 upgrading cermine version from 1.4 to 1.5-SNAPSHOT providing proper fix

35407 17/03/2015 03:04 PM Marek Horst

#1198 aligning IIS dependencies and java code to CDH5.3.0 cluster

35393 17/03/2015 03:01 PM Marek Horst

#1197 introducing job.properties changes aligning paths to rumcajs cluster HDFS structure

35255 11/03/2015 04:51 PM Marek Horst

creating IIS-CDH-5.3.0 branch

35253 11/03/2015 04:50 PM Marek Horst

creating IIS-CDH-5.3.0 branch

34698 20/02/2015 07:17 PM Marek Horst

#1133 dropping useless working_dir creation for java nodes

34622 19/02/2015 06:12 PM Marek Horst

#1038 introducing ranges in dependencies definition for all IIS modules

33727 30/12/2014 02:41 PM Marek Horst

[maven-release-plugin] prepare for next development iteration

33726 30/12/2014 02:41 PM Marek Horst

[maven-release-plugin] copy for tag icm-iis-metadataextraction-1.0.0

33725 30/12/2014 02:41 PM Marek Horst

[maven-release-plugin] prepare release icm-iis-metadataextraction-1.0.0

33693 21/12/2014 01:51 AM Dominika Tkaczyk

expected results updated

33673 18/12/2014 03:35 PM Marek Horst

#1044 upgrading dependencies to released versions and parent version to most recent snapshot for unreleased modules

33621 17/12/2014 12:33 PM Marek Horst

#1044 upgrading dependencies to released versions and parent version to most recent snapshot for unreleased modules

33415 15/12/2014 12:46 PM Marek Horst

introducing scm definition

33369 12/12/2014 04:08 PM Marek Horst

#953 extending maximum heap size from 2048 to 4096 after Dominika introduced iText dependency upgrade to 5.5.3 in cermine. This combination should minimize the possibility of failures caused by fatal OOMErr.

32243 05/11/2014 05:33 PM Marek Horst

introducing embedded integration test entry

31845 28/10/2014 03:31 PM Marek Horst

#913 renaming DocumentContentUrl#contentSize to DocumentContentUrl#contentSizeKB changing field type from int to long, importing content size from ObjectStoreFile#fileSizeKB, updating dnet-objectstore-rmi dependency from 1.0.0 to 2.0.1-SNAPSHOT

31782 28/10/2014 11:35 AM Marek Horst

#913 reading content size from newly introduced DocumentContentUrl#contentSize field not from URLConnection where size is not available when Transfer-Encoding=chunked, setting to 0 when size is not available

31757 27/10/2014 06:11 PM Marek Horst

#913 introducing support for max file size parameter, currently checked against Content-Length header

31703 24/10/2014 07:33 PM Marek Horst

upgrading cermine version after recent release from 1.3-SNAPSHOT to 1.4-SNAPSHOT

30887 25/09/2014 09:50 PM Dominika Tkaczyk

affiliation's address and country code passed from Cermine to Avro

30878 25/09/2014 05:07 PM Marek Horst

fixing converter code after recent Affiliation.avdl refactoring and adding countryCode field, renaming contry to countryName

30877 25/09/2014 05:07 PM Marek Horst

fixing test code after recent Affiliation.avdl refactoring and adding countryCode field, renaming contry to countryName

30803 21/09/2014 05:54 PM Dominika Tkaczyk

expected output corrected

30789 19/09/2014 02:00 PM Dominika Tkaczyk

expected output updated

30420 17/09/2014 11:06 AM Sandro La Bruzzo

created tag folder for release

29883 27/08/2014 12:48 PM Dominika Tkaczyk

expected record updated

29839 25/08/2014 02:14 PM Dominika Tkaczyk

convertBibEntry() made public

29646 29/07/2014 11:53 AM Dominika Tkaczyk

expected test corrected

28805 02/07/2014 11:48 AM Marek Horst

setting excluded_ids to undefined value

28771 01/07/2014 05:05 PM Marek Horst

introducing deploy.info file for module icm-iis-metadataextraction

28769 01/07/2014 05:04 PM Marek Horst

introducing deploy.info file for module icm-iis-metadataextraction