merging trunk changes with IIS-CDH-5.3.0 branch
#1498 adding missing extracted metadata fields
Removing old code related to Protocol Buffers
A long time ago, we used Protocol Buffers encapsulated in sequence files as the format for data stores. This cleanup removes the code related to that functionality.
#1395 WorkflowRuntimeParameters static fields cleanup, moving parameters to dedicated modules to prevent excessive icm-iis-common module modifications
fixing test after cermine upgrade
fixing typo
making integration tests run on dedicated test cluster instead of embedded mini-oozie container
removing obsolete avro-based workflow
reverting example-1.pdf file removal which seems to be required by CermineMetadataExtractionTest
fixing cermine integration test, changing PDF contents
#1330 icm-iis-metadataextraction and icm-iis-ingest-pmc modules cermine dependency upgraded to recently released 1.6 version
#1329 adding affiliations field in ExtractedDocumentMetadata PMC schema. Metadata extraction code refactoring by extracting code responsible for building Affiliation avro records to AffiliationBuilder class and sharing it with pmc ingestion. Implementing affiliations ingestion functionality in PmcXmlHandler covered with unit tests. Adding affiliations field support in ingest pmc metadata transformer.
#1240 raising mapred.task.timeout to 3600000 (1h) just in case any extremely complex PDF document appears. All time-consuming documents will be registered in failure sink.
#1277 upgrading cermine dependency to most recent 1.5 release
#1257 raising oozie.action.max.output.data to 8192
#1257 dropping schema generation related hacks in all map-reduce modules, switching to literal schema parameters
#1248 persisting content url in supplementaryData to make it easier to find content causing failure
#1248 bugfix renaming inputEntityId to inputObjectId after schema changes
#1248 introducing failures sink datastore support in metadata extraction module
#1240 extending mapred.task.timeout for metadata extraction to 30 minutes
#1135 switching icm-iis-parent-container version to 1.0.1-SNAPSHOT in order to include workingDir related changes made in icm-iis-core
Removing usage of working_dir from Java workflow node.
#1068 updating expected test output
#1068 upgrading cermine version from 1.4 to 1.5-SNAPSHOT providing proper fix, propagating change to branch
#1068 upgrading cermine version from 1.4 to 1.5-SNAPSHOT providing proper fix
#1198 aligning IIS dependencies and java code to CDH5.3.0 cluster
#1197 introducing job.properties changes aligning paths to rumcajs cluster HDFS structure
creating IIS-CDH-5.3.0 branch
#1133 dropping useless working_dir creation for java nodes
#1038 introducing ranges in dependencies definition for all IIS modules
[maven-release-plugin] prepare for next development iteration
[maven-release-plugin] copy for tag icm-iis-metadataextraction-1.0.0
[maven-release-plugin] prepare release icm-iis-metadataextraction-1.0.0
expected results updated
#1044 upgrading dependencies to released versions and parent version to most recent snapshot for unreleased modules
introducing scm definition
#953 extending maximum heap size from 2048 to 4096 after Dominika introduced iText dependency upgrade to 5.5.3 in cermine. This combination should minimize the possibility of failures caused by fatal OOMErr.
introducing embedded integration test entry
#913 renaming DocumentContentUrl#contentSize to DocumentContentUrl#contentSizeKB, changing field type from int to long, importing content size from ObjectStoreFile#fileSizeKB, updating dnet-objectstore-rmi dependency from 1.0.0 to 2.0.1-SNAPSHOT
#913 reading content size from newly introduced DocumentContentUrl#contentSize field instead of URLConnection, where size is not available when Transfer-Encoding=chunked; setting to 0 when size is not available
#913 introducing support for max file size parameter, currently checked against Content-Length header
upgrading cermine version after recent release from 1.3-SNAPSHOT to 1.4-SNAPSHOT
affiliation's address and country code passed from Cermine to Avro
fixing converter code after recent Affiliation.avdl refactoring and adding countryCode field, renaming country to countryName
fixing test code after recent Affiliation.avdl refactoring and adding countryCode field, renaming country to countryName
expected output corrected
expected output updated
created tag folder for release
expected record updated
convertBibEntry() made public
expected test corrected
setting excluded_ids to undefined value
introducing deploy.info file for module icm-iis-metadataextraction