merging trunk changes with IIS-CDH-5.3.0 branch
renaming test resources to be compliant with windows file system naming requirements
#1498 introducing major citation-related refactoring: new generic direct citation matching moved to processing phase, position field introduced in all citation schemas, and collapser updated to take position into account when merging citation details coming from 3 sources: fuzzy citationmatching, direct citationmatching, references metadata
#1435 making PMC XML parser less strict in terms of expected input elements or attributes: article-type is set to 'unknown' value when attribute not defined in XML main element
#1431 fixing PMC XML records parser disallowing null reference type, reference value will be simply omitted
#1383 replacing explicitly defined test cluster properties with init-test-cluster-config maven profile usage
#1381 reintroducing yadda repository required by cermine
#1381 porting pmc citations ingestion from cascading framework to pig. Moving code from icm-iis-ingest-pmc to icm-iis-transformers including integration tests, removing obsolete scala code along with unneeded dependencies. Switching subworkflow in primary workflow.
updating job.properties
#1370 making pmc ingestion integration tests run on dedicated test cluster instead of embedded mini-oozie container
#1329 setting affiliation string as raw text if parser produced empty Element object
#1330 upgrading cermine dependency in icm-iis-metadataextraction and icm-iis-ingest-pmc modules to recently released 1.6 version
#1329 adding affiliations field in ExtractedDocumentMetadata PMC schema. Metadata extraction code refactoring by extracting code responsible for building Affiliation avro records to AffiliationBuilder class and sharing it with pmc ingestion. Implementing affiliations ingestion functionality in PmcXmlHandler covered with unit tests. Adding affiliations field support in ingest pmc metadata transformer.
excluding hadoop-client cdh4 artifact possibly incompatible with cdh5 environment
#1257 raising oozie.action.max.output.data to 8192
#1257 dropping schema generation related hacks in all map-reduce modules, switching to literal schema parameters
#1135 switching icm-iis-parent-container version to 1.0.1-SNAPSHOT in order to include workingDir related changes made in icm-iis-core
Removing usage of working_dir from Java workflow node.
#1198 aligning IIS dependencies and java code to CDH5.3.0 cluster
#1197 introducing job.properties changes aligning paths to rumcajs cluster HDFS structure
creating IIS-CDH-5.3.0 branch
introducing branches folder
#1133 dropping useless working_dir creation for java nodes
#1038 introducing ranges in dependencies definition for all IIS modules
[maven-release-plugin] prepare for next development iteration
[maven-release-plugin] copy for tag icm-iis-ingest-pmc-1.0.0
[maven-release-plugin] prepare release icm-iis-ingest-pmc-1.0.0
#1044 pre-release switching to released version of parent pom and released dependencies
introducing scm definition
#1038 changing ceon-scala-commons 0.0.2-SNAPSHOT dependency to released 0.0.2
#1038 dependency cleanup: removing obsolete dnet-openaireplus-mapping-utils dependency
replacing non-standard dash character with '-'
fixing test run on jenkins: setting encoding explicitly to UTF-8
#1017 fixing expected citations
#1017 fixing PMC and DOI identifiers retrieval from avro map: addressing by Utf8 objects not by String
#1017 accepting ExtractedDocumentMetadata instead of DocumentText at PMC citation ingestion input. Aligning integration test and importer workflow.
#1017 introducing new PMC metadata ingestion currently extracting references, journal and pages fields. Replacing DOM/XPath based citations ingestion with much faster SAX version. Changing pmidtooaid transformer to utilize ExtractedDocumentMetadata instead of parsing XML file. Enabling PMC metadata ingestion in common/import.
#955 fixing reference raw text generation for pretty printed NLM documents
introducing embedded integration test entry
#840 renaming DeduplicationMapping to more generic IdentifierMapping
#840 moving IdentifierMapping from importer to common package
#757 adding reduce phase for filtering out pmids by article type: the map phase groups PmidMapping objects by pmid and the reduce phase filters out duplicates
#757 introducing article type extraction along with unit test. Article type will be required for filtering out pmc duplicates and leaving only proper types
introducing cloudera repository in parent container, removing repository definitions from individual IIS modules
fixing sourceDocumentId which is now extracted from input DocumentText record conveying NLM
#757 fixing pmc citation matching test by providing proper input
#757 fixing pmid and doi matching, fixing sourceDocumentId and destinationDocumentId generation
Commented out test in a stub of a solution to the task #576: Ingestion of metadata from EuropePMC.
Stub of a solution to the task #576: Ingestion of metadata from EuropePMC.
Refactored code to use the XPathEvaluator.fromString method.
created tag folder for release
updating default job properties
renaming workflow to ingest_pmc_plaintext
Excluding conflicting dependency
replacing "result" string with Type.result.name()
dir names in parameters should not contain nameNode
rename a field
introducing deploy.info file for module icm-iis-ingest-pmc