renaming test resources to comply with Windows file system naming requirements
#1498 introducing a major citation-related refactoring: new generic direct citation matching moved to the processing phase, a position field introduced in all citation schemas, and the collapser updated to take position into account when merging citation details coming from 3 various sources: fuzzy citationmatching, direct citationmatching and references metadata
#1381 porting PMC citations ingestion from the Cascading framework to Pig. Moving code from icm-iis-ingest-pmc to icm-iis-transformers, including integration tests, and removing obsolete Scala code along with unneeded dependencies. Switching the subworkflow in the primary workflow.
#1370 making PMC ingestion integration tests run on a dedicated test cluster instead of an embedded mini-oozie container
#1329 adding an affiliations field to the ExtractedDocumentMetadata PMC schema. Refactoring metadata extraction code by moving the code responsible for building Affiliation Avro records into an AffiliationBuilder class shared with PMC ingestion. Implementing affiliation ingestion in PmcXmlHandler, covered by unit tests. Adding affiliations field support to the ingest PMC metadata transformer.
Removing usage of working_dir from Java workflow node.
#1133 dropping useless working_dir creation for Java nodes
replacing a non-standard dash character with '-'
fixing test run on Jenkins: setting encoding explicitly to UTF-8
#1017 fixing expected citations
#1017 accepting ExtractedDocumentMetadata instead of DocumentText as PMC citation ingestion input. Aligning the integration test and importer workflow.
#1017 introducing new PMC metadata ingestion, currently extracting references, journal and pages fields. Replacing DOM/XPath-based citations ingestion with a much faster SAX version. Changing the pmidtooaid transformer to use ExtractedDocumentMetadata instead of parsing the XML file. Enabling PMC metadata ingestion in common/import.
#840 moving IdentifierMapping from importer to common package
#840 renaming DeduplicationMapping to more generic IdentifierMapping
#757 introducing article type extraction along with a unit test. Article type will be required for filtering out PMC duplicates and keeping only the proper types
fixing sourceDocumentId, which is now extracted from the input DocumentText record conveying NLM
#757 fixing PMC citation matching test by providing proper input
#757 fixing PMID and DOI matching, fixing sourceDocumentId and destinationDocumentId generation
Commented out a test in a stub of a solution to the task #576: Ingestion of metadata from EuropePMC.
Stub of a solution to the task #576: Ingestion of metadata from EuropePMC.
dir names in parameters should not contain nameNode
rename a field