#1498 introducing major citations related refactoring including new generic direct citation matching moved to processing phase, introduced position field in all citations schemas and updated collapser taking position into account when merging citations details coming from 3 variuos sources: fuzzy citationmatching, direct citationmatching, references metadata
#1381 porting pmc citations ingestion from cascading framework to pig. Moving code from icm-iis-ingest-pmc to icm-iis-transformers including itegration tests, removing obsolete scala code along with unneded dependencies. Switching subworkflow in primary workflow.
updating job.properties
#1257 dropping schema generation related hacks in all map-reduce modules, switching to literal schema parameters
#1017 accepting ExtractedDocumentMetadata instead of DocumentText at PMC citation ingestion input. Aliging integration test and importer workflow.
#1017 introducing new PMC metadata ingestion currently extracing references, journal and pages fields.Replacing DOM/XPath based citations ingestion with much faster SAX version. Changing pmidtooaid transformer utilizing ExtractedDocumentMetadata instead of parsing XML file. Enabling PMC metadata ingestion in common/import.
#840 moving IdentifierMapping from importer to common package
#840 renaming DeduplicationMapping to more generic IdentifierMapping
#757 adding reducing phase for filtering out pmids by article type, mapping phase groups PmidMapping objects by pmid and at reducer phase duplicates will be filtered out
#757 fixing pmid and doi matching, fixing sourceDocumentId and destinationDocumentId generation
updating default job properties
renaming workflow to ingest_pmc_plaintext
dir names in parameters should not contain nameNode