renaming test resources to be compliant with Windows file system naming requirements
#1381 porting pmc citations ingestion from cascading framework to pig. Moving code from icm-iis-ingest-pmc to icm-iis-transformers including integration tests, removing obsolete scala code along with unneeded dependencies. Switching subworkflow in primary workflow.
#1370 making pmc ingestion integration tests run on dedicated test cluster instead of embedded mini-oozie container
#1329 adding affiliations field in ExtractedDocumentMetadata PMC schema. Refactoring metadata extraction code by moving the code responsible for building Affiliation avro records into the AffiliationBuilder class and sharing it with pmc ingestion. Implementing affiliations ingestion functionality in PmcXmlHandler, covered with unit tests. Adding affiliations field support in ingest pmc metadata transformer.
replacing non-standard dash character with '-'
fixing test run on jenkins: setting encoding explicitly to UTF-8
#1017 introducing new PMC metadata ingestion, currently extracting references, journal and pages fields. Replacing DOM/XPath based citations ingestion with much faster SAX version. Changing pmidtooaid transformer to utilize ExtractedDocumentMetadata instead of parsing XML file. Enabling PMC metadata ingestion in common/import.
#757 introducing article type extraction along with unit test. Article type will be required for filtering out pmc duplicates and keeping only the proper types.
#757 fixing pmid and doi matching, fixing sourceDocumentId and destinationDocumentId generation