updating job.properties
#1017 accepting ExtractedDocumentMetadata instead of DocumentText as PMC citation ingestion input. Aligning integration test and importer workflow.
#1017 introducing new PMC metadata ingestion, currently extracting references, journal and pages fields. Replacing DOM/XPath-based citation ingestion with a much faster SAX version. Changing the pmidtooaid transformer to use ExtractedDocumentMetadata instead of parsing the XML file. Enabling PMC metadata ingestion in common/import.
#840 moving IdentifierMapping from importer to common package
#840 renaming DeduplicationMapping to more generic IdentifierMapping
#757 adding a reduce phase for filtering out pmids by article type: the map phase groups PmidMapping objects by pmid and the reduce phase filters out duplicates
#757 fixing pmid and doi matching, fixing sourceDocumentId and destinationDocumentId generation