#1133 dropping useless working_dir creation for java nodes
replacing non-standard dash character with '-'
#1017 fixing expected citations
#1017 accepting ExtractedDocumentMetadata instead of DocumentText at PMC citation ingestion input. Aligning integration test and importer workflow.
#1017 introducing new PMC metadata ingestion, currently extracting references, journal and pages fields. Replacing DOM/XPath based citations ingestion with much faster SAX version. Changing pmidtooaid transformer to utilize ExtractedDocumentMetadata instead of parsing XML file. Enabling PMC metadata ingestion in common/import.
#840 moving IdentifierMapping from importer to common package
#840 renaming DeduplicationMapping to more generic IdentifierMapping
#757 introducing article type extraction along with unit test. Article type will be required for filtering out PMC duplicates and keeping only proper types.
fixing sourceDocumentId which is now extracted from input DocumentText record conveying NLM
#757 fixing pmc citation matching test by providing proper input