Name Size Revision Age Author Comment
  eu 37875 about 9 years Marek Horst #1381 porting pmc citations ingestion from casc...

Latest revisions

# Date Author Comment
37875 19/06/2015 02:10 PM Marek Horst

#1381 porting pmc citations ingestion from cascading framework to pig. Moving code from icm-iis-ingest-pmc to icm-iis-transformers including integration tests, removing obsolete scala code along with unneeded dependencies. Switching subworkflow in primary workflow.

37813 16/06/2015 02:05 PM Marek Horst

#1370 making pmc ingestion integration tests run on dedicated test cluster instead of embedded mini-oozie container

37344 20/05/2015 06:49 PM Marek Horst

#1329 adding affiliations field in ExtractedDocumentMetadata PMC schema. Metadata extraction code refactoring by extracting code responsible for building Affiliation avro records to AffiliationBuilder class and sharing it with pmc ingestion. Implementing affiliations ingestion functionality in PmcXmlHandler covered with unit tests. Adding affiliations field support in ingest pmc metadata transformer.

33133 02/12/2014 02:54 PM Marek Horst

replacing non-standard dash character with '-'

33131 02/12/2014 12:48 PM Marek Horst

replacing non-standard dash character with '-'

33130 02/12/2014 10:42 AM Marek Horst

fixing test run on jenkins: setting encoding explicitly to utf8

32942 21/11/2014 05:50 PM Marek Horst

#1017 introducing new PMC metadata ingestion, currently extracting references, journal and pages fields.
Replacing DOM/XPath based citations ingestion with much faster SAX version. Changing pmidtooaid transformer to utilize ExtractedDocumentMetadata instead of parsing XML file. Enabling PMC metadata ingestion in common/import.

31116 06/10/2014 01:18 PM Marek Horst

#757 introducing article type extraction along with unit test. Article type will be required for filtering out pmc duplicates and keeping only proper types

30986 01/10/2014 06:37 PM Marek Horst

#757 fixing pmid and doi matching, fixing sourceDocumentId and destinationDocumentId generation

30804 22/09/2014 08:25 AM Michal Oniszczuk

Commented out test in a stub of a solution to the task #576: Ingestion of metadata from EuropePMC.
