# Date Author Comment
39045 04/09/2015 11:26 PM Marek Horst

#1498 introducing major citation-related refactoring: new generic direct citation matching moved to the processing phase, position field introduced in all citation schemas, and collapser updated to take position into account when merging citation details coming from three sources: fuzzy citationmatching, direct citationmatching, references metadata

37875 19/06/2015 02:10 PM Marek Horst

#1381 porting pmc citations ingestion from cascading framework to pig. Moving code from icm-iis-ingest-pmc to icm-iis-transformers including integration tests, removing obsolete scala code along with unneeded dependencies. Switching subworkflow in primary workflow.

37830 17/06/2015 09:57 PM Marek Horst

updating job.properties

36291 09/04/2015 07:10 PM Marek Horst

#1257 dropping schema generation related hacks in all map-reduce modules, switching to literal schema parameters

34945 02/03/2015 01:18 PM Marek Horst

updating job.properties

33104 28/11/2014 06:13 PM Marek Horst

#1017 accepting ExtractedDocumentMetadata instead of DocumentText at PMC citation ingestion input. Aligning integration test and importer workflow.

32942 21/11/2014 05:50 PM Marek Horst

#1017 introducing new PMC metadata ingestion, currently extracting references, journal and pages fields.
Replacing DOM/XPath based citations ingestion with much faster SAX version. Changing pmidtooaid transformer to utilize ExtractedDocumentMetadata instead of parsing XML file. Enabling PMC metadata ingestion in common/import.

31225 08/10/2014 06:19 PM Marek Horst

#840 moving IdentifierMapping from importer to common package

31218 08/10/2014 06:12 PM Marek Horst

#840 renaming DeduplicationMapping to more generic IdentifierMapping

31117 06/10/2014 01:20 PM Marek Horst

#757 adding reduce phase for filtering out pmids by article type: the mapping phase groups PmidMapping objects by pmid, and duplicates are filtered out in the reducer phase

30987 01/10/2014 06:38 PM Marek Horst

#757 fixing pmid and doi matching, fixing sourceDocumentId and destinationDocumentId generation

30986 01/10/2014 06:37 PM Marek Horst

#757 fixing pmid and doi matching, fixing sourceDocumentId and destinationDocumentId generation

30145 12/09/2014 03:16 PM Marek Horst

updating default job properties

29631 28/07/2014 09:45 PM Marek Horst

renaming workflow to ingest_pmc_plaintext

28990 10/07/2014 04:15 PM Marek Horst

updating job.properties

28973 09/07/2014 05:55 PM mateusz.fedoryszak

dir names in parameters should not contain nameNode