# Date Author Comment
39163 10/09/2015 06:13 PM Marek Horst

merging trunk changes with IIS-CDH-5.3.0 branch

39086 08/09/2015 01:43 PM Marek Horst

renaming test resources to be compliant with Windows file system naming requirements

39045 04/09/2015 11:26 PM Marek Horst

#1498 introducing major citations related refactoring including new generic direct citation matching moved to processing phase, introduced position field in all citations schemas and updated collapser taking position into account when merging citations details coming from 3 various sources: fuzzy citationmatching, direct citationmatching, references metadata

38183 13/07/2015 01:37 PM Marek Horst

#1435 making PMC XML parser less strict in terms of expected input elements or attributes: article-type is set to 'unknown' value when attribute not defined in XML main element

38172 13/07/2015 11:30 AM Marek Horst

#1431 fixing PMC XML records parser disallowing null reference type, reference value will be simply omitted

37914 22/06/2015 03:13 PM Marek Horst

#1383 replacing explicitly defined test cluster properties with init-test-cluster-config maven profile usage

37881 19/06/2015 04:10 PM Marek Horst

merging trunk changes with IIS-CDH-5.3.0 branch

37876 19/06/2015 02:45 PM Marek Horst

#1381 reintroducing yadda repository required by cermine

37875 19/06/2015 02:10 PM Marek Horst

#1381 porting pmc citations ingestion from cascading framework to pig. Moving code from icm-iis-ingest-pmc to icm-iis-transformers including integration tests, removing obsolete scala code along with unneeded dependencies. Switching subworkflow in primary workflow.

37830 17/06/2015 09:57 PM Marek Horst

updating job.properties

37813 16/06/2015 02:05 PM Marek Horst

#1370 making pmc ingestion integration tests run on dedicated test cluster instead of embedded mini-oozie container

37356 21/05/2015 12:35 PM Marek Horst

#1329 setting affiliation string as raw text if parser produced empty Element object

37349 20/05/2015 07:00 PM Marek Horst

#1330 icm-iis-metadataextraction and icm-iis-ingest-pmc modules cermine dependency upgraded to recently released 1.6 version

37344 20/05/2015 06:49 PM Marek Horst

#1329 adding affiliations field in ExtractedDocumentMetadata PMC schema. Metadata extraction code refactoring by extracting code responsible for building Affiliation avro records to AffiliationBuilder class and sharing it with pmc ingestion. Implementing affiliations ingestion functionality in PmcXmlHandler covered with unit tests. Adding affiliations field support in ingest pmc metadata transformer.

37186 12/05/2015 07:14 PM Marek Horst

excluding hadoop-client cdh4 artifact possibly incompatible with cdh5 environment

37117 11/05/2015 02:58 PM Marek Horst

merging trunk changes with IIS-CDH-5.3.0 branch

36339 13/04/2015 01:34 PM Marek Horst

#1257 raising oozie.action.max.output.data to 8192

36291 09/04/2015 07:10 PM Marek Horst

#1257 dropping schema generation related hacks in all map-reduce modules, switching to literal schema parameters

35709 27/03/2015 09:44 AM Marek Horst

#1135 switching icm-iis-parent-container version to 1.0.1-SNAPSHOT in order to include workingDir related changes made in icm-iis-core

35701 27/03/2015 06:18 AM Mateusz Kobos

Removing usage of working_dir from Java workflow node.

35416 17/03/2015 03:04 PM Marek Horst

#1198 aligning IIS dependencies and java code to CDH5.3.0 cluster

35402 17/03/2015 03:01 PM Marek Horst

#1197 introducing job.properties changes aligning paths to rumcajs cluster HDFS structure

35259 11/03/2015 04:53 PM Marek Horst

creating IIS-CDH-5.3.0 branch

35258 11/03/2015 04:52 PM Marek Horst

introducing branches folder

34945 02/03/2015 01:18 PM Marek Horst

updating job.properties

34693 20/02/2015 07:16 PM Marek Horst

#1133 dropping useless working_dir creation for java nodes

34615 19/02/2015 06:12 PM Marek Horst

#1038 introducing ranges in dependencies definition for all IIS modules

33593 16/12/2014 05:15 PM Marek Horst

[maven-release-plugin] prepare for next development iteration

33592 16/12/2014 05:14 PM Marek Horst

[maven-release-plugin] copy for tag icm-iis-ingest-pmc-1.0.0

33591 16/12/2014 05:14 PM Marek Horst

[maven-release-plugin] prepare release icm-iis-ingest-pmc-1.0.0

33590 16/12/2014 05:09 PM Marek Horst

#1044 pre-release switching to released version of parent pom and released dependencies

33413 15/12/2014 12:45 PM Marek Horst

introducing scm definition

33370 12/12/2014 04:19 PM Marek Horst

#1038 changing ceon-scala-commons 0.0.2-SNAPSHOT dependency to released 0.0.2

33367 12/12/2014 03:32 PM Marek Horst

#1038 dependency cleanup: removing obsolete dnet-openaireplus-mapping-utils dependency

33133 02/12/2014 02:54 PM Marek Horst

replacing non-standard dash character with '-'

33131 02/12/2014 12:48 PM Marek Horst

replacing non-standard dash character with '-'

33130 02/12/2014 10:42 AM Marek Horst

fixing test run on jenkins: setting encoding explicitly to utf8

33125 01/12/2014 09:06 PM Marek Horst

#1017 fixing expected citations

33123 01/12/2014 07:40 PM Marek Horst

#1017 fixing PMC and DOI identifiers retrieval from avro map: addressing by Utf8 objects not by String

33104 28/11/2014 06:13 PM Marek Horst

#1017 accepting ExtractedDocumentMetadata instead of DocumentText at PMC citation ingestion input. Aligning integration test and importer workflow.

32942 21/11/2014 05:50 PM Marek Horst

#1017 introducing new PMC metadata ingestion currently extracting references, journal and pages fields.
Replacing DOM/XPath based citations ingestion with much faster SAX version. Changing pmidtooaid transformer utilizing ExtractedDocumentMetadata instead of parsing XML file. Enabling PMC metadata ingestion in common/import.

32324 07/11/2014 02:57 PM Marek Horst

#955 fixing reference raw text generation for pretty printed NLM documents

32242 05/11/2014 05:32 PM Marek Horst

introducing embedded integration test entry

31234 08/10/2014 07:45 PM Marek Horst

#840 renaming DeduplicationMapping to more generic IdentifierMapping

31225 08/10/2014 06:19 PM Marek Horst

#840 moving IdentifierMapping from importer to common package

31218 08/10/2014 06:12 PM Marek Horst

#840 renaming DeduplicationMapping to more generic IdentifierMapping

31117 06/10/2014 01:20 PM Marek Horst

#757 adding reducing phase for filtering out pmids by article type, mapping phase groups PmidMapping objects by pmid and at reducer phase duplicates will be filtered out

31116 06/10/2014 01:18 PM Marek Horst

#757 introducing article type extraction along with unit test. Article type will be required for filtering out pmc duplicates and leaving only proper types

31035 02/10/2014 02:29 PM Marek Horst

introducing cloudera repository in parent container, removing repository definitions from individual IIS modules

31031 02/10/2014 01:44 PM Marek Horst

fixing sourceDocumentId which is now extracted from input DocumentText record conveying NLM

31023 02/10/2014 01:08 PM Marek Horst

#757 fixing pmc citation matching test by providing proper input

31022 02/10/2014 01:08 PM Marek Horst

#757 fixing pmc citation matching test by providing proper input

30987 01/10/2014 06:38 PM Marek Horst

#757 fixing pmid and doi matching, fixing sourceDocumentId and destinationDocumentId generation

30986 01/10/2014 06:37 PM Marek Horst

#757 fixing pmid and doi matching, fixing sourceDocumentId and destinationDocumentId generation

30804 22/09/2014 08:25 AM Michal Oniszczuk

Commented out test in a stub of a solution to the task #576: Ingestion of metadata from EuropePMC.

30802 20/09/2014 02:19 PM Michal Oniszczuk

Stub of a solution to the task #576: Ingestion of metadata from EuropePMC.

30801 20/09/2014 02:18 PM Michal Oniszczuk

Refactored code to use the XPathEvaluator.fromString method.

30418 17/09/2014 11:06 AM Sandro La Bruzzo

created tag folder for release

30145 12/09/2014 03:16 PM Marek Horst

updating default job properties

29631 28/07/2014 09:45 PM Marek Horst

renaming workflow to ingest_pmc_plaintext

29390 21/07/2014 11:54 AM Mateusz Kobos

Excluding conflicting dependency

29097 14/07/2014 03:58 PM Marek Horst

replacing "result" string with Type.result.name()

28990 10/07/2014 04:15 PM Marek Horst

updating job.properties

28973 09/07/2014 05:55 PM mateusz.fedoryszak

dir names in parameters should not contain nameNode

28931 07/07/2014 05:52 PM mateusz.fedoryszak

rename a field

28768 01/07/2014 05:04 PM Marek Horst

introducing deploy.info file for module icm-iis-ingest-pmc
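
Revision 38183 (#1435) above describes the parser falling back to the 'unknown' article type when the article-type attribute is missing from the XML main element. The following is a minimal sketch of that fallback only, not the module's actual code: the module itself is Java and, per revision 32942, SAX based, while this illustration uses Python's standard xml.etree.ElementTree. The function name extract_article_type and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Fallback value named in the log entry for revision 38183.
DEFAULT_ARTICLE_TYPE = "unknown"


def extract_article_type(xml_text: str) -> str:
    """Return the article-type attribute of the root element.

    Hypothetical helper: falls back to 'unknown' instead of failing
    when the attribute is not defined on the main element.
    """
    root = ET.fromstring(xml_text)
    return root.get("article-type", DEFAULT_ARTICLE_TYPE)


# Hypothetical, heavily simplified sample records.
with_type = '<article article-type="research-article"><front/></article>'
without_type = "<article><front/></article>"

print(extract_article_type(with_type))
print(extract_article_type(without_type))
```

A lenient default like this keeps a record in the pipeline when its input is incomplete, which matches the stated goal of making the parser less strict.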