  • svn:mime-type: text/plain

Revision Date Author Comment
39164 10/09/2015 06:19 PM Marek Horst

merging trunk changes with IIS-CDH-5.3.0 branch

37892 19/06/2015 06:20 PM Marek Horst

merging trunk changes with IIS-CDH-5.3.0 branch

37134 11/05/2015 05:28 PM Marek Horst

merging trunk changes with IIS-CDH-5.3.0 branch

35244 11/03/2015 04:43 PM Marek Horst

creating IIS-CDH-5.3.0 branch

35229 11/03/2015 01:14 PM Marek Horst

#1195 removing obsolete ports docreation and datasetid from hbase mapred import, removing references to those ports in workflow.xml files, updating transformer by removing filtering by datasetid due to decisions made in #1072

35048 04/03/2015 04:44 PM Marek Horst

#1176 introducing side products removal in common import, controlled by the remove_sideproducts flag, which is set to true by default.
Note: do not provide any output directory location pointing to a workingDir subdirectory!

34914 27/02/2015 07:34 PM Marek Horst

#1147 introducing HTML import and HTML plaintext ingestion in main workflows: primary and preprocessing

34702 20/02/2015 07:17 PM Marek Horst

#1133 dropping useless working_dir creation for java nodes

34212 02/02/2015 06:21 PM Marek Horst

#1070 introducing support for multiple context identifiers, replacing import_project_concepts_context_id IIS input parameter with import_project_concepts_context_ids_csv

33228 09/12/2014 11:02 AM Marek Horst

#1022 introducing PMC extracted document metadata collapser removing duplicates before sending output to PMC citation ingestion module

33184 04/12/2014 04:09 PM Marek Horst

#919 enabling concepts matching for FET projects in mainworkflows: import, export, primary and preprocessing

33105 28/11/2014 06:13 PM Marek Horst

#1017 accepting ExtractedDocumentMetadata instead of DocumentText at PMC citation ingestion input. Aligning integration test and importer workflow.

33098 28/11/2014 04:27 PM Marek Horst

#1022 introducing extracted document metadata collapser at importing phase.
Propagating extracted document metadata (including PMC ingested metadata) to the processing part of the workflow, where it can be exploited by the citation matching module.
Introducing citations collapser in the last stage of the processing phase, collapsing ingested citations with matched citations.

32943 21/11/2014 05:50 PM Marek Horst

#1017 introducing new PMC metadata ingestion, currently extracting references, journal and pages fields.
Replacing DOM/XPath based citations ingestion with much faster SAX version. Changing pmidtooaid transformer to utilize ExtractedDocumentMetadata instead of parsing XML file. Enabling PMC metadata ingestion in common/import.

32829 17/11/2014 03:45 PM Marek Horst

#963 propagating dataset -> mdstore from import to exporting phase: importer produces DocumentToMDStore datastore utilized by exporter module. Updating transformer definition to handle DocumentToMDStore instead of Identifier schema

32042 31/10/2014 02:45 PM Marek Horst

introducing support for active_existence_filter, set to true by default. Setting this parameter to false allows processing contents that have no counterpart among the metadata records retrieved from HBase. This was required, e.g., to process ubiquity contents which were not present in the HBase dump metadata.

31759 27/10/2014 06:20 PM Marek Horst

renaming metadataextraction_excluded_ids to more appropriate metadataextraction_excluded_checksums

31758 27/10/2014 06:11 PM Marek Horst

#913 introducing support for max file size parameter, currently checked against Content-Length header

31498 20/10/2014 06:03 PM Marek Horst

#757 hooking up ingest_pmc_idmapping_pmidtooaid subworkflow with mainworkflows/common/import. From now on citations are matched by pmid as well.

31267 10/10/2014 03:37 PM Marek Horst

introducing merge_body_with_updates flag support in common/import, setting to true in statistics workflow

31250 09/10/2014 03:33 PM Marek Horst

introducing regex support in result approver to support iis::* kind of provenance, updating workflow definitions with proper regex values

31228 08/10/2014 06:19 PM Marek Horst

#840 moving IdentifierMapping from importer to common package

31222 08/10/2014 06:12 PM Marek Horst

#840 renaming DeduplicationMapping to more generic IdentifierMapping

31216 08/10/2014 05:56 PM Marek Horst

#757 aligning common importer with current API of PMC citations ingestion

29826 22/08/2014 02:27 PM Marek Horst

introducing trust_level_threshold support in common import workflow

29817 22/08/2014 11:30 AM Marek Horst

allowing overriding of the inference_provenance_blacklist default 'iis' value, which will be required in mainworkflows/statistics where inferred document to project relations should be taken into account

29616 28/07/2014 03:14 PM Marek Horst

fixing output port names: removing default values for citation_pmc and dataset, setting proper output_citation_pmc in both preprocessing and primary workflows

29167 16/07/2014 12:14 PM Marek Horst

skipping PMC citations ingestion when citationmatching algorithm is not enabled

28987 10/07/2014 03:37 PM Marek Horst

integrating PMC citations ingestion with primary workflow, adjusting port names, deduplicating dependencies