  • svn:mime-type: text/plain

# Date Author Comment
37464 25/05/2015 10:13 PM Marek Horst

#1308 reverting uri:oozie:distcp-action:0.2 change: version is not properly recognized by oozie 3.3.2-cdh4.3.1

37432 25/05/2015 01:15 PM Marek Horst

#1260 enabling document to protein databank reference extraction in primary workflow, supporting 3 new parameters: active_referenceextraction_pdb, export_action_set_id_document_pdb, export_referenceextraction_pdb_url_root

37231 14/05/2015 11:56 AM Marek Horst

#1301 introducing explicit export mode flags: active_export_to_hbase and active_export_to_json. This way both exports can be enabled or both of them can be disabled.
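
The effect of the two flags described in this entry can be illustrated with a minimal Python sketch. It is not taken from the workflow definitions: the function name select_exports and the target labels "hbase" and "json" are hypothetical, and only the flag names active_export_to_hbase and active_export_to_json come from the commit message. The point is that each flag is evaluated independently, so any combination is possible.

```python
from typing import List


def select_exports(active_export_to_hbase: bool,
                   active_export_to_json: bool) -> List[str]:
    """Return the export targets enabled by the two independent flags.

    Hypothetical helper: the real decision is made inside the workflow
    definitions, which are not shown in this log.
    """
    targets = []
    if active_export_to_hbase:
        targets.append("hbase")
    if active_export_to_json:
        targets.append("json")
    return targets


# Both exports enabled, then both disabled.
print(select_exports(True, True))
print(select_exports(False, False))
```

With a single active_export switch, as in the earlier #1301 entry, exactly one of the two exports would always run; two separate flags remove that coupling.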

37194 13/05/2015 12:49 PM Marek Horst

#1308 switching distcp namespace to uri:oozie:distcp-action:0.2

37026 07/05/2015 01:27 PM Marek Horst

#1301 introducing common/export_to_json and utilizing this subworkflow in both primary and preprocessing workflows, executing it when active_export=false, which means hbase export is disabled

35946 02/04/2015 05:52 PM Marek Horst

#1248 introducing fault subdirectory support in all workflows wrapping metadataextraction subworkflow, up to the processing and primary root workflows. This should prevent the fault directory from being removed when the ${remove_sideproducts} flag is enabled: it will be propagated along with metadata and plaintext.

35701 27/03/2015 06:18 AM Mateusz Kobos

Removing usage of working_dir from Java workflow node.

35229 11/03/2015 01:14 PM Marek Horst

#1195 removing obsolete ports docreation and datasetid from hbase mapred import, removing references to those ports in workflow.xml files, updating transformer by removing filtering by datasetid due to decisions made in #1072

35057 04/03/2015 05:30 PM Marek Horst

#1176 defining remove_sideproducts property in workflows headers

35031 04/03/2015 01:08 PM Marek Horst

#1172 introducing support for active_export parameter in both preprocessing and primary workflows

34958 02/03/2015 05:21 PM Marek Horst

#1153 utilizing ${user.name} placeholder in ${workingDir} generation process, copying version.properties from oozie_app to mark execution environment with application version

34914 27/02/2015 07:34 PM Marek Horst

#1147 introducing HTML import and HTML plaintext ingestion in main workflows: primary and preprocessing

34702 20/02/2015 07:17 PM Marek Horst

#1133 dropping useless working_dir creation for java nodes

34574 18/02/2015 03:49 PM Marek Horst

#118 fixing typos

34563 18/02/2015 01:26 PM Marek Horst

#118 introducing website usage analysis as integral part of primary workflow

34213 02/02/2015 06:22 PM Marek Horst

#1070 updating import_project_concepts_context_ids_csv default value to "fet-fp7,fet-h2020"

34212 02/02/2015 06:21 PM Marek Horst

#1070 introducing support for multiple context identifiers, replacing import_project_concepts_context_id IIS input parameter with import_project_concepts_context_ids_csv

34008 20/01/2015 04:34 PM Marek Horst

#1072 transition fix: replacing forking_skip_imported_data with export

34007 20/01/2015 03:56 PM Marek Horst

#1072 dropping IIS feature filtering out already existing project and dataset references from IIS export

33184 04/12/2014 04:09 PM Marek Horst

#919 enabling concepts matching for FET projects in main workflows: import, export, primary and preprocessing

33098 28/11/2014 04:27 PM Marek Horst

#1022 introducing extracted document metadata collapser at importing phase.
Propagating extracted document metadata (including PMC ingested metadata) to the processing part of the workflow, which can be exploited by the citation matching module.
Introducing citations collapser in the last stage of the processing phase, collapsing ingested citations with matched citations.

32829 17/11/2014 03:45 PM Marek Horst

#963 propagating dataset -> mdstore from import to exporting phase: importer produces DocumentToMDStore datastore utilized by exporter module. Updating transformer definition to handle DocumentToMDStore instead of Identifier schema

32042 31/10/2014 02:45 PM Marek Horst

introducing support for active_existence_filter, set to true by default. Setting this parameter to false allows processing contents that have no counterpart among the metadata records retrieved from HBase. This solution was required e.g. to process ubiquity contents which were not present in the HBase dump metadata.
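
A rough Python sketch of the behaviour this entry describes, under stated assumptions: the function filter_contents and the plain identifier lists are hypothetical stand-ins for the actual workflow data, and only the parameter name active_existence_filter and its default of true come from the commit message.

```python
from typing import Iterable, List


def filter_contents(content_ids: Iterable[str],
                    metadata_ids: Iterable[str],
                    active_existence_filter: bool = True) -> List[str]:
    """Select the contents to process.

    With the filter active (the default), only contents that have a
    matching metadata record are kept. With the filter disabled, all
    contents are passed through, including those without metadata.
    """
    if not active_existence_filter:
        return list(content_ids)
    known = set(metadata_ids)
    return [cid for cid in content_ids if cid in known]


contents = ["doc-1", "doc-2", "doc-3"]
metadata = ["doc-1", "doc-3"]
print(filter_contents(contents, metadata))
print(filter_contents(contents, metadata, active_existence_filter=False))
```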

31759 27/10/2014 06:20 PM Marek Horst

renaming metadataextraction_excluded_ids to more appropriate metadataextraction_excluded_checksums

31758 27/10/2014 06:11 PM Marek Horst

#913 introducing support for max file size parameter, currently checked against Content-Length header

31410 16/10/2014 05:48 PM Marek Horst

input port name fix: input_citation->input_citations

31154 06/10/2014 03:47 PM Marek Horst

#637 renaming document_extractedMetadata algorithm to more descriptive document_affiliations, propagating changes to action set identifier properties names

31033 02/10/2014 02:15 PM Marek Horst

reenabling PMC ingestion when citationmatching flag is set

30006 04/09/2014 01:10 PM Marek Horst

setting export_action_set_id_entity_dataset to $UNDEFINED$ by default; the value should not be required because the dataset reference extraction module might be deactivated. The check will be performed in the dataset entity exporter module, and an exception will be raised when the value is not set.

29982 03/09/2014 05:53 PM Marek Horst

#757 temporarily disabling PMC ingestion until the openaire identifiers building process is fixed

29967 03/09/2014 11:04 AM Marek Horst

#568, #577 enabling proper citations export by introducing PMC citation ingestion and citation matching outcome merging and grouping for exporting purposes. Introducing union in place of the collapser, which should be introduced in the near future.

29893 28/08/2014 01:38 PM Marek Horst

removing output_citation_pmc port duplicate

29731 31/07/2014 12:28 PM Marek Horst

#9059 reverting #717 change: shortening app_path for primary workflow due to the fix applied by Paweł on WF_JOBS MODIFY mysql table: changing varchar(255) to mediumtext.

29626 28/07/2014 05:01 PM Marek Horst

#717 shortening app_path for primary workflow

29616 28/07/2014 03:14 PM Marek Horst

fixing output port names: removing default values for citation_pmc and dataset, setting proper output_citation_pmc in both preprocessing and primary workflows

29167 16/07/2014 12:14 PM Marek Horst

skipping PMC citations ingestion when citationmatching algorithm is not enabled

29098 14/07/2014 04:02 PM Marek Horst

shortening transformer_export_documentto* action names to be less than 50 characters

29090 14/07/2014 02:37 PM Marek Horst

#354 hooking up primary/main workflow with documenttodataset and documenttoproject transformers skipping export of already existing relations in HBase

29016 11/07/2014 10:26 AM Marek Horst

#486 introducing last piece missing: text collapser in front of referenceextraction_researchinitiatives joining text contents coming from already existing document_text input port and newly introduced document_text_wos input port providing WoS contents

28987 10/07/2014 03:37 PM Marek Horst

integrating pmc citations ingestion with primary workflow, adjusting port names, deduplicating dependencies

28951 08/07/2014 04:55 PM Marek Horst

skipping exporting citation matching outcome