/modules/icm-iis-mainworkflows/trunk/src - Changes - D-Net - D-Net project tracking tool

dnet40/modules/icm-iis-mainworkflows/trunk/src @ 60966

#	Date	Author	Comment
39127	09/09/2015 08:53 AM	Marek Horst	renaming test resources to be compliant with windows file system naming requirements: replacing '\|' with '_'
39089	08/09/2015 03:09 PM	Marek Horst	renaming test resources to be compliant with windows file system naming requirements
39057	05/09/2015 09:42 PM	Marek Horst	fixing destination id in expected citation record
39056	05/09/2015 09:22 PM	Marek Horst	updating fundingtree value to xml representation and changing expected fundingclass as outcome
39053	05/09/2015 03:34 PM	Marek Horst	#1498 adding missing propagate configuration element
39050	04/09/2015 11:47 PM	Marek Horst	#1498 adding missing collapsers_basic_collapser in imports.txt file
39043	04/09/2015 11:26 PM	Marek Horst	#1498 introducing major citation-related refactoring: new generic direct citation matching moved to the processing phase, a position field introduced in all citation schemas, and the collapser updated to take position into account when merging citation details coming from 3 sources: fuzzy citationmatching, direct citationmatching, references metadata
38872	31/08/2015 12:23 PM	Marek Horst	updating job.properties
38122	08/07/2015 05:36 PM	Marek Horst	updating job.properties
38032	30/06/2015 06:49 PM	Marek Horst	updating job.properties
38007	29/06/2015 03:14 PM	Marek Horst	#1397 removing obsolete parameters in subworkflow actions definitions
37976	26/06/2015 05:48 PM	Marek Horst	#1209 introducing support for trust level thresholds provided as IIS input parameter
37972	26/06/2015 04:05 PM	Marek Horst	removing obsolete quick run workflows
37947	24/06/2015 12:13 PM	Marek Horst	#1212 updating classification test expected results after fixing typo: dccclasses->ddcclasses in taxonomies.db
37883	19/06/2015 04:35 PM	Marek Horst	updating job.properties
37873	19/06/2015 02:06 PM	Marek Horst	#1381 porting pmc citations ingestion from cascading framework to pig. Moving code from icm-iis-ingest-pmc to icm-iis-transformers including integration tests, removing obsolete scala code along with unneeded dependencies. Switching subworkflow in primary workflow.
37872	19/06/2015 01:48 PM	Marek Horst	updating job.properties
37780	15/06/2015 12:29 PM	Marek Horst	updating expected classes, setting acm classes
37777	15/06/2015 11:14 AM	Marek Horst	updating expected classes
37585	29/05/2015 04:17 PM	Marek Horst	#1339 fixing input_dedup_map in pmc citation ingestion when match_content_with_metadata=false. It should be set statically, not dynamically; it will be enabled only when metadata_import is enabled
37561	29/05/2015 02:27 PM	Marek Horst	#1339 replacing active_existence_filter flag with match_content_with_metadata and changing identifiers matching logic: when the flag is disabled, contents identifiers will be neither filtered nor deduplicated against metadata identifiers. Up until now, when the active_existence_filter flag was disabled, contents were deduplicated, which is not desired when running IIS in standalone mode on contents having their representatives in HBase
37533	28/05/2015 04:16 PM	Marek Horst	#1329 enabling pmc ingestion when active_metadataextraction_export flag is enabled
37470	26/05/2015 10:30 AM	Marek Horst	introducing missing pdb reference extraction parameters
37469	26/05/2015 10:25 AM	Marek Horst	bugfix: renaming obsolete decision-export to decision-export-to-hbase
37464	25/05/2015 10:13 PM	Marek Horst	#1308 reverting uri:oozie:distcp-action:0.2 change: version is not properly recognized by oozie 3.3.2-cdh4.3.1
37432	25/05/2015 01:15 PM	Marek Horst	#1260 enabling document to protein databank reference extraction in primary workflow, supporting 3 new parameters: active_referenceextraction_pdb, export_action_set_id_document_pdb, export_referenceextraction_pdb_url_root
37414	22/05/2015 05:32 PM	Marek Horst	#1315 providing missing confidenceLevel
37408	22/05/2015 02:41 PM	Marek Horst	#1315 providing missing confidenceLevel
37392	22/05/2015 01:13 PM	Marek Horst	#1315 updating expected jsons in integration test after DocumentToConceptIds schema refactoring
37231	14/05/2015 11:56 AM	Marek Horst	#1301 introducing explicit export mode flags: active_export_to_hbase and active_export_to_json. This way both exports can be enabled or both of them can be disabled.
37194	13/05/2015 12:49 PM	Marek Horst	#1308 switching distcp namespace to uri:oozie:distcp-action:0.2
37095	11/05/2015 11:46 AM	Marek Horst	disabling export by setting active_export flag to false. Results will be converted to JSON records
37026	07/05/2015 01:27 PM	Marek Horst	#1301 introducing common/export_to_json and utilizing this subworkflow in both primary and preprocessing workflows executing it when active_export=false which means hbase export is disabled
36989	06/05/2015 04:43 PM	Marek Horst	#118 explicitly defining input_document_websiteusage_similarity parameter. This is not a bug fix because exporter works properly without explicitly defining input port due to propagate-configuration mode but we should have all input port definitions aligned to avoid confusions.
36473	20/04/2015 12:53 PM	Marek Horst	bugfixing existing fault removal which was missing
36469	20/04/2015 10:53 AM	Marek Horst	bugfixing existing fault removal which was missing
36450	17/04/2015 04:38 PM	Marek Horst	upgrading xmlns version to 0.4 in order to support global element
36443	17/04/2015 02:56 PM	Marek Horst	setting false to remove_sideproducts, otherwise whole workingDir will be erased
36286	09/04/2015 07:10 PM	Marek Horst	#1257 dropping schema generation related hacks in all map-reduce modules, switching to literal schema parameters
35985	03/04/2015 01:11 PM	Marek Horst	updating job.properties, disabling all algorithms by default
35967	02/04/2015 10:17 PM	Marek Horst	#1248 fixing transition node name postprocessing-joining to merge-joining
35946	02/04/2015 05:52 PM	Marek Horst	#1248 introducing fault subdirectory support in all workflows wrapping metadataextraction subworkflow up to the processing and primary root workflows. This should prevent fault directory from being removed when ${remove_sideproducts} flag is enabled, it will be propagated along with metadata and plaintext.
35935	02/04/2015 03:59 PM	Marek Horst	updating job.properties
35701	27/03/2015 06:18 AM	Mateusz Kobos	Removing usage of working_dir from Java workflow node.
35232	11/03/2015 02:19 PM	Marek Horst	reenabling document to project reference import validation
35231	11/03/2015 01:55 PM	Marek Horst	updating expected documents list
35230	11/03/2015 01:30 PM	Marek Horst	temporarily skipping docproject validation
35229	11/03/2015 01:14 PM	Marek Horst	#1195 removing obsolete ports docreation and datasetid from hbase mapred import, removing references to those ports in workflow.xml files, updating transformer by removing filtering by datasetid due to decisions made in #1072
35200	09/03/2015 07:07 PM	Marek Horst	fixing json escape character by putting \\ in place of \
35199	09/03/2015 06:45 PM	Marek Horst	extending mapreduce metadata importer test with validating import of different kind of relations and dataset identifier
35198	09/03/2015 06:44 PM	Marek Horst	extending mapreduce metadata importer test with validating import of different kind of relations and dataset identifier
35191	09/03/2015 04:52 PM	Marek Horst	removing obsolete citations
35189	09/03/2015 04:24 PM	Marek Horst	updating confidence level value to 1.0 for record coming from PMC
35187	09/03/2015 04:21 PM	Marek Horst	removing obsolete pdf directory
35183	09/03/2015 03:16 PM	Marek Horst	adding missing "confidenceLevel" field
35153	06/03/2015 06:25 PM	Marek Horst	maintaining pmc citation and testing citations merging process
35152	06/03/2015 05:36 PM	Marek Horst	reintroducing multiple citations after introducing sorting in transformer
35149	06/03/2015 05:25 PM	Marek Horst	limiting citations count to 1 until the results order produced by the citation matching module is repeatable
35146	06/03/2015 04:13 PM	Marek Horst	including: FET project reference extraction, EGI case, dataset reference extraction outcome validation
35144	06/03/2015 03:06 PM	Marek Horst	enabling citation matching algorithm
35143	06/03/2015 03:06 PM	Marek Horst	updating expected citations
35122	05/03/2015 06:47 PM	Marek Horst	removing comment
35120	05/03/2015 06:46 PM	Marek Horst	primary processing integration test major refactoring: dropping cermine execution and providing plaintext and extracted metadata as json records
35057	04/03/2015 05:30 PM	Marek Horst	#1176 defining remove_sideproducts property in workflows headers
35048	04/03/2015 04:44 PM	Marek Horst	#1176 introducing side products removal in common import by maintaining remove_sideproducts flag set to true by default. Notice: do not provide any output directory location pointing to workingDir subdirectory!
35042	04/03/2015 03:01 PM	Marek Horst	removing duplicate collapser import and aligning workflow definition
35031	04/03/2015 01:08 PM	Marek Horst	#1172 introducing support for active_export parameter in both preprocessing and primary workflows
35030	04/03/2015 12:16 PM	Marek Horst	updating job.properties
34958	02/03/2015 05:21 PM	Marek Horst	#1153 utilizing ${user.name} placeholder in ${workingDir} generation process, copying version.properties from oozie_app to mark execution environment with application version
34914	27/02/2015 07:34 PM	Marek Horst	#1147 introducing HTML import and HTML plaintext ingestion in main workflows: primary and preprocessing
34893	27/02/2015 05:32 PM	Marek Horst	updating job.properties
34876	27/02/2015 04:08 PM	Marek Horst	overriding memory parameter due to test cluster memory limitations, setting it to Xmx256m
34875	27/02/2015 03:24 PM	Marek Horst	overriding memory parameter due to test cluster memory limitations, setting it to Xmx128m
34871	27/02/2015 02:49 PM	Marek Horst	overriding memory parameter due to test cluster memory limitations, setting it to Xmx512m
34869	27/02/2015 02:48 PM	Marek Horst	updating expected classes in integration test after recent #720 change and fixing confidence level distribution
34804	25/02/2015 07:19 PM	Marek Horst	overriding memory parameter due to test cluster memory limitations
34702	20/02/2015 07:17 PM	Marek Horst	#1133 dropping useless working_dir creation for java nodes
34574	18/02/2015 03:49 PM	Marek Horst	#118 fixing typos
34572	18/02/2015 03:32 PM	Marek Horst	updating job.properties
34563	18/02/2015 01:26 PM	Marek Horst	#118 introducing website usage analysis as integral part of primary workflow
34535	16/02/2015 06:52 PM	Marek Horst	updating job.properties
34533	16/02/2015 06:35 PM	Marek Horst	#118 propagating configuration in main workflow.xml
34530	16/02/2015 05:57 PM	Marek Horst	#118 updating job.properties
34519	13/02/2015 07:00 PM	Marek Horst	comments added
34516	13/02/2015 05:55 PM	Marek Horst	#118 introducing mainworkflows_websiteusage_document_main workflow binding all subworkflows required to process logs and generate document similarities
34434	11/02/2015 02:26 PM	Marek Horst	#1083 enabling webcrawl ingester module extracting FX field from plaintext before executing project reference extraction
34433	11/02/2015 02:26 PM	Marek Horst	updating default job properties
34213	02/02/2015 06:22 PM	Marek Horst	#1070 updating import_project_concepts_context_ids_csv default value to "fet-fp7,fet-h2020"
34212	02/02/2015 06:21 PM	Marek Horst	#1070 introducing support for multiple context identifiers, replacing import_project_concepts_context_id IIS input parameter with import_project_concepts_context_ids_csv
34207	02/02/2015 05:39 PM	Marek Horst	updating job.properties
34008	20/01/2015 04:34 PM	Marek Horst	#1072 transition fix: replacing forking_skip_imported_data with export
34007	20/01/2015 03:56 PM	Marek Horst	#1072 dropping IIS feature filtering out already existing project and dataset references from IIS export
33398	15/12/2014 12:25 PM	Marek Horst	updating job.properties
33355	11/12/2014 08:36 PM	Marek Horst	updating job.properties
33249	09/12/2014 06:41 PM	Marek Horst	#919 renaming DocumentToResearchInitiative to DocumentToConceptId and DocumentToResearchInitiatives to DocumentToConceptIds
33228	09/12/2014 11:02 AM	Marek Horst	#1022 introducing PMC extracted document metadata collapser removing duplicates before sending output to PMC citation ingestion module
33218	05/12/2014 04:26 PM	Marek Horst	#919 adding missing i/o ports related to FET projects reference extraction
33184	04/12/2014 04:09 PM	Marek Horst	#919 enabling concepts matching for FET projects in mainworkflows: import, export, primary and preprocessing
33105	28/11/2014 06:13 PM	Marek Horst	#1017 accepting ExtractedDocumentMetadata instead of DocumentText at PMC citation ingestion input. Aligning integration test and importer workflow.
33098	28/11/2014 04:27 PM	Marek Horst	#1022 introducing extracted document metadata collapser at importing phase. Propagating extracted document metadata (including PMC ingested metadata) to the processing part of the workflow, which can be exploited by the citation matching module. Introducing citations collapser in the last stage of the processing phase, collapsing ingested citations with matched citations.