/modules/icm-iis-mainworkflows/trunk/src - Changes - D-Net - D-Net project tracking tool

dnet40/modules/icm-iis-mainworkflows/trunk/src @ 31999

#	Date	Author	Comment
31846	28/10/2014 03:45 PM	Marek Horst	fixing citations schema type
31835	28/10/2014 02:24 PM	Marek Horst	updating job.properties
31759	27/10/2014 06:20 PM	Marek Horst	renaming metadataextraction_excluded_ids to more appropriate metadataextraction_excluded_checksums
31758	27/10/2014 06:11 PM	Marek Horst	#913 introducing support for max file size parameter, currently checked against Content-Length header
31667	23/10/2014 04:13 PM	Marek Horst	updating job.properties
31647	22/10/2014 06:31 PM	Marek Horst	enabling document classification and research initiatives reference extraction algorithms
31498	20/10/2014 06:03 PM	Marek Horst	#757 hooking up ingest_pmc_idmapping_pmidtooaid subworkflow with mainworkflows/common/import. From now on citations are matched by pmid as well.
31434	17/10/2014 06:30 PM	Marek Horst	updating job.properties
31428	17/10/2014 03:56 PM	Marek Horst	#883 providing blacklisted_objectstores_csv input parameter set to $UNDEFINED$ value by default
31422	17/10/2014 12:54 PM	Marek Horst	updating job.properties
31410	16/10/2014 05:48 PM	Marek Horst	input port name fix: input_citation->input_citations
31267	10/10/2014 03:37 PM	Marek Horst	introducing merge_body_with_updates flag support in common/import, setting to true in statistics workflow
31250	09/10/2014 03:33 PM	Marek Horst	introducing regex support in result approver to support iis::* kind of provenance, updating workflow definitions with proper regex values
31228	08/10/2014 06:19 PM	Marek Horst	#840 moving IdentifierMapping from importer to common package
31222	08/10/2014 06:12 PM	Marek Horst	#840 renaming DeduplicationMapping to more generic IdentifierMapping
31216	08/10/2014 05:56 PM	Marek Horst	#757 aligning common importer with current API of PMC citations ingestion
31206	08/10/2014 01:46 PM	Marek Horst	disabling workflow tests
31154	06/10/2014 03:47 PM	Marek Horst	#637 renaming document_extractedMetadata algorithm to more descriptive document_affiliations, propagating changes to action set identifier properties names
31034	02/10/2014 02:15 PM	Marek Horst	removing extracted_metadata.json which will not be checked anymore
31033	02/10/2014 02:15 PM	Marek Horst	reenabling PMC ingestion when citationmatching flag is set
30981	01/10/2014 06:22 PM	Marek Horst	updating job properties
30938	29/09/2014 06:18 PM	Marek Horst	skipping extracted_metadata comparison which is cumbersome due to frequent changes and large volume of references
30885	25/09/2014 06:40 PM	Marek Horst	introducing newly added address field in json record
30876	25/09/2014 05:03 PM	Marek Horst	fixing field names after recent Affiliation.avdl refactoring and adding countryCode field, renaming contry to countryName
30006	04/09/2014 01:10 PM	Marek Horst	setting export_action_set_id_entity_dataset to $UNDEFINED$ by default, this should not be required because dataset reference extraction module might be deactivated. Check will be performed at dataset entity exporter module and when value is not set - exception will be raised.
29982	03/09/2014 05:53 PM	Marek Horst	#757 temporarily disabling PMC ingestion until fixing openaire identifiers building process
29967	03/09/2014 11:04 AM	Marek Horst	#568, #577 enabling proper citations export by introducing PMC citation ingestion and citation matching outcome merging and grouping for exporting purposes. Introducing union instead of collapser, which should be introduced in the near future.
29895	28/08/2014 04:32 PM	Marek Horst	updating expected output
29893	28/08/2014 01:38 PM	Marek Horst	removing output_citation_pmc port duplicate
29855	25/08/2014 06:09 PM	Marek Horst	updating performance test
29854	25/08/2014 06:06 PM	Marek Horst	moving ACM importer to icm-iis-mainworkflows due to extending dependencies with cermine, introducing performance tests
29835	22/08/2014 05:38 PM	Marek Horst	removing common import input parameters which are not required in this context
29827	22/08/2014 02:34 PM	Marek Horst	introducing trust_level_threshold support in statistics workflow
29826	22/08/2014 02:27 PM	Marek Horst	introducing trust_level_threshold support in common import workflow
29821	22/08/2014 01:13 PM	Marek Horst	providing default value for action_set_id_entity_dataset set to $UNDEFINED$. This change is required when exporting in statistics export mode where no entities are exported and such parameter should not be required.
29819	22/08/2014 12:56 PM	Marek Horst	introducing dedicated statistics mainworkflow encapsulating importing, processing and exporting phases. This workflow was introduced explicitly for statistics purposes because we want to operate over data imported from the InformationSpace, in contrast to the primary workflow, where some of the statistics input was inferred and it wasn't clear whether it would become part of the InformationSpace.
29817	22/08/2014 11:30 AM	Marek Horst	allowing overriding inference_provenance_blacklist default 'iis' value, which will be required in mainworkflows/statistics where inferred document to project relations should be taken into account
29816	21/08/2014 06:48 PM	Marek Horst	setting default $undefined$ value for 'input_aux_dataset_existing_id'
29815	21/08/2014 06:33 PM	Marek Horst	setting default undefined values for 'mdstore_service_location' and 'dataset_mdstore_ids_csv'
29731	31/07/2014 12:28 PM	Marek Horst	#9059 reverting #717 change: shortening app_path for primary workflow, due to the fix applied by Paweł on the WF_JOBS MySQL table (MODIFY changing varchar(255) to mediumtext).
29730	31/07/2014 12:25 PM	Marek Horst	updating job.properties
29645	29/07/2014 10:43 AM	Marek Horst	updating expected record content
29632	28/07/2014 10:40 PM	Marek Horst	fixing placeholder name
29626	28/07/2014 05:01 PM	Marek Horst	#717 shortening app_path for primary workflow
29621	28/07/2014 04:08 PM	Marek Horst	updating job.properties
29616	28/07/2014 03:14 PM	Marek Horst	fixing output port names: removing default values for citation_pmc and dataset, setting proper output_citation_pmc in both preprocessing and primary workflows
29612	28/07/2014 02:56 PM	Marek Horst	updating job.properties
29611	28/07/2014 02:56 PM	Marek Horst	#717 shortening app_path for preprocessing workflow and subworkflows
29483	23/07/2014 06:32 PM	Marek Horst	#712 introducing plaintext caching
29479	23/07/2014 05:06 PM	Marek Horst	shortening node names
29478	23/07/2014 05:04 PM	Marek Horst	updating workingDir for generating empty outputs: removing import_dataset part
29398	21/07/2014 04:21 PM	Marek Horst	updating expected extracted metadata
29300	19/07/2014 12:43 AM	Mateusz Kobos	Fixing names of parameters accepted by workflow nodes
29167	16/07/2014 12:14 PM	Marek Horst	skipping PMC citations ingestion when citationmatching algorithm is not enabled
29098	14/07/2014 04:02 PM	Marek Horst	shortening transformer_export_documentto* action names to be less than 50 characters
29090	14/07/2014 02:37 PM	Marek Horst	#354 hooking up primary/main workflow with documenttodataset and documenttoproject transformers skipping export of already existing relations in HBase
29089	14/07/2014 02:36 PM	Marek Horst	updating default job.properties
29017	11/07/2014 10:29 AM	Marek Horst	#486 fixing integration test: introducing missing document_text_wos input port for primary/processing
29016	11/07/2014 10:26 AM	Marek Horst	#486 introducing last piece missing: text collapser in front of referenceextraction_researchinitiatives joining text contents coming from already existing document_text input port and newly introduced document_text_wos input port providing WoS contents
29005	10/07/2014 06:04 PM	Marek Horst	#486 bugfix: reordering existence filter with id replacer: we need to update identifiers first, then update existence filter
28987	10/07/2014 03:37 PM	Marek Horst	integrating pmc citations ingestion with primary workflow, adjusting port names, deduplicating dependencies
28957	08/07/2014 05:19 PM	Marek Horst	updating default job.properties
28952	08/07/2014 04:57 PM	Marek Horst	renaming input ports from input_citation to input_citations to be aligned with exporter subworkflow
28951	08/07/2014 04:55 PM	Marek Horst	skipping exporting citation matching outcome
28950	08/07/2014 04:44 PM	Marek Horst	renaming input ports from input_citation to input_citations to be aligned with exporter subworkflow
28872	03/07/2014 02:00 PM	Marek Horst	updating expected references output for doc=id-3
28843	02/07/2014 05:56 PM	Marek Horst	updating default job.properties
28817	02/07/2014 02:47 PM	Marek Horst	fixing affiliations and positions in authors details
28816	02/07/2014 02:36 PM	Marek Horst	fixing HBase model json representation to be compliant with most recent dnet-openaire-data-protos:3.0.0-SNAPSHOT model: complex relation identifiers, dataInfo on fields level etc
28813	02/07/2014 01:30 PM	Marek Horst	introducing additional logging
28804	02/07/2014 11:48 AM	Marek Horst	setting excluded_ids to undefined value
28803	02/07/2014 11:47 AM	Marek Horst	setting excluded_ids to undefined value
28802	02/07/2014 11:47 AM	Marek Horst	setting excluded_ids to undefined value
28801	02/07/2014 11:46 AM	Marek Horst	setting excluded_ids to undefined value