/modules/icm-iis-mainworkflows/trunk - Changes - D-Net - D-Net project tracking tool

dnet40/modules/icm-iis-mainworkflows/trunk @ 35030

svn:ignore: .* bin target build

#	Date	Author	Comment
35030	04/03/2015 12:16 PM	Marek Horst	updating job.properties
34958	02/03/2015 05:21 PM	Marek Horst	#1153 utilizing ${user.name} placeholder in ${workingDir} generation process, copying version.properties from oozie_app to mark execution environment with application version
34914	27/02/2015 07:34 PM	Marek Horst	#1147 introducing HTML import and HTML plaintext ingestion in main workflows: primary and preprocessing
34896	27/02/2015 05:36 PM	Marek Horst	#1147 renaming icm-iis-ingest-webcrawl module to icm-iis-ingest to make it more generic so it could contain not only webcrawl related ingesters but html ingesters as well
34893	27/02/2015 05:32 PM	Marek Horst	updating job.properties
34876	27/02/2015 04:08 PM	Marek Horst	overriding memory parameter due to test cluster memory limitations, setting it to Xmx256m
34875	27/02/2015 03:24 PM	Marek Horst	overriding memory parameter due to test cluster memory limitations, setting it to Xmx128m
34871	27/02/2015 02:49 PM	Marek Horst	overriding memory parameter due to test cluster memory limitations, setting it to Xmx512m
34869	27/02/2015 02:48 PM	Marek Horst	updating expected classes in integration test after recent #720 change and fixing confidence level distribution
34804	25/02/2015 07:19 PM	Marek Horst	overriding memory parameter due to test cluster memory limitations
34702	20/02/2015 07:17 PM	Marek Horst	#1133 dropping useless working_dir creation for java nodes
34626	19/02/2015 06:12 PM	Marek Horst	#1038 introducing ranges in dependencies definition for all IIS modules
34574	18/02/2015 03:49 PM	Marek Horst	#118 fixing typos
34572	18/02/2015 03:32 PM	Marek Horst	updating job.properties
34563	18/02/2015 01:26 PM	Marek Horst	#118 introducing website usage analysis as integral part of primary workflow
34535	16/02/2015 06:52 PM	Marek Horst	updating job.properties
34533	16/02/2015 06:35 PM	Marek Horst	#118 propagating configuration in main workflow.xml
34532	16/02/2015 06:28 PM	Marek Horst	introducing explicitly defined icm-iis-schemas SNAPSHOT dependency to prevent resolving earlier, released transitive version
34531	16/02/2015 05:58 PM	Marek Horst	#118 upgrading IIS dependencies to most recent snapshots
34530	16/02/2015 05:57 PM	Marek Horst	#118 updating job.properties
34520	13/02/2015 07:01 PM	Marek Horst	#118 introducing uoa-iis-websiteusage dependency in mainworkflows
34519	13/02/2015 07:00 PM	Marek Horst	comments added
34516	13/02/2015 05:55 PM	Marek Horst	#118 introducing mainworkflows_websiteusage_document_main workflow binding all subworkflows required to process logs and generate document similarities
34434	11/02/2015 02:26 PM	Marek Horst	#1083 enabling webcrawl ingester module extracting FX field from plaintext before executing project reference extraction
34433	11/02/2015 02:26 PM	Marek Horst	updating default job properties
34429	11/02/2015 02:15 PM	Marek Horst	#720 fixing document classification algorithm confidence level distribution, switching mainworkflows pom dependency to the fixed document classification snapshot
34213	02/02/2015 06:22 PM	Marek Horst	#1070 updating import_project_concepts_context_ids_csv default value to "fet-fp7,fet-h2020"
34212	02/02/2015 06:21 PM	Marek Horst	#1070 introducing support for multiple context identifiers, replacing import_project_concepts_context_id IIS input parameter with import_project_concepts_context_ids_csv
34207	02/02/2015 05:39 PM	Marek Horst	updating job.properties
34008	20/01/2015 04:34 PM	Marek Horst	#1072 transition fix: replacing forking_skip_imported_data with export
34007	20/01/2015 03:56 PM	Marek Horst	#1072 dropping IIS feature filtering out already existing project and dataset references from IIS export
33973	16/01/2015 06:45 PM	Marek Horst	#1065 upgrading icm-iis-parent-container dependency from 1.0.0 to 1.0.1-SNAPSHOT after introducing FCT support
33972	16/01/2015 06:11 PM	Marek Horst	#1065 upgrading uoa-iis-referenceextraction dependency from 1.0.0 to 1.0.1-SNAPSHOT after introducing FCT support
33731	30/12/2014 05:17 PM	Marek Horst	[maven-release-plugin] prepare for next development iteration
33729	30/12/2014 05:17 PM	Marek Horst	[maven-release-plugin] prepare release icm-iis-mainworkflows-1.0.0
33728	30/12/2014 02:50 PM	Marek Horst	changing snapshot dependencies to released ones
33622	17/12/2014 12:33 PM	Marek Horst	#1044 upgrading dependencies to released versions and parent version to most recent snapshot for unreleased modules
33414	15/12/2014 12:46 PM	Marek Horst	introducing scm definition
33398	15/12/2014 12:25 PM	Marek Horst	updating job.properties
33355	11/12/2014 08:36 PM	Marek Horst	updating job.properties
33249	09/12/2014 06:41 PM	Marek Horst	#919 renaming DocumentToResearchInitiative to DocumentToConceptId and DocumentToResearchInitiatives to DocumentToConceptIds
33228	09/12/2014 11:02 AM	Marek Horst	#1022 introducing PMC extracted document metadata collapser removing duplicates before sending output to PMC citation ingestion module
33218	05/12/2014 04:26 PM	Marek Horst	#919 adding missing i/o ports related to FET projects reference extraction
33184	04/12/2014 04:09 PM	Marek Horst	#919 enabling concepts matching for FET projects in mainworkflows: import, export, primary and preprocessing
33105	28/11/2014 06:13 PM	Marek Horst	#1017 accepting ExtractedDocumentMetadata instead of DocumentText at PMC citation ingestion input. Aligning integration test and importer workflow.
33098	28/11/2014 04:27 PM	Marek Horst	#1022 introducing extracted document metadata collapser at importing phase. Propagating extracted document metadata (including PMC ingested metadata) to processing part of workflow, which can be exploited by citation matching module. Introducing citations collapser in last stage of processing phase collapsing ingested citations with matched citations.
32943	21/11/2014 05:50 PM	Marek Horst	#1017 introducing new PMC metadata ingestion currently extracting references, journal and pages fields. Replacing DOM/XPath based citations ingestion with much faster SAX version. Changing pmidtooaid transformer utilizing ExtractedDocumentMetadata instead of parsing XML file. Enabling PMC metadata ingestion in common/import.
32829	17/11/2014 03:45 PM	Marek Horst	#963 propagating dataset -> mdstore from import to exporting phase: importer produces DocumentToMDStore datastore utilized by exporter module. Updating transformer definition to handle DocumentToMDStore instead of Identifier schema
32825	17/11/2014 03:43 PM	Marek Horst	introducing separate citations json containing expected results, not enabled in workflow yet
32824	17/11/2014 03:42 PM	Marek Horst	updating job.properties
32823	17/11/2014 03:42 PM	Marek Horst	updating job.properties
32167	04/11/2014 02:04 PM	Marek Horst	updating job.properties
32166	04/11/2014 02:01 PM	Marek Horst	updating job.properties
32165	04/11/2014 02:01 PM	Marek Horst	updating job.properties
32164	04/11/2014 02:00 PM	Marek Horst	updating job.properties
32162	04/11/2014 01:44 PM	Marek Horst	updating job.properties
32045	31/10/2014 02:59 PM	Marek Horst	updating job.properties: adding metadataextraction_excluded_checksums=4f5cc34f137de4dc89766a9366ca66de,6495a568200b1cee40baa00072b1800a
32043	31/10/2014 02:45 PM	Marek Horst	updating job.properties
32042	31/10/2014 02:45 PM	Marek Horst	introducing support for active_existence_filter, set to true by default. Setting this parameter to false allows processing contents that have no counterpart among the metadata records retrieved from HBase. This was required to process e.g. ubiquity contents, which were not present in the HBase dump metadata.
31846	28/10/2014 03:45 PM	Marek Horst	fixing citations schema type
31835	28/10/2014 02:24 PM	Marek Horst	updating job.properties
31759	27/10/2014 06:20 PM	Marek Horst	renaming metadataextraction_excluded_ids to more appropriate metadataextraction_excluded_checksums
31758	27/10/2014 06:11 PM	Marek Horst	#913 introducing support for max file size parameter, currently checked against Content-Length header
31682	23/10/2014 07:30 PM	Marek Horst	adding integration-test job name suffix
31680	23/10/2014 07:05 PM	Marek Horst	adding icm-iis-mainworkflows_import entry
31679	23/10/2014 06:57 PM	Marek Horst	setting nightly parameter
31667	23/10/2014 04:13 PM	Marek Horst	updating job.properties
31647	22/10/2014 06:31 PM	Marek Horst	enabling document classification and research initiatives reference extraction algorithms
31498	20/10/2014 06:03 PM	Marek Horst	#757 hooking up ingest_pmc_idmapping_pmidtooaid subworkflow with mainworkflows/common/import. From now on citations are matched by pmid as well.
31496	20/10/2014 05:57 PM	Marek Horst	updating profiles names
31495	20/10/2014 05:52 PM	Marek Horst	fixing job name for integration test
31434	17/10/2014 06:30 PM	Marek Horst	updating job.properties
31428	17/10/2014 03:56 PM	Marek Horst	#883 providing blacklisted_objectstores_csv input parameter set to $UNDEFINED$ value by default
31422	17/10/2014 12:54 PM	Marek Horst	updating job.properties
31410	16/10/2014 05:48 PM	Marek Horst	input port name fix: input_citation->input_citations
31267	10/10/2014 03:37 PM	Marek Horst	introducing merge_body_with_updates flag support in common/import, setting to true in statistics workflow
31250	09/10/2014 03:33 PM	Marek Horst	introducing regex support in result approver to support iis::* kind of provenance, updating workflow definitions with proper regex values
31228	08/10/2014 06:19 PM	Marek Horst	#840 moving IdentifierMapping from importer to common package
31222	08/10/2014 06:12 PM	Marek Horst	#840 renaming DeduplicationMapping to more generic IdentifierMapping
31216	08/10/2014 05:56 PM	Marek Horst	#757 aligning common importer with current API of PMC citations ingestion
31206	08/10/2014 01:46 PM	Marek Horst	disabling workflow tests
31203	08/10/2014 01:15 PM	Marek Horst	introducing external-integration-test: iis/mainworkflows/integration/primary/processing entry
31154	06/10/2014 03:47 PM	Marek Horst	#637 renaming document_extractedMetadata algorithm to more descriptive document_affiliations, propagating changes to action set identifier properties names
31041	02/10/2014 02:29 PM	Marek Horst	introducing cloudera repository in parent container, removing repository definitions from individual IIS modules
31034	02/10/2014 02:15 PM	Marek Horst	removing extracted_metadata.json which will not be checked anymore
31033	02/10/2014 02:15 PM	Marek Horst	reenabling PMC ingestion when citationmatching flag is set
30981	01/10/2014 06:22 PM	Marek Horst	updating job properties
30938	29/09/2014 06:18 PM	Marek Horst	skipping extracted_metadata comparison which is cumbersome due to frequent changes and large volume of references
30885	25/09/2014 06:40 PM	Marek Horst	introducing newly added address field in json record
30876	25/09/2014 05:03 PM	Marek Horst	fixing field names after recent Affiliation.avdl refactoring and adding countryCode field, renaming country to countryName
30006	04/09/2014 01:10 PM	Marek Horst	setting export_action_set_id_entity_dataset to $UNDEFINED$ by default, this should not be required because dataset reference extraction module might be deactivated. Check will be performed at dataset entity exporter module and when value is not set - exception will be raised.
29982	03/09/2014 05:53 PM	Marek Horst	#757 temporarily disabling PMC ingestion until fixing openaire identifiers building process
29967	03/09/2014 11:04 AM	Marek Horst	#568, #577 enabling proper citations export by introducing PMC citation ingestion and citation matching outcome merging and grouping for exporting purposes. Introducing union instead of collapser which should be introduced in near future.
29895	28/08/2014 04:32 PM	Marek Horst	updating expected output
29893	28/08/2014 01:38 PM	Marek Horst	removing output_citation_pmc port duplicate
29855	25/08/2014 06:09 PM	Marek Horst	updating performance test
29854	25/08/2014 06:06 PM	Marek Horst	moving ACM importer to icm-iis-mainworkflows due to extending dependencies with cermine, introducing performance tests
29835	22/08/2014 05:38 PM	Marek Horst	removing common import input parameters which are not required in this context
