


  • svn:ignore: .* bin target build

# Date Author Comment
33731 30/12/2014 05:17 PM Marek Horst

[maven-release-plugin] prepare for next development iteration

33729 30/12/2014 05:17 PM Marek Horst

[maven-release-plugin] prepare release icm-iis-mainworkflows-1.0.0

33728 30/12/2014 02:50 PM Marek Horst

changing snapshot dependencies to released ones

33622 17/12/2014 12:33 PM Marek Horst

#1044 upgrading dependencies to released versions and parent version to most recent snapshot for unreleased modules

33414 15/12/2014 12:46 PM Marek Horst

introducing scm definition

33398 15/12/2014 12:25 PM Marek Horst


33355 11/12/2014 08:36 PM Marek Horst


33249 09/12/2014 06:41 PM Marek Horst

#919 renaming DocumentToResearchInitiative to DocumentToConceptId and DocumentToResearchInitiatives to DocumentToConceptIds

33228 09/12/2014 11:02 AM Marek Horst

#1022 introducing PMC extracted document metadata collapser removing duplicates before sending output to PMC citation ingestion module

33218 05/12/2014 04:26 PM Marek Horst

#919 adding missing i/o ports related to FET projects reference extraction

33184 04/12/2014 04:09 PM Marek Horst

#919 enabling concepts matching for FET projects in mainworkflows: import, export, primary and preprocessing

33105 28/11/2014 06:13 PM Marek Horst

#1017 accepting ExtractedDocumentMetadata instead of DocumentText at PMC citation ingestion input. Aligning integration test and importer workflow.

33098 28/11/2014 04:27 PM Marek Horst

#1022 introducing extracted document metadata collapser at importing phase.
Propagating extracted document metadata (including PMC ingested metadata) to processing part of workflow, which can be exploited by citation matching module.
Introducing citations collapser in last stage of processing phase collapsing ingested citations with matched citations.

32943 21/11/2014 05:50 PM Marek Horst

#1017 introducing new PMC metadata ingestion currently extracting references, journal and pages fields.
Replacing DOM/XPath based citations ingestion with much faster SAX version. Changing pmidtooaid transformer to utilize ExtractedDocumentMetadata instead of parsing XML file. Enabling PMC metadata ingestion in common/import.

32829 17/11/2014 03:45 PM Marek Horst

#963 propagating dataset -> mdstore from import to exporting phase: importer produces DocumentToMDStore datastore utilized by exporter module. Updating transformer definition to handle DocumentToMDStore instead of Identifier schema

32825 17/11/2014 03:43 PM Marek Horst

introducing separate citations json containing expected results, not enabled in workflow yet

32824 17/11/2014 03:42 PM Marek Horst


32823 17/11/2014 03:42 PM Marek Horst


32167 04/11/2014 02:04 PM Marek Horst


32166 04/11/2014 02:01 PM Marek Horst


32165 04/11/2014 02:01 PM Marek Horst


32164 04/11/2014 02:00 PM Marek Horst


32162 04/11/2014 01:44 PM Marek Horst


32045 31/10/2014 02:59 PM Marek Horst

updating: adding metadataextraction_excluded_checksums=4f5cc34f137de4dc89766a9366ca66de,6495a568200b1cee40baa00072b1800a

32043 31/10/2014 02:45 PM Marek Horst


32042 31/10/2014 02:45 PM Marek Horst

introducing support for active_existence_filter, set to true by default. Setting this parameter to false allows processing contents that have no counterpart among metadata records retrieved from HBase. This solution was required to e.g. process ubiquity contents which were not present in HBase dump metadata.

31846 28/10/2014 03:45 PM Marek Horst

fixing citations schema type

31835 28/10/2014 02:24 PM Marek Horst


31759 27/10/2014 06:20 PM Marek Horst

renaming metadataextraction_excluded_ids to more appropriate metadataextraction_excluded_checksums

31758 27/10/2014 06:11 PM Marek Horst

#913 introducing support for max file size parameter, currently checked against Content-Length header

31682 23/10/2014 07:30 PM Marek Horst

adding integration-test job name suffix

31680 23/10/2014 07:05 PM Marek Horst

adding icm-iis-mainworkflows_import entry

31679 23/10/2014 06:57 PM Marek Horst

setting nightly parameter

31667 23/10/2014 04:13 PM Marek Horst


31647 22/10/2014 06:31 PM Marek Horst

enabling document classification and research initiatives reference extraction algorithms

31498 20/10/2014 06:03 PM Marek Horst

#757 hooking up ingest_pmc_idmapping_pmidtooaid subworkflow with mainworkflows/common/import. From now on citations are matched by pmid as well.

31496 20/10/2014 05:57 PM Marek Horst

updating profiles names

31495 20/10/2014 05:52 PM Marek Horst

fixing job name for integration test

31434 17/10/2014 06:30 PM Marek Horst


31428 17/10/2014 03:56 PM Marek Horst

#883 providing blacklisted_objectstores_csv input parameter set to $UNDEFINED$ value by default

31422 17/10/2014 12:54 PM Marek Horst


31410 16/10/2014 05:48 PM Marek Horst

input port name fix: input_citation->input_citations

31267 10/10/2014 03:37 PM Marek Horst

introducing merge_body_with_updates flag support in common/import, setting to true in statistics workflow

31250 09/10/2014 03:33 PM Marek Horst

introducing regex support in result approver to support iis::* kind of provenance, updating workflow definitions with proper regex values

31228 08/10/2014 06:19 PM Marek Horst

#840 moving IdentifierMapping from importer to common package

31222 08/10/2014 06:12 PM Marek Horst

#840 renaming DeduplicationMapping to more generic IdentifierMapping

31216 08/10/2014 05:56 PM Marek Horst

#757 aligning common importer with current API of PMC citations ingestion

31206 08/10/2014 01:46 PM Marek Horst

disabling workflow tests

31203 08/10/2014 01:15 PM Marek Horst

introducing external-integration-test: iis/mainworkflows/integration/primary/processing entry

31154 06/10/2014 03:47 PM Marek Horst

#637 renaming document_extractedMetadata algorithm to more descriptive document_affiliations, propagating changes to action set identifier properties names

31041 02/10/2014 02:29 PM Marek Horst

introducing cloudera repository in parent container, removing repository definitions from individual IIS modules

31034 02/10/2014 02:15 PM Marek Horst

removing extracted_metadata.json which will not be checked anymore

31033 02/10/2014 02:15 PM Marek Horst

reenabling PMC ingestion when citationmatching flag is set

30981 01/10/2014 06:22 PM Marek Horst

updating job properties

30938 29/09/2014 06:18 PM Marek Horst

skipping extracted_metadata comparison which is cumbersome due to frequent changes and large volume of references

30885 25/09/2014 06:40 PM Marek Horst

introducing newly added address field in json record

30876 25/09/2014 05:03 PM Marek Horst

fixing field names after recent Affiliation.avdl refactoring and adding countryCode field, renaming contry to countryName

30006 04/09/2014 01:10 PM Marek Horst

setting export_action_set_id_entity_dataset to $UNDEFINED$ by default, this should not be required because dataset reference extraction module might be deactivated. Check will be performed at dataset entity exporter module and when value is not set - exception will be raised.

29982 03/09/2014 05:53 PM Marek Horst

#757 temporarily disabling PMC ingestion until fixing openaire identifiers building process

29967 03/09/2014 11:04 AM Marek Horst

#568, #577 enabling proper citations export by introducing PMC citation ingestion and citation matching outcome merging and grouping for exporting purposes. Introducing union instead of collapser which should be introduced in near future.

29895 28/08/2014 04:32 PM Marek Horst

updating expected output

29893 28/08/2014 01:38 PM Marek Horst

removing output_citation_pmc port duplicate

29855 25/08/2014 06:09 PM Marek Horst

updating performance test

29854 25/08/2014 06:06 PM Marek Horst

moving ACM importer to icm-iis-mainworkflows due to extending dependencies with cermine, introducing performance tests

29835 22/08/2014 05:38 PM Marek Horst

removing common import input parameters which are not required in this context

29827 22/08/2014 02:34 PM Marek Horst

introducing trust_level_threshold support in statistics workflow

29826 22/08/2014 02:27 PM Marek Horst

introducing trust_level_threshold support in common import workflow

29821 22/08/2014 01:13 PM Marek Horst

providing default value for action_set_id_entity_dataset set to $UNDEFINED$. This change is required when exporting in statistics export mode where no entities are exported and such parameter should not be required.

29819 22/08/2014 12:56 PM Marek Horst

introducing dedicated statistics mainworkflow encapsulating importing, processing and exporting phases. This workflow was introduced explicitly for statistics purposes because we want to operate over InformationSpace imported data, in contrast to the primary workflow where some of the statistics input was inferred and it wasn't clear whether it would become part of InformationSpace.

29817 22/08/2014 11:30 AM Marek Horst

allowing overriding inference_provenance_blacklist default 'iis' value which will be required in mainworkflows/statistics where inferred document to project relations should be taken into account

29816 21/08/2014 06:48 PM Marek Horst

setting default $UNDEFINED$ value for 'input_aux_dataset_existing_id'

29815 21/08/2014 06:33 PM Marek Horst

setting default undefined values for 'mdstore_service_location' and 'dataset_mdstore_ids_csv'

29731 31/07/2014 12:28 PM Marek Horst

#9059 reverting #717 change: shortening app_path for primary workflow due to the fix applied by Paweł on WF_JOBS MODIFY mysql table: changing varchar(255) to mediumtext.

29730 31/07/2014 12:25 PM Marek Horst


29645 29/07/2014 10:43 AM Marek Horst

updating expected record content

29632 28/07/2014 10:40 PM Marek Horst

fixing placeholder name

29626 28/07/2014 05:01 PM Marek Horst

#717 shortening app_path for primary workflow

29621 28/07/2014 04:08 PM Marek Horst


29616 28/07/2014 03:14 PM Marek Horst

fixing output port names: removing default values for citation_pmc and dataset, setting proper output_citation_pmc in both preprocessing and primary workflows

29612 28/07/2014 02:56 PM Marek Horst


29611 28/07/2014 02:56 PM Marek Horst

#717 shortening app_path for preprocessing workflow and subworkflows

29483 23/07/2014 06:32 PM Marek Horst

#712 introducing plaintext caching

29479 23/07/2014 05:06 PM Marek Horst

shortening node names

29478 23/07/2014 05:04 PM Marek Horst

updating workingDir for generating empty outputs: removing import_dataset part

29398 21/07/2014 04:21 PM Marek Horst

updating expected extracted metadata

29300 19/07/2014 12:43 AM Mateusz Kobos

Fixing names of parameters accepted by workflow nodes

29167 16/07/2014 12:14 PM Marek Horst

skipping PMC citations ingestion when citationmatching algorithm is not enabled

29098 14/07/2014 04:02 PM Marek Horst

shortening transformer_export_documentto* action names to be less than 50 characters

29090 14/07/2014 02:37 PM Marek Horst

#354 hooking up primary/main workflow with documenttodataset and documenttoproject transformers skipping export of already existing relations in HBase

29089 14/07/2014 02:36 PM Marek Horst

updating default

29017 11/07/2014 10:29 AM Marek Horst

#486 fixing integration test: introducing missing document_text_wos input port for primary/processing

29016 11/07/2014 10:26 AM Marek Horst

#486 introducing last piece missing: text collapser in front of referenceextraction_researchinitiatives joining text contents coming from already existing document_text input port and newly introduced document_text_wos input port providing WoS contents

29005 10/07/2014 06:04 PM Marek Horst

#486 bugfix: reordering existence filter with id replacer: we need to update identifiers first, then update existence filter

28987 10/07/2014 03:37 PM Marek Horst

integrating pmc citations ingestion with primary workflow, adjusting port names, deduplicating dependencies

28957 08/07/2014 05:19 PM Marek Horst

updating default

28952 08/07/2014 04:57 PM Marek Horst

renaming input ports from input_citation to input_citations to be aligned with exporter subworkflow

28951 08/07/2014 04:55 PM Marek Horst

skipping exporting citation matching outcome

28950 08/07/2014 04:44 PM Marek Horst

renaming input ports from input_citation to input_citations to be aligned with exporter subworkflow