# Date Author Comment
35985 03/04/2015 01:11 PM Marek Horst

updating job.properties, disabling all algorithms by default

35967 02/04/2015 10:17 PM Marek Horst

#1248 fixing transition node name postprocessing-joining to merge-joining

35946 02/04/2015 05:52 PM Marek Horst

#1248 introducing fault subdirectory support in all workflows wrapping the metadataextraction subworkflow, up to the processing and primary root workflows. This should prevent the fault directory from being removed when the ${remove_sideproducts} flag is enabled; it will be propagated along with metadata and plaintext.

35935 02/04/2015 03:59 PM Marek Horst

updating job.properties

35701 27/03/2015 06:18 AM Mateusz Kobos

Removing usage of working_dir from Java workflow node.

35437 18/03/2015 11:40 AM Marek Horst
35436 18/03/2015 11:39 AM Marek Horst

updating README file

35427 17/03/2015 07:08 PM Marek Horst

#1187 moving changelog contents to redmine wiki

35404 17/03/2015 03:03 PM Marek Horst

#1198 aligning IIS dependencies and java code to CDH5.3.0 cluster

35391 17/03/2015 03:00 PM Marek Horst

#1197 introducing job.properties changes aligning paths to rumcajs cluster HDFS structure

35244 11/03/2015 04:43 PM Marek Horst

creating IIS-CDH-5.3.0 branch

35243 11/03/2015 04:43 PM Marek Horst
35232 11/03/2015 02:19 PM Marek Horst

reenabling document to project reference import validation

35231 11/03/2015 01:55 PM Marek Horst

updating expected documents list

35230 11/03/2015 01:30 PM Marek Horst

temporarily skipping docproject validation

35229 11/03/2015 01:14 PM Marek Horst

#1195 removing obsolete ports docreation and datasetid from hbase mapred import, removing references to those ports in workflow.xml files, updating transformer by removing filtering by datasetid due to decisions made in #1072

35200 09/03/2015 07:07 PM Marek Horst

fixing json escape character by putting \\ in place of \
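
For context, a minimal Python sketch (not the project's code) of why a lone backslash has to be doubled inside a JSON string. The sample strings and the `parses` helper are made up for illustration.

```python
import json

# In JSON, a backslash starts an escape sequence, so "\d" is invalid
# while "\\d" decodes to a single backslash followed by "d".
broken = '{"pattern": "\\d+"}'    # JSON text: {"pattern": "\d+"}
fixed = '{"pattern": "\\\\d+"}'   # JSON text: {"pattern": "\\d+"}


def parses(text):
    """Return True if text is a valid JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The decoded value contains exactly one backslash.
value = json.loads(fixed)["pattern"]
```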

35199 09/03/2015 06:45 PM Marek Horst

extending mapreduce metadata importer test with validating import of different kinds of relations and dataset identifier

35198 09/03/2015 06:44 PM Marek Horst

extending mapreduce metadata importer test with validating import of different kinds of relations and dataset identifier

35191 09/03/2015 04:52 PM Marek Horst

removing obsolete citations

35189 09/03/2015 04:24 PM Marek Horst

updating confidence level value to 1.0 for record coming from PMC

35187 09/03/2015 04:21 PM Marek Horst

removing obsolete pdf directory

35183 09/03/2015 03:16 PM Marek Horst

adding missing "confidenceLevel" field

35178 09/03/2015 02:39 PM Marek Horst

#1187 introducing IIS changelog

35153 06/03/2015 06:25 PM Marek Horst

maintaining pmc citation and testing citations merging process

35152 06/03/2015 05:36 PM Marek Horst

reintroducing multiple citations after introducing sorting in transformer

35149 06/03/2015 05:25 PM Marek Horst

limiting citations count to 1 until the order of results produced by the citation matching module is repeatable

35146 06/03/2015 04:13 PM Marek Horst

including: FET project reference extraction, EGI case, dataset reference extraction outcome validation

35144 06/03/2015 03:06 PM Marek Horst

enabling citation matching algorithm

35143 06/03/2015 03:06 PM Marek Horst

updating expected citations

35122 05/03/2015 06:47 PM Marek Horst

removing comment

35120 05/03/2015 06:46 PM Marek Horst

primary processing integration test major refactoring: dropping cermine execution and providing plaintext and extracted metadata as json records

35057 04/03/2015 05:30 PM Marek Horst

#1176 defining remove_sideproducts property in workflows headers

35048 04/03/2015 04:44 PM Marek Horst

#1176 introducing side products removal in common import by maintaining remove_sideproducts flag set to true by default.
Notice: do not provide any output directory location pointing to workingDir subdirectory!

35042 04/03/2015 03:01 PM Marek Horst

removing duplicate collapser import and aligning workflow definition

35031 04/03/2015 01:08 PM Marek Horst

#1172 introducing support for active_export parameter in both preprocessing and primary workflows

35030 04/03/2015 12:16 PM Marek Horst

updating job.properties

34958 02/03/2015 05:21 PM Marek Horst

#1153 utilizing ${user.name} placeholder in ${workingDir} generation process, copying version.properties from oozie_app to mark execution environment with application version

34914 27/02/2015 07:34 PM Marek Horst

#1147 introducing HTML import and HTML plaintext ingestion in main workflows: primary and preprocessing

34896 27/02/2015 05:36 PM Marek Horst

#1147 renaming icm-iis-ingest-webcrawl module to icm-iis-ingest to make it more generic so it could contain not only webcrawl related ingesters but html ingesters as well

34893 27/02/2015 05:32 PM Marek Horst

updating job.properties

34876 27/02/2015 04:08 PM Marek Horst

overriding memory parameter due to test cluster memory limitations, setting it to Xmx256m

34875 27/02/2015 03:24 PM Marek Horst

overriding memory parameter due to test cluster memory limitations, setting it to Xmx128m

34871 27/02/2015 02:49 PM Marek Horst

overriding memory parameter due to test cluster memory limitations, setting it to Xmx512m

34869 27/02/2015 02:48 PM Marek Horst

updating expected classes in integration test after recent #720 change and fixing confidence level distribution

34804 25/02/2015 07:19 PM Marek Horst

overriding memory parameter due to test cluster memory limitations

34702 20/02/2015 07:17 PM Marek Horst

#1133 dropping useless working_dir creation for java nodes

34626 19/02/2015 06:12 PM Marek Horst

#1038 introducing ranges in dependencies definition for all IIS modules

34574 18/02/2015 03:49 PM Marek Horst

#118 fixing typos

34572 18/02/2015 03:32 PM Marek Horst

updating job.properties

34563 18/02/2015 01:26 PM Marek Horst

#118 introducing website usage analysis as integral part of primary workflow

34535 16/02/2015 06:52 PM Marek Horst

updating job.properties

34533 16/02/2015 06:35 PM Marek Horst

#118 propagating configuration in main workflow.xml

34532 16/02/2015 06:28 PM Marek Horst

introducing explicitly defined icm-iis-schemas SNAPSHOT dependency to prevent resolving earlier, released transitive version

34531 16/02/2015 05:58 PM Marek Horst

#118 upgrading IIS dependencies to most recent snapshots

34530 16/02/2015 05:57 PM Marek Horst

#118 updating job.properties

34520 13/02/2015 07:01 PM Marek Horst

#118 introducing uoa-iis-websiteusage dependency in mainworkflows

34519 13/02/2015 07:00 PM Marek Horst

comments added

34516 13/02/2015 05:55 PM Marek Horst

#118 introducing mainworkflows_websiteusage_document_main workflow binding all subworkflows required to process logs and generate document similarities

34434 11/02/2015 02:26 PM Marek Horst

#1083 enabling webcrawl ingester module extracting FX field from plaintext before executing project reference extraction

34433 11/02/2015 02:26 PM Marek Horst

updating default job properties

34429 11/02/2015 02:15 PM Marek Horst

#720 fixing document classification algorithm confidence level distribution, switching mainworkflows pom dependency to the fixed document classification snapshot

34213 02/02/2015 06:22 PM Marek Horst

#1070 updating import_project_concepts_context_ids_csv default value to "fet-fp7,fet-h2020"

34212 02/02/2015 06:21 PM Marek Horst

#1070 introducing support for multiple context identifiers, replacing import_project_concepts_context_id IIS input parameter with import_project_concepts_context_ids_csv
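
A comma-separated parameter such as import_project_concepts_context_ids_csv could be turned into a list of identifiers roughly as follows. This is only a sketch: the function name `parse_context_ids` is hypothetical, and trimming whitespace and skipping empty entries are assumptions, not documented IIS behaviour.

```python
def parse_context_ids(csv_value):
    """Split a comma-separated value into a list of identifiers.

    Assumption: surrounding whitespace is ignored and empty entries are dropped.
    """
    return [part.strip() for part in csv_value.split(",") if part.strip()]


# The default value mentioned in the log for this parameter.
context_ids = parse_context_ids("fet-fp7,fet-h2020")
```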

34207 02/02/2015 05:39 PM Marek Horst

updating job.properties

34008 20/01/2015 04:34 PM Marek Horst

#1072 transition fix: replacing forking_skip_imported_data with export

34007 20/01/2015 03:56 PM Marek Horst

#1072 dropping IIS feature filtering out already existing project and dataset references from IIS export

33973 16/01/2015 06:45 PM Marek Horst

#1065 upgrading icm-iis-parent-container dependency from 1.0.0 to 1.0.1-SNAPSHOT after introducing FCT support

33972 16/01/2015 06:11 PM Marek Horst

#1065 upgrading uoa-iis-referenceextraction dependency from 1.0.0 to 1.0.1-SNAPSHOT after introducing FCT support

33731 30/12/2014 05:17 PM Marek Horst

[maven-release-plugin] prepare for next development iteration

33730 30/12/2014 05:17 PM Marek Horst

[maven-release-plugin] copy for tag icm-iis-mainworkflows-1.0.0

33729 30/12/2014 05:17 PM Marek Horst

[maven-release-plugin] prepare release icm-iis-mainworkflows-1.0.0

33728 30/12/2014 02:50 PM Marek Horst

changing snapshot dependencies to released ones

33622 17/12/2014 12:33 PM Marek Horst

#1044 upgrading dependencies to released versions and parent version to most recent snapshot for unreleased modules

33414 15/12/2014 12:46 PM Marek Horst

introducing scm definition

33398 15/12/2014 12:25 PM Marek Horst

updating job.properties

33355 11/12/2014 08:36 PM Marek Horst

updating job.properties

33249 09/12/2014 06:41 PM Marek Horst

#919 renaming DocumentToResearchInitiative to DocumentToConceptId and DocumentToResearchInitiatives to DocumentToConceptIds

33228 09/12/2014 11:02 AM Marek Horst

#1022 introducing PMC extracted document metadata collapser removing duplicates before sending output to PMC citation ingestion module

33218 05/12/2014 04:26 PM Marek Horst

#919 adding missing i/o ports related to FET projects reference extraction

33184 04/12/2014 04:09 PM Marek Horst

#919 enabling concepts matching for FET projects in mainworkflows: import, export, primary and preprocessing

33105 28/11/2014 06:13 PM Marek Horst

#1017 accepting ExtractedDocumentMetadata instead of DocumentText at PMC citation ingestion input. Aligning integration test and importer workflow.

33098 28/11/2014 04:27 PM Marek Horst

#1022 introducing extracted document metadata collapser at importing phase.
Propagating extracted document metadata (including PMC ingested metadata) to the processing part of the workflow, which can be exploited by the citation matching module.
Introducing citations collapser in last stage of processing phase collapsing ingested citations with matched citations.

32943 21/11/2014 05:50 PM Marek Horst

#1017 introducing new PMC metadata ingestion, currently extracting references, journal and pages fields.
Replacing DOM/XPath based citations ingestion with much faster SAX version. Changing the pmidtooaid transformer to utilize ExtractedDocumentMetadata instead of parsing the XML file. Enabling PMC metadata ingestion in common/import.

32829 17/11/2014 03:45 PM Marek Horst

#963 propagating dataset -> mdstore from import to exporting phase: importer produces DocumentToMDStore datastore utilized by exporter module. Updating transformer definition to handle DocumentToMDStore instead of Identifier schema

32825 17/11/2014 03:43 PM Marek Horst

introducing separate citations json containing expected results, not enabled in workflow yet

32824 17/11/2014 03:42 PM Marek Horst

updating job.properties

32823 17/11/2014 03:42 PM Marek Horst

updating job.properties

32167 04/11/2014 02:04 PM Marek Horst

updating job.properties

32166 04/11/2014 02:01 PM Marek Horst

updating job.properties

32165 04/11/2014 02:01 PM Marek Horst

updating job.properties

32164 04/11/2014 02:00 PM Marek Horst

updating job.properties

32162 04/11/2014 01:44 PM Marek Horst

updating job.properties

32045 31/10/2014 02:59 PM Marek Horst

updating job.properties: adding metadataextraction_excluded_checksums=4f5cc34f137de4dc89766a9366ca66de,6495a568200b1cee40baa00072b1800a
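
As an illustration of checksum-based exclusion, the sketch below skips any content whose digest appears in an exclusion list. The log does not state which checksum algorithm is used; the 32-character hex values resemble MD5 digests, so MD5 is assumed here, and the function names are hypothetical.

```python
import hashlib


def parse_excluded_checksums(csv_value):
    """Turn the comma-separated property value into a set of lowercase digests."""
    return {item.strip().lower() for item in csv_value.split(",") if item.strip()}


def is_excluded(content, excluded_checksums):
    """Return True if the digest of content is on the exclusion list.

    Assumption: the checksum is an MD5 hex digest of the raw content bytes.
    """
    return hashlib.md5(content).hexdigest() in excluded_checksums


# Property value taken from the log entry above.
excluded = parse_excluded_checksums(
    "4f5cc34f137de4dc89766a9366ca66de,6495a568200b1cee40baa00072b1800a"
)
```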

32043 31/10/2014 02:45 PM Marek Horst

updating job.properties

32042 31/10/2014 02:45 PM Marek Horst

introducing support for active_existence_filter, set to true by default. Setting this parameter to false allows processing contents that have no counterpart among the metadata records retrieved from HBase. This was required, e.g., to process ubiquity contents which were not present in the HBase dump metadata.
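
The effect of the flag can be pictured with a small sketch. This is not the workflow's implementation: the function `select_contents` and the plain identifier lists are hypothetical stand-ins for the content and metadata records.

```python
def select_contents(content_ids, metadata_ids, active_existence_filter=True):
    """Choose which contents go on to processing.

    With the filter active (the default), only contents with a matching
    metadata record are kept; with the filter off, every content is kept.
    """
    if not active_existence_filter:
        return list(content_ids)
    known = set(metadata_ids)
    return [cid for cid in content_ids if cid in known]


contents = ["doc-1", "doc-2", "doc-3"]
metadata = ["doc-1", "doc-3"]

filtered = select_contents(contents, metadata)
unfiltered = select_contents(contents, metadata, active_existence_filter=False)
```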

31846 28/10/2014 03:45 PM Marek Horst

fixing citations schema type

31835 28/10/2014 02:24 PM Marek Horst

updating job.properties

31759 27/10/2014 06:20 PM Marek Horst

renaming metadataextraction_excluded_ids to more appropriate metadataextraction_excluded_checksums

31758 27/10/2014 06:11 PM Marek Horst

#913 introducing support for max file size parameter, currently checked against Content-Length header
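
A size limit checked against the Content-Length header might look like the sketch below. The function name and the handling of a missing or malformed header (treated here as "not over the limit") are assumptions; the log does not say how such cases are handled.

```python
def exceeds_max_size(headers, max_size_bytes):
    """Return True if the declared Content-Length is above the limit.

    Assumption: a missing or non-numeric header is not treated as too large,
    since the declared size is the only information checked.
    """
    raw = headers.get("Content-Length")
    if raw is None:
        return False
    try:
        declared = int(raw)
    except ValueError:
        return False
    return declared > max_size_bytes


too_big = exceeds_max_size({"Content-Length": "2048"}, max_size_bytes=1024)
```

Because only the declared size is inspected, a server that omits or misreports the header would pass this check; a stricter variant would also count bytes while downloading.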