Revision Date Author Comment
39164 10/09/2015 06:19 PM Marek Horst

merging trunk changes with IIS-CDH-5.3.0 branch

39127 09/09/2015 08:53 AM Marek Horst

renaming test resources to be compliant with windows file system naming requirements: replacing '|' with '_'

39089 08/09/2015 03:09 PM Marek Horst

renaming test resources to be compliant with windows file system naming requirements

39057 05/09/2015 09:42 PM Marek Horst

fixing destination id in expected citation record

39056 05/09/2015 09:22 PM Marek Horst

updating fundingtree value to xml representation and changing expected fundingclass as outcome

39053 05/09/2015 03:34 PM Marek Horst

#1498 adding missing propagate configuration element

39050 04/09/2015 11:47 PM Marek Horst

#1498 adding missing collapsers_basic_collapser in imports.txt file

39043 04/09/2015 11:26 PM Marek Horst

#1498 introducing a major citations-related refactoring: new generic direct citation matching moved to the processing phase, a position field introduced in all citation schemas, and an updated collapser that takes position into account when merging citation details coming from 3 different sources: fuzzy citationmatching, direct citationmatching, references metadata

38872 31/08/2015 12:23 PM Marek Horst

updating job.properties

38122 08/07/2015 05:36 PM Marek Horst

updating job.properties

38032 30/06/2015 06:49 PM Marek Horst

updating job.properties

38007 29/06/2015 03:14 PM Marek Horst

#1397 removing obsolete parameters in subworkflow actions definitions

37976 26/06/2015 05:48 PM Marek Horst

#1209 introducing support for trust level thresholds provided as IIS input parameter

37972 26/06/2015 04:05 PM Marek Horst

removing obsolete quick run workflows

37961 25/06/2015 01:26 PM Marek Horst

updating job.properties

37960 25/06/2015 12:57 PM Marek Horst

updating job.properties

37956 24/06/2015 04:51 PM Marek Horst

updating job.properties

37947 24/06/2015 12:13 PM Marek Horst

#1212 updating classification test expected results after fixing typo: dccclasses->ddcclasses in taxonomies.db

37918 22/06/2015 04:00 PM Marek Horst

#1383 replacing explicitly defined test cluster properties with init-test-cluster-config maven profile usage

37910 22/06/2015 12:39 PM Marek Horst

updating properties

37892 19/06/2015 06:20 PM Marek Horst

merging trunk changes with IIS-CDH-5.3.0 branch

37883 19/06/2015 04:35 PM Marek Horst

updating job.properties

37873 19/06/2015 02:06 PM Marek Horst

#1381 porting pmc citations ingestion from the cascading framework to pig. Moving code from icm-iis-ingest-pmc to icm-iis-transformers, including integration tests, and removing obsolete scala code along with unneeded dependencies. Switching the subworkflow in the primary workflow.

37872 19/06/2015 01:48 PM Marek Horst

updating job.properties

37780 15/06/2015 12:29 PM Marek Horst

updating expected classes, setting acm classes

37777 15/06/2015 11:14 AM Marek Horst

updating expected classes

37585 29/05/2015 04:17 PM Marek Horst

#1339 fixing input_dedup_map in pmc citation ingestion when match_content_with_metadata=false. It should be set statically rather than dynamically; it will be enabled only when metadata_import is enabled

37561 29/05/2015 02:27 PM Marek Horst

#1339 replacing the active_existence_filter flag with match_content_with_metadata and changing the identifiers matching logic: when the flag is disabled, contents identifiers will be neither filtered nor deduplicated against metadata identifiers. Up until now, when the active_existence_filter flag was disabled, contents were still deduplicated, which is not desired when running IIS in standalone mode on contents having their representatives in HBase

37533 28/05/2015 04:16 PM Marek Horst

#1329 enabling pmc ingestion when active_metadataextraction_export flag is enabled

37470 26/05/2015 10:30 AM Marek Horst

introducing missing pdb reference extraction parameters

37469 26/05/2015 10:25 AM Marek Horst

bugfix: renaming obsolete decision-export to decision-export-to-hbase

37464 25/05/2015 10:13 PM Marek Horst

#1308 reverting uri:oozie:distcp-action:0.2 change: version is not properly recognized by oozie 3.3.2-cdh4.3.1

37432 25/05/2015 01:15 PM Marek Horst

#1260 enabling document to protein databank reference extraction in primary workflow, supporting 3 new parameters: active_referenceextraction_pdb, export_action_set_id_document_pdb, export_referenceextraction_pdb_url_root

37414 22/05/2015 05:32 PM Marek Horst

#1315 providing missing confidenceLevel

37408 22/05/2015 02:41 PM Marek Horst

#1315 providing missing confidenceLevel

37392 22/05/2015 01:13 PM Marek Horst

#1315 updating expected jsons in integration test after DocumentToConceptIds schema refactoring

37231 14/05/2015 11:56 AM Marek Horst

#1301 introducing explicit export mode flags: active_export_to_hbase and active_export_to_json. This way both exports can be enabled or both of them can be disabled.

37194 13/05/2015 12:49 PM Marek Horst

#1308 switching distcp namespace to uri:oozie:distcp-action:0.2

37180 12/05/2015 06:53 PM Marek Horst

reverting the rev 37153 change by removing the oozie-sharelib-distcp dependency from the pom.xml file and relying on oozie.use.system.libpath=true set in job.properties

37179 12/05/2015 06:47 PM Marek Horst

updating job.properties

37174 12/05/2015 03:37 PM Marek Horst

removing log.txt

37159 12/05/2015 01:05 PM Marek Horst

disabling provided scope for hbase-client dependency

37153 11/05/2015 08:44 PM Marek Horst

adding oozie-sharelib-distcp dependency missing in cdh5

37134 11/05/2015 05:28 PM Marek Horst

merging trunk changes with IIS-CDH-5.3.0 branch

37095 11/05/2015 11:46 AM Marek Horst

disabling export by setting active_export flag to false. Results will be converted to JSON records

37026 07/05/2015 01:27 PM Marek Horst

#1301 introducing common/export_to_json and utilizing this subworkflow in both primary and preprocessing workflows executing it when active_export=false which means hbase export is disabled

36989 06/05/2015 04:43 PM Marek Horst

#118 explicitly defining the input_document_websiteusage_similarity parameter. This is not a bug fix, because the exporter works properly without an explicitly defined input port due to propagate-configuration mode, but all input port definitions should be aligned to avoid confusion.

36473 20/04/2015 12:53 PM Marek Horst

bugfixing existing fault removal which was missing

36469 20/04/2015 10:53 AM Marek Horst

bugfixing existing fault removal which was missing

36450 17/04/2015 04:38 PM Marek Horst

upgrading xmlns version to 0.4 in order to support global element

36443 17/04/2015 02:56 PM Marek Horst

setting remove_sideproducts to false, otherwise the whole workingDir will be erased

36435 17/04/2015 11:44 AM Marek Horst

changing home dir to /mnt/tmp

36404 16/04/2015 12:19 PM Marek Horst

updating deploy.info with new IIS test cluster parameters

36338 13/04/2015 01:33 PM Marek Horst

#1257 raising oozie.action.max.output.data to 8192

36286 09/04/2015 07:10 PM Marek Horst

#1257 dropping schema generation related hacks in all map-reduce modules, switching to literal schema parameters

35985 03/04/2015 01:11 PM Marek Horst

updating job.properties, disabling all algorithms by default

35967 02/04/2015 10:17 PM Marek Horst

#1248 fixing transition node name postprocessing-joining to merge-joining

35946 02/04/2015 05:52 PM Marek Horst

#1248 introducing fault subdirectory support in all workflows wrapping the metadataextraction subworkflow, up to the processing and primary root workflows. This should prevent the fault directory from being removed when the ${remove_sideproducts} flag is enabled; it will be propagated along with metadata and plaintext.

35935 02/04/2015 03:59 PM Marek Horst

updating job.properties

35701 27/03/2015 06:18 AM Mateusz Kobos

Removing usage of working_dir from Java workflow node.

35437 18/03/2015 11:40 AM Marek Horst
35436 18/03/2015 11:39 AM Marek Horst

updating README file

35427 17/03/2015 07:08 PM Marek Horst

#1187 moving changelog contents to redmine wiki

35404 17/03/2015 03:03 PM Marek Horst

#1198 aligning IIS dependencies and java code to CDH5.3.0 cluster

35391 17/03/2015 03:00 PM Marek Horst

#1197 introducing job.properties changes aligning paths to rumcajs cluster HDFS structure

35244 11/03/2015 04:43 PM Marek Horst

creating IIS-CDH-5.3.0 branch

35243 11/03/2015 04:43 PM Marek Horst
35232 11/03/2015 02:19 PM Marek Horst

reenabling document to project reference import validation

35231 11/03/2015 01:55 PM Marek Horst

updating expected documents list

35230 11/03/2015 01:30 PM Marek Horst

temporarily skipping docproject validation

35229 11/03/2015 01:14 PM Marek Horst

#1195 removing obsolete ports docreation and datasetid from hbase mapred import, removing references to those ports in workflow.xml files, updating transformer by removing filtering by datasetid due to decisions made in #1072

35200 09/03/2015 07:07 PM Marek Horst

fixing json escape character by putting \\ in place of \

35199 09/03/2015 06:45 PM Marek Horst

extending the mapreduce metadata importer test by validating the import of different kinds of relations and the dataset identifier

35198 09/03/2015 06:44 PM Marek Horst

extending the mapreduce metadata importer test by validating the import of different kinds of relations and the dataset identifier

35191 09/03/2015 04:52 PM Marek Horst

removing obsolete citations

35189 09/03/2015 04:24 PM Marek Horst

updating confidence level value to 1.0 for record coming from PMC

35187 09/03/2015 04:21 PM Marek Horst

removing obsolete pdf directory

35183 09/03/2015 03:16 PM Marek Horst

adding missing "confidenceLevel" field

35178 09/03/2015 02:39 PM Marek Horst

#1187 introducing IIS changelog

35153 06/03/2015 06:25 PM Marek Horst

maintaining pmc citation and testing citations merging process

35152 06/03/2015 05:36 PM Marek Horst

reintroducing multiple citations after introducing sorting in transformer

35149 06/03/2015 05:25 PM Marek Horst

limiting citations count to 1 until the order of results produced by the citation matching module is repeatable

35146 06/03/2015 04:13 PM Marek Horst

including: FET project reference extraction, EGI case, dataset reference extraction outcome validation

35144 06/03/2015 03:06 PM Marek Horst

enabling citation matching algorithm

35143 06/03/2015 03:06 PM Marek Horst

updating expected citations

35122 05/03/2015 06:47 PM Marek Horst

removing comment

35120 05/03/2015 06:46 PM Marek Horst

primary processing integration test major refactoring: dropping cermine execution and providing plaintext and extracted metadata as json records

35057 04/03/2015 05:30 PM Marek Horst

#1176 defining remove_sideproducts property in workflows headers

35048 04/03/2015 04:44 PM Marek Horst

#1176 introducing side products removal in common import by maintaining remove_sideproducts flag set to true by default.
Notice: do not provide any output directory location pointing to workingDir subdirectory!

35042 04/03/2015 03:01 PM Marek Horst

removing duplicate collapser import and aligning workflow definition

35031 04/03/2015 01:08 PM Marek Horst

#1172 introducing support for active_export parameter in both preprocessing and primary workflows

35030 04/03/2015 12:16 PM Marek Horst

updating job.properties

34958 02/03/2015 05:21 PM Marek Horst

#1153 utilizing ${user.name} placeholder in ${workingDir} generation process, copying version.properties from oozie_app to mark execution environment with application version

34914 27/02/2015 07:34 PM Marek Horst

#1147 introducing HTML import and HTML plaintext ingestion in main workflows: primary and preprocessing

34896 27/02/2015 05:36 PM Marek Horst

#1147 renaming icm-iis-ingest-webcrawl module to icm-iis-ingest to make it more generic so it could contain not only webcrawl related ingesters but html ingesters as well

34893 27/02/2015 05:32 PM Marek Horst

updating job.properties

34876 27/02/2015 04:08 PM Marek Horst

overriding memory parameter due to test cluster memory limitations, setting it to Xmx256m

34875 27/02/2015 03:24 PM Marek Horst

overriding memory parameter due to test cluster memory limitations, setting it to Xmx128m

34871 27/02/2015 02:49 PM Marek Horst

overriding memory parameter due to test cluster memory limitations, setting it to Xmx512m

34869 27/02/2015 02:48 PM Marek Horst

updating expected classes in integration test after recent #720 change and fixing confidence level distribution