merging trunk changes with IIS-CDH-5.3.0 branch
#1498 introducing major citation-related refactoring: new generic direct citation matching moved to the processing phase, position field introduced in all citation schemas, collapser updated to take position into account when merging citation details coming from 3 sources: fuzzy citationmatching, direct citationmatching, references metadata
Removing old code related to Protocol Buffers
A long time ago, we used Protocol Buffers encapsulated in sequence files as the format for data stores. This cleanup removes the code related to this functionality.
#1302 merging 20150518_new_funding_model branch back to the trunk
merging trunk changes with 20150518_new_funding_model branch
#1395 WorkflowRuntimeParameters static fields cleanup, moving parameters to dedicated modules to prevent excessive icm-iis-common module modifications
merging 20150518_new_funding_model branch changes with IIS-CDH-5.3.0 branch in order to support new funding model in IIS-CDH-5.3.0 branch
concatenating identifiers with stringutils
introducing objectstores provider for manual object store identifiers retrieval
#1304 updating project importer unit test
#1304 updating both project importer modules, reading from rdb and hbase, by supporting new XML funding tree representation.
#1302 introducing 20150518_new_funding_model branch
#1302 concatenating funder with top level funding when building Project#fundingClass
#1302 introducing support for updated project model containing funding tree defined as XML instead of JSON, not enabled yet.
changing hbase-server dependency scope from provided to compile, apparently hbase-server is not available on CDH5 cluster
#1257 dropping schema generation related hacks in all map-reduce modules, switching to literal schema parameters
#1135 switching icm-iis-parent-container version to 1.0.1-SNAPSHOT in order to include workingDir related changes made in icm-iis-core
Removing usage of working_dir from Java workflow node.
#1208 upgrading dnet-openaireplus-mapping-utils dependency range to [3.0.0,4.0.0)
#1198 aligning IIS dependencies and java code to CDH5.3.0 cluster
#1197 introducing job.properties changes aligning paths to rumcajs cluster HDFS structure
creating IIS-CDH-5.3.0 branch
introducing branches folder
#1195 removing obsolete ports docreation and datasetid from hbase mapred import, removing references to those ports in workflow.xml files, updating transformer by removing filtering by datasetid due to decisions made in #1072
#1133 dropping useless working_dir creation for java nodes
#1038 introducing ranges in dependencies definition for all IIS modules
#1038 reintroducing ranges in dependencies definition for all non-iis dnet modules
updating job.properties
#1109 fixing building excluded acronym values
updating default job properties
#1109 utilizing isAcronymValid() method in relational db importer. skipping project grant id whenever code is empty
#1109 making isAcronymValid() method public so it could be utilized by relational db importer as well
#1109 introducing support for multiple acronym values to be skipped, currently set to 'unknown' and 'undefined' values.
#1070 updating import_project_concepts_context_ids_csv default value to "fet-fp7,fet-h2020"
#1070 introducing support for multiple context identifiers, replacing import_project_concepts_context_id IIS input parameter with import_project_concepts_context_ids_csv
#1065 updating job properties
[maven-release-plugin] prepare for next development iteration
[maven-release-plugin] copy for tag icm-iis-import-1.0.0
[maven-release-plugin] prepare release icm-iis-import-1.0.0
#1044 pre-release switching to released version of parent pom and released dependencies
introducing scm definition
#1038 upgrading dnet dependencies to latest released versions listed by Claudio in #1038#note-3
#919 removing context_id default value from workflow.xml definition
#968 aligning IIS importer with ObjectStore#deliverObjects() API method changes
#919 supporting multiple profiles in concept importer, logging error instead of throwing exception when profile not found
removing obsolete package
#919 introducing Concept schema and importer module producing avro datastore based on XML profile
#963 propagating dataset -> mdstore from import to exporting phase: importer produces DocumentToMDStore datastore utilized by exporter module. Updating transformer definition to handle DocumentToMDStore instead of Identifier schema
updating mdstore test
introducing embedded integration test entry
#118 introducing piwik logs importer module
#913 updating dnet-objectstore-rmi dependency from 2.0.1-SNAPSHOT to 2.0.0
#913 renaming DocumentContentUrl#contentSize to DocumentContentUrl#contentSizeKB changing field type from int to long, importing content size from ObjectStoreFile#fileSizeKB, updating dnet-objectstore-rmi dependency from 1.0.0 to 2.0.1-SNAPSHOT
#913 temporarily setting contentSize to -1 in ObjectStore DocumentContentURL importer module until ObjectStore exposes proper size value
removing redundant logging
logging updated: 'imported' renamed to 'processed', since not all processed records were imported
logging added: presenting total number of imported records
logging added when id type or id value is null and record is not written
#883 introducing support for blacklisting object store identifiers
bugfixing citations converter by prefixing identifier with 50| prefix which was removed when exporting destination document id in BLOB exporter
introducing support for handling update column qualifiers holding inferenced data, disabled by default
fixing NullPointerException in citations exporter
introducing regex support in result approver to support iis::* kind of provenance, updating workflow definitions with proper regex values
#840 moving IdentifierMapping from importer to common package
#840 renaming DeduplicationMapping to more generic IdentifierMapping
imports cleanup
#637 treemap->hashmap, order is not preserved anyway
#637 introducing ISLookup based vocabulary importer
introducing cloudera repository in parent container, removing repository definitions from individual IIS modules
removing obsolete comment
#433 introducing natural citations ordering
updating test
updating default job.properties
#799 aligning dataset importer test with recent changes
#799 updating header name from header to oai:header. Introducing additional check verifying empty id.
introducing datadump provider for obtaining contents
#780 fixing dependency issues after recent CNR modules release by sticking to released versions of CNR modules in icm-iis-parent-container, icm-iis-import and icm-iis-export-actionmanager modules
created tag folder for release
moving ACM importer to icm-iis-mainworkflows due to extending dependencies with cermine, introducing performance tests
removing 'import' directory creation and removal which was obsolete
checking whether trust level is empty before comparing to predefined threshold
introducing trust level threshold support when importing information space data
extending description
introducing shared citation ExtraData XML model in icm-iis-common, implementing citation importer in mapred_import workflow, implementing exporter module
supporting $UNDEFINED$ value in IMPORT_INFERENCE_PROVENANCE_BLACKLIST
fixing dirs creation: removing obsolete directories
#527 introducing ACM XML dump importer module importing bibliographic references for further citation-matching analysis
extending progress log interval from 10 000 to 100 000
fixing importing abstract after introducing fieldApprover for all Result fields
introducing fieldApprover for all Result fields