merging trunk changes with IIS-CDH-5.3.0 branch
#1306 introducing dummy field in DocumentId schema required to overcome https://issues.apache.org/jira/browse/PIG-3358 issue. Handling dummy filed in transformer pig scripts when it is required. Should be reverted as soon as PIG-3358 issue is fixed
#1312 wrapping tuple schema returned by outputSchema() method as described in PIG-3082
removing oozie-sharelib-distcp dependency from pom.xml file and relying on oozie.use.system.libpath=true set among job.properties
replacing icm-iis-3rdparty-pig-avrostorage dependency with original piggybank
#1198 aligning IIS dependencies and java code to CDH5.3.0 cluster
#1197 introducing job.properties changes aligning paths to rumcajs cluster HDFS structure
creating IIS-CDH-5.3.0 branch
#1195 removing obsolete ports docreation and datasetid from hbase mapred import, removing references to those ports in workflow.xml files, updating transformer by removing filtering by datasetid due to decisions made in #1072
introducing repetetive ordering of citations by ordering them by citation rawText
#1169 fixing duplicate context issue, introducing integration test proving implemented solution works properly
simplifying schema related PIG parameters
#1147 introducing union4 pig script
#1133 dropping useless workfing_dir creation for java nodes
#1133 dropping useless workfing_dir creation for pig nodes
#1038 introducing ranges in dependencies definition for all IIS modules
#118 introducing website usage community filter filtering out publication identifiers based on ids set retrieved from InformationSpace. This is required to exclude removed publications which were still present in logs.
#118 removing obsolete and duplicate transformer
updating job.properties
[maven-release-plugin] prepare for next development iteration
[maven-release-plugin] prepare release icm-iis-transformers-1.0.0
#1044 pre-release switching to released version of parent pom and released dependencies
introducing scm definition
#919 renaming DocumentToResearchInitiative to DocumentToConceptId and DocumentToResearchInitiatives to DocumentToConceptIds
#1019 introducing integration test
#919 introducing integration test input and output
#919 introducing integration test containing empty input and output
#919 introducing project to concept transformer module
#1019 introducing PIG module transforming pmc ingested metadata into common extracted document metadata
#963 propagating dataset -> mdstore from import to exporting phase: importer produces DocumentToMDStore datasetore utilized by exporter module. Updating transformer definition to handle DocumentToMDStore instead of Identifier schema
introducing embedded integration test entry
#913 renaming DocumentContentUrl#contentSize to DocumentContentUrl#contentSizeKB changing field type from int to long, importing content size from ObjectStoreFile#fileSizeKB, updating dnet-objectstore-rmi dependency from 1.0.0 to 2.0.1-SNAPSHOT
#913 supplementing json files with newly introduced DocumentContentUrl#contentSize field value set to null
#913 introducing DocumentContentUrl#contentSize field, handling it properly in all PIG transformers
#840 moving IdentifierMapping from importer to common package
#840 renaming DeduplicationMapping to more generic IdentifierMapping
introducing cloudera repository in parent container, removing repository definitions from individual IIS modules
adding missing affiliation fields: countryCode, address, renaming country to countryName
#757 introducing doitooaid transformer processing DocumentMetadata datastore holding metadata imported from InformationSpace and creating datastore holding <doi,oaid> pairs which will be used by pmc ingestor for matching references identified by doi
null reference ids removed
updating default job.properties
removing memory related properties, fixing #757 should solve all memory related problems
#568 introducing citations grouping by sourceDocumentId, still to be adjusted for ingested pmc citations outcome which currently seems to hang up
#577 introducing UDF producing empty map, two transformers building common Citation datastore from citationmatching and pmc ingestion outcome. Both are required by collapser.
introducing importer/plaintext/skip_extracted transformer required for plaintext import caching
#354 removing obsolete transformers/export/person transformer along with tests
#354 removing obsolete transformers/export/inferenced_document_without_imported_data transformer along with tests
#354 removing obsolete transformers/export/identifier/referenceddatasets transformer along with tests
#354 removing obsolete transformers/export/identifier/documents transformer along with tests
#354 removing obsolete transformers/export/document transformer along with tests
replacing redundant transformers/ingest/pmc/citations with already existing transformers/importer/documentmetadata/idextractor
adding missing "confidenceLevel" field
introducing deploy.info file for module icm-iis-transformers