#1257 dropping schema generation related hacks in all PIG modules, switching to literal schema parameters
Removing usage of working_dir from Java workflow node.
#1210 introducing generic PIG module filtering inferred data by confidence level
#1195 removing obsolete ports docreation and datasetid from hbase mapred import, removing references to those ports in workflow.xml files, updating transformer by removing filtering by datasetid due to decisions made in #1072
introducing repeatable ordering of citations by sorting them by citation rawText
#1169 fixing duplicate context issue, introducing integration test proving implemented solution works properly
simplifying schema-related PIG parameters
#1147 introducing union4 pig script
#1133 dropping useless working_dir creation for java nodes
#1133 dropping useless working_dir creation for pig nodes
#118 introducing website usage community filter which filters out publication identifiers based on the set of ids retrieved from InformationSpace. This is required to exclude removed publications which were still present in logs.
#118 removing obsolete and duplicate transformer
updating job.properties
#919 renaming DocumentToResearchInitiative to DocumentToConceptId and DocumentToResearchInitiatives to DocumentToConceptIds
#1019 introducing integration test
#919 introducing integration test input and output
#919 introducing integration test containing empty input and output
#919 introducing project to concept transformer module
#1019 introducing PIG module transforming pmc ingested metadata into common extracted document metadata
#963 propagating dataset -> mdstore from import to exporting phase: importer produces DocumentToMDStore datastore utilized by exporter module. Updating transformer definition to handle DocumentToMDStore instead of Identifier schema
#913 renaming DocumentContentUrl#contentSize to DocumentContentUrl#contentSizeKB, changing field type from int to long, importing content size from ObjectStoreFile#fileSizeKB, updating dnet-objectstore-rmi dependency from 1.0.0 to 2.0.1-SNAPSHOT
#913 supplementing json files with newly introduced DocumentContentUrl#contentSize field value set to null
#913 introducing DocumentContentUrl#contentSize field, handling it properly in all PIG transformers
#840 moving IdentifierMapping from importer to common package
#840 renaming DeduplicationMapping to more generic IdentifierMapping
adding missing affiliation fields: countryCode, address; renaming country to countryName
#757 introducing doitooaid transformer processing DocumentMetadata datastore holding metadata imported from InformationSpace and creating datastore holding <doi,oaid> pairs which will be used by pmc ingestor for matching references identified by doi
removing null reference ids
updating default job.properties
removing memory-related properties; the fix for #757 should solve all memory-related problems
#568 introducing citations grouping by sourceDocumentId, still to be adjusted for the ingested pmc citations outcome, which currently seems to hang
#577 introducing UDF producing an empty map and two transformers building common Citation datastore from citationmatching and pmc ingestion outcome. Both transformers are required by collapser.
introducing importer/plaintext/skip_extracted transformer required for plaintext import caching
#354 removing obsolete transformers/export/person transformer along with tests
#354 removing obsolete transformers/export/inferenced_document_without_imported_data transformer along with tests
#354 removing obsolete transformers/export/identifier/referenceddatasets transformer along with tests
#354 removing obsolete transformers/export/identifier/documents transformer along with tests
#354 removing obsolete transformers/export/document transformer along with tests
replacing redundant transformers/ingest/pmc/citations with already existing transformers/importer/documentmetadata/idextractor
adding missing "confidenceLevel" field