#1315 propagating confidenceLevel to DocumentToConceptIds. Updating the PIG transformer script with a concept identifier deduplication UDF that picks the record with the highest confidence level; adding unit and integration tests. Propagating the changes to the document-to-concepts exporter module.
#1329 adding affiliations field to the ExtractedDocumentMetadata PMC schema. Refactoring metadata extraction by moving the code that builds Affiliation avro records into the AffiliationBuilder class and sharing it with pmc ingestion. Implementing affiliations ingestion in PmcXmlHandler, covered with unit tests. Adding affiliations field support to the ingest pmc metadata transformer.
#1301 skipping transformation when input is set to the $UNDEFINED$ value
#1301 removing redundant schema parameter
#1301 introducing generic avro to json transformer
bugfix: adding missing start element
#1257 dropping schema generation related hacks in all PIG modules, switching to literal schema parameters
#1210 introducing generic PIG module filtering inferred data by confidence level
#1195 removing obsolete ports docreation and datasetid from hbase mapred import, removing references to those ports in workflow.xml files, updating transformer by removing filtering by datasetid due to decisions made in #1072
introducing repeatable ordering of citations by sorting them by citation rawText