#1498 introducing major citations-related refactoring, including new generic direct citation matching moved to processing phase; introduced position field in all citations schemas and updated collapser to take position into account when merging citation details coming from 3 various sources: fuzzy citationmatching, direct citationmatching, references metadata
#1212 updating taxonomies database, introducing ACM taxonomy classification, introducing ACM classes support in exporter module, updating integration tests
#1315 propagating confidenceLevel to DocumentToConceptIds. Updating PIG transformer script by introducing concept identifiers deduplication UDF picking the record with the highest confidence level, introducing unit and integration tests. Propagating changes in document to concepts exporter module.
#1329 adding affiliations field in ExtractedDocumentMetadata PMC schema. Metadata extraction code refactoring by extracting code responsible for building Affiliation avro records to AffiliationBuilder class and sharing it with pmc ingestion. Implementing affiliations ingestion functionality in PmcXmlHandler covered with unit tests. Adding affiliations field support in ingest pmc metadata transformer.
#1247 renaming inputEntityId to inputObjectId because not all objects are entities (e.g. metadataextraction input)
#1247 renaming id field to more descriptive inputEntityId
#1247 introducing third draft of Fault avro schema: adding missing stacktrace
#1247 introducing second draft of Fault avro schema: refactoring recursive causes to array of causes
#1247 introducing first draft of Fault avro schema
#118 introducing madis based communities generation for website usage analysis