# Date Author Comment
39054 05/09/2015 08:49 PM Marek Horst

#1498 adding missing position field

39049 04/09/2015 11:36 PM Marek Horst

#1498 introducing a major citation-related refactoring: new generic direct citation matching moved to the processing phase, a position field introduced in all citation schemas, and the collapser updated to take position into account when merging citation details coming from 3 various sources: fuzzy citationmatching, direct citationmatching, references metadata

38126 08/07/2015 06:17 PM Marek Horst

#1422 fixing Java Heap Space error while executing checksum postprocessing workflow on pmc plaintexts

37874 19/06/2015 02:07 PM Marek Horst

#1381 porting pmc citations ingestion from cascading framework to pig. Moving code from icm-iis-ingest-pmc to icm-iis-transformers including integration tests, removing obsolete scala code along with unneeded dependencies. Switching subworkflow in primary workflow.

37368 21/05/2015 06:26 PM Marek Horst

#1315 propagating confidenceLevel to DocumentToConceptIds. Updating PIG transformer script by introducing a concept identifier deduplication UDF that picks the record with the highest confidence level, introducing unit and integration tests. Propagating changes in document to concepts exporter module.

37347 20/05/2015 06:49 PM Marek Horst

#1329 adding affiliations field in ExtractedDocumentMetadata PMC schema. Refactoring metadata extraction code by extracting the code responsible for building Affiliation avro records into an AffiliationBuilder class and sharing it with pmc ingestion. Implementing affiliations ingestion functionality in PmcXmlHandler, covered with unit tests. Adding affiliations field support in ingest pmc metadata transformer.

36984 06/05/2015 04:09 PM Marek Horst

#1301 skipping transformation when input set to $UNDEFINED$ value

36980 06/05/2015 03:55 PM Marek Horst

#1301 removing redundant schema parameter

36970 06/05/2015 03:08 PM Marek Horst

#1301 introducing generic avro to json transformer

36455 17/04/2015 05:52 PM Marek Horst

bugfix: adding missing start element

36306 10/04/2015 01:03 PM Marek Horst

#1257 dropping schema generation related hacks in all PIG modules, switching to literal schema parameters

35517 19/03/2015 05:59 PM Marek Horst

#1210 introducing generic PIG module filtering inferred data by confidence level

35228 11/03/2015 01:14 PM Marek Horst

#1195 removing obsolete ports docreation and datasetid from hbase mapred import, removing references to those ports in workflow.xml files, updating transformer by removing filtering by datasetid due to decisions made in #1072

35151 06/03/2015 05:34 PM Marek Horst

introducing repeatable ordering of citations by ordering them by citation rawText

34993 03/03/2015 02:36 PM Marek Horst

#1169 fixing duplicate context issue, introducing integration test proving implemented solution works properly

34910 27/02/2015 06:50 PM Marek Horst

simplifying schema related PIG parameters

34909 27/02/2015 06:49 PM Marek Horst

simplifying schema related PIG parameters

34908 27/02/2015 06:48 PM Marek Horst

#1147 introducing union4 pig script

34687 20/02/2015 06:04 PM Marek Horst

#1133 dropping useless working_dir creation for pig nodes

34506 13/02/2015 02:12 PM Marek Horst

#118 introducing a website usage community filter that filters out publication identifiers based on the set of ids retrieved from InformationSpace. This is required to exclude removed publications which were still present in the logs.

34504 13/02/2015 01:07 PM Marek Horst

#118 removing obsolete and duplicate transformer

33665 18/12/2014 10:19 AM Marek Horst

updating job.properties

33245 09/12/2014 06:41 PM Marek Horst

#919 renaming DocumentToResearchInitiative to DocumentToConceptId and DocumentToResearchInitiatives to DocumentToConceptIds

33119 01/12/2014 01:33 PM Marek Horst

#919 introducing project to concept transformer module

32993 26/11/2014 03:57 PM Marek Horst

#1019 introducing PIG module transforming pmc ingested metadata into common extracted document metadata

32827 17/11/2014 03:45 PM Marek Horst

#963 propagating dataset -> mdstore from import to exporting phase: importer produces DocumentToMDStore datastore utilized by exporter module. Updating transformer definition to handle DocumentToMDStore instead of Identifier schema

31843 28/10/2014 03:31 PM Marek Horst

#913 renaming DocumentContentUrl#contentSize to DocumentContentUrl#contentSizeKB, changing field type from int to long, importing content size from ObjectStoreFile#fileSizeKB, updating dnet-objectstore-rmi dependency from 1.0.0 to 2.0.1-SNAPSHOT

31779 28/10/2014 11:29 AM Marek Horst

#913 introducing DocumentContentUrl#contentSize field, handling it properly in all PIG transformers

31226 08/10/2014 06:19 PM Marek Horst

#840 moving IdentifierMapping from importer to common package

31220 08/10/2014 06:12 PM Marek Horst

#840 renaming DeduplicationMapping to more generic IdentifierMapping

30188 16/09/2014 10:22 AM Marek Horst

#757 introducing doitooaid transformer, which processes the DocumentMetadata datastore holding metadata imported from InformationSpace and creates a datastore holding <doi,oaid> pairs to be used by the pmc ingestor for matching references identified by doi

30181 15/09/2014 05:31 PM Dominika Tkaczyk

null reference ids removed

30121 11/09/2014 12:44 PM Marek Horst

updating default job.properties

29936 02/09/2014 02:49 PM Marek Horst

removing memory-related properties; fixing #757 should solve all memory-related problems

29914 29/08/2014 06:29 PM Marek Horst

#568 introducing citations grouping by sourceDocumentId, still to be adjusted for the ingested pmc citations outcome, which currently seems to hang

29906 29/08/2014 11:53 AM Marek Horst

#577 introducing UDF producing empty map, two transformers building common Citation datastore from citationmatching and pmc ingestion outcome. Both are required by collapser.

29482 23/07/2014 05:36 PM Marek Horst

introducing importer/plaintext/skip_extracted transformer required for plaintext import caching

29087 14/07/2014 02:08 PM Marek Horst

#354 removing obsolete transformers/export/person transformer along with tests

29084 14/07/2014 01:49 PM Marek Horst

#354 removing obsolete transformers/export/inferenced_document_without_imported_data transformer along with tests

29083 14/07/2014 01:21 PM Marek Horst

#354 removing obsolete transformers/export/identifier/referenceddatasets transformer along with tests

29080 14/07/2014 12:47 PM Marek Horst

#354 removing obsolete transformers/export/identifier/documents transformer along with tests

29079 14/07/2014 12:43 PM Marek Horst

#354 removing obsolete transformers/export/document transformer along with tests

28991 10/07/2014 04:23 PM Marek Horst

replacing redundant transformers/ingest/pmc/citations with already existing transformers/importer/documentmetadata/idextractor

28954 08/07/2014 05:14 PM Marek Horst

updating default job.properties

28953 08/07/2014 05:14 PM Marek Horst

updating default job.properties

28850 02/07/2014 07:08 PM Marek Horst

updating default job.properties