# Date Author Comment
36306 10/04/2015 01:03 PM Marek Horst

#1257 dropping schema generation related hacks in all PIG modules, switching to literal schema parameters

35701 27/03/2015 06:18 AM Mateusz Kobos

Removing usage of working_dir from Java workflow node.

35517 19/03/2015 05:59 PM Marek Horst

#1210 introducing generic PIG module filtering inferred data by confidence level

35228 11/03/2015 01:14 PM Marek Horst

#1195 removing obsolete ports docreation and datasetid from hbase mapred import, removing references to those ports in workflow.xml files, updating transformer by removing filtering by datasetid due to decisions made in #1072

35151 06/03/2015 05:34 PM Marek Horst

introducing repeatable ordering of citations by sorting them by citation rawText

34993 03/03/2015 02:36 PM Marek Horst

#1169 fixing duplicate context issue, introducing integration test proving implemented solution works properly

34910 27/02/2015 06:50 PM Marek Horst

simplifying schema related PIG parameters

34909 27/02/2015 06:49 PM Marek Horst

simplifying schema related PIG parameters

34908 27/02/2015 06:48 PM Marek Horst

#1147 introducing union4 pig script

34695 20/02/2015 07:17 PM Marek Horst

#1133 dropping useless working_dir creation for java nodes

34687 20/02/2015 06:04 PM Marek Horst

#1133 dropping useless working_dir creation for pig nodes

34506 13/02/2015 02:12 PM Marek Horst

#118 introducing website usage community filter, which filters out publication identifiers based on the set of ids retrieved from InformationSpace. This is required to exclude removed publications which were still present in logs.

34504 13/02/2015 01:07 PM Marek Horst

#118 removing obsolete and duplicate transformer

33665 18/12/2014 10:19 AM Marek Horst

updating job.properties

33245 09/12/2014 06:41 PM Marek Horst

#919 renaming DocumentToResearchInitiative to DocumentToConceptId and DocumentToResearchInitiatives to DocumentToConceptIds

33237 09/12/2014 02:13 PM Marek Horst

#1019 introducing integration test

33179 04/12/2014 01:29 PM Marek Horst

#919 introducing integration test input and output

33177 04/12/2014 12:08 PM Marek Horst

#919 introducing integration test containing empty input and output

33119 01/12/2014 01:33 PM Marek Horst

#919 introducing project to concept transformer module

32993 26/11/2014 03:57 PM Marek Horst

#1019 introducing PIG module transforming pmc ingested metadata into common extracted document metadata

32827 17/11/2014 03:45 PM Marek Horst

#963 propagating dataset -> mdstore from import to exporting phase: importer produces DocumentToMDStore datastore utilized by exporter module. Updating transformer definition to handle DocumentToMDStore instead of Identifier schema

31843 28/10/2014 03:31 PM Marek Horst

#913 renaming DocumentContentUrl#contentSize to DocumentContentUrl#contentSizeKB, changing field type from int to long, importing content size from ObjectStoreFile#fileSizeKB, updating dnet-objectstore-rmi dependency from 1.0.0 to 2.0.1-SNAPSHOT

31783 28/10/2014 11:50 AM Marek Horst

#913 supplementing json files with newly introduced DocumentContentUrl#contentSize field value set to null

31779 28/10/2014 11:29 AM Marek Horst

#913 introducing DocumentContentUrl#contentSize field, handling it properly in all PIG transformers

31226 08/10/2014 06:19 PM Marek Horst

#840 moving IdentifierMapping from importer to common package

31220 08/10/2014 06:12 PM Marek Horst

#840 renaming DeduplicationMapping to more generic IdentifierMapping

30897 26/09/2014 02:49 PM Marek Horst

adding missing affiliation fields: countryCode, address, renaming country to countryName

30896 26/09/2014 02:47 PM Marek Horst

adding missing affiliation fields: countryCode, address, renaming country to countryName

30188 16/09/2014 10:22 AM Marek Horst

#757 introducing doitooaid transformer processing DocumentMetadata datastore holding metadata imported from InformationSpace and creating datastore holding <doi,oaid> pairs which will be used by pmc ingestor for matching references identified by doi

30181 15/09/2014 05:31 PM Dominika Tkaczyk

null reference ids removed

30121 11/09/2014 12:44 PM Marek Horst

updating default job.properties

29936 02/09/2014 02:49 PM Marek Horst

removing memory-related properties; fixing #757 should solve all memory-related problems

29914 29/08/2014 06:29 PM Marek Horst

#568 introducing citations grouping by sourceDocumentId, still to be adjusted for ingested pmc citations outcome, which currently seems to hang

29906 29/08/2014 11:53 AM Marek Horst

#577 introducing UDF producing empty map, two transformers building common Citation datastore from citationmatching and pmc ingestion outcome. Both are required by collapser.

29482 23/07/2014 05:36 PM Marek Horst

introducing importer/plaintext/skip_extracted transformer required for plaintext import caching

29087 14/07/2014 02:08 PM Marek Horst

#354 removing obsolete transformers/export/person transformer along with tests

29084 14/07/2014 01:49 PM Marek Horst

#354 removing obsolete transformers/export/inferenced_document_without_imported_data transformer along with tests

29083 14/07/2014 01:21 PM Marek Horst

#354 removing obsolete transformers/export/identifier/referenceddatasets transformer along with tests

29080 14/07/2014 12:47 PM Marek Horst

#354 removing obsolete transformers/export/identifier/documents transformer along with tests

29079 14/07/2014 12:43 PM Marek Horst

#354 removing obsolete transformers/export/document transformer along with tests

28991 10/07/2014 04:23 PM Marek Horst

replacing redundant transformers/ingest/pmc/citations with already existing transformers/importer/documentmetadata/idextractor

28967 09/07/2014 01:12 PM Marek Horst

replacing redundant transformers/ingest/pmc/citations with already existing transformers/importer/documentmetadata/idextractor

28966 09/07/2014 01:02 PM Marek Horst

replacing redundant transformers/ingest/pmc/citations with already existing transformers/importer/documentmetadata/idextractor

28954 08/07/2014 05:14 PM Marek Horst

updating default job.properties

28953 08/07/2014 05:14 PM Marek Horst

updating default job.properties

28850 02/07/2014 07:08 PM Marek Horst

updating default job.properties

28800 02/07/2014 11:43 AM Marek Horst

adding missing "confidenceLevel" field

28799 02/07/2014 11:43 AM Marek Horst

adding missing "confidenceLevel" field

28798 02/07/2014 11:42 AM Marek Horst

adding missing "confidenceLevel" field

28796 02/07/2014 11:40 AM Marek Horst

adding missing "confidenceLevel" field

28795 02/07/2014 11:40 AM Marek Horst

adding missing "confidenceLevel" field