# Date Author Comment
38126 08/07/2015 06:17 PM Marek Horst

#1422 fixing Java Heap Space error while executing checksum postprocessing workflow on pmc plaintexts

37882 19/06/2015 04:22 PM Marek Horst

merging trunk changes with IIS-CDH-5.3.0 branch

37874 19/06/2015 02:07 PM Marek Horst

#1381 porting pmc citations ingestion from cascading framework to pig. Moving code from icm-iis-ingest-pmc to icm-iis-transformers including integration tests, removing obsolete scala code along with unneeded dependencies. Switching subworkflow in primary workflow.

37652 08/06/2015 01:37 PM Marek Horst

expecting null affiliations instead of empty array

37651 08/06/2015 01:29 PM Marek Horst

adding missing affiliations field in input data, removing duplicates from output

37594 29/05/2015 05:16 PM Marek Horst

adding missing affiliations field in integration test expected output

37368 21/05/2015 06:26 PM Marek Horst

#1315 propagating confidenceLevel to DocumentToConceptIds. Updating PIG transformer script by introducing concept identifiers deduplication UDF function picking record with the highest confidence level, introducing unit and integration tests. Propagating changes in document to concepts exporter module.

37360 21/05/2015 02:46 PM Marek Horst

removing obsolete test resources

37347 20/05/2015 06:49 PM Marek Horst

#1329 adding affiliations field in ExtractedDocumentMetadata PMC schema. Metadata extraction code refactoring by extracting code responsible for building Affiliation avro records to AffiliationBuilder class and sharing it with pmc ingestion. Implementing affiliations ingestion functionality in PmcXmlHandler covered with unit tests. Adding affiliations field support in ingest pmc metadata transformer.

37263 15/05/2015 12:57 PM Marek Horst

#1306 introducing dummy field in DocumentId schema required to overcome https://issues.apache.org/jira/browse/PIG-3358 issue. Handling dummy field in transformer pig scripts when it is required. Should be reverted as soon as PIG-3358 issue is fixed

37258 14/05/2015 11:16 PM Marek Horst

#1312 wrapping tuple schema returned by outputSchema() method as described in PIG-3082

37181 12/05/2015 07:01 PM Marek Horst

removing oozie-sharelib-distcp dependency from pom.xml file and relying on oozie.use.system.libpath=true set among job.properties

37147 11/05/2015 07:31 PM Marek Horst

replacing icm-iis-3rdparty-pig-avrostorage dependency with original piggybank

37109 11/05/2015 02:07 PM Marek Horst

merging trunk changes with IIS-CDH-5.3.0 branch

36984 06/05/2015 04:09 PM Marek Horst

#1301 skipping transformation when input set to $UNDEFINED$ value

36980 06/05/2015 03:55 PM Marek Horst

#1301 removing redundant schema parameter

36970 06/05/2015 03:08 PM Marek Horst

#1301 introducing generic avro to json transformer

36455 17/04/2015 05:52 PM Marek Horst

bugfix: adding missing start element

36333 13/04/2015 01:30 PM Marek Horst

#1257 raising oozie.action.max.output.data to 8192

36306 10/04/2015 01:03 PM Marek Horst

#1257 dropping schema generation related hacks in all PIG modules, switching to literal schema parameters

35712 27/03/2015 09:46 AM Marek Horst

#1135 switching icm-iis-parent-container version to 1.0.1-SNAPSHOT in order to include workingDir related changes made in icm-iis-core

35701 27/03/2015 06:18 AM Mateusz Kobos

Removing usage of working_dir from Java workflow node.

35517 19/03/2015 05:59 PM Marek Horst

#1210 introducing generic PIG module filtering inferred data by confidence level

35411 17/03/2015 03:04 PM Marek Horst

#1198 aligning IIS dependencies and java code to CDH5.3.0 cluster

35397 17/03/2015 03:01 PM Marek Horst

#1197 introducing job.properties changes aligning paths to rumcajs cluster HDFS structure

35252 11/03/2015 04:49 PM Marek Horst

creating IIS-CDH-5.3.0 branch

35251 11/03/2015 04:49 PM Marek Horst

introducing branches folder

35228 11/03/2015 01:14 PM Marek Horst

#1195 removing obsolete ports docreation and datasetid from hbase mapred import, removing references to those ports in workflow.xml files, updating transformer by removing filtering by datasetid due to decisions made in #1072

35151 06/03/2015 05:34 PM Marek Horst

introducing repeatable ordering of citations by sorting them by citation rawText

34993 03/03/2015 02:36 PM Marek Horst

#1169 fixing duplicate context issue, introducing integration test proving implemented solution works properly

34910 27/02/2015 06:50 PM Marek Horst

simplifying schema related PIG parameters

34909 27/02/2015 06:49 PM Marek Horst

simplifying schema related PIG parameters

34908 27/02/2015 06:48 PM Marek Horst

#1147 introducing union4 pig script

34695 20/02/2015 07:17 PM Marek Horst

#1133 dropping useless working_dir creation for java nodes

34687 20/02/2015 06:04 PM Marek Horst

#1133 dropping useless working_dir creation for pig nodes

34617 19/02/2015 06:12 PM Marek Horst

#1038 introducing ranges in dependencies definition for all IIS modules

34506 13/02/2015 02:12 PM Marek Horst

#118 introducing website usage community filter filtering out publication identifiers based on ids set retrieved from InformationSpace. This is required to exclude removed publications which were still present in logs.

34504 13/02/2015 01:07 PM Marek Horst

#118 removing obsolete and duplicate transformer

33665 18/12/2014 10:19 AM Marek Horst

updating job.properties

33544 16/12/2014 12:20 PM Marek Horst

[maven-release-plugin] prepare for next development iteration

33543 16/12/2014 12:20 PM Marek Horst

[maven-release-plugin] copy for tag icm-iis-transformers-1.0.0

33542 16/12/2014 12:20 PM Marek Horst

[maven-release-plugin] prepare release icm-iis-transformers-1.0.0

33541 16/12/2014 11:49 AM Marek Horst

#1044 pre-release switching to released version of parent pom and released dependencies

33422 15/12/2014 12:51 PM Marek Horst

introducing scm definition

33245 09/12/2014 06:41 PM Marek Horst

#919 renaming DocumentToResearchInitiative to DocumentToConceptId and DocumentToResearchInitiatives to DocumentToConceptIds

33237 09/12/2014 02:13 PM Marek Horst

#1019 introducing integration test

33179 04/12/2014 01:29 PM Marek Horst

#919 introducing integration test input and output

33177 04/12/2014 12:08 PM Marek Horst

#919 introducing integration test containing empty input and output

33119 01/12/2014 01:33 PM Marek Horst

#919 introducing project to concept transformer module

32993 26/11/2014 03:57 PM Marek Horst

#1019 introducing PIG module transforming pmc ingested metadata into common extracted document metadata

32827 17/11/2014 03:45 PM Marek Horst

#963 propagating dataset -> mdstore from import to exporting phase: importer produces DocumentToMDStore datastore utilized by exporter module. Updating transformer definition to handle DocumentToMDStore instead of Identifier schema

32244 05/11/2014 05:34 PM Marek Horst

introducing embedded integration test entry

31843 28/10/2014 03:31 PM Marek Horst

#913 renaming DocumentContentUrl#contentSize to DocumentContentUrl#contentSizeKB changing field type from int to long, importing content size from ObjectStoreFile#fileSizeKB, updating dnet-objectstore-rmi dependency from 1.0.0 to 2.0.1-SNAPSHOT

31783 28/10/2014 11:50 AM Marek Horst

#913 supplementing json files with newly introduced DocumentContentUrl#contentSize field value set to null

31779 28/10/2014 11:29 AM Marek Horst

#913 introducing DocumentContentUrl#contentSize field, handling it properly in all PIG transformers

31226 08/10/2014 06:19 PM Marek Horst

#840 moving IdentifierMapping from importer to common package

31220 08/10/2014 06:12 PM Marek Horst

#840 renaming DeduplicationMapping to more generic IdentifierMapping

31037 02/10/2014 02:29 PM Marek Horst

introducing cloudera repository in parent container, removing repository definitions from individual IIS modules

30897 26/09/2014 02:49 PM Marek Horst

adding missing affiliation fields: countryCode, address, renaming country to countryName

30896 26/09/2014 02:47 PM Marek Horst

adding missing affiliation fields: countryCode, address, renaming country to countryName

30431 17/09/2014 11:06 AM Sandro La Bruzzo

created tag folder for release

30188 16/09/2014 10:22 AM Marek Horst

#757 introducing doitooaid transformer processing DocumentMetadata datastore holding metadata imported from InformationSpace and creating datastore holding <doi,oaid> pairs which will be used by pmc ingestor for matching references identified by doi

30181 15/09/2014 05:31 PM Dominika Tkaczyk

null reference ids removed

30121 11/09/2014 12:44 PM Marek Horst

updating default job.properties

29936 02/09/2014 02:49 PM Marek Horst

removing memory related properties; fixing #757 should solve all memory related problems

29914 29/08/2014 06:29 PM Marek Horst

#568 introducing citations grouping by sourceDocumentId, still to be adjusted for ingested pmc citations outcome which currently seems to hang up

29906 29/08/2014 11:53 AM Marek Horst

#577 introducing UDF producing empty map, two transformers building common Citation datastore from citationmatching and pmc ingestion outcome. Both are required by collapser.

29482 23/07/2014 05:36 PM Marek Horst

introducing importer/plaintext/skip_extracted transformer required for plaintext import caching

29087 14/07/2014 02:08 PM Marek Horst

#354 removing obsolete transformers/export/person transformer along with tests

29084 14/07/2014 01:49 PM Marek Horst

#354 removing obsolete transformers/export/inferenced_document_without_imported_data transformer along with tests

29083 14/07/2014 01:21 PM Marek Horst

#354 removing obsolete transformers/export/identifier/referenceddatasets transformer along with tests

29080 14/07/2014 12:47 PM Marek Horst

#354 removing obsolete transformers/export/identifier/documents transformer along with tests

29079 14/07/2014 12:43 PM Marek Horst

#354 removing obsolete transformers/export/document transformer along with tests

28991 10/07/2014 04:23 PM Marek Horst

replacing redundant transformers/ingest/pmc/citations with already existing transformers/importer/documentmetadata/idextractor

28967 09/07/2014 01:12 PM Marek Horst

replacing redundant transformers/ingest/pmc/citations with already existing transformers/importer/documentmetadata/idextractor

28966 09/07/2014 01:02 PM Marek Horst

replacing redundant transformers/ingest/pmc/citations with already existing transformers/importer/documentmetadata/idextractor

28954 08/07/2014 05:14 PM Marek Horst

updating default job.properties

28953 08/07/2014 05:14 PM Marek Horst

updating default job.properties

28850 02/07/2014 07:08 PM Marek Horst

updating default job.properties

28800 02/07/2014 11:43 AM Marek Horst

adding missing "confidenceLevel" field

28799 02/07/2014 11:43 AM Marek Horst

adding missing "confidenceLevel" field

28798 02/07/2014 11:42 AM Marek Horst

adding missing "confidenceLevel" field

28796 02/07/2014 11:40 AM Marek Horst

adding missing "confidenceLevel" field

28795 02/07/2014 11:40 AM Marek Horst

adding missing "confidenceLevel" field

28777 01/07/2014 05:07 PM Marek Horst

introducing deploy.info file for module icm-iis-transformers