updating IIS cache builder profile by removing obsolete properties which are currently defined in config-default.xml file on IIS cluster
import_hbase_dump_location parameter must be passed by the wf
dropping _default part from cache location parameters
updated iis cache builder library version
added LOD export configuration job profile
introducing ingest_pmc_default_cache_location parameter required by newly introduced pmc ingestion caching mechanism
updating metadataextraction cache location
removing obsolete properties which are controlled on IIS side in config-default.xml file, more details in #2177#note-8
added 'reports_external_path' property as indicated by Marek in #1356#note-19
updated iis V2 workflow definitions and related action set profiles
updated IIS CDH5 specific job profiles
refinements in the cache builder workflow
updated action set profiles, introduced iisCacheBuilderJob, CDH5 specific inference Jobs
datasource with type "websource" will end up with typeui "other"
#2192: I can do it...job properties have key, not name
#2192: fixed profile and more logs
#2192: entityregistry::* should end up with the "other" datasource type label for the portal. PrepareReduceFeeder now expects a 'ui.other.datasourcetypes' job param with the list of datasource types to be handled this way.
[broker] hadoop job profile
removed empty SCAN element
added more jobs
fixed extra char
deleteSimRelJob updated to deleteDedupRelsJob
added new M/R Jobs
making schema validation happy
introduced HDFS Action related job profiles
added anchorStats job
added hadoop job profile for the openaire identifiers export workflow
document similarity threshold set to 0.7 instead of 0.8.
#1772: changed default trust thresholds
using new metadata cache location
introduced new hadoop job profiles (dedup)
use of external properties
use of Text instead of ImmutableBytesWritable
reimplemented calculatePersonDistribution M/R job to consider only the results from pubsrepositories (not journals)
added default threshold parameters. #1209
informationSpaceImportJob
updated compression parameters
compressing output
added information space export job
added hadoop jobs (dedup person)
MapDocument implements a more general view of the pace model
added trust level threshold for document similarity and document classes
new parameter for pdb inference module
added coauthor workflow and hadoop job
profiles to run calculatePersonDistribution
updated job props
added workflow to export the representative publications as json on hdfs
updated primary iis job profile and workflow to the latest specs
merged branch dedupConf
#953 blacklisting da458477233b5561ae47042aa2a73086 content
#953 adding bea4728578070c3d66774bf9454d41fe checksum to the blacklist
attempt to define custom user names #1153
including merge relationship in duplicate scan phase
wf and hadoop job updates to support the exclusion of persons and duplicate records during the OAI feed.
indentation
added scheduler pool name
updated profiles
stats conf moved into the respective HadoopJobConfiguration profile.
updating metadataextraction_excluded_checksums to 1e5b574109da731f4918c7f91fc24864 value
updated job profile
setting metadataextraction_excluded_checksums to $UNDEFINED$, which means no documents should be excluded
updated copytable job definition
renaming input parameter: metadataextraction_excluded_ids -> metadataextraction_excluded_checksums
added copytable job profile
added scanner caching
updated required parameters
added flags to enable/disable metadata extraction module
added action sets dedicated to each inference module
updated jobs specs
Context profiles will be fetched by the oozie process, so we pass the isLookupEndpoint as wf param
updated job definition
added stats update job profile
submittable M/R OAI feeding job
updated dedup/indexing configuration and the related job definitions
updated IIS job interfaces
improved parameters management
map-only job configuration used to feed the oai store
updated iisMain workflow configuration profile