#1147 preserving newlines when ingesting plaintext from HTML documents. This eliminates some of the false positives produced by the reference extraction algorithms
merging trunk changes with IIS-CDH-5.3.0 branch
#1257 dropping schema generation related hacks in all map-reduce modules, switching to literal schema parameters
#1135 switching icm-iis-parent-container version to 1.0.1-SNAPSHOT in order to include workingDir related changes made in icm-iis-core
#1198 aligning IIS dependencies and java code to CDH5.3.0 cluster
#1197 introducing job.properties changes aligning paths to rumcajs cluster HDFS structure
creating IIS-CDH-5.3.0 branch
introducing branches folder
#1147 renaming toplaintext wf name to plaintext to be more appropriate
#1147 renaming toplaintext dir name to plaintext to be more appropriate
#1147 introducing first version of html->plaintext ingester utilizing jsoup library
#1047 renaming icm-iis-ingest-webcrawl SVN location to icm-iis-ingest
#1147 renaming icm-iis-ingest-webcrawl module to icm-iis-ingest to make it more generic, so that it can contain html ingesters as well as webcrawl related ingesters
#1038 introducing version ranges in dependency definitions for all IIS modules
setting svn:ignore
#1083 introducing webcrawl ingester module extracting FX field from plaintext before executing project reference extraction
Share project "icm-iis-ingest-webcrawl" into "https://svn.driver.research-infrastructures.eu/driver"