#1147 preserving newlines when ingesting plaintext from HTML documents. This will eliminate some of the false positives in reference extraction algorithms
#1257 dropping schema-generation-related hacks in all map-reduce modules, switching to literal schema parameters
#1147 renaming toplaintext wf name to plaintext to be more appropriate
#1147 renaming toplaintext dir name to plaintext to be more appropriate
#1147 introducing first version of html->plaintext ingester utilizing jsoup library
#1047 renaming icm-iis-ingest-webcrawl SVN location to icm-iis-ingest
#1083 introducing webcrawl ingester module extracting FX field from plaintext before executing project reference extraction
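
Note on the #1147 entries: the sketch below illustrates why preserving newlines matters for reference extraction, since each block-level element ends up on its own line instead of being merged into one run of text. It is only an illustration under stated assumptions. The actual ingester uses the jsoup library (Java); this version swaps in Python's standard html.parser, and the names BLOCK_TAGS, PlaintextExtractor and html_to_plaintext are hypothetical and do not come from the project.

```python
from html.parser import HTMLParser

# Hypothetical set of tags treated as line boundaries (an assumption,
# not the list used by the real jsoup-based ingester).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED_TAGS = {"script", "style"}


class PlaintextExtractor(HTMLParser):
    """Collects text from HTML, emitting a newline at block boundaries."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in BLOCK_TAGS and tag != "br":
            self._parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        raw = "".join(self._parts)
        # Collapse whitespace within each line and drop empty lines.
        # Newlines already present in the source text also act as line breaks.
        lines = [" ".join(line.split()) for line in raw.split("\n")]
        return "\n".join(line for line in lines if line)


def html_to_plaintext(html):
    parser = PlaintextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<p>[1] First ref.</p><p>[2] Second<br>ref.</p><script>x=1</script>"
    print(html_to_plaintext(sample))
```

With line boundaries kept, a downstream extractor can treat "[1] ..." and "[2] ..." as separate candidate references rather than a single merged string.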