#1147 preserving newlines when ingesting plaintext from HTML documents. This will eliminate some of the false positives produced by the reference extraction algorithms
#1147 introducing first version of the HTML-to-plaintext ingester based on the jsoup library
#1047 renaming the icm-iis-ingest-webcrawl SVN location to icm-iis-ingest
#1083 introducing webcrawl ingester module that extracts the FX field from plaintext before executing project reference extraction
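
For illustration only, the newline-preserving HTML-to-plaintext conversion described under #1147 could look roughly like the sketch below. It is not the project's actual ingester: the class name HtmlToPlaintext and the method convert are hypothetical, and the only assumption about the real code is that it uses jsoup. The sketch walks the parsed DOM and emits a newline after each block-level element and at each <br>, so that entries such as individual references stay on separate lines.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

// Hypothetical helper, not the project's actual class.
final class HtmlToPlaintext {

    // Converts HTML to plaintext, keeping line breaks at block boundaries and <br> tags.
    static String convert(String html) {
        Document doc = Jsoup.parse(html);
        final StringBuilder out = new StringBuilder();
        doc.body().traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof TextNode) {
                    // text() normalizes runs of whitespace inside the node to single spaces
                    out.append(((TextNode) node).text());
                } else if (node instanceof Element && "br".equals(node.nodeName())) {
                    out.append('\n');
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Close every block-level element (p, div, li, ...) with a newline
                if (node instanceof Element && ((Element) node).isBlock()) {
                    out.append('\n');
                }
            }
        });
        return out.toString().trim();
    }
}
```

With input such as "<p>[1] Smith J. First paper.</p><p>[2] Doe A. Second paper.</p>", each paragraph ends up on its own line rather than being merged into a single run of text, which is the property the reference extraction step relies on. Whitespace between tags is kept as single spaces in this sketch, so a production version would likely need extra cleanup.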