# Date Author Comment
38142 09/07/2015 01:09 PM Marek Horst

#1147 preserving newlines when ingesting plaintext from HTML files. This will eliminate some of the false positives in reference extraction algorithms.

34906 27/02/2015 06:18 PM Marek Horst

#1147 introducing first version of the HTML-to-plaintext ingester, which uses the jsoup library

34897 27/02/2015 05:37 PM Marek Horst

#1047 renaming icm-iis-ingest-webcrawl SVN location to icm-iis-ingest

34431 11/02/2015 02:22 PM Marek Horst

#1083 introducing webcrawl ingester module that extracts the FX field from plaintext before executing project reference extraction