# Date Author Comment
38142 09/07/2015 01:09 PM Marek Horst

#1147 preserving newlines when ingesting plaintext from HTML files. This will eliminate some of the false positives in reference extraction algorithms

36288 09/04/2015 07:10 PM Marek Horst

#1257 dropping schema-generation-related hacks in all map-reduce modules, switching to literal schema parameters

34912 27/02/2015 06:57 PM Marek Horst

#1147 renaming toplaintext wf name to plaintext to be more appropriate

34911 27/02/2015 06:56 PM Marek Horst

#1147 renaming toplaintext dir name to plaintext to be more appropriate

34906 27/02/2015 06:18 PM Marek Horst

#1147 introducing first version of html->plaintext ingester utilizing jsoup library

34897 27/02/2015 05:37 PM Marek Horst

#1047 renaming icm-iis-ingest-webcrawl SVN location to icm-iis-ingest

34431 11/02/2015 02:22 PM Marek Horst

#1083 introducing webcrawl ingester module extracting FX field from plaintext before executing project reference extraction