/modules/icm-iis-ingest/trunk/src/main - Changes - D-Net - D-Net project tracking tool

dnet40/modules/icm-iis-ingest/trunk/src/main @ 59226

#	Date	Author	Comment
38142	09/07/2015 01:09 PM	Marek Horst	#1147 preserving newlines when ingesting plaintext from HTML documents. This will eliminate some of the false positives in reference extraction algorithms
36288	09/04/2015 07:10 PM	Marek Horst	#1257 dropping schema generation related hacks in all map-reduce modules, switching to literal schema parameters
34912	27/02/2015 06:57 PM	Marek Horst	#1147 renaming toplaintext wf name to plaintext to be more appropriate
34911	27/02/2015 06:56 PM	Marek Horst	#1147 renaming toplaintext dir name to plaintext to be more appropriate
34906	27/02/2015 06:18 PM	Marek Horst	#1147 introducing first version of html->plaintext ingester utilizing jsoup library
34897	27/02/2015 05:37 PM	Marek Horst	#1047 renaming icm-iis-ingest-webcrawl SVN location to icm-iis-ingest
34431	11/02/2015 02:22 PM	Marek Horst	#1083 introducing webcrawl ingester module extracting FX field from plaintext before executing project reference extraction
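
Revision 38142 notes that preserving newlines during HTML to plaintext ingestion reduces false positives in reference extraction. The sketch below illustrates that idea only; it is not the module's code, which according to revision 34906 is built on the jsoup library. It uses Python's standard html.parser instead, and the names BLOCK_TAGS, NewlinePreservingExtractor and html_to_plaintext are hypothetical, as is the choice of which tags count as line breaks.

```python
from html.parser import HTMLParser

# Tags whose end is treated as a line break. This set is an assumption
# made for illustration, not taken from the icm-iis-ingest module.
BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class NewlinePreservingExtractor(HTMLParser):
    """Collects text nodes and emits a newline at block boundaries."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br> has no closing tag, so the break is emitted on the start tag.
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_plaintext(html):
    """Return the text of `html` with one line per block element."""
    parser = NewlinePreservingExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse whitespace inside each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.split("\n")]
    return "\n".join(line for line in lines if line)
```

With input such as two consecutive p elements holding one bibliography entry each, the entries end up on separate lines rather than being fused into one string, which is the property a line-oriented reference extractor would rely on. The sketch does not skip script or style content; a real ingester would need to.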