Name          Size       Revision  Age            Author       Comment
import.txt    780 Bytes  34434     over 9 years   Marek Horst  #1083 enabling webcrawl ingester module extract...
workflow.xml  19.8 KB    34914     about 9 years  Marek Horst  #1147 introducing HTML import and HTML plaintex...

Latest revisions

# Date Author Comment
34914 27/02/2015 07:34 PM Marek Horst

#1147 introducing HTML import and HTML plaintext ingestion in main workflows: primary and preprocessing

34434 11/02/2015 02:26 PM Marek Horst

#1083 enabling webcrawl ingester module extracting FX field from plaintext before executing project reference extraction

34213 02/02/2015 06:22 PM Marek Horst

#1070 updating import_project_concepts_context_ids_csv default value to "fet-fp7,fet-h2020"

34212 02/02/2015 06:21 PM Marek Horst

#1070 introducing support for multiple context identifiers, replacing import_project_concepts_context_id IIS input parameter with import_project_concepts_context_ids_csv

33184 04/12/2014 04:09 PM Marek Horst

#919 enabling concepts matching for FET projects in main workflows: import, export, primary and preprocessing

33098 28/11/2014 04:27 PM Marek Horst

#1022 introducing extracted document metadata collapser at importing phase.
Propagating extracted document metadata (including PMC ingested metadata) to the processing part of the workflow, where it can be exploited by the citation matching module.
Introducing citations collapser in the last stage of the processing phase, collapsing ingested citations with matched citations.

32829 17/11/2014 03:45 PM Marek Horst

#963 propagating dataset -> mdstore from import to exporting phase: importer produces DocumentToMDStore datastore utilized by exporter module. Updating transformer definition to handle DocumentToMDStore instead of Identifier schema

31759 27/10/2014 06:20 PM Marek Horst

renaming metadataextraction_excluded_ids to more appropriate metadataextraction_excluded_checksums

31758 27/10/2014 06:11 PM Marek Horst

#913 introducing support for max file size parameter, currently checked against Content-Length header

29731 31/07/2014 12:28 PM Marek Horst

#9059 reverting #717 change: shortening app_path for primary workflow due to the fix applied by Paweł on WF_JOBS MODIFY mysql table: changing varchar(255) to mediumtext.
