The Dutch NewsReader pipeline
NAF layers
NAF annotations in the Dutch pipeline consist of the following layers:
- raw: raw text
- text: tokenized words
- terms: word senses combined with morphosyntactic information
- deps: dependency parses
- constituents: phrase-structure parses
- entities: people, locations, organizations and numeric expressions
- srl: semantic-role labels
- opinions: opinion triplets (holder, target, expression)
- factualities: veracity or factuality of relevant expressions
- coreferences: coreferent term spans
- timeExpressions: standardized time expressions
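
As a quick way to check which of these layers a given file contains, the following minimal sketch lists the top-level elements of a NAF document. It assumes the layers appear as direct children of the `NAF` root element; `SAMPLE_NAF` is a hand-written, hypothetical fragment and `present_layers` is an illustrative helper, not part of the pipeline.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written NAF fragment with only three layers filled in.
SAMPLE_NAF = """<NAF xml:lang="nl" version="3.0">
  <nafHeader/>
  <raw>Dit is een zin.</raw>
  <text>
    <wf id="w1" sent="1">Dit</wf>
    <wf id="w2" sent="1">is</wf>
  </text>
  <terms>
    <term id="t1" lemma="dit"><span><target id="w1"/></span></term>
  </terms>
</NAF>"""


def present_layers(naf_xml: str) -> list[str]:
    """Return the names of the layers found directly under the NAF root."""
    root = ET.fromstring(naf_xml)
    # The header holds metadata rather than annotations, so it is skipped.
    return [child.tag for child in root if child.tag != "nafHeader"]


print(present_layers(SAMPLE_NAF))
```

Running the same helper on the output of each component is one way to see which layer that component added.
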
Components
Our version of the Dutch NewsReader pipeline uses the following components:
- NAF formatting: text2naf
- tokenizing: ixa-pipe-tok
- POS tagging, lemmatization and parsing: vua-alpino
- named entity recognition: ixa-pipe-nerc
- named entity disambiguation: ixa-pipe-ned
- word sense disambiguation: vua-wsd
- time/date standardisation: vuheideltimewrapper
- predicate-matrix tagging: vua-ontotagging
- semantic role labelling: vua-srl-nl
- factuality: multilingual_factuality
- opinion mining: opinion_miner_deluxePP
- event coreference: EventCoreference
- nominal event detection: vua-nominal-event-detection
- semantic role labelling for nominal events: vua-srl-dutch-nominal-events
- FrameNet labelling: vua-framenet-classifier
Component versions
The versions of the components used by the pipeline are stored in ./cfg/component_versions. This file is loaded by the installation script.
Component dependencies
Each component either generates one or more layers or modifies an existing layer. A component depends on one or more input layers, and it may also require specific components to run first, in addition to the components needed to produce its input layers. The following table summarizes the dependencies of the Dutch NewsReader pipeline:
| component | input layers | required components | output layers |
|---|---|---|---|
| text2naf | | | raw |
| ixa-pipe-tok | raw | | text |
| vua-alpino | text | | terms, deps, constituents |
| ixa-pipe-nerc | text, terms | | entities |
| ixa-pipe-ned | entities | | entities |
| vuheideltimewrapper | text, terms | | timeExpressions |
| vua-wsd | text, terms | | terms |
| vua-ontotagging | terms | +vua-wsd | terms |
| vua-srl-nl | terms, deps, constituents | | srl |
| vua-framenet-classifier | terms, srl | +vua-srl-nl, vua-ontotagging | srl |
| vua-nominal-event-detection | srl, terms | | srl |
| vua-srl-dutch-nominal-events | terms, deps, srl | +vua-nominal-event-detection | srl |
| vua-eventcoreference | srl, terms | | coreferences |
| opinion-miner | text, terms, deps, constituents, entities | | opinions |
| multilingual-factuality | terms, coreferences, opinions | | factualities |
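
The dependencies in the table can be turned into a valid execution order with a topological sort. The sketch below is one possible way to do this, not the pipeline's own scheduler. The `PIPELINE` dictionary is transcribed by hand from the table, and `execution_order` is a hypothetical helper. It assumes that the component generating a layer is the one that outputs the layer without also reading it, and that every other component touching that layer is a modifier.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# component -> (input layers, explicitly required components, output layers),
# transcribed from the dependency table.
PIPELINE = {
    "text2naf": ((), (), ("raw",)),
    "ixa-pipe-tok": (("raw",), (), ("text",)),
    "vua-alpino": (("text",), (), ("terms", "deps", "constituents")),
    "ixa-pipe-nerc": (("text", "terms"), (), ("entities",)),
    "ixa-pipe-ned": (("entities",), (), ("entities",)),
    "vuheideltimewrapper": (("text", "terms"), (), ("timeExpressions",)),
    "vua-wsd": (("text", "terms"), (), ("terms",)),
    "vua-ontotagging": (("terms",), ("vua-wsd",), ("terms",)),
    "vua-srl-nl": (("terms", "deps", "constituents"), (), ("srl",)),
    "vua-framenet-classifier": (
        ("terms", "srl"), ("vua-srl-nl", "vua-ontotagging"), ("srl",)),
    "vua-nominal-event-detection": (("srl", "terms"), (), ("srl",)),
    "vua-srl-dutch-nominal-events": (
        ("terms", "deps", "srl"), ("vua-nominal-event-detection",), ("srl",)),
    "vua-eventcoreference": (("srl", "terms"), (), ("coreferences",)),
    "opinion-miner": (
        ("text", "terms", "deps", "constituents", "entities"), (), ("opinions",)),
    "multilingual-factuality": (
        ("terms", "coreferences", "opinions"), (), ("factualities",)),
}


def execution_order(pipeline):
    """Return the component names in an order that respects all dependencies."""
    # Assumption: a layer's generator outputs it without taking it as input.
    generator = {}
    for name, (inputs, _required, outputs) in pipeline.items():
        for layer in outputs:
            if layer not in inputs:
                generator[layer] = name

    # Each component must run after the generators of its input layers
    # and after every explicitly required component.
    graph = {
        name: {generator[layer] for layer in inputs} | set(required)
        for name, (inputs, required, _outputs) in pipeline.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(" -> ".join(execution_order(PIPELINE)))
```

This sketch only orders a modifier after the generator of the layer it modifies. The relative order of two components that modify the same layer is constrained only where the "required components" column says so, which is why vua-ontotagging lists vua-wsd explicitly.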