salsa.corpora.processing
Class CorpusProcessor

java.lang.Object
  extended by salsa.corpora.processing.CorpusProcessor

public class CorpusProcessor
extends java.lang.Object

CorpusProcessor provides methods for processing a SalsaXML Corpus, such as collecting its annotated frames, ids, terminals and nonterminals.

Author:
Fabian Shirokov

Constructor Summary
CorpusProcessor(Corpus corpus)
          Creates a CorpusProcessor for the given Corpus.
 
Method Summary
 java.util.ArrayList<Frame> getAllAnnotatedFrames()
          Returns a list of all Frames that stand for frame annotations.
 java.util.ArrayList<Id> getAllIds()
          Returns a list of all values of 'id' and 'idref' attributes that are represented as an Id in the Corpus.
 java.util.ArrayList<Nonterminal> getAllNonterminalsInCorpus()
          Returns a list of all Nonterminal elements that are contained in any Sentence in the Corpus.
 java.util.Set<Terminal> getAllTerminals(java.util.ArrayList<Fenode> allFenodes)
          Returns the set of Terminal elements that correspond to the given list of Fenode elements.
 java.util.ArrayList<Terminal> getAllTerminalsInCorpus()
          Returns a list of all Terminal elements that are contained in any Sentence in the Corpus.
 java.util.HashMap<java.lang.String,java.lang.String> getSentenceIdMapping()
          Returns a mapping of all old sentence ids to new sentence ids.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CorpusProcessor

public CorpusProcessor(Corpus corpus)
Creates a CorpusProcessor for the given Corpus.

Method Detail

getAllAnnotatedFrames

public java.util.ArrayList<Frame> getAllAnnotatedFrames()
Returns a list of all Frames that stand for frame annotations. The Frames in the 'head' part of the XML file are ignored.

Returns:
allFrames

getAllIds

public java.util.ArrayList<Id> getAllIds()
Returns a list of all values of 'id' and 'idref' attributes that are represented as an Id in the Corpus.
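
The idea of gathering every 'id' and 'idref' value can be illustrated with a minimal sketch. It assumes a raw XML string rather than the Corpus object model that getAllIds() actually works on, and it returns plain strings instead of Id objects. The class name IdCollectionSketch, the method collectIds, and the XML fragment in main are hypothetical and serve only as illustration; only the standard JDK DOM parser is used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of salsa.corpora.processing.
final class IdCollectionSketch {

    /** Returns the values of all 'id' and 'idref' attributes, in document order. */
    static List<String> collectIds(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        List<String> ids = new ArrayList<>();
        // "*" matches every element in the document.
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            if (element.hasAttribute("id")) {
                ids.add(element.getAttribute("id"));
            }
            if (element.hasAttribute("idref")) {
                ids.add(element.getAttribute("idref"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up fragment for illustration; not a complete SalsaXML file.
        String xml = "<s id=\"s1\"><t id=\"s1_1\"/><fenode idref=\"s1_1\"/></s>";
        System.out.println(collectIds(xml));
    }
}
```

Values that occur both as an 'id' and as an 'idref' appear once per occurrence in this sketch, which matches the description "all values" but is an assumption about how duplicates are handled.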


getAllNonterminalsInCorpus

public java.util.ArrayList<Nonterminal> getAllNonterminalsInCorpus()
Returns a list of all Nonterminal elements that are contained in any Sentence in the Corpus.

Returns:
the list of all Nonterminal elements in the Corpus

getAllTerminals

public java.util.Set<Terminal> getAllTerminals(java.util.ArrayList<Fenode> allFenodes)
Returns the set of Terminal elements that correspond to the given list of Fenode elements. For example, if the annotation of a frame or frame element (one or more Fenodes) covers a complex node (e.g. an NP or VP node), this method resolves all Terminal elements that belong to that NP or VP node.

Parameters:
allFenodes - the Fenode elements whose Terminal elements are to be resolved
Returns:
allTerminals
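
Resolving a complex node to its terminals amounts to a recursive walk down the syntax tree. The following is a minimal sketch of that idea, not the actual implementation. The Node class is a hypothetical stand-in for the Terminal and Nonterminal classes, and the sketch assumes each annotated node is already available as an object, whereas a Fenode presumably refers to its node by id.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of salsa.corpora.processing.
final class TerminalResolutionSketch {

    /** Stand-in for a syntax-tree node; a node with a word is a terminal. */
    static final class Node {
        final String id;
        final String word; // null for nonterminal nodes such as NP or VP
        final List<Node> children = new ArrayList<>();

        Node(String id, String word) {
            this.id = id;
            this.word = word;
        }

        boolean isTerminal() {
            return word != null;
        }
    }

    /** Collects the terminals below the given nodes, without duplicates, in first-encountered order. */
    static Set<Node> resolveTerminals(List<Node> annotatedNodes) {
        Set<Node> terminals = new LinkedHashSet<>();
        for (Node node : annotatedNodes) {
            collect(node, terminals);
        }
        return terminals;
    }

    private static void collect(Node node, Set<Node> out) {
        if (node.isTerminal()) {
            out.add(node);
            return;
        }
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        Node the = new Node("s1_1", "the");
        Node dog = new Node("s1_2", "dog");
        Node np = new Node("s1_500", null);
        np.children.add(the);
        np.children.add(dog);

        // The annotation covers the NP and, redundantly, one of its words.
        for (Node terminal : resolveTerminals(List.of(np, dog))) {
            System.out.println(terminal.id + " " + terminal.word);
        }
    }
}
```

Using a set as the result type means a terminal that is reachable through several annotated nodes is reported only once, which fits the Set<Terminal> return type of getAllTerminals.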

getAllTerminalsInCorpus

public java.util.ArrayList<Terminal> getAllTerminalsInCorpus()
Returns a list of all Terminal elements that are contained in any Sentence in the Corpus.

Returns:
the list of all Terminal elements in the Corpus

getSentenceIdMapping

public java.util.HashMap<java.lang.String,java.lang.String> getSentenceIdMapping()
Returns a mapping of all old sentence ids to new sentence ids. The new ids are numbered consecutively: "s1", "s2", ...
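
The renumbering can be sketched as follows. This is a minimal illustration that assumes the new ids are assigned in the order in which the old ids are encountered. The documentation does not state the order, and the class name SentenceIdMappingSketch and the method buildMapping are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of salsa.corpora.processing.
final class SentenceIdMappingSketch {

    /** Maps each old sentence id to "s1", "s2", ... in order of first occurrence. */
    static Map<String, String> buildMapping(List<String> oldSentenceIds) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String oldId : oldSentenceIds) {
            // putIfAbsent keeps the first assignment if an old id occurs twice.
            mapping.putIfAbsent(oldId, "s" + (mapping.size() + 1));
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = buildMapping(List.of("s4711", "s4712", "s9001"));
        mapping.forEach((oldId, newId) -> System.out.println(oldId + " -> " + newId));
    }
}
```

The sketch uses a LinkedHashMap so that iteration follows assignment order; getSentenceIdMapping() itself returns a HashMap, whose iteration order is unspecified.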