Corpus
- The SFN Corpus, currently under construction, includes both New World and European Spanish. It is composed of texts of different genres, primarily newspapers, newswire texts, book reviews, and humanities essays.
- Together, these texts of various origins and genres total 937 million words.
- Spanish FrameNet wishes to acknowledge the support of Anthropos Editorial (Barcelona, Spain), Diario ABC (Madrid, Spain), and El Mundo (Madrid, Spain), which made it possible for this research project to use excerpts of their texts and publications as the evidential basis for the inquiry into the behavior of Spanish words.
- The SFN Corpus includes the following subcorpora:
  - The Spanish Newswire Text, Vol. 2, made available through the Linguistic Data Consortium;
  - The Spanish part of the eleven-language parallel corpus Europarl: European Parliament Proceedings Parallel Corpus, v. 6 (1996-2010);
  - The Spanish portion of the trilingual Wikicorpus, v. 1.0, which was extracted from a snapshot of Wikipedia (2006); and
  - The Spanish part of the seven-language parallel corpus MultiUN: Multilingual UN Parallel Text 2000-2009, a corpus made up of the resolutions of the United Nations.
- The SFN Corpus also includes Wikicorpus v. 1.0, the Spanish portion of Wikipedia.
- The IMS Corpus Workbench of the Institut für Maschinelle Sprachverarbeitung of the University of Stuttgart has been used to explore, extract, and sort example lines and sentences from the SFN Corpus.
- The Spanish corpus of the Sketch Engine has also been used to complement the subcorporation of certain lexical units.
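
The kind of exploration described above (extracting example lines for a word and sorting them) can be illustrated with a minimal keyword-in-context sketch. This is not the IMS Corpus Workbench or its query language; the function name `kwic`, the `width` parameter, and the two sample sentences are hypothetical and serve only to show the idea of extracting and sorting concordance lines.

```python
import re

def kwic(sentences, keyword, width=3):
    """Return (left, match, right) concordance lines, sorted by right context."""
    lines = []
    for sentence in sentences:
        # Naive tokenization: lowercase word characters only (an assumption,
        # not how a real corpus tool tokenizes).
        tokens = re.findall(r"\w+", sentence.lower())
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, token, right))
    # Sort by the right-hand context first, then by the left-hand context.
    return sorted(lines, key=lambda line: (line[2], line[0]))

# Hypothetical sample sentences, not taken from the SFN Corpus.
sample = [
    "El gobierno dio una respuesta clara.",
    "Nadie dio aviso a la prensa.",
]
concordance = kwic(sample, "dio")
for left, match, right in concordance:
    print(f"{left:>20} [{match}] {right}")
```

Sorting by right-hand context groups together lines in which the target word is followed by similar material, which makes recurring patterns easier to spot when reviewing examples for a lexical unit.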