Paper: The PAISÀ Corpus of Italian Web Texts

ACL ID W14-0406
Title The PAISÀ Corpus of Italian Web Texts
Venue Workshop On Web As Corpus
Year 2014

PAISÀ is a Creative Commons licensed, large web corpus of contemporary Italian. We describe the design, harvesting, and processing steps involved in its creation.

  author    = {Lyding, Verena  and  Stemle, Egon  and  Borghetti, Claudia  and  Brunello, Marco  and  Castagnoli, Sara  and  Dell'Orletta, Felice  and  Dittmann, Henrik  and  Lenci, Alessandro  and  Pirrelli, Vito},
  title     = {The PAIS\`{A} Corpus of Italian Web Texts},
  booktitle = {Proceedings of the 9th Web as Corpus Workshop (WaC-9)},
  month     = {April},
  year      = {2014},
  address   = {Gothenburg, Sweden},
  publisher = {Association for Computational Linguistics},
  pages     = {36--43},
  url       = {}