Download AAN Corpora

About the AAN Corpora


The AAN project has three major datasets associated with it:

  1. LectureBank Dataset
  2. TutorialBank Dataset
  3. AAN Anthology Network Corpus

If you use any of these datasets, please follow these guidelines:

  1. Use the data for research only.
  2. Do not redistribute the data.
  3. If you publish work that uses a dataset, acknowledge its creators. Please use the BibTeX entry provided for the respective dataset.
  4. Please inform us if you publish, as we are interested in the output of this work.

LectureBank Dataset


Guidelines

Visit this link to download the most recent version of the LectureBank dataset.

Please acknowledge the creators and use the following BibTeX entry:

@inproceedings{li2019introducing,
  author    = {Irene Li and
               Alexander R. Fabbri and
               Robert R. Tung and
               Dragomir R. Radev},
  title     = {What Should {I} Learn First: Introducing LectureBank for {NLP} Education
               and Prerequisite Chain Learning},
  booktitle = {The Thirty-Third {AAAI} Conference on Artificial Intelligence, {AAAI}
               2019, The Thirty-First Innovative Applications of Artificial Intelligence
               Conference, {IAAI} 2019, The Ninth {AAAI} Symposium on Educational
               Advances in Artificial Intelligence, {EAAI} 2019, Honolulu, Hawaii,
               USA, January 27 - February 1, 2019},
  pages     = {6674--6681},
  publisher = {{AAAI} Press},
  year      = {2019}
}

TutorialBank Dataset


Guidelines

Please acknowledge the creators and use the following BibTeX entry:

@InProceedings{fabbri2018tutorialbank,
  author = {Fabbri, Alexander R and Li, Irene and Trairatvorakul, Prawat and He, Yijiao and
           Ting, Wei Tai and Tung, Robert and Westerfield, Caitlin and Radev, Dragomir R},
  title = {TutorialBank: A Manually-Collected Corpus for Prerequisite Chains, Survey Extraction
          and Resource Recommendation},
  year = {2018},
  booktitle = {Proceedings of ACL},
  publisher = {Association for Computational Linguistics}
}

Instructions

Please clone the GitHub repository at https://github.com/Yale-LILY/TutorialBank

Then follow the instructions in the repository's README to download the corpora.
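
The clone step can also be scripted. The following is a minimal sketch, not an official tool: it assumes `git` is installed and on your PATH, and the function names `clone_command` and `clone_tutorialbank` as well as the default destination folder are hypothetical choices made for illustration. Only the repository URL comes from this page.

```python
import subprocess
from pathlib import Path

# Repository URL as given on this page.
REPO_URL = "https://github.com/Yale-LILY/TutorialBank"


def clone_command(dest: str = "TutorialBank") -> list[str]:
    """Build the git command that clones the repository into `dest`.

    `dest` is a hypothetical local folder name chosen for this sketch.
    """
    return ["git", "clone", REPO_URL, dest]


def clone_tutorialbank(dest: str = "TutorialBank") -> Path:
    """Clone the repository unless `dest` already exists; return its path."""
    target = Path(dest)
    if not target.exists():
        # check=True raises CalledProcessError if git exits with an error.
        subprocess.run(clone_command(dest), check=True)
    return target
```

Calling `clone_tutorialbank()` once creates a local `TutorialBank` folder; the corpus download itself is still governed by the README inside that folder.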

AAN Anthology Network Corpus

The AAN Anthology Network Corpus contains the following:


Guidelines

Please acknowledge the creators and use the following BibTeX entries:

@article{radev2013acl,
  year = {2013},
  issn = {1574-020X},
  journal = {Language Resources and Evaluation},
  doi = {10.1007/s10579-012-9211-2},
  title = {The ACL anthology network corpus},
  url = {http://dx.doi.org/10.1007/s10579-012-9211-2},
  publisher = {Springer Netherlands},
  keywords = {ACL Anthology Network; Bibliometrics; Scientometrics; Citation analysis; Citation summaries},
  author = {Radev, Dragomir R. and Muthukrishnan, Pradeep and Qazvinian, Vahed and Abu-Jbara, Amjad},
  pages = {1--26},
  language = {English}
}
@inproceedings{Radev&al.09a,
  author = {Radev, Dragomir R. and Muthukrishnan, Pradeep and
                  Qazvinian, Vahed},
  title = {The {ACL} Anthology Network Corpus},
  year = "2009",
  address = "Singapore",
  booktitle = "Proceedings, ACL Workshop on Natural Language
                  Processing and Information Retrieval for Digital
                  Libraries"
}
@article{Radev&al.09,
   author = {Dragomir R. Radev and Mark Thomas Joseph and Bryan Gibson and Pradeep Muthukrishnan},
   year   = "2009",
   title  = {{A} {B}ibliometric and {N}etwork {A}nalysis of the field of {C}omputational {L}inguistics},
   journal= {Journal of the American Society for Information Science and Technology},
   publisher = {John Wiley \& Sons}
}

Instructions

To download the current AAN Anthology Network Corpus release, please fill out the following form. The download will start after you submit the form.