About AAN

Welcome to the All About NLP (AAN) project interface! This website is maintained by Yale University's Language, Information, and Learning at Yale (LILY) Group, which is led by Professor Dragomir R. Radev. AAN encompasses our corpus of resources on NLP and related fields, as well as the research projects that build on this corpus. You can find out more about this project on our project page.

We recently released our newest corpus, LectureBank, which contains more than 1,700 university lecture slides covering 322 topics. Visit the GitHub page here. We have published two papers on it: "What Should I Learn First: Introducing LectureBank for NLP Education and Prerequisite Chain Learning" (AAAI 2019) and "R-VGAE: Relational-variational Graph Autoencoder for Unsupervised Prerequisite Chain Learning" (COLING 2020).

In the previous phase of the AAN project, we collected over 10,000 surveys, tutorials, and other resources and built a search engine that lets users browse them easily. These resources are intended to help anyone learn about Natural Language Processing (NLP) and related topics and accomplish their NLP goals. We introduced this corpus, the TutorialBank dataset, in our ACL 2018 paper "TutorialBank: A Manually-Collected Corpus for Prerequisite Chains, Survey Extraction and Resource Recommendation". We annotated the corpus for pedagogical function classification, prerequisite chains, and survey extraction, and we are continuing to research each of these tasks. Download the data! Check out our blog post!

If you use the datasets, please acknowledge the creators by citing the following BibTeX entries:

@inproceedings{li2020rvgae,
  author    = {Irene Li and
               Alexander R. Fabbri and
               Swapnil Hingmire and
               Dragomir R. Radev},
  title     = {{R-VGAE:} Relational-variational Graph Autoencoder for Unsupervised
               Prerequisite Chain Learning},
  booktitle = {Proceedings of the 28th International Conference on Computational
               Linguistics, {COLING} 2020, Barcelona, Spain (Online), December 8-13,
               2020},
  pages     = {1147--1157},
  publisher = {International Committee on Computational Linguistics},
  year      = {2020}
}

@inproceedings{li2019lecturebank,
  author    = {Irene Li and
               Alexander R. Fabbri and
               Robert R. Tung and
               Dragomir R. Radev},
  title     = {What Should {I} Learn First: Introducing LectureBank for {NLP} Education
               and Prerequisite Chain Learning},
  booktitle = {The Thirty-Third {AAAI} Conference on Artificial Intelligence, {AAAI}
               2019, The Thirty-First Innovative Applications of Artificial Intelligence
               Conference, {IAAI} 2019, The Ninth {AAAI} Symposium on Educational
               Advances in Artificial Intelligence, {EAAI} 2019, Honolulu, Hawaii,
               USA, January 27 - February 1, 2019},
  pages     = {6674--6681},
  publisher = {{AAAI} Press},
  year      = {2019}
}

@inproceedings{fabbri2018tutorialbank,
  author    = {Fabbri, Alexander R and Li, Irene and Trairatvorakul, Prawat and He, Yijiao and
               Ting, Wei Tai and Tung, Robert and Westerfield, Caitlin and Radev, Dragomir R},
  title     = {TutorialBank: A Manually-Collected Corpus for Prerequisite Chains, Survey Extraction
               and Resource Recommendation},
  year      = {2018},
  booktitle = {Proceedings of ACL},
  publisher = {Association for Computational Linguistics}
}


The current version is maintained by Yale's LILY Group. In particular, we would like to thank the following people for their work on this website:

Previous Work

In the previous phases of the AAN project, we built several networks from 20,000 papers in the ACL Anthology, including paper citation networks, author citation networks, and author collaboration networks. The networks currently include only ACL papers that were published by June 2016 and that we were able to process successfully. Our AAN search engine also provides access to the ACL Anthology Network corpus.
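All three network types can be derived from per-paper metadata. The following is a minimal Python sketch, assuming each paper record lists its authors and the IDs of the papers it cites. The toy `PAPERS` data and the function names are hypothetical, and the construction rules (self-citations kept, one weighted edge per author pair) are illustrative assumptions rather than a description of how AAN actually builds its networks.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy metadata; the IDs and names do not correspond to real ACL papers.
PAPERS = {
    "P1": {"authors": ["Alice", "Bob"], "cites": []},
    "P2": {"authors": ["Carol"], "cites": ["P1"]},
    "P3": {"authors": ["Alice", "Carol"], "cites": ["P1", "P2"]},
}


def paper_citation_edges(papers):
    """Directed edges (citing paper, cited paper), keeping only cited papers in the corpus."""
    return {
        (src, dst)
        for src, meta in papers.items()
        for dst in meta["cites"]
        if dst in papers
    }


def author_citation_edges(papers):
    """Weighted directed edges (citing author, cited author) -> count.

    Every author of the citing paper is linked to every author of the cited
    paper; self-citations such as (Alice, Alice) are kept in this sketch.
    """
    weights = Counter()
    for src, dst in paper_citation_edges(papers):
        for a in papers[src]["authors"]:
            for b in papers[dst]["authors"]:
                weights[(a, b)] += 1
    return weights


def collaboration_edges(papers):
    """Weighted undirected edges: sorted co-author pair -> number of joint papers."""
    weights = Counter()
    for meta in papers.values():
        for a, b in combinations(sorted(set(meta["authors"])), 2):
            weights[(a, b)] += 1
    return weights


if __name__ == "__main__":
    print(sorted(paper_citation_edges(PAPERS)))
    print(sorted(author_citation_edges(PAPERS).items()))
    print(sorted(collaboration_edges(PAPERS).items()))
```

The author-level networks are projections of the paper-level data, so in this sketch only the paper citation edges and the author lists need to be stored; the other two networks are recomputed from them.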

A number of students from the University of Michigan's CLAIR Group helped create the data, networks, and webpages of the original version. The first iteration of the website was created by Mark Thomas Joseph; in addition to him, we would like to thank:

The previous version of this site was partially supported by two National Science Foundation grants: "Collaborative Research: BlogoCenter - Infrastructure for Collecting, Mining and Accessing Blogs", jointly awarded to UCLA and UMich (IIS 0534323 to UMich and IIS 0534784 to UCLA), and "iOPENER: A Flexible Framework to Support Rapid Learning in Unfamiliar Research Domains", jointly awarded to UMd and UMich (IIS 0705832).

About the Data

AAN was built from the original PDF files available from the ACL Anthology. Using open-source OCR tools, in-house clean-up scripts, and often tedious manual labor, we developed a web interface for annotating the individual references in each paper. We use the following tools for curation.

Publications using the AAN Data

Papers from the AAN Team

Papers Using Our Data

Other Related Papers

A Note About PageRank Centrality

PageRank values are small fractions, so we rescale them to make them more human-readable: each displayed value is the computed PageRank multiplied by 1,000,000, with the decimal part truncated so that only an integer remains. To recover the actual value, divide the displayed number by 1,000,000; because of the truncation, the result is accurate only to six decimal places. For example, a paper with a computed PageRank of 0.003456789 scales to 3456.789 and is displayed as 3456, which corresponds to 0.003456.
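The display conversion is a one-line transformation in each direction. Here is a minimal Python sketch of it; the names `SCALE`, `to_display`, and `from_display` are hypothetical and not taken from any AAN code.

```python
SCALE = 1_000_000  # display factor stated on this page


def to_display(pagerank: float) -> int:
    """Scale a raw PageRank value by 1,000,000 and drop the fractional part."""
    # int() truncates toward zero, which matches dropping the decimals for
    # non-negative PageRank values. Binary floating point can make a product
    # that should be a whole number land just below it, so exact integer
    # boundaries may display one lower than expected.
    return int(pagerank * SCALE)


def from_display(shown: int) -> float:
    """Approximate the raw PageRank; digits past the sixth decimal place are lost."""
    return shown / SCALE


if __name__ == "__main__":
    shown = to_display(0.003456789)
    print(shown, from_display(shown))
```

Truncating rather than rounding means a displayed value never overstates the underlying score, at the cost of always discarding up to 0.000001 of it.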