About AAN


Welcome to the All About NLP (AAN) project interface! This website is maintained by Yale University's Language, Information, and Learning at Yale (LILY) Group.

In addition to the search engine, AAN also includes three shareable resources:

LectureBank, which consists of more than 7,000 university lecture slides covering 320 topics. Visit the GitHub page here.

TutorialBank, which includes 19,765 manually collected resources with valid URLs and metadata, organized by topic. We also release an extra batch of 5,001 resources; these resources have valid URLs, but some metadata or topic annotations are missing. Visit the GitHub page here.

AAN Network, which is a manually curated networked database of citations, collaborations, and summaries in the field of Computational Linguistics.

If you use the datasets, please acknowledge the creators and cite them using the following BibTeX entries:

@inproceedings{li2019introducing,
  author    = {Irene Li and
               Alexander R. Fabbri and
               Robert R. Tung and
               Dragomir R. Radev},
  title     = {What Should {I} Learn First: Introducing LectureBank for {NLP} Education
               and Prerequisite Chain Learning},
  booktitle = {The Thirty-Third {AAAI} Conference on Artificial Intelligence, {AAAI}
               2019, The Thirty-First Innovative Applications of Artificial Intelligence
               Conference, {IAAI} 2019, The Ninth {AAAI} Symposium on Educational
               Advances in Artificial Intelligence, {EAAI} 2019, Honolulu, Hawaii,
               USA, January 27 - February 1, 2019},
  pages     = {6674--6681},
  publisher = {{AAAI} Press},
  year      = {2019}
}
    
@inproceedings{fabbri2018tutorialbank,
  author    = {Fabbri, Alexander R and Li, Irene and Trairatvorakul, Prawat and He, Yijiao and
               Ting, Wei Tai and Tung, Robert and Westerfield, Caitlin and Radev, Dragomir R},
  title     = {TutorialBank: A Manually-Collected Corpus for Prerequisite Chains, Survey Extraction
               and Resource Recommendation},
  year      = {2018},
  booktitle = {Proceedings of ACL},
  publisher = {Association for Computational Linguistics}
}

You may also find these other papers relevant:

  @article{Hingmire2021CLICKERAC,
    title={CLICKER: A Computational LInguistics Classification Scheme for Educational Resources},
    author={Swapnil Hingmire and Irene Li and Rena Kawamura and Benjamin Chen and 
      Alexander R. Fabbri and Xiangru Tang and Yixin Liu and Thomas George 
      and Tammy Liao and Wai Pan Wong and Vanessa Yan and Richard Zhou 
      and Girish Keshav Palshikar and Dragomir Radev},
    journal={ArXiv},
    year={2021},
    volume={abs/2112.08578}
  }
  @inproceedings{li-etal-2021-unsupervised-cross,
      title = "Unsupervised Cross-Domain Prerequisite Chain Learning using Variational Graph Autoencoders",
      author = "Li, Irene  and
        Yan, Vanessa  and
        Li, Tianxiao  and
        Qu, Rihao  and
        Radev, Dragomir",
      booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
      month = aug,
      year = "2021",
      address = "Online",
      publisher = "Association for Computational Linguistics",
      url = "https://aclanthology.org/2021.acl-short.127",
      doi = "10.18653/v1/2021.acl-short.127",
      pages = "1005--1011",
      abstract = "Learning prerequisite chains is an important task for one to pick up knowledge efficiently in both known and unknown domains. For example, one may be an expert in the natural language processing (NLP) domain, but want to determine the best order in which to learn new concepts in an unfamiliar Computer Vision domain (CV). Both domains share some common concepts, such as machine learning basics and deep learning models. In this paper, we solve the task of unsupervised cross-domain concept prerequisite chain learning, using an optimized variational graph autoencoder. Our model learns to transfer concept prerequisite relations from an information-rich domain (source domain) to an information-poor domain (target domain), substantially surpassing other baseline models. In addition, we expand an existing dataset by introducing two new domains{---}-CV and Bioinformatics (BIO). The annotated data and resources as well as the code will be made publicly available.",
  }
  
@inproceedings{li-etal-2020-r,
    title = "{R}-{VGAE}: Relational-variational Graph Autoencoder for Unsupervised Prerequisite Chain Learning",
    author = "Li, Irene  and
      Fabbri, Alexander  and
      Hingmire, Swapnil  and
      Radev, Dragomir",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2020.coling-main.99",
    doi = "10.18653/v1/2020.coling-main.99",
    pages = "1147--1157",
    abstract = "The task of concept prerequisite chain learning is to automatically determine the existence of prerequisite relationships among concept pairs. In this paper, we frame learning prerequisite relationships among concepts as an unsupervised task with no access to labeled concept pairs during training. We propose a model called the Relational-Variational Graph AutoEncoder (R-VGAE) to predict concept relations within a graph consisting of concept and resource nodes. Results show that our unsupervised approach outperforms graph-based semi-supervised methods and other baseline methods by up to 9.77{\%} and 10.47{\%} in terms of prerequisite relation prediction accuracy and F1 score. Our method is notably the first graph-based model that attempts to make use of deep learning representations for the task of unsupervised prerequisite learning. We also expand an existing corpus which totals 1,717 English Natural Language Processing (NLP)-related lecture slide files and manual concept pair annotations over 322 topics.",
}

Acknowledgements

The current version is being maintained by Yale's LILY lab. Specifically, we would like to thank the following for their work on this website:


Previous Work

In the previous phases of the AAN project, we created several networks based on 20,000 papers from the ACL Anthology. These networks include paper citation networks, author citation networks, and author collaboration networks. The networks are currently built only from ACL papers that were published by June 2016 and successfully processed. Our AAN search engine also provides access to the ACL Anthology Network corpus.
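
To make the structure of these networks concrete, here is a minimal sketch of how a paper citation network of this kind could be built. It uses the networkx library and a hypothetical input file name; it is an illustration, not the code used to construct AAN.

import networkx as nx

# Minimal sketch, not the actual AAN pipeline. "paper_citations.tsv" is a
# hypothetical file in which each line holds two tab-separated ACL paper IDs:
# the citing paper followed by the cited paper.
def load_citation_graph(path: str) -> nx.DiGraph:
    graph = nx.DiGraph()
    with open(path) as f:
        for line in f:
            citing, cited = line.strip().split("\t")
            graph.add_edge(citing, cited)  # edge direction: citing -> cited
    return graph

citations = load_citation_graph("paper_citations.tsv")

# PageRank over this graph gives one measure of paper importance
# (see the note on PageRank centrality below). An author collaboration
# network would instead be an undirected nx.Graph with one edge per
# pair of co-authors.
scores = nx.pagerank(citations)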

A number of students from the University of Michigan's CLAIR Group helped create the data, networks, and webpages of the original version. The first iteration of the website was created by Mark Thomas Joseph; in addition to him, we would like to thank:

The previous version of this site was partially supported by the National Science Foundation grant "Collaborative Research: BlogoCenter - Infrastructure for Collecting, Mining and Accessing Blogs", jointly awarded to UCLA and UMich (IIS 0534323 to UMich and IIS 0534784 to UCLA), and by the National Science Foundation grant "iOPENER: A Flexible Framework to Support Rapid Learning in Unfamiliar Research Domains", jointly awarded to UMd and UMich as IIS 0705832.


About the Data

AAN was built from the original PDF files available from the ACL Anthology. Using open-source OCR technologies, in-house clean-up scripts, and often tedious manual labor, we developed a web interface that allows the individual references from each paper to be annotated. We use the following tools for curation.


Papers from the AAN Team

Papers using our data


Other Related Papers


A Note About the PageRank Centrality

Because of the nature of PageRank values, we have adjusted the results to make them more human-readable. The actual value of any PageRank on this website can be recovered by dividing the number shown by 1,000,000. We also truncate the decimal part, leaving only the integer value. So, for example, if a paper has a computed PageRank of 0.003456789, we would print that PageRank as 3456 after dropping the .789.
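
As a concrete sketch of that conversion (the helper names below are illustrative only and not part of the site's code):

def displayed_pagerank(raw_value: float) -> int:
    # Multiply by 1,000,000 and drop the decimal part.
    return int(raw_value * 1_000_000)

def raw_pagerank(displayed_value: int) -> float:
    # Recover the underlying PageRank by dividing the number shown by 1,000,000.
    return displayed_value / 1_000_000

# The example from the text: a computed PageRank of 0.003456789 is shown as 3456.
assert displayed_pagerank(0.003456789) == 3456
assert raw_pagerank(3456) == 0.003456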