TutorialBank: Using a Manually-Collected Corpus for Prerequisite Chains, Survey Extraction and Resource Recommendation


The field of Natural Language Processing (NLP) is growing rapidly, with new research published daily along with an abundance of tutorials, codebases and other online resources. To learn this dynamic field or stay up to date on the latest research, students, educators and researchers must constantly sift through multiple sources to find valuable, relevant information. To address this situation, we introduce TutorialBank, a new, publicly available dataset that aims to facilitate NLP education and research. We have manually collected and categorized over 5,600 resources on NLP as well as the related fields of Artificial Intelligence (AI), Machine Learning (ML) and Information Retrieval (IR). Notably, our dataset is the largest manually picked corpus of resources intended for NLP education that is not limited to academic papers. Additionally, we have created both a search engine and a command-line tool for the resources, and have annotated the corpus with lists of research topics, relevant resources for each topic, prerequisite relations among topics, and relevant sub-parts of individual resources, among other annotations. We are releasing the dataset and present several avenues for further research.



| # | Title | Author | Topic | Medium | Score |
|---|---|---|---|---|---|
| 1 | Highlights of EMNLP 2017: Exciting Datasets, Return of the Clusters, and More! | Sebastian Ruder | 641 | resource | 270.97 |
| 2 | Diversity and network coherence as indicators of interdisciplinarity: case studies in bionanoscience | Ismael Rafols, Martin Meyer | 999 | paper | 251.44 |
| 3 | Do Altmetrics Work? Twitter and Ten Other Social Web Services | Mike Thelwall, Stefanie Haustein, Vincent Larivière, Cassidy R. Sugimoto | 999 | paper | 240.49 |
| 4 | Codra: A Novel Discriminative Framework for Rhetorical Analysis | Shafiq Joty, Giuseppe Carenini, Raymond T. Ng | 999 | paper | 239.27 |
| 6 | An index to quantify an individual’s scientific research output that takes into account the effect of multiple coauthorship | J. E. Hirsch | 999 | paper | 229.04 |
| 7 | State-of-the-art neural coreference resolution for chatbots | Thomas Wolf | 756 | tutorial | 228.55 |
| 8 | The data that transformed AI research—and possibly the world | Dave Gershgorn | 107 | resource | 227.92 |
| 9 | Similarity-driven Semantic Role Induction via Graph Partitioning | Joel Lang, Mirella Lapata | 999 | paper | 226.82 |
| 10 | Clustering cliques for graph-based summarization of the biomedical research literature | Han Zhang, Marcelo Fiszman, Dongwook Shin, Bartłomiej Wilkowski, Thomas Rindflesch | 999 | paper | 223.45 |
| 11 | The history and meaning of the journal impact factor | Eugene Garfield | 999 | paper | 221.29 |
| 12 | Wordnet, getting your hands dirty | bogdani | 315 | resource | 218.37 |
| 13 | Train Neural Machine Translation Models with Sockeye | Felix Hieber, Tobias Domhan | 753 | tutorial | 218.28 |
| 14 | DEEP LEARNING FOR CHATBOTS, PART 1 - INTRODUCTION | Denny Britz | 445 | tutorial | 213.68 |
| 15 | Discriminative Syntax-based Word Ordering for Text Generation | Yue Zhang, Stephen Clark | 999 | paper | 209.43 |
| 16 | Neural Information Retrieval: At the End of the Early Years | Kezban Dilek Onal, Ye Zhang, Ismail Sengor Altingovde, Md Mustafizur Rahman, Pinar Karagoz, Alex Braylan, Brandon Dang, Heng-Lu Chang, Henna Kim, Quinten McNamara, Aaron Angert, Edward Banner, Vivek K | 713 | resource | 209.29 |
| 17 | Recurrent Neural Networks Tutorial, Part 2 - Implementing a RNN with Python, Numpy, and Theano | Denny Britz | 741 | tutorial | 207.82 |
| 18 | A general framework for analysing diversity in science, technology and society | Andy Stirling | 999 | paper | 207.11 |
| 19 | Rapid Understanding of Scientific Paper Collections: Integrating Statistics, Text Analytics, and Visualization | Cody Dunne, Ben Shneiderman, Robert Gove, Judith Klavans, Bonnie Dorr | 999 | paper | 200.97 |
| 20 | Lexicalization and Generative Power in Ccg | Marco Kuhlmann, Alexander Koller, Giorgio Satta | 999 | paper | 199.68 |
| 21 | The h’-Index, Effectively Improving the h-Index Based on the Citation Distribution | Chun-Ting Zhang | 999 | paper | 195.56 |
| 22 | Earlier Web usage statistics as predictors of later citation impact | Tim Brody, Stevan Harnad, Leslie Carr | 999 | paper | 193.27 |
| 23 | A survey of transfer learning | Karl Weiss, Taghi M. Khoshgoftaar and DingDing Wang | 978 | resource | 192.60 |
| 24 | DeepMind has a bigger plan for its newest Go-playing AI | Dave Gershgorn | 811 | resource | 191.62 |
| 25 | A social network's changing statistical properties and the quality of human innovation | Brian Uzzi | 999 | paper | 190.56 |
| 26 | Introducing Gluon - An Easy-to-Use Programming Interface for Flexible Deep Learning | Vikram Madan | 731 | resource | 188.83 |
| 27 | Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano | Denny Britz | 742 | tutorial | 188.73 |
| 28 | Recent Advances in Document Summarization | Jin-ge Yao, Xiaojun Wan, Jianguo Xiao | 421 | survey | 186.52 |
| 29 | An Intuitive Guide to Linear Algebra | Kalid Azad | 121 | tutorial | 186.25 |
| 30 | Machine Learning for Humans | Vishal Maini, Samer Sabri | 134 | tutorial | 184.60 |
| 31 | Future impact: Predicting scientific success | Daniel E. Acuna, Stefano Allesina, Konrad P. Kording | 999 | paper | 183.66 |
| 32 | RNNs in Tensorflow, a Practical Guide and Undocumented Features | Denny Britz | 741 | tutorial | 177.90 |
| 33 | Bayesian Statistics explained to Beginners in Simple English | NSS | 102 | tutorial | 177.50 |
| 34 | Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation | Author Unknown | 952 | resource | 177.34 |
| 35 | Gimli: open source and high-performance biomedical name recognition | David Campos, Sergio Matos, Jose Oliveira | 999 | paper | 177.04 |
| 36 | The e-Index, Complementing the h-Index for Excess Citations | Chun-Ting Zhang | 999 | paper | 175.50 |
| 37 | Learning Reinforcement Learning (with Code, Exercises and Solutions) | Denny Britz | 713 | tutorial | 175.30 |
| 38 | Natural Language Processing in Artificial Intelligence is almost human-level accurate. Worse yet, it gets smart! | Rafal | 133 | tutorial | 174.60 |
| 39 | Recurrent Neural Networks Tutorial, Part 1 - Introduction to RNNs | Denny Britz | 741 | tutorial | 174.34 |
| 40 | The Future (and Present) of Artificial Intelligence AMA | Various Authors | 811 | resource | 174.21 |
| 41 | Rohan #2: Artificial intelligence, ?Progress/?Time | Rohan Kapur | 811 | tutorial | 174.02 |
| 42 | A Practitioner's Guide to Natural Language Processing (Part I) — Processing & Understanding Text | Dipanjan (DJ) Sarker | 112 | resource | 173.44 |
| 43 | Recurrent Neural Networks Tutorial, Part 3 - Backpropagation Through Time and Vanishing Gradients | Denny Britz | 741 | tutorial | 172.23 |
| 44 | Negated bio-events: analysis and identification | Raheel Nawaz, Paul Thompson, Sophia Ananiadou | 999 | paper | 170.98 |
| 45 | Citation Analysis and Discourse Analysis Revisited | Howard D. White | 999 | paper | 170.55 |
| 46 | Is science becoming more interdisciplinary? Measuring and mapping six research fields over time | Alan L. Porter, Ismael Rafols | 999 | paper | 169.62 |
| 47 | Deep Learning from first principles in Python, R and Octave – Part 3 | Tinniam V Ganesh | 711 | resource | 168.61 |
| 48 | Little Science, Big Science...and Beyond | Derek J. Price | 999 | paper | 168.27 |
| 49 | Quora Duplicate Questions Corpus | Quora | 151 | corpus | 168.17 |
| 50 | K-Means & Other Clustering Algorithms: A Quick Intro with Python | Nikos Koufos | 571 | tutorial | 167.08 |
| 51 | A Complete Tutorial to Learn Data Science with Python from Scratch | Kunal Jain | 131 | tutorial | 166.69 |
| 52 | 25 Open Datasets for Deep Learning Every Data Scientist Must Work With | Pranav Dar | 731 | resource | 166.46 |
| 53 | Transfer Learning - Machine Learnings Next Frontier | Sebastian Ruder | 978 | tutorial | 165.87 |
| 54 | Natural Language Processing (NLP) for Computational Social Science | Cristian Danescu-Niculescu-Mizil, Lillian Lee | 133 | tutorial | 165.86 |
| 55 | Automatic Labeling of Semantic Roles | Daniel Gildea, Daniel Jurafsky | 999 | paper | 164.70 |
| 56 | The Definitive Guide to Natural Language Processing | Javier Couto | 133 | tutorial | 162.79 |
| 57 | The 7 NLP Techniques That Will Change How You Communicate in the Future (Part I) | James Le | 112 | resource | 162.50 |
| 58 | From Natural Language Processing to Artificial Intelligence | Jonathan Mugan | 133 | tutorial | 161.84 |
| 59 | Introduction to Neural Machine Translation with GPUs (part 3) | Kyunghyun Cho | 753 | tutorial | 161.38 |
| 60 | A Hirsch-type index for journals | Tibor Braun, Wolfgang Glänzel, András Schubert | 999 | paper | 161.10 |
| 61 | Neural networks: training with backpropagation. | Jeremy Jordan | 711 | resource | 160.99 |
| 62 | A Comparative Analysis of ChatBots APIs | Author Unknown | 921 | resource | 160.85 |
| 63 | Quadratic entropy and analysis of diversity | C. R. Rao | 999 | paper | 160.31 |
| 64 | Introductory Guide to Artificial Intelligence | Egor Dezhic | 811 | resource | 159.56 |
| 65 | Introduction to Natural Language Processing (NLP) 2016 | Matt Kiser | 133 | tutorial | 158.55 |
| 66 | The Evolution and Core Concepts of Deep Learning & Neural Networks | Guest Blog | 711 | tutorial | 157.86 |
| 67 | Webcrow: A Web-Based Crosswords Solver | Giovanni Angelini, Marco Ernandes, Marco Gori | 999 | paper | 157.41 |
| 68 | The Alignment Template Approach to Statistical Machine Translation | Franz Josef Och, Hermann Ney | 999 | paper | 157.15 |
| 69 | Ultimate Guide to Understand & Implement Natural Language Processing (with codes in Python) | Shivam Bansal | 131 | tutorial | 157.12 |
| 70 | Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences | Hongyuan Mei, Mohit Bansal, Matthew R. Walter | 999 | paper | 156.79 |
| 71 | 100+ Interesting Data Sets for Statistics | Robb Seaton | 999 | corpus | 156.27 |
| 72 | Modelling, visualising and summarising documents with a single convolutional neural network | Misha Denil, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, Nando de Freitas | 744 | paper | 156.13 |
| 73 | News Article Wikipedia Dataset | Author Unknown | 999 | library | 155.85 |
| 74 | Image-to-Image Translation in Tensorflow | Christopher Hesse | 731 | tutorial | 155.22 |
| 75 | Web Scraping in Python using Scrapy (with multiple examples) | Mohd Sanad Zaki Rizvi | 999 | resource | 154.24 |
| 76 | Doc2vec tutorial | Radim Rehurek | 721 | tutorial | 154.03 |
| 77 | Hierarchical Phrase-Based Translation | David Chiang | 999 | paper | 153.80 |
| 78 | A Brief Introduction to Graphical Models and Bayesian Networks | Kevin Murphy | 967 | resource | 153.16 |
| 79 | Complete guide to build your own Named Entity Recognizer with Python | bogdani | 232 | resource | 152.53 |
| 80 | Comparing Top Deep Learning Frameworks: Deeplearning4j, Torch, Theano, TensorFlow, Caffe, Paddle, MxNet, Keras & CNTK | Skymind | 731 | resource | 152.16 |
| 81 | Learning about the world through video | Moritz Mueller-Freitag | 811 | resource | 152.15 |
| 82 | Models for predicting and explaining citation count of biomedical articles | Lawrence D. Fu, Constantin Aliferis | 999 | paper | 152.03 |
| 83 | Tombones Computer Vision Blog | Tomasz Malisiewicz | 958 | resource | 151.40 |
| 84 | A Beginner’s Guide to Deep Reinforcement Learning | Adam Gibson, Chris Nicholson, Josh Patterson | 857 | library | 151.33 |
| 85 | Machine Learning | Morteza Shahriari Nia | 107 | tutorial | 151.29 |
| 86 | Four deep learning trends from ACL 2017: Part 1 | Abigail See | 713 | resource | 150.78 |
| 87 | How to solve 90% of NLP problems: a step-by-step guide | Emmanuel Ameisen | 999 | resource | 150.50 |
| 88 | ACL 2017 Report | Yuta Kikuchi, Sosuke Kobayashi | 711 | resource | 149.91 |
| 89 | spaCy 101: Everything you need to know | Author Unknown | 731 | resource | 149.87 |
| 90 | Capsule Networks and the Limitations of CNNs | Soham Chatterjee | 744 | resource | 149.58 |
| 91 | What is machine learning? Everything you need to know | Nick Heath | 711 | resource | 148.73 |
| 92 | Key Phrases Dataset | Author Unknown | 999 | library | 148.40 |
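
Each row of the ranked list above carries a rank, a title, an author, a numeric topic code, a medium and a score. As a minimal sketch of how such rows could be handled in code, the example below stores a handful of them in a small record type and picks the highest-scoring entry per medium. The names `Resource`, `ROWS` and `top_by_medium` are hypothetical and are not part of the TutorialBank release; the topic codes are treated as opaque integers because their meaning is not given here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One row of the ranked list (hypothetical record type)."""
    rank: int
    title: str
    author: str
    topic: int    # numeric code from the Topic column, left uninterpreted
    medium: str   # e.g. "paper", "tutorial", "resource", "survey", "corpus"
    score: float


# A few rows copied from the list above; the remaining rows are omitted.
ROWS = [
    Resource(1, "Highlights of EMNLP 2017: Exciting Datasets, Return of the Clusters, and More!",
             "Sebastian Ruder", 641, "resource", 270.97),
    Resource(7, "State-of-the-art neural coreference resolution for chatbots",
             "Thomas Wolf", 756, "tutorial", 228.55),
    Resource(13, "Train Neural Machine Translation Models with Sockeye",
             "Felix Hieber, Tobias Domhan", 753, "tutorial", 218.28),
    Resource(28, "Recent Advances in Document Summarization",
             "Jin-ge Yao, Xiaojun Wan, Jianguo Xiao", 421, "survey", 186.52),
    Resource(49, "Quora Duplicate Questions Corpus",
             "Quora", 151, "corpus", 168.17),
]


def top_by_medium(rows, k=1):
    """Group rows by medium and keep the k highest-scoring rows in each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.medium].append(row)
    return {
        medium: sorted(items, key=lambda r: r.score, reverse=True)[:k]
        for medium, items in groups.items()
    }


if __name__ == "__main__":
    for medium, items in top_by_medium(ROWS).items():
        for r in items:
            print(f"{medium:9s} {r.score:7.2f}  {r.title}")
```

Grouping before sorting keeps the per-medium ranking independent of the overall order, so the same helper would apply if the rows were loaded in a different order or filtered by topic code first.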