

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension


We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high-quality distant supervision for answering the questions. We show that, in comparison to other recently introduced large-scale datasets, TriviaQA (1) has relatively complex, compositional questions, (2) has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and (3) requires more cross-sentence reasoning to find answers. We also present two baseline algorithms: a feature-based classifier and a state-of-the-art neural network that performs well on SQuAD reading comprehension. Neither approach comes close to human performance (23% and 40% vs. 80%), suggesting that TriviaQA is a challenging test bed that is worth significant future study.



Suggested Topics

Full Matches (full topic name in abstract)

Partial Matches (at least half of the words in the topic name appear in the abstract)
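
The two matching rules above can be illustrated with a minimal sketch. It assumes simple lowercase word tokenization and treats a full match as the topic name appearing as a contiguous word sequence; the page does not describe its actual matching code, so the function names and the example topic names here are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_topic(topic_name, abstract):
    """Classify one topic name against an abstract.

    Returns "full", "partial", or None. This is an illustrative
    assumption, not the site's real implementation.
    """
    topic_words = tokenize(topic_name)
    if not topic_words:
        return None
    abstract_words = tokenize(abstract)

    # Full match: the whole topic name appears as a contiguous word sequence.
    n = len(topic_words)
    for i in range(len(abstract_words) - n + 1):
        if abstract_words[i:i + n] == topic_words:
            return "full"

    # Partial match: at least half of the topic's words appear anywhere.
    vocabulary = set(abstract_words)
    hits = sum(1 for word in topic_words if word in vocabulary)
    if hits * 2 >= n:
        return "partial"
    return None


if __name__ == "__main__":
    abstract = "We present TriviaQA, a challenging reading comprehension dataset"
    # Topic names below are made up for illustration.
    for topic in ["Reading Comprehension", "Comprehension Benchmarks", "Machine Translation"]:
        print(topic, "->", match_topic(topic, abstract))
```

Under these assumptions a topic is reported in at most one list, because the full-match check runs first.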

Suggested Resources

Uses the abstract to search the content of resources available in Topics. Results are sorted by relevance.

# Title Author Topic Medium Score
1 awesome-qa seriousmac 411 resource 294.84
2 TriviaQA: A Large Scale Dataset for Reading Comprehension and Question Answering Mandar Joshi, Eunsol Choi, Daniel Weld, Luke Zettlemoyer 412 corpus 266.05
3 Recent Evolution of QA Datasets and Going Forward Jiwoong Im 412 tutorial 251.16
4 Similarity-driven Semantic Role Induction via Graph Partitioning Joel Lang, Mirella Lapata 999 paper 238.73
5 NLP’s generalization problem, and how researchers are tackling it Ana Marasovic 711 resource 236.84
6 Do Altmetrics Work? Twitter and Ten Other Social Web Services Mike Thelwall, Stefanie Haustein, Vincent Larivière, Cassidy R. Sugimoto 999 paper 218.58
7 Codra: A Novel Discriminative Framework for Rhetorical Analysis Shafiq Joty, Giuseppe Carenini, Raymond T. Ng 999 paper 215.20
8 Diversity and network coherence as indicators of interdisciplinarity: case studies in bionanoscience Ismael Rafols, Martin Meyer 999 paper 207.94
9 NLP's ImageNet moment has arrived Sebastian Ruder 862 resource 203.90
10 Neural Information Retrieval: At the End of the Early Years Kezban Dilek Onal, Ye Zhang, Ismail Sengor Altingovde, Md Mustafizur Rahman, Pinar Karagoz, Alex Braylan, Brandon Dang, Heng-Lu Chang, Henna Kim, Quinten McNamara, Aaron Angert, Edward Banner, Vivek K 713 resource 200.84
11 Highlights of EMNLP 2017: Exciting Datasets, Return of the Clusters, and More! Sebastian Ruder 641 resource 196.73
12 Clustering cliques for graph-based summarization of the biomedical research literature Han Zhang, Marcelo Fiszman, Dongwook Shin, Bartłomiej Wilkowski, Thomas Rindflesch 999 paper 194.76
13 The Stanford Question Answering Dataset Pranav Rajpurkar 411 resource 189.40
14 Lexicalization and Generative Power in CCG Marco Kuhlmann, Alexander Koller, Giorgio Satta 999 paper 187.68
15 Discriminative Syntax-based Word Ordering for Text Generation Yue Zhang, Stephen Clark 999 paper 186.43
16 An index to quantify an individual’s scientific research output that takes into account the effect of multiple coauthorship J. E. Hirsch 999 paper 185.37
17 Recent Advances in Document Summarization Jin-ge Yao, Xiaojun Wan, Jianguo Xiao 421 survey 179.99
18 A general framework for analysing diversity in science, technology and society Andy Stirling 999 paper 178.89
19 Bayesian Statistics explained to Beginners in Simple English NSS 102 tutorial 172.78
20 Negated bio-events: analysis and identification Raheel Nawaz, Paul Thompson, Sophia Ananiadou 999 paper 170.52
21 DEEP LEARNING FOR CHATBOTS, PART 1 - INTRODUCTION Denny Britz 445 tutorial 169.07
22 Machine Learning for Humans Vishal Maini, Samer Sabri 134 tutorial 168.88
23 The data that transformed AI research—and possibly the world Dave Gershgorn 107 resource 168.67
24 Is science becoming more interdisciplinary? Measuring and mapping six research fields over time Alan L. Porter, Ismael Rafols 999 paper 165.79
25 A social network's changing statistical properties and the quality of human innovation Brian Uzzi 999 paper 165.12
26 Natural Language Processing in Artificial Intelligence is almost human-level accurate. Worse yet, it gets smart! Rafal 133 tutorial 162.58
27 The history and meaning of the journal impact factor Eugene Garfield 999 paper 159.40
28 How do we capture structure in relational data? Matthew Das Sarma 711 resource 159.17
30 Improving Language Understanding with Unsupervised Learning Alec Radford 581 resource 157.97
31 The 7 NLP Techniques That Will Change How You Communicate in the Future (Part II) James Le 133 resource 156.76
32 State-of-the-art neural coreference resolution for chatbots Thomas Wolf 756 tutorial 155.26
33 Recurrent Neural Networks Tutorial, Part 2 - Implementing a RNN with Python, Numpy, and Theano Denny Britz 741 tutorial 154.55
34 FigureQA: an annotated figure dataset for visual reasoning Author Unknown 862 resource 154.31
35 Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation Author Unknown 952 resource 153.46
36 The h’-Index, Effectively Improving the h-Index Based on the Citation Distribution Chun-Ting Zhang 999 paper 152.73
37 Four deep learning trends from ACL 2017: Part 2 Abigail See 713 resource 151.93
38 Four deep learning trends from ACL 2017: Part 1 Abigail See 713 resource 151.68
39 Rohan #2: Artificial intelligence, ?Progress/?Time Rohan Kapur 811 tutorial 151.36
40 Train Neural Machine Translation Models with Sockeye Felix Hieber, Tobias Domhan 753 tutorial 150.46
41 Introduction to Visual Question Answering: Datasets, Approaches and Evaluation Javier Couto 411 resource 148.92
42 Tombone's Computer Vision Blog Tomasz Malisiewicz 958 resource 148.65
43 100+ Interesting Data Sets for Statistics Robb Seaton 999 corpus 148.57
44 K-Means & Other Clustering Algorithms: A Quick Intro with Python Nikos Koufos 571 tutorial 147.66
45 Introducing Gluon - An Easy-to-Use Programming Interface for Flexible Deep Learning Vikram Madan 731 resource 147.53
46 Analyzing the Meaning of Sentences Steven Bird, Ewan Klein, Edward Loper 721 course 147.26
47 ACL 2017 Report Yuta Kikuchi, Sosuke Kobayashi 711 resource 147.12
48 Machine Learning Glossary Author Unknown 107 resource 146.54
49 ICML+ACL’18: Structure Back in Play, Translation Wants More Context Andre Martins 956 resource 146.54
50 Quora Duplicate Questions Corpus Quora 151 corpus 146.36
51 RNNs in Tensorflow, a Practical Guide and Undocumented Features Denny Britz 741 tutorial 144.85
52 Recommendation in Industry Xavier Amatriain 999 tutorial 144.22
53 Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano Denny Britz 742 tutorial 144.14
54 The Future (and Present) of Artificial Intelligence AMA Various Authors 811 resource 142.50
55 An Intuitive Guide to Linear Algebra Kalid Azad 121 tutorial 142.25
56 25 Open Datasets for Deep Learning Every Data Scientist Must Work With Pranav Dar 731 resource 142.02
57 DrQA Adam Fisch 755 library 141.01
58 Using Artificial Intelligence to Augment Human Intelligence Shan Carter, Michael Nielsen 811 resource 139.84
59 Summaries and notes on Deep Learning research papers Denny Britz 713 resource 139.66
60 Deep Learning from first principles in Python, R and Octave – Part 3 Tinniam V Ganesh 711 resource 139.39
61 Automatic Labeling of Semantic Roles Daniel Gildea, Daniel Jurafsky 999 paper 138.27
62 Recurrent Neural Networks Tutorial, Part 3- Backpropagation Through Time and Vanishing Gradients Denny Britz 741 tutorial 137.04
63 Explain yourself, machine. Producing simple text descriptions for AI interpretability. Luke Oakden-Rayner 811 resource 137.02
64 Deep Learning in NLP Vered Shwartz 711 resource 137.00
65 Vector Calculus: Understanding the Dot Product Kalid Azad 101 tutorial 136.79
66 Introduction to Neural Machine Translation with GPUs (part 3) Kyunghyun Cho 753 tutorial 136.51
67 A Conversational Question Answering Challenge Author Unknown 411 resource 135.51
68 Neural Text Embeddings for IR Bhaskar Mitra, Nick Craswell 721 tutorial 135.06
69 Dialog state tracking, a machine reading approach using a memory-enhanced neural network Julien Perez 999 paper 134.97
70 Question answering on the Facebook bAbi dataset using recurrent neural networks and 175 lines of Python + Keras Stephen Merity 741 resource 134.25
71 awesome-question-answering Apurv Verma 755 library 134.22
72 News Article Wikipedia Dataset Author Unknown 999 library 134.04
73 WikiTableQuestions: a Complex Real-World Question Understanding Dataset Ice Pasupat 755 corpus 133.65
74 WikiTableQuestions: a Complex Real-World Question Understanding Dataset Ice Pasupat 412 tutorial 133.65
75 DeepMind has a bigger plan for its newest Go-playing AI Dave Gershgorn 811 resource 133.22
76 TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension Mandar Joshi, Eunsol Choi, Daniel Weld, Luke Zettlemoyer 412 library 133.00
77 Recurrent Neural Networks Tutorial, Part 1 - Introduction to RNNs Denny Britz 741 tutorial 132.97
78 The NeuroEvolution of Augmenting Topologies (NEAT) Users Page Author Unknown 999 resource 131.75
79 Building a Question-Answering System from Scratch— Part 1 Alvira Swalin 411 resource 131.70
80 Deconstruction with Discrete Embeddings R2RT 711 resource 131.43
81 Fueling the Gold Rush: The Greatest Public Datasets for AI Luke de Oliveira 107 resource 130.08
82 State of the art deep learning model for question answering Victor Zhong, Caiming Xiong 755 tutorial 129.55
83 Future impact: Predicting scientific success Daniel E. Acuna, Stefano Allesina, Konrad P. Kording 999 paper 129.00
84 Open Machine Learning Course. Topic 3. Classification, Decision Trees and k Nearest Neighbors Yury Kashnitskiy 711 resource 128.96
85 Requests for Research Sebastian Ruder 921 resource 128.88
86 Machine Learning Morteza Shahriari Nia 107 tutorial 128.71
87 Allen Institute for Artificial Intelligence: datasets Allen Institute 811 corpus 128.31
88 Making computers explain themselves Larry Hardesty 713 resource 128.05
89 Natural Language Processing (NLP) for Computational Social Science Cristian Danescu-Niculescu-Mizil, Lillian Lee 133 tutorial 127.71
90 Learning when to skim and when to read Alexander Rosenberg Johansen, Bryan McCann, James Bradbury, Richard Socher 713 tutorial 127.38
91 What is machine learning? Everything you need to know Nick Heath 711 resource 127.32
92 10 Applications of Artificial Neural Networks in Natural Language Processing Olga Davydova 811 resource 127.03
93 Key Phrases Dataset Author Unknown 999 library 126.90
94 Hierarchical Phrase-Based Translation David Chiang 999 paper 126.70
95 The Unreasonable Ineffectiveness of Deep Learning in NLU Suman Deb Roy 713 tutorial 126.70