

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension


We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high-quality distant supervision for answering the questions. We show that, in comparison to other recently introduced large-scale datasets, TriviaQA (1) has relatively complex, compositional questions, (2) has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and (3) requires more cross-sentence reasoning to find answers. We also present two baseline algorithms: a feature-based classifier and a state-of-the-art neural network that performs well on SQuAD reading comprehension. Neither approach comes close to human performance (23% and 40% vs. 80%), suggesting that TriviaQA is a challenging test bed that is worth significant future study.



Suggested Topics

Full Matches (full topic name in abstract)

Partial Matches (at least half of the words in the topic name appear in the abstract)
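
The two matching rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the site's actual implementation: the function name `classify_topic`, the tokenization (lowercased alphanumeric words), and the example topic names are all hypothetical, and a full match is assumed to mean that the topic name appears as a contiguous word sequence.

```python
import re


def _words(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify_topic(topic_name, abstract):
    """Return 'full', 'partial', or None for one topic name against an abstract.

    Assumed rules (hypothetical reading of the page's descriptions):
      full    - the whole topic name occurs as a contiguous word sequence
      partial - at least half of the topic name's words occur anywhere
    """
    topic_words = _words(topic_name)
    if not topic_words:
        return None
    abstract_words = _words(abstract)
    n = len(topic_words)

    # Full match: slide a window of n words over the abstract.
    for i in range(len(abstract_words) - n + 1):
        if abstract_words[i:i + n] == topic_words:
            return "full"

    # Partial match: count topic words present in the abstract's vocabulary.
    vocab = set(abstract_words)
    hits = sum(1 for w in topic_words if w in vocab)
    if 2 * hits >= n:
        return "partial"
    return None


# Hypothetical topic names, checked against the opening of the abstract.
abstract = "We present TriviaQA, a challenging reading comprehension dataset"
for topic in ("Reading Comprehension", "Machine Reading", "Speech Recognition"):
    print(topic, "->", classify_topic(topic, abstract))
```

Under these assumptions a two-word topic needs only one word in common to count as a partial match, which may explain why partial matches are listed separately from full ones.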

Suggested Resources

Uses the abstract to search the content of resources available in Topics. Results are sorted by relevance.

# Title Author Topic Medium Score
1 TriviaQA: A Large Scale Dataset for Reading Comprehension and Question Answering Mandar Joshi, Eunsol Choi, Daniel Weld, Luke Zettlemoyer 412 corpus 271.82
2 Recent Evolution of QA Datasets and Going Forward Jiwoong Im 412 tutorial 256.39
3 Similarity-driven Semantic Role Induction via Graph Partitioning Joel Lang, Mirella Lapata 999 paper 239.71
4 Do Altmetrics Work? Twitter and Ten Other Social Web Services Mike Thelwall, Stefanie Haustein, Vincent Larivière, Cassidy R. Sugimoto 999 paper 219.28
5 Codra: A Novel Discriminative Framework for Rhetorical Analysis Shafiq Joty, Giuseppe Carenini, Raymond T. Ng 999 paper 216.02
6 Diversity and network coherence as indicators of interdisciplinarity: case studies in bionanoscience Ismael Rafols, Martin Meyer 999 paper 208.53
7 Neural Information Retrieval: At the End of the Early Years Kezban Dilek Onal, Ye Zhang, Ismail Sengor Altingovde, Md Mustafizur Rahman, Pinar Karagoz, Alex Braylan, Brandon Dang, Heng-Lu Chang, Henna Kim, Quinten McNamara, Aaron Angert, Edward Banner, Vivek K 713 resource 202.17
8 Highlights of EMNLP 2017: Exciting Datasets, Return of the Clusters, and More! Sebastian Ruder 641 resource 197.39
9 Clustering cliques for graph-based summarization of the biomedical research literature Han Zhang, Marcelo Fiszman, Dongwook Shin, Bartłomiej Wilkowski, Thomas Rindflesch 999 paper 195.47
10 The Stanford Question Answering Dataset Pranav Rajpurkar 411 resource 190.96
11 Lexicalization and Generative Power in CCG Marco Kuhlmann, Alexander Koller, Giorgio Satta 999 paper 188.28
12 Discriminative Syntax-based Word Ordering for Text Generation Yue Zhang, Stephen Clark 999 paper 187.18
13 An index to quantify an individual’s scientific research output that takes into account the effect of multiple coauthorship J. E. Hirsch 999 paper 185.86
14 Recent Advances in Document Summarization Jin-ge Yao, Xiaojun Wan, Jianguo Xiao 421 survey 181.15
15 A general framework for analysing diversity in science, technology and society Andy Stirling 999 paper 179.49
16 Bayesian Statistics explained to Beginners in Simple English NSS 102 tutorial 173.41
17 Negated bio-events: analysis and identification Raheel Nawaz, Paul Thompson, Sophia Ananiadou 999 paper 171.52
18 Machine Learning for Humans Vishal Maini, Samer Sabri 134 tutorial 170.20
19 Deep Learning for Chatbots, Part 1 - Introduction Denny Britz 445 tutorial 169.64
20 The data that transformed AI research—and possibly the world Dave Gershgorn 107 resource 169.08
21 Is science becoming more interdisciplinary? Measuring and mapping six research fields over time Alan L. Porter, Ismael Rafols 999 paper 165.90
22 A social network's changing statistical properties and the quality of human innovation Brian Uzzi 999 paper 165.67
23 Natural Language Processing in Artificial Intelligence is almost human-level accurate. Worse yet, it gets smart! Rafal 133 tutorial 163.40
24 The history and meaning of the journal impact factor Eugene Garfield 999 paper 159.91
25 Improving Language Understanding with Unsupervised Learning Alec Radford 581 resource 159.17
27 State-of-the-art neural coreference resolution for chatbots Thomas Wolf 756 tutorial 155.69
28 FigureQA: an annotated figure dataset for visual reasoning Author Unknown 862 resource 155.50
29 Recurrent Neural Networks Tutorial, Part 2 - Implementing a RNN with Python, Numpy, and Theano Denny Britz 741 tutorial 154.94
30 Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation Author Unknown 952 resource 153.86
31 Four deep learning trends from ACL 2017: Part 2 Abigail See 713 resource 153.12
32 The h’-Index, Effectively Improving the h-Index Based on the Citation Distribution Chun-Ting Zhang 999 paper 152.97
33 Four deep learning trends from ACL 2017: Part 1 Abigail See 713 resource 152.76
34 Rohan #2: Artificial intelligence, ΔProgress/ΔTime Rohan Kapur 811 tutorial 152.59
35 Train Neural Machine Translation Models with Sockeye Felix Hieber, Tobias Domhan 753 tutorial 150.79
36 Introduction to Visual Question Answering: Datasets, Approaches and Evaluation Javier Couto 411 resource 150.21
37 100+ Interesting Data Sets for Statistics Robb Seaton 999 corpus 149.67
38 Tombone's Computer Vision Blog Tomasz Malisiewicz 958 resource 149.65
39 Analyzing the Meaning of Sentences Steven Bird, Ewan Klein, Edward Loper 721 course 148.25
40 ACL 2017 Report Yuta Kikuchi, Sosuke Kobayashi 711 resource 148.08
41 K-Means & Other Clustering Algorithms: A Quick Intro with Python Nikos Koufos 571 tutorial 147.99
42 Introducing Gluon - An Easy-to-Use Programming Interface for Flexible Deep Learning Vikram Madan 731 resource 147.93
43 Machine Learning Glossary Author Unknown 107 resource 147.55
44 Quora Duplicate Questions Corpus Quora 151 corpus 147.45
45 Recommendation in Industry Xavier Amatriain 999 tutorial 145.32
46 RNNs in Tensorflow, a Practical Guide and Undocumented Features Denny Britz 741 tutorial 145.16
47 Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano Denny Britz 742 tutorial 144.42
48 The Future (and Present) of Artificial Intelligence AMA Various Authors 811 resource 143.57
49 25 Open Datasets for Deep Learning Every Data Scientist Must Work With Pranav Dar 731 resource 142.98
50 An Intuitive Guide to Linear Algebra Kalid Azad 121 tutorial 142.61
51 DrQA Adam Fisch 755 library 142.16
52 Using Artificial Intelligence to Augment Human Intelligence Shan Carter, Michael Nielsen 811 resource 141.00
53 Summaries and notes on Deep Learning research papers Denny Britz 713 resource 140.56
54 Deep Learning from first principles in Python, R and Octave – Part 3 Tinniam V Ganesh 711 resource 139.78
55 Automatic Labeling of Semantic Roles Daniel Gildea, Daniel Jurafsky 999 paper 138.49
56 Explain yourself, machine. Producing simple text descriptions for AI interpretability. Luke Oakden-Rayner 811 resource 138.08
57 Vector Calculus: Understanding the Dot Product Kalid Azad 101 tutorial 137.47
58 Introduction to Neural Machine Translation with GPUs (part 3) Kyunghyun Cho 753 tutorial 137.44
59 Recurrent Neural Networks Tutorial, Part 3- Backpropagation Through Time and Vanishing Gradients Denny Britz 741 tutorial 137.28
60 TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension Mandar Joshi, Eunsol Choi, Daniel Weld, Luke Zettlemoyer 412 library 137.22
61 Neural Text Embeddings for IR Bhaskar Mitra, Nick Craswell 721 tutorial 135.98
62 Question answering on the Facebook bAbi dataset using recurrent neural networks and 175 lines of Python + Keras Stephen Merity 741 resource 135.39
63 Dialog state tracking, a machine reading approach using a memory-enhanced neural network Julien Perez 999 paper 135.29
64 awesome-question-answering Apurv Verma 755 library 135.21
65 News Article Wikipedia Dataset Author Unknown 999 library 134.98
66 WikiTableQuestions: a Complex Real-World Question Understanding Dataset Ice Pasupat 755 corpus 134.77
67 WikiTableQuestions: a Complex Real-World Question Understanding Dataset Ice Pasupat 412 tutorial 134.77
68 DeepMind has a bigger plan for its newest Go-playing AI Dave Gershgorn 811 resource 133.40
69 Recurrent Neural Networks Tutorial, Part 1 - Introduction to RNNs Denny Britz 741 tutorial 133.17
70 The NeuroEvolution of Augmenting Topologies (NEAT) Users Page Author Unknown 999 resource 132.81
71 Building a Question-Answering System from Scratch— Part 1 Alvira Swalin 411 resource 132.80
72 Deconstruction with Discrete Embeddings R2RT 711 resource 132.45
73 Fueling the Gold Rush: The Greatest Public Datasets for AI Luke de Oliveira 107 resource 131.12
74 State of the art deep learning model for question answering Victor Zhong, Caiming Xiong 755 tutorial 130.75
75 Open Machine Learning Course. Topic 3. Classification, Decision Trees and k Nearest Neighbors Yury Kashnitskiy 711 resource 130.40
76 Requests for Research Sebastian Ruder 921 resource 129.91
77 Future impact: Predicting scientific success Daniel E. Acuna, Stefano Allesina, Konrad P. Kording 999 paper 129.28
78 Allen Institute for Artificial Intelligence: datasets Allen Institute 811 corpus 129.14
79 Machine Learning Morteza Shahriari Nia 107 tutorial 129.01
80 Making computers explain themselves Larry Hardesty 713 resource 128.91
81 Natural Language Processing (NLP) for Computational Social Science Cristian Danescu-Niculescu-Mizil, Lillian Lee 133 tutorial 128.50
82 Learning when to skim and when to read Alexander Rosenberg Johansen, Bryan McCann, James Bradbury, Richard Socher 713 tutorial 128.42
83 What is machine learning? Everything you need to know Nick Heath 711 resource 128.38
84 10 Applications of Artificial Neural Networks in Natural Language Processing Olga Davydova 811 resource 127.94
85 Key Phrases Dataset Author Unknown 999 library 127.77
86 The Unreasonable Ineffectiveness of Deep Learning in NLU Suman Deb Roy 713 tutorial 127.61
87 PyTorch-GAN Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman 731 library 127.61
88 Latent Semantic Analysis Thomas Landauer, Susan Dumais 314 paper 127.46
89 Rules of Machine Learning: Best Practices for ML Engineering Martin Zinkevich 711 resource 127.46
90 Dive into Machine Learning Michael Floering 107 tutorial 126.93
91 Hierarchical Phrase-Based Translation David Chiang 999 paper 126.90
92 Why Most Published Research Findings Are False John P. A. Ioannidis 999 paper 126.63
93 Little Science, Big Science...and Beyond Derek J. Price 999 paper 125.91
94 An end to end implementation of a Machine Learning pipeline Spandan Madan 107 tutorial 125.82
95 A survey of transfer learning Karl Weiss, Taghi M. Khoshgoftaar, DingDing Wang 978 resource 125.72