CiteME

Can Language Models Accurately Cite Scientific Claims?

CiteME is a benchmark that tests how well language models can identify the papers cited in scientific texts.

Ori Press*, Andreas Hochlehnert*, Ameya Prabhu, Vishaal Udandarao,
Ofir Press, Matthias Bethge

* / ‡ shared first/last authorship

What's the task?

Each question in CiteME is a text excerpt, taken from a research paper, that cites a single paper. The citation itself is replaced by the placeholder [CITATION]. For each excerpt, the goal is to find the title of the cited paper. For example, given the excerpt "We evaluate our model on [CITATION], a dataset consisting of black and white handwritten digits", the answer would be "Backpropagation Applied to Handwritten Zip Code Recognition".
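To make the task format concrete, here is a minimal Python sketch of one question and a title-matching check. The class name, field names, and the normalized exact-match rule are assumptions made for illustration; they are not the benchmark's official data schema or scoring code.

```python
import re
from dataclasses import dataclass


@dataclass
class CiteMEItem:
    """One benchmark question (hypothetical field names)."""
    excerpt: str       # text containing the [CITATION] placeholder
    target_title: str  # title of the paper the excerpt cites


def normalize_title(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def is_correct(item: CiteMEItem, predicted_title: str) -> bool:
    """Assumed scoring rule: the normalized titles must match exactly."""
    return normalize_title(predicted_title) == normalize_title(item.target_title)


def accuracy(items: list, predictions: list) -> float:
    """Percentage of items whose predicted title matches the target title."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(is_correct(item, pred) for item, pred in zip(items, predictions))
    return 100.0 * correct / len(items)


example = CiteMEItem(
    excerpt=(
        "We evaluate our model on [CITATION], a dataset consisting of "
        "black and white handwritten digits"
    ),
    target_title="Backpropagation Applied to Handwritten Zip Code Recognition",
)
```

With this sketch, `is_correct(example, "backpropagation applied to handwritten zip code recognition")` holds because casing and punctuation are ignored, while any other title counts as a miss.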

CiteME Leaderboard

Model                         Accuracy (%)
CiteAgent + GPT-4o            35.3
CiteAgent + Claude 3 Opus     27.7
CiteAgent + LLaMA-3-70B       21.0
SPECTER2                      0
SPECTER1                      0

CiteAgent in Action

We refer to the combination of an LM and our system of search/read/select commands as "CiteAgent". In the example below, we show how CiteAgent using GPT-4o finds a paper given an excerpt.

Question

Excerpt: One reason might be the training task itself: from [CITATION] we know that ImageNet can be solved to high accuracy using only local information. In other words, it might simply suffice to integrate evidence from many local texture features rather than going through the process of integrating and classifying global shapes

CiteAgent

Thought: The cited paper seems to be about solving ImageNet using local information. I'll start by searching for papers related to 'solving ImageNet using local information'.

Action: search(sort=Relevance) ImageNet solved with local information

System

CiteAgent

Thought: The search results do not seem to include papers directly addressing the solution of ImageNet using local information. Therefore, I need to modify the search query to better match the context provided in the excerpt.

Action: search(sort=Relevance) ImageNet high accuracy using local texture features

System

CiteAgent

Thought: Paper ID 810ae4 titled 'Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet' seems directly relevant as it mentions achieving high performance on ImageNet using local features. I will read this paper.

Action: read 810ae4

System

Published as a conference paper at ICLR 2019 APPROXIMATING CNNS WITH BAG-OF-LOCAL-FEATURES MODELS WORKS SURPRISINGLY WELL ON IMAGENET Wie...

CiteAgent

Thought: This paper directly discusses achieving high accuracy on ImageNet using local features, which is exactly what the excerpt is about.

Action: select 810ae4
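The Action lines in the trace above take three shapes: a search with a sort option and a free-text query, a read with a paper ID, and a select with a paper ID. As a rough illustration, the sketch below parses such lines into a structured form. The `Action` type, the regular expression, and the rule that only search accepts a sort option are assumptions inferred from this one trace, not CiteAgent's actual parser.

```python
import re
from typing import NamedTuple, Optional


class Action(NamedTuple):
    """Parsed agent command (hypothetical structure)."""
    command: str         # "search", "read", or "select"
    argument: str        # query text for search, paper ID otherwise
    sort: Optional[str]  # sort option, only present for search


# Matches e.g. "Action: search(sort=Relevance) some query" or "Action: read 810ae4".
_ACTION_RE = re.compile(
    r"^Action:\s*(?P<cmd>search|read|select)"
    r"(?:\(sort=(?P<sort>\w+)\))?"
    r"\s+(?P<arg>.+?)\s*$"
)


def parse_action(line: str) -> Action:
    """Parse one 'Action: ...' line; raise ValueError if it is malformed."""
    match = _ACTION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized action line: {line!r}")
    command, sort = match.group("cmd"), match.group("sort")
    if sort is not None and command != "search":
        # Assumption: only search takes a sort option, as in the trace.
        raise ValueError(f"'{command}' does not accept a sort option")
    return Action(command=command, argument=match.group("arg"), sort=sort)
```

For example, `parse_action("Action: read 810ae4")` yields a `read` command with argument `810ae4` and no sort option, and a line with an unknown command raises `ValueError` instead of being silently ignored.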



If you find our work helpful, please use the following citation:

@misc{press2024citeme,
  title={CiteME: Can Language Models Accurately Cite Scientific Claims?},
  author={Ori Press and Andreas Hochlehnert and Ameya Prabhu and Vishaal Udandarao and Ofir Press and Matthias Bethge},
  year={2024},
  eprint={2407.12861},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2407.12861}
}