ManyNames

ManyNames is a large-scale resource of human object naming data.

The English version of ManyNames provides on average 31 name annotations for 25K objects in real world images, whereas for Mandarin Chinese it provides on average 20 name annotations for 1,319 objects.

The images for ManyNames were selected from VisualGenome (also available here). In each image, one target object is marked with a red bounding box. Annotators were asked to provide the first name that came to mind for the object tightly bound by this bounding box.

Note: A subset of the ManyNames data has also been annotated for Catalan within the AINA project (1,072 images, around 10 annotations per image). It is available here.

Explore

You can explore the ManyNames datasets by searching for images by name or by browsing the names that were collected in ManyNames.

More information and download

The datasets can be downloaded from the GitHub repository. The datasets are provided in .tsv (tab-separated values) and .json format. Python and R scripts to facilitate access to the datasets as well as detailed documentation of the datasets is provided in the repository.

Direct links to the main files of interest are provided in the download section.

Citing ManyNames

For any use of ManyNames:

Silberer, C., S. Zarrieß, M. Westera, G. Boleda. 2020. Humans Meet Models on Object Naming: A New Dataset and Analysis. COLING 2020. [bib]
In addition, if you refer to version 1 specifically:

Silberer, C., S. Zarrieß, G. Boleda. 2020. Object Naming in Language and Vision: A Survey and a New Dataset. LREC 2020. [bib]
In addition, if you use the Mandarin Chinese data:

He, Y., X. Liao, J. Liang, G. Boleda. 2023. The Impact of Familiarity on Naming Variation: A Study on Object Naming in Mandarin Chinese. CoNLL 2023. [bib]