Adding Citations to a GitHub Repository
Following the principles of open science, computer science researchers typically share the source code, tools, or datasets that accompany a research paper. GitHub, one of the most popular platforms, is often used for this purpose. Finding the repository associated with a paper is usually not difficult, because researchers share the link in the camera-ready version of their paper. The reverse task, discovering the paper associated with a repository, is more complicated. To facilitate it, researchers often add the paper’s citation to the main README file. For some time now, however, GitHub has simplified this task by letting you create a special citation file. The platform checks each repository for this file and, if it finds one, adds a dropdown button with citation options. In this article, I explain how to add such citation files to your repository.
Example
Let’s consider our paper “Small Changes, Big Changes: An Updated View on the Android Permission System” presented at RAID 2016 as an example. In the GitHub repository, we have shared the dataset accompanying this paper.
If you open this repository in your browser, you should spot the “Cite this repository” dropdown button to the right of the list of repository files. Figure 1 shows how this looks in my browser. If you click on this button, a modal window appears whose content depends on the type of the citation file.
Type of Citation Files
According to the GitHub documentation, there are two types of citation files supported by the platform:
- Non-Parselable
- Parselable
GitHub supports a large variety of files belonging to the Non-Parselable type (see documentation). Note that these file names are case-insensitive (the inst/CITATION file is typically used in R package repositories):
CITATION
CITATIONS
CITATION.bib
CITATIONS.bib
CITATION.md
CITATIONS.md
inst/CITATION
At the same time, there is only one Parselable citation file, CITATION.cff, which should be placed in the root directory of your repository.
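To make the two categories concrete, here is a minimal Python sketch that classifies a repository-relative path using only the file names listed above. The function name `classify_citation_file` is hypothetical, and this is not how GitHub itself performs the check. Since the article states case-insensitivity only for the Non-Parselable names, the sketch assumes an exact match for CITATION.cff.

```python
# File names listed in the article; these are matched case-insensitively.
NON_PARSELABLE_NAMES = {
    "citation", "citations",
    "citation.bib", "citations.bib",
    "citation.md", "citations.md",
    "inst/citation",
}

# Assumption: the single Parselable file is matched by its exact name in the root.
PARSELABLE_NAME = "CITATION.cff"


def classify_citation_file(repo_relative_path: str) -> str:
    """Return 'Parselable', 'Non-Parselable', or 'none' for a repo-relative path."""
    if repo_relative_path == PARSELABLE_NAME:
        return "Parselable"
    if repo_relative_path.lower() in NON_PARSELABLE_NAMES:
        return "Non-Parselable"
    return "none"
```

For example, `classify_citation_file("Citation.BIB")` falls into the Non-Parselable category, while a CITATION.cff file placed in a subdirectory such as `docs/` is not recognized at all.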
The difference between these two categories is quite substantial. If you put a Non-Parselable citation file into your repository, the only action available in the modal window, which appears if you click on the dropdown button, is “View citation file” (see Figure 2).
At the same time, the modal window corresponding to the Parselable citation file type looks much better (see Figure 3). As you can see, you can copy the citation in two different formats: APA and BibTeX. Moreover, as in the case of the Non-Parselable files, you can also “View citation file”.
In this case, the citation file is CITATION.cff, and if you click on the “View citation file” button, this file will be opened.
How to Make Repository Citable
Now that you understand the differences between the Parselable and Non-Parselable citation file types, let us consider how to add citation support to your repository.
Non-Parselable File
To add a “Cite this repository” button that points to a Non-Parselable file to your project repository page (as shown in Figure 2), you just need to add one of the files mentioned in the documentation. For example, put your citation into the CITATION.bib file in your project repository, and the dropdown button pointing to this file should appear; no other configuration is necessary.
Parselable File
Creating a Parselable file is trickier. For the file to be parsed, its data must be presented in a particular format that the parser understands. This format is called the “Citation File Format”; its abbreviation, cff, is also the extension of the corresponding citation file, and it is based on the yaml format. Its schema is described in the following document. The simplest example of a repository citation could be the following (taken from the document and adapted):
abstract: Fast detection of repackaged Android applications based on the comparison of resource files included into the package.
authors:
  - family-names: Zhauniarovich
    given-names: Yury
    orcid: "https://orcid.org/0000-0001-9116-0728"
cff-version: 1.2.0
date-released: "2013-11-30"
identifiers:
  - type: url
    value: "https://github.com/zyrikby/FSquaDRA/tree/dc42c93991240da0fc9f1081e72be3eeb17d2638"
    description: Latest version
keywords:
  - research
  - "detection repackaged Android applications"
  - "resource files"
license: Apache-2.0
message: If you use this software, please cite it using these metadata.
repository-code: "https://github.com/zyrikby/FSquaDRA"
title: FSquaDRA
If you create this file in your repository, the “Cite this repository” button should appear and the corresponding BibTeX citation should be the following:
@software{Zhauniarovich_FSquaDRA_2013,
  author = {Zhauniarovich, Yury},
  license = {Apache-2.0},
  month = {11},
  title = {{FSquaDRA}},
  url = {https://github.com/zyrikby/FSquaDRA},
  year = {2013}
}
While this may be the information you want others to use when citing your tool, researchers usually prefer that the corresponding paper is cited rather than the tool itself. To achieve this, we need to add a preferred-citation section to our document and fill in its values (see the “Credit Redirection” section). After these modifications, our cff document looks as follows:
title: FSquaDRA
abstract: Fast detection of repackaged Android applications based on the comparison of resource files included into the package.
authors:
  - family-names: Zhauniarovich
    given-names: Yury
    orcid: "https://orcid.org/0000-0001-9116-0728"
cff-version: 1.2.0
date-released: "2013-11-30"
identifiers:
  - type: url
    value: "https://github.com/zyrikby/FSquaDRA/tree/dc42c93991240da0fc9f1081e72be3eeb17d2638"
    description: Latest version
keywords:
  - research
  - "detection repackaged Android applications"
  - "resource files"
license: Apache-2.0
message: If you use this software, please cite it using these metadata.
repository-code: "https://github.com/zyrikby/FSquaDRA"
preferred-citation:
  title: "FSquaDRA: Fast Detection of Repackaged Applications"
  type: conference-paper
  authors:
    - family-names: "Zhauniarovich"
      given-names: "Yury"
    - family-names: "Gadyatskaya"
      given-names: "Olga"
    - family-names: "Crispo"
      given-names: "Bruno"
    - family-names: "La Spina"
      given-names: "Francesco"
    - family-names: "Moser"
      given-names: "Ermanno"
  collection-title: "28th Annual IFIP WG 11.3 Working Conference on Data and Applications Security and Privacy" # booktitle
  collection-type: "proceedings"
  conference:
    name: "DBSec" # series
  doi: "10.1007/978-3-662-43936-4_9"
  start: 131 # First page number
  end: 146 # Last page number
  year: 2014
Now, the BibTeX should look as follows:
@inproceedings{Zhauniarovich_FSquaDRA_Fast_Detection_2014,
  author = {Zhauniarovich, Yury and Gadyatskaya, Olga and Crispo, Bruno and La Spina, Francesco and Moser, Ermanno},
  booktitle = {28th Annual IFIP WG 11.3 Working Conference on Data and Applications Security and Privacy},
  doi = {10.1007/978-3-662-43936-4_9},
  pages = {131--146},
  series = {DBSec},
  title = {{FSquaDRA: Fast Detection of Repackaged Applications}},
  year = {2014}
}
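The relation between the cff fields and the resulting BibTeX entries can be sketched in a few lines of Python. This is only an illustration of the mapping visible in the two examples, not GitHub’s actual converter. The function name `cff_to_bibtex` is hypothetical, the citation key is simplified to family name plus year (the real keys also include words from the title), and the metadata is assumed to be already loaded into a dictionary.

```python
def cff_to_bibtex(cff: dict) -> str:
    """Render a BibTeX entry from CFF-like metadata (illustrative mapping only)."""
    # Credit redirection: use the preferred-citation section when it is present.
    source = cff.get("preferred-citation", cff)

    authors = " and ".join(
        f"{a['family-names']}, {a['given-names']}" for a in source["authors"]
    )
    fields = {"author": authors, "title": "{" + source["title"] + "}"}

    if source.get("type") == "conference-paper":
        entry_type = "inproceedings"
        # collection-title -> booktitle, conference.name -> series
        fields["booktitle"] = source.get("collection-title")
        fields["series"] = source.get("conference", {}).get("name")
        fields["doi"] = source.get("doi")
        if "start" in source and "end" in source:
            fields["pages"] = f"{source['start']}--{source['end']}"
        fields["year"] = source.get("year")
    else:
        entry_type = "software"
        fields["license"] = source.get("license")
        fields["url"] = source.get("repository-code")
        # date-released is expected in the YYYY-MM-DD form.
        parts = str(source.get("date-released", "")).split("-")
        if len(parts) == 3:
            fields["year"], fields["month"] = parts[0], parts[1]

    # Drop fields that were not provided and sort the rest alphabetically.
    fields = {k: v for k, v in fields.items() if v is not None}
    # Simplified, hypothetical key: first author's family name plus year.
    key = f"{source['authors'][0]['family-names']}_{fields.get('year', 'nd')}"
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in sorted(fields.items()))
    return f"@{entry_type}{{{key},\n{body}\n}}"
```

Without a preferred-citation section, the sketch produces an `@software` entry from the top-level metadata; once the section is added, it takes precedence and an `@inproceedings` entry is produced instead.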
Spotted Issues
While working on this article, I spotted some issues that may not be obvious when you create a citation file. First, I expected the name of the conference to correspond to BibTeX’s booktitle field. Unfortunately, this is not the case: name corresponds to BibTeX’s series field, and to produce BibTeX’s booktitle as per the standard, we need to set the collection-title value.
Second, I found that the message value is not respected: regardless of its value, the message in the “Cite this repository” dropdown is “If you use this software in your work, please cite it using the following metadata.” I have reported this issue to the developers.
Additional Recommendations
If you would like to know more about this functionality, I recommend visiting the supporting website. There you can find more information about this file format and its support, as well as a tool that helps you create a CITATION.cff file. Unfortunately, this tool does not properly support extra cff fields such as preferred-citation, so you still need to write their values manually, as we did in this article.
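Because these values are written by hand, a quick sanity check before committing can catch missing keys. The sketch below is a minimal Python example, not a full schema validator. It assumes, to my understanding of version 1.2.0 of the format, that authors, cff-version, message, and title are required at the top level and that authors, title, and type are required inside preferred-citation. The function name `missing_cff_keys` is hypothetical, and the file content is assumed to be already loaded into a dictionary by a YAML parser of your choice.

```python
# Assumption: keys required by the CFF 1.2.0 schema, as far as I understand it.
REQUIRED_TOP_LEVEL = ("authors", "cff-version", "message", "title")
REQUIRED_REFERENCE = ("authors", "title", "type")


def missing_cff_keys(cff: dict) -> list[str]:
    """Return the required keys that are absent from the loaded CITATION.cff data."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in cff]
    preferred = cff.get("preferred-citation")
    if preferred is not None:
        # Report nested keys with a dotted prefix so they are easy to locate.
        missing += [
            f"preferred-citation.{key}"
            for key in REQUIRED_REFERENCE
            if key not in preferred
        ]
    return missing
```

An empty list means the basic keys are in place; it does not guarantee that the file conforms to the complete schema.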