Adding Citations to a GitHub Repository

Following the principles of open science, computer science researchers typically share the source code, tools, or datasets accompanying a research paper. As one of the most popular platforms, GitHub is often used for this purpose. While finding the repository associated with a paper is usually not difficult (researchers share the link in the camera-ready version of their paper), the reverse task, discovering the paper associated with a repository, is more complicated. To facilitate this, researchers often add the citation of the paper to the main README file. However, for some time now, GitHub has offered a better option: a special citation file. The platform checks each repository for this file and, if it finds one, adds a dropdown button with citation options. In this article, I explain how to add such citation files to your repository.

Example

Let’s consider as an example our paper “Small Changes, Big Changes: An Updated View on the Android Permission System”, presented at RAID 2016. In its GitHub repository, we have shared the dataset accompanying this paper.

If you open this repository in your browser, you should spot the “Cite this repository” dropdown button to the right of the list of repository files. Figure 1 shows how this looks in my browser. If you click this button, a modal window appears whose content depends on the type of the citation file.

Figure 1: Citation Dropdown Button

Types of Citation Files

According to the GitHub documentation, there are two types of citation files supported by the platform:

  1. Non-Parseable
  2. Parseable

GitHub recognizes several Non-Parseable file names (see the documentation). Note that these file names are case-insensitive (the inst/CITATION file is typically used in R package repositories):

  1. CITATION
  2. CITATIONS
  3. CITATION.bib
  4. CITATIONS.bib
  5. CITATION.md
  6. CITATIONS.md
  7. inst/CITATION

In contrast, there is only one Parseable citation file, CITATION.cff, which should be placed in the root directory of your repository.

The difference between these two categories is substantial. If you put a Non-Parseable citation file into your repository, the only action available in the modal window that appears when you click the dropdown button is “View citation file” (see Figure 2).

Figure 2: Modal Window Corresponding to Non-Parseable File

The modal window corresponding to the Parseable citation file type is much richer (see Figure 3): you can copy the citation in two formats, APA and BibTeX. As with Non-Parseable files, you can also click “View citation file”.

Figure 3: Modal Window Corresponding to Parseable File

Note that if a repository contains both Parseable and Non-Parseable files, the information in the CITATION.cff file takes precedence, and clicking the “View citation file” button opens this file.
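The detection rules described above can be summarized in a short Python sketch. The helper name find_citation_file is hypothetical, and the code only mimics the documented behavior (a CITATION.cff in the root wins; otherwise a Non-Parseable file name is matched case-insensitively); it is not GitHub's implementation. If several Non-Parseable files exist, the sketch simply picks the first one in sorted order, which is my assumption rather than documented behavior.

```python
from pathlib import Path
from typing import Optional

# Non-Parseable names listed in the GitHub documentation (compared in lower case).
NON_PARSEABLE_NAMES = {
    "citation", "citations",
    "citation.bib", "citations.bib",
    "citation.md", "citations.md",
}


def find_citation_file(repo_root: Path) -> Optional[Path]:
    """Return the citation file the documented rules suggest GitHub would show.

    Hypothetical helper: it mimics the rules described in this article and
    is not GitHub's actual detection code.
    """
    # The Parseable CITATION.cff in the root directory takes precedence.
    cff = repo_root / "CITATION.cff"
    if cff.is_file():
        return cff

    # Otherwise, look for a Non-Parseable file in the root, ignoring case.
    # Assumption: with several matches, take the first in sorted order.
    for path in sorted(repo_root.iterdir()):
        if path.is_file() and path.name.lower() in NON_PARSEABLE_NAMES:
            return path

    # R packages typically keep their citation in inst/CITATION.
    inst = repo_root / "inst"
    if inst.is_dir():
        for path in sorted(inst.iterdir()):
            if path.is_file() and path.name.lower() == "citation":
                return path

    return None
```

For example, a repository containing only Citation.bib would resolve to that file, and adding a CITATION.cff next to it would switch the result to the CITATION.cff file.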

How to Make a Repository Citable

Now that you understand the differences between the Parseable and Non-Parseable citation file types, let us consider how to make your repository citable.

Non-Parseable File

To add a “Cite this repository” button that points to a Non-Parseable file to your project repository page (as shown in Figure 2), you just need to add one of the files listed in the documentation. For example, put your citation into a CITATION.bib file in your project repository, and the dropdown button pointing to this file should appear; no other configuration is necessary.

Parseable File

Creating a Parseable file is trickier. For the file to be parsed, its data must follow a particular format understood by the parser. This format is called the Citation File Format; its abbreviation, cff, is also the extension of the corresponding citation file, and it is based on YAML. The format's schema is described in its official documentation. A simple example of a repository citation (taken from that documentation and adapted) could look as follows:

abstract: Fast detection of repackaged Android applications based on the comparison of resource files included into the package.
authors:
  - family-names: Zhauniarovich
    given-names: Yury
    orcid: "https://orcid.org/0000-0001-9116-0728"
cff-version: 1.2.0
date-released: "2013-11-30"
identifiers:
  - type: url
    value: "https://github.com/zyrikby/FSquaDRA/tree/dc42c93991240da0fc9f1081e72be3eeb17d2638"
    description: Latest version
keywords:
  - research
  - "detection repackaged Android applications"
  - "resource files"
license: Apache-2.0
message: If you use this software, please cite it using these metadata.
repository-code: "https://github.com/zyrikby/FSquaDRA"
title: FSquaDRA
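Before committing such a file, it can help to check that the required keys are present. The following Python sketch assumes the YAML has already been loaded into a dictionary (for example, with yaml.safe_load from the third-party PyYAML package). It checks the four keys that the CFF 1.2.0 schema marks as required (authors, cff-version, message, and title) and the YYYY-MM-DD shape of date-released. The function name check_cff is hypothetical, and this partial check is no substitute for validation against the full schema.

```python
import re
from typing import Any

# Keys that the CFF 1.2.0 schema marks as required at the top level.
REQUIRED_KEYS = ("authors", "cff-version", "message", "title")
# Dates such as date-released are written as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_cff(cff: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed CITATION.cff (empty if none).

    Hypothetical, partial check: it does not replace validation against the
    full CFF schema.
    """
    problems = []
    for key in REQUIRED_KEYS:
        if not cff.get(key):
            problems.append(f"missing required key: {key}")

    # str() also covers YAML loaders that turn an unquoted date into a date object.
    released = cff.get("date-released")
    if released is not None and not DATE_PATTERN.match(str(released)):
        problems.append("date-released should look like YYYY-MM-DD")

    return problems
```

For the FSquaDRA file above, the function should return an empty list; if the message key were removed, it would report "missing required key: message".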

If you create this file in your repository, the “Cite this repository” button should appear and the corresponding BibTeX citation should be the following:

@software{Zhauniarovich_FSquaDRA_2013,
  author = {Zhauniarovich, Yury},
  license = {Apache-2.0},
  month = {11},
  title = {{FSquaDRA}},
  url = {https://github.com/zyrikby/FSquaDRA},
  year = {2013}
}
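To make the correspondence between the two listings explicit, here is a Python sketch that produces the BibTeX entry above from the parsed CFF data. It is not GitHub's converter: the mapping (repository-code to url, the month taken from date-released, fields sorted alphabetically) is inferred from the output shown here, and the cite-key rule (first author's family name, the first three words of the title without punctuation, and the year) is an assumption that merely matches the keys appearing in this article. The helper names are hypothetical, and authors are assumed to be persons with family-names and given-names.

```python
import re
from typing import Any


def cite_key(family_name: str, title: str, year: str) -> str:
    """Build a key like Zhauniarovich_FSquaDRA_2013 (rule inferred, not official)."""
    words = [re.sub(r"\W", "", word) for word in title.split()[:3]]
    return "_".join([family_name, *(w for w in words if w), year])


def format_bibtex(entry_type: str, key: str, fields: dict[str, str]) -> str:
    """Render a BibTeX entry with its fields in alphabetical order."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in sorted(fields.items()))
    return f"@{entry_type}{{{key},\n{body}\n}}"


def software_to_bibtex(cff: dict[str, Any]) -> str:
    """Convert a parsed software CITATION.cff into a @software entry."""
    year, month, _day = str(cff["date-released"]).split("-")
    fields = {
        "author": " and ".join(
            f"{a['family-names']}, {a['given-names']}" for a in cff["authors"]
        ),
        "license": cff.get("license"),
        "month": str(int(month)),  # numeric month without a leading zero (assumption)
        "title": "{" + cff["title"] + "}",  # extra braces preserve capitalization
        "url": cff.get("repository-code"),
        "year": year,
    }
    key = cite_key(cff["authors"][0]["family-names"], cff["title"], year)
    # Drop optional fields that are absent from the CFF data.
    return format_bibtex("software", key, {k: v for k, v in fields.items() if v})
```

Sorting the fields keeps the output stable regardless of the key order in the YAML file, which matches the alphabetical order visible in the entry above.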

While this information could be what you want others to use when citing your tool, researchers usually prefer that the corresponding paper be cited rather than the tool itself. To achieve this, we need to add a preferred-citation section to our document and fill it with the paper's metadata (see the “Credit Redirection” section of the format's documentation). After these modifications, our CFF document looks as follows:

title: FSquaDRA
abstract: Fast detection of repackaged Android applications based on the comparison of resource files included into the package.
authors:
  - family-names: Zhauniarovich
    given-names: Yury
    orcid: "https://orcid.org/0000-0001-9116-0728"
cff-version: 1.2.0
date-released: "2013-11-30"
identifiers:
  - type: url
    value: "https://github.com/zyrikby/FSquaDRA/tree/dc42c93991240da0fc9f1081e72be3eeb17d2638"
    description: Latest version
keywords:
  - research
  - "detection repackaged Android applications"
  - "resource files"
license: Apache-2.0
message: If you use this software, please cite it using these metadata.
repository-code: "https://github.com/zyrikby/FSquaDRA"
preferred-citation:
  title: "FSquaDRA: Fast Detection of Repackaged Applications"
  type: conference-paper
  authors:
  - family-names: "Zhauniarovich"
    given-names: "Yury"
  - family-names: "Gadyatskaya"
    given-names: "Olga"
  - family-names: "Crispo"
    given-names: "Bruno"
  - family-names: "La Spina"
    given-names: "Francesco"
  - family-names: "Moser"
    given-names: "Ermanno"
  collection-title: "28th Annual IFIP WG 11.3 Working Conference on Data and Applications Security and Privacy" # booktitle
  collection-type: "proceedings"
  conference:
    name: "DBSec" # series
  doi: "10.1007/978-3-662-43936-4_9"
  start: 131 # First page number
  end: 146 # Last page number
  year: 2014

Now, the BibTeX should look as follows:

@inproceedings{Zhauniarovich_FSquaDRA_Fast_Detection_2014,
  author = {Zhauniarovich, Yury and Gadyatskaya, Olga and Crispo, Bruno and La Spina, Francesco and Moser, Ermanno},
  booktitle = {28th Annual IFIP WG 11.3 Working Conference on Data and Applications Security and Privacy},
  doi = {10.1007/978-3-662-43936-4_9},
  pages = {131--146},
  series = {DBSec},
  title = {{FSquaDRA: Fast Detection of Repackaged Applications}},
  year = {2014}
}
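The mapping from the preferred-citation section to this @inproceedings entry can also be sketched in Python. As before, this is an illustration inferred from the output above, not GitHub's converter; the function name is hypothetical, and only the conference-paper type used in this example is handled. Note in particular that booktitle comes from collection-title, while the conference name ends up in series.

```python
from typing import Any

# Only the reference type used in this article; other CFF types are not covered.
BIBTEX_TYPES = {"conference-paper": "inproceedings"}


def preferred_citation_fields(ref: dict[str, Any]) -> tuple[str, dict[str, str]]:
    """Return the BibTeX entry type and fields for a preferred-citation mapping."""
    fields = {
        "author": " and ".join(
            f"{a['family-names']}, {a['given-names']}" for a in ref["authors"]
        ),
        "booktitle": ref.get("collection-title"),  # not conference/name
        "doi": ref.get("doi"),
        "series": (ref.get("conference") or {}).get("name"),
        "title": "{" + ref["title"] + "}",
        "year": str(ref["year"]),
    }
    # start and end (first and last page) are combined into a page range.
    if "start" in ref and "end" in ref:
        fields["pages"] = f"{ref['start']}--{ref['end']}"
    # Keep only the fields that are present, sorted alphabetically as in the output.
    present = {name: value for name, value in sorted(fields.items()) if value}
    return BIBTEX_TYPES[ref["type"]], present
```

Applied to the preferred-citation section above, the sketch yields the entry type inproceedings and the fields author, booktitle, doi, pages, series, title, and year, in the same order as the BibTeX entry.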

Spotted Issues

While working on this article, I spotted some issues that may not be obvious when you create a citation file. First, I expected the conference name (the name key under conference) to correspond to BibTeX's booktitle field. Unfortunately, this is not the case: name maps to BibTeX's series field, and to fill in booktitle, as the BibTeX standard expects, we need to set the collection-title value.

Second, I found that the message value is not respected: regardless of its value, the message in the “Cite this repository” dropdown is “If you use this software in your work, please cite it using the following metadata.” I have reported this issue to the developers.

Additional Recommendations

If you would like to know more about this functionality, I recommend visiting the format's supporting website, which provides more information about the file format and the tools that support it. There you can also find a tool that helps you create a CITATION.cff file. Unfortunately, this tool does not properly support additional CFF fields, such as preferred-citation, so you still need to write their values manually, as we did in this article.
