Keemei

Cloud-based validation of tabular bioinformatics file formats in Google Sheets

Jai Ram Rideout, John H. Chase, Evan Bolyen, Gail Ackermann, Antonio González, Rob Knight, James G. Caporaso

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

Background: Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers compiling the requisite information because spreadsheet programs can easily be used on different platforms, including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians and bioinformaticians. As a result, many research groups are shifting toward cloud-based spreadsheet programs, such as Google Sheets, which support the concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who enter data are not familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis.

Main text: We present Keemei, a Google Sheets Add-on for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google's Chrome Web Store and can be installed and run on any web browser supported by Google Sheets. Keemei currently supports the validation of two widely used tabular bioinformatics formats, the Quantitative Insights into Microbial Ecology (QIIME) sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others.

Conclusions: Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.
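
To make the idea of tabular format validation concrete, the following is a minimal sketch in Python, not Keemei's actual implementation (Keemei runs as a Google Sheets Add-on). The function name validate_table and the specific rules (a first column named #SampleID, a last column named Description, unique column names, unique sample IDs, no empty cells) are assumptions chosen for illustration, loosely modeled on a QIIME-style sample metadata mapping file.

```python
import csv
import io

# Assumed rules for illustration only; a real format defines its own.
REQUIRED_FIRST = "#SampleID"
REQUIRED_LAST = "Description"


def validate_table(text):
    """Return a list of (row, column, message) problems in tab-separated text.

    Row and column numbers are 1-based, as they would appear in a spreadsheet.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    if not rows or not rows[0]:
        return [(1, 1, "missing header row")]

    problems = []
    header = rows[0]

    # Header checks: required first/last columns and unique column names.
    if header[0] != REQUIRED_FIRST:
        problems.append((1, 1, f"first column must be {REQUIRED_FIRST}"))
    if header[-1] != REQUIRED_LAST:
        problems.append((1, len(header), f"last column must be {REQUIRED_LAST}"))
    seen_headers = set()
    for col, name in enumerate(header, start=1):
        if name in seen_headers:
            problems.append((1, col, f"duplicate column name {name!r}"))
        seen_headers.add(name)

    # Data checks: consistent row width, no empty cells, unique sample IDs.
    seen_ids = set()
    for row_num, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append((row_num, 1, "row width differs from header"))
            continue
        for col, cell in enumerate(row, start=1):
            if cell.strip() == "":
                problems.append((row_num, col, "empty cell"))
        if row[0] in seen_ids:
            problems.append((row_num, 1, f"duplicate sample ID {row[0]!r}"))
        seen_ids.add(row[0])
    return problems
```

Reporting each problem with its row and column mirrors how a spreadsheet-based validator can point a user to the offending cell; how Keemei itself reports errors is not described in the abstract.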

Original language: English (US)
Article number: 27
Journal: GigaScience
Volume: 5
Issue number: 1
DOI: https://doi.org/10.1186/s13742-016-0133-6
State: Published - 2016

Fingerprint

Bioinformatics
Computational Biology
Spreadsheets
Research Personnel
Information Storage and Retrieval
Metadata
Web Browser
Frustration
Web browsers
Ecology
Tablets
Data acquisition
Software
Outcome Assessment (Health Care)
Research

Keywords

  • Cloud
  • Data validation
  • Metadata
  • Plugin
  • QIIME
  • Spreadsheet
  • Tabular file format

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

Cite this

Rideout, J. R., Chase, J. H., Bolyen, E., Ackermann, G., González, A., Knight, R., & Caporaso, J. G. (2016). Keemei: Cloud-based validation of tabular bioinformatics file formats in Google Sheets. GigaScience, 5(1), [27]. https://doi.org/10.1186/s13742-016-0133-6

@article{ac9faa4ca03a4a12b3eb2c80bb67ddad,
title = "Keemei: Cloud-based validation of tabular bioinformatics file formats in Google Sheets",
keywords = "Cloud, Data validation, Metadata, Plugin, QIIME, Spreadsheet, Tabular file format",
author = "Rideout, {Jai Ram} and Chase, {John H.} and Evan Bolyen and Gail Ackermann and Antonio Gonz{\'a}lez and Rob Knight and Caporaso, {James G}",
year = "2016",
doi = "10.1186/s13742-016-0133-6",
language = "English (US)",
volume = "5",
journal = "GigaScience",
issn = "2047-217X",
publisher = "BioMed Central",
number = "1",

}
