Identification and characterization of multi-species conserved sequences

Elliott H. Margulies, Mathieu Blanchette, Jim Thomas, Jeff Touchman, Bob Blakesley, Gerry Bouffard, Stephen M Beckstrom-Sternberg, Pam Thomas, Jenny McDowell, Baishali Maskeri, Nancy Hansen, Jackie Idol, Valerie Maduro, Shih Queen Lee-Lin, Arjun Prasad, Matt Portnoy, David Haussler, Eric D. Green

Research output: Contribution to journal › Article

241 Citations (Scopus)

Abstract

Comparative sequence analysis has become an essential component of studies aiming to elucidate genome function. The increasing availability of genomic sequences from multiple vertebrates is creating the need for computational methods that can detect highly conserved regions in a robust fashion. Towards that end, we are developing approaches for identifying sequences that are conserved across multiple species; we call these "Multi-species Conserved Sequences" (or MCSs). Here we report two strategies for MCS identification, demonstrating their ability to detect virtually all known actively conserved sequences (specifically, coding sequences) but very little neutrally evolving sequence (specifically, ancestral repeats). Importantly, we find that a substantial fraction of the bases within MCSs (∼70%) resides within non-coding regions; thus, the majority of sequences conserved across multiple vertebrate species has no known function. Initial characterization of these MCSs has revealed sequences that correspond to clusters of transcription factor-binding sites, non-coding RNA transcripts, and other candidate functional elements. Finally, the ability to detect MCSs represents a valuable metric for assessing the relative contribution of a species' sequence to identifying genomic regions of interest, and our results indicate that the currently available genome sequences are insufficient for the comprehensive identification of MCSs in the human genome.

Original language: English (US)
Pages (from-to): 2507-2518
Number of pages: 12
Journal: Genome Research
Volume: 13
Issue number: 12
DOI: https://doi.org/10.1101/gr.1602203
State: Published - Dec 2003
Externally published: Yes

Fingerprint

Conserved Sequence
Vertebrates
Genome
Untranslated RNA
Human Genome
Sequence Analysis
Transcription Factors
Binding Sites

ASJC Scopus subject areas

  • Genetics

Cite this

Margulies, E. H., Blanchette, M., Thomas, J., Touchman, J., Blakesley, B., Bouffard, G., ... Green, E. D. (2003). Identification and characterization of multi-species conserved sequences. Genome Research, 13(12), 2507-2518. https://doi.org/10.1101/gr.1602203

Identification and characterization of multi-species conserved sequences. / Margulies, Elliott H.; Blanchette, Mathieu; Thomas, Jim; Touchman, Jeff; Blakesley, Bob; Bouffard, Gerry; Beckstrom-Sternberg, Stephen M; Thomas, Pam; McDowell, Jenny; Maskeri, Baishali; Hansen, Nancy; Idol, Jackie; Maduro, Valerie; Lee-Lin, Shih Queen; Prasad, Arjun; Portnoy, Matt; Haussler, David; Green, Eric D.

In: Genome Research, Vol. 13, No. 12, 12.2003, p. 2507-2518.

Margulies, EH, Blanchette, M, Thomas, J, Touchman, J, Blakesley, B, Bouffard, G, Beckstrom-Sternberg, SM, Thomas, P, McDowell, J, Maskeri, B, Hansen, N, Idol, J, Maduro, V, Lee-Lin, SQ, Prasad, A, Portnoy, M, Haussler, D & Green, ED 2003, 'Identification and characterization of multi-species conserved sequences', Genome Research, vol. 13, no. 12, pp. 2507-2518. https://doi.org/10.1101/gr.1602203
Margulies EH, Blanchette M, Thomas J, Touchman J, Blakesley B, Bouffard G et al. Identification and characterization of multi-species conserved sequences. Genome Research. 2003 Dec;13(12):2507-2518. https://doi.org/10.1101/gr.1602203
Margulies, Elliott H. ; Blanchette, Mathieu ; Thomas, Jim ; Touchman, Jeff ; Blakesley, Bob ; Bouffard, Gerry ; Beckstrom-Sternberg, Stephen M ; Thomas, Pam ; McDowell, Jenny ; Maskeri, Baishali ; Hansen, Nancy ; Idol, Jackie ; Maduro, Valerie ; Lee-Lin, Shih Queen ; Prasad, Arjun ; Portnoy, Matt ; Haussler, David ; Green, Eric D. / Identification and characterization of multi-species conserved sequences. In: Genome Research. 2003 ; Vol. 13, No. 12. pp. 2507-2518.
@article{b9b892d04b64436d9f70d25001c485f7,
title = "Identification and characterization of multi-species conserved sequences",
abstract = "Comparative sequence analysis has become an essential component of studies aiming to elucidate genome function. The increasing availability of genomic sequences from multiple vertebrates is creating the need for computational methods that can detect highly conserved regions in a robust fashion. Towards that end, we are developing approaches for identifying sequences that are conserved across multiple species; we call these {"}Multi-species Conserved Sequences{"} (or MCSs). Here we report two strategies for MCS identification, demonstrating their ability to detect virtually all known actively conserved sequences (specifically, coding sequences) but very little neutrally evolving sequence (specifically, ancestral repeats). Importantly, we find that a substantial fraction of the bases within MCSs (∼70{\%}) resides within non-coding regions; thus, the majority of sequences conserved across multiple vertebrate species has no known function. Initial characterization of these MCSs has revealed sequences that correspond to clusters of transcription factor-binding sites, non-coding RNA transcripts, and other candidate functional elements. Finally, the ability to detect MCSs represents a valuable metric for assessing the relative contribution of a species' sequence to identifying genomic regions of interest, and our results indicate that the currently available genome sequences are insufficient for the comprehensive identification of MCSs in the human genome.",
author = "Margulies, {Elliott H.} and Mathieu Blanchette and Jim Thomas and Jeff Touchman and Bob Blakesley and Gerry Bouffard and Beckstrom-Sternberg, {Stephen M} and Pam Thomas and Jenny McDowell and Baishali Maskeri and Nancy Hansen and Jackie Idol and Valerie Maduro and Lee-Lin, {Shih Queen} and Arjun Prasad and Matt Portnoy and David Haussler and Green, {Eric D.}",
year = "2003",
month = "12",
doi = "10.1101/gr.1602203",
language = "English (US)",
volume = "13",
pages = "2507--2518",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "12",

}

TY - JOUR

T1 - Identification and characterization of multi-species conserved sequences

AU - Margulies, Elliott H.

AU - Blanchette, Mathieu

AU - Thomas, Jim

AU - Touchman, Jeff

AU - Blakesley, Bob

AU - Bouffard, Gerry

AU - Beckstrom-Sternberg, Stephen M

AU - Thomas, Pam

AU - McDowell, Jenny

AU - Maskeri, Baishali

AU - Hansen, Nancy

AU - Idol, Jackie

AU - Maduro, Valerie

AU - Lee-Lin, Shih Queen

AU - Prasad, Arjun

AU - Portnoy, Matt

AU - Haussler, David

AU - Green, Eric D.

PY - 2003/12

Y1 - 2003/12

AB - Comparative sequence analysis has become an essential component of studies aiming to elucidate genome function. The increasing availability of genomic sequences from multiple vertebrates is creating the need for computational methods that can detect highly conserved regions in a robust fashion. Towards that end, we are developing approaches for identifying sequences that are conserved across multiple species; we call these "Multi-species Conserved Sequences" (or MCSs). Here we report two strategies for MCS identification, demonstrating their ability to detect virtually all known actively conserved sequences (specifically, coding sequences) but very little neutrally evolving sequence (specifically, ancestral repeats). Importantly, we find that a substantial fraction of the bases within MCSs (∼70%) resides within non-coding regions; thus, the majority of sequences conserved across multiple vertebrate species has no known function. Initial characterization of these MCSs has revealed sequences that correspond to clusters of transcription factor-binding sites, non-coding RNA transcripts, and other candidate functional elements. Finally, the ability to detect MCSs represents a valuable metric for assessing the relative contribution of a species' sequence to identifying genomic regions of interest, and our results indicate that the currently available genome sequences are insufficient for the comprehensive identification of MCSs in the human genome.

UR - http://www.scopus.com/inward/record.url?scp=10744222156&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=10744222156&partnerID=8YFLogxK

U2 - 10.1101/gr.1602203

DO - 10.1101/gr.1602203

M3 - Article

VL - 13

SP - 2507

EP - 2518

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 12

ER -