XENLA GeneModel2012

From Marcotte Lab
Revision as of 11:54, 24 October 2012 by Taejoon (Talk | contribs)

Data summary

Disclaimer

  • Data users may freely download and analyze the sequences posted here.
  • Data users may use these sequences to analyze their own data, e.g. as a reference database for MS/MS proteomics and/or RNA-seq data.
  • Publication or presentation of global analyses of these sequences is not allowed until the data owner (Dr. Masanori Taira) has published the paper. As soon as the paper is accepted, we will post that information on this website.
  • If you have further questions about these data, please contact Dr. Masanori Taira, Dr. Edward Marcotte, or Dr. Taejoon Kwon.

Taira201203_XENLA_tissue

Total RNA was collected from 14 tissues of the Xenopus laevis J strain.

  • Brain, eye, heart, intestine, kidney, liver, lung, muscle, ovary, pancreas, skin, spleen, stomach, testis
  • Sons and daughters of a single pair of frogs (their mother was used for the first BAC-end sequencing)
  • Standard Illumina sample prep. (poly-A capture)
  • Illumina HiSeq 2000, 2x100 bp
  • 108.5 billion nucleotide calls in total.
  • 55M ~ 130M reads/tissue (27M ~ 65M pairs)
  • Brief report for data processing

Taira201203_XENLA_stage

Total RNA was collected from embryos of the Xenopus laevis J strain at 11 developmental stages.

  • Stage 01, 08, 09, 10.5, 12, 15, 20, 25, 30, 35, 40
  • Sons and daughters of a single pair of frogs (their mother was used for the first BAC-end sequencing)
  • Standard Illumina sample prep. (poly-A capture)
  • Illumina HiSeq 2000, 2x100 bp
  • 163.8 billion nucleotide calls in total.
  • 40M ~ 110M reads/stage (20M ~ 55M pairs)
  • Brief report for data processing

Assembled transcripts

Raw sequences

Orthologous sequences

  1. Take all candidate orthologous genes from the BLASTP results (at most the top 3 hits per search; see [#] for details).
  2. For each assembled transcript, go through the species in the order 'XENLA'->'HUMAN'->'XENTR'->'MOUSE'->'DANRE'->'CHICK'->'CAEEL'->'DROME' and report the transcript ID when the following conditions are met.
    • The assembled transcript has orthologous candidates in the given species both as query and as target (database in the BLAST search).
    • The query list and the target list overlap in at least one gene. In other words, the same gene in the other organism must be among the top 3 hits in both directions of the BLAST search.
    • If more than one gene overlaps, all of them are reported.
    • Once an assembled transcript has a candidate orthologous gene in one species, the search stops and moves on to the next assembled transcript. So, if a transcript has an orthologous gene satisfying these criteria in HUMAN, species later in the order (e.g. MOUSE, DANRE, CHICK) are not searched. The main reason is to remove the redundancy caused by genes that are highly conserved across all species.
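
The selection rule above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the script used to build the tables: the function name `select_orthologs`, the two dictionaries `query_hits` and `target_hits` (species -> transcript ID -> list of up to 3 gene names), and all IDs in the example are hypothetical.

```python
# Species are visited in the order given in the text above.
SPECIES_ORDER = ["XENLA", "HUMAN", "XENTR", "MOUSE",
                 "DANRE", "CHICK", "CAEEL", "DROME"]


def select_orthologs(transcript_id, query_hits, target_hits):
    """Return (species, genes) for the first species with a reciprocal hit.

    query_hits[species][transcript_id]  : genes hit when the transcript is the query
    target_hits[species][transcript_id] : genes that hit the transcript as target
    Both layouts are assumptions made for this sketch.
    """
    for species in SPECIES_ORDER:
        as_query = query_hits.get(species, {}).get(transcript_id, [])
        as_target = set(target_hits.get(species, {}).get(transcript_id, []))
        # Keep every gene found in both directions (report all overlaps).
        shared = [gene for gene in as_query if gene in as_target]
        if shared:
            # Stop at the first species that satisfies the criteria.
            return species, shared
    return None, []


# Made-up example: no reciprocal hit in XENLA, one in HUMAN, one in MOUSE.
query_hits = {
    "XENLA": {"tx_0001": ["xl_gene1"]},
    "HUMAN": {"tx_0001": ["GENE_A", "GENE_B", "GENE_C"]},
    "MOUSE": {"tx_0001": ["Gene_b"]},
}
target_hits = {
    "XENLA": {"tx_0001": []},
    "HUMAN": {"tx_0001": ["GENE_B", "GENE_D"]},
    "MOUSE": {"tx_0001": ["Gene_b"]},
}
print(select_orthologs("tx_0001", query_hits, target_hits))
# ('HUMAN', ['GENE_B'])
```

MOUSE is never examined in this example, because the search stops as soon as HUMAN yields a reciprocal hit.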

Here are the candidate orthologs for each assembled transcript:

Based on this table, we selected transcripts/peptides as a non-redundant sequence set. The 'orthoGeneAll' set contains all sequences reported in the 'nr_gene_list' table, and the 'orthoGeneOne' set contains the longest sequence per orthologous gene group. For example, in the tissue sample set, the following three transcripts are reported as the known X. laevis rfx2 gene.

Taira201203_XENLA_tissue_00066978	XENLA	rfx2|XB-GENE-991777,rfx6|XB-GENE-6488525
Taira201203_XENLA_tissue_00144530	XENLA	rfx2|XB-GENE-991777
Taira201203_XENLA_tissue_00191686	XENLA	rfx2|XB-GENE-991777

In 'orthoGeneAll', all three sequences are reported, whereas in 'orthoGeneOne', Taira201203_XENLA_tissue_00144530 is not reported (it is shorter than Taira201203_XENLA_tissue_00191686). Note that, in this example, we did not reduce the three to a single sequence, because Taira201203_XENLA_tissue_00066978 has another candidate gene, rfx6, that is not present for the other two transcripts.
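
One way to read the 'orthoGeneOne' rule is that transcripts are grouped by their full set of candidate genes and the longest member of each group is kept. The Python sketch below follows that reading, which is an assumption based on the rfx2 example rather than a documented rule. The function name `pick_longest_per_group` is hypothetical, and the sequence lengths are invented for illustration (only the fact that 00144530 is shorter than 00191686 comes from the text).

```python
def pick_longest_per_group(rows, lengths):
    """Keep the longest transcript for each (species, candidate gene set)."""
    best = {}
    for transcript_id, species, candidates in rows:
        # Transcripts sharing exactly the same candidate genes form one group.
        key = (species, frozenset(candidates.split(",")))
        current = best.get(key)
        if current is None or lengths[transcript_id] > lengths[current]:
            best[key] = transcript_id
    return sorted(best.values())


# The three rows from the table above, split on tabs.
rows = [
    ("Taira201203_XENLA_tissue_00066978", "XENLA",
     "rfx2|XB-GENE-991777,rfx6|XB-GENE-6488525"),
    ("Taira201203_XENLA_tissue_00144530", "XENLA", "rfx2|XB-GENE-991777"),
    ("Taira201203_XENLA_tissue_00191686", "XENLA", "rfx2|XB-GENE-991777"),
]

# Invented lengths; the real values are not given on this page.
lengths = {
    "Taira201203_XENLA_tissue_00066978": 500,
    "Taira201203_XENLA_tissue_00144530": 300,
    "Taira201203_XENLA_tissue_00191686": 700,
}

for transcript_id in pick_longest_per_group(rows, lengths):
    print(transcript_id)
```

With these inputs, 00066978 (the only member of the rfx2+rfx6 group) and 00191686 (the longer of the two rfx2-only transcripts) are kept, which matches the outcome described above.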

Annotation

Orthologous genes

We used EnsEMBL-66 as the main source of protein sequences. For X. laevis, we used protein sequences from XenBase (downloaded in Dec-2011). These are the top 3 genes (ranked by E-value) with > 40% aligned length (relative to the query sequence). Note that this is based on a simple BLASTP search. We are currently working on a more accurate orthology analysis using a phylogenetic tree-based method.
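
The hit filter described above could look roughly like the following Python sketch. Several details are assumptions: the aligned fraction is computed here as the aligned query span divided by the query length, the coverage filter is applied before the top-3 cut, and the function name `top_hits`, the tuple layout, and all IDs are hypothetical. The page does not specify how these steps were actually implemented.

```python
from collections import defaultdict


def top_hits(hits, query_lengths, max_hits=3, min_fraction=0.40):
    """Return up to max_hits subjects per query, ranked by E-value.

    hits: iterable of (query, subject, evalue, q_start, q_end), with
    1-based inclusive query coordinates (layout assumed for this sketch).
    """
    by_query = defaultdict(list)
    for query, subject, evalue, q_start, q_end in hits:
        aligned_fraction = (q_end - q_start + 1) / query_lengths[query]
        # Keep only hits covering more than 40% of the query sequence.
        if aligned_fraction > min_fraction:
            by_query[query].append((evalue, subject))
    # Smallest E-value first; ties fall back to subject name order.
    return {query: [subject for _, subject in sorted(scored)[:max_hits]]
            for query, scored in by_query.items()}


# Made-up hits for one 100-residue query.
hits = [
    ("pep_1", "GENE_A", 1e-50, 1, 90),
    ("pep_1", "GENE_B", 1e-30, 1, 30),   # covers 30% of the query, dropped
    ("pep_1", "GENE_C", 1e-40, 10, 80),
    ("pep_1", "GENE_D", 1e-10, 1, 50),
    ("pep_1", "GENE_E", 1e-5, 1, 45),    # passes coverage, but ranked 4th
]
print(top_hits(hits, {"pep_1": 100}))
# {'pep_1': ['GENE_A', 'GENE_C', 'GENE_D']}
```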

Peptide set	XENLA (X. laevis)	HUMAN	XENTR (X. tropicalis)	MOUSE	DANRE (zebrafish)	CHICK (chicken)	CAEEL (worm)	DROME (fly)
Stage pep as query	Stage pep --> XENLA	Stage pep --> HUMAN	Stage pep --> XENTR	Stage pep --> MOUSE	Stage pep --> DANRE	Stage pep --> CHICK	Stage pep --> CAEEL	Stage pep --> DROME
Stage pep as target	Stage pep --> XENLA	HUMAN --> Stage pep	XENTR --> Stage pep	MOUSE --> Stage pep	DANRE --> Stage pep	CHICK --> Stage pep	CAEEL --> Stage pep	DROME --> Stage pep
Tissue pep as query	Tissue pep --> XENLA	Tissue pep --> HUMAN	Tissue pep --> XENTR	Tissue pep --> MOUSE	Tissue pep --> DANRE	Tissue pep --> CHICK	Tissue pep --> CAEEL	Tissue pep --> DROME
Tissue pep as target	Tissue pep --> XENLA	HUMAN --> Tissue pep	XENTR --> Tissue pep	MOUSE --> Tissue pep	DANRE --> Tissue pep	CHICK --> Tissue pep	CAEEL --> Tissue pep	DROME --> Tissue pep

Microarray

Contributors

  • Masanori Taira (Graduate School of Science, University of Tokyo)
  • Shuji Takahashi (Komaba Organization for Educational Excellence, College of Arts and Sciences, University of Tokyo)
  • Toshiaki Tanaka (Tokyo Institute of Technology)
  • Atsushi Toyoda and Asao Fujiyama (National Institute of Genetics)
  • Yutaka Suzuki (Graduate School of Frontier Sciences, University of Tokyo)