Xenopus Genome Project
Revision as of 08:33, 25 October 2011
Xenopus laevis is an essential model organism in several areas of biology. In addition to the key attributes of these embryos for in vivo imaging, cell-free extracts from Xenopus provide some of the most powerful in vitro systems for studies of cell and molecular biology. A complete sequence of the X. laevis genome is an essential resource for accurate identification of peptides in mass-spec analyses, for cloning an ORFeome, for identifying evolutionarily conserved regulatory regions, and for designing morpholino oligonucleotides for gene knockdowns.
The Wallingford and Marcotte labs have obtained funding from the Texas Institute for Drug and Diagnostic Development (TI3D) to begin sequencing of the X. laevis genome. We are primarily working with Scott Hunicke-Smith at the University of Texas Genome Sequencing and Analysis facility, with funding sufficient for ~20x coverage of the X. laevis genome using ABI SOLiD next-generation sequencing.
Assembled transcripts from RNA-seq data
Disclaimer
If you have any questions about these data in general, please contact Taejoon Kwon.
TXGP201107_XENLA_EGG
Contact info: Edward Marcotte, Taejoon Kwon.
- xdata:/tx/TXGP201107_XENLA_EGG.pub_all.fasta (37,470 sequences)
  - 9,780 pub_all sequences are mapped to 6,705 v2_ref sequences.
  - 13,362 pub_all sequences are mapped to 9,920 v3_ref sequences.
- xdata:/tx/TXGP201107_XENLA_EGG.pub_long.fasta (20,005 sequences; only > 400 bp)
  - 7,309 pub_long sequences are mapped to 6,082 v2_ref sequences.
  - 9,439 pub_long sequences are mapped to 8,137 v3_ref sequences.
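The pub_long file above is described as holding only sequences longer than 400 bp. The following is a minimal sketch of that kind of length filter on FASTA records, not necessarily how the file was produced. The function names `read_fasta` and `longer_than` and the toy records are hypothetical and are used here only for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longer_than(records: Iterable[Tuple[str, str]], min_len: int = 400) -> List[Tuple[str, str]]:
    """Keep records whose sequence is strictly longer than min_len bases."""
    return [(h, s) for h, s in records if len(s) > min_len]


# Toy input (made-up sequences, not project data).
example = [
    ">tx1",
    "ACGT" * 60,   # 240 bases
    "ACGT" * 50,   # 200 more bases, 440 in total
    ">tx2",
    "ACGT" * 10,   # 40 bases
    ">tx3",
    "ACGT" * 100,  # exactly 400 bases, so not strictly longer than 400
]

kept = longer_than(read_fasta(example))
for name, seq in kept:
    print(name, len(seq))
```

To run the same filter on a real file, pass an open file handle to `read_fasta` in place of the toy list.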
Park201106_XENLA
(Courtesy of Tae Joo Park & Richard Harland, University of California at Berkeley) Contact info: Tae Joo Park, Richard Harland.
- xdata:/tx/Park201106_XENLA.pub_all.fasta (109,667 sequences)
  - 41,847 pub_all sequences are mapped to 7,522 v2_ref sequences.
  - 59,419 pub_all sequences are mapped to 16,332 v3_ref sequences.
- xdata:/tx/Park201106_XENLA.pub_long.fasta (19,716 sequences)
  - 10,790 pub_long sequences are mapped to 5,283 v2_ref sequences.
  - 14,007 pub_long sequences are mapped to 7,977 v3_ref sequences.
Amin201106_XENLA
(Courtesy of Nirav Amin & Frank Conlon, University of North Carolina at Chapel Hill) Contact info: Nirav Amin, Frank Conlon
- xdata:/tx/Amin201106_XENLA.pub_all.fasta (115,640 sequences)
  - 26,763 pub_all sequences are mapped to 8,332 v2_ref sequences.
  - 39,703 pub_all sequences are mapped to 17,372 v3_ref sequences.
- xdata:/tx/Amin201106_XENLA.pub_long.fasta (31,316 sequences)
  - 6,921 pub_long sequences are mapped to 11,643 v2_ref sequences.
  - 11,488 pub_long sequences are mapped to 16,593 v3_ref sequences.
Chung201110_XENLA
(Courtesy of Meii Chung & John Wallingford, University of Texas at Austin) Contact info: Meii Chung, John Wallingford
- xdata:/tx/Chung201110_XENLA.pub_all.fasta (109,258 sequences)
  - 31,287 pub_all sequences are mapped to 7,126 v2_ref sequences.
  - 44,577 pub_all sequences are mapped to 15,240 v3_ref sequences.
- xdata:/tx/Chung201110_XENLA.pub_long.fasta (20,682 sequences)
  - 4,817 pub_long sequences are mapped to 8,614 v2_ref sequences.
  - 7,680 pub_long sequences are mapped to 11,735 v3_ref sequences.
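The counts listed for each dataset can be turned into the fraction of assembled sequences that map to each reference set. The sketch below copies the numbers from the lists above. It assumes that the first number on each "mapped" line counts sequences drawn from the total given for that file; the table layout and the helper name `mapped_fraction` are illustrative only.

```python
# (dataset, file set, total sequences, mapped to v2_ref, mapped to v3_ref)
# Numbers are copied from the lists above.
COUNTS = [
    ("TXGP201107_XENLA_EGG", "pub_all", 37470, 9780, 13362),
    ("TXGP201107_XENLA_EGG", "pub_long", 20005, 7309, 9439),
    ("Park201106_XENLA", "pub_all", 109667, 41847, 59419),
    ("Park201106_XENLA", "pub_long", 19716, 10790, 14007),
    ("Amin201106_XENLA", "pub_all", 115640, 26763, 39703),
    ("Amin201106_XENLA", "pub_long", 31316, 6921, 11488),
    ("Chung201110_XENLA", "pub_all", 109258, 31287, 44577),
    ("Chung201110_XENLA", "pub_long", 20682, 4817, 7680),
]


def mapped_fraction(mapped: int, total: int) -> float:
    """Fraction of the sequences in a file that were mapped (assumed meaning)."""
    return mapped / total


for dataset, subset, total, v2, v3 in COUNTS:
    print(
        f"{dataset:22s} {subset:9s} "
        f"v2_ref: {mapped_fraction(v2, total):6.1%}  "
        f"v3_ref: {mapped_fraction(v3, total):6.1%}"
    )
```

In every row the v3_ref fraction is higher than the v2_ref fraction, which follows directly from the listed counts.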
CHORI-219 BAC sequencing
We have started the first runs by sequencing 96 BACs from the CHORI-219 library (vector: pBACGK1.1) at ~100X coverage. The selected BACs include ~70 genes of interest (Shroom3, Wnt5a, Glypican-4, Noggin, Gremlin, Pax6, Formin, etc., as initially identified by the group of Jan-Fang Cheng via probing the CHORI-219 library), as well as 10 BACs that have already been sequenced by the DOE Joint Genome Institute/HudsonAlpha Genome Sequencing Center to serve as positive controls for the sequencing and assembly pipeline.
- CHORI-219 BACs: List of 96 test BACs (MS Excel file)
See /XENLA_SA09023 for more details. Three mate-paired libraries were sequenced:
- X_laevis_WG: the X. laevis whole-genome library, 5 kb insert size; about 4.4 GB raw data, 0.4 GB high-quality data
- X_laevis_2kb: the set of 96 BACs, 2 kb insert size; about 3.6 GB raw data, 0.3 GB high-quality data
- X_laevis_5kb: the set of 96 BACs, 5 kb insert size; about 2.8 GB raw data, 0.2 GB high-quality data
Very roughly, the two BAC libraries together correspond to >600X coverage of the BAC set by raw data and ~50X coverage by high-quality data.
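The coverage estimate above is a simple ratio of sequenced bases to target size. The sketch below reproduces it under two assumptions that are not stated in the text: that "GB" means gigabases, and that the 96 BACs average about 100 kb of insert each (`ASSUMED_MEAN_INSERT_BP` is a hypothetical value chosen for illustration).

```python
GB = 1_000_000_000               # assumption: "GB" is read as gigabases
N_BACS = 96                      # from the text
ASSUMED_MEAN_INSERT_BP = 100_000  # assumption: mean BAC insert size, not from the text


def fold_coverage(total_bases: float, target_bp: float) -> float:
    """Fold coverage = sequenced bases / size of the sequenced target."""
    return total_bases / target_bp


bac_set_bp = N_BACS * ASSUMED_MEAN_INSERT_BP

# Sum the two BAC libraries (X_laevis_2kb and X_laevis_5kb) listed above.
raw_bases = (3.6 + 2.8) * GB
hq_bases = (0.3 + 0.2) * GB

raw_cov = fold_coverage(raw_bases, bac_set_bp)
hq_cov = fold_coverage(hq_bases, bac_set_bp)

print(f"raw data coverage:          ~{raw_cov:.0f}X")
print(f"high-quality data coverage: ~{hq_cov:.0f}X")
```

Under these assumptions the result is roughly 670X for raw data and roughly 52X for high-quality data, in line with the ">600X" and "~50X" figures; a different mean insert size would scale both numbers proportionally.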
- Given that we currently see better mapping of the shotgun SA09023 reads to X. tropicalis than to X. laevis (both to BACs and mRNAs), we're confirming the sample identity before continuing with whole genome sequencing. See the 'sanity check' /Species_Identification for details.
J-strain whole genome sequencing
In addition, we are generating several mate-pair libraries of different insert sizes from genomic DNA prepared by Mustafa Khokha from J-strain frogs obtained from Jacques Robert, and sequencing each to multiple-fold coverage of the genome.
The primary data from this project will be made available as soon as possible for use by the community. We plan to periodically post reports on our progress below.
References
- TXGP_reference - Public resources compiled to be used in TXGP.
- TXGP_ens63_reference - Some statistics derived from EnsEMBL-63 (used as a reference in TXGP).
- TXGP_Data_Description - Data collected in TXGP.