The C. savignyi Reference Genome and Genetic Map
The genome of a single C. savignyi individual from San Francisco Bay was shotgun-sequenced by the Whitehead Institute's Center for Genome Research (now part of the Broad Institute). Assembly of the C. savignyi genome was complicated considerably by the extreme heterozygosity of the sequenced individual, a consequence of the very high rate of polymorphism in the population. The Broad group generated a high-quality assembly using the Arachne2 assembler, with parameters that kept the assemblies of the two alleles locally independent. The resulting assembly was roughly twice the expected genome size because it contained both alleles of every locus.
We used the initial Arachne2 assembly as the starting point for a nonredundant reference sequence. We first generated a whole-genome alignment of the C. savignyi assembly to itself, in which the allelic regions were aligned with each other. This alignment allowed us to remove assembly chaff, such as overlapping contig ends, and to bridge contig and supercontig gaps in one allele with sequence from the other. From the alignment, the reference sequence was parsed into a nonredundant assembly of "Reftigs". This improved the long-range contiguity of the assembly dramatically, as the following statistics show.
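The gap-bridging idea can be illustrated with a toy example. The Python sketch below is hypothetical and is not the pipeline used to build the Reftigs: it assumes the two alleles have already been aligned column by column, that 'N' marks an assembly gap and '-' marks an alignment gap, and the function name merge_alleles is invented for illustration. It keeps one allele as the backbone and fills that allele's assembly gaps from the other.

```python
# Hypothetical toy sketch, not the actual Reftig pipeline.
# Assumed conventions: inputs are two alleles already aligned column by column;
# 'N' marks an assembly gap, '-' marks an alignment gap (indel).

def merge_alleles(allele_a: str, allele_b: str) -> str:
    """Return one nonredundant sequence, using allele A as the backbone and
    bridging A's assembly gaps ('N') with allele B's bases where available."""
    if len(allele_a) != len(allele_b):
        raise ValueError("aligned alleles must have equal length")
    merged = []
    for a, b in zip(allele_a, allele_b):
        if a == "-":
            continue            # column absent from A: follow A's path, skip it
        if a == "N" and b not in ("N", "-"):
            merged.append(b)    # bridge A's assembly gap with B's base
        else:
            merged.append(a)    # otherwise keep A's base (or its unresolved N)
    return "".join(merged)
```

For example, merge_alleles("ACGTNNNNGA", "ACGAACGTG-") returns "ACGTACGTGA": the four N columns in the first allele are filled from the second, while the mismatch at the fourth column keeps the backbone allele's base.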
Download the C. savignyi Reftigs (Version 2.1, March 2006), repeat-masked in lowercase: Zip archive, 50 MB.
Update May 16, 2008: The C. savignyi Genetic Map
We have integrated the reference sequence with a comprehensive genetic map. A large fraction of the reftigs are now assigned to linkage groups, many reftigs are ordered within their linkage groups, and many of the ordered reftigs are also oriented. C. savignyi has 14 linkage groups.
The salient data are in the following tables (Note: these are Excel files).
We also assigned unmapped C. intestinalis scaffolds to C. intestinalis chromosomes:
Further items of interest:
Update May 7, 2007
Two papers (both Small et al., 2007) have been published: one describes the generation of the reference sequence, and the other explores the extreme polymorphism of C. savignyi. Both are available via the papers link on the left.
A whole-genome alignment between C. intestinalis and C. savignyi, generated at LBNL, is available for browsing at VISTA. The alignment is also available for download. NOTE: This alignment was generated with the September 2005 sequence (Version 2.0).
Update September 15, 2006
The reference genome is now in Ensembl. NOTE: Ensembl shows the September 2005 sequence (Version 2.0).
Update April 24, 2006
Update March 1, 2006
A minor bug in the previous reference sequence (Version 2.0, September 2005) was fixed:
The reference sequence was generated at Stanford by Kerrin Small from Arend Sidow's group with help from Michael Brudno (then of Serafim Batzoglou's group, now in Toronto).
David Johnson from Arend Sidow's group collected the biggest squirt he could find in the San Francisco Bay and prepared its DNA. Nicole Stange-Thomann and colleagues from Eric Lander's genome center sequenced it, and Jade Vinson (with the help of David Jaffe and colleagues, also at the Whitehead) generated the initial Arachne assembly.
Zhirong Bao from Sean Eddy's group generated the RECON repeat library used to mask the sequence. Kerrin Small removed from this library, by hand, entries derived from protein-coding genes not encoded by mobile elements, as well as tRNAs and rRNAs.