

The C. savignyi Reference Genome and Genetic Map

The genome of a single C. savignyi individual from San Francisco Bay was shotgun-sequenced by the Whitehead Institute's Center for Genome Research (now part of the Broad Institute). Assembly of the C. savignyi genome was complicated considerably by the extreme heterozygosity of the sequenced individual, which reflects the enormous rate of polymorphism in the population. The Broad group generated a high-quality assembly using the Arachne2 assembler with parameters that ensured locally independent assemblies of the two alleles. The resulting assembly was roughly twice the expected genome size because it contained both alleles of every locus.

We used the initial Arachne2 assembly as a starting point to generate a nonredundant reference sequence. We first generated a pairwise whole-genome alignment of the C. savignyi assembly with itself, in which allelic regions were aligned with each other. This allowed us to remove assembly chaff, such as overlaps of contig ends, and to bridge contig and supercontig gaps in one allele with sequence from the other. From the alignment, the reference sequence was parsed into a nonredundant assembly of "Reftigs". The long-range contiguity of the assembly was thus improved dramatically, as can be seen in the following statistics.

Number of Reftigs
Size in Megabases
Megabases in 100 largest Reftigs
Reftig N50 (Kb)
Contig N50 (Kb)
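
For readers unfamiliar with the N50 statistic used above: it is the length L such that sequences of length L or longer contain at least half of the total assembled bases. A minimal Python sketch follows. The function name and the toy lengths are illustrative assumptions and are not taken from the assembly itself.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Illustrative helper, not part of the original pipeline.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half
        # of the total bases is the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one length")


# Toy example with made-up lengths (not real reftig sizes).
toy_lengths = [10, 8, 6, 4, 2]
print(n50(toy_lengths))
```

Dividing the result by 1000 gives the value in Kb when the input lengths are in bases.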


Download the C. savignyi Reftigs (Version 2.1, March 2006), lowercase repeat-masked: Zip archive, 50MB.
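
In a lowercase repeat-masked (soft-masked) sequence file, repeat bases are written in lowercase and all other bases in uppercase. The sketch below shows one way to measure the masked fraction, assuming the archive contains FASTA-formatted text. The function name and the example records are hypothetical and do not come from the download.

```python
def masked_fraction(fasta_lines):
    """Fraction of sequence characters that are lowercase (soft-masked).

    Assumes FASTA input: header lines start with '>' and are skipped.
    """
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and record headers
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    return masked / total if total else 0.0


# Made-up record for illustration only.
example = [">reftig_example", "ACGTacgtNN", "acACGT"]
print(masked_fraction(example))
```

For a file on disk, the same function can be called on an open file handle, since iterating over a text file yields its lines.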

Update May 16, 2008: The C. savignyi Genetic Map

We have integrated the reference sequence with a comprehensive genetic map. A large fraction of the reftigs of the reference sequence are now in linkage groups; many reftigs are ordered within the linkage groups; and many ordered reftigs are also oriented. C. savignyi has 14 linkage groups.
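
To illustrate what order and orientation mean in practice, the following Python sketch joins reftigs into a single linkage-group sequence. It is a sketch under stated assumptions: the (name, orientation) record layout, the "+" and "-" orientation codes, and the 100-N gap spacer are hypothetical and are not the column layout of the tables listed below.

```python
# Complement table that preserves case, so soft-masking survives.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string, keeping lowercase masking."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_linkage_group(placements, sequences, gap="N" * 100):
    """Join reftigs in map order, flipping those on the minus strand.

    placements: list of (reftig_name, orientation) in map order;
                orientation "-" means reverse-complement, anything
                else (including unknown) is left as given.
    sequences:  dict mapping reftig_name to its sequence.
    gap:        spacer placed between adjacent reftigs (assumed).
    """
    parts = []
    for name, orientation in placements:
        seq = sequences[name]
        if orientation == "-":
            seq = reverse_complement(seq)
        parts.append(seq)
    return gap.join(parts)


# Made-up reftigs for illustration only.
toy_sequences = {"reftig_a": "AACG", "reftig_b": "ttGC"}
toy_placements = [("reftig_a", "+"), ("reftig_b", "-")]
print(build_linkage_group(toy_placements, toy_sequences, gap="NN"))
```

Reftigs that are ordered but not oriented are left in their given orientation here, which is a simplification.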

The salient data are in the following tables (Note: these are Excel files).
Table 1: List of reftigs by linkage group with order and orientation
Table 2: Complete genetic map data
Table 3: C. savignyi reftigs associated with linkage groups by comparison with C. intestinalis

We also assigned unmapped C. intestinalis scaffolds to C. intestinalis chromosomes:
Table 4: C. intestinalis scaffolds associated with chromosomes by comparison with C. savignyi

Further items of interest:
Dotplots of alignments between C. savignyi reftigs and C. intestinalis scaffolds, by linkage group (PDF).
Supplemental Methods: construction of a C. savignyi linkage map (PDF).

Update May 7, 2007

Two papers (both Small et al., 2007) have been published: one describes the generation of the reference sequence, and the other explores the extreme polymorphism of C. savignyi. Both can be found via the papers link on the left.

A whole-genome alignment between C. intestinalis and C. savignyi, generated at LBNL, is available for browsing at VISTA. The alignment is also available for download. NOTE: These alignments were generated with the Sept 2005 sequence (Version 2.0).

Update September 15, 2006

The reference genome is now in Ensembl. NOTE: Ensembl shows the Sept 2005 sequence (Version 2.0).

Update April 24th, 2006

Supplementary Data for Small et al. (2007), PNAS: Extreme genomic variation in a natural population
RECON library (text file, 1MB)
Haplome alignments (tgz file, 99MB)

Update March 1st, 2006

A minor bug in the previous reference sequence (Version 2.0, September 2005) was fixed:
Bug documentation (zip archive, three files; 0.9MB)
Sept 2005 sequence (do not use; for archival purposes only)


The reference sequence was generated at Stanford by Kerrin Small from Arend Sidow's group with help from Michael Brudno (then of Serafim Batzoglou's group, now in Toronto).

David Johnson from Arend Sidow's group collected the biggest squirt he could find in the San Francisco Bay and prepared its DNA. Nicole Stange-Thomann and colleagues from Eric Lander's genome center sequenced it, and Jade Vinson (with the help of David Jaffe and colleagues, also at the Whitehead) generated the initial Arachne assembly.

Zhirong Bao from Sean Eddy's group generated the RECON library that was used to mask the sequence. Before masking, Kerrin Small removed by hand the library entries for tRNAs, rRNAs, and protein-coding genes not encoded by mobile elements.