Surveying the complex polyploid sugarcane genome sequence

Technical memo

Today there is a lack of publicly available genomic data for sugarcane; such information would strengthen and accelerate breeding programs. This lack of information is partly due to sugarcane's complex genome structure, which is not amenable to current high-throughput short-read sequencing technologies.
Current sugarcane cultivars are interspecific hybrids with a ploidy level between 8 and 12 and an estimated haploid genome size of approx. 760-930 Mbp. In a pilot project, we used the new TruSeq Synthetic Long Read sequencing technology from Illumina to sequence the genome of sugarcane variety SP80-3280 at shallow coverage.
We generated 9 libraries totaling more than 5 Gbp of sequence data, giving an estimated coverage of around 5x of the haploid genome. The current assembly contains over 1 Gbp of assembled long reads, and we annotated 300,000 protein-coding genes using RNA-Seq data previously generated by our group.
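The coverage estimate is the total sequence yield divided by the haploid genome size. A minimal sketch using only the figures quoted above (the function name `coverage` is illustrative, not part of any pipeline described here):

```python
# Sketch: fold coverage = total sequenced bases / haploid genome size.
# Inputs are the memo's figures: >5 Gbp of sequence, haploid genome approx. 760-930 Mbp.

def coverage(total_bases: float, genome_size: float) -> float:
    """Return the average sequencing depth (fold coverage) over a genome."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


TOTAL_BASES = 5e9  # lower bound: "more than 5 Gbp"

# Evaluate at both ends of the estimated haploid genome size range.
for haploid_size in (760e6, 930e6):
    print(f"{haploid_size / 1e6:.0f} Mbp -> {coverage(TOTAL_BASES, haploid_size):.1f}x")
```

With the 5 Gbp lower bound this gives about 6.6x for a 760 Mbp haploid genome and about 5.4x for 930 Mbp, so "around 5x" is a conservative figure.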

A search for a set of genes highly conserved across eukaryotes indicated that the assembly covers approx. 90% of the gene space. The genome assembly, gene predictions and further data are available via .
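The gene-space estimate is a ratio: the share of a conserved eukaryotic reference gene set that can be found in the assembly. The memo does not name the tool or reference set used, so this sketch assumes the detection step has already produced a set of matched gene identifiers; the identifiers and the `completeness` helper are hypothetical.

```python
# Sketch: gene-space completeness as the percentage of a conserved reference
# gene set detected in the assembly. All gene identifiers are hypothetical
# placeholders, not real data from this project.

def completeness(reference_genes: set[str], found_genes: set[str]) -> float:
    """Percentage of reference genes that were detected in the assembly."""
    if not reference_genes:
        raise ValueError("reference gene set is empty")
    detected = reference_genes & found_genes  # ignore hits outside the reference set
    return 100.0 * len(detected) / len(reference_genes)


reference = {f"gene_{i:03d}" for i in range(10)}  # hypothetical 10-gene reference set
found = {f"gene_{i:03d}" for i in range(9)} | {"other_hit"}  # 9 of 10 matched, plus one unrelated hit
print(f"{completeness(reference, found):.0f}%")  # 90%
```

Intersecting with the reference set first keeps unrelated hits from inflating the percentage.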
We have released a BLAST server to access the draft genome sequence, available at:


