Coffea canephora

Organism Details
Common Name coffea
Genus Coffea
Species canephora
Abbreviation C.canephora

Coffea is a genus of flowering plants in the Rubiaceae family. Among Coffea species, C. canephora has the widest natural distribution, which extends west to east from Guinea to Uganda and north to south from Cameroon to Angola. C. canephora (2n=2x=22) is an allogamous diploid tree consisting of polymorphic populations.

The Coffea canephora genome represents an important stepping-stone for plant gene and genome evolution studies.
The Coffea canephora reference genome sequence results from a collaboration between Genoscope, IRD and CIRAD (UMRs AGAP, DIADE and RPB), funded by ANR. The sequenced genotype (2n=22, 1C=710 Mb) is a doubled-haploid plant (accession DH200-94) produced by IRD from the clone IF200, based on haploid plants that occur spontaneously in association with polyembryony.

The sequence was completed and analyzed in collaboration with several teams, in particular from the International Coffee Genome Sequencing Consortium, and was published in: “The Coffee Genome Provides Insight into Convergent Evolution of Caffeine Biosynthesis”. Denoeud et al., 2014, Science.


A high-quality draft coffee genome was generated and deep transcriptome sequencing performed. These data allow us to explore the coffee genome structure and organization, as well as important aspects of the biology and evolution of this major crop.

C. canephora genome v1.0
Analysis Name: Whole Genome Assembly and Annotation of Coffea canephora (Genoscope)
Software: Gaze (1.0)
Source: manual_curation
Date performed: 2012-10-17
Materials & Methods: The Coffea canephora reference genome sequence results from collaboration between Genoscope, CIRAD and IRD (UMRs AGAP, DIADE and RPB) funded by ANR. The sequenced genotype (2n=22, 1C=710 Mb) is a doubled-haploid plant (accession DH200-94) produced by IRD from the clone IF200 based on the haploid plants occurring spontaneously in association with polyembryony.

This version (v.1) of the assembly is 580 Mb spread over 13,345 scaffolds. 25,574 protein-coding loci have been predicted, each with a primary transcript.


 

Assembly

The 349 anchored scaffolds were joined to generate 11 pseudomolecules that were named according to the linkage group nomenclature. Each scaffold join was denoted with 100 N base pairs. 139 mapped scaffolds have known orientation along the pseudomolecules while the remaining 210 mapped scaffolds were assigned with a random orientation.

12,996 scaffolds (totalling 204 Mb) remain unmapped in the current genome release and were grouped arbitrarily into a pseudomolecule named chr Un (for “unknown”), each scaffold being joined by 100 Ns.
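The joining step described above can be sketched in Python. This is only an illustration of the idea (scaffolds concatenated in map order with a 100-N spacer between neighbours), not the pipeline actually used for the release; the function names, the `(sequence, strand)` input format and the toy sequences are assumptions made for this example.

```python
# Sketch: build a pseudomolecule by joining ordered scaffolds with 100 Ns.
# All names and the input format below are hypothetical.

GAP = "N" * 100  # spacer between two consecutive scaffolds, as stated in the text

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pseudomolecule(scaffolds):
    """Join scaffolds given as (sequence, strand) pairs, strand being '+' or '-'.

    Scaffolds on the '-' strand are reverse-complemented before joining.
    """
    oriented = [
        seq if strand == "+" else reverse_complement(seq)
        for seq, strand in scaffolds
    ]
    return GAP.join(oriented)


# Toy input: three tiny "scaffolds" in map order.
toy = [("ACGT", "+"), ("AAAC", "-"), ("GG", "+")]
pseudo = build_pseudomolecule(toy)
print(len(pseudo))  # 10 scaffold bases plus two 100-N spacers
```

With n scaffolds in a pseudomolecule there are n - 1 joins, so the spacer contributes 100 × (n - 1) bases to its length.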

This assembly is currently viewable in GBrowse.

Download

Complete Assembly

By Chromosome

Statistics

1. Sequencing Method

The genome was sequenced using a Whole Genome Shotgun strategy. All data were generated using Next generation sequencers (Roche/454 GSFLX and Illumina GAIIx), except for sequences of BAC ends that were produced by paired-end sequencing of cloned inserts using Sanger technology on ABI3730xl sequencers.

Library                            Number of reads   Number of bases   Coverage   Fragment size (bp)
Roche/454 single end reads              28,725,267    10,759,034,874      15.15   NA
Roche/454 single end long reads         12,403,671     5,855,380,726       8.25   NA
Roche/454 mate pairs reads               4,113,325     1,325,854,339       1.87   13,600
Roche/454 mate pairs reads               4,803,363     1,563,848,456       2.20   7,800
Roche/454 mate pairs reads               4,370,296     1,227,506,974       1.73   2,800
Sanger BAC-ends                            143,605       193,347,146       0.27   130,000-170,000
Illumina single end reads               56,483,059     4,114,934,000       5.80   NA
Illumina paired-end reads              226,531,636    38,641,750,000      54.43   300-600

Table 1. Raw sequencing data overview.
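The Coverage column of Table 1 appears to be the number of bases divided by the 1C genome size of 710 Mb quoted earlier on this page. The short Python sketch below recomputes it under that assumption; the dictionary keys and function name are made up for the example.

```python
# Sketch: recompute sequencing depth as bases / genome size.
# Assumption: the 710 Mb (1C) estimate is the denominator used in Table 1.

GENOME_SIZE_BP = 710_000_000

# Number of bases per library, copied from Table 1 (keys are illustrative labels).
BASES = {
    "454 single end": 10_759_034_874,
    "454 single end long": 5_855_380_726,
    "454 mate pairs 13.6 kb": 1_325_854_339,
    "454 mate pairs 7.8 kb": 1_563_848_456,
    "454 mate pairs 2.8 kb": 1_227_506_974,
    "Sanger BAC-ends": 193_347_146,
    "Illumina single end": 4_114_934_000,
    "Illumina paired-end": 38_641_750_000,
}


def coverage(n_bases: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Average depth: sequenced bases divided by the genome size."""
    return n_bases / genome_size


for library, n_bases in BASES.items():
    print(f"{library:<24} {coverage(n_bases):6.2f}x")
```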

2. Assembly Method

454 reads and Sanger BAC ends were assembled using Newbler version MapAsmResearch-04/09/2010-patch-18/17/2010. Of the initial 54,415,922 reads, about 86.31% were assembled. We obtained 91,439 contigs that were linked into 13,345 scaffolds. The contig N50 was 14.8 kb, and the scaffold N50 was 1.3 Mb.

                        Raw assembly              Final assembly
                        Contigs     Scaffolds     Contigs     Scaffolds
Number                  91,439      13,345        25,216      13,345
Cumulative size (Mb)    475.6       569.4         471.3       568.6
Average size (kb)       5.2         42.6          18.7        42.6
N50 size (kb)           14.8        1,261         51.1        1,261
N50 number              8,509       108           2,290       108
N80 size (kb)           4.3         65.2          15.5        65.3
N80 number              26,145      637           7,259       635
Largest size (kb)       193.8       9,035         817.6       9,028

Table 2. Assembly statistics.
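For readers unfamiliar with the N50 and N80 rows of Table 2, the sketch below shows the usual definition: sort the sequences from longest to shortest and walk down the list until the running total reaches 50% (or 80%) of the cumulative size. The length at that point is the "size" and the number of sequences used is the "number". The function name and the toy lengths are illustrative only, and the exact convention used by the assembler may differ in edge cases.

```python
# Sketch: Nxx size and Nxx number from a list of sequence lengths.


def nxx(lengths, percent=50):
    """Return (Nxx size, Nxx number) for the given lengths.

    The comparison is done on integers to avoid floating-point thresholds.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running * 100 >= percent * total:
            return length, count
    raise ValueError("no sequence lengths given")


# Toy example: five sequences totalling 300 bases.
toy_lengths = [100, 80, 60, 40, 20]
print(nxx(toy_lengths, 50))
print(nxx(toy_lengths, 80))
```

In the toy example, half of the total (150 bases) is reached with the two longest sequences, so the N50 size is 80 and the N50 number is 2.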

3. Anchoring the assembly on a genetic map

All available sequence-based markers from the consensus genetic linkage map were aligned against the scaffolds with BLAST. The markers were then filtered so that only those presenting a single hit were retained. More precisely, a hit was taken into account if its HSPs showed a minimum identity of 90%, lay no more than 3000 bp apart, and had a cumulative size of at least 60% of the marker sequence length. In total, 1295 markers were unambiguously located on the assembly and used in combination with 1644 RADseq markers to anchor and orient the scaffolds along the C. canephora pseudomolecules.
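One way to read the filtering rule above is sketched below in Python. It is an interpretation, not the script used by the authors: the `HSP` record, the function names and the toy coordinates are hypothetical, and a "hit" is assumed to be the set of HSPs between one marker and one scaffold.

```python
# Sketch: keep a marker only if exactly one scaffold hit passes the thresholds
# quoted in the text (identity >= 90%, HSPs <= 3000 bp apart, >= 60% of the marker covered).
from dataclasses import dataclass


@dataclass
class HSP:
    identity: float  # percent identity of the HSP
    s_start: int     # start on the scaffold
    s_end: int       # end on the scaffold
    q_len: int       # aligned length on the marker


def hit_passes(hsps, marker_len, min_identity=90.0, max_gap=3000, min_cov_pct=60):
    """Apply the three thresholds to the HSPs of one marker/scaffold hit."""
    if not hsps or any(h.identity < min_identity for h in hsps):
        return False
    ordered = sorted(hsps, key=lambda h: min(h.s_start, h.s_end))
    for left, right in zip(ordered, ordered[1:]):
        gap = min(right.s_start, right.s_end) - max(left.s_start, left.s_end)
        if gap > max_gap:
            return False
    covered = sum(h.q_len for h in hsps)
    return covered * 100 >= min_cov_pct * marker_len


def unique_placement(hits_by_scaffold, marker_len):
    """Return the scaffold name if exactly one hit passes, otherwise None."""
    passing = [s for s, hsps in hits_by_scaffold.items() if hit_passes(hsps, marker_len)]
    return passing[0] if len(passing) == 1 else None


# Toy marker of 500 bp with one good hit and one low-identity hit.
toy_hits = {
    "scaffold_1": [HSP(95.0, 1000, 1200, 200), HSP(93.0, 1500, 1700, 200)],
    "scaffold_2": [HSP(85.0, 10, 400, 390)],
}
print(unique_placement(toy_hits, marker_len=500))
```

A marker with two passing hits, or none, returns `None` and would be discarded as ambiguous or unplaced.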

A total of 349 scaffolds covering approximately 364 megabases (Mb) (64% of the assembled genome sequence) were anchored to the 11 C. canephora chromosomes, of which 139, representing 290 Mb (51% of the assembled genome), were both anchored and oriented. 98% of the 100 largest scaffolds and 96.4% of scaffolds larger than 1 Mb were anchored on chromosomes. An overview of the assembly anchoring on the genetic map is given in Table 3.

Pseudomolecule      Number of    Size (Mb)    No. of gene    Gene density
(Linkage group)     scaffolds                 models         (genes/Mb)
1 (A)               34           38.2         2198           57.5
2 (B)               42           54.5         4000           73.4
3 (C)               29           32.0         1632           51.0
4 (D)               35           28.2         1727           61.2
5 (E)               33           29.1         1661           57.1
6 (F)               31           37.3         2839           76.1
7 (G)               21           29.8         2146           72.0
8 (H)               39           31.6         1718           54.4
9 (I)               26           22.3         1094           49.1
10 (J)              34           27.6         1653           59.9
11 (K)              25           33.5         1753           52.3
Un                  12996        205.6        3603           17.5

Table 3. Overview of the anchoring of the assembly on the C. canephora linkage groups.
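The derived figures in Table 3 and in the paragraph preceding it can be rechecked with a few lines of Python. The sketch assumes that gene density is simply the number of gene models divided by the pseudomolecule size in Mb, and that the anchored fraction is taken relative to the 568.6 Mb of final scaffolds in Table 2; the variable names are illustrative.

```python
# Sketch: recompute gene density and anchoring totals from Table 3.

# pseudomolecule: (number of scaffolds, size in Mb, number of gene models)
TABLE3 = {
    "1 (A)": (34, 38.2, 2198),
    "2 (B)": (42, 54.5, 4000),
    "3 (C)": (29, 32.0, 1632),
    "4 (D)": (35, 28.2, 1727),
    "5 (E)": (33, 29.1, 1661),
    "6 (F)": (31, 37.3, 2839),
    "7 (G)": (21, 29.8, 2146),
    "8 (H)": (39, 31.6, 1718),
    "9 (I)": (26, 22.3, 1094),
    "10 (J)": (34, 27.6, 1653),
    "11 (K)": (25, 33.5, 1753),
    "Un": (12996, 205.6, 3603),
}

FINAL_ASSEMBLY_MB = 568.6  # cumulative size of final scaffolds, from Table 2


def gene_density(n_genes: int, size_mb: float) -> float:
    """Gene models per Mb of sequence."""
    return n_genes / size_mb


anchored = {name: row for name, row in TABLE3.items() if name != "Un"}
anchored_scaffolds = sum(row[0] for row in anchored.values())
anchored_mb = sum(row[1] for row in anchored.values())
anchored_fraction = anchored_mb / FINAL_ASSEMBLY_MB

for name, (n_scaffolds, size_mb, n_genes) in TABLE3.items():
    print(f"{name:<7} {gene_density(n_genes, size_mb):5.1f} genes/Mb")
print(anchored_scaffolds, round(anchored_mb, 1), f"{anchored_fraction:.0%}")
```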

Gene Predictions

Protein-coding genes in the C. canephora genome were automatically annotated using various sources of evidence (cDNAs, RNA-Seq, protein alignments, and ab initio predictions) that were combined into gene models. We obtained 25,574 protein-coding gene models.

Download:

 

Expression data resources

Contigs

Track Name                 Species        Tissue/physiological condition   Platform used   Nb contigs   Nb reads assembled   Single/Paired   Assembly software
Canephora Solexa contigs   C. canephora   stem, leaf, flower               Solexa          52,683       172,963,686          Single          Oases
Coffea Gmorse models       C. canephora   stem, leaf, flower               Solexa          70,124       172,963,686          Single          G-Mo.R-Se
56216 home-made unigenes   C. canephora   mix                              mix             56,216       #                    Single          2 runs of Cap3 applied on contigs
Catura 454 contigs         C. arabica     leaf                             454             22,776       493,984              Single          Newbler
Arabica 454 contigs        C. arabica     embryo, endosperm                454             24,799       421,307              Single          Newbler

ESTs / RNASeq reads

Track Name          Species        Accession    Tissue/physiological condition   Platform used   Nb reads     Read length   Single/Paired
Public EST          C. canephora   ---          mix                              Sanger          255,032                    Single
Old leaves 1        C. canephora   DH 200-94    old leaves                       Solexa          28,179,777   76            Single
Old leaves 2        C. canephora   DH 200-94    old leaves                       Solexa          31,049,449   76            Single
Stem and flower 1   C. canephora   DH 200-94    stem, flower                     Solexa          28,566,581   76            Single
Stem and flower 2   C. canephora   DH 200-94    stem, flower                     Solexa          30,498,332   76            Single
Young leaves 1      C. canephora   DH 200-94    young leaves                     Solexa          25,899,759   76            Single
Young leaves 2      C. canephora   DH 200-94    young leaves                     Solexa          28,769,788   76            Single
Pistil              C. canephora   cv BP961     pistil                           Solexa          27,710,036   100           Single
Root                C. canephora   acc. T3518   root                             Solexa          31,038,821   100           Single
Stamen              C. canephora   cv BP961     stamen                           Solexa          22,087,867   100           Single
Leaf                C. canephora   acc. T3518   leaf                             Solexa          53,018,844   100           Single

UNIPROT proteins

Track Name                 Species                                           Nb matches
Selected protein matches   Gentianales, Arabidopsis, Potato, Tomato, Vitis   155,283
Other Uniprot matches      mix                                               7,498,085

 

Anchoring

Data Type Summary
The following data types are currently present for this organism
Feature Type Count
chromosome 12
gene 25,572
mRNA 25,573
Protein 25,574
CDS 130,503
GO Analysis Reports
Any analysis with GO results related to this organism are available for viewing. For further information, see the analysis information page.