A De Novo Robust Clustering Approach for Amplicon-Based Sequence Data

Abstract : When analyzing microbial communities, an active computational challenge is the categorization of 16S rRNA gene sequences into operational taxonomic units (OTUs). Established clustering tools use a one-pass algorithm to handle large numbers of gene sequences and produce OTUs in reasonable time. However, all current tools are based on a crisp clustering approach, in which each gene sequence is assigned to exactly one cluster. Because the quality of the output is weak compared to that of more complex clustering algorithms, the user is forced to post-process the obtained OTUs. Providing a membership degree when assigning a gene sequence to an OTU would help the user during this post-processing task. Moreover, this membership degree can be used to automatically evaluate the quality of the obtained OTUs. The goal of this work is therefore to propose a new clustering approach that takes uncertainty into account when producing OTUs and that improves both the quality and the presentation of the OTU results.
Document type :
Preprints, Working Papers, ...

https://hal-clermont-univ.archives-ouvertes.fr/hal-01447699
Contributor : Alexandre Bazin
Submitted on : Monday, May 8, 2017 - 7:57:00 PM
Last modification on : Friday, October 5, 2018 - 1:10:09 PM
Long-term archiving on : Wednesday, August 9, 2017 - 3:34:05 PM

File

ArticleECCB.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01447699, version 2

Citation

Alexandre Bazin, Didier Debroas, Engelbert Mephu Nguifo. A De Novo Robust Clustering Approach for Amplicon-Based Sequence Data. 2017. ⟨hal-01447699v2⟩
