Accurate Detection of Mobile Sequence in Metagenome

    Abstract:

    Genome assembly is one of the major challenges in metagenomic analysis. Assembly usually assumes that all sequencing reads originate from the same genome, but the mobile elements that are highly active in microbial genomes call this assumption into question. This work formulates the issue as a binary classification problem that separates mobile elements from host chromosomes; accurate discrimination between the two could greatly facilitate metagenomic research. Based on numerical features extracted from metagenomic sequencing data, three feature selection algorithms (ReliefF, the chi-squared test, and Fisher's t-test) were examined in combination with four classification models: logistic regression, extreme learning machine, support vector machine, and random forest. Experimental results show that the model combining ReliefF feature selection with a random forest classifier correctly classifies more than 95% of the metagenomic sequencing data using only 100 features, outperforming the model that uses all 690 features.
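
The feature selection step named in the abstract can be illustrated with a small sketch. What follows is a simplified Relief-style scorer in the spirit of ReliefF, written in plain Python; it is not the authors' implementation. The function names (`relief_scores`, `select_top_features`, `make_toy_data`), the neighbour count `k=5`, and the synthetic data are all assumptions made for illustration, and the classification step (random forest in the paper) is left out.

```python
import random

def relief_scores(X, y, k=5):
    """Relief-style feature weights for a two-class problem.

    Each sample is compared with its k nearest same-class samples (hits)
    and its k nearest other-class samples (misses). A feature gains weight
    when it differs across classes and loses weight when it differs
    within a class.
    """
    n, d = len(X), len(X[0])
    # Rescale each feature to [0, 1] so that differences are comparable.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]
    Z = [[(row[j] - lo[j]) / span[j] for j in range(d)] for row in X]

    weights = [0.0] * d
    for i in range(n):
        # Manhattan distance from sample i to every other sample.
        ranked = sorted(
            (sum(abs(a - b) for a, b in zip(Z[i], Z[m])), m)
            for m in range(n) if m != i
        )
        hits = [m for _, m in ranked if y[m] == y[i]][:k]
        misses = [m for _, m in ranked if y[m] != y[i]][:k]
        for j in range(d):
            miss_diff = sum(abs(Z[i][j] - Z[m][j]) for m in misses) / len(misses)
            hit_diff = sum(abs(Z[i][j] - Z[m][j]) for m in hits) / len(hits)
            weights[j] += (miss_diff - hit_diff) / n
    return weights

def select_top_features(X, y, n_keep, k=5):
    """Return the indices of the n_keep highest-weighted features, best first."""
    w = relief_scores(X, y, k)
    return sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:n_keep]

def make_toy_data(n_per_class=40, n_features=10, seed=0):
    """Synthetic stand-in data: only features 0 and 1 depend on the class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += 4.0 * label  # informative feature
            row[1] -= 4.0 * label  # informative feature
            X.append(row)
            y.append(label)
    return X, y

X, y = make_toy_data()
top = select_top_features(X, y, n_keep=2)
# Keep only the selected columns; a classifier would be trained on these.
X_reduced = [[row[j] for j in top] for row in X]
```

In the setting the abstract describes, `n_keep` would be 100 out of 690 features, and the reduced matrix would then be passed to a classifier such as a random forest.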

Cite this article

PENG Chao, WANG Pu, GE Ruiquan, et al. Accurate Detection of Mobile Sequence in Metagenome[J]. Journal of Integration Technology, 2016, 5(2): 85-96

History
  • Published online: 2016-04-01