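Hu Yishen (胡奕绅), Zhu Muchun (朱木春), Yin Peng (殷鹏). Genome-Wide Association Study of Cardiovascular and Cerebrovascular Diseases Based on Multi-Step Screening [J]. Journal of Integration Technology (集成技术), 2019, 8(5): 72-85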
Genome-Wide Association Study of Cardiovascular and Cerebrovascular Diseases Based on Multi-Step Screening
DOI:10.12146/j.issn.2095-3135.20190702002
Keywords: cardio-cerebrovascular disease; feature selection; single-nucleotide polymorphism; multi-step screening
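Funding: National Natural Science Foundation of China, Young Scientists Fund (11801542); Shenzhen Science and Technology Innovation Commission Discipline Layout Project (JCYJ20180703145002040)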
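Authors and affiliations
Hu Yishen: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055; Shenzhen University, Shenzhen 518061
Zhu Muchun: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055
Yin Peng: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055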
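Abstract (translated from the Chinese):
      Genome-wide association studies are an effective means of studying the genetic effects underlying complex diseases and traits. Existing association analyses mainly rely on marginal statistical tests, which do not account for correlation between features and suffer from unstable threshold selection. Taking cardio-cerebrovascular disease as the research subject, this paper proposes a new genome-wide association analysis method based on multi-step screening. The method can be summarized in two steps: first, the Gini index is used for initial feature screening to obtain a candidate subset of single-nucleotide polymorphisms; then a random-forest-based recursive cluster elimination method is used to find the associated single-nucleotide polymorphisms within that subset. Experimental results show that multi-step screening outperforms single-step feature selection, and that the single-nucleotide polymorphism subset selected by Gini-index-based random forest recursive cluster elimination is more strongly associated with the disease.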
English abstract:
      Genome-wide association study (GWAS) is an effective approach for studying genetic variants associated with complex diseases and traits. Marginal statistical tests are the most common GWAS method, but they have weaknesses: they do not account for correlation between features, and threshold selection is unstable. In this paper, we propose a new GWAS method for cardio-cerebrovascular disease based on multi-step screening. The method consists of two steps: first, the Gini index is used for initial feature selection to obtain a candidate subset of single-nucleotide polymorphisms (SNPs); then random forest recursive cluster elimination (RF-RCE) identifies the associated SNPs within that candidate subset. Experimental results show that multi-step feature selection performs better than single-step feature selection, and that the selected SNPs are more suitable for cardio-cerebrovascular disease prediction.
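The first step described in the abstract, Gini-index screening, can be illustrated with a minimal sketch. The abstract does not say exactly how the Gini index is computed, so this assumes one common reading: each SNP is scored by the decrease in Gini impurity of the case/control labels when samples are grouped by genotype (coded 0/1/2), and the top-scoring SNPs form the candidate subset. The function names, the `top_k` cutoff, and the toy data are hypothetical, and the second step (RF-RCE) is not shown.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(genotypes, labels):
    """Impurity decrease when samples are grouped by genotype value."""
    n = len(labels)
    groups = {}
    for genotype, label in zip(genotypes, labels):
        groups.setdefault(genotype, []).append(label)
    weighted = sum(len(ys) / n * gini_impurity(ys) for ys in groups.values())
    return gini_impurity(labels) - weighted


def screen_snps(snp_matrix, labels, top_k):
    """Rank SNP columns by Gini gain; return the top_k indices and all scores.

    The top_k cutoff is an assumption for illustration, not the paper's rule.
    """
    n_snps = len(snp_matrix[0])
    scores = [
        gini_gain([row[j] for row in snp_matrix], labels) for j in range(n_snps)
    ]
    ranked = sorted(range(n_snps), key=lambda j: scores[j], reverse=True)
    return ranked[:top_k], scores


# Toy data (made up): 6 samples x 3 SNPs, labels 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
snp_matrix = [
    [2, 0, 1],
    [2, 1, 0],
    [2, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
candidates, scores = screen_snps(snp_matrix, labels, top_k=1)
print(candidates, scores)
```

In this toy data the first SNP separates cases from controls perfectly, so it receives the largest score and is the one retained. A full pipeline would pass the retained columns to a random-forest-based recursive elimination stage, which needs a random forest implementation and is left out here.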