A multi-criteria active learning method based on adaptive density clustering

CLC number: TH741

Fund project: Supported by the Natural Science Foundation of Hebei Province (F2020501040)

    Abstract:

    Active learning (AL) trains better machine learning models at a lower labeling cost. Combining the RD and QBC algorithms effectively addresses the limitation of considering only a single criterion. However, the K-means clustering on which RD is based also assigns outliers to clusters, which degrades model performance, and QBC must maintain multiple models and returns the informativeness of a sample only indirectly. To address these issues, we propose an adaptive density clustering-based Gaussian process regression (ADC-GPR) algorithm, which selects samples efficiently by clustering first and then using uncertainty directly. The ADC clustering in this algorithm is robust to outliers and adapts to the distribution characteristics of the dataset, and it provides representative sample points and their corresponding clusters for the subsequent AL steps. The method ensures representativeness and diversity in unsupervised selection and considers informativeness, representativeness, and diversity in supervised selection. Experimental results show that, for the same number of sampling iterations, ADC-GPR improves average performance by 37.3%, 8%, and 2.8% over the RS, KS, and RD-GPR algorithms, respectively, and that its selection efficiency is higher.
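The "cluster first, then use uncertainty directly" idea in the abstract can be illustrated with a minimal sketch. It is not the authors' ADC-GPR implementation: the abstract does not specify how ADC works, so a plain DBSCAN-style grouping with fixed `eps` and `min_pts` stands in for it, and the RBF kernel, the centroid-based representative rule, the toy data, and every function name below are assumptions made for illustration. The sketch relies on the fact that the GP posterior variance depends only on the inputs, so no label values are needed to rank candidates.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two points (tuples of floats)."""
    return math.exp(-math.dist(a, b) ** 2 / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_std(labeled, x, length=1.0, noise=1e-6):
    """GP posterior standard deviation at x (zero-mean prior, RBF kernel)."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(labeled)] for i, a in enumerate(labeled)]
    k_star = [rbf(a, x, length) for a in labeled]
    alpha = solve(K, k_star)  # alpha = K^{-1} k_star
    var = rbf(x, x, length) - sum(k * a for k, a in zip(k_star, alpha))
    return math.sqrt(max(var, 0.0))

def density_clusters(points, eps, min_pts):
    """DBSCAN-style stand-in for ADC (assumption): cluster id per point, -1 = outlier."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = [len(nb) >= min_pts for nb in nbrs]
    labels, cid = [-1] * n, 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cid
        stack = [i]
        while stack:  # grow the cluster through core points only
            p = stack.pop()
            for q in nbrs[p]:
                if labels[q] == -1:
                    labels[q] = cid
                    if core[q]:
                        stack.append(q)
        cid += 1
    return labels

def representatives(points, labels):
    """Unsupervised step (assumed rule): per cluster, the member nearest the centroid."""
    reps, dim = [], len(points[0])
    for c in sorted(set(labels) - {-1}):
        members = [i for i, l in enumerate(labels) if l == c]
        centroid = tuple(sum(points[i][d] for i in members) / len(members)
                         for d in range(dim))
        reps.append(min(members, key=lambda i: math.dist(points[i], centroid)))
    return reps

def next_query(points, labels, labeled_idx, length=1.0):
    """Supervised step (assumed rule): most uncertain unlabeled non-outlier point."""
    labeled = [points[i] for i in labeled_idx]
    candidates = [i for i in range(len(points))
                  if labels[i] != -1 and i not in labeled_idx]
    return max(candidates, key=lambda i: gp_std(labeled, points[i], length))

# Toy 1-D pool: two dense groups and one isolated point at 20.0.
pool = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,),
        (5.0,), (5.1,), (5.2,), (5.3,), (5.4,), (5.5,), (5.6,), (5.72,),
        (20.0,)]
labels = density_clusters(pool, eps=0.15, min_pts=2)
reps = representatives(pool, labels)      # initial samples to label
query = next_query(pool, labels, reps)    # next sample to label
```

In this sketch the isolated point receives the outlier label and is therefore never proposed for labeling, which is the behavior the abstract contrasts with K-means, where every point is assigned to some cluster. A single GP also supplies the uncertainty directly, so no committee of models is maintained.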

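Cite this article

贺忠海, 朱温涵, 陈旭旺, 张晓芳. A multi-criteria active learning method based on adaptive density clustering [J]. 仪器仪表学报 (Chinese Journal of Scientific Instrument), 2024, 45(3): 179-187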

History
  • Published online: 2024-05-31