Population optimization combined with robust distance metric for fair K-means clustering algorithm

Affiliation:

1. School of Artificial Intelligence (School of Future Technologies), Nanjing University of Information Science & Technology, Nanjing 210044, China; 2. School of Software, Nanjing University of Information Science & Technology, Nanjing 210044, China

CLC number:

TN911.3

Fund project:

Supported by the Basic Research Program of Jiangsu Province (BK20220452)
Abstract:

Clustering algorithms are now widely used in intelligent measurement systems, multi-source sensor data analysis, and embedded state recognition. Ensuring fairness with respect to sensitive attributes while preserving clustering quality has become a bottleneck that limits their effectiveness in critical measurement tasks. To address this problem, we propose PODM-Kmeans, a fair K-means clustering algorithm that combines population optimization with a robust distance metric. To balance fairness against clustering quality, the method uses an improved cuckoo search algorithm to select the initial cluster centers, trading off global and local search capability and thereby making the clustering results more stable. The iterative clustering objective then incorporates a fairness constraint and a cluster-size constraint, and adopts a flexible weighted Euclidean norm as the distance metric, which suppresses the negative influence of outliers and helps improve fairness. Experiments on five synthetic and five real-world datasets show that PODM-Kmeans performs favorably against comparable methods. In particular, on the Adult, Bank, Census1990, and CreditCard datasets, its fairness ratio (FR) exceeds 0.95 while a reasonable level of clustering quality is maintained.
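The abstract names two ingredients that can be illustrated in a few lines: a weighted Euclidean distance for assigning points to centers, and a fairness ratio computed from the sensitive attribute. The sketch below is illustrative only and is not the authors' implementation. The per-feature weighting, the function names, and in particular the definition of the fairness ratio (a balance-style score: the worst-case ratio between a group's share inside a cluster and its share in the whole dataset) are assumptions, since the abstract does not define them.

```python
import math
from collections import Counter


def weighted_euclidean(x, center, weights):
    """Weighted Euclidean distance: sqrt(sum_j w_j * (x_j - c_j)^2).

    Assumption: one non-negative weight per feature; the paper's actual
    weighting scheme is not given in the abstract.
    """
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, center)))


def assign(points, centers, weights):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)),
            key=lambda k: weighted_euclidean(p, centers[k], weights))
        for p in points
    ]


def fairness_ratio(labels, groups):
    """Assumed balance-style score in [0, 1].

    For every cluster and every sensitive group, compare the group's share
    in the cluster with its share in the full dataset and keep the worst
    ratio. 1.0 means every cluster mirrors the dataset proportions; 0.0
    means some group is missing from some cluster.
    """
    n = len(labels)
    overall = Counter(groups)             # group -> count in dataset
    cluster_sizes = Counter(labels)       # cluster -> size
    joint = Counter(zip(labels, groups))  # (cluster, group) -> count
    worst = 1.0
    for k, size in cluster_sizes.items():
        for g, total in overall.items():
            share_data = total / n
            share_cluster = joint[(k, g)] / size
            if share_cluster == 0:
                return 0.0
            worst = min(worst,
                        share_cluster / share_data,
                        share_data / share_cluster)
    return worst


# Hypothetical toy data: four 2-D points, two centers, binary sensitive attribute.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
centers = [(0, 0.5), (10, 0.5)]
labels = assign(points, centers, weights=(1.0, 1.0))
print(labels, fairness_ratio(labels, ["a", "b", "a", "b"]))
```

In this toy case each cluster contains one point from each group, so the assumed score reaches its maximum. Setting a feature's weight to zero removes that feature from the distance entirely, which is the simplest way to see how a weighted norm can limit the pull of an unreliable coordinate.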

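Cite this article

Xie Yihan (谢一涵), Bi Pengfei (毕鹏飞), Wang Aiping (王爱萍). Population optimization combined with robust distance metric for fair K-means clustering algorithm[J]. Journal of Electronic Measurement and Instrumentation, 2025, 39(6): 121-133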

Published online: 2025-09-16