APPLICATION OF BICLUSTERING ANALYSIS TO FORM SUBSETS OF COHERENT DATA

Sergii Babichev; Tetiana Goncharenko

doi:10.14308/ite000780

Authors

Sergii Babichev
Tetiana Goncharenko, Kherson State University

Keywords:

Biclustering analysis, artificial biclusters, quality criteria for biclustering, mutual information, mean squared error

Abstract

This paper introduces a new approach to data analysis based on biclustering, which differs significantly from traditional clustering methods. The authors review the existing literature describing bicluster analysis and the features of its application. The focus is on identifying coherent subsets within complex data, extending beyond the typical use case of gene expression data, and on how biclustering analysis can uncover hidden connections that conventional methods often overlook. Quality criteria for biclustering gene expression data are formulated, and the effectiveness of internal criteria is evaluated. The quality of biclusters is assessed using mean squared error (MSE) and mutual information to support the reliability and objectivity of the results. A distinctive feature of biclustering analysis is its ability to identify biclusters of various sizes and shapes, which is crucial for understanding complex and heterogeneous data: it highlights local patterns in data subsets and reveals more intricate interrelations. The article also stresses the importance of optimizing hyperparameters and using quality criteria to obtain the most accurate results. Beyond identifying coherent data subsets, the research aims to give a deeper understanding of the structural features and interconnections that biclustering analysis reveals. This work opens new prospects for analyzing complex data by offering deeper insight into its structure and dynamics. Particularly valuable is the method's ability to detect overlapping biclusters, which helps uncover more complex dependencies in the data.
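The abstract names mean squared error and mutual information as quality criteria but does not give their formulas. As a rough illustration only, the sketch below assumes the MSE-type criterion takes the form of the Cheng and Church mean squared residue and that mutual information is estimated by simple counting over already discretized values. The function names, the toy matrix, and both of these choices are assumptions made for this example, not the authors' implementation.

```python
from collections import Counter
from math import log2


def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the bicluster given by row and column indices.

    Assumption: this Cheng-Church style score stands in for the MSE criterion
    mentioned in the abstract. A value of 0 means a perfectly additive pattern.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(sub), len(sub[0])
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    total_mean = sum(row_means) / n_r
    squared = (
        (sub[i][j] - row_means[i] - col_means[j] + total_mean) ** 2
        for i in range(n_r)
        for j in range(n_c)
    )
    return sum(squared) / (n_r * n_c)


def mutual_information(x, y):
    """Plug-in mutual information (in bits) of two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


# Hypothetical toy matrix: rows 0-2 and columns 0-2 follow an additive pattern.
data = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 1.0, 5.0, 2.0],
]

coherent_score = mean_squared_residue(data, [0, 1, 2], [0, 1, 2])
whole_score = mean_squared_residue(data, [0, 1, 2, 3], [0, 1, 2, 3])

# Identical discretized profiles share 1 bit; independent ones share none.
mi_same = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
mi_indep = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

In this toy case the embedded bicluster scores close to zero while the full matrix scores much higher, which is the kind of contrast an internal quality criterion is meant to expose. Real expression data would need a discretization step before the mutual information estimate, and that step is not specified in the abstract.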