Improving Data Clustering Using Gravitational Algorithm PSO
Abstract
In recent years, data clustering with metaheuristic methods has become popular in the field of data mining. All of these methods suffer from an optimization problem that is discussed in this article. The problem arises when the cluster centroids encoded by an individual in the population (a particle in this article) do not actually play the role of cluster centers. We use the law of gravity to solve this problem. After the data are clustered by a particle, each centroid is moved toward the center of mass of the data in its cluster by a process based on the law of gravity. In this process, each data point in a cluster exerts a force on the cluster centroid and pulls it toward the center of mass of the cluster. After this improvement, the particles are evaluated by a selected internal clustering validation index (CVI). We examined several CVIs and found that Xu, Du and WB are the most accurate. The proposed method is compared with several clustering methods, including particle swarm clustering methods and well-known clustering methods, using the Jaccard index. The results show that our method is more accurate.

Introduction
The purpose of clustering is to group similar samples into the same cluster and dissimilar samples into different clusters. Various methods have been proposed for data clustering, and they fall into several branches: partitioning, hierarchical, density-based and grid-based approaches are the main families. Partitioning methods are very popular, and the most popular of them is the K-means method (Jain, 2010). K-means has some disadvantages, the most fundamental of which are that it cannot use an arbitrary objective function, that it may converge to a local optimum, and that the number of clusters must be specified in advance.
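As a point of reference, K-means minimizes the within-cluster sum of squared distances between each point and its nearest centroid. The following is a minimal sketch of that objective, not code from the article; the function name `kmeans_objective` and the toy data are assumptions made for illustration.

```python
import numpy as np

def kmeans_objective(data, centroids):
    """Within-cluster sum of squared distances (the K-means objective).

    Each point is assigned to its nearest centroid; only the distance
    between a point and its own centroid contributes to the score.
    """
    # Pairwise squared distances, shape (n_points, n_centroids).
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return d2[np.arange(len(data)), labels].sum(), labels

# Toy data (assumed for illustration): two tight groups of two points.
data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
sse, labels = kmeans_objective(data, centroids)
```

In this toy case every point lies at squared distance 1 from its centroid, so the objective is 4.0. Shifting the second group and its centroid farther away by the same offset leaves the value unchanged.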
The objective function of K-means considers only the distances within clusters and ignores the distances between clusters. In contrast, many cluster validity indices (CVIs) have been introduced that take both intra-cluster and inter-cluster distances into account. We can therefore use such a CVI as the objective function of a clustering method to address the first problem mentioned above. For the second problem, we can use a general-purpose optimizer that rarely gets stuck in local optima. If the optimizer can also choose the best number of clusters based on the objective function, the last problem is solved as well. Metaheuristic methods such as particle swarm optimization (PSO) (van der Merwe & Engelbrecht, 2003) and its variants (Cura, 2012; Valente de Oliveira, Szabo, & de Castro, 2017), the genetic algorithm (GA) (Maulik & Bandyopadhyay, 2000) and its variants, artificial bee colony optimization (ABC) (Ozturk, Hancer, & Karaboga, 2015; Yan, Zhu, Zou, & Wang, 2012) and the gravitational search algorithm (GSA) (Dowlatshahi & Nezamabadi-pour, 2014) have been proposed for these problems.

All of these methods suffer from another problem, which is the subject of this article. The problem occurs when the cluster centroids encoded by an individual in the population do not act as cluster centers. For example, Figure 1 shows 3 clusters and 2 sets of centroids (squares and circles) extracted from 2 different particles in PSO. The two particles produce exactly the same grouping of the data, but their fitness values are different. This means that particles that perform the same grouping may still differ in position and fitness, which poses a problem for the optimizer. The problem affects population diversity as well as the exploration and exploitation of the optimizer.

1-2 PSO
The basic form of the PSO algorithm was introduced in (Kennedy & Eberhart, 1995) and then modified in (Shi & Eberhart, 1998).
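A minimal sketch of one way to write this basic update, including the inertia weight w that Shi & Eberhart (1998) added, is given here. It is not the article's implementation: the function names `pso_step` and `minimize`, the coefficient values, the swarm size, and the sphere test function are all assumptions chosen for illustration.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """One velocity and position update for the whole swarm.

    x, v, pbest: arrays of shape (S, N); gbest: array of shape (N,).
    w is the inertia weight; c1 and c2 weight the pull toward the
    personal and global bests (illustrative values, not from the article).
    """
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v_new, v_new

def minimize(f, S=20, N=2, iters=100, seed=0):
    """Run the basic PSO loop on objective f and return the best result."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, size=(S, N))
    v = np.zeros((S, N))
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    start_val = pbest_val.min()
    for _ in range(iters):
        gbest = pbest[pbest_val.argmin()]
        x, v = pso_step(x, v, pbest, gbest, rng)
        val = np.array([f(p) for p in x])
        # Keep a particle's personal best only when it improves.
        improved = val < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = val[improved]
    return pbest[pbest_val.argmin()], pbest_val.min(), start_val

# Sphere function as a hypothetical test objective.
best_x, best_val, start_val = minimize(lambda p: float((p ** 2).sum()))
```

Because personal bests are replaced only on improvement, the best value found can never be worse than the best value in the initial swarm.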
In the algorithm, a swarm of S particles flies stochastically through an N-dimensional search space, where the position of each particle represents a potential solution to an optimization problem. Each particle p, with current position xp and current velocity vp, remembers its personal best solution so far, pb. The swarm remembers the best solution achieved globally so far, bS. Particles are attracted to these best solutions, and after a while the swarm usually converges to an optimal solution. Owing to its stochastic nature, PSO can avoid some local optima. However, for the basic form of the algorithm, premature convergence to a local optimum is a common problem. Therefore, several modifications or extensions of the basic form have been introduced (Poli, Kennedy, & Blackwell, 2007), such as perturbed PSO (Xinchao, 2010), orthogonal learning PSO (Zhan, Zhang, Li, & Shi, 2011), and different local neighborhood topologies, for example the fully informed PSO (Mendes, Kennedy, & Neves, 2004).

In clustering, as in other PSO applications, the position of each particle should represent a potential solution to the problem. Most often, this is achieved by encoding the position of particle p as xp = {mp,1, …, mp,j, …, mp,K}, where mp,j represents the jth (potential) cluster centroid in the N-dimensional data space and K is the number of clusters. Each element of the particle's K-dimensional position, xp, is thus an N-dimensional position in the data space. Other particle encodings have also been proposed, such as the partition-based encoding (Jarboui, Cheikh, Siarry, & Rebai, 2007), where each particle is a vector of n integers, n is the number of data elements to be clustered, and the ith element represents the cluster label assigned to data element i, i ∈ {1, …, n}. The main limitation of that method was the need to define the number of clusters, K, manually a priori.
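With the centroid encoding, two different particles can induce exactly the same partition of the data and still receive different fitness values, which is the problem raised in the introduction. The sketch below illustrates this on toy data. The helper names `decode`, `assign`, `sse` and `to_mass_centers`, the data, and the use of the within-cluster sum of squared distances as a stand-in fitness are assumptions for illustration (the article evaluates particles with CVIs). The last step, which replaces each centroid with the mean of its assigned points, is only a simplified stand-in for the gravity-based move the article proposes.

```python
import numpy as np

def decode(position, K, N):
    """Reshape a flat particle position into K centroids of dimension N."""
    return np.asarray(position, dtype=float).reshape(K, N)

def assign(data, centroids):
    """Label each point with the index of its nearest centroid."""
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def sse(data, centroids, labels):
    """Stand-in fitness: within-cluster sum of squared distances."""
    return float(((data - centroids[labels]) ** 2).sum())

def to_mass_centers(data, labels, centroids):
    """Move each non-empty cluster's centroid to the mean of its points."""
    moved = centroids.copy()
    for j in range(len(centroids)):
        members = data[labels == j]
        if len(members) > 0:
            moved[j] = members.mean(axis=0)
    return moved

data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
K, N = 2, 2
particle_a = [0.0, 1.0, 10.0, 1.0]   # centroids at the cluster means
particle_b = [3.0, 1.0, 8.0, 1.0]    # different centroids, same partition

cent_a, cent_b = decode(particle_a, K, N), decode(particle_b, K, N)
labels_a, labels_b = assign(data, cent_a), assign(data, cent_b)
fit_a = sse(data, cent_a, labels_a)
fit_b = sse(data, cent_b, labels_b)

# After moving centroids to the mass centers, both particles score the same.
fit_b_moved = sse(data, to_mass_centers(data, labels_b, cent_b), labels_b)
```

Both particles produce the labels [0, 0, 1, 1], yet their stand-in fitness values are 4.0 and 30.0. Once the centroids of the second particle are moved to the mass centers, it also evaluates to 4.0.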
Another clustering technique, proposed in (Omran, Salman, & Engelbrecht, 2006), overcame this limitation by using binary PSO to select which of a particle's potential centroids should be included in the final solution, but in this technique the K-means algorithm was used to refine the centroid positions. A particle encoding for PSO clustering was proposed in (Das, Abraham, & Konar, 2008). Given a user-defined maximum number of clusters, Kmax, the position of particle p is encoded as a vector of Kmax + Kmax × N dimensions, xp = {Tp,1, …, Tp,Kmax, mp,1, …, mp,j, …, mp,Kmax}, where Tp,j, j ∈ {1, …, Kmax}, is an activation threshold in the range [0, 1] and mp,j represents the jth (potential) cluster centroid. If Tp,j > 0.5, the corresponding jth centroid is included in the solution; otherwise, the cluster defined by the jth centroid is inactive. The minimum number of clusters is two. If there are fewer than two active clusters in a solution, one or two randomly selected activation thresholds, Tp,j …

1-3 Gravitational clustering
A method presented in (Bahrololoum, Nezamabadi-pour, & Saryazdi, 2015) uses Newton's law of universal gravitation for clustering. We.