Data Clustering: Algorithms and Applications by Charu C. Aggarwal


By Charu C. Aggarwal

Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains.

The book focuses on three primary aspects of data clustering:

  • Methods, describing key techniques commonly used for clustering, such as feature selection, agglomerative clustering, partitional clustering, density-based clustering, probabilistic clustering, grid-based clustering, spectral clustering, and nonnegative matrix factorization
  • Domains, covering methods used for different domains of data, such as categorical data, text data, multimedia data, graph data, biological data, stream data, uncertain data, time series clustering, high-dimensional clustering, and big data
  • Variations and Insights, discussing important variations of the clustering process, such as semisupervised clustering, interactive clustering, multiview clustering, cluster ensembles, and cluster validation

In this book, top researchers from around the world explore the characteristics of clustering problems in a variety of application areas. They also explain how to glean detailed insight from the clustering process, including how to verify the quality of the underlying clusters, through supervision, human intervention, or the automated generation of alternative clusters.


Similar machine theory books

Genetic Programming: First European Workshop, EuroGP’98 Paris, France, April 14–15, 1998 Proceedings

This book constitutes the refereed proceedings of the First European Workshop on Genetic Programming, EuroGP'98, held in Paris, France, in April 1998, under the sponsorship of EvoNet, the European Network of Excellence in Evolutionary Computing. The volume presents 12 revised full papers and 10 short presentations carefully selected for inclusion in the book.

Operators for Similarity Search: Semantics, Techniques and Usage Scenarios

This book provides a comprehensive tutorial on similarity operators. The authors systematically survey the set of similarity operators, primarily focusing on their semantics, while also touching upon mechanisms for processing them efficiently. The book starts by providing introductory material on similarity search systems, highlighting the central role of similarity operators in such systems.

Graph-Based Social Media Analysis

Focused on the mathematical foundations of social media analysis, Graph-Based Social Media Analysis provides a comprehensive introduction to the use of graph analysis in the study of social and digital media. It addresses an important scientific and technological challenge, namely the confluence of graph analysis and network theory with linear algebra, digital media, machine learning, big data analysis, and signal processing.

The Digital Dionysus: Nietzsche and the Network-Centric Condition

Patricia Ticineto Clough: 'a remarkable collaboration between critical theorists from various disciplines to explore the import of Nietzschean thought for contemporary issues in media, technologies and digitization. The result is The Digital Dionysus, a must-read for scholars in media, aesthetics, politics, and philosophy'

Extra info for Data Clustering: Algorithms and Applications

Sample text

In the context of text data, each document can therefore be approximately expressed (because of the factorization process) as a nonnegative linear combination of at most k word-cluster vectors. The weight of each component represents its importance, which makes the decomposition highly interpretable. Note that this interpretability is highly dependent on nonnegativity. Conversely, consider the word-membership vector across the corpus. This can be expressed in terms of at most k document-cluster vectors.
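To make the factorization concrete, here is a minimal pure-Python sketch that factors a toy document-word matrix into nonnegative document weights `u` and `k` word-cluster vectors `v`. The excerpt does not say how the factorization is computed; the multiplicative update rule used here (commonly attributed to Lee and Seung) is an assumed choice, and the toy matrix `docs`, the helper names, and the parameter defaults are all hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def reconstruction_error(a, u, v):
    """Squared Frobenius error between a and the product u v."""
    approx = matmul(u, v)
    return sum((x - y) ** 2 for ra, rp in zip(a, approx) for x, y in zip(ra, rp))

def nmf(a, k, iters=500, seed=0, eps=1e-9):
    """Approximate a (n x d, nonnegative) by u v with u (n x k) and v (k x d).

    Assumed method: multiplicative updates for squared error. Every factor in
    each update is nonnegative, so u and v stay nonnegative throughout.
    """
    rng = random.Random(seed)
    n, d = len(a), len(a[0])
    u = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    v = [[rng.random() + 0.1 for _ in range(d)] for _ in range(k)]
    for _ in range(iters):
        # v <- v * (u^T a) / (u^T u v), element-wise
        ut = transpose(u)
        num = matmul(ut, a)
        den = matmul(matmul(ut, u), v)
        v = [[v[i][j] * num[i][j] / (den[i][j] + eps) for j in range(d)]
             for i in range(k)]
        # u <- u * (a v^T) / (u v v^T), element-wise
        vt = transpose(v)
        num = matmul(a, vt)
        den = matmul(u, matmul(v, vt))
        u = [[u[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return u, v

# Hypothetical toy corpus: rows are documents, columns are word counts.
# Documents 0-1 use only words 0-2; documents 2-3 use only words 3-4.
docs = [
    [2, 3, 1, 0, 0],
    [4, 6, 2, 0, 0],
    [0, 0, 0, 1, 2],
    [0, 0, 0, 3, 6],
]

u, v = nmf(docs, k=2)
for i, weights in enumerate(u):
    # Row i of u holds document i's nonnegative weights on the k word-cluster vectors.
    print("doc", i, [round(w, 3) for w in weights])
```

Each row of `v` plays the role of a word-cluster vector, and each row of `u` gives the nonnegative weights with which a document combines them. Reading the factorization the other way, each column of `u` acts as a document-cluster vector and each column of `v` gives a word's weights on those clusters.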

In this context, Google’s MapReduce framework [28] provides an effective method for analysis of large amounts of data, especially when the nature of the computations involves linearly computable statistical functions over the elements of the data streams. One desirable aspect of this framework is that it hides from the application programmer the precise details of where different parts of the data are stored. As stated in [28]: “The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication.”
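The phrase "linearly computable statistical functions" can be illustrated with a small sketch. The code below is plain single-process Python, not the MapReduce API: it only mimics the map and reduce phases. It assumes the statistics of interest are a count, a per-dimension sum, and a per-dimension sum of squares, which are additive across partitions. The function names and the sample `partitions` are hypothetical.

```python
from functools import reduce

def map_partition(points):
    """Map step: summarize one partition as (count, linear sums, squared sums)."""
    dims = len(points[0])
    count = len(points)
    linear = [sum(p[j] for p in points) for j in range(dims)]
    squares = [sum(p[j] ** 2 for p in points) for j in range(dims)]
    return count, linear, squares

def combine(a, b):
    """Reduce step: the summaries are additive, so merging is element-wise addition."""
    return (
        a[0] + b[0],
        [x + y for x, y in zip(a[1], b[1])],
        [x + y for x, y in zip(a[2], b[2])],
    )

def mean_and_variance(summary):
    """Recover the per-dimension mean and (population) variance from a merged summary."""
    count, linear, squares = summary
    mean = [s / count for s in linear]
    var = [sq / count - m * m for sq, m in zip(squares, mean)]
    return mean, var

# Hypothetical data, already split into partitions as if stored on different machines.
partitions = [
    [(1.0, 2.0), (3.0, 4.0)],
    [(5.0, 6.0)],
    [(7.0, 8.0), (9.0, 10.0), (11.0, 12.0)],
]

summary = reduce(combine, map(map_partition, partitions))
mean, var = mean_and_variance(summary)
print(summary[0], mean, var)
```

Because the summaries add, the result does not depend on how the points are split across partitions, which is what lets a framework choose the partitioning and scheduling without involving the programmer.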

This problem is inherent in high-dimensional distance functions and nearest neighbor search. As stated in [43]: “... One of the problems of the current notion of nearest neighbor search is that it tends to give equal treatment to all features (dimensions), which are however not of equal importance. Furthermore, the importance of a given dimension may not even be independent of the query point itself” (p. 506). These noise and concentration effects are therefore a problematic symptom of (locally) irrelevant, uncorrelated, or noisy attributes, which tend to impact the effectiveness and statistical significance of full-dimensional algorithms.
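The concentration effect mentioned above can be observed with a small experiment. This is a sketch under stated assumptions rather than anything taken from the book: points are drawn uniformly from the unit hypercube, distances are Euclidean, and `relative_contrast` is a hypothetical helper that measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest for a random query point.

    The query and all data points are drawn uniformly from [0, 1]^dim
    (an assumption made for illustration only).
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest

# Print the contrast for a few dimensionalities; in this setup it shrinks as
# the dimensionality grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

When the contrast is small, the nearest and farthest neighbors are almost equally far from the query, so a full-dimensional distance carries little information for ranking neighbors.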
