
Text

English

ID: <http://hdl.handle.net/2142/98245>

Statistical algorithms using multisets and statistical inference of heterogeneous networks

Abstract

Computational statistics, including methods such as Markov chain Monte Carlo (MCMC), the bootstrap, and approximate Bayesian computation, is an important part of modern statistics and is widely used in areas such as Bayesian statistics, computational biology, and computational physics. In this thesis, we study three problems: improving the efficiency of the EM algorithm, improving the efficiency of the MCMC method, and statistical analysis of heterogeneous networks.
The expectation-maximization (EM) algorithm is widely used to compute maximum likelihood estimates when the observations can be viewed as incomplete data. However, the EM algorithm can converge slowly, especially when a large portion of the data is missing. In Chapter 2, we propose the multiset EM algorithm, which can improve the convergence of the EM algorithm. The key idea is to augment the system with a multiset of the missing component and to construct an appropriate joint distribution of the augmented complete data. We demonstrate that the multiset EM algorithm can outperform the EM algorithm, especially when EM has difficulty converging and the E-step involves Monte Carlo approximation.
The multiset sampler proposed by Leman et al. (2009) has been shown to be an effective algorithm for sampling from complex multimodal distributions, but it requires that the parameters of the target distribution can be divided into two parts: the parameters of interest and the nuisance parameters. In Chapter 3, we propose a new self-multiset sampler (SMSS), which extends the multiset sampler to distributions without nuisance parameters. We also generalize our method to distributions with unbounded or infinite support. Numerical results show that the SMSS and its generalization have a substantial advantage in sampling multimodal distributions compared to the ordinary Markov chain Monte Carlo algorithm and some popular variants.
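
As a point of reference for the EM discussion above, the following is a minimal sketch of the ordinary EM algorithm, not the multiset EM algorithm proposed in the thesis. It assumes a two-component Gaussian mixture with unit variances, where the unobserved component labels play the role of the missing data. The function names, the model, and the stopping rule are illustrative assumptions and do not come from the thesis.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, mu0, mu1, weight=0.5, n_iter=200, tol=1e-8):
    """Plain EM for a two-component Gaussian mixture with unit variances.

    The missing data are the component labels. `weight` is the mixing
    proportion of component 1. Returns (mu0, mu1, weight, iterations).
    This is a hypothetical baseline, not the thesis's multiset EM.
    """
    prev_loglik = -math.inf
    iterations = 0
    for _ in range(n_iter):
        iterations += 1
        # E-step: posterior probability that each point came from component 1,
        # and the observed-data log-likelihood at the current parameters.
        resp = []
        loglik = 0.0
        for x in data:
            p0 = (1.0 - weight) * normal_pdf(x, mu0)
            p1 = weight * normal_pdf(x, mu1)
            resp.append(p1 / (p0 + p1))
            loglik += math.log(p0 + p1)
        # M-step: responsibility-weighted means and mixing proportion.
        total1 = sum(resp)
        total0 = len(data) - total1
        mu1 = sum(r * x for r, x in zip(resp, data)) / total1
        mu0 = sum((1.0 - r) * x for r, x in zip(resp, data)) / total0
        weight = total1 / len(data)
        # Stop when the log-likelihood has (nearly) stopped increasing.
        if loglik - prev_loglik < tol:
            break
        prev_loglik = loglik
    return mu0, mu1, weight, iterations
```

The E-step here is available in closed form. The abstract's point is that when the E-step must instead be approximated by Monte Carlo, or when much of the data is missing, iterations of this kind can converge slowly.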
Heterogeneous networks are useful for modeling complex systems that consist of different types of objects. However, few statistical models exist for heterogeneous networks. In Chapter 4, we propose a statistical model for community detection in heterogeneous networks. To allow for heterogeneity in the data and for the content-dependent property of the pairwise relationships, we formulate a heterogeneous version of the mixed membership stochastic blockmodel. We also apply a variational algorithm for posterior inference. Through simulation studies and applications to the DBLP data, we demonstrate the advantage of the proposed method in modeling overlapping communities and multiple memberships.
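
To make the notion of mixed membership concrete, here is a rough sketch of the generative process of the standard, homogeneous mixed membership stochastic blockmodel, with a single node type and a single edge type. It is not the heterogeneous model proposed in Chapter 4, and it does not perform the variational inference. The function names, parameter values, and the use of a directed network are assumptions made for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_mmsb(n_nodes, alpha, block_matrix, seed=0):
    """Simulate a directed network from a homogeneous MMSB (illustrative).

    alpha:        Dirichlet parameter, one entry per community.
    block_matrix: block_matrix[g][h] is the probability of an edge from a
                  node acting in community g to a node acting in community h.
    Returns (memberships, adjacency).
    """
    rng = random.Random(seed)
    k = len(alpha)
    # Each node gets its own mixed membership vector over the k communities.
    memberships = [sample_dirichlet(alpha, rng) for _ in range(n_nodes)]
    adjacency = [[0] * n_nodes for _ in range(n_nodes)]
    for p in range(n_nodes):
        for q in range(n_nodes):
            if p == q:
                continue  # no self-loops
            # For each ordered pair, both nodes pick the community they act in.
            z_send = rng.choices(range(k), weights=memberships[p])[0]
            z_recv = rng.choices(range(k), weights=memberships[q])[0]
            # The edge is Bernoulli with the corresponding block probability.
            if rng.random() < block_matrix[z_send][z_recv]:
                adjacency[p][q] = 1
    return memberships, adjacency
```

Because a node draws a fresh community indicator for every pair it takes part in, it can behave as a member of different communities in different relationships, which is what allows overlapping communities and multiple memberships.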
