Statistical Perspectives on Modern Network Embedding Methods

Statistical Perspectives on Modern Network Em ...
Andrew Davison, Andrew Davison
Locate

My Reading Lists:

Create a new list

Check-In

×Close
Add an optional check-in date. Check-in dates are used to track yearly reading goals.
Today


Buy this book

Last edited by MARC Bot
December 9, 2022 | History

Statistical Perspectives on Modern Network Embedding Methods

Network data are ubiquitous in modern machine learning, with tasks of interest including node classification, node clustering and link prediction being performed on diverse data sets, including protein-protein interaction networks, social networks and citation networks. A frequent approach to approaching these tasks begins by learning an Euclidean embedding of the network, to which machine learning algorithms developed for vector-valued data are applied. For large networks, embeddings are learned using stochastic gradient methods where the sub-sampling scheme can be freely chosen. This distinguishes it from the setting of traditional i.i.d data where there is essentially only one way of subsampling the data - selecting the data points uniformly and without replacement. Despite the strong empirical performance when using embeddings produced in such a manner, they are not well understood theoretically, particularly with regards to the role of the sampling scheme. Here, we develop a unifying framework which encapsulates representation learning methods for networks which are trained via performing gradient updates obtained by subsampling the network, including random-walk based approaches such as node2vec.

In particular, we prove, under the assumption that the network has an exchangeable law, that the distribution of the learned embedding vectors asymptotically decouples. We characterize the asymptotic distribution of the learned embedding vectors, and give the corresponding rates of convergence, which depend on factors such as the sampling scheme, the choice of loss function, and the choice of embedding dimension. This provides a theoretical foundation to understand what the embedding vectors represent and how well these methods perform on downstream tasks; in particular, we apply our results to argue that the embedding vectors produced by node2vec can be used to perform weakly consistent community detection.

Publish Date
Language
English

Buy this book

Book Details


Edition Notes

Department: Statistics.

Thesis advisor: Tian Zheng.

Thesis (Ph.D.)--Columbia University, 2022.

Published in
[New York, N.Y.?]

The Physical Object

Pagination
1 online resource.

Edition Identifiers

Open Library
OL43504186M
OCLC/WorldCat
1333964231

Work Identifiers

Work ID
OL31806535W

Source records

marc_columbia MARC record

Community Reviews (0)

No community reviews have been submitted for this work.

Lists

This work does not appear on any lists.

History

Download catalog record: RDF / JSON
December 9, 2022 Created by MARC Bot import new book