Using Second-Order Information in Training Deep Neural Networks

Yi Ren


In this dissertation, we are concerned with the advancement of optimization algorithms for training deep learning models, and in particular with practical second-order methods that take into account the structure of deep neural networks (DNNs). Although first-order methods such as stochastic gradient descent have long been the predominant optimization algorithms used in deep learning, second-order methods are of interest because of their ability to use curvature information to accelerate optimization. After some background material is presented in Chapter 1, Chapters 2 and 3 focus on the development of practical quasi-Newton methods for training DNNs. We analyze the Kronecker-factored structure of the Hessian matrices of multi-layer perceptrons and convolutional neural networks and consequently propose block-diagonal Kronecker-factored quasi-Newton methods named K-BFGS and K-BFGS(L). To handle the non-convexity of DNN training, we also develop new double-damping techniques for our proposed methods.
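
As a rough illustration of why the Kronecker-factored block-diagonal structure exploited by K-BFGS and K-BFGS(L) is computationally attractive, the following is a minimal NumPy sketch under assumed notation, not the dissertation's actual algorithm: for a fully connected layer with weight matrix W, the curvature block is approximated by a Kronecker product A ⊗ G, where A is a small matrix built from the layer's inputs and G from output-side gradients. The identity (A ⊗ G)⁻¹ vec(∇W) = vec(G⁻¹ ∇W A⁻¹) (for symmetric A) then replaces one large inverse by two small ones.

```python
# Illustrative sketch (assumed notation, not the dissertation's exact algorithm):
# why a Kronecker-factored curvature block makes preconditioning cheap.
# Setup: a fully connected layer with weight matrix W (d_out x d_in), whose
# curvature block is approximated as A kron G, with A (d_in x d_in) built from
# layer inputs and G (d_out x d_out) from output-side gradients.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 6, 4

def random_spd(d):
    """Random symmetric positive-definite matrix (stands in for A or G)."""
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

A = random_spd(d_in)                      # input-side Kronecker factor
G = random_spd(d_out)                     # gradient-side Kronecker factor
dW = rng.standard_normal((d_out, d_in))   # gradient of the loss w.r.t. W

# Naive preconditioning: invert the full (d_in*d_out) x (d_in*d_out) block.
full_block = np.kron(A, G)
step_naive = np.linalg.solve(full_block, dW.flatten(order="F"))

# Kronecker-factored preconditioning: only two small solves are needed, using
# (A kron G)^{-1} vec(dW) = vec(G^{-1} dW A^{-1})  (column-major vec, A symmetric).
step_factored = np.linalg.solve(G, dW) @ np.linalg.inv(A)

assert np.allclose(step_naive, step_factored.flatten(order="F"))
print("Kronecker-factored preconditioned step matches the naive one.")
```

This is what keeps memory and per-iteration costs close to those of first-order methods: only the small factors A and G (or approximations to their inverses) need to be stored per layer, never the full curvature block.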

Our K-BFGS and K-BFGS(L) methods have memory requirements comparable to those of first-order methods and incur only mild overhead in per-iteration time complexity. In Chapter 4, we develop a new approximate natural gradient method named Tensor Normal Training (TNT), in which the Fisher matrix is viewed as the covariance matrix of a tensor normal distribution (a generalized form of the normal distribution). The tractable Kronecker-factored approximation to the Fisher information matrix that results from this view enables TNT to enjoy memory requirements and per-iteration computational costs that are only slightly higher than those of first-order methods. Notably, unlike KFAC and K-BFGS/K-BFGS(L), TNT requires only knowledge of the shapes of a model's trainable parameters and does not depend on the specific model architecture. In Chapter 5, we consider subsampled versions of the Gauss-Newton and natural gradient methods applied to DNNs. Because of the low-rank nature of the subsampled matrices, we use the Sherman-Morrison-Woodbury formula, together with backpropagation, to compute their inverses efficiently.
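
To make the Sherman-Morrison-Woodbury (SMW) step mentioned above concrete, here is a minimal NumPy sketch (assumed notation; not the dissertation's implementation). For a damped, subsampled Gauss-Newton matrix λI + JᵀJ, where J stacks the Jacobian rows of a small subsample of size m and the number of parameters n is much larger than m, SMW gives (λIₙ + JᵀJ)⁻¹ = (1/λ)[Iₙ − Jᵀ(λIₘ + JJᵀ)⁻¹J], so only an m × m system ever has to be solved.

```python
# Minimal sketch (assumed notation, not the dissertation's implementation):
# inverting a damped, subsampled Gauss-Newton matrix lambda*I + J^T J via the
# Sherman-Morrison-Woodbury (SMW) formula, so that only an m x m system is
# solved when the subsample size m is much smaller than the parameter count n.
import numpy as np

rng = np.random.default_rng(1)
m, n, lam = 8, 200, 1e-1          # subsample size, parameter count, damping
J = rng.standard_normal((m, n))   # stacked per-example Jacobian rows
g = rng.standard_normal(n)        # gradient to be preconditioned

# Direct approach: solve the full n x n damped system (O(n^3)).
step_direct = np.linalg.solve(lam * np.eye(n) + J.T @ J, g)

# SMW approach: (lam*I_n + J^T J)^{-1} g
#   = (1/lam) * (g - J^T (lam*I_m + J J^T)^{-1} J g),
# which only requires solving an m x m system plus matrix-vector products.
small_solve = np.linalg.solve(lam * np.eye(m) + J @ J.T, J @ g)
step_smw = (g - J.T @ small_solve) / lam

assert np.allclose(step_direct, step_smw)
print("SMW step matches the direct solve.")
```

In an actual DNN, the products Jv and Jᵀu would presumably be computed with forward- and back-propagation rather than by forming J explicitly, which is what keeps the cost of this approach low.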

We also show that, under rather mild conditions, the algorithm converges to a stationary point when Levenberg-Marquardt damping is used. The results of a substantial number of numerical experiments, in which we compare the performance of our methods to that of state-of-the-art methods used to train DNNs, are reported in Chapters 2, 3, 4, and 5; these results demonstrate the efficiency and effectiveness of our proposed second-order methods.
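
For context on the convergence result just mentioned, the following is a generic formulation of Levenberg-Marquardt damping (the exact scheme analyzed in the dissertation may differ): with curvature approximation B_k, gradient g_k, and damping parameter λ_k > 0, the damped step and the reduction ratio that typically drives the damping update are

```latex
% Generic Levenberg-Marquardt damping (illustrative; the dissertation's exact
% scheme may differ). B_k: curvature approximation, g_k: gradient.
p_k = -\left(B_k + \lambda_k I\right)^{-1} g_k,
\qquad
\rho_k = \frac{f(x_k) - f(x_k + p_k)}
              {-\,g_k^{\top} p_k - \tfrac{1}{2}\, p_k^{\top} B_k p_k}.
```

The damping parameter λ_k is then decreased when ρ_k is close to 1 (the local quadratic model is accurate) and increased when ρ_k is small, which keeps B_k + λ_k I positive definite and the step well scaled.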

Language: English


Book Details


Edition Notes

Department: Industrial Engineering and Operations Research.

Thesis advisor: Donald Goldfarb.

Thesis (Ph.D.)--Columbia University, 2022.

Published in: [New York, N.Y.?]

The Physical Object

Pagination: 1 online resource.

ID Numbers

Open Library: OL43504172M
OCLC/WorldCat: 1333964740

Source records

marc_columbia MARC record
