Introduction to linguistic annotation and text analytics

Last edited by ImportBot on February 25, 2022


Linguistic annotation and text analytics are active areas of research and development, with academic conferences and industry events such as the Linguistic Annotation Workshops and the annual Text Analytics Summits. This book provides a basic introduction to both fields and aims to show that good linguistic annotations are the essential foundation for good text analytics. After a brief review of the basics of XML, with practical exercises illustrating in-line and stand-off annotations, a chapter explains the different levels of linguistic annotation. The reader is encouraged to create example annotations using the WordFreak linguistic annotation tool. The next chapter shows how annotations can be created automatically using statistical NLP tools, and compares two toolsets: OpenNLP and the Stanford NLP tools. The second half of the book describes different annotation formats and gives practical examples of how to interchange annotations between formats using XSLT transformations. The two main text analytics architectures, GATE and UIMA, are then described and compared, with practical exercises showing how to configure and customize them. The final chapter is an introduction to text analytics, describing the main applications and functions, including named entity recognition, coreference resolution, and information extraction, with practical examples using both open source and commercial tools. Copies of the example files, scripts, and stylesheets used in the book are available from the companion website at http://sites.morganclaypool.com/wilcock.

Publish Date
2009
Language
English



Edition Availability
Introduction to Linguistic Annotation and Text Analytics
2009, Springer International Publishing AG
in English
Introduction to linguistic annotation and text analytics
2009, Morgan & Claypool Publishers
electronic resource / in English


Book Details


Table of Contents

Working with XML
    Introduction
    XML basics
    XML parsing and validation
    XML transformations
    In-line annotations
    Stand-off annotations
    Annotation standards
    Further reading
Linguistic annotation
    Levels of linguistic annotation
    WordFreak annotation tool
    Sentence boundaries
    Tokenization
    Part-of-speech tagging
    Syntactic parsing
    Semantics and discourse
    WordFreak with OpenNLP
    Further reading
Using statistical NLP tools
    Statistical models
    OpenNLP and Stanford NLP tools
    Sentences and tokenization
    Statistical tagging
    Chunking and parsing
    Named entity recognition
    Coreference resolution
    Further reading
Annotation interchange
    XSLT transformations
    WordFreak-OpenNLP transformation
    GATE XML format
    GATE-WordFreak transformation
    XML metadata interchange: XMI
    WordFreak-XMI transformation
    Towards interoperability
    Further reading
Annotation architectures
    GATE
    GATE information extraction tools
    Annotations with JAPE rules
    Customizing GATE gazetteers
    UIMA
    UIMA wrappers for OpenNLP tools
    Annotations with regular expressions
    Customizing UIMA dictionaries
    Further reading
Text analytics
    Text analytics tools
    Named entity recognition
    Training statistical models
    Coreference resolution
    Information extraction
    Text mining and searching
    New directions
    Further reading
Bibliography

Edition Notes

Part of: Synthesis digital library of engineering and computer science.

Title from PDF t.p. (viewed on June 4, 2009).

Series from website.

Includes bibliographical references (p. 147-149).

Abstract freely available; full-text restricted to subscribers or individual document purchasers.

Also available in print.

Mode of access: World Wide Web.

System requirements: Adobe Acrobat reader.

Published in
San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA)
Series
Synthesis lectures on human language technologies, #3
Other Titles
Synthesis digital library of engineering and computer science.

Classifications

Dewey Decimal Class
410.285
Library of Congress
P98.3 .W555 2009

The Physical Object

Format
Electronic resource

ID Numbers

Open Library
OL25546289M
Internet Archive
introductiontoli00wilc_227
ISBN 13
9781598297393, 9781598297386


History

February 25, 2022: Edited by ImportBot (import existing book)
July 1, 2019: Edited by MARC Bot (import existing book)
July 29, 2014: Created by ImportBot (import new book)