Corpus Encoding Standard - Document CES
1. Title page. Version 1.5. Last modified 20 March 2000.
Abstract
This document is the first version of the Corpus Encoding Standard (CES),
which is part of the EAGLES Guidelines developed by the Expert
Advisory Group on Language Engineering Standards (EAGLES). The CES
is designed to be optimally suited for use in language engineering research
and applications, and to serve as a widely accepted set of encoding
standards for corpus-based work in natural language processing.
The CES is an application of SGML (ISO 8879:1986, Information Processing--Text
and Office Systems--Standard Generalized Markup Language) compliant with
the specifications of the TEI Guidelines for Electronic Text Encoding and
Interchange of the Text Encoding Initiative.
The CES specifies a minimal encoding level that corpora must achieve
to be considered standardized in terms of descriptive representation (marking
of structural and typographic information) as well as general architecture
(so as to be maximally suited for use in a text database). It also provides
encoding specifications for linguistic annotation, together with a data
architecture for linguistic corpora.
The CES is being developed in a bottom-up fashion, starting with
minimal specifications and expanding them based on feedback from
their use and on input from the research community in general. We invite
and encourage comments on, and discussion of, any aspect of the CES.
Acknowledgements
This document results from work supported by the U.S. National Science
Foundation under Grant No. IRI-9413451. The work was also supported by
the European Commission under the projects EAGLES, MULTEXT, and
MULTEXT-EAST.
Contacts
- Nancy Ide
  Department of Computer Science
  Vassar College
  Poughkeepsie, New York 12601 USA
  tel: (+1) 914 437 5988
  fax: (+1) 914 437 7498
  e-mail: ide@cs.vassar.edu
- Greg Priest-Dorman
  Department of Computer Science
  Vassar College
  Poughkeepsie, New York 12601 USA
  tel: (+1) 914 437 5990
  fax: (+1) 914 437 7498
  e-mail: priestdo@cs.vassar.edu
Please report suggestions or problems to priestdo@cs.vassar.edu.
This document is also available as a tar file (approx. 200 KB,
tar.gz format).