- tag minimization: SGML provides many means for minimizing the amount of 
markup in a text, including start- and end-tag omission, short start and end tags, 
and minimization of attribute values. For example, the following definitions allow 
end-tag omission:
<!ELEMENT w     - O (orth, pos, lem) >
<!ELEMENT orth  - O (#PCDATA) >
<!ELEMENT pos   - O (#PCDATA) >
<!ELEMENT lem   - O (#PCDATA) >
 
 
  
The following is a full markup for the sentence fragment "The boat sinks...":
 
 
<s>
 <w><orth>The</orth><pos>DET</pos><lem>the</lem></w>
 <w><orth>boat</orth><pos>NNS</pos><lem>boat</lem></w>
 <w><orth>sinks</orth><pos>VBZ</pos><lem>sink</lem></w>
 ...
 </s>
 
With end tag omission this could be replaced by 
 
<s>
 <w><orth>The<pos>DET<lem>the
 <w><orth>boat<pos>NNS<lem>boat
 <w><orth>sinks<pos>VBZ<lem>sink
 ...
 </s>
 
 which in this case is a nearly 50% reduction in the number of characters.
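
The saving can be checked with a short script. The following Python sketch is an
illustration only, not an SGML parser: it strips the omissible end tags from the
three word lines above with a regular expression and compares character counts.
The names FULL, OMISSIBLE, and omit_end_tags are hypothetical, and the count covers
only the three word lines, not the surrounding <s> markup.

```python
import re

# The three fully tagged word lines from the example above.
FULL = [
    "<w><orth>The</orth><pos>DET</pos><lem>the</lem></w>",
    "<w><orth>boat</orth><pos>NNS</pos><lem>boat</lem></w>",
    "<w><orth>sinks</orth><pos>VBZ</pos><lem>sink</lem></w>",
]

# Elements whose end tag is declared omissible ("- O") in the DTD above.
OMISSIBLE = ("w", "orth", "pos", "lem")


def omit_end_tags(line, names=OMISSIBLE):
    """Remove the end tags of the named elements from one line of markup."""
    pattern = r"</(?:%s)>" % "|".join(re.escape(n) for n in names)
    return re.sub(pattern, "", line)


minimized = [omit_end_tags(line) for line in FULL]
full_len = sum(len(line) for line in FULL)
min_len = sum(len(line) for line in minimized)
reduction = 1 - min_len / full_len

for line in minimized:
    print(line)
print(f"{full_len} -> {min_len} characters ({reduction:.0%} fewer)")
```

On these three lines the count drops from 158 to 89 characters, a reduction of about
44%, which is in line with the figure quoted above.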
 
- SGML entities: SGML allows string substitution via entity replacement. 
Entity references can be used in place of any string, possibly including markup. So, 
for example, a complex feature structure specification which occurs frequently in 
the text can be replaced by an entity reference consisting of only a few characters. 
The TEI feature structure
<fs type='word structure' id=vbidprx0sgp3>
  <f name=category><sym value=verb></f>
  <f name=mood><sym value=indic></f>
  <f name=tense><sym value=pres></f>
  <f name=auxiliary><minus></f>
  <f name=agreement>
    <fs type='agreement structure' id=sgp3>
      <f name=number><sym value=sg></f>
      <f name=person><sym value=3></f>
    </fs>
  </f>
</fs>
 
 
could be replaced by the entity reference &VBZ;. Analogous substitutions 
for other word categories could yield the following encoding: 
 
 
<s>
 <w><orth>the&DET;<lem>the
 <w><orth>boat&NNS;<lem>boat
 <w><orth>sinks&VBZ;<lem>sink
 ...
 </s>
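
In a real system the SGML parser performs this substitution itself, using entity
declarations in the DTD. Purely to illustrate the effect, the following Python
sketch expands entity references of the form &NAME; from a lookup table. The table
contains only the VBZ replacement text quoted above (with whitespace removed); the
names VBZ_FS, ENTITIES, and expand are hypothetical, and references with no known
replacement text are left untouched rather than guessed.

```python
import re

# Replacement text for &VBZ;: the feature structure quoted above,
# with line breaks and indentation removed.
VBZ_FS = "".join([
    "<fs type='word structure' id=vbidprx0sgp3>",
    "<f name=category><sym value=verb></f>",
    "<f name=mood><sym value=indic></f>",
    "<f name=tense><sym value=pres></f>",
    "<f name=auxiliary><minus></f>",
    "<f name=agreement>",
    "<fs type='agreement structure' id=sgp3>",
    "<f name=number><sym value=sg></f>",
    "<f name=person><sym value=3></f>",
    "</fs>",
    "</f>",
    "</fs>",
])

# Hypothetical entity table; only VBZ is spelled out in the text.
ENTITIES = {"VBZ": VBZ_FS}

# Matches a general entity reference such as &VBZ;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*);")


def expand(text, entities=ENTITIES):
    """Replace each known entity reference with its replacement text."""
    def lookup(match):
        # Unknown entities are returned unchanged.
        return entities.get(match.group(1), match.group(0))
    return ENTITY_REF.sub(lookup, text)


expanded = expand("<w><orth>sinks&VBZ;<lem>sink")
print(len("<w><orth>sinks&VBZ;<lem>sink"), "->", len(expanded), "characters")
```

The length comparison printed at the end shows how much markup a five-character
reference stands in for.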
 
 
- DATATAG feature: When certain tag sequences occur regularly, it is 
possible to define a character that is interpreted as the end tag of an 
element. For example, the following declarations specify that the character "|" can 
be interpreted as the end tag for <orth> and <pos>:
 
<!ELEMENT w     - O  ([orth,"|"], [pos,"|"], lem)  >
 <!ELEMENT orth  O O (#PCDATA)                      >
 <!ELEMENT pos   O O (#PCDATA)                      >
 <!ELEMENT lem   O O (#PCDATA)                      >
 
 <orth>, <pos>, and <lem> are also defined so 
as to allow omission of both the start and end tags. This yields the following 
possible encoding:
  
<s>
 <w>the|DET|the
 <w>boat|NNS|boat
 <w>sinks|VBZ|sink
 ...
 </s>
 
If we also specify that the carriage return implies the end-tag of element 
<w>, the encoding could be reduced even further to 
 
 
<s>
 the|DET|the
 boat|NNS|boat
 sinks|VBZ|sink
 ...
 </s>
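
To make explicit what a DATATAG-aware parser is expected to infer from such input,
the following Python sketch rebuilds the full markup from one pipe-delimited line.
It only mimics the outcome for this particular three-field layout and is not an
implementation of the DATATAG feature; the function names expand_line and
expand_sentence are hypothetical.

```python
def expand_line(line, sep="|"):
    """Rebuild the fully tagged <w> element from one 'orth|pos|lem' line."""
    orth, pos, lem = line.strip().split(sep)
    return f"<w><orth>{orth}</orth><pos>{pos}</pos><lem>{lem}</lem></w>"


def expand_sentence(body, sep="|"):
    """Rebuild a fully tagged <s> element from its minimized content."""
    words = [expand_line(line, sep) for line in body.splitlines() if line.strip()]
    return "<s>\n" + "\n".join(words) + "\n</s>"


print(expand_sentence(" the|DET|the\n sinks|VBZ|sink\n"))
```

Each "|" closes the element before it, and the end of the line closes <lem> and
<w>, so every line yields exactly one <w> element with three children.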
 
 
- non-SGML notations: It is also possible to use private, less verbose 
non-SGML schemes within element content or as attribute values. For example, the 
encoder could decide to use a private notation within the <s> element in the 
example above. If that notation uses the pipe sign as a separator between word, 
part of speech, and lemma, the encoding would be exactly as given above. However, 
the DTD would simply specify 
<!ELEMENT s     - - (#PCDATA)                      >
 
which means that the SGML parser treats the content of the <s> element 
as plain character data and does not analyze its internal structure. The content 
would have to be processed by other software. This is in contrast to the use of 
DATATAG above, where the SGML parser (assuming the optional feature DATATAG is 
implemented) will understand and process each line of the <s> element as 
consisting of three elements.
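
As an illustration of the "other software" such a private notation requires, the
following Python sketch parses the character data of an <s> element into
(orth, pos, lem) records. It assumes one word per line and exactly three
pipe-separated fields; the names Token and parse_private_notation are hypothetical.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    orth: str  # word form as it appears in the text
    pos: str   # part-of-speech tag
    lem: str   # lemma


def parse_private_notation(content: str, sep: str = "|") -> List[Token]:
    """Parse 'orth|pos|lem' lines taken from the content of an <s> element."""
    tokens = []
    for line_no, line in enumerate(content.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        if len(fields) != 3:
            # The SGML parser cannot catch this error, so the application must.
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        tokens.append(Token(*fields))
    return tokens


tokens = parse_private_notation(" the|DET|the\n sinks|VBZ|sink\n")
for token in tokens:
    print(token.orth, token.pos, token.lem)
```

The explicit field-count check reflects the trade-off described above: with a
private notation, validation that the SGML parser would otherwise perform against
the DTD becomes the application's responsibility.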