This article was removed from the HTML Wiki because we do not wish to duplicate content found here on Wikipedia. It is made available here in case it proves useful for the corresponding article.
A markup language combines text and extra information about the text. The extra information, for example about the text's structure or presentation, is expressed using markup, which is intermingled with the primary text. The best-known markup language in modern use is HTML (Hypertext Markup Language), one of the foundations of the World Wide Web. Historically, markup was (and is) used in the publishing industry in the communication of printed work between authors, editors, and printers.
Classes of markup languages
Markup languages are often divided into three classes: presentational, procedural, and descriptive.
Presentational markup describes the visual appearance of the whole text or of a particular fragment. For example, in a word-processor file, the title of a document might have associated markup asserting that the text is centered, in boldface, and in a larger typeface. Virtually all word-processing and desktop-publishing products support presentational markup; in normal operation it is hidden from the user to produce the "WYSIWYG" effect.
Procedural markup is typically also concerned with the presentation of text, but is usually visible to the user editing the text file, and is expected to be interpreted by software in the order in which it appears. To format a title, a succession of formatting directives would be inserted into the file immediately before the title's text, instructing software to switch into centered display mode, then enlarge and embolden the typeface. The title text would be followed by directives to reverse these effects. In most cases, the procedural markup capabilities comprise a Turing-complete programming language. Examples of procedural-markup systems include nroff, troff, TeX, and PostScript. Procedural markup has been widely used in professional publishing applications.
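The title-formatting sequence described above can be sketched in Python. This is only an illustration: the directive names (`.center`, `.large`, `.bold`) and the `on`/`off` arguments are hypothetical and are not the syntax of nroff, troff, TeX, or PostScript. The point is that directives change the formatter's state in the order they appear, and the text that follows is set in whatever state is current.

```python
def render(lines):
    """Interpret a hypothetical line-oriented procedural markup, in order.

    Returns a list of (text, active_modes) pairs, one per text line.
    """
    # Formatter state; the directive names are invented for this sketch.
    state = {"center": False, "bold": False, "large": False}
    output = []
    for line in lines:
        if line.startswith("."):
            # Directive line, e.g. ".bold on" or ".bold off".
            name, _, arg = line[1:].partition(" ")
            if name not in state:
                raise ValueError(f"unknown directive: {name}")
            state[name] = arg.strip() == "on"
        else:
            # Text line: record it with the modes active at this point.
            active = sorted(mode for mode, on in state.items() if on)
            output.append((line, active))
    return output


document = [
    ".center on",
    ".large on",
    ".bold on",
    "Anatidae",
    ".bold off",
    ".large off",
    ".center off",
    "The family Anatidae includes ducks, geese, and swans.",
]

result = render(document)
```

Here the title is recorded with all three modes active, while the paragraph after the reversing directives is recorded with none.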
Descriptive markup applies labels to fragments of text without necessarily mandating any particular display or other processing semantics. For example, the Atom syndication language provides markup to label the "updated" timestamp, which is an assertion from the publisher as to when some item of information was last changed. While the Atom specification discusses the meaning of the "updated" timestamp, and the markup used to identify it, in great detail, it makes no assertions about whether or how it might be presented to a user. Software might put this markup to a variety of uses, including many not foreseen by the designers of the Atom language. SGML and XML are systems explicitly designed to support the design of descriptive markup languages; examples of such languages include Atom, MathML, and XBRL.
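As a rough illustration of software putting the "updated" label to its own use, the following Python sketch reads the timestamp from a trimmed, hand-written Atom fragment. The feed content and the function name `entry_updates` are made up for this example, and the fragment omits elements a complete Atom feed would need. What the program does with the value afterwards (sorting, caching, display) is left open, just as the markup leaves it open.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace used by Atom elements.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# A trimmed, hypothetical feed fragment for illustration only.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>First item</title>
    <updated>2005-07-31T12:29:29+00:00</updated>
  </entry>
</feed>"""


def entry_updates(feed_xml):
    """Return (title, updated) pairs, with updated parsed as a datetime."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", namespaces=ATOM_NS)
        results.append((title, datetime.fromisoformat(updated)))
    return results


updates = entry_updates(FEED)
```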
The dividing line between classes of markup is often blurred. For example, HTML contains markup elements which are purely presentational (for example <b> for bold) and others which are purely descriptive (the "href=" attribute).
The main virtue of descriptive markup is considered to be its flexibility: if the fragments of text are labeled as to "what they are" as opposed to "how they should be displayed", software may be written to process these fragments in useful ways not anticipated by the designers of the languages. For example, HTML's hyperlinks, originally designed for activation by a human following a link, are also widely used by Web search engines both in discovering new material to index and in estimating the popularity of Web resources.
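The link-discovery use can be sketched with Python's standard `html.parser` module. This is a minimal illustration, not a description of how any actual search engine works; the class name `LinkCollector`, the function `extract_links`, and the example URLs are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> element encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names are lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the link targets found in html_text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


page = (
    '<p>See <a href="https://example.org/ducks">ducks</a> and '
    '<a href="https://example.org/geese">geese</a>.</p>'
)
links = extract_links(page)
```

The program never displays the page; it only reads the labels, which is exactly the kind of reuse that descriptive markup permits.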
Presentational-markup systems usually include "named styles" or equivalent, which to some degree replicate the effect of descriptive markup. Similarly, procedural-markup languages usually include "macros", to a similar end.
Features
A common feature of many markup languages is that they intermix the text of a document with markup instructions in the same data stream or file. Here, for example, is a small section of text marked up in HTML:
<h1> Anatidae </h1> <p> The family <i>Anatidae</i> includes ducks, geese, and swans, but <em>not</em> the closely-related screamers. </p>
The codes enclosed in angle-brackets <like this> are markup instructions (known as tags), while the text between these instructions is the actual text of the document. The codes "h1", "p", and "em" are examples of structural markup, in that they describe the intended purpose or meaning of the text they include. Specifically, "h1" means "this is a first-level heading", "p" means "this is a paragraph", and "em" means "this is an emphasized word". A device reading such structural markup may apply its own rules or styles for presenting it, using larger type, boldface, indentation, or whatever style it prefers. The "i" instruction is an example of presentational markup. It specifies the exact appearance of the text (in this case, the use of an italic typeface) without specifying the reason for that appearance.
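The idea that each device may apply its own rules to structural markup can be sketched in Python. The two style tables below are hypothetical "devices" invented for this example; each maps a tag to text emitted before and after the element's content, and tags missing from a table are passed through with no decoration.

```python
from html.parser import HTMLParser

SOURCE = (
    "<h1> Anatidae </h1> <p> The family <i>Anatidae</i> includes ducks, "
    "geese, and swans, but <em>not</em> the closely-related screamers. </p>"
)

# Hypothetical style tables: tag -> (text before, text after).
STYLE_A = {"h1": ("# ", "\n"), "em": ("*", "*")}
STYLE_B = {"h1": ("", "\n=====\n"), "em": ("_", "_")}


class StyledRenderer(HTMLParser):
    """Render tagged text using a caller-supplied style table."""

    def __init__(self, styles):
        super().__init__()
        self.styles = styles
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(self.styles.get(tag, ("", ""))[0])

    def handle_endtag(self, tag):
        self.parts.append(self.styles.get(tag, ("", ""))[1])

    def handle_data(self, data):
        self.parts.append(data)


def render(source, styles):
    """Return source rendered as plain text under the given style table."""
    renderer = StyledRenderer(styles)
    renderer.feed(source)
    renderer.close()
    return "".join(renderer.parts)


rendered_a = render(SOURCE, STYLE_A)
rendered_b = render(SOURCE, STYLE_B)
```

The same source yields different output under each table because "em" says only that the word is emphasized, leaving the choice of how to show emphasis to the renderer.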
For the humanities, the Text Encoding Initiative (TEI) has published some guidelines about how to encode texts.
References
- TEI guidelines
- Markup systems and the future of scholarly text processing by James H. Coombs, Allen H. Renear, and Steven J. DeRose. Originally published in the November 1987 CACM, this article introduced many of the concepts now used in discussing markup languages.