XML Fundamentals
-
XML markup describes and provides structure to the content of an XML document or data packet.
-
Unlike HTML, XML is case-sensitive, including element names, attribute names and attribute values.
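Case sensitivity can be checked quickly with Python's standard `xml.etree.ElementTree` module. This is an assumed choice of parser; any conforming XML parser should behave the same. The helper `is_well_formed` is a hypothetical name used only for this sketch.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if ElementTree's expat-based parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Start-tag and end-tag names must match exactly, including case.
print(is_well_formed("<Note>hi</Note>"))  # True
print(is_well_formed("<Note>hi</note>"))  # False (mismatched tag)

# Attribute names are case-sensitive, so these are two different attributes.
item = ET.fromstring('<item id="a" ID="b"/>')
print(item.get("id"), item.get("ID"))  # a b
```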
-
XML uses most of the characters defined in Unicode, including characters outside the 16-bit Basic Multilingual Plane.
-
XML processors must accept 2 Unicode encodings: UTF-8 and UTF-16.
- Only 3 control characters are allowed in XML 1.0 text:
XML control characters
| Character | Hex code |
| Horizontal Tab (HT) | 09 |
| Line Feed (LF) | 0A |
| Carriage Return (CR) | 0D |
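The sketch below checks a single character against the `Char` production of the XML 1.0 specification. That production is what leaves HT, LF and CR as the only allowed control characters. The function name `is_xml10_char` is hypothetical, and the final lines assume Python's `xml.etree.ElementTree` as the parser.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(ch: str) -> bool:
    """Check one character against the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (
        cp in (0x09, 0x0A, 0x0D)       # HT, LF, CR: the only control characters allowed
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes, beyond 16 bits
    )


print(is_xml10_char("\t"))          # True
print(is_xml10_char("\x07"))        # False (BEL control character)
print(is_xml10_char("\U0001F600"))  # True

# A conforming parser rejects the disallowed control character outright.
try:
    ET.fromstring("<a>\x07</a>")
except ET.ParseError as err:
    print("rejected:", err)
```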
-
The 5 special markup characters are < > & " and '. Each has a predefined entity reference: &lt; &gt; &amp; &quot; and &apos;.
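Python's `xml.sax.saxutils.escape` always replaces &, < and > with their entity references. An extra mapping (here the hypothetical `QUOTES` dict) covers the two quote characters. This is one convenient way to produce the references, not the only one.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > itself; this mapping adds the two quote characters.
QUOTES = {'"': "&quot;", "'": "&apos;"}

raw = 'Tom & Jerry say "5 < 6" isn\'t news'
encoded = escape(raw, QUOTES)
print(encoded)
# Tom &amp; Jerry say &quot;5 &lt; 6&quot; isn&apos;t news

# unescape() reverses &lt;, &gt;, &amp; plus any mapping passed in.
decoded = unescape(encoded, {v: k for k, v in QUOTES.items()})
print(decoded == raw)  # True
```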
-
The first character of a legal XML name must be a letter, an underscore (_) or a colon (:). The remaining characters may be letters, digits, underscores, colons, hyphens (-) or periods (.).
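Below is a simplified name check that follows the rule above. It uses `str.isalpha()` and `str.isdigit()` as an approximation of the Unicode letter and digit classes in the full XML 1.0 `Name` production, so it is a sketch rather than an exact implementation. `is_simple_xml_name` is a hypothetical name.

```python
def is_simple_xml_name(name: str) -> bool:
    """Approximate XML name check (simplified, not the full Name production)."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    # First character: letter, underscore or colon.
    if not (first.isalpha() or first in "_:"):
        return False
    # Remaining characters: letters, digits, _ : - .
    return all(ch.isalpha() or ch.isdigit() or ch in "_:-." for ch in rest)


for candidate in ["price", "_id", "x:item", "line-2.total", "2nd", "-x", "my name"]:
    print(f"{candidate!r}: {is_simple_xml_name(candidate)}")
```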
-
The colon (:) should not be used in names except as the namespace prefix delimiter.
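The example below shows how a namespace-aware parser treats the colon. Python's `xml.etree.ElementTree` resolves the prefix and stores names as `{namespace-uri}local-name`. The prefix `inv` and the URI `urn:example:inventory` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"inv": "urn:example:inventory"}  # hypothetical prefix -> URI mapping

order = ET.fromstring(
    '<inv:order xmlns:inv="urn:example:inventory">'
    "<inv:item>bolt</inv:item>"
    "</inv:order>"
)
# The prefix before the colon is resolved to its namespace URI.
print(order.tag)                        # {urn:example:inventory}order
print(order.find("inv:item", NS).text)  # bolt

# A prefix that was never declared is an error for a namespace-aware parser.
try:
    ET.fromstring("<inv:order/>")
except ET.ParseError as err:
    print("rejected:", err)
```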
-
Elements are the basic building blocks of XML markup. Each element is delimited by tags that carry its element type name.
-
Everything between the start-tag and the end-tag of an element is contained within that element.
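The following example shows what "contained within" means once the document is parsed into a tree. It assumes Python's `xml.etree.ElementTree`, which stores text before the first child in `.text` and text after a child's end-tag in that child's `.tail`. The `<para>`/`<b>` document is made up for illustration.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring("<para>Hello <b>bold</b> world</para>")

print(repr(para.text))           # 'Hello '  (text before the first child)
print(para[0].tag)               # b
print(repr(para[0].tail))        # ' world'  (after </b>, still inside <para>)
print("".join(para.itertext()))  # Hello bold world
```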
- Examples
Examples of tag syntax
| Tag | Allowed? |
| < ElementName> | Not allowed (whitespace after <) |
| < /Name> | Not allowed (whitespace after <) |
| Name> | Not allowed (no opening delimiter) |
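These tag forms can be tested directly against a parser. This sketch assumes Python's `xml.etree.ElementTree`, and `parses` is a hypothetical helper. Note that whitespace is allowed before the closing `>` of a tag, but not directly after the opening `<`.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


cases = {
    "<ElementName></ElementName>": True,
    "<ElementName ></ElementName >": True,  # whitespace before '>' is allowed
    "< ElementName></ElementName>": False,  # whitespace after '<'
    "<Name>< /Name>": False,                # whitespace after '<' in the end-tag
    "<Name>Name>": False,                   # 'Name>' is just text; <Name> never closes
}
for text, expected in cases.items():
    print(f"{text!r:35} parses={parses(text)}")
```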
-
Empty-element tags (for example <br/>) may have attributes.
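The example below shows that an empty-element tag carries attributes just like a start-tag. When parsed, it produces the same tree as a start-tag/end-tag pair with no content. The `img` element and its attributes are made up, and the parser is assumed to be Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

short = ET.fromstring('<img src="logo.png" alt="Logo"/>')
long_form = ET.fromstring('<img src="logo.png" alt="Logo"></img>')

print(short.get("src"), short.get("alt"))  # logo.png Logo
print(short.attrib == long_form.attrib)    # True
print(len(short), short.text)              # 0 None
print(len(long_form), long_form.text)      # 0 None
```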
-
XML documents have three parts: prolog (optional), body (required) and epilog (optional).
-
The document root is an invisible node at the top of the XML document; it is not itself an element. Its subtree is the body, and the topmost element of that subtree is called the document element (or root element).
-
The prolog may contain an XML declaration, comments, processing instructions (PIs) and a DOCTYPE declaration.
-
The epilog may contain only PIs and comments.
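The three parts become visible when a DOM parser keeps the nodes that sit outside the document element. This sketch assumes Python's `xml.dom.minidom`, which records top-level comments, PIs and the DOCTYPE as children of the document node. The XML declaration does not become a node. The sample document, the PI targets `page-setup` and `done`, and the element `note` are all hypothetical.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<!-- prolog: a comment -->
<?page-setup size="A4"?>
<!DOCTYPE note>
<note>body text</note>
<!-- epilog: a comment -->
<?done?>"""

doc = minidom.parseString(DOC)
root = doc.documentElement

# Ignore any whitespace text nodes and split the top level around the root.
top = [n for n in doc.childNodes if n.nodeType != n.TEXT_NODE]
pos = top.index(root)
prolog = [n.nodeName for n in top[:pos]]
epilog = [n.nodeName for n in top[pos + 1:]]

print("prolog:", prolog)
print("body root:", root.tagName)
print("epilog:", epilog)
print("doctype name:", doc.doctype.name)
```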
-
XML data forms a simple hierarchical tree with exactly one root element.
-
All elements must be properly nested; overlapping tags are not allowed.
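Both the nesting rule and the single-root rule are enforced by the parser. This is a small sketch using Python's `xml.etree.ElementTree`, with `parses` as a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses("<b><i>properly nested</i></b>"))  # True
print(parses("<b><i>overlapping</b></i>"))      # False (mismatched tag)
print(parses("<a/><b/>"))                       # False (two top-level elements)
```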
-
String literals are used for the values of attributes, internal entities and external identifiers.
-
All string literals are enclosed in single quotes (') or double quotes ("), and the same character must open and close the literal.
-
Attributes are name-value pairs.
- Permissible values for attributes are text characters, entity references and character references. A literal < and a bare & (one that does not start a reference) are forbidden in attribute values; use the entity references &lt; and &amp; instead (> is permitted). Only one instance of an attribute name is allowed within a given tag.
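The attribute-value rules can be checked against a parser. This sketch assumes Python's `xml.etree.ElementTree`, which returns attribute values with references already replaced. The element `p` and the attribute names are made up, and `parses` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(ET.fromstring('<p note="a &lt; b &amp; c"/>').get("note"))  # a < b & c
print(ET.fromstring('<p note="x > y"/>').get("note"))             # x > y
print(ET.fromstring("<p note='it is \"quoted\"'/>").get("note"))  # it is "quoted"

print(parses('<p note="a < b"/>'))  # False: literal '<'
print(parses('<p note="a & b"/>'))  # False: bare '&'
print(parses('<p a="1" a="2"/>'))   # False: duplicate attribute name
print(parses("<p a=1/>"))           # False: value is not a quoted literal
```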
-
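All whitespace characters in element content are preserved and passed to the application. Whitespace inside element tags (between the name and attributes, or around =) is not significant. Whitespace in attribute values is normalized: each tab, carriage return or line feed becomes a space.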
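A quick illustration of the difference between content whitespace and attribute-value normalization, assuming Python's `xml.etree.ElementTree` (expat), which applies the attribute-value normalization rule. The `msg` element is hypothetical.

```python
import xml.etree.ElementTree as ET

# The attribute 'note' contains a real tab and a real newline character.
msg = ET.fromstring('<msg  lang = "en"  note="one\ttwo\nthree">  keep   these  </msg>')

print(repr(msg.text))         # '  keep   these  '  (content whitespace kept)
print(repr(msg.get("lang")))  # 'en'  (spaces around '=' are insignificant)
print(repr(msg.get("note")))  # 'one two three'  (tab and newline became spaces)
```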
-
3 character combinations can mark end-of-line: CR-LF, CR alone and LF alone. The XML processor converts each of them to a single LF character before passing the text to the application.
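End-of-line normalization can be observed in parsed text. This assumes Python's `xml.etree.ElementTree`, whose expat parser performs the normalization.

```python
import xml.etree.ElementTree as ET

samples = ("<a>x\r\ny</a>", "<a>x\ry</a>", "<a>x\ny</a>")  # CR-LF, CR, LF
for raw in samples:
    print(repr(ET.fromstring(raw).text))  # 'x\ny' in all three cases
```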
-
Except for the 5 predefined entities (lt, gt, amp, quot, apos), all entities must be declared before they are used.
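Here an internal entity is declared in the DOCTYPE's internal subset before it is used, and an undeclared one is rejected. This sketch assumes Python's `xml.etree.ElementTree`, which expands internal entities declared this way. The entity name `co` and its value are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE memo [
  <!ENTITY co "Example Corp">
]>
<memo>&co; &amp; partners</memo>"""

memo = ET.fromstring(DOC)
print(memo.text)  # Example Corp & partners

# Without a declaration, &co; is an undefined entity.
try:
    ET.fromstring("<memo>&co;</memo>")
except ET.ParseError as err:
    print("rejected:", err)
```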
-
Comments in XML must follow these rules -
- Cannot contain a double hyphen (--) within the comment text
- Cannot be nested
- Cannot be placed inside a start-tag or end-tag
- An extra hyphen at the end is illegal (a comment cannot end with --->)
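Each comment rule can be tried against a parser. This sketch assumes Python's `xml.etree.ElementTree`, with `parses` as a hypothetical helper. A nested comment fails for the same reason as the double-hyphen case, because the inner `<!--` itself contains `--`.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


cases = {
    "<a><!-- single - hyphens are fine --></a>": True,
    "<a><!-- double -- hyphen --></a>": False,
    "<a><!-- extra hyphen at the end ---></a>": False,
    "<a><!-- outer <!-- inner --> --></a>": False,  # '<!--' contains '--'
    "<a <!-- inside a tag --> ></a>": False,
}
for text, expected in cases.items():
    print(f"{text!r:45} parses={parses(text)}")
```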
-
CDATA Section in XML must follow these rules -
- May be empty (<![CDATA[]]> is legal)
- Cannot be nested
- Text in the CDATA section cannot contain "]]>"
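The example below shows a CDATA section carrying < and & literally, and an empty section. It also shows a common workaround for text that contains `]]>`: split that sequence across two adjacent sections. It assumes Python's `xml.etree.ElementTree`. `to_cdata` and `parses` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


elem = ET.fromstring("<code><![CDATA[if (a < b && c) {}]]></code>")
print(elem.text)  # if (a < b && c) {}

print(parses("<code><![CDATA[]]></code>"))  # True: an empty section is allowed

payload = "x[i[j]]>0"  # contains ']]>'
round_trip = ET.fromstring("<code>" + to_cdata(payload) + "</code>")
print(round_trip.text == payload)  # True
```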
-
The XML declaration must follow these rules -
- The order of the pseudo-attributes (version, encoding, standalone) is fixed.
- version is required; encoding and standalone are optional.
- The default value for standalone is "no".
- If the encoding is neither UTF-8 nor UTF-16, it must be declared.
- Encoding names are not case-sensitive.
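The XML declaration rules can be exercised by feeding raw bytes to Python's `xml.etree.ElementTree` (an assumed choice of parser). A lowercase Latin-1 encoding name is accepted, while a missing or misplaced `version` is rejected. The `city` document and the `parses` helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(data: bytes) -> bool:
    """Return True if the bytes are accepted as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Non-UTF encoding, declared with a lowercase name.
latin1 = '<?xml version="1.0" encoding="iso-8859-1"?><city>Montr\xe9al</city>'.encode("latin-1")
print(ET.fromstring(latin1).text)  # Montréal

print(parses(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><a/>'))  # True
print(parses(b'<?xml encoding="UTF-8" version="1.0"?><a/>'))  # False: wrong order
print(parses(b'<?xml encoding="UTF-8"?><a/>'))                # False: no version
```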
-
Special-meaning attributes: xml:lang (identifies the natural language of the content) and xml:space (can have the value preserve or default).
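Namespace-aware parsers bind the `xml` prefix implicitly to `http://www.w3.org/XML/1998/namespace`. The example below shows how these attributes come out of Python's `xml.etree.ElementTree` (an assumed choice of parser). ElementTree only reports them; acting on xml:space is left to the application. The `poem` element is made up.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"  # the predeclared 'xml' prefix

poem = ET.fromstring('<poem xml:lang="en" xml:space="preserve">  two  spaces  </poem>')
print(poem.get(XML_NS + "lang"))   # en
print(poem.get(XML_NS + "space"))  # preserve
```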
-
Every XML document has both a logical and a physical structure. Physically, the document is composed of storage units called entities. Logically, it is composed of declarations, elements, comments, character references and PIs.
-
The document type declaration contains or points to markup declarations that provide a grammar for a class of documents. This grammar is known as the document type definition (DTD).
-
The DOCTYPE declaration must appear before the first element in the document.
-
No attribute name may appear more than once in the same start tag or empty element tag.
-
Attribute values cannot contain direct or indirect entity references to external entities.
-
Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations and PIs. All text that is not markup is the character data of the document.
-
Each XML document has exactly one document entity, which serves as the starting point for the XML processor.
-
The < and & characters may appear in their literal form only as markup delimiters or inside comments, PIs and CDATA sections. Elsewhere they must be written as the entity references &lt; and &amp;.
-
Values of type ID, IDREF, IDREFS and ENTITY must be legal XML names.
-
Values of type NMTOKEN and NMTOKENS, and enumerated values, must be legal name tokens (Nmtokens).