doxml Manual Version 0.5: doxml_parse()

$Id: parse.html,v 1.11 1999/07/24 22:04:05 francis Exp $

doxml_document* doxml_parse(doxml_context* context);

doxml_parse() is the API function that invokes the parser. It reads an XML document from the given context and constructs a doxml_document to represent it. If it encounters an error, it returns NULL.

It is the caller's responsibility to delete the returned document with doxml_delete_document().
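The calling pattern described above (parse, check for NULL, delete when done) might look roughly like the following sketch. Only the names doxml_parse() and doxml_delete_document() and the doxml_parse() prototype come from this manual. The struct contents, the stub bodies, the signature assumed for doxml_delete_document(), and the helper parse_and_release() are hypothetical stand-ins so that the sketch compiles without the library.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in declarations so the sketch compiles on its own. With the real
 * library you would include its header instead of this section. The struct
 * contents and the stub bodies below are hypothetical. */
typedef struct doxml_context { int well_formed; } doxml_context;
typedef struct doxml_document { int unused; } doxml_document;

static int live_documents = 0;  /* documents created but not yet deleted */

/* Stub: returns a new document, or NULL to simulate a parse error. */
static doxml_document *doxml_parse(doxml_context *context)
{
    doxml_document *doc;

    if (context == NULL || !context->well_formed)
        return NULL;
    doc = malloc(sizeof *doc);
    if (doc != NULL) {
        doc->unused = 0;
        live_documents++;
    }
    return doc;
}

/* Stub: the void return type and single parameter are assumptions. */
static void doxml_delete_document(doxml_document *document)
{
    if (document != NULL) {
        free(document);
        live_documents--;
    }
}

/* Hypothetical helper showing the pattern: parse, check for NULL, use the
 * document, then delete it. Returns 0 on success and -1 on a parse error. */
static int parse_and_release(doxml_context *context)
{
    doxml_document *doc = doxml_parse(context);

    if (doc == NULL)
        return -1;              /* parse failed: there is nothing to delete */

    /* ... work with doc here ... */

    doxml_delete_document(doc); /* the caller owns the document */
    return 0;
}
```

The point of the sketch is the shape of parse_and_release(): the NULL check comes before any use of the document, and every successful parse is paired with exactly one call to doxml_delete_document().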

Note that, because the parser does not work directly with Unicode characters, it is blind to some distinctions drawn by the XML spec. For example, the spec defines the productions "Letter" and "Digit", which are used in name tokens. This sounds reasonable, except that the spec defines these productions by listing ranges of character codes across all the various character sets. (Voice of reason time: what happens if a new character set is defined? Are its characters not letters?) To enforce these rules, the parser would have to see the Unicode characters. Instead, for the time being at least, doxml simply accepts all non-ASCII characters as valid Letters and Digits. As a consequence, doxml may accept a document that stricter parsers reject.
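To make the lenient rule concrete, here is a minimal sketch of a name check that treats every byte outside the ASCII range as both a Letter and a Digit. It is not doxml's actual code. All function names are hypothetical, and the sketch assumes the input arrives as bytes in an ASCII-compatible encoding such as UTF-8.

```c
#include <stddef.h>

/* Hypothetical helpers; none of these names come from doxml itself. */

/* Any byte outside 7-bit ASCII is accepted without further inspection. */
static int is_non_ascii(unsigned char c)
{
    return c >= 0x80;
}

static int lenient_is_letter(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || is_non_ascii(c);
}

static int lenient_is_digit(unsigned char c)
{
    return (c >= '0' && c <= '9') || is_non_ascii(c);
}

/* A name starts with a Letter, '_' or ':'. */
static int lenient_is_name_start(unsigned char c)
{
    return lenient_is_letter(c) || c == '_' || c == ':';
}

/* Later characters may also be Digits, '.' or '-'. */
static int lenient_is_name_char(unsigned char c)
{
    return lenient_is_letter(c) || lenient_is_digit(c) ||
           c == '.' || c == '-' || c == '_' || c == ':';
}

/* Returns 1 if the NUL-terminated string s is a name under the lenient
 * rule, and 0 otherwise (including for NULL and the empty string). */
static int lenient_is_name(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    if (p == NULL || !lenient_is_name_start(*p))
        return 0;
    for (++p; *p != '\0'; ++p) {
        if (!lenient_is_name_char(*p))
            return 0;
    }
    return 1;
}
```

Under this rule a name such as "\xC3\x97" "x" (the UTF-8 bytes of the multiplication sign U+00D7, followed by "x") is accepted. XML 1.0 excludes U+00D7 from its Letter ranges, so a parser that enforces the spec's character classes would reject the same name. That is the kind of divergence the paragraph above warns about.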