doxml Manual Version 0.5: doxml_atom

$Id: atom.html,v 1.12 1999/07/24 22:04:04 francis Exp $

An atom is represented in the API by the datatype doxml_atom. Its definition is as follows:

typedef struct {
  const char* name;
  const char* space;
  const char* prefix;
} doxml_name;

typedef enum {
  doxml_atom_text,
  doxml_atom_element,
  doxml_atom_pi,
  doxml_atom_external_entity
} doxml_atom_type;

struct doxml_atom_tag {

  /* The atom represents a sequence of text, an XML element, a
   *  processing instruction, or an external entity, as indicated
   *  by type.
   */

  doxml_atom_type type;

  union {
    struct {
      doxml_name name;
      struct {
        doxml_attribute* first;
        doxml_attribute* last;
      } attrs;

      struct {
        doxml_atom* first;
        doxml_atom* last;
      } atoms;
    } element;
    
    struct {
      const char* text;
    } text;
    
    struct {
      doxml_name name;
      const char* value;
    } pi;

    struct {
      const doxml_markupdecl* decl;
    } external_entity;
  } data;

  /* parent points to the atom's enclosing element.  If parent is
   *  NULL, then the atom is the document's root element.
   */
  doxml_atom* parent;
  doxml_atom* next;
  doxml_atom* prev;
};

So, if a points to a doxml_atom, then we can tell what kind of atom it represents (and, hence, which member of the data union is valid) by examining a->type.
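As a rough illustration of dispatching on a->type, here is a minimal sketch. The declarations are abridged stand-ins so that the fragment compiles on its own; real code would use the library's definitions instead. The typedef of doxml_atom to struct doxml_atom_tag and the helper atom_label() are assumptions made for this example, not part of the API.

```c
#include <assert.h>
#include <stddef.h>

/* Abridged stand-ins for the definitions above (only the fields used here). */
typedef struct {
  const char* name;
  const char* space;
  const char* prefix;
} doxml_name;

typedef enum {
  doxml_atom_text,
  doxml_atom_element,
  doxml_atom_pi,
  doxml_atom_external_entity
} doxml_atom_type;

typedef struct doxml_atom_tag doxml_atom;  /* assumed typedef */

struct doxml_atom_tag {
  doxml_atom_type type;
  union {
    struct { doxml_name name; } element;   /* attrs and atoms omitted */
    struct { const char* text; } text;
    struct { doxml_name name; const char* value; } pi;
  } data;                                  /* external_entity omitted */
  doxml_atom* parent;
  doxml_atom* next;
  doxml_atom* prev;
};

/* Hypothetical helper: returns a short label for the atom, reading only
 * the union member that a->type says is valid.
 */
static const char* atom_label(const doxml_atom* a)
{
  assert(a != NULL);
  switch (a->type) {
  case doxml_atom_text:
    return a->data.text.text;
  case doxml_atom_element:
    return a->data.element.name.name;
  case doxml_atom_pi:
    return a->data.pi.name.name;
  case doxml_atom_external_entity:
    return "(external entity)";
  }
  return "(unknown)";
}
```

Reading any member other than the one selected by type would yield meaningless data, which is why the switch covers every value of doxml_atom_type.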

In addition, the atom has pointers to other atoms to provide the tree structure of the document: parent points to the atom that encloses this atom (if any); next and prev point to the atom's next and previous siblings.
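The following sketch shows how these pointers might be used to walk the tree. It assumes that the last sibling's next pointer is NULL and that an element with no children has a NULL atoms.first; neither is stated above, so treat both as assumptions. The declarations are abridged stand-ins, and the typedef of doxml_atom and the helpers atom_depth() and atom_child_count() are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  doxml_atom_text,
  doxml_atom_element,
  doxml_atom_pi,
  doxml_atom_external_entity
} doxml_atom_type;

typedef struct doxml_atom_tag doxml_atom;  /* assumed typedef */

/* Abridged stand-in: only the fields needed for navigation. */
struct doxml_atom_tag {
  doxml_atom_type type;
  union {
    struct {
      struct {
        doxml_atom* first;
        doxml_atom* last;
      } atoms;
    } element;
  } data;
  doxml_atom* parent;
  doxml_atom* next;
  doxml_atom* prev;
};

/* Hypothetical helper: number of elements enclosing a. */
static size_t atom_depth(const doxml_atom* a)
{
  size_t depth = 0;
  assert(a != NULL);
  for (a = a->parent; a != NULL; a = a->parent)
    depth++;
  return depth;
}

/* Hypothetical helper: number of direct children of a.  Only elements
 * carry a child list, so every other kind of atom reports zero.
 * Assumes the sibling chain ends with next == NULL.
 */
static size_t atom_child_count(const doxml_atom* a)
{
  size_t n = 0;
  const doxml_atom* child;
  assert(a != NULL);
  if (a->type != doxml_atom_element)
    return 0;
  for (child = a->data.element.atoms.first; child != NULL; child = child->next)
    n++;
  return n;
}
```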

doxml_atoms are constructed by the function doxml_parse(), which parses an XML document and returns a doxml_document, which in turn contains atoms. doxml_atoms are destroyed by the function doxml_delete_atom().

Note: the parser may generate two or more atoms in a row that each represent a run of text; there is no guarantee that adjacent text is merged into a single atom. If you want such a guarantee, speak up. :-)
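Until there is such a guarantee, code that wants the full text at some point in the document has to treat a run of adjacent text atoms as one unit. A minimal sketch, assuming that text is a non-NULL, NUL-terminated string and that the sibling chain ends with next == NULL (both assumptions); the typedef of doxml_atom and the helper text_run_length() are invented for this example, and the declarations are abridged stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
  doxml_atom_text,
  doxml_atom_element,
  doxml_atom_pi,
  doxml_atom_external_entity
} doxml_atom_type;

typedef struct doxml_atom_tag doxml_atom;  /* assumed typedef */

/* Abridged stand-in: only the text member of the union is reproduced. */
struct doxml_atom_tag {
  doxml_atom_type type;
  union {
    struct { const char* text; } text;
  } data;
  doxml_atom* parent;
  doxml_atom* next;
  doxml_atom* prev;
};

/* Hypothetical helper: returns the combined length of the run of
 * consecutive text atoms starting at a.  If after is not NULL, it
 * receives the first atom following the run (or NULL at the end of
 * the sibling list).  A non-text or NULL starting atom gives 0.
 */
static size_t text_run_length(const doxml_atom* a, const doxml_atom** after)
{
  size_t total = 0;
  while (a != NULL && a->type == doxml_atom_text) {
    assert(a->data.text.text != NULL);
    total += strlen(a->data.text.text);
    a = a->next;
  }
  if (after != NULL)
    *after = a;
  return total;
}
```

A caller could use the returned length to allocate a buffer and then copy each atom's text in a second pass over the same run.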

The doxml_name struct encapsulates a name with an optional namespace and namespace prefix. (The prefix is retained because namespaces in the DTD are matched only by prefix, since there's no opportunity to get a namespace definition that early.)
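One consequence is that, outside the DTD, two names can reasonably be compared by local name and namespace alone, ignoring the prefix. The sketch below does that, assuming that an absent namespace is represented by a NULL space pointer (an assumption; the representation is not specified here). The helpers same_string() and names_match() are invented for this example and are not part of the API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
  const char* name;
  const char* space;
  const char* prefix;
} doxml_name;

/* Compares two possibly-NULL strings; two NULLs count as equal. */
static int same_string(const char* x, const char* y)
{
  if (x == NULL || y == NULL)
    return x == y;
  return strcmp(x, y) == 0;
}

/* Hypothetical helper: nonzero if a and b have the same local name and
 * the same namespace.  The prefix is deliberately not compared.
 */
static int names_match(const doxml_name* a, const doxml_name* b)
{
  assert(a != NULL && b != NULL);
  return same_string(a->name, b->name) && same_string(a->space, b->space);
}
```

For names taken from the DTD, where only the prefix is available, a comparison on name and prefix would be the analogous check.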