Class | REXML::Parsers::BaseParser |
In: | temp/parsers/baseparser.rb |
Parent: | Object |
This API is experimental, and subject to change.
  parser = PullParser.new( "<a>text<b att='val'/>txet</a>" )
  while parser.has_next?
    res = parser.next
    # PullEvent exposes start_element? (there is no start_tag? predicate)
    puts res[1]['att'] if res.start_element? and res[0] == 'b'
  end
See the PullEvent class for information on the content of the results. The data is identical to the arguments passed for the various events to the StreamListener API.
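As a rough illustration of the event data described above, the following sketch drives BaseParser directly and collects the raw event arrays returned by pull. The helper name collect_events is hypothetical (it is not part of REXML), and the exact contents of text events may differ between REXML versions.

```ruby
require 'rexml/document' # loads REXML::Parsers::BaseParser along with the rest of REXML

# Hypothetical helper (not part of REXML): pulls events until the parser
# reports :end_document and returns them as an array of arrays.
def collect_events(xml)
  parser = REXML::Parsers::BaseParser.new(xml)
  events = []
  loop do
    event = parser.pull
    break if event[0] == :end_document
    events << event
  end
  events
end

# Each event is an array whose first element is the event type symbol,
# followed by the same data a StreamListener callback would receive.
collect_events("<a>text<b att='val'/>txet</a>").each { |event| p event }
```

For a start_element event, the second element is the tag name and the third is the attribute hash, which mirrors the arguments of the StreamListener tag_start callback.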
Notice that:
  parser = PullParser.new( "<a>BAD DOCUMENT" )
  while parser.has_next?
    res = parser.next
    raise res[1] if res.error?
  end
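When BaseParser is driven directly rather than through PullParser, malformed input can also surface as a raised REXML::ParseException instead of an error event. The sketch below is a defensive pattern under that assumption; first_parse_error is a hypothetical helper name, and which inputs count as errors may vary between REXML versions.

```ruby
require 'rexml/document' # loads BaseParser and REXML::ParseException

# Hypothetical helper (not part of REXML): returns nil when the document
# parses cleanly, or the REXML::ParseException raised while pulling events.
def first_parse_error(xml)
  parser = REXML::Parsers::BaseParser.new(xml)
  loop do
    event = parser.pull
    return nil if event[0] == :end_document
  end
rescue REXML::ParseException => e
  e
end

# A mismatched closing tag is assumed to be rejected by the parser.
error = first_parse_error("<a></b>")
puts error.class if error
```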
Nat Price gave me some good ideas for the API.
LETTER | = | '[[:alpha:]]' | Oniguruma / POSIX [understands unicode] | |
DIGIT | = | '[[:digit:]]' | ||
LETTER | = | 'a-zA-Z' | Ruby < 1.9 [doesn't understand unicode] | |
DIGIT | = | '\d' | ||
COMBININGCHAR | = | '' | ||
EXTENDER | = | '' | ||
NCNAME_STR | = | "[#{LETTER}_:][-#{LETTER}#{DIGIT}._:#{COMBININGCHAR}#{EXTENDER}]*" | ||
NAME_STR | = | "(?:(#{NCNAME_STR}):)?(#{NCNAME_STR})" | ||
UNAME_STR | = | "(?:#{NCNAME_STR}:)?#{NCNAME_STR}" | ||
NAMECHAR | = | '[\-\w\d\.:]' | ||
NAME | = | "([\\w:]#{NAMECHAR}*)" | ||
NMTOKEN | = | "(?:#{NAMECHAR})+" | ||
NMTOKENS | = | "#{NMTOKEN}(\\s+#{NMTOKEN})*" | ||
REFERENCE | = | "&(?:#{NAME};|#\\d+;|#x[0-9a-fA-F]+;)" | ||
REFERENCE_RE | = | /#{REFERENCE}/ | ||
DOCTYPE_START | = | /\A\s*<!DOCTYPE\s/um | ||
DOCTYPE_PATTERN | = | /\s*<!DOCTYPE\s+(.*?)(\[|>)/um | ||
ATTRIBUTE_PATTERN | = | /\s*(#{NAME_STR})\s*=\s*(["'])(.*?)\4/um | ||
COMMENT_START | = | /\A<!--/u | ||
COMMENT_PATTERN | = | /<!--(.*?)-->/um | ||
CDATA_START | = | /\A<!\[CDATA\[/u | ||
CDATA_END | = | /^\s*\]\s*>/um | ||
CDATA_PATTERN | = | /<!\[CDATA\[(.*?)\]\]>/um | ||
XMLDECL_START | = | /\A<\?xml\s/u; | ||
XMLDECL_PATTERN | = | /<\?xml\s+(.*?)\?>/um | ||
INSTRUCTION_START | = | /\A<\?/u | ||
INSTRUCTION_PATTERN | = | /<\?(.*?)(\s+.*?)?\?>/um | ||
TAG_MATCH | = | /^<((?>#{NAME_STR}))\s*((?>\s+#{UNAME_STR}\s*=\s*(["']).*?\5)*)\s*(\/)?>/um | ||
CLOSE_MATCH | = | /^\s*<\/(#{NAME_STR})\s*>/um | ||
VERSION | = | /\bversion\s*=\s*["'](.*?)['"]/um | ||
ENCODING | = | /\bencoding\s*=\s*["'](.*?)['"]/um | ||
STANDALONE | = | /\bstandalone\s*=\s["'](.*?)['"]/um | ||
ENTITY_START | = | /^\s*<!ENTITY/ | ||
IDENTITY | = | /^([!\*\w\-]+)(\s+#{NCNAME_STR})?(\s+["'](.*?)['"])?(\s+['"](.*?)["'])?/u | ||
ELEMENTDECL_START | = | /^\s*<!ELEMENT/um | ||
ELEMENTDECL_PATTERN | = | /^\s*(<!ELEMENT.*?)>/um | ||
SYSTEMENTITY | = | /^\s*(%.*?;)\s*$/um | ||
ENUMERATION | = | "\\(\\s*#{NMTOKEN}(?:\\s*\\|\\s*#{NMTOKEN})*\\s*\\)" | ||
NOTATIONTYPE | = | "NOTATION\\s+\\(\\s*#{NAME}(?:\\s*\\|\\s*#{NAME})*\\s*\\)" | ||
ENUMERATEDTYPE | = | "(?:(?:#{NOTATIONTYPE})|(?:#{ENUMERATION}))" | ||
ATTTYPE | = | "(CDATA|ID|IDREF|IDREFS|ENTITY|ENTITIES|NMTOKEN|NMTOKENS|#{ENUMERATEDTYPE})" | ||
ATTVALUE | = | "(?:\"((?:[^<&\"]|#{REFERENCE})*)\")|(?:'((?:[^<&']|#{REFERENCE})*)')" | ||
DEFAULTDECL | = | "(#REQUIRED|#IMPLIED|(?:(#FIXED\\s+)?#{ATTVALUE}))" | ||
ATTDEF | = | "\\s+#{NAME}\\s+#{ATTTYPE}\\s+#{DEFAULTDECL}" | ||
ATTDEF_RE | = | /#{ATTDEF}/ | ||
ATTLISTDECL_START | = | /^\s*<!ATTLIST/um | ||
ATTLISTDECL_PATTERN | = | /^\s*<!ATTLIST\s+#{NAME}(?:#{ATTDEF})*\s*>/um | ||
NOTATIONDECL_START | = | /^\s*<!NOTATION/um | ||
PUBLIC | = | /^\s*<!NOTATION\s+(\w[\-\w]*)\s+(PUBLIC)\s+(["'])(.*?)\3(?:\s+(["'])(.*?)\5)?\s*>/um | ||
SYSTEM | = | /^\s*<!NOTATION\s+(\w[\-\w]*)\s+(SYSTEM)\s+(["'])(.*?)\3\s*>/um | ||
TEXT_PATTERN | = | /\A([^<]*)/um | ||
PUBIDCHAR | = | "\x20\x0D\x0Aa-zA-Z0-9\\-()+,./:=?;!*@$_%#" | Entity constants | |
SYSTEMLITERAL | = | %Q{((?:"[^"]*")|(?:'[^']*'))} | ||
PUBIDLITERAL | = | %Q{("[#{PUBIDCHAR}']*"|'[#{PUBIDCHAR}]*')} | ||
EXTERNALID | = | "(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))" | ||
NDATADECL | = | "\\s+NDATA\\s+#{NAME}" | ||
PEREFERENCE | = | "%#{NAME};" | ||
ENTITYVALUE | = | %Q{((?:"(?:[^%&"]|#{PEREFERENCE}|#{REFERENCE})*")|(?:'([^%&']|#{PEREFERENCE}|#{REFERENCE})*'))} | ||
PEDEF | = | "(?:#{ENTITYVALUE}|#{EXTERNALID})" | ||
ENTITYDEF | = | "(?:#{ENTITYVALUE}|(?:#{EXTERNALID}(#{NDATADECL})?))" | ||
PEDECL | = | "<!ENTITY\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>" | ||
GEDECL | = | "<!ENTITY\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>" | ||
ENTITYDECL | = | /\s*(?:#{GEDECL})|(?:#{PEDECL})/um | ||
EREFERENCE | = | /&(?!#{NAME};)/ | ||
DEFAULT_ENTITIES | = | { 'gt' => [/&gt;/, '&gt;', '>', />/], 'lt' => [/&lt;/, '&lt;', '<', /</], 'quot' => [/&quot;/, '&quot;', '"', /"/], "apos" => [/&apos;/, "&apos;", "'", /'/] } | ||
MISSING_ATTRIBUTE_QUOTES | = | /^<#{NAME_STR}\s+#{NAME_STR}\s*=\s*[^"']/um | These are patterns to identify common markup errors, to make the error messages more informative. |
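The constants above are ordinary strings and regular expressions, so they can be tried out in isolation. As a small sketch, the following uses REFERENCE_RE to list the entity and character references in a string; find_references is a hypothetical helper name, not part of REXML.

```ruby
require 'rexml/document' # loads REXML::Parsers::BaseParser and its constants

# Hypothetical helper (not part of REXML): returns every substring that
# REFERENCE_RE recognises as an entity or character reference.
def find_references(text)
  # String#gsub with only a pattern returns an enumerator over full matches,
  # which sidesteps the capture group embedded in NAME.
  text.gsub(REXML::Parsers::BaseParser::REFERENCE_RE).to_a
end

p find_references("a &amp; b &#65; &#x41; & c")
```

A bare ampersand that is not followed by a name or a numeric code and a semicolon is not matched by REFERENCE_RE.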
source | [R] |
Peek at the event at position depth in the stack without consuming it. The first element on the stack is at depth 0. If depth is -1, the parser reads to the end of the input stream and returns the last event, which is always :end_document. Be aware that peeking parses the stream up to the requested event, so this method can effectively pre-parse the entire document and pull all of it into memory.
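A minimal sketch of the look-ahead behaviour described above: events returned by peek stay queued, so later calls to pull return them in the same order. The helper name peek_then_pull is hypothetical, and the sketch uses only non-negative depths.

```ruby
require 'rexml/document' # loads REXML::Parsers::BaseParser

# Hypothetical helper (not part of REXML): peeks at the next two events,
# then pulls two events, and returns all four for comparison.
def peek_then_pull(xml)
  parser = REXML::Parsers::BaseParser.new(xml)
  peeked_first  = parser.peek     # depth 0: the next event
  peeked_second = parser.peek(1)  # depth 1: parses one event further ahead
  {
    peeked: [peeked_first, peeked_second],
    pulled: [parser.pull, parser.pull]
  }
end

result = peek_then_pull("<a>text<b att='val'/>txet</a>")
p result[:peeked]
p result[:pulled]
```

Because peek(1) must parse two events to answer, a large depth forces a correspondingly large part of the document to be parsed and buffered.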