GumboParser - a Ruby binding to the Gumbo HTML5 parser.

This binding is deprecated, and will be removed in September 2013. Use nokogumbo instead.

Usage:

require 'gumbo-parser'
doc = GumboParser.parse(string)

Notes:

  • The parse function takes a string and passes it to the gumbo_parse_with_options function, using the default options. The resulting Gumbo parse tree is then walked to produce a Nokogiri parse tree. Finally, the original Gumbo parse tree is destroyed and the Nokogiri parse tree is returned.

  • Instead of uppercase element names, lowercase element names are produced.

  • Instead of returning 'unknown' as the element name for unknown tags, the original tag name is returned verbatim.

  • Nothing meaningful is done with the GumboDocument struct, i.e., no Nokogiri EntityDecl is produced.
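
The walk described in these notes can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the binding's actual code: GumboElement, GumboText, OutputElement, element_name, and convert are hypothetical names, and plain Structs stand in for both the Gumbo nodes and the Nokogiri nodes so the sketch runs without either library.

```ruby
# Hypothetical stand-ins for Gumbo parse-tree nodes (not the real C structs).
GumboElement = Struct.new(:tag, :original_tag, :attributes, :children)
GumboText    = Struct.new(:text)

# Hypothetical stand-in for a Nokogiri element.
OutputElement = Struct.new(:name, :attributes, :children)

# Choose the output element name: lowercase for known tags, and the
# original source spelling (verbatim) for tags the parser reports as unknown.
def element_name(node)
  node.tag == :unknown ? node.original_tag : node.tag.to_s.downcase
end

# Recursively walk the Gumbo-like tree and build the output tree.
def convert(node)
  case node
  when GumboText
    node.text
  when GumboElement
    OutputElement.new(
      element_name(node),
      node.attributes.dup,
      node.children.map { |child| convert(child) }
    )
  else
    raise ArgumentError, "unexpected node: #{node.inspect}"
  end
end

# A tiny hand-built tree: a known BODY tag containing an unknown tag.
tree = GumboElement.new(:BODY, "body", {}, [
  GumboElement.new(:unknown, "my-Widget", { "id" => "w1" }, [GumboText.new("hi")])
])
result = convert(tree)
```

Here the known tag comes out lowercased, while the unknown tag keeps its source spelling rather than being reported as 'unknown'.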

Installation:

  • Build and install the gumbo-parser C library

  • Change directory into the ruby subdirectory

  • Execute rake

Related efforts:

  • ruby-gumbo - a Ruby binding for the Gumbo HTML5 parser.