Class HTMLPurifier_Lexer

Forgivingly lexes HTML (SGML-style) markup into tokens.

A lexer parses a string of SGML-style markup and converts it into corresponding tokens. It doesn't check for well-formedness, although its internal mechanism may make this automatic (as is the case with HTMLPurifier_Lexer_DOMLex). There are several implementations to choose from.

A lexer is HTML-oriented: it might work with XML, but that is not recommended, as we adhere to a subset of the specification for optimization reasons. This might change in the future. Also, most tokenizers are not expected to handle DTDs or processing instructions (PIs).

This class should not be directly instantiated, but you may use create() to retrieve a default copy of the lexer. Being a supertype, this class does not actually define any implementation, but offers commonly used convenience functions for subclasses.

HTMLPurifier_Lexer
- HTMLPurifier_Lexer_DirectLex
- HTMLPurifier_Lexer_DOMLex
  - HTMLPurifier_Lexer_PH5P

Located at xoops_lib\modules\protector\library\HTMLPurifier\Lexer.php

Methods


					
public static create(HTMLPurifier_Config $config): HTMLPurifier_Lexer

Retrieves or sets the default Lexer as a Prototype Factory.

By default HTMLPurifier_Lexer_DOMLex will be returned. There are a few exceptions involving special features that only DirectLex implements.

Throws

HTMLPurifier_Exception
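
The selection behaviour described for create() can be pictured as a small prototype factory. The sketch below is a minimal illustration under stated assumptions, not HTML Purifier's code: the classes SketchLexer, SketchDomLexer and SketchDirectLexer and the array keys 'lexer' and 'needsLineNumbers' are all hypothetical stand-ins for the real classes and configuration.

```php
<?php
// Hypothetical stand-ins for the real lexer classes; names are illustrative only.
abstract class SketchLexer
{
    /**
     * Returns a lexer for the given settings.
     * $settings['lexer'] may be absent, a SketchLexer instance (handed back
     * as-is, i.e. the "prototype"), or one of the strings 'dom' / 'direct'.
     */
    public static function create(array $settings): SketchLexer
    {
        $choice = $settings['lexer'] ?? null;

        // A ready-made instance is returned unchanged.
        if ($choice instanceof SketchLexer) {
            return $choice;
        }

        // No explicit choice: default to the DOM-based lexer unless a
        // feature that only the direct lexer supports was requested.
        if ($choice === null) {
            $choice = !empty($settings['needsLineNumbers']) ? 'direct' : 'dom';
        }

        switch ($choice) {
            case 'dom':
                return new SketchDomLexer();
            case 'direct':
                return new SketchDirectLexer();
            default:
                throw new InvalidArgumentException('Unknown lexer requested');
        }
    }
}

class SketchDomLexer extends SketchLexer {}
class SketchDirectLexer extends SketchLexer {}
```

In this sketch the caller never instantiates a lexer class directly; it only describes what it needs and lets the factory pick, which mirrors the advice that the supertype should not be instantiated directly.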


					
public __construct()

Overridden by

HTMLPurifier_Lexer_DOMLex::__construct()


					
public parseText($string, $config)


					
public parseAttr($string, $config)


					
public parseData(string $string, $is_attr, $config): string

Parses special entities into the proper characters.

This method translates escaped versions of the special characters into the raw characters they represent.

Parameters

$string

String character data to be parsed.

Returns

Parsed character data.
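
The translation of special entities can be sketched with a plain lookup table. This is a minimal standalone illustration, not the library's implementation: the function name sketch_parse_data() and the constant SKETCH_SPECIAL_ENTITIES are hypothetical, the table is an assumed set of common entity spellings, and the $is_attr and $config arguments of the real method are ignored here.

```php
<?php
// Assumed lookup table: entity spellings mapped to the raw characters they stand for.
const SKETCH_SPECIAL_ENTITIES = [
    '&quot;' => '"',
    '&amp;'  => '&',
    '&lt;'   => '<',
    '&gt;'   => '>',
    '&#39;'  => "'",
    '&#039;' => "'",
    '&#x27;' => "'",
];

function sketch_parse_data(string $string): string
{
    // Fast path: without an ampersand there is nothing to translate.
    if (strpos($string, '&') === false) {
        return $string;
    }
    // strtr() with an array replaces each match once and never rescans
    // its own output, so "&amp;lt;" becomes "&lt;" rather than "<".
    return strtr($string, SKETCH_SPECIAL_ENTITIES);
}
```

Using strtr() rather than a chain of str_replace() calls avoids accidentally decoding the same text twice.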


					
public tokenizeHTML($string, HTMLPurifier_Config $config, HTMLPurifier_Context $context): HTMLPurifier_Token[]

Lexes an HTML string into tokens.

Returns

array representation of HTML.

Overridden by


					
protected static escapeCDATA(string $string): string

Translates CDATA sections into regular sections (through escaping).

Parameters

$string

HTML string to process.

Returns

HTML with CDATA sections escaped.
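
One way to picture this escaping is a regular-expression pass whose callback HTML-escapes the inside of each CDATA section. The sketch below is an assumption-laden illustration: the function name sketch_escape_cdata() is hypothetical, and the pattern and the use of htmlspecialchars() are one plausible way to do it, not a statement of what the library does.

```php
<?php
function sketch_escape_cdata(string $html): string
{
    $result = preg_replace_callback(
        // Non-greedy match so adjacent CDATA sections are handled separately;
        // the "s" flag lets "." span newlines.
        '/<!\[CDATA\[(.+?)\]\]>/s',
        function (array $matches): string {
            // $matches[0] is the whole section, $matches[1] its contents.
            return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}
```

After this pass the former CDATA contents are ordinary escaped character data, so a tokenizer that knows nothing about CDATA can still treat them as text.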


					
protected static escapeCommentedCDATA(string $string): string

Handles the special, especially convoluted CDATA case used inside <script>.

Parameters

$string

HTML string to process.

Returns

HTML with CDATA sections escaped.


					
protected static removeIEConditional(string $string): string

Special Internet Explorer conditional comments should be removed.

Parameters

$string

HTML string to process.

Returns

HTML with conditional comments removed.
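
As a rough illustration of what removing conditional comments involves, the sketch below strips blocks of the form `<!--[if ...]> ... <![endif]-->`. The function name sketch_remove_ie_conditional() and the pattern are assumptions made for this example, and the sketch only covers that one comment form.

```php
<?php
function sketch_remove_ie_conditional(string $html): string
{
    // "s" lets "." span newlines, "i" ignores case; ".*?" stops at the
    // first closing marker so separate blocks are removed independently.
    $result = preg_replace('#<!--\[if [^>]+\]>.*?<!\[endif\]-->#si', '', $html);

    // preg_replace() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}
```

Ordinary comments such as `<!-- note -->` do not match the pattern and pass through unchanged.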


					
protected static CDATACallback(array $matches): string

Callback function for escapeCDATA() that does the work.

Parameters

$matches

PCRE matches array, with index 0 the entire match and 1 the inside of the CDATA section.

Returns

Escaped internals of the CDATA section.


					
public normalize(string $html, HTMLPurifier_Config $config, HTMLPurifier_Context $context): string

Takes a piece of HTML and normalizes it by converting entities, fixing the encoding, extracting the relevant parts, and applying other cleanup steps.

Parameters

$html

HTML.


					
public extractBody($html)

Takes a string of HTML (fragment or document) and returns the body content.
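
A minimal sketch of body extraction follows. The function name sketch_extract_body() is hypothetical, and the behaviour for fragments (returning the input unchanged when no body element is found) is an assumption made for this example.

```php
<?php
function sketch_extract_body(string $html): string
{
    // "i" ignores tag case, "s" lets "." span newlines. The greedy "(.*)"
    // runs up to the last closing body tag in the input.
    if (preg_match('#<body[^>]*>(.*)</body>#is', $html, $matches) === 1) {
        return $matches[1];
    }

    // No body element found: treat the input as a fragment.
    return $html;
}
```

For example, a full document containing `<body class="x"><p>hi</p></body>` yields `<p>hi</p>`, while a bare fragment passes through as-is.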

Properties
`public` `$tracksLineNumbers = false` Whether or not this lexer implements line-number/column-number tracking. If it does, set to true.
`protected` `$_special_entity2str = [ '&quot;' => '"', '&amp;' => '&', '&lt;' => '<', '&gt;' => '>', '&#39;' => "'", '&#039;' => "'", '&#x27;' => "'", ]` Most common entity to raw value conversion table for special entities.
