A UTF-8 specific character encoder that handles cleaning and transforming.
| Methods | ||||||||
|---|---|---|---|---|---|---|---|---|
| public static | muteErrorHandler() | Error handler that mutes errors; an alternative to the shut-up operator (`@`). |
| public static | unsafeIconv(string $in, string $out, string $text): string | iconv wrapper which mutes errors, but doesn't work around bugs. |
| public static | iconv(string $in, string $out, string $text, int $max_chunk_size = 8000): string | iconv wrapper which mutes errors and works around bugs. |
| public static | cleanUTF8(string $str, bool $force_php = false): string | Cleans a UTF-8 string for well-formedness and SGML validity. It parses the input as UTF-8 and returns a valid UTF-8 string with non-SGML codepoints excluded. Specifically, it permits: `\x{9}\x{A}\x{D}\x{20}-\x{7E}\x{A0}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}`. Source: https://www.w3.org/TR/REC-xml/#NT-Char. Arguably this function should be modernized to the HTML5 set of allowed characters (https://www.w3.org/TR/html5/syntax.html#preprocessing-the-input-stream), which simultaneously expands and restricts the set of allowed characters. |
| public static | unichr($code) | Translates a Unicode codepoint into its corresponding UTF-8 character. |
| public static | iconvAvailable(): bool | |
| public static | convertToUTF8(string $str, HTMLPurifier_Config $config, HTMLPurifier_Context $context): string | Converts a string to UTF-8 based on configuration. |
| public static | convertFromUTF8(string $str, HTMLPurifier_Config $config, HTMLPurifier_Context $context): string | Converts a string from UTF-8 based on configuration. |
| public static | convertToASCIIDumbLossless(string $str): string | Lossless (character-wise) conversion of HTML to ASCII. Returns: ASCII-encoded string with non-ASCII characters entity-ized. |
| public static | testIconvTruncateBug(): int | glibc iconv has a known bug where it doesn't handle the magic //IGNORE stanza correctly. In particular, rather than ignoring characters, it returns an EILSEQ after consuming some number of characters and expects you to restart iconv as if it were an E2BIG. Old versions of PHP did not respect the errno and returned the fragment, so you would see iconv mysteriously truncating output. We can work around this by manually chopping our input into segments of about 8000 characters, as long as PHP ignores the error code. If PHP starts paying attention to the error code, iconv becomes unusable. Returns: error code indicating severity of bug. |
| public static | testEncodingSupportsASCII(string $encoding, bool $bypass = false): Array | This expensive function tests whether or not a given character encoding supports ASCII. 7/8-bit encodings like Shift_JIS will fail this test and require special processing. Variable-width encodings shouldn't ever fail. Returns: array mapping UTF-8 characters to their corresponding ASCII, which can be used to "undo" any overzealous iconv action. |
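
The character set that cleanUTF8() is described as permitting can be written as a single PCRE character class. The sketch below is only an illustration of that set: `filterPermittedCodepoints` is a hypothetical helper, not part of the library, and it makes no claim about how cleanUTF8() is implemented or how it treats malformed input.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): drops every codepoint
// outside the set listed in the cleanUTF8() description.
function filterPermittedCodepoints(string $str): string
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8.
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{7E}\x{A0}-\x{D7FF}'
             . '\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

    $result = preg_replace($pattern, '', $str);

    // preg_replace() returns null when the subject is not valid UTF-8.
    // Assumption of this sketch: treat that case as an empty string.
    return $result ?? '';
}
```

Under these assumptions, control characters such as NUL and DEL and the C1 range (U+0080 to U+009F) are removed, while tab, line feed, carriage return, and ordinary text pass through unchanged.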
| Constants | ||
|---|---|---|
| public | ICONV_OK = 0 | No bugs detected in iconv. |
| public | ICONV_TRUNCATES = 1 | Iconv truncates output if converting from UTF-8 to another character set with //IGNORE, and a non-encodable character is found. |
| public | ICONV_UNUSABLE = 2 | Iconv does not support //IGNORE, making it unusable for transcoding purposes. |
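
As a rough illustration of how the three status codes might drive a caller's behaviour, the sketch below maps each code to a strategy and shows one way to chop UTF-8 input into bounded segments, as the testIconvTruncateBug() description suggests. Everything here is an assumption made for illustration: `IconvStatus`, `chooseIconvStrategy`, and `splitUtf8Chunks` are hypothetical names, the constants are local stand-ins that mirror the documented values, and the chunk size is counted in bytes and split on character boundaries, which may differ from what the library does.

```php
<?php
// Local stand-ins mirroring the documented values of
// HTMLPurifier_Encoder::ICONV_OK, ICONV_TRUNCATES and ICONV_UNUSABLE.
final class IconvStatus
{
    const OK = 0;
    const TRUNCATES = 1;
    const UNUSABLE = 2;
}

// Hypothetical helper: picks a conversion strategy for a status code.
function chooseIconvStrategy(int $code): string
{
    switch ($code) {
        case IconvStatus::OK:
            return 'direct';      // call iconv on the whole string
        case IconvStatus::TRUNCATES:
            return 'chunked';     // convert bounded segments one by one
        case IconvStatus::UNUSABLE:
            return 'unavailable'; // do not rely on iconv at all
        default:
            throw new InvalidArgumentException("Unknown iconv status: $code");
    }
}

// Hypothetical helper: splits UTF-8 text into chunks of at most $maxBytes
// bytes without cutting a multi-byte character in half.
function splitUtf8Chunks(string $text, int $maxBytes = 8000): array
{
    if ($maxBytes < 4) {
        throw new InvalidArgumentException('Chunk size must be at least 4 bytes');
    }
    $chunks = [];
    $len = strlen($text);
    $start = 0;
    while ($start < $len) {
        $end = min($start + $maxBytes, $len);
        // Back up while the next chunk would begin on a continuation byte (10xxxxxx).
        while ($end < $len && $end > $start && (ord($text[$end]) & 0xC0) === 0x80) {
            $end--;
        }
        if ($end === $start) {
            // Malformed input made of continuation bytes only: cut at the byte limit.
            $end = min($start + $maxBytes, $len);
        }
        $chunks[] = substr($text, $start, $end - $start);
        $start = $end;
    }
    return $chunks;
}
```

In the `chunked` case, a caller would convert each element returned by `splitUtf8Chunks` separately and concatenate the results; concatenating the unconverted chunks always reproduces the original input.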