Class protector_postcommon_post_language_match
extends ProtectorFilterAbstract
Checks that post content conforms to the system language. Requires a UTF-8 environment with the mbstring extension.
This filter compares post data to the characters that define the current system language. If the number of characters that are not normally used in the language exceeds a threshold, the post will be rejected.
The threshold can be adjusted in $maximumTolerance.
A value of 0.02 (2% non-language characters) can often discriminate between different Latin-script languages, while values approaching 1.0 (100% non-language characters) indicate entirely different alphabets, such as English (Latin) versus Russian (Cyrillic). Some overlap is always possible because of loanwords, so this number represents a tendency, not an absolute.
Certain character ranges are common to all languages: whitespace, punctuation, currency symbols, emoji, and so on. These are automatically excluded from the analysis.
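The proportion described above can be sketched as follows. This is a minimal illustration, not the filter's actual code: the function name `offLanguageRatio()` is hypothetical, and the set of "common" characters it strips (whitespace, punctuation, symbols, digits) is an assumption about what the filter excludes.

```php
<?php
// Hypothetical helper, not part of Protector: illustrates the off-language ratio.
function offLanguageRatio(string $text, string $range): float
{
    // Assumption: treat whitespace, punctuation, symbols (incl. currency and most emoji)
    // and digits as common to all languages, and drop them before counting.
    $text = (string) preg_replace('/[\s\p{P}\p{S}\p{N}]+/u', '', $text);

    $total = mb_strlen($text, 'UTF-8');
    if ($total === 0) {
        return 0.0; // nothing left to analyse
    }

    // Remove every character inside the language range; what remains is off-language.
    $remaining = (string) preg_replace('/[' . $range . ']+/u', '', $text);

    return mb_strlen($remaining, 'UTF-8') / $total;
}

// 'abc я' has four letters after stripping the space, one of them Cyrillic.
$ratio = offLanguageRatio('abc я', 'A-Za-z'); // 0.25
```

A post would then be compared against `$maximumTolerance`; with the default of 0.02, the example string above would be well over the limit.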
If the site must accept several languages concurrently, $customRange can be set to a range that covers all of them.
Ranges are written in regular-expression character-class syntax, as used in preg_replace().
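As an illustration of the range format, the snippet below builds a combined range for a site that accepts both English and Russian. The specific range string and the way it is wrapped into a pattern are assumptions for the sake of the example; how the filter itself assembles its pattern is not shown in this page.

```php
<?php
// Assumed example value: Latin letters for English plus the Cyrillic script for Russian.
$customRange = 'A-Za-z\p{Cyrillic}';

// The range is placed inside a character class; the /u modifier enables UTF-8
// handling and Unicode properties such as \p{Cyrillic}.
$pattern = '/[' . $customRange . ']+/u';

// Characters inside the range are removed; anything left over is off-language.
$leftover = preg_replace($pattern, '', 'HelloПривет');
var_dump($leftover); // string(0) ""
```

Because the value is only the inside of a character class, it should not include the surrounding brackets or pattern delimiters.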
If the language filter detects a mismatch, the post is denied. If the mismatch is more than double (2 times) the configured threshold, the account is deactivated.
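The two-level response can be summarised in a small sketch. The function name `classifyMismatch()` and its string return values are hypothetical and exist only to make the rule concrete; the real filter acts on the post and the account directly.

```php
<?php
// Hypothetical illustration of the deny / deactivate rule described above.
function classifyMismatch(float $ratio, float $maximumTolerance): string
{
    if ($ratio > 2 * $maximumTolerance) {
        return 'deactivate'; // more than double the threshold: post denied, account deactivated
    }
    if ($ratio > $maximumTolerance) {
        return 'deny';       // over the threshold: post denied
    }
    return 'accept';         // within tolerance
}
```

With the default $maximumTolerance of 0.02, a post with 3% off-language characters would be denied, and one with 5% would also deactivate the account.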
Methods

| Visibility | Signature | Description |
|---|---|---|
| protected | stripEmoji(string $string): string | Remove pictographic characters, i.e. emoji and dingbats, from a string. Returns the string without pictographs. |
| public | execute(): bool | Execute the filter. |

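For readers who want a feel for what stripping pictographs involves, here is a rough sketch. It is not the body of stripEmoji(): the function name `stripPictographs()` and the chosen code point ranges are assumptions, and the real method may cover a different set of characters.

```php
<?php
// A sketch only; the actual stripEmoji() implementation may use different ranges.
function stripPictographs(string $string): string
{
    $ranges = '\x{1F000}-\x{1FAFF}' // supplementary pictographic blocks (emoticons, transport, etc.)
            . '\x{2600}-\x{27BF}'   // miscellaneous symbols and dingbats
            . '\x{FE00}-\x{FE0F}'   // variation selectors attached to emoji
            . '\x{200D}';           // zero width joiner used in emoji sequences

    // preg_replace() returns null on failure (e.g. invalid UTF-8); fall back to the input.
    return preg_replace('/[' . $ranges . ']+/u', '', $string) ?? $string;
}
```

Only the pictographs are removed; surrounding whitespace is left in place, so the result may contain doubled spaces.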
| Methods inherited from ProtectorFilterAbstract |
|---|
| __construct(), isMobile() |

Properties

| Visibility | Type | Property | Description |
|---|---|---|---|
| protected | int | $minPosts = 10 | after this number of posts by the user, skip this filter |
| protected | float | $maximumTolerance = 0.02 | maximum proportion of off-language characters to accept |
| protected | string\|null | $customRange = null | custom character range to match, null to use default for current language |
| protected | int | $minLength = 15 | do not run analysis if input length is less than this |
| protected | string[] | $skipThese = ['edituser.php', 'register.php', 'search.php', 'user.php', 'lostpass.php'] | script names we do NOT want to process |

protected $scriptCodes = [
'arabic' => '\p{Arabic}',
'brazilian' => 'A-Za-zÁáÂâĀãÀàÇçÉéÊêÍíÓóÔôŌõÚú',
'bulgarian' => '\p{Cyrillic}',
'chinese_zh' => '\p{Han}',
'croatian' => 'A-PR-Va-pr-vĆćČčĐ𩹮ž',
'czech' => 'A-Za-zÁáČčĎďÉéĚěÍíŇňÓóŘřŠšŤťÚúŮůÝýŽž',
'danish' => 'A-Za-zÆØÅæøå',
'dutch' => 'A-Za-zIJij',
'english' => 'A-Za-z',
'french' => 'A-Za-zÀàÂâÆæÇçÈèÉéÊêËëÎîÏïÔôŒœÙùÛûÜü',
'german' => 'A-Za-zÄäÉéÖöÜüß',
'greek' => '\p{Greek}',
'hebrew' => '\p{Hebrew}',
'hungarian' => '\p{Latin}',
'italian' => 'A-IL-VZa-il-vzÀÈÉÌÒÙàèéìòù',
'japanese' => '\p{Han}\p{Hiragana}\p{Katakana}',
'korean' => '\p{Hangul}',
'malaysian' => 'A-Za-z',
'norwegian' => 'A-Za-zÆØÅæøå',
'persian' => '\p{Arabic}',
'polish' => 'A-Za-zĄąĘęÓóĆ棳ŃńŚśŹźŻż',
'portuguesebr' => 'A-Za-zÁáÂâĀãÀàÇçÉéÊêÍíÓóÔôŌõÚú',
'portuguese' => 'A-Za-zÁáÂâĀãÀàÇçÉéÊêÍíÓóÔôŌõÚú',
'russian' => '\p{Cyrillic}',
'schinese' => '\p{Han}',
'slovak' => 'A-Za-zÁáČčĎďÉéÍíĹ弾ŇňÓóÔôŔ੹ŤťÚúÝýŽž',
'slovenian' => 'A-PR-VZa-pr-vzČčŠšŽž',
'spanish' => 'A-Za-zÁáÉéÍíÑñÓóÚúÜü',
'swedish' => 'A-Za-zÅåÄäÖö',
'tchinese' => '\p{Han}',
'thai' => '\p{Thai}',
'turkish' => 'A-PR-VYZÇĞİÖŞÜÂÎÛa-pr-vyzçğiöşüâîû',
'vietnamese' => 'A-Za-zàÀảẢãÃáÁạẠăĂằẰẳẲẵẴắẮặẶâÂầẦẩẨẫẪấẤậẬđĐèÈẻẺẽẼéÉẹẸêÊềỀểỂễỄếẾệỆìÌỉỈĩĨíÍịỊòÒỏỎõÕóÓọỌôÔồỒổỔỗỖốỐộỘơƠờỜởỞỡỠớỚợỢùÙủỦũŨúÚụỤưƯừỪửỬữỮứỨựỰỳỲỷỶỹỸýÝỵỴ',
]

| Properties inherited from ProtectorFilterAbstract |
|---|
| $protector |