Class Protector_postcommon_post_language_match
Check post content conformance to the system language. Requires UTF-8 environment with mbstring.
This filter compares post data to the characters that define the current system language. If the number of characters that are not normally used in the language exceeds a threshold, the post will be rejected.
The threshold can be adjusted in $maximumTolerance.
A value of 0.02 (2% non-language characters) can often discriminate between languages that share the Latin alphabet, while values approaching 1.0 (100% non-language) indicate totally different alphabets, such as comparing English (Latin) to Russian (Cyrillic). Some commonalities are always possible with "loanwords," so this number always represents a tendency, not an absolute.
Certain ranges are common to all languages: whitespace, punctuation, currency symbols, emoji, etc. These are automatically excluded from the analysis.
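The comparison described above can be sketched roughly as follows. This is an illustration only, not the filter's actual code: the function name `offLanguageRatio` is hypothetical, and the set of "common" characters removed here (whitespace, punctuation, symbols, and digits) is an assumed approximation of the ranges the filter excludes.

```php
<?php
/**
 * Hypothetical helper: proportion of characters in $text that fall outside
 * the character range $range (for example 'A-Za-z' or '\p{Cyrillic}').
 */
function offLanguageRatio(string $text, string $range): float
{
    // Assumption: whitespace, punctuation, symbols (including currency)
    // and digits are treated as common to all languages and dropped first.
    $text = preg_replace('/[\s\p{P}\p{S}\p{N}]+/u', '', $text) ?? '';

    $total = mb_strlen($text, 'UTF-8');
    if ($total === 0) {
        return 0.0; // nothing left to analyze
    }

    // Remove every character that belongs to the language;
    // whatever remains is "off-language".
    $remainder = preg_replace('/[' . $range . ']+/u', '', $text) ?? '';

    return mb_strlen($remainder, 'UTF-8') / $total;
}

$maximumTolerance = 0.02;
$ratio = offLanguageRatio('Hello мир', 'A-Za-z'); // 3 of 8 letters are Cyrillic
var_dump($ratio > $maximumTolerance); // bool(true)
```

With the default tolerance of 0.02, a mixed English and Russian string like the one above would be well over the threshold, while a purely English string yields a ratio of 0.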
If the site must support multiple languages concurrently, a $customRange can be set to include the requirements of both languages.
Ranges are in regular expression format, as used in preg_replace().
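As a rough illustration of a combined range, two range strings can simply be concatenated, since each is meant to sit inside a regular expression character class. The variable names and the way the pattern is assembled below are assumptions made for this sketch, not the filter's own code.

```php
<?php
// Assumed example values: ranges for two languages, concatenated so that
// characters from either alphabet count as "on-language".
$english = 'A-Za-z';
$russian = '\p{Cyrillic}';
$customRange = $english . $russian;

$post = 'Hello мир';

// Strip every character covered by the combined range (plus whitespace).
// The /u modifier enables UTF-8 mode, which \p{...} properties require.
$leftover = preg_replace('/[' . $customRange . '\s]+/u', '', $post);

var_dump($leftover); // string(0) ""
```

Nothing is left over, so a post mixing both alphabets would count as fully on-language under this combined range.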
If the language filter detects a mismatch, the post is denied. If the mismatch is more than double (2 times) the configured threshold, the account is deactivated.
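The two-level response can be sketched as a simple comparison against the threshold and twice the threshold. The function name and the returned labels are hypothetical, and the strict "greater than" comparisons are an assumption based on the wording "exceeds" and "more than double".

```php
<?php
/**
 * Hypothetical sketch: map an off-language ratio to an outcome.
 * 'deactivate' is assumed to imply that the post is denied as well.
 */
function classifyMismatch(float $ratio, float $maximumTolerance = 0.02): string
{
    if ($ratio > 2 * $maximumTolerance) {
        return 'deactivate'; // more than double the threshold
    }
    if ($ratio > $maximumTolerance) {
        return 'deny';       // over the threshold
    }
    return 'accept';
}

echo classifyMismatch(0.01), "\n"; // accept
echo classifyMismatch(0.03), "\n"; // deny
echo classifyMismatch(0.05), "\n"; // deactivate
```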
- ProtectorFilterAbstract
- Protector_postcommon_post_language_match
Category: Protector\Filter
Copyright: 2016 XOOPS Project (http://xoops.org)
License: GPL 2 or later (http://www.gnu.org/licenses/gpl-2.0.html)
Author: Richard Griffith richard@geekwright.com
Link: http://xoops.org
Located at oops_lib/modules/protector/filters_disabled/postcommon_post_language_match.php
protected string stripEmoji( string $string )
Remove pictographic characters, i.e. emoji and dingbats, from a string.
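One possible way to remove pictographic characters is a Unicode-aware preg_replace() over a few code point ranges. The sketch below is not the actual stripEmoji() implementation; the function name and the chosen ranges (Miscellaneous Symbols and Dingbats, the supplementary pictograph blocks, the variation selector, and the zero-width joiner) are assumptions for illustration.

```php
<?php
/**
 * Hypothetical stand-in for stripEmoji(): remove emoji and dingbats.
 * The ranges are an assumed approximation, not the filter's own list.
 */
function stripEmojiSketch(string $string): string
{
    $pattern = '/['
        . '\x{1F000}-\x{1FAFF}' // supplementary symbol and pictograph blocks
        . '\x{2600}-\x{27BF}'   // Miscellaneous Symbols and Dingbats
        . '\x{FE0F}\x{200D}'    // variation selector-16, zero-width joiner
        . ']+/u';

    // preg_replace() returns null on error (e.g. invalid UTF-8);
    // fall back to the unmodified input in that case.
    return preg_replace($pattern, '', $string) ?? $string;
}

// U+1F600 (grinning face) and U+2702 (scissors) are both removed.
var_dump(stripEmojiSketch("Hi \u{1F600}\u{2702} there"));
```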
public
boolean
|
__construct(),
isMobile()
|
protected integer $minPosts
After this number of posts by the user, skip this filter.
Default: 10
protected float $maximumTolerance
Maximum proportion of off-language characters to accept.
Default: 0.02
protected string|null $customRange
Custom character range to match; null to use the default for the current language.
Default: null
protected integer $minLength
Do not run the analysis if the input length is less than this.
Default: 15
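The $minPosts and $minLength settings both act as early exits before any analysis runs. A minimal sketch of such a guard follows; the function name is hypothetical, and it is an assumption that "after this number of posts" means a post count strictly greater than $minPosts and that length is measured in characters.

```php
<?php
/**
 * Hypothetical guard: decide whether the language analysis should run.
 * Assumes established users (more than $minPosts posts) and very short
 * inputs (fewer than $minLength characters) are skipped.
 */
function shouldAnalyze(int $userPosts, string $text, int $minPosts = 10, int $minLength = 15): bool
{
    if ($userPosts > $minPosts) {
        return false; // established user, skip the filter
    }
    if (mb_strlen($text, 'UTF-8') < $minLength) {
        return false; // too short to give a meaningful ratio
    }
    return true;
}

var_dump(shouldAnalyze(0, 'This is a long enough post.')); // bool(true)
var_dump(shouldAnalyze(50, 'This is a long enough post.')); // bool(false)
var_dump(shouldAnalyze(0, 'Too short')); // bool(false)
```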
protected string[] $skipThese
Script names we do NOT want to process.
Default: array('edituser.php', 'register.php', 'search.php', 'user.php', 'lostpass.php')
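A check against the $skipThese list might look like the sketch below. How the filter obtains the current script name is not stated here, so the path argument and the use of basename() are assumptions, as is the function name.

```php
<?php
// Default list taken from $skipThese above.
$skipThese = array('edituser.php', 'register.php', 'search.php', 'user.php', 'lostpass.php');

/**
 * Hypothetical check: is the given script one that should not be processed?
 * Assumes only the file name, not the directory, is compared.
 */
function isSkippedScript(string $scriptPath, array $skipThese): bool
{
    return in_array(basename($scriptPath), $skipThese, true);
}

var_dump(isSkippedScript('/var/www/html/register.php', $skipThese)); // bool(true)
var_dump(isSkippedScript('/var/www/html/modules/newbb/post.php', $skipThese)); // bool(false)
```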
protected array $scriptCodes
Default:
array(
    'arabic' => '\p{Arabic}',
    'brazilian' => 'A-Za-zÁáÂâĀãÀàÇçÉéÊêÍíÓóÔôŌõÚú',
    'bulgarian' => '\p{Cyrillic}',
    'chinese_zh' => '\p{Han}',
    'croatian' => 'A-PR-Va-pr-vĆćČčĐđŠšŽž',
    'czech' => 'A-Za-zÁáČčĎďÉéĚěÍíŇňÓóŘřŠšŤťÚúŮůÝýŽž',
    'danish' => 'A-Za-zÆØÅæøå',
    'dutch' => 'A-Za-zIJij',
    'english' => 'A-Za-z',
    'french' => 'A-Za-zÀàÂâÆæÇçÈèÉéÊêËëÎîÏïÔôŒœÙùÛûÜü',
    'german' => 'A-Za-zÄäÉéÖöÜüß',
    'greek' => '\p{Greek}',
    'hebrew' => '\p{Hebrew}',
    'hungarian' => '\p{Latin}',
    'italian' => 'A-IL-VZa-il-vzÀÈÉÌÒÙàèéìòù',
    'japanese' => '\p{Han}\p{Hiragana}\p{Katakana}',
    'korean' => '\p{Hangul}',
    'malaysian' => 'A-Za-z',
    'norwegian' => 'A-Za-zÆØÅæøå',
    'persian' => '\p{Arabic}',
    'polish' => 'A-Za-zĄąĘęÓóĆćŁłŃńŚśŹźŻż',
    'portuguesebr' => 'A-Za-zÁáÂâĀãÀàÇçÉéÊêÍíÓóÔôŌõÚú',
    'portuguese' => 'A-Za-zÁáÂâĀãÀàÇçÉéÊêÍíÓóÔôŌõÚú',
    'russian' => '\p{Cyrillic}',
    'schinese' => '\p{Han}',
    'slovak' => 'A-Za-zÁáČčĎďÉéÍíĹĺĽľŇňÓóÔôŔŕŠšŤťÚúÝýŽž',
    'slovenian' => 'A-PR-VZa-pr-vzČčŠšŽž',
    'spanish' => 'A-Za-zÁáÉéÍíÑñÓóÚúÜü',
    'swedish' => 'A-Za-zÅåÄäÖö',
    'tchinese' => '\p{Han}',
    'thai' => '\p{Thai}',
    'turkish' => 'A-PR-VYZÇĞİÖŞÜÂÎÛa-pr-vyzçğiöşüâîû',
    'vietnamese' => 'A-Za-zàÀảẢãÃáÁạẠăĂằẰẳẲẵẴắẮặẶâÂầẦẩẨẫẪấẤậẬđĐèÈẻẺẽẼéÉẹẸêÊềỀểỂễỄếẾệỆìÌỉỈĩĨíÍịỊòÒỏỎõÕóÓọỌôÔồỒổỔỗỖốỐộỘơƠờỜởỞỡỠớỚợỢùÙủỦũŨúÚụỤưƯừỪửỬữỮứỨựỰỳỲỷỶỹỸýÝỵỴ',
)
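Taken together with $customRange, the table above suggests a lookup of roughly the following shape: a custom range wins if set, otherwise the entry for the current system language is used. The function name, the null result for an unknown language, and the shortened table are assumptions for this sketch only.

```php
<?php
// Shortened copy of the $scriptCodes table, for illustration only.
$scriptCodes = array(
    'english' => 'A-Za-z',
    'german'  => 'A-Za-zÄäÉéÖöÜüß',
    'russian' => '\p{Cyrillic}',
);

/**
 * Hypothetical lookup: pick the range to test against.
 * Returns null when no range is known (assumed to mean "skip the check").
 */
function resolveRange(string $language, ?string $customRange, array $scriptCodes): ?string
{
    if ($customRange !== null) {
        return $customRange; // an explicit custom range takes precedence
    }
    return $scriptCodes[$language] ?? null;
}

var_dump(resolveRange('russian', null, $scriptCodes));     // string(12) "\p{Cyrillic}"
var_dump(resolveRange('russian', 'A-Za-z', $scriptCodes)); // string(6) "A-Za-z"
var_dump(resolveRange('klingon', null, $scriptCodes));     // NULL
```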
$protector
|