Highlight Detection Rules
This section describes the syntax detection rules.
Each rule can match zero or more characters at the beginning of the string it is tested against. If the rule matches, the matching characters are assigned the style or attribute defined by the rule, and the rule may ask for the current context to be switched.
A rule looks like this:
<RuleName attribute="(identifier)" context="(identifier)" [rule specific attributes] />
The attribute identifies the style to use for matched characters by name, and the context identifies the context to use from here.
The context can be identified by:
An identifier, which is the name of the other context.
An order telling the engine to stay in the current context (#stay), or to pop back to a previous context used in the string (#pop).
To go back more steps, the #pop keyword can be repeated:
#pop#pop#pop
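As a minimal sketch of these context links, assume two hypothetical contexts named Normal and String, with itemData entries named Normal Text and String defined elsewhere in the file. The context element itself belongs to the surrounding XML format and is only assumed here. A double-quoted string could then be entered and left like this:

```xml
<!-- Hypothetical contexts; all names are illustrative only. -->
<context name="Normal" attribute="Normal Text" lineEndContext="#stay">
  <!-- An opening quote switches to the context named "String". -->
  <DetectChar attribute="String" context="String" char="&quot;" />
</context>
<context name="String" attribute="String" lineEndContext="#stay">
  <!-- A closing quote pops back to the previous context. -->
  <DetectChar attribute="String" context="#pop" char="&quot;" />
</context>
```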
Some rules can have child rules which are then evaluated only if the parent rule matched. The entire matched string will be given the attribute defined by the parent rule. A rule with child rules looks like this:
<RuleName (attributes)> <ChildRuleName (attributes) /> ... </RuleName>
Rule-specific attributes vary and are described in the following sections.
Common attributes
All rules have the following attributes in common, and they are available whenever (common attributes) appears. attribute and context are required attributes; all others are optional.
attribute: An attribute maps to a defined itemData.
context: Specify the context to which the highlighting system switches if the rule matches.
beginRegion: Start a code folding block. Default: unset.
endRegion: Close a code folding block. Default: unset.
lookAhead: If true, the highlighting system will not process the match's length, so the matched text is left for the next context. Default: false.
firstNonSpace: Match only if the string is the first non-whitespace in the line. Default: false.
column: Match only if the column matches. Default: unset.
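The following sketch shows how some of the optional common attributes might be combined. The region name Brace1, the context name Comment, and the attribute names are assumptions made for illustration.

```xml
<!-- Open a folding region at "{" and close it at "}". -->
<DetectChar attribute="Symbol" context="#stay" char="{" beginRegion="Brace1" />
<DetectChar attribute="Symbol" context="#stay" char="}" endRegion="Brace1" />
<!-- Match "C" only when it sits in the very first column of the line. -->
<DetectChar attribute="Comment" context="Comment" char="C" column="0" />
```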
Dynamic rules
Some rules allow the optional attribute dynamic of type boolean that defaults to false. If dynamic is true, a rule can use placeholders in its String or char attributes. These placeholders represent the text matched by the regular expression rule that switched to the current context. In a String, the placeholder %N (where N is a number) will be replaced with the corresponding capture N from the calling regular expression. In a char, the placeholder must be a number N, and it will be replaced with the first character of the corresponding capture N from the calling regular expression. Whenever a rule allows this attribute, its syntax will contain a (dynamic).
dynamic: may be (true|false).
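A rough sketch of a dynamic rule, assuming a hypothetical HereDoc context and a Keyword attribute: the first rule captures a terminator word, and the second rule, placed in the HereDoc context, matches exactly that word again.

```xml
<!-- In the calling context: the word after "<<" becomes capture 1. -->
<RegExpr attribute="Keyword" context="HereDoc" String="&lt;&lt;(\w+)" />

<!-- In the hypothetical HereDoc context: %1 is replaced by capture 1. -->
<StringDetect attribute="Keyword" context="#pop" String="%1" dynamic="true" />
```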
Detect a single specific character. Commonly used for example to find the ends of quoted strings.
<DetectChar char="(character)" (common attributes) (dynamic) />
The char attribute defines the character to match.
Detect two specific characters in a defined order.
<Detect2Chars char="(character)" char1="(character)" (common attributes) (dynamic) />
The char attribute defines the first character to match, char1 the second.
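For illustration, DetectChar and Detect2Chars might be used as follows. The context names Preprocessor and LineComment and the attribute names are hypothetical and would have to exist in your definition.

```xml
<!-- A single "#" switches to a hypothetical Preprocessor context. -->
<DetectChar attribute="Preprocessor" context="Preprocessor" char="#" />
<!-- The two characters "//" switch to a hypothetical LineComment context. -->
<Detect2Chars attribute="Comment" context="LineComment" char="/" char1="/" />
```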
Detect one character of a set of specified characters.
<AnyChar String="(string)" (common attributes) />
The String attribute defines the set of characters.
Detect an exact string.
<StringDetect String="(string)" [insensitive="true|false"] (common attributes) (dynamic) />
The String attribute defines the string to match. The insensitive attribute defaults to false and is passed to the string comparison function. If the value is true, case-insensitive comparison is used.
Detect an exact string but additionally require word boundaries, such as a dot '.' or whitespace, at the beginning and the end of the word. Think of \b<string>\b in terms of a regular expression, but it is faster than the rule RegExpr.
<WordDetect String="(string)" [insensitive="true|false"] (common attributes) (dynamic) />
The String attribute defines the string to match. The insensitive attribute defaults to false and is passed to the string comparison function. If the value is true, case-insensitive comparison is used.
Since: Kate 3.5 (KDE 4.5)
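The difference between StringDetect and WordDetect can be sketched like this, assuming a Keyword attribute is defined:

```xml
<!-- Matches the characters "end" even at the start of a longer word such as "endif". -->
<StringDetect attribute="Keyword" context="#stay" String="end" />
<!-- Matches "end" only as a whole word; "END" matches as well. -->
<WordDetect attribute="Keyword" context="#stay" String="end" insensitive="true" />
```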
Matches against a regular expression.
<RegExpr String="(string)" [insensitive="true|false"] [minimal="true|false"] (common attributes) (dynamic) />
The String attribute defines the regular expression. insensitive defaults to false and is passed to the regular expression engine. minimal defaults to false and is passed to the regular expression engine.
Because the rules are always matched against the beginning of the current string, a regular expression starting with a caret (^) indicates that the rule should only be matched against the start of a line.
See Regular Expressions for more information on those.
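Two hedged examples of RegExpr rules follow. The attribute names are assumptions, and the second rule assumes that minimal requests non-greedy matching from the regular expression engine.

```xml
<!-- Case-insensitive match of a whole line that starts with the word "REM". -->
<RegExpr attribute="Comment" context="#stay" String="^REM\b.*$" insensitive="true" />
<!-- Non-greedy: ".*" stops at the first closing quote rather than the last one. -->
<RegExpr attribute="String" context="#stay" String="&quot;.*&quot;" minimal="true" />
```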
Detect a keyword from a specified list.
<keyword String="(list name)" (common attributes) />
The String attribute identifies the keyword list by name. A list with that name must exist.
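As a sketch, a keyword rule and the list it refers to could look like this. The list name controlflow and its items are made up, and the list element is assumed to be defined elsewhere in the file, as described in the XML format section.

```xml
<!-- Hypothetical keyword list, defined outside the contexts. -->
<list name="controlflow">
  <item>if</item>
  <item>else</item>
  <item>while</item>
</list>

<!-- Rule inside a context that refers to the list by name. -->
<keyword attribute="Keyword" context="#stay" String="controlflow" />
```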
Detect an integer number.
<Int (common attributes) (dynamic) />
This rule has no specific attributes. Child rules are typically used to detect combinations of L and U after the number, indicating the integer type in program code. In fact, any rule is accepted as a child rule, although the DTD only allows the child rule StringDetect.
The following example matches integer numbers followed by the character 'L'.
<Int attribute="Decimal" context="#stay" > <StringDetect attribute="Decimal" context="#stay" String="L" insensitive="true"/> </Int>
Detect a floating point number.
<Float (common attributes) />
This rule has no specific attributes. AnyChar is allowed as a child rule and is typically used to detect combinations; see rule Int for reference.
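By analogy with the Int example, a Float rule with an AnyChar child might look like the following sketch. The attribute name Float is assumed to be a defined itemData.

```xml
<Float attribute="Float" context="#stay">
  <!-- Also consume an "f" or "F" suffix directly after the number. -->
  <AnyChar attribute="Float" context="#stay" String="fF" />
</Float>
```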
Detect an octal number representation.
<HlCOct (common attributes) />
This rule has no specific attributes.
Detect a hexadecimal number representation.
<HlCHex (common attributes) />
This rule has no specific attributes.
Detect an escaped character.
<HlCStringChar (common attributes) />
This rule has no specific attributes.
It matches literal representations of characters commonly used in program code, for example \n (newline) or \t (TAB).
The following characters will match if they follow a backslash (\): abefnrtv"'?\. Additionally, escaped hexadecimal numbers, for example \xff, and escaped octal numbers, for example \033, will match.
Detect a C character.
<HlCChar (common attributes) />
This rule has no specific attributes.
It matches C characters enclosed in ticks (example: 'c'). Between the ticks there may be a simple character or an escaped character. See HlCStringChar for matched escaped character sequences.
Detect a string with defined start and end characters.
<RangeDetect char="(character)" char1="(character)" (common attributes) />
char defines the character starting the range, char1 the character ending the range.
Useful for detecting, for example, small quoted strings and the like. Note that since the highlighting engine works on one line at a time, this will not find strings spanning a line break.
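A minimal sketch, assuming an Include attribute is defined: the rule below would match an angle-bracketed range such as <stdio.h>, as long as it sits on a single line.

```xml
<!-- "<" starts the range and ">" ends it; both must be on the same line. -->
<RangeDetect attribute="Include" context="#stay" char="&lt;" char1="&gt;" />
```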
Matches a backslash ('\') at the end of a line.
<LineContinue (common attributes) />
This rule has no specific attributes.
This rule is useful for switching context at the end of a line if the last character is a backslash ('\'). This is needed, for example, in C/C++ to continue macros or strings.
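A sketch of how LineContinue might be used for C-style macros. The context name Define and the attribute Preprocessor are assumptions. The context is meant to end at the line end unless the line finishes with a backslash.

```xml
<context name="Define" attribute="Preprocessor" lineEndContext="#pop">
  <!-- A trailing backslash is intended to keep this context for the next line. -->
  <LineContinue attribute="Preprocessor" context="#stay" />
</context>
```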
Include rules from another context or language/file.
<IncludeRules context="contextlink" [includeAttrib="true|false"] />
The context attribute defines which context to include.
If it is a simple string, it includes all rules defined in that context into the current context, for example:
<IncludeRules context="anotherContext" />
If the string begins with ##, the highlight system will look for another language definition with the given name, for example:
<IncludeRules context="##C++" />
If the includeAttrib attribute is true, the destination attribute is changed to that of the source. This is required to make, for example, commenting work if the text matched by the included context has a different highlight than the host context.
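Put together, a context that pulls in rules from elsewhere could be sketched as follows. The context name Comment and its attribute are hypothetical; anotherContext and ##C++ are the placeholder targets used above.

```xml
<context name="Comment" attribute="Comment" lineEndContext="#pop">
  <!-- Reuse the rules of another context in this file. -->
  <IncludeRules context="anotherContext" />
  <!-- Pull in another language definition and keep its own attributes. -->
  <IncludeRules context="##C++" includeAttrib="true" />
</context>
```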
Detect whitespaces.
<DetectSpaces (common attributes) />
This rule has no specific attributes.
Use this rule if you know that there can be several whitespace characters ahead, for example at the beginning of indented lines. This rule will skip all whitespace at once, instead of testing multiple rules and skipping one character at a time due to no match.
Detect identifier strings (as a regular expression: [a-zA-Z_][a-zA-Z0-9_]*).
<DetectIdentifier (common attributes) />
This rule has no specific attributes.
Use this rule to skip a string of word characters at once, rather than testing with multiple rules and skipping one character at a time due to no match.
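One possible arrangement, sketched under the assumption of a Normal context and a keyword list named keywords: whitespace is skipped first, keywords are tested next, and any other identifier is skipped in one step.

```xml
<context name="Normal" attribute="Normal Text" lineEndContext="#stay">
  <!-- Skip a run of whitespace in one step. -->
  <DetectSpaces attribute="Normal Text" context="#stay" />
  <!-- Test for keywords before falling back to generic identifiers. -->
  <keyword attribute="Keyword" context="#stay" String="keywords" />
  <!-- Skip any remaining identifier in one step. -->
  <DetectIdentifier attribute="Normal Text" context="#stay" />
</context>
```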
Once you have understood how context switching works, it will be easy to write highlight definitions. You should nevertheless check carefully which rule you choose in which situation. Regular expressions are very powerful, but they are slow compared to the other rules. So you may consider the following tips.
If you only match two characters, use Detect2Chars instead of StringDetect. The same applies to DetectChar.
Regular expressions are easy to use, but often there is another, much faster way to achieve the same result. Suppose you only want to match the character '#' if it is the first non-whitespace character in the line. A regular expression based solution would look like this:
<RegExpr attribute="Macro" context="macro" String="^\s*#" />
You can achieve the same result much faster by using:
<DetectChar attribute="Macro" context="macro" char="#" firstNonSpace="true" />
If you want to match the regular expression '^#' you can still use DetectChar with the attribute column="0". The attribute column counts characters, so a tabulator still counts as only one character.
You can switch contexts without processing characters. Assume that you want to switch context when you meet the string */, but need to process that string in the next context. The rule below will match, and the lookAhead attribute will cause the highlighter to keep the matched string for the next context.
<Detect2Chars attribute="Comment" context="#pop" char="*" char1="/" lookAhead="true" />
Use DetectSpaces if you know that many whitespace characters occur.
Use DetectIdentifier instead of the regular expression '[a-zA-Z_]\w*'.
Use default styles whenever you can. This way the user will find a familiar environment.
Look into other XML-files to see how other people implement tricky rules.
You can validate every XML file by using the command xmllint --dtdvalid language.dtd mySyntax.xml.
If you repeat complex regular expressions very often, you can use ENTITIES. Example:
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE language SYSTEM "language.dtd" [ <!ENTITY myref "[A-Za-z_:][\w.:_-]*"> ]>
Now you can use &myref; instead of the regular expression.
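For instance, a rule using the entity might be written as in the sketch below. The attribute name Element is an assumption. The XML parser substitutes the entity text before the highlighting engine reads the rule.

```xml
<!-- &myref; stands for the regular expression declared in the DOCTYPE above. -->
<RegExpr attribute="Element" context="#stay" String="&lt;&myref;" />
```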