Highlight Detection Rules

This section describes the syntax detection rules.

Each rule can match zero or more characters at the beginning of the string it is tested against. If the rule matches, the matching characters are assigned the style or attribute defined by the rule, and the rule may request that the current context be switched.

A rule looks like this:

<RuleName attribute="(identifier)" context="(identifier)" [rule specific attributes] />

The attribute identifies the style to use for matched characters by name, and the context identifies the context to use from here.

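The context can be identified by:

  • An identifier, which is the name of another context.

  • An order telling the engine to stay in the current context (#stay) or to pop back to a previous context used in the string (#pop). To go back several steps, #pop can be repeated: #pop#pop#pop.

  • An order followed by an exclamation mark (!) and an identifier, which makes the engine first follow the order and then switch to the named context, for example #pop#pop!OtherContext.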
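
As a minimal sketch, the hypothetical rules below show three kinds of context value: a context name, #stay and #pop. The style and context names (String, Symbol) are illustrative assumptions, not taken from a particular definition.

<!-- entering a context by name when an opening quote is found -->
<DetectChar attribute="String" context="String" char="&quot;" />
<!-- staying in the current context -->
<DetectChar attribute="Symbol" context="#stay" char=";" />
<!-- inside the String context: returning to the previous context at the closing quote -->
<DetectChar attribute="String" context="#pop" char="&quot;" />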

Some rules can have child rules which are then evaluated only if the parent rule matched. The entire matched string will be given the attribute defined by the parent rule. A rule with child rules looks like this:

<RuleName (attributes)>
  <ChildRuleName (attributes) />
  ...
</RuleName>

Rule-specific attributes vary and are described in the following sections.

Common attributes

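All rules have the following attributes in common; they are available wherever (common attributes) appears. attribute and context are required, all others are optional.

  • attribute: Maps to a defined itemData, which gives the style for the matched text.

  • context: The context to which the highlighting system switches if the rule matches.

  • beginRegion: Starts a code folding block. Default: unset.

  • endRegion: Closes a code folding block. Default: unset.

  • lookAhead: If true, the highlighting system does not consume the matched text. Default: false.

  • firstNonSpace: Match only if the string is the first non-whitespace text in the line. Default: false.

  • column: Match only at the given column. Default: unset.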
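
The folding attributes can be sketched with a pair of brace rules. This assumes beginRegion and endRegion are accepted as common attributes, as in definitions shipped with Kate; the region name Brace1 and the Symbol style are illustrative.

<!-- an opening brace starts a foldable region -->
<DetectChar attribute="Symbol" context="#stay" char="{" beginRegion="Brace1" />
<!-- the matching closing brace ends it; the region names should agree -->
<DetectChar attribute="Symbol" context="#stay" char="}" endRegion="Brace1" />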

Dynamic rules

Some rules allow the optional boolean attribute dynamic, which defaults to false. If dynamic is true, the rule can use placeholders in its String or char attribute that stand for text matched by the regular expression rule that switched to the current context. In a String, the placeholder %N (where N is a number) is replaced with the corresponding capture N from the calling regular expression. In a char, the placeholder must be a number N, and it is replaced with the first character of the corresponding capture N from the calling regular expression. Rules that allow this attribute are marked with (dynamic) in their syntax.
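
A sketch of how this fits together, assuming a shell-like language in which a here-document is opened by << followed by a word and closed by the same word. The context name HereDoc and the style names are hypothetical.

<!-- in the calling context: capture 1 holds the terminator word -->
<RegExpr attribute="Keyword" context="HereDoc" String="&lt;&lt;(\w+)" />

<!-- in the HereDoc context: %1 is replaced by the captured word -->
<StringDetect attribute="Keyword" context="#pop" String="%1" dynamic="true" />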

The Rules in Detail

DetectChar

Detect a single specific character. Commonly used, for example, to find the end of a quoted string.

<DetectChar char="(character)" (common attributes) (dynamic) />

The char attribute defines the character to match.
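
For instance, a rule that ends a double-quoted string might look like the sketch below. The String style and the use of #pop assume the rule lives in a dedicated string context.

<DetectChar attribute="String" context="#pop" char="&quot;" />

The quote is written as &quot; because it appears inside an XML attribute value.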

Detect2Chars

Detect two specific characters in a defined order.

<Detect2Chars char="(character)" char1="(character)" (common attributes) (dynamic) />

The char attribute defines the first character to match, char1 the second.
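
A typical use is the two-character marker that opens a comment. In this sketch, the context name LineComment is a hypothetical example.

<Detect2Chars attribute="Comment" context="LineComment" char="/" char1="/" />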

AnyChar

Detect one character of a set of specified characters.

<AnyChar String="(string)" (common attributes) />

The String attribute defines the set of characters.
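
For example, a set of single-character operators could be matched with one rule. The Symbol style is an assumed name.

<AnyChar attribute="Symbol" context="#stay" String="+-*/=!" />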

StringDetect

Detect an exact string.

<StringDetect String="(string)" [insensitive="true|false"] (common attributes) (dynamic) />

The String attribute defines the string to match. The insensitive attribute defaults to false and is passed to the string comparison function. If the value is true, the comparison is case-insensitive.
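
As a sketch, the rule below would match BEGIN, begin or Begin at the current position. The style name is an assumption.

<StringDetect attribute="Keyword" context="#stay" String="BEGIN" insensitive="true" />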

WordDetect

Detect an exact string, but additionally require a word boundary, such as a dot '.' or whitespace, at the beginning and the end of the word. Think of \b<string>\b in terms of a regular expression, but faster than the RegExpr rule.

<WordDetect String="(string)" [insensitive="true|false"] (common attributes) (dynamic) />

The String attribute defines the string to match. The insensitive attribute defaults to false and is passed to the string comparison function. If the value is true, the comparison is case-insensitive.
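
In contrast to StringDetect, the sketch below would match the word end but not the first three letters of endif. The style name is illustrative.

<WordDetect attribute="Keyword" context="#stay" String="end" />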

Since: Kate 3.5 (KDE 4.5)

RegExpr

Matches against a regular expression.

<RegExpr String="(string)" [insensitive="true|false"] [minimal="true|false"] (common attributes) (dynamic) />

The String attribute defines the regular expression.

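insensitive defaults to false and is passed to the regular expression engine. If true, matching is case-insensitive.

minimal defaults to false and is passed to the regular expression engine. If true, quantifiers match as little as possible (non-greedy matching).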

Because the rules are always matched against the beginning of the current string, a regular expression starting with a caret (^) indicates that the rule should only be matched against the start of a line.

See Regular Expressions for more information on those.
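
The effect of minimal can be sketched with a rule for a one-line XML-style comment. With minimal set, .* stops at the first closing marker instead of the last one on the line. The style name is an assumption.

<RegExpr attribute="Comment" context="#stay" String="&lt;!--.*--&gt;" minimal="true" />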

keyword

Detect a keyword from a specified list.

<keyword String="(list name)" (common attributes) />

The String attribute identifies the keyword list by name. A list with that name must exist.
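
A sketch of a keyword list and a rule that refers to it. The list is defined elsewhere in the same file, outside the contexts; the list name and its items are hypothetical.

<list name="controlflow">
  <item>if</item>
  <item>else</item>
  <item>while</item>
</list>

<keyword attribute="Keyword" context="#stay" String="controlflow" />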

Int

Detect an integer number.

<Int (common attributes) (dynamic) />

This rule has no specific attributes. Child rules are typically used to detect combinations of L and U after the number, indicating the integer type in program code. In fact, all rules are allowed as child rules, although the DTD only allows the child rule StringDetect.

The following example matches integer numbers followed by the character 'L'.

<Int attribute="Decimal" context="#stay" >
  <StringDetect attribute="Decimal" context="#stay" String="L" insensitive="true"/>
</Int>

Float

Detect a floating point number.

<Float (common attributes) />

This rule has no specific attributes. AnyChar is allowed as a child rule and is typically used to detect combinations; see the Int rule for reference.
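
By analogy with the Int example, a float followed by an f or F suffix could be matched as sketched here. The Float style name is an assumption.

<Float attribute="Float" context="#stay">
  <AnyChar attribute="Float" context="#stay" String="fF" />
</Float>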

HlCOct

Detect an octal number representation.

<HlCOct (common attributes) />

This rule has no specific attributes.

HlCHex

Detect a hexadecimal number representation.

<HlCHex (common attributes) />

This rule has no specific attributes.
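
When several number rules share a context, their order can matter. The sketch below assumes that rules are tried in document order and that the first match wins, so the more specific forms come before Int. The style names are hypothetical.

<Float attribute="Float" context="#stay" />
<HlCOct attribute="Octal" context="#stay" />
<HlCHex attribute="Hex" context="#stay" />
<Int attribute="Decimal" context="#stay" />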

HlCStringChar

Detect an escaped character.

<HlCStringChar (common attributes) />

This rule has no specific attributes.

It matches literal representations of characters commonly used in program code, for example \n (newline) or \t (TAB).

The following characters will match if they follow a backslash (\): abefnrtv"'?\. Additionally, escaped hexadecimal numbers such as \xff and escaped octal numbers such as \033 will match.
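
A sketch of where this rule usually sits: inside a string context, before the rule that detects the closing quote, so that an escaped quote does not end the string. The context element and all names are assumptions for illustration.

<context name="String" attribute="String" lineEndContext="#pop">
  <!-- escapes such as \n or \" get their own style -->
  <HlCStringChar attribute="String Char" context="#stay" />
  <!-- an unescaped quote ends the string -->
  <DetectChar attribute="String" context="#pop" char="&quot;" />
</context>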

HlCChar

Detect a C character.

<HlCChar (common attributes) />

This rule has no specific attributes.

It matches C characters enclosed in ticks (example: 'c'). The ticks may enclose a simple character or an escaped character. See HlCStringChar for the escape sequences that are matched.
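
For example, a single rule like the sketch below would cover both 'c' and '\n'. The Char style name is an assumption.

<HlCChar attribute="Char" context="#stay" />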

RangeDetect

Detect a string with defined start and end characters.

<RangeDetect char="(character)" char1="(character)" (common attributes) />

char defines the character starting the range, char1 the character ending the range.

Useful, for example, to detect short quoted strings and the like. Note that since the highlighting engine works on one line at a time, this rule will not find strings that span a line break.
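
As a sketch, the file name in a C-style #include <file.h> line could be matched in one step. The context in which this rule would sit and the style name are assumptions.

<RangeDetect attribute="Include File" context="#stay" char="&lt;" char1="&gt;" />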

LineContinue

Matches a backslash ('\') at the end of a line.

<LineContinue (common attributes) />

This rule has no specific attributes.

This rule is useful for switching context at the end of a line if the last character is a backslash ('\'). This is needed, for example, in C/C++ to continue macros or strings.
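
A sketch for a preprocessor context that normally ends at the end of the line but continues when the line ends in a backslash. The context element, its lineEndContext attribute and the names are assumptions for illustration.

<context name="Preprocessor" attribute="Preprocessor" lineEndContext="#pop">
  <!-- a trailing backslash keeps this context active on the next line -->
  <LineContinue attribute="Preprocessor" context="#stay" />
</context>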

IncludeRules

Include rules from another context or language/file.

<IncludeRules context="contextlink" [includeAttrib="true|false"] />

The context attribute defines which context to include.

If it is a simple string, all rules defined in that context are included into the current context, for example:

<IncludeRules context="anotherContext" />

If the string begins with ##, the highlight system will look for another language definition with the given name, for example:

<IncludeRules context="##C++" />

If the includeAttrib attribute is true, the destination attribute is changed to that of the source. This is required, for example, to make commenting work if text matched by the included context uses a different highlight than the host context.

DetectSpaces

Detect whitespace.

<DetectSpaces (common attributes) />

This rule has no specific attributes.

Use this rule if you know that there can be several whitespace characters ahead, for example at the beginning of indented lines. This rule will skip all whitespace at once, instead of testing multiple rules and skipping one character at a time due to no match.

DetectIdentifier

Detect identifier strings (as a regular expression: [a-zA-Z_][a-zA-Z0-9_]*).

<DetectIdentifier (common attributes) />

This rule has no specific attributes.

Use this rule to skip a string of word characters at once, rather than testing with multiple rules and skipping one character at a time due to no match.
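
Both DetectSpaces and DetectIdentifier are often placed in a general-purpose context, as in this sketch. The context element, the keyword list name and the style names are hypothetical.

<context name="Normal" attribute="Normal Text" lineEndContext="#stay">
  <!-- consume a run of whitespace in one step -->
  <DetectSpaces attribute="Normal Text" context="#stay" />
  <keyword attribute="Keyword" context="#stay" String="controlflow" />
  <!-- consume any other identifier in one step -->
  <DetectIdentifier attribute="Normal Text" context="#stay" />
</context>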

Tips & Tricks

Once you have understood how context switching works, it will be easy to write highlight definitions. Still, you should carefully check which rule you choose in which situation. Regular expressions are very powerful, but they are slow compared to the other rules. So you may consider the following tips.

  • If you only match two characters, use Detect2Chars instead of StringDetect. The same applies to DetectChar for a single character.

  • Regular expressions are easy to use, but often there is another, much faster way to achieve the same result. Suppose you only want to match the character '#' if it is the first non-whitespace character in the line. A regular expression based solution would look like this:

    <RegExpr attribute="Macro" context="macro" String="^\s*#" />

    You can achieve the same much faster by using:

    <DetectChar attribute="Macro" context="macro" char="#" firstNonSpace="true" />

    If you want to match the regular expression '^#', you can still use DetectChar with the attribute column="0". The column attribute counts characters, so a tab still counts as only one character.

  • You can switch contexts without processing characters. Assume that you want to switch context when you meet the string */, but need to process that string in the next context. The rule below will match, and the lookAhead attribute will cause the highlighter to keep the matched string for the next context.

    <Detect2Chars attribute="Comment" context="#pop" char="*" char1="/" lookAhead="true" />

  • Use DetectSpaces if you know that a lot of whitespace occurs.

  • Use DetectIdentifier instead of the regular expression '[a-zA-Z_]\w*'.

  • Use default styles whenever you can. This way the user will find a familiar environment.

  • Look into other XML files to see how other people implement tricky rules.

  • You can validate every XML file by using the command xmllint --dtdvalid language.dtd mySyntax.xml.

  • If you repeat a complex regular expression very often, you can use ENTITIES. Example:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE language SYSTEM "language.dtd"
    [
            <!ENTITY myref    "[A-Za-z_:][\w.:_-]*">
    ]>
    

    Now you can use &myref; instead of the regular expression.
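
A sketch of a rule that uses the entity, assuming the declaration above is part of the same file. The style name is hypothetical.

<RegExpr attribute="Element" context="#stay" String="&lt;&myref;" />

When the file is parsed, &myref; is expanded to the declared expression, so this rule would match a < followed by a name.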