The Kate Syntax Highlight System

This section discusses the Kate syntax highlighting mechanism in more detail. Read it if you want to understand how highlighting works, or if you want to change existing syntax definitions or create new ones.

How it Works

Whenever you open a file, one of the first things Kate does is detect which syntax definition to use for it. While it reads the file, and while you type in it, the syntax highlighting system analyzes the text using the rules of that syntax definition and marks where different contexts and styles begin and end.

When you type in the document, the new text is analyzed and marked on the fly, so that if you delete a character that is marked as the beginning or end of a context, the style of surrounding text changes accordingly.
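Attributes on the root `language` element of each definition file help decide which definition Kate picks for a file. Here is a minimal sketch for a hypothetical format. The name `MyFormat`, its file extensions and its MIME type are invented for illustration. The assumption is that Kate matches the file name against the `extensions` patterns and the file type against `mimetype`, and uses `priority` when several definitions match.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE language SYSTEM "language.dtd">
<!-- Hypothetical definition: name, extensions and mimetype are illustrative only -->
<language name="MyFormat" section="Other" version="1" kateversion="5.0"
          extensions="*.myf;*.myformat" mimetype="text/x-myformat" priority="5">
  <highlighting>
    <contexts>
      <!-- No rules yet: all text gets the context's "Normal Text" style -->
      <context attribute="Normal Text" lineEndContext="#stay" name="Normal" />
    </contexts>
    <itemDatas>
      <itemData name="Normal Text" defStyleNum="dsNormal" />
    </itemDatas>
  </highlighting>
</language>
```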

The syntax definitions used by the Kate Syntax Highlighting System are XML files containing:

  • Rules for detecting the role of text, organized into context blocks

  • Keyword lists

  • Style item definitions
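In a definition file, these three parts sit side by side inside the `highlighting` element. The sketch below is a rough one. The list name `keywords`, the context name `Normal` and the style names are hypothetical, and a real definition chooses its own.

```xml
<highlighting>
  <!-- Keyword list -->
  <list name="keywords">
    <item>if</item>
    <item>else</item>
    <item>while</item>
  </list>
  <!-- Detection rules, organized into contexts -->
  <contexts>
    <context attribute="Normal Text" lineEndContext="#stay" name="Normal">
      <keyword attribute="Keyword" context="#stay" String="keywords" />
    </context>
  </contexts>
  <!-- Style item definitions, each based on a default style -->
  <itemDatas>
    <itemData name="Normal Text" defStyleNum="dsNormal" />
    <itemData name="Keyword" defStyleNum="dsKeyword" />
  </itemDatas>
</highlighting>
```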

When analyzing the text, the detection rules of the current context are evaluated in the order in which they are defined. If the text at the current position matches a rule, that rule's style is applied to the matched text and the context the rule specifies becomes the current one. The current position then moves to the end of the match, and a new pass through the rules begins in the context set by the matched rule.
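Because rules are tried in order, more specific rules usually have to come first. This hypothetical excerpt is for a C-like language. It lists the `//` comment rule before the single-character `/` operator rule. If the order were reversed, each `/` of a comment would match the operator rule, and the comment rule would never get a chance.

```xml
<contexts>
  <context attribute="Normal Text" lineEndContext="#stay" name="Normal">
    <!-- Tried first: "//" starts a comment that runs to the end of the line -->
    <Detect2Chars attribute="Comment" context="LineComment" char="/" char1="/" />
    <!-- Tried only if the rule above did not match at this position -->
    <DetectChar attribute="Operator" context="#stay" char="/" />
  </context>
  <!-- At the end of the line, return to the previous context -->
  <context attribute="Comment" lineEndContext="#pop" name="LineComment" />
</contexts>
```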

Rules

The detection rules are the heart of the highlighting detection system. Each rule describes a pattern, such as a string, a character or a regular expression, to match against the text being analyzed, and it specifies which style to use for the matched part of the text. A rule may also switch the working context, either to an explicitly named context or back to the previous context.
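The sketch below shows one rule of each basic kind named above: `DetectChar` for a character, `StringDetect` for a string and `RegExpr` for a regular expression. The context and style names are hypothetical. The `context` attribute selects what happens next. `#stay` keeps the current context, a context name switches to that context, and `#pop` returns to the previous one.

```xml
<contexts>
  <context attribute="Normal Text" lineEndContext="#stay" name="Normal">
    <!-- Character rule: a double quote switches to the "String" context -->
    <DetectChar attribute="String" context="String" char="&quot;" />
    <!-- String rule: the literal text "->" is styled, context unchanged -->
    <StringDetect attribute="Operator" context="#stay" String="-&gt;" />
    <!-- Regular expression rule: hexadecimal literals such as 0x1F -->
    <RegExpr attribute="Number" context="#stay" String="\b0[xX][0-9a-fA-F]+\b" />
  </context>
  <context attribute="String" lineEndContext="#pop" name="String">
    <!-- The closing quote returns to the previous context -->
    <DetectChar attribute="String" context="#pop" char="&quot;" />
  </context>
</contexts>
```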

Rules are organized into context groups. A context group is used for a major text concept within the format, for example quoted strings or comment blocks in program source code. This means the highlighting system only has to try the rules that are relevant in the current context, not every rule in the definition, and it allows some character sequences to be treated differently depending on the current context.
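The hypothetical `BlockComment` context below shows context-dependent treatment. It contains only the rule that ends the comment. While it is active, a double quote inside `/* ... */` is ordinary comment text and does not start a string, because the string rule exists only in the `Normal` context. All names are illustrative.

```xml
<contexts>
  <context attribute="Normal Text" lineEndContext="#stay" name="Normal">
    <Detect2Chars attribute="Comment" context="BlockComment" char="/" char1="*" />
    <DetectChar attribute="String" context="String" char="&quot;" />
  </context>
  <!-- Only one rule is tried while inside a block comment; comments may span lines -->
  <context attribute="Comment" lineEndContext="#stay" name="BlockComment">
    <Detect2Chars attribute="Comment" context="#pop" char="*" char1="/" />
  </context>
  <context attribute="String" lineEndContext="#pop" name="String">
    <DetectChar attribute="String" context="#pop" char="&quot;" />
  </context>
</contexts>
```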

Contexts may also be generated dynamically. This lets rules use instance-specific data, such as text captured by the rule that switched to the context.
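A shell-style here-document is a typical case, because its terminator is whatever word followed `<<`. The sketch below is simplified and uses hypothetical context and style names. The `RegExpr` rule captures the word. In the context marked `dynamic="true"`, any rule that is also marked `dynamic="true"` has its placeholder `%1` replaced by the captured text. For brevity, this sketch enters the here-document context right after `<<EOF` rather than on the following line.

```xml
<contexts>
  <context attribute="Normal Text" lineEndContext="#stay" name="Normal">
    <!-- Capture the terminator word, e.g. EOF in "<<EOF" -->
    <RegExpr attribute="Operator" context="HereDoc" String="&lt;&lt;(\w+)" />
  </context>
  <context attribute="HereDoc" lineEndContext="#stay" name="HereDoc" dynamic="true">
    <!-- %1 becomes the captured word; match it only at the start of a line -->
    <StringDetect attribute="Operator" context="#pop" String="%1" column="0" dynamic="true" />
  </context>
</contexts>
```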

Context Styles and Keywords

In some programming languages, the compiler (the program that converts the source code to a binary executable) treats integer numbers differently from floating-point numbers, and some characters may have a special meaning within a quoted string. In such cases, it makes sense to render them differently from their surroundings so that they are easy to identify while reading the text. Even though they do not represent special contexts, the syntax highlighting system can treat them as such, so that they can be marked for different rendering.
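Such distinctions are usually made with dedicated rules, each with its own style item. The sketch below uses hypothetical style names. It handles numbers with the `Float` and `Int` rules, listing `Float` first so that `Int` does not claim the integer part of a value like 3.14. It uses `HlCStringChar` to pick out C-style escape sequences such as `\n` inside a string. This rule is tried before the closing-quote rule, so an escaped quote does not end the string.

```xml
<highlighting>
  <contexts>
    <context attribute="Normal Text" lineEndContext="#stay" name="Normal">
      <Float attribute="Float" context="#stay" />
      <Int attribute="Decimal" context="#stay" />
      <DetectChar attribute="String" context="String" char="&quot;" />
    </context>
    <context attribute="String" lineEndContext="#pop" name="String">
      <!-- Escape sequences such as \n or \" get their own style -->
      <HlCStringChar attribute="String Char" context="#stay" />
      <DetectChar attribute="String" context="#pop" char="&quot;" />
    </context>
  </contexts>
  <itemDatas>
    <itemData name="Normal Text" defStyleNum="dsNormal" />
    <itemData name="Decimal" defStyleNum="dsDecVal" />
    <itemData name="Float" defStyleNum="dsFloat" />
    <itemData name="String" defStyleNum="dsString" />
    <itemData name="String Char" defStyleNum="dsSpecialChar" />
  </itemDatas>
</highlighting>
```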

A syntax definition may contain as many styles as required to cover the concepts of the format it is used for.

In many formats, there are lists of words that represent a specific concept. For example, in programming languages, control statements are one concept, data type names another, and the language's built-in functions a third. The Kate Syntax Highlighting System can use such lists to detect and mark words in the text, emphasizing these concepts.
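Each concept typically gets its own list, its own `keyword` rule and its own style item. The list names, words and style names below are hypothetical examples for a C-like language.

```xml
<highlighting>
  <list name="controlflow">
    <item>if</item>
    <item>else</item>
    <item>return</item>
  </list>
  <list name="types">
    <item>int</item>
    <item>float</item>
  </list>
  <list name="builtins">
    <item>printf</item>
  </list>
  <contexts>
    <context attribute="Normal Text" lineEndContext="#stay" name="Normal">
      <!-- One keyword rule per list, each with its own style -->
      <keyword attribute="Control Flow" context="#stay" String="controlflow" />
      <keyword attribute="Data Type" context="#stay" String="types" />
      <keyword attribute="Built-in" context="#stay" String="builtins" />
    </context>
  </contexts>
  <itemDatas>
    <itemData name="Normal Text" defStyleNum="dsNormal" />
    <itemData name="Control Flow" defStyleNum="dsControlFlow" />
    <itemData name="Data Type" defStyleNum="dsDataType" />
    <itemData name="Built-in" defStyleNum="dsBuiltIn" />
  </itemDatas>
</highlighting>
```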

Default Styles

If you open a C++ source file, a Java™ source file and an HTML document in Kate, you will see that even though the formats are different, and thus different words are chosen for special treatment, the colors used are the same. This is because Kate has a predefined list of Default Styles which are employed by the individual syntax definitions.

This makes it easy to recognize similar concepts in different text formats. For example, comments are present in almost every programming, scripting or markup language, and when they are rendered using the same style in all languages, you do not have to stop and think to identify them within the text.
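The link to a default style is made by the `defStyleNum` attribute of each style item. The two excerpts below come from two hypothetical definitions, one for a programming language and one for a markup language. Their comment style items have different names, but both are based on `dsComment`, so they are drawn with the same colors.

```xml
<!-- Excerpt from a hypothetical programming-language definition -->
<itemDatas>
  <itemData name="Line Comment" defStyleNum="dsComment" />
</itemDatas>

<!-- Excerpt from a hypothetical markup-language definition -->
<itemDatas>
  <itemData name="Markup Comment" defStyleNum="dsComment" />
</itemDatas>
```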

Tip

All styles in a syntax definition use one of the default styles. A few syntax definitions use more styles than there are defaults, so if you use a format often, it may be worth opening the configuration dialog to check whether some concepts share the same style. For example, there is only one default style for strings, but since the Perl programming language has two types of strings, you can improve the highlighting by configuring them to look slightly different. All available default styles are explained later.
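As an illustration, a definition for a Perl-like language might declare one style item for each kind of string, both based on `dsString`. The item names here are hypothetical. The two start out looking identical, but because they are separate style items, the configuration dialog can give each its own color.

```xml
<itemDatas>
  <!-- Same default style, but separately configurable in the dialog -->
  <itemData name="String (interpolated)" defStyleNum="dsString" />
  <itemData name="String (literal)" defStyleNum="dsString" />
</itemDatas>
```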