MediaWiki talk:SyntaxHighlightingRegexps

From LiteratePrograms

List of languages we need to add support for

FAQ

A quick list of questions and answers for this page:


Would you please add syntax highlighting for language X?

Sure! Just add it to the list above and we'll see if we can get around to it. If you have any good references on the language's syntax, please include them. Also please include a list of common file extensions for source files in the language.

I don't understand the format.

Each language specifies a name, the same name that appears in the code block. Under each language are rules, each specifying a regular expression and a style. Any text matching the expression is given that style (the HTML for the styles is defined at MediaWiki:SyntaxHighlightingStylesheet). Rules can have nested rules which are applied only to sections of text matched by the outer rule. This is useful for locally overriding the formatting of the outer rule, such as colouring escaped characters in a string differently from the rest of the string.
A language can inherit all rules from another earlier-listed language using the "inherit" attribute; most languages should inherit from "plain", which provides noweb syntax highlighting.
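
As a rough illustration of the structure described above, a language entry might look like the following sketch. The element and attribute names are the ones used in the rule drafts on this page; the language name "mylang" and the two regular expressions are made up for the example and are not taken from the live rules.

```xml
<language name="mylang" inherit="plain">
  <!-- outer rule: a double-quoted string is given the "string" style -->
  <rule>
    <regex><![CDATA["(?:[^"\\\n]|\\.)*"]]></regex>
    <style>string</style>
    <!-- nested rule: applied only inside text matched by the outer rule,
         so escapes within a string are styled differently -->
    <rule>
      <regex><![CDATA[\\.]]></regex>
      <style>esc character</style>
    </rule>
  </rule>
</language>
```

Here "mylang" would be the name given in the code block, and inheriting from "plain" keeps the noweb highlighting.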

I updated the page, but nothing's happening.

If you make a change to the page, it will not propagate immediately to affected articles due to caching. Try editing the page and hitting "preview". Your changes (or any errors) should appear. Note that a "null edit" will not update the highlighting - you have to make an edit which will appear in the history.

I updated the page, and now I can't view any code at all.

You probably made the whole XML structure invalid somehow, like a missing tag. Revert and try again.

I updated the page, but when I try to view a file of the desired type I see strange errors. What do I do?

I need to improve the debugging experience here, but it probably means one of your regular expressions contains a syntax error. Try finding it by selectively removing rules, or contact me to track down the problem.

The syntax highlighting works on the page but not on the code page.

The code page currently sets the language to be the same as the file extension. Just make sure you add an alias on this page that maps the file extension you're using to your language, like this:
        <language name="extension" inherit="full name" />

Disabled rules

These rules don't work under PHP's PCRE because its backreference handling is more limited than Perl's.

    <rule>
      <!-- more strings - q// qw// -->
      <regex><![CDATA[(?:\b| )(?:q|qw)([^\w\s])(?:\\\2|[^\2\n])*\2]]></regex>
      <style>string</style>
      <rule>
        <!-- esc character -->
        <regex><![CDATA[\\.]]></regex>
        <style>esc character</style>
      </rule>
    </rule>
    <rule>
      <!-- more strings - qq// qx// -->
      <regex><![CDATA[(?:\b| )(?:qq|qx)([^\w\s])(?:\\\2|[^\2\n])*\2]]></regex>
      <style>string</style>
      <rule>
        <!-- esc character -->
        <regex><![CDATA[\\.]]></regex>
        <style>esc character</style>
      </rule>
      <rule>
        <!-- variables -->
        <regex><![CDATA[[\$@%]\$?(?:{[^}]*}|[^a-zA-Z0-9_/\t\n\.,\\[\\{\\(]|[0-9]+|[a-zA-Z_][a-zA-Z0-9_]*)?]]></regex>
        <style>identifier</style>
      </rule>
    </rule>
    <rule>
      <!-- regex matching II -->
      <regex><![CDATA[(?:\b| )(?:m|qq?|tr|y)([^\w\s])(?:\\\2|[^\2\n])*(?:\2[gimesox]*)]]></regex>
      <style>regex</style>
    </rule>
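
One possible workaround, offered as an untested sketch: a backslash followed by a digit inside a character class is read as an octal escape rather than a backreference, so [^\2\n] cannot mean "anything but the delimiter". A negative lookahead on the backreference can stand in for it. The rule below rewrites the first disabled rule that way. It keeps the original's group number \2 unchanged, which is an assumption about how the highlighter numbers groups, and it has not been tried against this wiki's highlighter.

```xml
<rule>
  <!-- more strings - q// qw// (lookahead variant, untested) -->
  <!-- (?!\2) rejects the delimiter; \\. consumes any escaped character -->
  <regex><![CDATA[(?:\b| )(?:q|qw)([^\w\s])(?:\\.|(?!\2)[^\\\n])*\2]]></regex>
  <style>string</style>
  <rule>
    <!-- esc character -->
    <regex><![CDATA[\\.]]></regex>
    <style>esc character</style>
  </rule>
</rule>
```

The same substitution would apply to the qq// qx// and regex matching rules if it turns out to work for this one.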

Highlighting bug?

It looks like there is a minor bug in the syntax highlighting of Directory listing (C, Windows). The last lines are colored as if they were part of a string. I tested the regex (""|".*?([^\\](\\\\)*)"|"\\\\") in Perl, and it looks like it is working, but maybe PHP (that's what you are using?) regexes are not that compatible. Ahy1 17:29, 21 April 2006 (PDT)

Yup, another hole in PHP's so-called "Perl-compatible" regexps. I replaced it with a much simpler regexp that does the job just fine. There are probably a few other places exhibiting this sort of issue in the regexps... Deco 17:47, 21 April 2006 (PDT)

Splitting up

Is it technically possible to split this document into a separate document for each language? As it is now, it is too easy to mess up the whole document with a little typo. Very big XML documents are also kind of confusing to navigate, leading to errors (yesterday, I actually added a rule to the wrong language). Ahy1 11:11, 15 August 2006 (PDT)

Rexx highlighting

Here is a first untested draft of support for Rexx. When I get it cleaned up, I'll ask to have someone add it to the rules. RossPatterson 20:17, 1 December 2006 (PST)

 <language name="rexx" inherit="plain">
   <rule>
     <regex><![CDATA[/\*.*?\*/]]></regex>
     <style>comment</style>
   </rule>
   <rule>
     <regex><![CDATA['(\s|[01])*'b]]></regex>
     <style>string</style>
   </rule>
   <rule>
     <regex><![CDATA["(\s|[01])*"b]]></regex>
     <style>string</style>
   </rule>
   <rule>
     <regex><![CDATA['(\s|[0-9a-fA-F]{2})*'x]]></regex>
     <style>string</style>
   </rule>
   <rule>
     <regex><![CDATA["(\s|[0-9a-fA-F]{2})*"x]]></regex>
     <style>string</style>
   </rule>
   <rule>
     <regex><![CDATA["([^"]|(""))*"]]></regex>
     <style>string</style>
   </rule>
   <rule>
     <regex><![CDATA['([^']|(''))*']]></regex>
     <style>string</style>
   </rule>
   <rule>
     <regex><![CDATA[\b(\+|-)?(([0-9]+(\.[0-9]+)?)|(\.[0-9]+))((e|E)(\+|-)?[0-9]+)?\b]]></regex>
     <style>numeric</style>
   </rule>
   <rule>
     <regex><![CDATA[\b(nop|drop|do|forever|by|to|until|while|end|iterate|leave|if|then|else|parse|upper|arg|linein|pull|source|var|value|version|with|procedure|expose|return|exit|interpret|say|numeric|digits|form|fuzz|signal|on|off|error|failure|halt|lostdigits|notready|novalue|syntax|condition|rc|sigl|select|when|otherwise)\b]]></regex>
     <style>reserved word</style>
   </rule>
   <rule>
     <regex><![CDATA[([\+\-\*/%\|&=^!\\\<>\.;:\(\)])]]></regex>
     <style>symbol</style>
   </rule>
   <rule>
     <regex><![CDATA[([a-zA-Z!?_][a-zA-Z0-9\.!?_]*)]]></regex>
     <style>identifier</style>
   </rule>
 </language>
 <language name="rex" inherit="rexx"/>
OK, I never "cleaned [it] up" because I never found a way to test syntax highlighting. Can someone with R/W access to the rules please add all of the above, and I'll give it a whirl? RossPatterson