Regexp to create bold from an asterisk-enclosed string

awkiawki is wonderful

awkiawki is a very lightweight wiki that works great as your personal wiki

Awkiawki is a wiki based on awk, hence the name. It runs as a CGI script and uses plain text files for data storage.

The plain text files are in a somewhat Markdown-like format, and each time a page is requested, awkiawki renders the text file into HTML on the fly. Awkiawki also offers a search field that performs full-text searches on the plain text files. This is fast and practical.

I have been using awkiawki for some time now. These are my experiences:

  • awkiawki runs perfectly on a Raspberry Pi (it is very lightweight, so it performs well)
  • awkiawki saves your data as plain text files, which is in my opinion the best way: plain text files are application-independent and can be worked on with the very powerful Unix text tools.
  • awkiawki uses a wiki-style syntax close to the original wiki style (as in the old days)
  • awkiawki is a great wiki to use as your personal wiki, mainly because it is so easy to add pages by just typing a CamelCase word, and because it is easy to do full-text searches in your wiki.

A personal wiki can be used to write notes and to build your own knowledge base, but also for personal leadership (thinking through and writing down your personal plans and goals, and the like) or for maintaining todo sections.

Wiki style markup

Awkiawki mimics the original wiki style of the 2000 to 2004 era.

As I have been using ikiwiki and vimwiki for a long time, I prefer a syntax that is closer to these two wikis and also closer to Markdown.

Fortunately, awkiawki can be hacked upon :)

Trying to create the perfect regexp to create strong, bold text

I like to use asterisks to create bold text, like this:

 This is a *very bold* part of this sentence

So the trick is to wrap the part "very bold" in HTML strong tags, like this:

 This is a <strong>very bold</strong> part of this sentence

Also, it must be possible to emphasize several parts on the same line, like this:

 This is a *very bold* part of this sentence, and *this* too

So, we must be careful with "greediness" in our regexp: a greedy match would run from the first asterisk on the line all the way to the last one.
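To make the greediness problem concrete, here is a small sketch that can be run in a shell with any POSIX awk. The function names greedy and restricted are made up for this illustration, and the square brackets only mark what each regexp matched.

```shell
line='This is a *very bold* part of this sentence, and *this* too'

# Greedy: .+ runs up to the LAST asterisk on the line,
# so the whole stretch between the outer asterisks is one match.
greedy() {
  printf '%s\n' "$1" | awk '{ gsub(/\*.+\*/, "[&]"); print }'
}

# Restricted: [^*]+ cannot cross an asterisk,
# so each asterisk pair is matched separately.
restricted() {
  printf '%s\n' "$1" | awk '{ gsub(/\*[^*]+\*/, "[&]"); print }'
}

greedy "$line"
# This is a [*very bold* part of this sentence, and *this*] too
restricted "$line"
# This is a [*very bold*] part of this sentence, and [*this*] too
```

Awk regexps have no "non-greedy" operator, so excluding the delimiter from the character class is the usual way around this.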

I have been trying to come up with a regexp for this by tinkering with the original awkiawki code. Awkiawki uses gsub a lot; gsub is the awk function for global string replacement.

So far, this is what I have come up with:

/\*[A-Za-z0-9]/ { if ( $0 !~ /https?:/ && $0 !~ /^ / ) { gsub(/\*([^\*]+)\*/, "<strong>&</strong>"); gsub (/<strong>\*/, "<strong>" ); gsub (/\*<\/strong>/, "</strong>" ); } }

This is a line in the file parser.awk. Here is a short explanation of the separate parts of this line.

First part

/\*[A-Za-z0-9]/ {

Look for lines that contain an asterisk immediately followed by a letter or a digit.

Second part

if ( $0 !~ /https?:/ && $0 !~ /^ / )

Only act on lines that do not contain a URL and do not start with a space (lines starting with a space are shown as preformatted text / typewriter text).

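Third part

gsub(/\*([^\*]+)\*/, "<strong>&</strong>")

Surround substrings that are enclosed in asterisks with the HTML "strong" tags.

The part [^\*]+ means one or more characters that are not an asterisk: [^\*] matches a single character that is not an asterisk, and the + repeats it. Because the match cannot run across an asterisk, it stops at the first closing asterisk. The & in the replacement stands for the whole matched text, asterisks included; awk's gsub cannot refer back to the parenthesized group on its own.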

This part of the code was the hardest to come up with, given the constraint, mentioned above, that multiple parts in one line must be possible.


gsub (/<strong>\*/, "<strong>" ); gsub (/\*<\/strong>/, "</strong>" );

The remainder of the code just removes the asterisks that are left inside the strong tags.
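To see the whole rule at work outside the wiki, it can be fed a few sample lines from the shell. This is only a test harness of my own: the wrapper name bold_rule and the sample lines are made up, a print rule is added so the result is visible, and the bracket is written as [^*], which means the same thing here.

```shell
bold_rule() {
  printf '%s\n' "$1" | awk '
    /\*[A-Za-z0-9]/ {
      if ($0 !~ /https?:/ && $0 !~ /^ /) {
        gsub(/\*([^*]+)\*/, "<strong>&</strong>")   # wrap; asterisks are still inside
        gsub(/<strong>\*/, "<strong>")              # drop the leading asterisk
        gsub(/\*<\/strong>/, "</strong>")           # drop the trailing asterisk
      }
    }
    { print }'
}

bold_rule 'This is a *very bold* part of this sentence, and *this* too'
# This is a <strong>very bold</strong> part of this sentence, and <strong>this</strong> too

# Lines starting with a space (preformatted) are left alone.
bold_rule ' an indented *preformatted* line'

# Lines containing a URL are left alone as well.
bold_rule 'see http://example.com/*star* for details'
```

The last two calls print their input unchanged, which is what the if condition is there for.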

Not perfect, yet

I am sure there must be a neater, faster and shorter way to do this, but this is the present status of my quest for that ...
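One possible direction, as a sketch only and not something taken from the awkiawki code: instead of wrapping first and cleaning up afterwards, walk through the line with match() and build the output piece by piece. The function name bold_onepass and the variable names are my own invention. It is not shorter, but it needs no cleanup pass and sticks to POSIX awk.

```shell
bold_onepass() {
  printf '%s\n' "$1" | awk '
    {
      out = ""
      rest = $0
      # match() sets RSTART and RLENGTH for the leftmost asterisk pair.
      while (match(rest, /\*[^*]+\*/)) {
        inner = substr(rest, RSTART + 1, RLENGTH - 2)   # text between the asterisks
        out = out substr(rest, 1, RSTART - 1) "<strong>" inner "</strong>"
        rest = substr(rest, RSTART + RLENGTH)           # continue after the match
      }
      print out rest
    }'
}

bold_onepass 'This is a *very bold* part of this sentence, and *this* too'
# This is a <strong>very bold</strong> part of this sentence, and <strong>this</strong> too
```

GNU awk also has gensub(), which supports back-references to parenthesized groups and could do this in a single call, but that is a gawk extension and would not work with every awk.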


The wonderful code of Awkiawki can be found on