box.matto.nl


Enhanced my awk Markdown to fodt parser


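As you can read on my page "Generate odt files with awk", I have created a little awk script to convert a Markdown-like file to fodt, the flat variant of the OpenDocument text format in which the whole document is a single, uncompressed XML file.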

I built this script just for fun; pandoc does a far better job. I wanted something light, with just enough functionality. It was also a way to keep my awk skills from fading into oblivion.

The script expects a slightly unconventional flavour of Markdown, based on the Vim-wiki format. The reason is simply that I have been using Vim-wiki for too long :)

The first version could recognise and convert only a few elements:

  • h1 header
  • h2 header

Now I have enhanced this script with the following elements:

  • h3 header
  • bullets

Reverse engineering

To discover how things are formatted in a fodt file, I simply created a sample file in LibreOffice and looked inside it.

As a result, the output of my awk script does not follow the fodt specification to the letter, but it works.
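Because a fodt file is one plain XML file, "looking inside" a LibreOffice sample needs nothing more than a text editor. To get a quick overview of which elements a sample uses, a small helper can list the element names. This is a minimal sketch, assuming a POSIX shell and awk; the function name `fodt_tags` and the file name `sample.fodt` are only illustrative and not part of the script.

```shell
# fodt_tags [FILE...]: hypothetical helper that prints every distinct
# prefixed XML element name (such as text:h or text:list-item) found in
# a flat ODF file, in order of first appearance. Reads stdin if no FILE.
fodt_tags() {
    awk '
    {
        line = $0
        # Find the next opening tag, for example "<text:h" or "<text:p".
        # Closing tags start with "</" and are therefore not matched.
        while (match(line, /<[A-Za-z][A-Za-z0-9._-]*:[A-Za-z][A-Za-z0-9._-]*/)) {
            tag = substr(line, RSTART + 1, RLENGTH - 1)
            if (!(tag in seen)) {
                seen[tag] = 1
                print tag
            }
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}

# Usage: save a small document in LibreOffice as
# "Flat XML ODF Text Document" (sample.fodt) and run:
#   fodt_tags sample.fodt
```

The names it reports (for example `text:h` or `text:list`) are the ones worth searching for in the sample to see their attributes.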

For more information, see my previous page.

File format

The script parses files in the following format:

= H1 header
== H2 header
=== H3 header
- bullet
* bullet

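For bullets I chose to support both - and * as the bullet marker. All five markers (the three header markers for H1 to H3 and the two bullet markers) must be placed at the very beginning of the line, followed by a space. Any other non-empty line is treated as paragraph text, and a blank line starts a new paragraph.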
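To check how a given input line will be treated under these rules, a tiny classifier can restate them. This is a minimal sketch, assuming a POSIX shell and awk; the function name `classify` is hypothetical, and it only reports the element type instead of producing fodt.

```shell
# classify: hypothetical helper that reads lines in the format above
# from stdin and reports which element each line would become.
# Every marker must be at the start of the line and followed by a space,
# so the patterns below never overlap.
classify() {
    awk '
    /^=== /  { print "h3: " substr($0, 5); next }
    /^== /   { print "h2: " substr($0, 4); next }
    /^= /    { print "h1: " substr($0, 3); next }
    /^[-*] / { print "bullet: " substr($0, 3); next }
    /^$/     { print "blank"; next }
             { print "text: " $0 }
    '
}
```

For example, `printf '%s\n' '= Title' '- one' | classify` prints `h1: Title` followed by `bullet: one`. A line such as `-no space` is reported as plain text, because the marker is not followed by a space.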

New version of the script

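BEGIN {
    # Creation timestamp for the metadata (GNU date accepts -Iseconds)
    cmd = "date -Iseconds";
    cmd | getline dateline;
    close(cmd);
    print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    # xmlns:meta is declared because <meta:creation-date> is used below
    print "<office:document xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\" office:mimetype=\"application/vnd.oasis.opendocument.text\" xmlns:meta=\"urn:oasis:names:tc:opendocument:xmlns:meta:1.0\" xmlns:style=\"urn:oasis:names:tc:opendocument:xmlns:style:1.0\" xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\" xmlns:dom=\"http://www.w3.org/2001/xml-events\"  xmlns:css3t=\"http://www.w3.org/TR/css3-text/\">";
    print " <office:meta><meta:creation-date>" dateline "</meta:creation-date></office:meta>";
    print " <office:body>";
    print " <office:text>";
    in_paragraph = 0;   # 1 while a <text:p> is open
    in_list = 0;        # 1 while a <text:list> is open
    id = 4985000297;    # counter for the xml:id of each list
    blank_line = 0;     # 1 after a blank line, until the next text line
}

# Escape the characters that are special in XML ("&" must come first).
# The markers =, - and * are not touched, so the rules below still match.
{
    gsub(/&/, "\\&amp;");
    gsub(/</, "\\&lt;");
    gsub(/>/, "\\&gt;");
}

# Any line that is not a bullet closes an open list.
# Note: inside [...] a "|" would be a literal character, so only - and *.
!/^[-*] / {
    if (in_list == 1) {
        print "</text:list>";
        in_list = 0;
    }
}

# Headers: close an open paragraph, then drop the marker and its space
/^= / {
    if (in_paragraph == 1) {
        print "</text:p>";
        in_paragraph = 0;
    }
    print "<text:h text:outline-level=\"1\">" substr($0, 3) "</text:h>";
    next;
}
/^== / {
    if (in_paragraph == 1) {
        print "</text:p>";
        in_paragraph = 0;
    }
    print "<text:h text:outline-level=\"2\">" substr($0, 4) "</text:h>";
    next;
}
/^=== / {
    if (in_paragraph == 1) {
        print "</text:p>";
        in_paragraph = 0;
    }
    print "<text:h text:outline-level=\"3\">" substr($0, 5) "</text:h>";
    next;
}

# Bullets: open a list on the first item, then one list-item per line
/^[-*] / {
    if (in_list == 0) {
        if (in_paragraph == 1) {
            print "</text:p>";
            in_paragraph = 0;
        }
        print "<text:list xml:id=\"list" id "\" text:style-name=\"WWNum1\">";
        in_list = 1;
        id++;
    }
    # substr($0, 3) drops the "- " or "* " marker
    print "<text:list-item><text:p text:style-name=\"P2\">" substr($0, 3) "</text:p></text:list-item>";
    next;
}

/^$/ { blank_line = 1; next; }

# Every other line is paragraph text
{
    # Start a new paragraph after a blank line, and also when text follows
    # a header or a list directly; otherwise it would be printed outside
    # any <text:p> element.
    if (blank_line == 1 || in_paragraph == 0) {
        if (in_paragraph == 1) {
            print "</text:p>";
        }
        if (blank_line == 1) {
            # an empty paragraph serves as vertical space
            print "<text:p text:style-name=\"Standard\"> ";
            print "</text:p>";
        }
        print "<text:p text:style-name=\"Standard\">";
        in_paragraph = 1;
        blank_line = 0;
    }
    print;
}

END {
    # Close whatever is still open, then the document itself
    if (in_paragraph == 1) {
        print "</text:p><text:p text:style-name=\"Standard\"/>";
    }
    if (in_list == 1) {
        print "</text:list>";
    }
    print "  </office:text>";
    print " </office:body>";
    print "</office:document>";
}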
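One detail in the bullet pattern is worth checking: inside a bracket expression `|` is an ordinary character, not an "or", so `[-|*]` matches a dash, a pipe or an asterisk, while `[-*]` matches only the two bullet markers. A minimal check, assuming a POSIX shell and awk; `bullet_matches` is a hypothetical helper, not part of the script.

```shell
# bullet_matches PATTERN: hypothetical helper that prints which of a few
# sample lines are matched by an awk regular expression.
bullet_matches() {
    printf '%s\n' '- dash' '* star' '| pipe' 'plain' |
        awk -v re="$1" '$0 ~ re'
}

bullet_matches '^[-|*] '   # prints "- dash", "* star" and "| pipe"
bullet_matches '^[-*] '    # prints "- dash" and "* star"
```

With `[-|*]`, a line that happens to start with `| ` (for example a Vim-wiki table row) would be turned into a list item.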