Kindle highlights to wiki
I love my Kindle Paperwhite. I have the 2015 version without ads (the version with ads is not available in our country). Of course Amazon is not a brand we should endorse, but I seriously like this device.
All Amazon Kindle e-readers allow you to highlight text and to take notes. Every time you highlight a piece of text, the text is copied into a file called "My Clippings.txt". As we can see from the space in the filename, this was probably built by some coder who lives in the DOS or Windows world.
When you take a note, the note is copied to the same "My Clippings.txt" file.
Every highlight and every note carries a reference to the book and to the location in that book where it was created.
Intelligence has to come from the parser
The software on the Kindle that stores your highlights and notes in the "My Clippings.txt" file is not very sophisticated ("cough"). All records are stored in chronological order, with the oldest records at the top of the file and the newest at the bottom. This means that records belonging to different books can be interleaved.
Creating highlights is not always error-free: it can be hard to get the boundaries right the first time. When you get a boundary wrong (missing a word or a line, for example), you can delete the highlight and create it again. Sometimes it takes several tries to get the boundaries right. Because of the "sophistication" of the software on the Kindle, this results in multiple records, one for each try.
So the parser has to go through the "My Clippings.txt" file and bundle the records per book. It also has to ditch the deleted highlights and keep only the good ones. As far as I can tell, this has to be done by keeping the one nearest to the bottom of the file, since that is the most recent attempt.
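The bundling and clean-up described above could look roughly like the following. It is a minimal sketch in Python (not the awk script this post is about) and rests on several assumptions that are not stated here: that records are separated by a line of ten equals signs, that the second line of a record looks like "- Your Highlight on Location 32-33 | Added on ...", and that a highlight overlapping an earlier highlight of the same book is a retry of it. The names `parse_clippings`, `overlaps` and `SAMPLE` are made up for the illustration, and the overlap rule is only a heuristic: two genuinely different highlights at the same coarse location would also be merged.

```python
import re
from collections import defaultdict

SEPARATOR = "=========="  # assumed record separator line
# Assumed shape of the metadata line, e.g.
# "- Your Highlight on Location 32-33 | Added on ..."
META = re.compile(r"(Highlight|Note).*?Location (\d+)(?:-(\d+))?", re.IGNORECASE)


def overlaps(a, b):
    """True when two records cover overlapping location ranges."""
    return a["start"] <= b["end"] and b["start"] <= a["end"]


def parse_clippings(text):
    """Bundle records per book; a later highlight replaces earlier overlapping ones."""
    books = defaultdict(list)
    for chunk in text.split(SEPARATOR):
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        if len(lines) < 3:
            continue  # blank tail of the file, or a record without text
        m = META.search(lines[1])
        if m is None:
            continue  # unexpected metadata format
        start = int(m.group(2))
        rec = {
            "kind": m.group(1).lower(),
            "start": start,
            "end": int(m.group(3) or start),
            "text": " ".join(lines[2:]),
        }
        title = lines[0].lstrip("\ufeff")  # the file may begin with a byte-order mark
        kept = books[title]
        if rec["kind"] == "highlight":
            # Heuristic: an earlier overlapping highlight was a failed attempt.
            kept[:] = [r for r in kept
                       if r["kind"] != "highlight" or not overlaps(r, rec)]
        kept.append(rec)
    return books


# Made-up sample: two tries at one highlight, with another book in between.
SAMPLE = """Book One (Author A)
- Your Highlight on Location 32-33 | Added on some date

first try, cut off
==========
Book Two (Author B)
- Your Highlight on Location 10-11 | Added on some date

a highlight from another book
==========
Book One (Author A)
- Your Highlight on Location 32-34 | Added on some date

second try, the complete sentence
==========
"""

books = parse_clippings(SAMPLE)
for title, records in books.items():
    print(title, [r["text"] for r in records])
```

Because records are processed top to bottom, the attempt nearest to the bottom of the file is the one that survives.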
awkiawki as a personal wiki
Awkiawki is a wiki that uses awk as its CGI script. It is a very fast wiki that performs well even on a Raspberry Pi.
Awkiawki is a very simple wiki that uses CamelCase to create links from one page to another.
My awkiawki has become an awesome personal knowledge base and my poor man's Zettelkasten implementation, and it is becoming more valuable each and every day.
Awkiawki stores its content as flat text files in the Markdown format. Whenever a page is requested, the CGI script converts it on the fly into HTML. During this conversion, CamelCase words are converted to HTML links if a wiki file with a corresponding name exists; if not, a link is created that allows the user to create a new wiki page.
Awkiawki only accepts alphabetic characters in CamelCase filenames, not numeric characters. So Catch22 is not a legitimate filename.
Script to convert "My Clippings.txt" file to wiki pages
My aim is a script that produces the following result.
In my awkiawki I have a pointer to a file called "KindleHighlights". Remember that awkiawki uses CamelCase to generate links to other files.
The conversion script creates this file and adds to it a link to the page of each book.
For every book, the conversion script creates a separate file with all the highlights and notes from that book, ordered by location.
Each highlight and each note has a unique anchor, which other wiki pages can reference.
The file "KindleHighlights" functions as an index page to the pages per book. To link the index page to these pages, the filenames of the pages have to be in CamelCase. As noted above, awkiawki accepts only alphabetic characters in CamelCase filenames, so a title such as Catch22 cannot be used as a filename unchanged.
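One way to turn a book title into such a filename is sketched below, in Python rather than awk. How titles with digits should be handled is not specified above, so this sketch simply assumes they are dropped; `camel_case_name` is a hypothetical helper name.

```python
import re


def camel_case_name(title):
    """Keep only runs of ASCII letters and join them, each capitalized."""
    words = re.findall(r"[A-Za-z]+", title)
    return "".join(w[0].upper() + w[1:].lower() for w in words)


print(camel_case_name("As a Man Thinketh"))
print(camel_case_name("Catch-22"))
```

Note that a title which shrinks to a single word (as "Catch-22" does here) may no longer look like CamelCase to the wiki, and two different titles can end up with the same name. Both cases would need extra handling in a real script.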
The file "My Clippings.txt" starts each highlight and each note with a line containing the title of the book and the name of the author. The name of the author is between round brackets. This is an example:
As a Man Thinketh (James Allen)
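Splitting such a line into title and author can be sketched as follows (Python, for illustration only). The assumption is that the author is the last parenthesised group on the line, so that a title which itself contains brackets still works; `split_title_author` is a made-up name.

```python
import re

# Title, then the last "(...)" group on the line as the author.
TITLE_LINE = re.compile(r"^(?P<title>.*)\((?P<author>[^()]*)\)\s*$")


def split_title_author(line):
    """Return (title, author); author is empty when no brackets are found."""
    line = line.lstrip("\ufeff").strip()  # drop a possible byte-order mark
    m = TITLE_LINE.match(line)
    if m is None:
        return line, ""
    return m.group("title").strip(), m.group("author").strip()


print(split_title_author("As a Man Thinketh (James Allen)"))
```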
Every page per book ends with a back-reference to the "KindleHighlights" index page, so that I can easily jump between the individual book pages and the index of all the books with highlights or notes.
awk for fun and profit
Although Perl is what comes to mind when one wants to create such a script, I decided to give awk a try, just because that seemed like fun.
I wrote an awk script that does the conversion. I want the index page to be sorted by book title and the pages per book to be sorted by location. Unfortunately, we have to use gawk for this, because standard awk doesn't have a function for sorting arrays.
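The two orderings can be sketched in Python (used here only as an illustration, not as a replacement for the gawk script). The record layout with `start` and `end` fields and the sample titles are assumptions made up for the example. The point to watch in any language is that locations must be compared as numbers, not as strings.

```python
# Made-up records in file order; "start" and "end" are hypothetical field names.
books = {
    "walden pond notes (Author B)": [
        {"start": 100, "end": 101, "text": "third"},
        {"start": 32, "end": 33, "text": "second"},
        {"start": 4, "end": 4, "text": "first"},
    ],
    "A Shorter Book (Author A)": [],
}


def sorted_titles(books):
    """Book titles in alphabetical order, ignoring case."""
    return sorted(books, key=str.casefold)


def by_location(records):
    """Records ordered by numeric start location, then end location."""
    return sorted(records, key=lambda r: (r["start"], r["end"]))


print(sorted_titles(books))
print([r["text"] for r in by_location(books["walden pond notes (Author B)"])])

# Sorting the same locations as strings leaves them in the wrong order.
print(sorted(["100", "32", "4"]))
```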
Awk and redirection
Although awk has a syntax like
print var > (target)
print var >> (target)
the redirection of the output works differently from that in shell scripts.
When one performs three writes in a row from the same awk session, this results in three lines in the target file, even when > (target) is used. The syntax with the single greater-than character means that the target file is overwritten at the first write and that subsequent writes are appended to it.
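The difference is the same as that between opening a file once and writing to it repeatedly, and reopening it for every write. The following Python analogy (not awk code; the function names are made up) mimics both behaviours on a temporary file.

```python
import os
import tempfile


def awk_style(path, lines):
    """Like awk's `print line > path`: truncated once, then kept open."""
    with open(path, "w") as out:
        for line in lines:
            out.write(line + "\n")


def shell_style(path, lines):
    """Like running `echo line > path` once per line: truncated every time."""
    for line in lines:
        with open(path, "w") as out:
            out.write(line + "\n")


def read_lines(path):
    with open(path) as f:
        return f.read().splitlines()


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target.txt")
    awk_style(target, ["one", "two", "three"])
    awk_result = read_lines(target)
    shell_style(target, ["one", "two", "three"])
    shell_result = read_lines(target)

print(awk_result)    # all three lines survive
print(shell_result)  # only the last write survives
```

In awk itself, the file stays open until the program ends or until it is closed explicitly with close(target).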
Csh script and awk script
The shell script and the awk script that create the index page and the wiki pages per book can be found here: