Thursday, February 4, 2016

The Plover Parser

Ted, our amazing lead developer, has done it again:

GitHub Page for The Plover Parser

The Plover Parser reads through your Plover stroke/definition log files (assuming that you've got logging enabled) and compares them with a wordlist of the 10,000 most common English words. Optionally, you can provide your own wordlist, such as a medical dictionary, if you want to search for more specific matches. It then compiles a frequency list of which strokes you used most often for which translation during the span of time covered by the logs.
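To make that pipeline concrete, here's a minimal Python sketch of the same idea. It is not Ted's actual code. The log line shape, the made-up sample entries, and the names `LINE_RE`, `STROKE_RE`, and `count_strokes` are all assumptions for illustration, so check them against your own log files before relying on any of it.

```python
import re
from collections import Counter, defaultdict

# Assumed log line shape (hypothetical; compare with your own logs):
#   2016-02-03 10:15:42,123 Translation(('KAT',) : "cat")
# Lines with "*Translation" are assumed to be undone entries and are skipped.
LINE_RE = re.compile(r'(?<!\*)Translation\(\((.*?)\) : "?(.*?)"?\)\s*$')
STROKE_RE = re.compile(r"'([^']+)'")


def count_strokes(lines, wordlist):
    """Map each wordlist word to a Counter of the strokes used to write it."""
    counts = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a translation line, or an undone one
        strokes = "/".join(STROKE_RE.findall(match.group(1)))
        word = match.group(2).strip().lower()
        if word in wordlist:
            counts[word][strokes] += 1
    return counts


# Made-up sample data, not taken from real logs.
sample_log = '''\
2016-02-03 10:15:42,123 Translation(('KAT',) : "cat")
2016-02-03 10:15:43,456 Translation(('KAT',) : "cat")
2016-02-03 10:15:44,789 Translation(('KAOT',) : "cat")
2016-02-03 10:15:45,012 *Translation(('KAOT',) : "cat")
2016-02-03 10:15:46,345 Translation(('TO', 'GET', 'ER') : "together")
2016-02-03 10:15:47,678 Translation(('SKWR*EUFG',) : "jiving")
'''.splitlines()

counts = count_strokes(sample_log, {"cat", "together"})
for word, stroke_counts in counts.items():
    print(word, stroke_counts.most_common())
```

Swapping in a different wordlist (a medical one, say) only changes the set passed as the second argument.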

If you're interested, you can look over the reports for my counts (arranged by commonality of translation in English) and stats (arranged by the number of times each stroke appears in the logs). My log files go from last November to yesterday, and the parser found that I used 6,000 of the 10,000 words included in the wordlist over that time.

Why is this useful, beyond just idle curiosity? A lot of steno beginners get confused by the number of misstrokes in the default Plover dictionary, especially when using lookup apps like StenoTray. They're not able to quickly distinguish a "canonical" stroke from a "misstroke", and they worry about learning the wrong one. Now, I'm not sure that's as big a problem as some people think it is; in my opinion, the only thing that distinguishes a misstroke from a brief is how easy it is to memorize. I don't want to take misstrokes out of the default dictionary, because they're very useful to have in there once you get a little speed under your belt.

But for people who want to learn how steno works by learning the "canonical" strokes, this is a really easy way of separating the strokes I use deliberately day after day after day from the ones that come from a random, occasional slip of the finger, even though they translate just as correctly. So all we have to do is scrape the most used stroke from that list of 6,000 common words, and we've built ourselves a clean pedagogical dictionary for beginners who are intimidated by the full messy scope of the official Plover dictionary. It'll also be useful for building levels in Steno Arcade.
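The "scrape the most used stroke" step could look something like the sketch below. Again, this is only an illustration under assumptions: `canonical_dictionary` is a hypothetical helper, the counts are invented, and the output is written as a stroke-to-translation JSON mapping on the assumption that this is the shape you'd want for a Plover-style dictionary.

```python
import json
from collections import Counter


def canonical_dictionary(counts):
    """Keep only the single most-used stroke for each word.

    `counts` maps word -> Counter of strokes. The result maps
    stroke -> word. If two words share a top stroke, the later
    word overwrites the earlier one.
    """
    result = {}
    for word, stroke_counts in counts.items():
        if stroke_counts:
            stroke, _ = stroke_counts.most_common(1)[0]
            result[stroke] = word
    return result


# Invented counts for illustration only.
sample_counts = {
    "cat": Counter({"KAT": 40, "KAOT": 2}),
    "the": Counter({"-T": 300, "THE": 5}),
}

beginner_dict = canonical_dictionary(sample_counts)
print(json.dumps(beginner_dict, indent=2, sort_keys=True))
```

The occasional slips ("KAOT", "THE" in this toy data) drop out simply because they lose the popularity contest, which is the whole trick.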

If you want to scrape your own logs and get a snapshot of your own writing style, feel free to use Ted's script and let us know what you come up with! I think it's a very cool project.

1 comment:

Achim63 said...

I tried it today – really interesting and helpful. One of my most used strokes turned out to be the one that calls my "dictlook" script, which tells me the correct stroke for a word after I've typed it using the left-hand alphabet.
But I already covered over 500 of the top 10,000 words in just one day's practice (I turned on logging just to try it out).