Friday, June 25, 2010

CART, Court, and Captioning

What Is Steno Good For?

Part One: How to Speak With Your Fingers
Part Two: Writing and Coding
Part Three: The Ergonomic Argument
Part Four: Mobile and Wearable Computing
Part Five: Raw Speed
Part Six: CART, Court, and Captioning

Finally, the sixth and last installment of my What Is Steno Good For? series. The first five sections dealt with using steno in daily life, for conversation, prose composition and coding, injury prevention, typing while walking, and inputting text as efficiently as possible. Plover is being developed primarily with those five spheres in mind.

This section is different. It focuses on people who actually want to make a living as court reporters, CART providers, or captioners. It's also the category that the majority of the Plover Project's current testers, readers, and commenters belong to. In order for Plover to succeed, that proportion needs to change.

Steno as a career is skyrocketing. Official reporters (the ones who work in actual courtrooms) are facing layoffs, but in every other field -- deposition work, captioning, and CART -- there's far more demand than supply. Rates are relatively high (though down considerably from their peak in the '90s, and gradually continuing to decline) and work is plentiful. Certified realtime stenographers can make six figures a year, while setting their own schedules and maintaining autonomy as independent contractors. It's pretty much a dream job.

Steno as an academic-vocational discipline is dying. Steno schools continue to shut down across the country. The national dropout rate is 85%. Student machines cost over $1,000, and DRM-riddled student software runs about $500, so without even considering tuition, students are forced to pay a largely non-refundable $1,500 right out of the gate. Considering the 15% graduation rate and the variable length of study (which ranges from 1 to 6 years, but averages around 4 years of intensive daily practice to reach graduation speeds of 225 WPM), steno school is a fool's gamble for the vast majority of new students. Most schools are for-profit, so it's in their interest to accept large numbers of theory students, selling them their steno machines when the semester starts and buying them back at a steep markdown from the dropouts, who tend to leave around 120 WPM, just in time for the next crop of theory students to arrive. There's no incentive for schools to screen for English aptitude, physical dexterity, or self-discipline, because the students who are all but doomed to fail are potentially even more lucrative than the successful ones, thanks to the revolving steno machine sale-and-buyback scheme. This means plenty of profit in the short term, but in the long term it spells the death not only of these short-sighted schools, but of the steno professions themselves.

A market in which demand exceeds supply will hold out only so long. Eventually the vacuum created by the shortage of stenographers will be filled, and inferior but readily available substitutes such as electronic recording, undertrained voice writers, and non-verbatim notetaking systems will move in to claim the territory. Compounding the problem is that many people think the career is less than a decade away from obsolescence; 30 years of Star Trek has put the idea into their heads that artificial intelligence is a nut we're close to cracking, and that a computer that can understand and transcribe everything we say to it is just around the corner. I've got lots and lots to say on this one, but let me just lay out the short and sweet version, and you can either take my word for it now or wait for the long argument to come later. (You might also want to read this article for some of the technical details.)

Without true artificial intelligence, there is no reliable speech recognition. Current speech recognition software works relatively well with good audio, clear speakers, and a somewhat restricted vocabulary. Dictation at 160 WPM or less can give good results, especially if the speaker puts in the effort to train themselves and their software, and provided that they have the luxury to stop the dictation and correct any errors made by the software before continuing. In real-life situations, where the speaker being transcribed can't be induced to slow down, correct errors, or enunciate perfectly in American-accented English -- even with an intermediary "respeaker" repeating the dictation directly into a microphone, inserting punctuation, and correcting errors on the fly -- the software's verbatim realtime accuracy is significantly below that of a trained stenographer. The only respeakers who even approach the accuracy of realtime steno are true voice writers, who spend thousands of hours training their voices, figuring out ways to differentiate the pronunciation of homophones, and creating macros to resolve mistranscriptions. It is not easy to do. I compare true voice writing to beatboxing and steno to playing a drumset in my article Voice Captioning Versus CART. You can read it if you're interested in that sort of thing.

The trouble is that everyone keeps saying "Voice recognition software is constantly improving. It gets better with every new release. Soon it'll be perfect." The first two statements are correct. The third is a fallacy. The software is improving, but asymptotically. Its theoretical ceiling of improvement is far below what's required for consistent, reliable transcription. Speech recognition software doesn't parse language the way humans do. It has no ability to use context or meaning to change sounds into words. It records audio waveforms, breaks them up into little bits, and compares them to a database of other audio waveforms. It never finds a perfect match, because no two humans say the same word in exactly the same way, and no one human says it the same way every time. Instead, it tries to choose the closest match in its database of thousands of other tiny fragments of audio. All speech recognition software relies on probability-based algorithms to guess at what's being said. This means that the more common the phrase, the more variants of it will be found in the database, and the more likely it is to be transcribed correctly.

But the converse is also true. In the architecture class I provide CART for, the phrase "sum of the forces" comes up several dozen times a week. But because the phrase "some of the" is so much more common in normal speech than "sum of the", the VR software would mistranscribe it unless the voice writer figured out a way to say "sum" that sounded completely different from the word "some" and defined it as a custom waveform. There are scads of these soundalike words and phrases in the English language, and the voice writer is at a disadvantage when trying to distinguish them. The steno writer has a number of options to resolve homophone conflicts or to compress a wordy phrase into a single stroke. They can add the asterisk, they can alter the vowels, or they can take a cue from the way the word is spelled. It's much harder for a voice writer to find an alternative way to pronounce a word or syllable, because not only must they pronounce it consistently so that the computer can recognize it each time, but it also can't sound like any other words or syllables that they might be called upon to speak. It's much easier to write a memorable nonsense syllable on the steno keyboard than it ever would be to speak it.
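To make the "sum of the" problem concrete, here is a toy Python sketch of a probability-based choice between two transcriptions that sound identical. It is not how any real speech engine is built: the phrase counts and acoustic scores are invented, and `pick_transcription` is a hypothetical helper. It only illustrates why, when the audio can't tell two phrases apart, the more common phrase wins.

```python
# Toy model: pick between transcriptions that sound identical,
# using how often each phrase occurs. All numbers are invented.
PHRASE_COUNTS = {
    "some of the": 50_000,  # hypothetical count in everyday speech
    "sum of the": 40,       # hypothetical count; rare outside math and engineering
}


def pick_transcription(acoustic_scores, counts):
    """Return the candidate with the highest acoustic score times phrase frequency.

    acoustic_scores maps each candidate phrase to how well it matches the audio
    (0.0 to 1.0). Frequencies are normalized over the candidates only.
    """
    total = sum(counts.get(phrase, 0) for phrase in acoustic_scores)

    def score(phrase):
        prior = counts.get(phrase, 0) / total if total else 0.0
        return acoustic_scores[phrase] * prior

    return max(acoustic_scores, key=score)


# The lecturer said "sum of the", but "sum" and "some" sound the same,
# so both candidates match the audio equally well.
acoustic = {"some of the": 0.9, "sum of the": 0.9}
print(pick_transcription(acoustic, PHRASE_COUNTS))  # prints: some of the
```

In these terms, the voice writer's workaround amounts to changing the acoustic scores: pronouncing "sum" so distinctively that "some of the" no longer matches the audio well.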

There's also the inherent uncertainty involved in decoding analog speech with a digital algorithm. Even with good amplification, the signal is always lossy to some extent, and the speech processing algorithms are essentially a black box that weighs relative probabilities and then just spits out the most likely candidate, without being able to incorporate any semantic or contextual calculations. The voice writer is never quite sure what the machine is going to make of what they said, and no matter how cleanly they speak, they're forced to build a lot more error-correction time into their transcription process. Steno writers can write a word in half a second that took the speaker three seconds to say, and they know with certainty what will come up on the screen when they hit a particular chord. That's an advantage a voice writer will never have. Add in that a voice writer has to speak at the same time that they're trying to listen, and you see some of the difficulties they labor under.
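By contrast, a steno engine's output is a lookup rather than a guess. Here is a minimal Python sketch, assuming a simple table from chord to text; the outlines in `STENO_DICT` and the `translate` helper are hypothetical and not taken from Plover's actual dictionary or code. A vowel change or an asterisk gives each soundalike its own chord, so the same keys always produce the same word.

```python
# Hypothetical steno dictionary: each chord (the keys pressed together)
# maps to exactly one output. These outlines are illustrative only.
STENO_DICT = {
    "SOPL": "some",
    "SUPL": "sum",     # a different vowel separates the homophone
    "SO*PL": "Somme",  # an asterisk marks yet another soundalike
    "-F": "of",
    "-T": "the",
}


def translate(strokes):
    """Translate a sequence of chords by plain dictionary lookup.

    Unknown chords are passed through as-is, so the output is fully
    determined by the keys pressed: no probabilities involved.
    """
    return " ".join(STENO_DICT.get(stroke, stroke) for stroke in strokes)


print(translate(["SUPL", "-F", "-T"]))  # prints: sum of the
print(translate(["SOPL", "-F", "-T"]))  # prints: some of the
```

Real steno engines also handle multi-stroke words, prefixes, suffixes, and spacing, which this sketch ignores; the point is only that the mapping from chord to text is fixed in advance.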

There are some excellent voice writers out there, and I don't want to devalue their talent or the enormous amount of training that goes into the process of achieving accurate verbatim realtime using VR software. On the contrary: I think if people realized how much work it takes to do the job properly with the voice, they might balk a lot less at the idea of learning to do it with their fingers. Unfortunately, the shortage of CART providers, captioners, and court reporters has led to a widespread practice of companies hiring untrained voice writers, deciding that their output is good enough, and dropping both standards and wages accordingly. It's a sad situation.

Because voice recognition is perceived to be so much easier than it really is, and because learning it only requires about $200, a microphone, and a computer, it's much easier to find people willing to give it a chance. After all, if it doesn't live up to their expectations, they're only out $200, rather than the $1,500 albatross steno school dropouts find themselves trying to unload. Imagine if computer programming required a special computer that couldn't connect to the internet or run games or do anything else except write computer software, and that it sold for $1,500. What do you think the state of software development would look like? Maybe some rich kids' parents would buy them the machine, but they'd probably rather their kids become doctors or lawyers than programmers, since programming would be a lot of work for not much prestige. Poor kids would be completely out of luck. Middle class kids might think that programming sounded fun, but they'd probably decide it wasn't worth the steep entry cost. A few people might decide that programming was their best shot at making a good living, so they'd scrimp and save and take out loans to buy the special programming computer plus the lessons to go with it. And after all that, what if they didn't like programming? What if they didn't have an aptitude for it? They'd be out $1,500 and a lot of wasted effort. What kind of smart, inquisitive, curious kid would make that kind of gamble? What would the field of computer programming look like if this were the only way to write software?

It's the state of steno today, and I'm worried that if it goes on for much longer, the discipline will die out altogether. The only way we can build the next generation of realtime reporters, captioners, and CART providers is if we get people using steno for all sorts of purposes -- not just the ones that will make them an immediate profit. Once there's a pool of amateurs and enthusiasts all using steno in their daily lives, it will be evident how useful it can be and how outdated the qwerty interface has become. Kids will start learning it in their typing classes. Companies will start selling steno machines (hopefully ultra-portable ones!) at consumer prices. People who would feel awkward talking to themselves in public via VR software will embrace steno as the most efficient way to put their thoughts into words.

All of this holds true even if they're only writing at 120 words per minute. It took me a year and a half to graduate from steno school. In that time, I noticed that most of my fellow students dropped out when they were writing between 120 and 225 words per minute. Relatively few of them dropped out before their third semester. They would make fairly steady progress through theory and up to 120 WPM, then plateau. It seems that nearly anyone can get up to 100 WPM or so in less than six months, but closing the gap between 100 and 200 takes much more work. You don't need to write at 225 WPM to reap the advantages of steno. Even 120 WPM is double the average qwerty typing speed, and steno has significant ergonomic benefits as well. Users can overtake their qwerty speed within the first few months of use, then gradually work their way up to higher speeds while using steno to perform their daily tasks, rather than spending 10 hours a week in grueling, boring dictation classes.

Inevitably, some of these people will find they have both a passion and a talent for steno. They'll push themselves to go faster and faster, and eventually they'll arrive at court/CART/captioning speeds. Much like programmers do today, they'll start out tinkering around with the free software, discover an aptitude for the system, possibly spend some time in a formal program polishing their technique, and realize one day that they're skilled enough to take paying work. These people are the future of our profession, and right now they hardly know it exists. The only way people will bother to learn steno is if the software is free, the steno machine costs less than $100, and the lessons are available online. The Plover Project is an attempt to meet those goals, and to secure the future of the work that I love.


Jenni said...

Great article! I agree that something has to be done to bring people into the industry, and what you're proposing might just be it! I linked to your blog from my new steno student blog, hope word about this gets around. We can all take it to our personal social networks and see what happens. I haven't tried your software yet, but I'd like to.

Tony said...

Mirabai, your observation about people's Star Trek-like expectations of capabilities of computers vis-a-vis speech reminds me of what Geoffrey Pullum, a well-known linguist, wrote about his attempt to find a short, easy-to-explain definition of linguistics for lay people. He decided to start telling people that linguistics is the sort of science that you would learn if you wanted to get computers to be able to understand and produce ordinary human language. He was surprised to find that many people's reaction to this was: "Hasn't that already been done?"

NeverLNG said...

That is really very interesting.

I tried reading your transcribed dictation to my own DragonNaturallySpeaking that I have been using for several months for medical transcription, and it was really almost entirely accurate up to maybe 160 WPM.

The Google speech recognition program is just a hoot -- and essentially useless.

I do agree, that speech recognition DNS style has a long way to go before it is generally applicable, and probably can't really be truly useful without putting a human into the computer box, but it works fairly well for me in a very limited application.

(I typed this on QWERTY keyboard -- I didn't try to dictate it)


Then I just read what I typed into DNS -- this is the output, without corrections -- it is close enough to be useful, but it missed "hoot."


That is really very interesting.

I tried reading your transcribed dictation to my own Dragon NaturallySpeaking that I have been using for several months for medical transcription, and it was really almost entirely accurate up to maybe hundred and 60 words per minute

The Google speech recognition programs just and -- and essentially useless.

I do agree that speech recognition DNS style has a long way to go before it is generally applicable, and probably can't really be truly useful without putting a human into the computer box, but it works fairly well for me in a limited application

(I type this on the QWERTY keyboard -- I didn't try to dictate)

Tom Duncan

Mirabai Knight said...

That really is quite good, Tom. Yes, it seems like for dictation, DNS can be very useful if properly trained. It's only when people insist that the leap from trained single voice dictation to multivoice natural language transcription is imminent and inevitable that I get skeptical. They're two very different settings, and the former is much easier to teach a computer how to do than the latter is.

need a second chance said...

I am trying to re-enter the field and would love a free or reasonable (no huge down payment) writer to start building my speed again. I can make monthly payments of $60. I eventually would like to get into captioning.

Anonymous said...

@need a second chance:
I was an RPR-CM many years ago, and I want to re-enter the field too. I purchased some used equipment on eBay and it should arrive next week.

Other than preparing for the RPR again, I am not sure what I will do after that. I'm interested in hearing about your plan. dsr

Michael said...

Just for the record, people think translation is also a solved problem. And it ain't, not by a long shot, for the same reasons you list for transcription.

Anonymous said...

I think what you are doing is wonderful. I am interested in seeing if I can get into the CART field (passed two 225 Q&A many many years ago; then did not want to work in the field). What may stymie me is that the requirements for CART or Captioning are much stricter than for taking down a dep not using realtime. So for the novice, they can get good clean copy with Dragon at 160 to 180 wpm easily. With steno it will not be easy to get to 160wpm or 180. Even if I get to 225 this time, it is unlikely I will meet the requirements for CART or captioning. Only 30 percent have the capability of reaching the speed to begin with; writing for CART or Captioning, I believe the capability may be 10 percent of the 30 percent who do graduate. You are truly one of the very elite, talented few who have the capability to work in CART, as evidenced by your spectacularly short time spent in court reporting school before graduation. You are a natural. Very, very few writers, even those capable of graduating, can do what you do. That being said, I think your idea of bringing steno to the masses at a price they can afford is a labor of love they will be rewarded one day.

Anonymous said...

Oops... typing on my crappy netbook. I meant "a labor of love THAT will be rewarded someday."

P.Ray said...

Is this software capable of doing live broadcasting for captioning? I am a CART writer and I am trying to transition over to captioning but the software is $6995. I am just overwhelmed with all the prices of everything.

Mirabai Knight said...

Plover is great for CART, but I'm afraid it's not set up to do television broadcast captioning. Sorry!

Unknown said...

How well does this software work for doing depositions and producing transcripts? Is it capable of creating cover pages, auto indexing, certificate page at the end, etc?

Mirabai Knight said...

Plover does none of those things. It's simply a conduit from your steno machine to your operating system. You'll need to use a text editor or word processor to do whatever transcript production work you need. I know nothing about transcripts, since I've never done any court reporting (and hope I never have to!), so I made a decision to keep Plover as streamlined as possible. It works beautifully for realtime captioning, but for broadcast captioning or transcript production, you might want to choose another steno engine.

Unknown said...

Ok. Thank you for the quick response :-)

IHateGoogle said...

As a voice writer wannabe and new Dragon nerd, I have learned how to overcome those homophone issues, e.g. "Sum of all things." Go into your vocabulary editor, erase "SUM", replace it with Sum-CO, train it, and now every time you need to say SUM, you speak as SUM-COM and you will always get SUM.

Alternatively, you can create a conflict resolution in EclipseVOX and allow the AI to solve these issues.

I think that Steno writing is awesome but the learning curve is HUGE. I think that voice writing is much harder than people believe but if a person trains hard they can get to the 180s in 3-6 months.