SpeechTEK 2011

in IVR Technology

Last month featured the annual SpeechTEK conference in New York City, where I spoke about the latest in “IVR with a Brain” at one of the sessions.  Several key themes resonated through the conference: multi-modal integration, mobile devices, and the changing face of call centers.

Continuing a trend from last year, there is increased focus on newer communication channels such as web, email, social media, and mobile apps. Not only are companies putting more effort into supporting these channels, they are also looking for more integration between them. Ideally, all channels would be aware of each other: a customer’s recent web activity or text message would be taken into account when they call the call center or use a mobile app.

Most of the major vendors are touting such capabilities, at least in the abstract. In reality, however, almost no one can justify the resources to make such tight integration happen anytime soon. Let’s face it: many companies don’t even have decent CTI (putting their customers through ‘repeat info’ hell), and almost everyone has such a tremendous backlog of mundane projects (improving websites or phone apps, upgrading phone systems, logistics integration) that forward-thinking integration of this kind isn’t on any immediate project plans.

With the fantastic growth of smartphones and tablets, speech recognition innovation is seeing a major increase in adoption (see my report on the Mobile Voice conference). Free and super low-cost recognition options for mobile devices are becoming the norm. While these capabilities are powerful enablers for ‘voice search’ and ‘voice commands’, they will not replace sophisticated, conversational speech IVRs. Mobile voice apps are typically designed to work alongside visual and touch interfaces. They are not designed for general-purpose hands-free conversation, nor do they integrate with standard phone channels.

One key effect of this shift to mobile is that R&D in conventional IVR speech recognition (VXML-based) seems to be slowing to a crawl. I didn’t hear any new technology announcements in this area. This brings me to ‘the changing face of call centers’.

Several speakers mentioned that as support is being shifted from phone to the newer channels, call volumes are not actually dropping (yet). Many users are struggling with web and phone-based applications and end up using the phone for help. One example cited that 57% of calls into the call center were from users that had tried to self-serve via other modalities. Now, these problems are expected to diminish over the next few years as the new applications improve, and as users become more adept at using them. Total call center seats are expected to start declining slightly by around 2014.

Here’s the kicker: simpler, more routine requests will be self-served via web and phone apps. Phone support will increasingly be reserved for more complex calls.

The implications for IVRs are that unless they become more intelligent, they will struggle with the new types of support calls. True natural language IVRs ‘with a brain’ will be needed to achieve good automation rates and customer satisfaction.

Self-Service Rate Versus per Minute Cost

The operational cost of an IVR, such as per-minute charges for a hosted solution, is much less important than its self-service rate (i.e. the percentage of IVR calls that complete a key transaction without operator assistance). Note that this is quite apart from the crucial customer satisfaction benefits offered by a more effective IVR.

To illustrate the difference in savings I’ll compare a typical scenario across three different IVR types: touch tone, simple speech IVR, and intelligent, natural speech IVR.

Assumptions: 100,000 calls per month. Average call length is 3 minutes. Fully loaded operator cost is $0.80 per active phone minute (from research showing an average per-call cost in the United States of $4.80 for a 6-minute call). Note that for this simple example additional factors such as transfer costs, savings from up-sells, save-the-sale, and ‘screen pops’ have not been taken into account (see my blog entry on ROI).

                        | Touch-Tone IVR | Simple Speech IVR | Intelligent Speech IVR
Self-Service Rate       | 10%            | 20%               | 40%
Per Minute Cost         | 7 cents        | 12 cents          | 17 cents
IVR Cost (monthly)      | $2,100         | $7,200            | $20,400
Agent Savings (monthly) | $24,000        | $48,000           | $96,000
Net Savings (annual)    | $262,800       | $489,600          | $907,200
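
For readers who want to check the arithmetic, here is a minimal Python sketch that reproduces the figures above. It rests on one assumption that is inferred from the numbers rather than stated in the text: the per-minute IVR charge is applied only to the minutes of calls that are fully self-served. The class and function names are purely illustrative.

```python
from dataclasses import dataclass

CALLS_PER_MONTH = 100_000
MINUTES_PER_CALL = 3
AGENT_CENTS_PER_MINUTE = 80  # fully loaded operator cost from the assumptions


@dataclass
class IvrOption:
    name: str
    self_service_pct: int      # percent of calls completed without an operator
    ivr_cents_per_minute: int  # hosted per-minute charge


def monthly_figures(option):
    """Return (IVR cost, agent savings) in dollars per month."""
    # Assumption inferred from the table: only self-served minutes are billed.
    self_served_calls = CALLS_PER_MONTH * option.self_service_pct // 100
    self_served_minutes = self_served_calls * MINUTES_PER_CALL
    ivr_cost = self_served_minutes * option.ivr_cents_per_minute / 100
    agent_savings = self_served_minutes * AGENT_CENTS_PER_MINUTE / 100
    return ivr_cost, agent_savings


def annual_net_savings(option):
    """Agent savings minus IVR cost, over twelve months, in dollars."""
    ivr_cost, agent_savings = monthly_figures(option)
    return 12 * (agent_savings - ivr_cost)


OPTIONS = [
    IvrOption("Touch-Tone IVR", 10, 7),
    IvrOption("Simple Speech IVR", 20, 12),
    IvrOption("Intelligent Speech IVR", 40, 17),
]

for option in OPTIONS:
    cost, savings = monthly_figures(option)
    print(f"{option.name}: IVR cost ${cost:,.0f}/month, "
          f"agent savings ${savings:,.0f}/month, "
          f"net ${annual_net_savings(option):,.0f}/year")
```

Working in whole cents and whole percentages keeps the results exact, so they can be compared directly with the table.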

Because IVR cost is only a small fraction of the cost of a live agent, increasing self-service rate has a much bigger impact than lowering per-minute cost. In the striking example below, even with a 42% higher per-minute cost, a self-service rate just 5 percentage points higher yields better overall cost savings.

                        | Simple Speech IVR | Intelligent Speech IVR
Self-Service Rate       | 20%               | 25%
Per Minute Cost         | 12 cents          | 17 cents
IVR Cost (monthly)      | $7,200            | $12,750
Agent Savings (monthly) | $48,000           | $60,000
Net Savings (annual)    | $489,600          | $567,000
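
A natural follow-up question is how far the self-service rate has to rise before the more expensive IVR breaks even. The sketch below is my own back-of-the-envelope calculation, not part of the original comparison. It assumes, as the figures above imply, that IVR minutes are billed only for self-served calls, so call volume and call length cancel out. The function name is hypothetical.

```python
def break_even_rate(base_rate, base_cost, new_cost, agent_cost=0.80):
    """Self-service rate at which the pricier IVR matches the cheaper one.

    Net savings are proportional to rate * (agent_cost - ivr_cost), so we
    solve  new_rate * (agent_cost - new_cost)
         = base_rate * (agent_cost - base_cost)  for new_rate.
    All costs are dollars per minute; rates are fractions (0.20 = 20%).
    """
    if new_cost >= agent_cost:
        raise ValueError("IVR minute must cost less than an agent minute")
    return base_rate * (agent_cost - base_cost) / (agent_cost - new_cost)


rate = break_even_rate(base_rate=0.20, base_cost=0.12, new_cost=0.17)
print(f"Break-even self-service rate: {rate:.1%}")
```

Under these assumptions the 17-cent IVR only needs to lift self-service from 20% to a little under 22% to pay for its higher price, which is why the 25% case comes out comfortably ahead.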

There are many IVR applications where a speech IVR can more than double completion rates compared to a touch-tone IVR. A well-designed and well-tuned intelligent speech IVR using natural language can do much better still. We have examples of Smart Call Agents providing completion rates 5, 10, or even 15 times better than conventional IVRs.

Using the best possible IVR technology not only provides by far the best savings (irrespective of per-minute cost) but it also optimizes customer satisfaction, first time resolution, and additional revenue.

Infinite ROI?

Strictly speaking, ROI (Return on Investment) is the net gain (savings) over a specified period (usually a year) divided by the investment amount, expressed as a percentage. For example, an investment of $100,000 that yields a net benefit of $50,000 per year has an annual ROI of 50%. Speaking loosely, ROI may also refer to the investment payback period, e.g. two years in the example above.
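
The two definitions can be written down in a few lines. This is just a sketch of the arithmetic in the example above, with function names of my own choosing:

```python
def annual_roi_percent(investment, annual_net_benefit):
    """Annual ROI: net yearly benefit divided by investment, as a percentage."""
    if investment <= 0:
        # With zero investment the ratio is undefined ("infinite" ROI).
        raise ValueError("investment must be positive")
    return 100.0 * annual_net_benefit / investment


def payback_years(investment, annual_net_benefit):
    """Payback period: years of net benefit needed to recover the investment."""
    if annual_net_benefit <= 0:
        raise ValueError("annual net benefit must be positive")
    return investment / annual_net_benefit


print(annual_roi_percent(100_000, 50_000))  # 50.0
print(payback_years(100_000, 50_000))       # 2.0
```

Note that the ROI ratio grows without bound as the investment shrinks toward zero, while the payback period shrinks toward zero.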

What are key ROI factors for IVRs?

Potential investment expenses:

  • IVR Equipment, software and licenses
  • PBX upgrades
  • IVR design, configuration and testing
  • Interface and integration development

Factors that make up net operational savings

  • Reduced call center expenses (fewer operators, smaller size)
  • Increased revenue (save-the-sale, upsell, etc.)
  • Gains through better customer satisfaction
  • Less: IVR operational cost (including direct costs and maintenance)

Some in-the-cloud IVR solutions feature near-zero investment costs (just minimal in-house IT time), offering nearly ‘infinite’ ROI. Here the ROI metric simply becomes the dollar amount of (annual) savings, and the payback period is often measured in days!

IVRs requiring only insignificant investment also offer a much lower implementation risk (the risk that the system will not meet the expected return, or will damage customer loyalty). The main considerations come down to actual monthly savings and customer satisfaction. In a future post I’ll analyze how transaction cost and completion rate affect savings.

Measuring IVR Performance

Tricky.

IVR performance is generally very hard to compare, and sophisticated self-service applications are particularly hard to evaluate accurately. There isn’t just one “success rate”.

How can 27% of calls staying within the IVR be much worse than 18%? How can 30% call hang-ups be good? Or 95% operator transfers be excellent? See answers below, but first let’s start with a few definitions:

Containment Rate (CR) – Percentage of calls that do not end up with a live operator. Note that this includes all hang-ups (irrespective of whether the caller completed any or all tasks, simply hung up at the start, or the call just dropped).  This is one of the simplest measures, but also the crudest.  It does not differentiate between satisfied and irate callers.  Unfortunately, it is the most common measure.

Opt-Out Rate (OOR) - Percentage of calls where the caller either immediately asked for an operator, or simply hung up. These are the people who simply ‘don’t want to play’, either because they ‘just hate these things’ (understandable) or because something in the introduction turned them off or encouraged them to press ‘0’ (like the explicit offer to do so!)

Task Completion Rate (TCR) - Percentage of specified tasks (or sub-tasks) that are completed without operator intervention. Properly done, this can be an excellent measure of success.  However, it is hard to meaningfully define all worthwhile sub-tasks, and to accurately measure their completion.  One key problem is knowing whether a hang up late in the task signifies success or not (see ‘Imponderables’ below).  Another issue: Was the caller’s intent correctly identified, or did they actually receive meaningless information?

BizRule Transfer Rate (BTR) - Percentage of calls that are automatically transferred when the system comes across a specified transfer condition. This can be as simple as automatically identifying a customer by caller ID, and transferring them because their account is flagged for ‘fraud’. Or it can be as complex as successfully providing comprehensive order status information and changing several orders, before finally transferring to a live agent to complete an up-sell. A BTR call will generally be considered a successful IVR transaction. However, often these call types will suggest additional areas for automation.

Handled Rate (HR) - Percentage of calls that were handled correctly given the design of the system. This is the complement of Error Rate (ER), the percentage of calls that did not complete in the IVR because of some error, including hang-ups and transfers due to misrecognition or confusion. One can only get an approximate number for this because the caller’s reason for exiting the IVR will not always be apparent.
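
To make the definitions concrete, here is a rough Python sketch that computes all five rates from a list of classified calls. Everything about the data model is an assumption made for illustration: the `Call` record, the `ExitType` labels, and the ten sample calls are hypothetical, and the sketch presumes each call has already been classified, which (as discussed below) is the hard part in practice.

```python
from dataclasses import dataclass
from enum import Enum


class ExitType(Enum):
    OPT_OUT = "opt_out"            # asked for an operator at once, or hung up at the start
    NORMAL = "normal"              # call ran its course inside the IVR
    BIZRULE_TRANSFER = "bizrule"   # transferred by a designed business rule
    ERROR = "error"                # hang-up or transfer caused by misrecognition/confusion


@dataclass
class Call:
    exit_type: ExitType
    reached_operator: bool
    tasks_attempted: int = 0
    tasks_completed: int = 0   # completed without operator intervention


def pct(part, whole):
    return 100.0 * part / whole if whole else 0.0


def ivr_metrics(calls):
    total = len(calls)
    count = {e: sum(1 for c in calls if c.exit_type is e) for e in ExitType}
    error_rate = pct(count[ExitType.ERROR], total)
    return {
        "containment": pct(sum(1 for c in calls if not c.reached_operator), total),
        "opt_out": pct(count[ExitType.OPT_OUT], total),
        "task_completion": pct(sum(c.tasks_completed for c in calls),
                               sum(c.tasks_attempted for c in calls)),
        "bizrule_transfer": pct(count[ExitType.BIZRULE_TRANSFER], total),
        "error": error_rate,
        "handled": 100.0 - error_rate,   # complement of the error rate
    }


# Ten hypothetical calls, purely for illustration.
sample_calls = [
    Call(ExitType.OPT_OUT, reached_operator=True),
    Call(ExitType.OPT_OUT, reached_operator=False),
    Call(ExitType.NORMAL, False, tasks_attempted=1, tasks_completed=1),
    Call(ExitType.NORMAL, False, tasks_attempted=2, tasks_completed=2),
    Call(ExitType.NORMAL, False, tasks_attempted=1, tasks_completed=1),
    Call(ExitType.BIZRULE_TRANSFER, True, tasks_attempted=1, tasks_completed=1),
    Call(ExitType.BIZRULE_TRANSFER, True),
    Call(ExitType.BIZRULE_TRANSFER, True, tasks_attempted=1, tasks_completed=1),
    Call(ExitType.ERROR, True, tasks_attempted=1, tasks_completed=0),
    Call(ExitType.ERROR, False, tasks_attempted=1, tasks_completed=0),
]

for name, value in ivr_metrics(sample_calls).items():
    print(f"{name}: {value:.0f}%")
```

Even in this toy data the rates diverge: containment is only 50% while the handled rate is 80%, because the business-rule transfers count against containment but are successful by design.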

Imponderables

Several ‘imponderables’  confound efforts to accurately categorize all call outcomes.  These include:

Dropped Calls – One doesn’t know if these were deliberate or not.

Late Hang-Ups – Often one cannot determine if the caller hung up because he had completed the task, or because he gave up.

Operator After Finish – If the caller requests an operator after completing a task (such as order status), it is not obvious if the caller wants to perform additional functions not provided by the IVR (such as, for example, cancellations), or if the original task was not performed satisfactorily.  A good, integrated CRM system can help to answer such questions, but this task is not trivial.  Intelligent natural language call agents can also help by asking the caller for a reason for transfer, and creating a (text) transcript of the conversation.

Customer Containment Priority – A customer can decide how much they want to try to ‘contain’ their callers.  On the one hand one can have what I call easy ‘Courtesy Transfers’ – proactively offering operator assistance when the IVR detects that the caller is struggling.  On the other hand, the system can be designed to make it nearly impossible to reach an operator.

Another question is whether a caller who hangs up on the IVR will simply call back and somehow select an operator option, or resolve the issue by going to the web-site, or go to a competitor.  Analysis of repeat callers (from caller ID) can help to answer these questions.

Questions

Getting back to the questions at the beginning, a 27% containment rate can be much worse than an 18% rate if the performances are associated with task completion rates of 4% and 16% respectively.  In fact, a recent IVR replacement my company implemented improved the task completion rate from under 4% to 23% – almost a 6-fold improvement – while the containment rate only increased from 20% to 27%.  Early hang-ups declined from 20% to 4% (incidentally, in this case the relatively low containment rates are caused by very high business transfer rates).

Likewise, a 30% hang-up rate can be good if these hang-ups occur near the end of a task, i.e. the caller has already received the information they needed. Operator transfers of 95% will be regarded as ‘excellent’ if the IVR’s job was to identify and verify the caller and their intent and then route the call to the appropriate live operator.

Handled Rate is probably the best simple measure of success. But there are no metrics that can be perfectly measured and meaningfully compared even within an application category, such as Order Status.  Success needs to be measured for each application based on requirements defined by the client.

As I said, “It’s tricky”.

Mobile Voice Conference

Last week I attended an informative Mobile Voice conference in San Jose. This year’s focus was on smart phones, multi-modal communication, and cloud-based services.

With smart phones projected to make up more than half of all mobile phones by 2013, an obvious topic was “Is speech self-service dead?” Unsurprisingly, my presentation at the conference concluded that ‘its death has been greatly exaggerated’. In short, there are many situations (walking, driving) and applications (conversational) that don’t lend themselves well to screen/keyboard communication. Also, people often lack the dexterity or visual acuity to comfortably use these smart phone features, or simply do not like using them.

On the flip side, companies are highly motivated to try to shift support from live agents to smart phone self-service. Potential cost savings are enormous, and for many applications customers may actually prefer (well-designed) mobile self-service. Participants on one panel discussing this trend remarked how many companies are rushing to implement such solutions oblivious of the cost, and more importantly, of metrics demonstrating their effectiveness. Key problems are that users are reluctant to download self-service apps, may not know how to use them, or may simply forget that they have them.

Speech innovations – The past year or two have shown significant innovations in speech technology and availability. Three key areas are:

o    On-demand speech recognition in the cloud.

o    Massive transaction volume of voice search and transcription has led to vast improvements in speech recognition.

o    Large user demand for ‘visual voicemail’ (auto transcription) plus voice command and search are further driving innovation.

These factors have combined to improve performance and drive down cost. However, one major constraint is the lack of standardization among Apple, Microsoft, Google/Android, and Blackberry. These recognition technologies rely on voice capture at the (higher fidelity) handset end, plus recognition either by an embedded recognizer or by transmission to server-side processing. This has to be largely custom implemented for each application and platform.

Completion rate – A director at Nuance made an insightful comment along the lines of: “Some companies talk about 80 or 90% automation rates, but when you ask them how many people they employ in their call centers they talk about thousands…”. The point is that self-service rates are measured only against those applications that are automated. So why can’t we automate more? His answer was: limitations in natural language capability, lack of real understanding, no goal-directed behavior. I’d add memory, common-sense knowledge, meta-cognition, and a comprehensive built-in library of general conversation skills and knowledge to the list of features missing in almost all speech IVR systems (see previous posts).

This brings me to another topic close to my heart: How to accurately track IVR performance. Measuring success is rather more complicated than it first appears. Maybe I’ll write about that soon…

Types of Speech IVR

The ability for the caller to respond to the IVR by voice in addition to pressing keys started becoming generally available in the mid-90s. While speech recognition promises vastly improved IVR coverage and convenience (e.g. hands-free operation), in general, this has not yet happened. Many speech IVR applications suffer from limitations of the technology employed, as well as poor implementations. I’ll be saying more about these two issues later, but first an overview of currently available technology.

Nuance, the dominant supplier of speech IVR technology, offers three levels or ‘tiers’ of sophistication:

o    Tier 2 – Keyword recognition

o    Tier 3 – Key phrase recognition

o    Tier 4 – Open speech, or natural language, recognition

Keyword recognition provides for the recognition of simple words in response to a prompt. For example, “Please say sales, support or other to select a department”. In this case the system will only recognize and respond to a very limited list of words. Unfortunately, too often Tier 2 speech recognition is simply added to existing touch-tone systems, making for very awkward operation: long and tedious menus such as “Please listen carefully as our menu options have recently changed. Press 1 or say sales for sales. Press 2 or say support for support. Press 3 …”. It is not uncommon for these systems to have 6 or even 9 menu options per level!

Key phrase systems can recognize a range of phrases to communicate a response. Depending on the particular implementation this can be very limited, such as just recognizing ‘I want sales’, or quite sophisticated as in ‘When does United flight 263 arrive?’ These systems require a substantial amount of design, development, and testing to get them to operate well. While Tier 3 offers a more natural and more expressive mode of communication, this flexibility can actually impair performance. A problem common to all speech recognition systems is that, generally, the larger the recognized vocabulary, the lower the accuracy. There are some strategies to counter this undesirable trend.

Natural language speech recognition, as the name implies, theoretically allows the user to communicate in normal English (or another language). For example, the system can ask open questions such as ‘How can I help you?’, ‘How would you like to change your order?’, ‘When would you like the appointment?’ or ‘What have you already tried to get your computer to work?’, and accept free-form answers. This technology differs markedly from keyword/phrase recognition. In ‘open speech’ at least three additional key issues need to be handled:

1.       Training the system with the large list of all possible words that need to be recognized, plus specifying the frequency of common word-pairs and phrases.

2.       Having speech recognition technology that can effectively recognize complete sentences.

3.       Making sense of the recognized sentences – i.e. extracting the meaning in order to correctly respond to user input.
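
To give a feel for what ‘frequency of common word-pairs’ means in the first point, here is a toy Python sketch that counts adjacent word pairs in a handful of made-up caller utterances. It is only an illustration of the idea. The sample sentences are hypothetical, and real recognizers are trained on far larger data sets with their own tooling, which this does not attempt to represent.

```python
from collections import Counter


def word_pair_counts(utterances):
    """Count how often each pair of adjacent words occurs."""
    counts = Counter()
    for utterance in utterances:
        words = utterance.lower().split()
        # zip(words, words[1:]) yields each word with its successor.
        counts.update(zip(words, words[1:]))
    return counts


# Hypothetical caller utterances, for illustration only.
sample_utterances = [
    "I want to change my order",
    "I want to cancel my order",
    "when will my order arrive",
]

counts = word_pair_counts(sample_utterances)
for pair, n in counts.most_common(3):
    print(" ".join(pair), n)
```

Pairs that show up often (such as ‘my order’ here) tell the recognizer which word sequences are likely, which helps it choose between similar-sounding alternatives.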

At this time, with a few notable exceptions, natural language systems are employed to only perform call routing – i.e. ‘How can I help you?’. The key reason for this is the lack of commonly available technology to cost-effectively solve the third issue mentioned above. An effective way to deal with this problem, as well as other crucial needs for advanced IVRs, is to power the IVR with an Artificial Intelligence engine.

In summary, while speech IVRs promise significant advances in call automation, the current reality is that because of technological and implementation limitations, many of today’s systems perform below expectations. Indeed, it is not uncommon for speech IVR projects to be abandoned because of cost overruns, poor performance, and customer dissatisfaction.

What is an IVR?

IVR (Interactive Voice Response) is a rather confusing term. It is applied to a wide range of phone automation systems, from simple touch-tone call routing to highly sophisticated, natural language (plain English) trouble-shooting agents. Let’s look at the key categories in a simple matrix:

Function     | Touch-Tone                          | Key-word or phrase           | Natural Language
Call Routing | ‘Press 1 for sales, 2 for support’  | ‘Say sales or support’       | ‘How may I direct your call?’
Self-Service | ‘Enter order number to get status’  | ‘What is your order number?’ | ‘Which order are you calling about?’

Call Routing or ACD (Automatic Call Distribution) allows the caller to select a department, function, or person at the beginning of the call. Touch-tone, or key-press selection (also called DTMF) is by far the most common method, though spoken selection is now becoming more common. Both of these approaches usually involve navigating a series of menus. This can be confusing and tedious. A significantly more convenient option is to provide natural language recognition. Here you simply tell the system what you want. This is great, provided the system has been implemented correctly! More about this later.

Information Gathering lets the IVR obtain certain routine information about the caller and their request before the call is handed to a live agent. This may include identifying (and verifying) the caller, their account, order, product or serial number, and demographic details. This information is then passed (‘screen popped’) to the operator who handles the actual request. This can shave from 25 seconds to several minutes off live handling time, providing substantial cost savings.

Self-Service refers to an automated service where the IVR handles a complete caller request. This includes functions such as:

  • Providing information – e.g. account balance, order status, flight status, product information or availability, FAQ, etc.
  • Placing, cancelling, changing, and returning orders
  • Managing appointments & bookings
  • Managing work-orders, technicians & jobs
  • Surveys and status updates (such as work or school illness reporting)
  • Product troubleshooting, website support

Some of the above functions can be handled by outbound IVRs – i.e. IVRs that initiate the call to the (potential) customer.

Self-service is generally the most challenging application, and only the simplest calls can be handled by touch-tone IVRs. Next I’ll look at speech IVR technology.

The Future of IVR: Smart Call Agents

As discussed previously, current IVR approaches seem to have reached a dead-end. They don’t handle natural language (except for call routing/‘how can I help you?’), are rigidly scripted, and don’t have any built-in conversation knowledge or skills. Furthermore, development costs rise exponentially with complexity. These factors combine to severely limit their range of applications and success rate. They are not very smart, as any user of current systems will tell you.

A new breed of IVR is starting to appear on the scene. These systems are so different from conventional technology that they probably deserve a new name: ‘Smart Call Agents’. The key difference lies in their conversations being managed by an artificial intelligence engine.

IVRs based on an A.I. engine, designed specifically for the job, offer several crucial advantages:

o  Generic, advanced conversation skills are available ‘out of the box’, dramatically improving functionality while reducing development time and cost

o  Libraries of common knowledge and natural language are inherently available to the Virtual Agent

o  Conversations are more intelligent as the system uses goals, context and memory

o  Database and live operator integration is simplified by the A.I. engine’s built-in capabilities

o  Improvements to the core A.I. engine improve all existing Smart Agents!

Particular vertical markets or functions, such as insurance or appointment scheduling, will require specialized knowledge and skills. Custom Smart Agents can be developed that satisfy these specialized requirements. However, even in these cases many capabilities will be cumulative and shared between individual applications.

It is hard to see how IVR technology can improve without adding intelligence to the equation.

Brain-dead IVRs

My recent visit to the premier annual speech IVR conference, SpeechTEK, confirmed just how stagnant speech IVR development is. Last year, in 2009, the buzz was all about speaker verification (voice prints), analytics, and multi-modal customer service (web, voice, IM, email, smart-phone); this year mobile phones replaced speaker verification as a focus. Although this year’s keynote was about artificial intelligence (A.I.), it did not address the issue of integrating artificial intelligence with IVR. Furthermore, only three conference sessions dealt in any way with ‘advanced dialog’ – i.e. IVR systems that offer intelligent, free-form, open conversation.

The key problem is that the big IVR & speech recognition vendors’ R&D is not at all focused on making IVRs more intelligent and capable. We are essentially stuck with 1990s technology: fully scripted, directed dialog.

The two fundamental approaches to managing IVR conversations are ‘directed’ versus ‘advanced’ or ‘open’ dialog. Directed dialog is the conventional approach used in almost all commercial IVRs. Here, each step of the conversation is explicitly scripted and responses are limited to pre-defined keywords and phrases. Advanced dialog systems, in contrast, are goal directed and only loosely scripted. In addition, they understand natural language (within the design context) and allow the user to interrupt, ask questions, and change the topic. In short, they have a good measure of intelligence.
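
To show how rigid directed dialog is, here is a small Python sketch of a fully scripted flow. It is a toy model, not how any particular IVR platform is built: the `SCRIPT` table, the state names, and the retry rule are all invented for illustration. Each step has one prompt and a fixed list of accepted responses, and anything else is simply not understood.

```python
# Hypothetical two-level script: every step lists the only responses it accepts.
SCRIPT = {
    "main_menu": {
        "prompt": "Please say sales, support or other.",
        "responses": {"sales": "sales_queue",
                      "support": "support_menu",
                      "other": "operator"},
    },
    "support_menu": {
        "prompt": "Please say order status or returns.",
        "responses": {"order status": "order_status_lookup",
                      "returns": "returns_desk"},
    },
}


def run_directed_dialog(script, start, caller_inputs, max_retries=1, say=print):
    """Walk the script; return the name of the state where the call ends up."""
    state, retries = start, 0
    inputs = iter(caller_inputs)
    while state in script:            # states outside the script are destinations
        step = script[state]
        say(step["prompt"])
        said = next(inputs, None)
        if said is None:
            return "hang_up"          # caller stopped responding
        next_state = step["responses"].get(said.strip().lower())
        if next_state is None:        # not one of the pre-defined responses
            retries += 1
            if retries > max_retries:
                return "operator"
            continue                  # re-prompt at the same step
        state, retries = next_state, 0
    return state


print(run_directed_dialog(SCRIPT, "main_menu", ["support", "order status"]))
print(run_directed_dialog(SCRIPT, "main_menu",
                          ["my order is late", "it never arrived"]))
```

The second caller says perfectly sensible things, yet ends up with an operator because nothing they said matches the script. An advanced dialog system would instead try to work out the caller’s goal from those sentences.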

Here are some of the cognitive features one would expect of an Advanced Dialog system:

o Built-in general knowledge and conversation skills

o Natural language understanding – understanding the meaning of sentences, and responding appropriately

o Short-term and long-term memory

o Goal-directed, contextual responses

o Ability to interrupt and change topic

o Meta-cognition: gauging the progress and mood of the conversation, and adjusting accordingly

Looks like we need IVRs with a brain!

That’s what I’ll talk about next…
