Along with applying categorization rules to a conversation transcript, Conversation Analyzer applies substitution rules to refine the output. Substitution rules replace words that are often transcribed incorrectly and correct the spelling of words. You will most likely need these rules for proper nouns, such as place, company, or product names. For example, Conversation Analyzer may transcribe 'Basingstoke' as 'Beijing spoke'. Create rules that replace the incorrect word or words. Substitution rules can also replace sensitive information such as credit card details; for example, you can replace specified text with text such as '(redacted)', '(removed)', or 'xxxxxxxxxxxxxx'.
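The idea of a substitution rule can be sketched in a few lines of Python. This is only an illustration, not how Conversation Analyzer is implemented: the rule table `RULES`, the function `apply_rules`, and the use of regular expressions in place of the product's categorization expression language are all assumptions made for the example.

```python
import re

# Hypothetical rule table: (search pattern, replacement text).
# Regular expressions stand in for the product's own expression language.
RULES = [
    (re.compile(r"\bBeijing spoke\b", re.IGNORECASE), "Basingstoke"),  # fix a mis-transcribed place name
    (re.compile(r"\b\d{16}\b"), "(redacted)"),                         # hide a 16-digit number
]

def apply_rules(text, rules=RULES):
    """Apply each rule in turn and return the refined text."""
    for pattern, replacement in rules:
        # A function is used as the replacement so the text is inserted literally.
        text = pattern.sub(lambda _match, r=replacement: r, text)
    return text

print(apply_rules("I live in Beijing spoke and my card is 1234567890123456"))
# prints: I live in Basingstoke and my card is (redacted)
```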

...

In the following example, the categorization profile, SubstitutionRules, contains three substitution rules.

...

The categorization expression language describes the format of the value in the Search phrase field. The language supports simple values where the presence of the exact word or phrase would result in a match. For information about writing expressions, see Categorization expression language in Categorization rules.

Replace with

When creating or editing a substitution rule, define the value that will replace the found text in the Replace with field. 

...

Examples of overlapping rules


Example 1. We want to replace "credit card" with "payment method" and remove the credit card number.

Transcription text:

My credit card is 1234567890123456

Substitution rules:

Rule 1:

Search phrase: credit card
Replace with: payment method

Rule 2:

Search phrase: credit card #*
Words between: 5
Replace with: (credit card information redacted)

Intended text:

My (credit card information redacted)

Processed text:

My payment method is 1234567890123456

Why:

Rules 1 and 2 overlap. In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority, and discards rule 2. The result is that the credit card number is still exposed.

Solution:

Redact first, substitute after. Put the redaction rule before the substitution rule so that the redaction has the higher priority.
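The overlap behavior in Example 1 can be modeled with a small Python sketch. It assumes that rules are tried in priority order and that a match is discarded when it overlaps a span already claimed by a higher-priority rule, which is how the example describes the outcome. The function `apply_by_priority` is hypothetical, and the regular expressions only approximate the search phrases ("Words between: 5" is approximated as up to five intervening words).

```python
import re

def apply_by_priority(text, rules):
    """rules: list of (regex, replacement), highest priority first."""
    accepted = []  # spans already claimed: (start, end, replacement)
    for pattern, replacement in rules:
        for m in re.finditer(pattern, text):
            overlaps = any(m.start() < end and start < m.end()
                           for start, end, _ in accepted)
            if not overlaps:  # lower-priority overlapping matches are dropped
                accepted.append((m.start(), m.end(), replacement))
    # Rebuild the text, swapping each accepted span for its replacement.
    out, pos = [], 0
    for start, end, replacement in sorted(accepted):
        out.append(text[pos:start])
        out.append(replacement)
        pos = end
    out.append(text[pos:])
    return "".join(out)

TEXT = "My credit card is 1234567890123456"
SUBSTITUTE = (r"credit card", "payment method")
# Approximation of "credit card #*" with up to five words in between.
REDACT = (r"credit card(?:\s+\S+){0,5}?\s+\d+", "(credit card information redacted)")

print(apply_by_priority(TEXT, [SUBSTITUTE, REDACT]))
# prints: My payment method is 1234567890123456
print(apply_by_priority(TEXT, [REDACT, SUBSTITUTE]))
# prints: My (credit card information redacted)
```

Swapping the order of the two rules is the only change between the two calls, and it is enough to keep the card number out of the output.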


Example 2. We want to remove all strings of three or more numbers because they can contain sensitive information. However, we want to label PIN numbers differently to credit card numbers.

Transcription text:

My PIN is 1234

Substitution rules:

Rule 1:

Search phrase: ###*
Replace with: (redacted)

Rule 2:

Search phrase: credit card ################
Words between: 5
Replace with: (credit card has been redacted)

Rule 3:

Search phrase: PIN ####
Words between: 5
Replace with: (PIN has been redacted)

Intended text:

My (PIN has been redacted)

Processed text:

My PIN is (redacted)

Why: 

Rules 1 and 3 overlap. In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority, and discards rule 3. The result is that instead of applying the more specific rule "(PIN has been redacted)", we applied the more general one.

Solution:

Write more specific rules first, and more general (catch-all) rules later.


Example 3. Due to the highly sensitive nature of passwords, we want to remove user account names and wipe out all of the text containing the password.

Transcription text:

My account name is administrator and my password is Jupiter, with upper case J

Substitution rules:

Rule 1:

Search phrase: account name is *
Replace with: (account name redacted)

Rule 2:

Search phrase: * password *
Replace with: (password redacted)

Intended text:

My (account name redacted) and (password redacted)

Processed text:

My (account name redacted)

Why:

In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority than rule 2. In removing the account name, the whole of the password text is removed too. Rule 2 does not match the remaining text.

Solution: 

Write your rules in order of most sensitive to least sensitive. Avoid using operators like * and ~ as much as possible. 
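The effect of an open-ended wildcard in Example 3 can be illustrated with regular expressions as a stand-in. This is an analogy only: it assumes that `*` behaves like a greedy "match the rest of the text" pattern, as the example's outcome suggests, and it applies the rules one after another rather than modeling rule priorities. The function names are hypothetical.

```python
import re

TEXT = "My account name is administrator and my password is Jupiter, with upper case J"

def redact_greedy(text):
    # ".*" stands in for an open-ended wildcard: it runs to the end of the text,
    # so the password phrase is swallowed by the account name rule.
    return re.sub(r"account name is .*", "(account name redacted)", text)

def redact_bounded(text):
    # "\S+" matches only the single word that follows "account name is".
    text = re.sub(r"account name is \S+", "(account name redacted)", text)
    # The password rule stays open-ended because it is meant to wipe the rest of the text.
    return re.sub(r"password .*", "(password redacted)", text)

print(redact_greedy(TEXT))
# prints: My (account name redacted)
print(redact_bounded(TEXT))
# prints: My (account name redacted) and my (password redacted)
```

Bounding the first pattern leaves the password phrase in place for the second rule to find.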


Example 4. For a dog-walking service, we want to improve the transcription with more accurate, business-related words.

Transcription text:

I have a big hunting dog

Substitution rules:

Rule 1:

Search phrase: big hunting dog
Replace with: hound

Rule 2:

Search phrase: I have * dog
Replace with: I am a dog owner

Rule 3:

Search phrase: have
Replace with: look after

Processed text:

I look after a hound

Why:

In this scenario, Conversation Analyzer applies rule 1. Rule 2 overlaps rule 1 so Conversation Analyzer discards rule 2. Rule 3 overlaps rule 2 only, but because Conversation Analyzer has discarded rule 2, rule 3 can be applied.

Solution:

Write your substitution rules in order of importance.
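To see why rule 3 in Example 4 is still applied, it can help to trace which rules are accepted and which are discarded. The sketch below is a hypothetical model, not the product's implementation: `trace_rules` and the regular expressions that approximate the search phrases are assumptions. It checks each match only against spans that were actually accepted, so a discarded rule does not block later rules.

```python
import re

def trace_rules(text, rules):
    """rules: list of (regex, replacement), highest priority first.
    Returns the processed text and a per-match report."""
    accepted = []  # (start, end, replacement, rule_number)
    report = []
    for number, (pattern, replacement) in enumerate(rules, start=1):
        for m in re.finditer(pattern, text):
            # Only accepted spans can block a match; discarded rules cannot.
            blocker = next((n for s, e, _, n in accepted
                            if m.start() < e and s < m.end()), None)
            if blocker is None:
                accepted.append((m.start(), m.end(), replacement, number))
                report.append(f"rule {number}: applied")
            else:
                report.append(f"rule {number}: discarded (overlaps rule {blocker})")
    out, pos = [], 0
    for start, end, replacement, _ in sorted(accepted):
        out.append(text[pos:start])
        out.append(replacement)
        pos = end
    out.append(text[pos:])
    return "".join(out), report

RULES = [
    (r"big hunting dog", "hound"),
    (r"I have .* dog", "I am a dog owner"),  # approximates "I have * dog"
    (r"\bhave\b", "look after"),
]

result, report = trace_rules("I have a big hunting dog", RULES)
print(result)  # prints: I look after a hound
print(report)
# prints: ['rule 1: applied', 'rule 2: discarded (overlaps rule 1)', 'rule 3: applied']
```

In this model, moving the "I have * dog" rule to the top of the list would produce "I am a dog owner" instead, because the other two rules would then overlap it and be discarded.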


...