Substitution rules

Along with applying categorization rules to a conversation transcript, Conversation Analyzer applies substitution rules to refine the output. Substitution rules replace words that are often incorrectly transcribed and improve the spelling of words. You will most likely require these rules for proper nouns, such as place, company or product names. For example, Conversation Analyzer may transcribe 'Basingstoke' as 'Beijing spoke'. Create rules that replace the incorrect word or words. Substitution rules also replace sensitive information such as credit card details — you can, for example, replace specified text with text such as '(redacted)', '(removed)', or 'xxxxxxxxxxxxxx'.


Example categorization profile (substitution rules only)

In the following example, the categorization profile — SubstitutionRules — contains three substitution rules.

Expression

When creating or editing a substitution rule, define the value you want to replace in the Search phrase field. The value defines the text that must appear in the transcript to match the substitution rule.

The categorization expression language describes the format of the value in the Search phrase field. The language supports simple values where the presence of the exact word or phrase would result in a match. For information about writing expressions, see Categorization expression language in Categorization rules.

Replace with

When creating or editing a substitution rule, define the value that will replace the found text in the Replace with field.

Applying substitution rules results in Conversation Analyzer modifying transcript text. Because of this, you must take extra care when writing your rules.

Overlapping substitution rules

Overlapping occurs when more than one rule matches the same transcript text. Because substitution rules actually modify the transcript text, overlapping rules can cause a conflict whereby multiple rules try to replace text with different values. To handle overlapping, Conversation Analyzer uses the following logic when applying the rules:

The order of the rules in the profile determines their priority; the first rule has the highest priority.
If rules overlap, the higher priority rule takes precedence over the lower priority. The lower priority rule is discarded.
A discarded rule does not block any other lower priority rules.
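The logic above can be sketched in a few lines of Python. This is only an illustration, not Conversation Analyzer's implementation: the `apply_rules` function is hypothetical, regular expressions stand in for Search phrase values (they are not the categorization expression language), and the sketch assumes that "discarded" means the overlapping match is skipped.

```python
import re
from typing import List, Tuple

# Each rule is (search pattern, replacement), listed in priority order.
# Assumption: regular expressions stand in for Search phrase values.
Rule = Tuple[str, str]


def apply_rules(text: str, rules: List[Rule]) -> str:
    """Apply rules in priority order, skipping matches that overlap accepted ones."""
    accepted = []  # (start, end, replacement) spans that will be replaced

    for pattern, replacement in rules:  # highest priority first
        for m in re.finditer(pattern, text):
            overlaps = any(m.start() < end and start < m.end()
                           for start, end, _ in accepted)
            if overlaps:
                continue  # discarded; lower-priority rules are still tried
            accepted.append((m.start(), m.end(), replacement))

    # Replace from right to left so earlier offsets stay valid.
    for start, end, replacement in sorted(accepted, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


rules = [
    (r"big hunting dog", "hound"),          # rule 1
    (r"I have .* dog", "I am a dog owner"),  # rule 2, overlaps rule 1
    (r"\bhave\b", "look after"),            # rule 3, overlaps rule 2 only
]
print(apply_rules("I have a big hunting dog", rules))
```

With these stand-in rules, the second rule is skipped because it overlaps the first, while the third rule still applies because the rule it overlapped was discarded.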

Examples of overlapping rules

Example 1. We want to replace "credit card" with "payment method" and remove the credit card number.

Transcription text:

My credit card is 1234567890123456

Substitution rules:

Rule 1:

Search phrase: credit card
Replace with: payment method

Rule 2:

Search phrase: credit card #*
Words between: 5
Replace with: (credit card information redacted)

Intended text:

My (credit card information redacted)

Processed text:

My payment method is 1234567890123456

Why:

Rules 1 and 2 overlap. In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority, and discards rule 2. The result is that the credit card number is still exposed.

Solution:

Redact first, substitute after. Place the redaction rule (rule 2) above the substitution rule (rule 1) in the profile.

Example 2. We want to remove all strings of three or more digits because they can contain sensitive information. However, we want to label PINs differently from credit card numbers.

Transcription text:

My PIN is 1234

Substitution rules:

Rule 1:

Search phrase: ###*
Replace with: (redacted)

Rule 2:

Search phrase: credit card ################
Words between: 5
Replace with: (credit card has been redacted)

Rule 3:

Search phrase: PIN ####
Words between: 5
Replace with: (PIN has been redacted)

Intended text:

My (PIN has been redacted)

Processed text:

My PIN is (redacted)

Why:

Rules 1 and 3 overlap. In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority, and discards rule 3. The result is that Conversation Analyzer applies the more general replacement "(redacted)" instead of the more specific "(PIN has been redacted)".

Solution:

Write more specific rules first, followed by more general, catch-all rules.
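The effect of rule order in Example 2 can be reproduced with a small simulation. This is a sketch under assumptions: `apply_rules` is a hypothetical helper, the regular expressions only approximate the Search phrase and Words between settings shown above, and overlapping lower-priority matches are simply skipped.

```python
import re


def apply_rules(text, rules):
    """Apply (pattern, replacement) rules in priority order, skipping overlaps."""
    accepted = []
    for pattern, replacement in rules:
        for m in re.finditer(pattern, text):
            # Keep the match only if it touches no previously accepted span.
            if all(m.end() <= s or e <= m.start() for s, e, _ in accepted):
                accepted.append((m.start(), m.end(), replacement))
    for start, end, replacement in sorted(accepted, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


# Assumption: regex approximations of the three rules in Example 2.
GENERAL = (r"\d{3,}", "(redacted)")
CARD = (r"credit card\W+(?:\w+\W+){0,5}?\d{16}", "(credit card has been redacted)")
PIN = (r"PIN\W+(?:\w+\W+){0,5}?\d{4}", "(PIN has been redacted)")

text = "My PIN is 1234"
general_first = apply_rules(text, [GENERAL, CARD, PIN])   # order used in Example 2
specific_first = apply_rules(text, [CARD, PIN, GENERAL])  # specific rules first
print(general_first)
print(specific_first)
```

Moving the catch-all rule to the end of the list lets the PIN rule claim the text first, which produces the intended result.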

Example 3. Due to the highly sensitive nature of passwords, we want to remove user account names, and wipe out the whole text containing the password.

Transcription text:

My account name is administrator and my password is Jupiter, with upper case J

Substitution rules:

Rule 1:

Search phrase: account name is *
Replace with: (account name redacted)

Rule 2:

Search phrase: * password *
Replace with: (password redacted)

Intended text:

My (account name redacted) and (password redacted)

Processed text:

My (account name redacted)

Why:

In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority than rule 2. The * operator in rule 1 matches the rest of the text, so in removing the account name, the whole of the password text is removed too. Rule 2 does not match the remaining text.

Solution:

Write your rules in order of most sensitive to least sensitive. Avoid using operators like * and ~ as much as possible.

Example 4. For a dogwalking service, we want to improve the transcription with more accurate, business-related words.

Transcription text:

I have a big hunting dog

Substitution rules:

Rule 1:

Search phrase: big hunting dog
Replace with: hound

Rule 2:

Search phrase: I have * dog
Replace with: I am a dog owner

Rule 3:

Search phrase: have
Replace with: look after

Processed text:

I look after a hound

Why:

In this scenario, Conversation Analyzer applies rule 1. Rule 2 overlaps rule 1 so Conversation Analyzer discards rule 2. Rule 3 overlaps rule 2 only, but because Conversation Analyzer has discarded rule 2, rule 3 can be applied.

Solution:

Write your substitution rules in order of importance.

Chaining substitution rules

Chaining occurs when one rule matches the output of another rule. Chaining only occurs when you re-analyze a recording. For information about re-analyzing recordings, see Analyzing a call recording.

Each time Conversation Analyzer applies substitution rules to a transcript, Conversation Analyzer overwrites the original transcript with the processed text. Rerunning the substitution rules can therefore further refine the text.

Example of chaining rules

Example: Simple case to illustrate chaining.

Original transcript text:

I have a dog

Substitution rules:

Rule 1:

Search phrase: dog
Replace with: big cat

Rule 2:

Search phrase: cat
Replace with: mouse

Processed text:

I have a big cat

Reprocessed text:

I have a big mouse

Why:

Rule 2 matches part of the output of rule 1. On the initial processing, Conversation Analyzer applies rule 1. Conversation Analyzer overwrites the original text with the replaced text. On reprocessing, Conversation Analyzer applies rule 2.

Solution:

To avoid chaining, write rules so that no rule matches the output of another rule.
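One way to spot potential chaining before saving a profile is to test each rule's search value against every rule's replacement text. The following is a minimal sketch, assuming simple rules that can be approximated with regular expressions; `find_chains` is a hypothetical helper, not a Conversation Analyzer feature.

```python
import re
from typing import List, Tuple


def find_chains(rules: List[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where rule j matches the replacement text of rule i.

    Rule numbers are 1-based to match the numbering used in the examples.
    """
    chains = []
    for i, (_, replacement) in enumerate(rules, start=1):
        for j, (pattern, _) in enumerate(rules, start=1):
            if re.search(pattern, replacement):
                chains.append((i, j))
    return chains


# The two rules from the chaining example: (search pattern, replacement).
rules = [("dog", "big cat"), ("cat", "mouse")]
chains = find_chains(rules)
print(chains)  # [(1, 2)]: rule 2 matches the output of rule 1
```

This check only looks at replacement text in isolation. It does not catch chains that arise when replaced text combines with the surrounding transcript to form a new match.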

Highlighting replaced text

After Conversation Analyzer has processed a transcript, substituting or redacting text as your rules require, you cannot see what has changed. If you want to see where in the transcript Conversation Analyzer has, for example, removed text, create a category that highlights the replaced text.

If you substitute text with characters that are not valid in Expression values, you will not be able to create a categorization rule to highlight the text. For example, if you create a substitution rule that replaces account numbers with *********, a categorization rule with Expression: ********* will be invalid.

Example of highlighting replaced text

Example: We want to see where account numbers have been removed from the transcript.

Original transcript text:

My account number is 1234567890123456

Substitution rule:

Search phrase: ################
Replace with: **** **** **** ****

Processed text:

My account number is **** **** **** ****

Categorization rule:

Category name: Replaced text
[...]
Expression: **** **** **** ****

In your transcript, the replaced text is highlighted within the Replaced text category.

In this section

Managing substitution rules