Categorization rules

As part of transcribing recordings, Conversation Analyzer categorizes the text of each transcript by identifying key phrases based on the defined rules and recording the category or subcategory that those rules belong to. A category is a collection of subcategories, each of which contains a series of rules. Each rule consists of a word or phrase and the party who said it. If the transcript contains the word or phrase, and it was spoken by the specified party, Conversation Analyzer matches the transcript against the category.

For example, you may want to track how polite your agents are when speaking with customers. Create a category of 'Politeness' that contains subcategories with rules that look for phrases such as 'Please', 'Thank you' and 'You're welcome'. You may also want to ensure that agents are promoting a new product or service. To do so, create a category for the product or service, with subcategories that identify instances of the agent using terms relating to the product or service. For information on how to create a categorization rule, see Managing categorization rules.
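The category, subcategory and rule hierarchy described above can be pictured with a small data model. The following is only an illustrative sketch: the class and function names (Rule, Subcategory, Category, rule_matches, matched_subcategories) and the party labels are hypothetical and are not part of Conversation Analyzer, and the case-insensitive whole-word matching is an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Rule:
    phrase: str  # word or phrase to look for
    party: str   # who must have said it, e.g. "agent" or "customer" (assumed labels)


@dataclass
class Subcategory:
    name: str
    rules: List[Rule] = field(default_factory=list)


@dataclass
class Category:
    name: str
    subcategories: List[Subcategory] = field(default_factory=list)


def rule_matches(rule: Rule, party: str, text: str) -> bool:
    """True if the given party said the rule's phrase as whole words."""
    if party != rule.party:
        return False
    # Assumption: matching ignores case and respects word boundaries.
    pattern = r"(?<!\w)" + re.escape(rule.phrase) + r"(?!\w)"
    return re.search(pattern, text, re.IGNORECASE) is not None


def matched_subcategories(category: Category,
                          utterances: List[Tuple[str, str]]) -> List[str]:
    """Names of subcategories with at least one rule matched by any utterance."""
    return [
        sub.name
        for sub in category.subcategories
        if any(rule_matches(rule, party, text)
               for rule in sub.rules
               for party, text in utterances)
    ]


politeness = Category("Politeness", [
    Subcategory("Requests", [Rule("please", "agent")]),
    Subcategory("Thanks", [Rule("thank you", "agent"),
                           Rule("you're welcome", "agent")]),
])
utterances = [
    ("customer", "Please can you help me"),
    ("agent", "Of course. Thank you for calling"),
]
print(matched_subcategories(politeness, utterances))  # ['Thanks']
```

Only the 'Thanks' subcategory is reported, because the word "Please" was spoken by the customer and the rule requires the agent to say it.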

Categorization expression language

The categorization expression language describes the required format of the values you provide in the Expression and Find fields in Category Editor when creating categorization and substitution rules. Conversation Analyzer can then use these values to locate matching text in the transcripts. For more information, see Managing categorization rules and Managing substitution rules.

Expression and Find value validation

Valid Expression and Find field values contain only alphanumeric, apostrophe and space characters; that is, values can contain spaces (U+0020), apostrophes (U+0027), and characters from the following Unicode categories:

| Unicode category name | Description |
| --- | --- |
| Ll | Letter, Lowercase. For example, a-z, ᵯ, ḅ, ṥ, ở, ﬓ |
| Lu | Letter, Uppercase. For example, A-Z, Ý, Ŧ, Ǣ, Щ, 𝕐 |
| Lt | Letter, Titlecase. For example, Dž, ᾎ, ᾟ, ᾭ |
| Lo | Letter, Other. For example, ª, ܗ, 爨. The Mongolian letter "Manchu Ali Gali Lha" (U+18AA) is not allowed within Expression and Find values. This character is used internally within the categorization engine. If the character appears within spoken text, Conversation Analyzer treats the character as an apostrophe. |
| Lm | Letter, Modifier. For example, ʰ, ᵓ, 〲, ꟹ |
| Mn | Mark, Nonspacing. For example, ុ, ᜴ |
| Nd | Number, Decimal Digit. For example, 0-9, ۳, ૮, ๗ |
| Pc | Punctuation, Connector. For example, _, ‿, ⁀, ⁔, ︳, ︴, ﹍, ﹎, ﹏, _. This category includes ten characters; the most commonly used is the LOW LINE character (_), U+005F. |

Values can be no more than 100 characters long.
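The validation rules above can be approximated with Python's standard unicodedata module. This is a minimal sketch, not the product's own validator: the names is_valid_value and normalize_spoken_text are hypothetical, length is counted in Unicode code points, and the wildcard characters ?, * and # are accepted on the assumption that the wildcard rules described later on this page permit them in Expression values.

```python
import unicodedata

# Unicode general categories listed in the table above.
ALLOWED_CATEGORIES = {"Ll", "Lu", "Lt", "Lo", "Lm", "Mn", "Nd", "Pc"}
ALLOWED_LITERALS = {" ", "'"}    # space (U+0020) and apostrophe (U+0027)
WILDCARDS = {"?", "*", "#"}      # assumption: wildcards are also accepted
RESERVED_CHAR = "\u18AA"         # Mongolian letter Manchu Ali Gali Lha
MAX_LENGTH = 100                 # limit for Expression and Find values


def is_valid_value(value: str) -> bool:
    """Check a value against the documented character and length rules."""
    if len(value) > MAX_LENGTH:
        return False
    for ch in value:
        if ch == RESERVED_CHAR:
            return False  # in category Lo, but reserved for internal use
        if ch in ALLOWED_LITERALS or ch in WILDCARDS:
            continue
        if unicodedata.category(ch) not in ALLOWED_CATEGORIES:
            return False
    return True


def normalize_spoken_text(text: str) -> str:
    """Treat the reserved character as an apostrophe, as described for spoken text."""
    return text.replace(RESERVED_CHAR, "'")


print(is_valid_value("you're welcome"))  # True
print(is_valid_value("cat-mat"))         # False: the hyphen is in category Pd
```

How the product treats an empty value is not stated on this page, so the sketch does not check for one.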

Replace by value validation

Values can be no more than 64 characters long.

Wildcards in values

The categorization expression language supports the following wildcards within the values. Examples refer to the Expression field you fill in when creating categorization rules, but exactly the same rules apply to the Search phrase field in substitution rules.

| Wildcard | Description | Example expressions | Details |
| --- | --- | --- | --- |
| ? | Wildcard representing one character. Each ? represents one character. | wh? | The following words will match the example expression: "who" and "why". For an example of an expression using the ? wildcard, see Example 2. Expression using the ? character wildcard. |
| | | wh?? | The following words will match the example expression: "what", "when", "whom". For an example of an expression using the ?? wildcard, see Example 5. Expression using the ?? wildcard. |
| * | Wildcard representing zero to many characters | sit* | The following words will match the example expression: "sit", "sits", "sitting". For an example of an expression using the * wildcard, see Example 3. Expression using the * character wildcard. To use * to represent a character or characters, ensure that the * is contiguous with the characters in the containing word. You can also use * to represent a word or words. For information, see Wildcard representing zero to many words. |
| # | Wildcard representing one numeric character | ### | Only digits will match the example expression, not text. Text containing "123" will match the example expression but text containing "one two three" will not. For an example of an expression using the # wildcard, see Example 4. Expression using the # character wildcard. |
| * | Wildcard representing zero to many words | cat * mat | The following phrases will match the example expression: "cat mat", "cat sits on the mat", and "cat always sits happily on the mat". For an example of an expression using the * wildcard, see Example 6. Expression using the * word wildcard. To use * to represent a word or words, type a space between the * and any other characters in the expression. You can also use * to represent a character or characters. For information, see Wildcard representing zero to many characters. |
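One way to reason about the wildcards is to translate an expression into a regular expression. The sketch below is an approximation built only from the behavior described on this page, not the engine's actual implementation. The function names expression_to_regex and matches are hypothetical, and the sketch assumes that words are separated by one or more non-word characters, that matching ignores case, and that a standalone * at the start or end of an expression can be ignored.

```python
import re

# Translation of in-word wildcards to regex fragments (assumed semantics).
_CHAR_MAP = {
    "?": r"\w",   # exactly one character
    "*": r"\w*",  # zero to many characters, when attached to a word
    "#": r"\d",   # exactly one digit
}


def expression_to_regex(expression: str):
    """Build a compiled regex that approximates a wildcard expression."""
    parts = []
    word_gap = False  # True when a standalone * precedes the next word
    for token in expression.split():
        if token == "*":
            word_gap = True
            continue
        word = "".join(_CHAR_MAP.get(ch, re.escape(ch)) for ch in token)
        if parts:
            # A standalone * allows zero to many whole words in the gap.
            parts.append(r"(?:\W+\w+)*\W+" if word_gap else r"\W+")
        parts.append(word)
        word_gap = False
    # The lookarounds stop the match from starting or ending inside a word.
    return re.compile(r"(?<!\w)" + "".join(parts) + r"(?!\w)", re.IGNORECASE)


def matches(expression: str, text: str) -> bool:
    """True if the text contains a match for the expression."""
    return expression_to_regex(expression).search(text) is not None


print(matches("wh?", "why"))                       # True
print(matches("wh?", "what"))                      # False
print(matches("sit*", "sitting"))                  # True
print(matches("### ###", "123-456"))               # True
print(matches("cat * mat", "cat sits on the mat")) # True
```

The same helper reproduces the outcomes listed in Examples 2 to 6, for instance matches("the cat? sat", "the cat sat") is False because the ? needs a character in its place.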

Words between value

The Words between field is available when creating categorization rules or substitution rules. It sets the maximum number of words that can appear between the specified words in a phrase. If it is set to a value other than 0, the ~N expression appears at the end of the rule name in the profile tree.

If the expression contains more than two words, the Words between value applies separately to each pair of consecutive specified words.

For examples, see Example 7 and Example 8.
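The Words between behavior can also be sketched as a regular expression in which each gap between consecutive expression words allows a bounded number of extra words. This is an illustrative approximation under stated assumptions, not the product's implementation: the function names words_between_regex and matches_with_gap are hypothetical, the sketch handles literal words only (no wildcards), and it assumes case-insensitive matching with words separated by non-word characters.

```python
import re


def words_between_regex(expression: str, words_between: int):
    """Allow up to `words_between` extra words in each gap of the expression."""
    # Each gap: zero to N intervening words, then a separator before the next word.
    gap = r"(?:\W+\w+){0,%d}\W+" % words_between
    words = [re.escape(word) for word in expression.split()]
    return re.compile(r"(?<!\w)" + gap.join(words) + r"(?!\w)", re.IGNORECASE)


def matches_with_gap(expression: str, words_between: int, text: str) -> bool:
    """True if the text contains the expression words within the allowed gaps."""
    return words_between_regex(expression, words_between).search(text) is not None


print(matches_with_gap("cat mat", 3, "the cat sits on the mat"))                 # True
print(matches_with_gap("cat mat", 3, "the cat always sits happily on the mat"))  # False
print(matches_with_gap("cat sat mat", 3, "the cat eagerly sat on the mat"))      # True
```

Because the same gap pattern is placed between every pair of consecutive words, a three-word expression such as "cat sat mat" allows up to three words in each of its two gaps, which is the behavior shown in Example 8.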

Expression examples 

Example 1. Simple expression

Expression: the cat sat

With a simple expression, only the exact word or phrase will satisfy the rule.

Example 2. Expression using the ? character wildcard

Expression: the cat? sat

The ? in the expression represents a single character that must appear after "cat" but before "sat" in matching text.

| Text | Does it match? | Explanation |
| --- | --- | --- |
| the cat sat | No | The ? in the expression requires a character in its place. |
| the cats sat | Yes | The ? in the expression represents the "s" in the text. |
| their cats sat | No | The expression does not allow any additional characters after "the". |

Example 3. Expression using the * character wildcard

Expression: sit*

The * in the expression represents zero to many characters that can appear after "sit" in matching text.

| Text | Does it match? | Explanation |
| --- | --- | --- |
| sit | Yes | The * in the expression allows zero to many characters in its place; here it represents zero characters. |
| sits | Yes | The * in the expression represents the "s" in the text. |
| sitting | Yes | The * in the expression represents the "ting" in the text. |
| sat | No | The expression requires that "sit" appears in the text. |

Example 4. Expression using the # character wildcard

Expression: ### ###

Matching text must contain two sets of three digits, separated only by a non-word character such as a space or hyphen.

| Text | Does it match? | Explanation |
| --- | --- | --- |
| 123 456 | Yes | The expression matches two sets of three digits. |
| 123-456 | Yes | The expression matches two sets of three digits. The hyphen is a non-word character and separates the two sets of three digits. |
| 123456 | No | The expression requires two sets of three digits, not one set of six. |
| 123 abc 456 | No | The expression requires two consecutive sets of three digits, not two sets separated by any other characters. |

Example 5. Expression using the ?? wildcard

Expression: wh?? cat

The ?? in the expression represents two characters that must appear after "wh" and before "cat" in matching text.

| Text | Does it match? | Why |
| --- | --- | --- |
| what cat | Yes | The ?? in the expression represents the "at" in the text. |
| when cat | Yes | The ?? in the expression represents the "en" in the text. |
| who cat | No | The ?? in the expression requires two characters after "wh" not one. |
| which cat | No | The ?? in the expression only represents two characters after "wh" not three. |

Example 6. Expression using the * word wildcard

Expression: the cat sits * on the mat

The text must contain the phrase "the cat sits on the mat" with zero to many words between "sits" and "on".

| Text | Does it match? | Why |
| --- | --- | --- |
| the cat sits on the mat | Yes | The * in the expression allows zero to many words in its place; here it represents zero words. |
| the cat sits happily on the mat | Yes | The * in the expression represents "happily" in the text. |
| the cat always sits on the mat | No | The * in the expression appears after "sits", not before. |

Example 7. Expression using the Words between field

Expression: cat mat

Words between: 3

The text must contain the words "cat" and "mat" with up to three words between them.

| Text | Does it match? | Why |
| --- | --- | --- |
| the cat mat | Yes | The text contains no words between "cat" and "mat" and the expression allows up to three. |
| the cat likes mat | Yes | The text contains one word between "cat" and "mat", and the expression allows up to three. |
| the cat sits on the mat | Yes | The text contains three words between "cat" and "mat", and the expression allows up to three. |
| the cat always sits happily on the mat | No | The text contains five words between "cat" and "mat", but the expression only allows up to three. |

Example 8. Expression using the Words between field

Expression: cat sat mat

Words between: 3

The text must contain the words "cat", "sat" and "mat" with up to three words between each of them. In this example, matching text may contain three words between "cat" and "sat" and also three words between "sat" and "mat".

| Text | Does it match? | Why |
| --- | --- | --- |
| the cat eagerly sat on the mat | Yes | The text contains one word between "cat" and "sat", and two words between "sat" and "mat"; the expression allows up to three. |
| the cat eagerly and promptly sat on the green mat | Yes | The text contains three words between "cat" and "sat", and three words between "sat" and "mat"; the expression allows up to three. |
| the cat sat on the green and blue mat | No | The text contains too many words (five) between "sat" and "mat". |