Commit 41f14fd

Merge pull request #47 from vcon-dev/bee-feelings
Bee feelings
2 parents 464f28c + 79033bb commit 41f14fd

4 files changed

Lines changed: 904 additions & 0 deletions

Lines changed: 111 additions & 0 deletions
# Analyze and Label Link

## Overview

The `analyze_and_label` link is a component of the vCon server that automatically analyzes dialog content and generates relevant labels for categorization. It uses OpenAI's language models to process various dialog formats (transcripts, messages, chats, and emails) and extracts meaningful labels, which are then applied as tags to the vCon.
## How It Works

1. The link retrieves a vCon from Redis storage
2. For each dialog in the vCon, it checks if a source analysis (typically of type "transcript") is present
3. It extracts the text content from the source analysis (from the location specified in the configuration)
4. It sends the text to OpenAI's API with a customizable prompt
5. It processes the API response to extract labels
6. It adds the result as a new analysis object on the vCon
7. It applies each extracted label as a tag on the vCon
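Step 3 resolves the configured text location as a dotted path into the source analysis. A minimal sketch of that lookup (mirroring the `navigate_dict` helper in the link's source, with an added type guard):

```python
def navigate_dict(dictionary, path):
    """Walk a nested dict along a dotted path; return None if any key is missing."""
    current = dictionary
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return None
    return current

# A source analysis shaped like the default "body.paragraphs.transcript" location
source = {"body": {"paragraphs": {"transcript": "Agent: Hello. Caller: Hi."}}}
print(navigate_dict(source, "body.paragraphs.transcript"))  # -> Agent: Hello. Caller: Hi.
print(navigate_dict(source, "body.missing.key"))            # -> None
```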
## Supported Dialog Formats

The link is designed to handle various text formats that might appear in dialogs, including:

- **Standard Transcripts**: Plain text transcripts of conversations
- **Email Format**: Text with headers, subject, body, etc.
- **Chat Format**: Text with timestamps and speaker identification
- **Message Format**: Text with headers and body

The link processes these different formats and extracts appropriate labels regardless of the format.
## Configuration Options

The link accepts the following configuration options:

| Option | Description | Default |
|--------|-------------|---------|
| `prompt` | The prompt sent to OpenAI for analysis | "Analyze this transcript and provide a list of relevant labels for categorization..." |
| `analysis_type` | The type assigned to the analysis output | "labeled_analysis" |
| `model` | The OpenAI model to use | "gpt-4-turbo" |
| `sampling_rate` | Rate at which to run the analysis (1 = 100%, 0.5 = 50%, etc.) | 1 |
| `temperature` | The temperature parameter for the OpenAI API | 0.2 |
| `source.analysis_type` | The type of analysis to use as source | "transcript" |
| `source.text_location` | The JSON path to the text within the source analysis | "body.paragraphs.transcript" |
| `response_format` | Format specification for the OpenAI API response | `{"type": "json_object"}` |
| `OPENAI_API_KEY` | The OpenAI API key (required but not defined in defaults) | None |
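Caller-supplied options are shallow-merged over these defaults (caller wins), as in the link's `run()`. A quick sketch with a reduced set of keys and hypothetical values:

```python
default_options = {
    "model": "gpt-4-turbo",
    "temperature": 0.2,
    "sampling_rate": 1,
}

# Caller overrides a subset; everything else falls back to the defaults
opts = {"model": "gpt-3.5-turbo", "OPENAI_API_KEY": "sk-example"}
merged = default_options.copy()
merged.update(opts)
print(merged["model"])        # -> gpt-3.5-turbo
print(merged["temperature"])  # -> 0.2
```

Note that the merge is shallow: overriding `source` replaces the entire nested dict, not just one of its keys.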
## Usage Example

```python
from server.links.analyze_and_label import run

# Run with default options (requires OPENAI_API_KEY in the options)
run(
    vcon_uuid="your-vcon-uuid",
    link_name="analyze_and_label",
    opts={
        "OPENAI_API_KEY": "your-openai-api-key",
        # Optionally override other defaults
        "prompt": "Identify key topics, sentiments, and issues in this conversation. Return your response as a JSON object with a single key 'labels' containing an array of strings.",
        "model": "gpt-3.5-turbo",
    },
)
```
## Customizing Label Generation

You can customize label generation by modifying the `prompt` parameter. The prompt should instruct the model to return labels in a specific format: a JSON object with a "labels" key containing an array of strings.

Example specialized prompts:

- **Support Issues**: "Analyze this transcript and identify the specific support issues mentioned. Return your response as a JSON object with a single key 'labels' containing an array of issue categories."
- **Sentiment Analysis**: "Analyze this conversation and identify the customer's sentiments and emotional states. Return your response as a JSON object with a single key 'labels' containing an array of sentiment descriptors."
- **Product Mentions**: "Identify all products or services mentioned in this transcript. Return your response as a JSON object with a single key 'labels' containing an array of product names."
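Whatever prompt you choose, the link expects the model's reply to parse like this (a sketch with a made-up response string):

```python
import json

# Hypothetical raw response from the model, in the required shape
raw = '{"labels": ["billing", "refund-request", "frustrated-customer"]}'

data = json.loads(raw)
labels = data.get("labels", [])
print(labels)  # -> ['billing', 'refund-request', 'frustrated-customer']
```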
## Error Handling

The link includes robust error handling:

- Exponential backoff retry mechanism for API calls
- JSON parsing error handling
- Logging of errors and performance metrics
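The implementation uses `tenacity` for its retries (multiplier 2, 1–65 s wait, up to 6 attempts). A dependency-free sketch of the same exponential-backoff idea, with the sleep stubbed out so it runs instantly:

```python
def retry_with_backoff(fn, attempts=6, base=1.0, cap=65.0, sleep=lambda s: None):
    """Call fn, retrying on failure with exponentially growing, capped waits."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(min(cap, base * (2 ** attempt)))

calls = {"n": 0}

def flaky():
    # Simulates an API that fails twice before succeeding
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient API error")
    return "ok"

print(retry_with_backoff(flaky))  # -> ok (succeeds on the third attempt)
```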
## Testing

The link includes comprehensive tests for all functionality. To run the tests with actual OpenAI API calls (optional):

```bash
# Set environment variables
export OPENAI_API_KEY="your-api-key"
export RUN_OPENAI_ANALYZE_LABEL_TESTS=1

# Run the tests
pytest server/links/analyze_and_label/tests/test_analyze_and_label.py
```

Without setting `RUN_OPENAI_ANALYZE_LABEL_TESTS=1`, tests will run with mocked API responses.
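The mocked path can be exercised without network access by stubbing the OpenAI client. A sketch using a stand-in object (the names here are illustrative, not the project's actual test fixtures):

```python
import json
from types import SimpleNamespace

class FakeOpenAI:
    """Mimics client.chat.completions.create just enough for the link's call."""
    def __init__(self, canned):
        self._canned = canned
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, model, messages, temperature, response_format):
        # Return an object shaped like an OpenAI chat completion response
        message = SimpleNamespace(content=self._canned)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

canned = json.dumps({"labels": ["demo-label"]})
client = FakeOpenAI(canned)
resp = client.chat.completions.create(
    model="gpt-4-turbo", messages=[], temperature=0.2,
    response_format={"type": "json_object"},
)
print(json.loads(resp.choices[0].message.content)["labels"])  # -> ['demo-label']
```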
## Metrics and Monitoring

The link emits several metrics for monitoring:

- `conserver.link.openai.labels_added`: Number of labels added per run
- `conserver.link.openai.analysis_time`: Time taken for analysis
- `conserver.link.openai.json_parse_failures`: Count of JSON parsing failures
- `conserver.link.openai.analysis_failures`: Count of overall analysis failures
## Integration with vCon Structure

The link integrates with the vCon structure in two ways:

1. It adds a new analysis object with the `labeled_analysis` type (or the configured type)
2. It adds tags to the vCon based on the extracted labels

This allows for both structured access to the full analysis and quick filtering/categorization using the applied tags.
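As an illustration of those two outputs, here is a sketch of the analysis object and tags produced for one dialog (shapes inferred from the link's source; the exact vCon serialization may differ, and the values are hypothetical):

```python
import json

# 1. The analysis object attached to the dialog
analysis = {
    "type": "labeled_analysis",
    "dialog": 0,
    "vendor": "openai",
    "encoding": "json",
    "body": json.dumps({"labels": ["billing", "refund-request"]}),
    "vendor_schema": {"model": "gpt-4-turbo", "prompt": "Analyze this transcript..."},
}

# 2. Each label applied as a tag (tag name and value are both the label itself)
tags = [{"tag_name": label, "tag_value": label}
        for label in json.loads(analysis["body"])["labels"]]
print([t["tag_name"] for t in tags])  # -> ['billing', 'refund-request']
```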
Lines changed: 211 additions & 0 deletions
from lib.vcon_redis import VconRedis
from lib.logging_utils import init_logger
import logging
import json
from openai import OpenAI
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    before_sleep_log,
)  # for exponential backoff
from lib.metrics import init_metrics, stats_gauge, stats_count
import time
from lib.links.filters import is_included, randomly_execute_with_sampling

init_metrics()

logger = init_logger(__name__)

default_options = {
    "prompt": "Analyze this transcript and provide a list of relevant labels for categorization. Return your response as a JSON object with a single key 'labels' containing an array of strings.",
    "analysis_type": "labeled_analysis",
    "model": "gpt-4-turbo",
    "sampling_rate": 1,
    "temperature": 0.2,
    "source": {
        "analysis_type": "transcript",
        "text_location": "body.paragraphs.transcript",
    },
    "response_format": {"type": "json_object"},
}

def get_analysis_for_type(vcon, index, analysis_type):
    for a in vcon.analysis:
        if a["dialog"] == index and a["type"] == analysis_type:
            return a
    return None


@retry(
    wait=wait_exponential(multiplier=2, min=1, max=65),
    stop=stop_after_attempt(6),
    before_sleep=before_sleep_log(logger, logging.INFO),
)
def generate_analysis_with_labels(transcript, prompt, model, temperature, client, response_format) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful assistant that analyzes text and provides relevant labels."},
        {"role": "user", "content": prompt + "\n\n" + transcript},
    ]

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        response_format=response_format,
    )

    return response.choices[0].message.content

def run(
    vcon_uuid,
    link_name,
    opts=default_options,
):
    module_name = __name__.split(".")[-1]
    logger.info(f"Starting {module_name}: {link_name} plugin for: {vcon_uuid}")
    merged_opts = default_options.copy()
    merged_opts.update(opts)
    opts = merged_opts

    vcon_redis = VconRedis()
    vCon = vcon_redis.get_vcon(vcon_uuid)

    if not is_included(opts, vCon):
        logger.info(f"Skipping {link_name} vCon {vcon_uuid} due to filters")
        return vcon_uuid

    if not randomly_execute_with_sampling(opts):
        logger.info(f"Skipping {link_name} vCon {vcon_uuid} due to sampling")
        return vcon_uuid

    client = OpenAI(api_key=opts["OPENAI_API_KEY"], timeout=120.0, max_retries=0)
    source_type = navigate_dict(opts, "source.analysis_type")
    text_location = navigate_dict(opts, "source.text_location")

    for index, dialog in enumerate(vCon.dialog):
        source = get_analysis_for_type(vCon, index, source_type)
        if not source:
            logger.warning("No %s found for vCon: %s", source_type, vCon.uuid)
            continue
        source_text = navigate_dict(source, text_location)
        if not source_text:
            logger.warning("No source_text found at %s for vCon: %s", text_location, vCon.uuid)
            continue
        analysis = get_analysis_for_type(vCon, index, opts["analysis_type"])

        # See if it already has the analysis
        if analysis:
            logger.info(
                "Dialog %s already has a %s in vCon: %s",
                index,
                opts["analysis_type"],
                vCon.uuid,
            )
            continue

        logger.info(
            "Analysing dialog %s with options: %s",
            index,
            {k: v for k, v in opts.items() if k != "OPENAI_API_KEY"},
        )
        start = time.time()
        try:
            # Get the structured analysis with labels
            analysis_json_str = generate_analysis_with_labels(
                transcript=source_text,
                prompt=opts["prompt"],
                model=opts["model"],
                temperature=opts["temperature"],
                client=client,
                response_format=opts.get("response_format", {"type": "json_object"}),
            )

            # Parse the response to get labels
            try:
                analysis_data = json.loads(analysis_json_str)
                labels = analysis_data.get("labels", [])

                # Add the structured analysis to the vCon
                vendor_schema = {}
                vendor_schema["model"] = opts["model"]
                vendor_schema["prompt"] = opts["prompt"]
                vCon.add_analysis(
                    type=opts["analysis_type"],
                    dialog=index,
                    vendor="openai",
                    body=analysis_json_str,
                    encoding="json",
                    extra={
                        "vendor_schema": vendor_schema,
                    },
                )

                # Apply each label as a tag
                for label in labels:
                    vCon.add_tag(tag_name=label, tag_value=label)
                    logger.info(f"Applied label as tag: {label}")

                stats_gauge(
                    "conserver.link.openai.labels_added",
                    len(labels),
                    tags=[f"analysis_type:{opts['analysis_type']}"],
                )

            except json.JSONDecodeError as e:
                logger.error(f"Failed to parse JSON response for vCon {vcon_uuid}: {e}")
                stats_count(
                    "conserver.link.openai.json_parse_failures",
                    tags=[f"analysis_type:{opts['analysis_type']}"],
                )
                # Add the raw text anyway as the analysis
                vCon.add_analysis(
                    type=opts["analysis_type"],
                    dialog=index,
                    vendor="openai",
                    body=analysis_json_str,
                    encoding="none",
                    extra={
                        "vendor_schema": {
                            "model": opts["model"],
                            "prompt": opts["prompt"],
                            "parse_error": str(e),
                        },
                    },
                )

        except Exception as e:
            logger.error(
                "Failed to generate analysis for vCon %s after multiple retries: %s",
                vcon_uuid,
                e,
            )
            stats_count(
                "conserver.link.openai.analysis_failures",
                tags=[f"analysis_type:{opts['analysis_type']}"],
            )
            raise e

        stats_gauge(
            "conserver.link.openai.analysis_time",
            time.time() - start,
            tags=[f"analysis_type:{opts['analysis_type']}"],
        )

    vcon_redis.store_vcon(vCon)
    logger.info(f"Finished analyze_and_label - {module_name}:{link_name} plugin for: {vcon_uuid}")

    return vcon_uuid


def navigate_dict(dictionary, path):
    keys = path.split(".")
    current = dictionary
    for key in keys:
        if key in current:
            current = current[key]
        else:
            return None
    return current
Lines changed: 1 addition & 0 deletions
