Captioning API Complete Reference¶
This section provides comprehensive examples for all Captioning API endpoints available in Vision Studio, including both image and video captioning with various modes and model providers.
Prerequisites¶
import requests
import json
# Your API endpoint and key
API_URL = "http://localhost:8527/api/v1/caption"
API_KEY = "your-api-key"
headers = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}
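For anything beyond a quick local test, the key is usually kept out of the source file. The following is a minimal sketch of loading the same settings from the environment. The variable names `CAPTION_API_URL` and `CAPTION_API_KEY` and the helper `load_settings` are hypothetical and are not defined by Vision Studio.

```python
import os


def load_settings(env=os.environ):
    """Build the base URL and request headers from environment variables.

    CAPTION_API_URL and CAPTION_API_KEY are hypothetical names chosen for
    this sketch; substitute whatever your deployment uses.
    """
    # Fall back to the local endpoint used throughout this page.
    api_url = env.get("CAPTION_API_URL", "http://localhost:8527/api/v1/caption")
    api_key = env.get("CAPTION_API_KEY")
    if not api_key:
        raise RuntimeError("CAPTION_API_KEY is not set")
    headers = {"X-API-Key": api_key, "Content-Type": "application/json"}
    # Strip a trailing slash so f"{api_url}/image" never produces "//image".
    return api_url.rstrip("/"), headers
```

Calling `API_URL, headers = load_settings()` then yields the same two names the examples on this page rely on.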
Image Captioning¶
1. Simple Caption¶
Generate a straightforward, concise caption for an image.
data = {
    "image_url": "https://images.pexels.com/photos/31976103/pexels-photo-31976103.jpeg",
    "mode": "simple",  # can also use "expert"
    "model_provider": "openai"  # can also use "gemini"
}
response = requests.post(
    f"{API_URL}/image",
    headers=headers,
    json=data
)
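The request above returns a `requests.Response`, but this page does not document the response body. The sketch below therefore only checks the HTTP status and decodes the JSON, without assuming any particular field names. `read_result` is a hypothetical helper, not part of the API.

```python
def read_result(response):
    """Return the decoded JSON body, or raise with the server's error text.

    `response` is any object exposing `status_code`, `text`, and `json()`,
    such as a `requests.Response`.
    """
    if response.status_code >= 400:
        # Include a short excerpt of the body to make failures easier to debug.
        raise RuntimeError(
            f"Captioning request failed ({response.status_code}): {response.text[:200]}"
        )
    return response.json()
```

A typical follow-up is `print(json.dumps(read_result(response), indent=2))` to inspect what the server actually returns for each mode.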
2. Image Tagging¶
Generate specific tags or keywords for an image with customizable parameters.
Basic Image Tagging¶
data = {
    "image_url": "https://images.pexels.com/photos/1108099/pexels-photo-1108099.jpeg",
    "mode": "image_tagging",
    "model_provider": "openai",
    "nbr_tags": 8
}
response = requests.post(
    f"{API_URL}/image",
    headers=headers,
    json=data
)
Constrained Tagging with Possible Tags¶
data = {
    "image_url": "https://images.pexels.com/photos/1108099/pexels-photo-1108099.jpeg",
    "mode": "image_tagging",
    "model_provider": "gemini",
    "nbr_tags": 5,
    "possible_tags": [
        "architecture",
        "urban",
        "building",
        "street",
        "modern",
        "glass",
        "concrete",
        "skyline",
        "downtown",
        "commercial",
        "windows",
        "facade"
    ]
}
response = requests.post(
    f"{API_URL}/image",
    headers=headers,
    json=data
)
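When the candidate list comes from your own taxonomy, it can help to sanity-check it before sending. The sketch below builds the same payload as above and adds two client-side checks. The helper name `build_tagging_payload`, the lower-casing of tags, and the rule that `nbr_tags` must not exceed the number of candidates are all assumptions of this example; the API may enforce different rules.

```python
def build_tagging_payload(image_url, nbr_tags, possible_tags=None, model_provider="openai"):
    """Build an `image_tagging` request body with basic client-side checks."""
    if nbr_tags < 1:
        raise ValueError("nbr_tags must be at least 1")
    payload = {
        "image_url": image_url,
        "mode": "image_tagging",
        "model_provider": model_provider,
        "nbr_tags": nbr_tags,
    }
    if possible_tags is not None:
        # Normalise and de-duplicate while keeping the original order.
        cleaned = (tag.strip().lower() for tag in possible_tags)
        unique = list(dict.fromkeys(tag for tag in cleaned if tag))
        if nbr_tags > len(unique):
            raise ValueError(
                f"nbr_tags={nbr_tags} exceeds the {len(unique)} candidate tags provided"
            )
        payload["possible_tags"] = unique
    return payload
```

The returned dictionary can be passed directly as `json=` to `requests.post(f"{API_URL}/image", ...)`.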
Video Captioning¶
1. Simple Video Caption¶
Generate a basic description of video content.
data = {
    "video_url": "https://media.w3.org/2010/05/sintel/trailer.mp4",
    "mode": "simple"  # can also choose "expert" or "detailed"
}
response = requests.post(
    f"{API_URL}/video",
    headers=headers,
    json=data
)
2. Video Tagging¶
Generate tags or keywords that describe the video content.
data = {
    "video_url": "https://sample-videos.com/zip/10/mp4/SampleVideo_1280x720_2mb.mp4",
    "mode": "tagging",
    "nbr_tags": 10
}
response = requests.post(
    f"{API_URL}/video",
    headers=headers,
    json=data
)
3. Segment Timestamps¶
Get basic timestamp markers for different segments of the video.
data = {
    "video_url": "https://media.w3.org/2010/05/sintel/trailer.mp4",
    "mode": "segment_timestamps"  # can also be "segment_timestamps_descriptive"
}
response = requests.post(
    f"{API_URL}/video",
    headers=headers,
    json=data
)
4. Segment Timestamps with Tags¶
Generate tags for each video segment with timestamp information.
data = {
    "video_url": "https://sample-videos.com/zip/10/mp4/SampleVideo_1280x720_5mb.mp4",
    "mode": "segment_timestamps_tags",
    "nbr_tags": 5
}
response = requests.post(
    f"{API_URL}/video",
    headers=headers,
    json=data
)
Available Modes¶
Image Captioning Modes¶
| Mode | Description | Output | Model Support |
|---|---|---|---|
| `simple` | Basic, concise image description | Single caption string | OpenAI, Gemini |
| `expert` | Detailed, technical/artistic analysis | Detailed caption string | OpenAI, Gemini |
| `image_tagging` | Generate relevant tags/keywords | Array of tags | OpenAI, Gemini |
Video Captioning Modes¶
| Mode | Description | Output | Parameters |
|---|---|---|---|
| `simple` | Basic video description | Single caption | - |
| `expert` | Professional video analysis | Detailed analysis | - |
| `detailed` | Comprehensive scene description | Scene-by-scene breakdown | - |
| `tagging` | Generate video tags | Array of tags | `nbr_tags` |
| `segment_timestamps` | Basic segment markers | Timestamp segments | - |
| `segment_timestamps_descriptive` | Described segments | Segments with descriptions | - |
| `segment_timestamps_tags` | Tagged segments | Segments with tags | `nbr_tags` |
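The video modes table can be encoded as a small lookup so that an unknown mode or a misplaced parameter is caught before the request is sent. This is a sketch only: `VIDEO_MODES` and `build_video_payload` are hypothetical names, and whether the server rejects or silently ignores `nbr_tags` on other modes is not stated on this page.

```python
# Video modes from the table above, mapped to the extra parameters they accept.
VIDEO_MODES = {
    "simple": set(),
    "expert": set(),
    "detailed": set(),
    "tagging": {"nbr_tags"},
    "segment_timestamps": set(),
    "segment_timestamps_descriptive": set(),
    "segment_timestamps_tags": {"nbr_tags"},
}


def build_video_payload(video_url, mode="simple", **params):
    """Build a /video request body, rejecting parameters the mode does not list."""
    if mode not in VIDEO_MODES:
        raise ValueError(f"Unknown video mode: {mode!r}")
    unexpected = set(params) - VIDEO_MODES[mode]
    if unexpected:
        raise ValueError(f"Mode {mode!r} does not take: {sorted(unexpected)}")
    return {"video_url": video_url, "mode": mode, **params}
```

For example, `build_video_payload(url, "segment_timestamps_tags", nbr_tags=5)` reproduces the request body shown in the segment tagging example.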
Best Practices¶
- Choose appropriate modes: Use `simple` for basic descriptions, `expert` for detailed analysis, `image_tagging` for SEO/categorization
- Optimize for concurrent processing: Use async/await patterns for batch processing multiple images/videos
- Handle timeouts appropriately: Video processing can take longer than image processing
- Validate URLs: Ensure image/video URLs are accessible and point to valid media files
- Tag management: For `image_tagging`, provide `possible_tags` to constrain results to your taxonomy
- Error handling: Implement retry logic for network failures and rate limiting
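The retry and concurrency advice above can be sketched as follows. Because `requests` is a blocking library, this example runs each call in a worker thread via `asyncio.to_thread` (Python 3.9+) rather than using a native async HTTP client. The helper names, the set of retryable status codes, and the backoff schedule are assumptions of this sketch, not documented API behavior.

```python
import asyncio
import time

# Assumption: the API signals rate limiting and transient errors with standard HTTP codes.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def send_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it returns a non-retryable response.

    `send` is a zero-argument callable that performs one request and returns
    an object with `status_code`. Exceptions raised by `send` are treated as
    transient network failures.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            response = send()
            if response.status_code not in RETRYABLE_STATUS:
                return response
            error = RuntimeError(f"HTTP {response.status_code}")
        except Exception as exc:  # narrow to requests.RequestException in real code
            error = exc
        if attempt == max_attempts:
            raise error
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))


async def run_batch(jobs, max_concurrency=4):
    """Run blocking zero-argument `jobs` in worker threads, a few at a time."""
    limit = asyncio.Semaphore(max_concurrency)

    async def run_one(job):
        async with limit:
            return await asyncio.to_thread(job)

    # gather() returns results in the same order as `jobs`.
    return await asyncio.gather(*(run_one(job) for job in jobs))
```

Each job would typically wrap one request, for instance `lambda: send_with_retry(lambda: requests.post(f"{API_URL}/video", headers=headers, json=data, timeout=120))`, with a larger `timeout` for video than for images. The whole batch is then run with `asyncio.run(run_batch(jobs))`.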