Captioning API Complete Reference¶

This section provides comprehensive examples for all Captioning API endpoints available in Vision Studio, including both image and video captioning with various modes and model providers.

Prerequisites¶

import requests
import json

# Your API endpoint and key
API_URL = "http://localhost:8527/api/v1/caption"
API_KEY = "your-api-key"

headers = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

Image Captioning¶

1. Simple Caption¶

Generate a straightforward, concise caption for an image.

data = {
    "image_url": "https://images.pexels.com/photos/31976103/pexels-photo-31976103.jpeg",
    "mode": "simple", # can also use "expert"
    "model_provider": "openai" # can also use "gemini"
}

response = requests.post(
    f"{API_URL}/image",
    headers=headers,
    json=data
)
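
The request above returns a `requests.Response`, but the response body is not described in this section. As a minimal sketch, the hypothetical helper below (`parse_caption_response` is not part of the API) only checks the HTTP status and decodes the JSON body, without assuming any particular field names:

```python
import json


def parse_caption_response(status_code: int, body_text: str):
    """Return the decoded JSON body for a 2xx response, otherwise raise."""
    if not 200 <= status_code < 300:
        # Include a short excerpt of the body to help debugging;
        # the format of error bodies is not documented here.
        raise RuntimeError(
            f"Caption request failed with HTTP {status_code}: {body_text[:200]}"
        )
    return json.loads(body_text)


# Usage with the request above:
# result = parse_caption_response(response.status_code, response.text)
# print(json.dumps(result, indent=2))
```

Printing the decoded body once is a quick way to discover which fields your deployment actually returns.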

2. Image Tagging¶

Generate specific tags or keywords for an image with customizable parameters.

Basic Image Tagging¶

data = {
    "image_url": "https://images.pexels.com/photos/1108099/pexels-photo-1108099.jpeg",
    "mode": "image_tagging",
    "model_provider": "openai",
    "nbr_tags": 8
}

response = requests.post(
    f"{API_URL}/image",
    headers=headers,
    json=data
)

Constrained Tagging with Possible Tags¶

data = {
    "image_url": "https://images.pexels.com/photos/1108099/pexels-photo-1108099.jpeg",
    "mode": "image_tagging",
    "model_provider": "gemini",
    "nbr_tags": 5,
    "possible_tags": [
        "architecture",
        "urban",
        "building",
        "street",
        "modern",
        "glass",
        "concrete",
        "skyline",
        "downtown",
        "commercial",
        "windows",
        "facade"
    ]
}

response = requests.post(
    f"{API_URL}/image",
    headers=headers,
    json=data
)
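
If you rely on `possible_tags` to enforce a taxonomy, it can be worth checking the returned tags on the client side as well. The sketch below assumes you have already extracted a plain list of tag strings from the response (the field that holds them is not documented in this section), and `split_by_taxonomy` is a hypothetical helper name:

```python
def split_by_taxonomy(returned_tags, possible_tags):
    """Split tags into (allowed, unexpected) using case-insensitive matching."""
    allowed_lookup = {tag.strip().lower() for tag in possible_tags}
    allowed, unexpected = [], []
    for tag in returned_tags:
        if tag.strip().lower() in allowed_lookup:
            allowed.append(tag)
        else:
            unexpected.append(tag)
    return allowed, unexpected


possible = ["architecture", "urban", "building", "street", "modern"]
# Hypothetical tag list standing in for whatever the API returns.
returned = ["Urban", "building", "sunset"]
allowed, unexpected = split_by_taxonomy(returned, possible)
print(allowed)     # ['Urban', 'building']
print(unexpected)  # ['sunset']
```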

Video Captioning¶

1. Simple Video Caption¶

Generate a basic description of video content.

data = {
    "video_url": "https://media.w3.org/2010/05/sintel/trailer.mp4",
    "mode": "simple"  # can also choose "expert" or "detailed"
}

response = requests.post(
    f"{API_URL}/video",
    headers=headers,
    json=data
)

2. Video Tagging¶

Generate tags or keywords that describe the video content.

data = {
    "video_url": "https://sample-videos.com/zip/10/mp4/SampleVideo_1280x720_2mb.mp4",
    "mode": "tagging",
    "nbr_tags": 10
}

response = requests.post(
    f"{API_URL}/video",
    headers=headers,
    json=data
)

3. Segment Timestamps¶

Get basic timestamp markers for different segments of the video.

data = {
    "video_url": "https://media.w3.org/2010/05/sintel/trailer.mp4",
    "mode": "segment_timestamps"  # can also be "segment_timestamps_descriptive"
}

response = requests.post(
    f"{API_URL}/video",
    headers=headers,
    json=data
)

4. Segment Timestamps with Tags¶

Generate tags for each video segment with timestamp information.

data = {
    "video_url": "https://sample-videos.com/zip/10/mp4/SampleVideo_1280x720_5mb.mp4",
    "mode": "segment_timestamps_tags",
    "nbr_tags": 5
}

response = requests.post(
    f"{API_URL}/video",
    headers=headers,
    json=data
)

Available Modes¶

Image Captioning Modes¶

| Mode | Description | Output | Model Support |
| --- | --- | --- | --- |
| `simple` | Basic, concise image description | Single caption string | OpenAI, Gemini |
| `expert` | Detailed, technical/artistic analysis | Detailed caption string | OpenAI, Gemini |
| `image_tagging` | Generate relevant tags/keywords | Array of tags | OpenAI, Gemini |

Video Captioning Modes¶

| Mode | Description | Output | Parameters |
| --- | --- | --- | --- |
| `simple` | Basic video description | Single caption | - |
| `expert` | Professional video analysis | Detailed analysis | - |
| `detailed` | Comprehensive scene description | Scene-by-scene breakdown | - |
| `tagging` | Generate video tags | Array of tags | `nbr_tags` |
| `segment_timestamps` | Basic segment markers | Timestamp segments | - |
| `segment_timestamps_descriptive` | Described segments | Segments with descriptions | - |
| `segment_timestamps_tags` | Tagged segments | Segments with tags | `nbr_tags` |
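
The mode tables above can be mirrored in a small client-side check so that a typo in `mode` fails before any request is sent. This is only a sketch: `build_payload` and the constant names are hypothetical, and whether the server itself rejects `nbr_tags` for other modes is an assumption rather than documented behavior.

```python
IMAGE_MODES = {"simple", "expert", "image_tagging"}
VIDEO_MODES = {
    "simple", "expert", "detailed", "tagging",
    "segment_timestamps", "segment_timestamps_descriptive",
    "segment_timestamps_tags",
}
# Modes whose examples or table rows mention nbr_tags.
MODES_WITH_NBR_TAGS = {"image_tagging", "tagging", "segment_timestamps_tags"}


def build_payload(media_type, url, mode, nbr_tags=None, **extra):
    """Build a request body for /image or /video after checking the mode."""
    if media_type == "image":
        valid_modes, url_key = IMAGE_MODES, "image_url"
    elif media_type == "video":
        valid_modes, url_key = VIDEO_MODES, "video_url"
    else:
        raise ValueError(f"media_type must be 'image' or 'video', got {media_type!r}")
    if mode not in valid_modes:
        raise ValueError(f"{mode!r} is not a documented {media_type} mode")
    # Extra keyword arguments (e.g. model_provider, possible_tags) pass through as-is.
    payload = {url_key: url, "mode": mode, **extra}
    if nbr_tags is not None:
        if mode not in MODES_WITH_NBR_TAGS:
            raise ValueError(f"nbr_tags is not documented for mode {mode!r}")
        payload["nbr_tags"] = nbr_tags
    return payload


data = build_payload(
    "video",
    "https://media.w3.org/2010/05/sintel/trailer.mp4",
    "segment_timestamps_tags",
    nbr_tags=5,
)
```

The resulting `data` dictionary can be passed as `json=data` exactly as in the examples above.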

Best Practices¶

Choose appropriate modes: use `simple` for basic descriptions, `expert` for detailed analysis, and `image_tagging` for SEO or categorization.
Optimize for concurrent processing: use async/await patterns when processing batches of images or videos.
Handle timeouts appropriately: video processing can take longer than image processing, so allow a longer request timeout for the video endpoint.
Validate URLs: ensure image and video URLs are accessible and point to valid media files.
Tag management: for `image_tagging`, provide `possible_tags` to constrain results to your taxonomy.
Error handling: implement retry logic for network failures and rate limiting.
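
The retry, timeout, and batching recommendations above could be combined roughly as follows. This is a sketch under stated assumptions, not the service's prescribed client: `post_with_retry` and `caption_many` are hypothetical helpers, the set of retryable status codes is a guess (the API's rate-limit behavior is not documented here), and a thread pool is used in place of async/await because `requests` is a blocking library. The sending function is passed in as `send` so the logic can be exercised without a live server.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def post_with_retry(send, payload, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send(payload), retrying with exponential backoff on transient failures.

    `send` is any callable returning an object with a `status_code`, for example:
    lambda p: requests.post(f"{API_URL}/video", headers=headers, json=p,
                            timeout=(5, 300))  # (connect, read) seconds
    """
    retryable_statuses = {429, 500, 502, 503, 504}  # assumed, not documented
    for attempt in range(1, max_attempts + 1):
        try:
            response = send(payload)
            if response.status_code not in retryable_statuses:
                return response
            error = RuntimeError(f"HTTP {response.status_code}")
        except OSError as exc:
            # requests' connection and timeout errors derive from OSError.
            error = exc
        if attempt == max_attempts:
            raise error
        # Exponential backoff with a little jitter: about 1s, 2s, 4s, ...
        sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5))


def caption_many(send, payloads, max_workers=4):
    """Send several payloads concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: post_with_retry(send, p), payloads))
```

Note that `caption_many` re-raises the first failure it encounters while collecting results, so wrap the call in `try`/`except` if partial results are acceptable.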