Captioning API Complete Reference

This section provides comprehensive examples for all Captioning API endpoints available in Vision Studio, including both image and video captioning with various modes and model providers.

Prerequisites

import requests
import json

# Your API endpoint and key
API_URL = "http://localhost:8527/api/v1/caption"
API_KEY = "your-api-key"

headers = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

Image Captioning

1. Simple Caption

Generate a straightforward, concise caption for an image.

data = {
    "image_url": "https://images.pexels.com/photos/31976103/pexels-photo-31976103.jpeg",
    "mode": "simple",  # use "expert" for a detailed analysis instead
    "model_provider": "openai",  # or "gemini"
}

response = requests.post(
    f"{API_URL}/image",
    headers=headers,
    json=data
)
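
The request above is sent, but the result is never inspected. As a minimal sketch of that step: the response schema is not described in this reference, so the helper below (parse_caption_response, a hypothetical name) only assumes that a successful response carries a JSON body and leaves its fields uninterpreted.

```python
import json


def parse_caption_response(status_code, body_text):
    """Return the decoded JSON body, or raise with a readable message.

    Hypothetical helper: the response schema is not documented here, so the
    body is only assumed to be JSON and is returned without interpretation.
    """
    if not 200 <= status_code < 300:
        raise RuntimeError(
            f"Captioning request failed ({status_code}): {body_text[:200]}"
        )
    try:
        return json.loads(body_text)
    except json.JSONDecodeError as exc:
        raise RuntimeError("Response body was not valid JSON") from exc
```

With the response object from the example, this could be called as parse_caption_response(response.status_code, response.text), and the result printed with json.dumps(result, indent=2) to see which fields the service actually returns.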

2. Image Tagging

Generate specific tags or keywords for an image with customizable parameters.

Basic Image Tagging

data = {
    "image_url": "https://images.pexels.com/photos/1108099/pexels-photo-1108099.jpeg",
    "mode": "image_tagging",
    "model_provider": "openai",
    "nbr_tags": 8
}

response = requests.post(
    f"{API_URL}/image",
    headers=headers,
    json=data
)

Constrained Tagging with Possible Tags

data = {
    "image_url": "https://images.pexels.com/photos/1108099/pexels-photo-1108099.jpeg",
    "mode": "image_tagging",
    "model_provider": "gemini",
    "nbr_tags": 5,
    "possible_tags": [
        "architecture",
        "urban",
        "building",
        "street",
        "modern",
        "glass",
        "concrete",
        "skyline",
        "downtown",
        "commercial",
        "windows",
        "facade"
    ]
}

response = requests.post(
    f"{API_URL}/image",
    headers=headers,
    json=data
)
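
This reference does not state how strictly possible_tags is enforced, so a client-side check can be a useful safeguard. The sketch below assumes you have already extracted the returned tags as a list of strings (the response field that holds them is not documented here), and split_by_taxonomy is a hypothetical helper name.

```python
def split_by_taxonomy(returned_tags, possible_tags):
    """Split tags into (allowed, unexpected) using a case-insensitive match.

    Assumption: returned_tags is a plain list of strings taken from the
    response; this reference does not document where that list lives.
    """
    allowed_set = {tag.strip().lower() for tag in possible_tags}
    allowed, unexpected = [], []
    for tag in returned_tags:
        if tag.strip().lower() in allowed_set:
            allowed.append(tag)
        else:
            unexpected.append(tag)
    return allowed, unexpected
```

Anything that lands in the unexpected list can be logged or dropped before the tags reach your catalog.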

Video Captioning

1. Simple Video Caption

Generate a basic description of video content.

data = {
    "video_url": "https://media.w3.org/2010/05/sintel/trailer.mp4",
    "mode": "simple"  # can also choose "expert" or "detailed"
}

response = requests.post(
    f"{API_URL}/video",
    headers=headers,
    json=data
)

2. Video Tagging

Generate tags or keywords that describe the video content.

data = {
    "video_url": "https://sample-videos.com/zip/10/mp4/SampleVideo_1280x720_2mb.mp4",
    "mode": "tagging",
    "nbr_tags": 10
}

response = requests.post(
    f"{API_URL}/video",
    headers=headers,
    json=data
)

3. Segment Timestamps

Get basic timestamp markers for different segments of the video.

data = {
    "video_url": "https://media.w3.org/2010/05/sintel/trailer.mp4",
    "mode": "segment_timestamps"  # can also be "segment_timestamps_descriptive"
}

response = requests.post(
    f"{API_URL}/video",
    headers=headers,
    json=data
)

4. Segment Timestamps with Tags

Generate tags for each video segment with timestamp information.

data = {
    "video_url": "https://sample-videos.com/zip/10/mp4/SampleVideo_1280x720_5mb.mp4",
    "mode": "segment_timestamps_tags",
    "nbr_tags": 5
}

response = requests.post(
    f"{API_URL}/video",
    headers=headers,
    json=data
)

Available Modes

Image Captioning Modes

| Mode | Description | Output | Model Support |
| --- | --- | --- | --- |
| simple | Basic, concise image description | Single caption string | OpenAI, Gemini |
| expert | Detailed, technical/artistic analysis | Detailed caption string | OpenAI, Gemini |
| image_tagging | Generate relevant tags/keywords | Array of tags | OpenAI, Gemini |
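
The image modes and providers listed above can be checked before a request is sent, which catches typos such as a video-only mode passed to the image endpoint. This is a sketch under stated assumptions: build_image_payload is a hypothetical helper, and rejecting nbr_tags or possible_tags outside image_tagging is a client-side choice, since this reference does not say how the service treats them in other modes.

```python
IMAGE_MODES = {"simple", "expert", "image_tagging"}
MODEL_PROVIDERS = {"openai", "gemini"}


def build_image_payload(image_url, mode="simple", model_provider="openai",
                        nbr_tags=None, possible_tags=None):
    """Build a request body for the image endpoint, rejecting unknown values early."""
    if mode not in IMAGE_MODES:
        raise ValueError(
            f"Unknown image mode {mode!r}; expected one of {sorted(IMAGE_MODES)}"
        )
    if model_provider not in MODEL_PROVIDERS:
        raise ValueError(f"Unknown model provider {model_provider!r}")

    payload = {
        "image_url": image_url,
        "mode": mode,
        "model_provider": model_provider,
    }
    if mode == "image_tagging":
        if nbr_tags is not None:
            payload["nbr_tags"] = nbr_tags
        if possible_tags:
            payload["possible_tags"] = list(possible_tags)
    elif nbr_tags is not None or possible_tags:
        # Assumption: tagging parameters are only meaningful for image_tagging.
        raise ValueError("nbr_tags and possible_tags apply only to image_tagging")
    return payload
```

The returned dictionary can be passed as the json argument of requests.post, exactly like the hand-written data dictionaries in the examples.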

Video Captioning Modes

| Mode | Description | Output | Parameters |
| --- | --- | --- | --- |
| simple | Basic video description | Single caption | - |
| expert | Professional video analysis | Detailed analysis | - |
| detailed | Comprehensive scene description | Scene-by-scene breakdown | - |
| tagging | Generate video tags | Array of tags | nbr_tags |
| segment_timestamps | Basic segment markers | Timestamp segments | - |
| segment_timestamps_descriptive | Described segments | Segments with descriptions | - |
| segment_timestamps_tags | Tagged segments | Segments with tags | nbr_tags |
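
The Parameters column above can be encoded directly, so that nbr_tags is attached only to the modes that list it. As with any client-side check, this is a sketch: build_video_payload is a hypothetical helper, and treating nbr_tags as an error for the other modes is an assumption rather than documented service behavior.

```python
# Video modes from the table; the value records whether nbr_tags applies.
VIDEO_MODES = {
    "simple": False,
    "expert": False,
    "detailed": False,
    "tagging": True,
    "segment_timestamps": False,
    "segment_timestamps_descriptive": False,
    "segment_timestamps_tags": True,
}


def build_video_payload(video_url, mode="simple", nbr_tags=None):
    """Build a request body for the video endpoint from a validated mode."""
    if mode not in VIDEO_MODES:
        raise ValueError(
            f"Unknown video mode {mode!r}; expected one of {sorted(VIDEO_MODES)}"
        )
    payload = {"video_url": video_url, "mode": mode}
    if nbr_tags is not None:
        if not VIDEO_MODES[mode]:
            # Assumption: only the modes listing nbr_tags in the table accept it.
            raise ValueError(f"Mode {mode!r} does not take nbr_tags")
        payload["nbr_tags"] = nbr_tags
    return payload
```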

Best Practices

  1. Choose an appropriate mode: use simple for basic descriptions, expert for detailed analysis, and image_tagging (tagging for video) for SEO or categorization.
  2. Process batches concurrently: use async/await patterns when captioning many images or videos.
  3. Set explicit timeouts: video processing can take longer than image processing, so give video requests a longer timeout.
  4. Validate URLs: make sure image and video URLs are accessible and point to valid media files.
  5. Manage tags: for image_tagging, provide possible_tags to constrain results to your taxonomy.
  6. Handle errors: implement retry logic for network failures and rate limiting.
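
The concurrency, timeout, and retry points can be combined in one small pattern. The sketch below is one possible arrangement, not the service's prescribed client: call_with_retry and run_batch are hypothetical helpers, the set of retryable status codes (429 and common 5xx) is an assumption because this reference does not list the codes the API returns, and the timeout in the usage comment is illustrative.

```python
import asyncio
import random
import time


def call_with_retry(send, max_attempts=4, base_delay=1.0,
                    retry_statuses=(429, 500, 502, 503, 504)):
    """Call send() with exponential backoff until it succeeds or attempts run out.

    send is any zero-argument callable returning an object with a status_code
    attribute, for example a lambda wrapping requests.post(..., timeout=...).
    Responses with a status outside retry_statuses are returned as-is.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            response = send()
        except OSError as exc:
            # requests' exceptions derive from OSError, as do the built-in
            # ConnectionError and TimeoutError.
            last_error = exc
        else:
            if response.status_code not in retry_statuses:
                return response
            last_error = RuntimeError(f"retryable status {response.status_code}")
        if attempt < max_attempts:
            # Exponential backoff with a small random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay + random.uniform(0, base_delay / 10))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


async def run_batch(jobs, max_concurrency=4):
    """Run blocking zero-argument jobs in worker threads, a few at a time.

    Results come back in the same order as jobs; a failed job yields its
    exception instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(job):
        async with semaphore:
            return await asyncio.to_thread(job)

    return await asyncio.gather(*(run_one(job) for job in jobs),
                                return_exceptions=True)


# Illustrative usage with the names from the Prerequisites section:
# jobs = [
#     lambda p=p: call_with_retry(
#         lambda: requests.post(f"{API_URL}/video", headers=headers, json=p, timeout=300)
#     )
#     for p in payloads
# ]
# responses = asyncio.run(run_batch(jobs, max_concurrency=2))
```

Running the blocking requests calls in threads keeps the existing examples unchanged; a fully asynchronous HTTP client would be an alternative design. asyncio.to_thread requires Python 3.9 or later.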