Synthszr Charts — die großen AI-Marken im Wettkampf ums Podium

Hunyuan-Turbo-Vision

#38 in Multimodal Models

tencent · turbo vision · 2× · last seen Jun 29, 2026

Momentum

Hunyuan-Turbo-Vision is a multimodal language model by Tencent that processes combined image and text inputs. It is part of the Hunyuan-Turbo model family and is exclusively available as a cloud API through Tencent Cloud. The model supports image understanding, image captioning, multimodal dialogues, and — in its extended variant hunyuan-turbos-vision-video — video analysis via URL input. According to Tencent Cloud billing documentation, it shares a common free token quota with other Hunyuan multimodal models.

Momentum trend

04.04.03.07.

Features

Context Window (Tokens)	32,000 tokens (32K)
Multimodal Inputs	Image + text (image_url + text); officially documented as the hunyuan-vision model for image understanding, image captioning, multimodal dialogue, image OCR, and knowledge-based image analysis
On-Device vs. Cloud	Cloud only – available exclusively as an API via Tencent Cloud (hunyuan.tencentcloudapi.com), no on-device operation
Price per Unit	$1.20 per 1 million tokens (input + output combined, per the pricing overview for the Hunyuan-Turbo-Vision model on Tencent Cloud)
Video Analysis Capability	Yes – via the hunyuan-turbos-vision-video variant: video analysis through the video_url input type with configurable FPS rate, documented in the official Tencent Cloud API documentation

Hunyuan-Turbo-Vision

Features

Sources (2)

More products in this category: Multimodal Models

Subscribe free. Unsubscribe the second it sucks.