

Hunyuan-Turbo-Vision
#38tencent · turbo vision · 2× · zuletzt 29. Juni 2026
10
Momentum
Hunyuan-Turbo-Vision is a multimodal language model by Tencent that processes combined image and text inputs. It is part of the Hunyuan-Turbo model family and is exclusively available as a cloud API through Tencent Cloud. The model supports image understanding, image captioning, multimodal dialogues, and — in its extended variant hunyuan-turbos-vision-video — video analysis via URL input. According to Tencent Cloud billing documentation, it shares a common free token quota with other Hunyuan multimodal models.
Historique du momentum
04.04.03.07.
Fonctionnalités
| Context Window (Tokens) | 32,000 tokens (32K) |
| Multimodal Inputs | Image + text (image_url + text); officially documented as the hunyuan-vision model for image understanding, image captioning, multimodal dialogue, image OCR, and knowledge-based image analysis |
| On-Device vs. Cloud | Cloud only – available exclusively as an API via Tencent Cloud (hunyuan.tencentcloudapi.com), no on-device operation |
| Price per Unit | $1.20 per 1 million tokens (input + output combined, per the pricing overview for the Hunyuan-Turbo-Vision model on Tencent Cloud) |
| Video Analysis Capability | Yes – via the hunyuan-turbos-vision-video variant: video analysis through the video_url input type with configurable FPS rate, documented in the official Tencent Cloud API documentation |