

Step 3.7
#69 in Frontier Language Models · stepfun · v3.7 · since May 29, 2026 · 11× · most recently June 29, 2026
Step 3.7 Flash is a multimodal language model developed by StepFun (Jieyue Xingchen), built on a sparse Mixture-of-Experts architecture with 198B total parameters (~11B active per token) and an integrated 1.8B-parameter vision encoder for native image and video understanding. It is designed for agentic workflows, coding, tool use, and long-context tasks, supports a 256K-token context window plus three selectable reasoning levels (low/medium/high), and is released as open-weight under the Apache 2.0 license. It launched on May 29, 2026, and is available via StepFun's API, Hugging Face, GitHub, OpenRouter, and NVIDIA NIM.
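To make the sparse Mixture-of-Experts figures concrete, the sketch below computes the share of parameters active per token and a rough weight-memory footprint. The parameter counts come from the description above; the 2-bytes-per-parameter figure (16-bit weights) and the function names are assumptions for illustration only. Whether the 1.8B vision encoder is counted within the 198B total is not stated, so it is left out.

```python
# Figures quoted in the model description.
TOTAL_PARAMS = 198e9   # total parameters across all experts
ACTIVE_PARAMS = 11e9   # approximate parameters active per token


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float = 2.0) -> float:
    """Rough memory needed to hold the weights, in GB (10^9 bytes).

    bytes_per_param=2.0 assumes 16-bit weights; this is an assumption,
    not a published deployment requirement.
    """
    return params * bytes_per_param / 1e9


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active per token: {fraction:.1%} of total parameters")
print(f"Weights at 16-bit: about {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
```

The point of the split is that per-token compute scales with the active parameters, while memory for hosting the weights scales with the total.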
Features
| Feature | Details |
| --- | --- |
| Key Benchmarks | SWE-Bench Pro: 56.26%; ClawEval-1.1: 67.07% (each leading among compared models) |
| Context Window (Tokens) | 256,000 tokens (256K) |
| License | Apache 2.0 (open weights) |
| Multimodality | Text and image input (native), video understanding; text output; 1.8B-parameter vision encoder (ViT) |
| Platform | StepFun API (platform.stepfun.ai / platform.stepfun.com), Hugging Face, GitHub, OpenRouter, NVIDIA NIM |
| Price per 1M Tokens | $0.20 input / $1.15 output (StepFun API) |
| Release Date | May 29, 2026 |
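
As a rough illustration of the listed pricing, the sketch below estimates the cost of a single request from its token counts and checks it against the 256,000-token context window. The prices and window size are taken from the table; the function names are hypothetical, and the assumption that input and output tokens share one window is a common convention rather than something the listing states.

```python
# Values from the feature table (StepFun API pricing, USD per 1M tokens).
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 1.15
CONTEXT_WINDOW = 256_000  # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost in USD at the listed per-token prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """True if the request fits the window.

    Assumes input and output tokens count against the same window.
    """
    return input_tokens + output_tokens <= CONTEXT_WINDOW


# Example: a 100,000-token prompt with a 2,000-token reply.
cost = estimate_cost(100_000, 2_000)
print(f"Estimated cost: ${cost:.4f}")
print(f"Fits in context: {fits_context(100_000, 2_000)}")
```

Because output tokens cost several times more than input tokens here, long generations (for example at the highest reasoning level) are likely to dominate the bill even when prompts are large.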