Mistral just shipped its most capable model yet, and it runs self-hosted on 4 GPUs.
What it is: Mistral Medium 3.5 is a 128B dense model that merges instruction-following, reasoning, and coding into a single set of weights, with a 256k context window and configurable reasoning effort per request.
Most frontier-class models either require massive infrastructure to self-host or lock you into proprietary APIs.
Mistral Medium 3.5 sits in an interesting position: it scores 77.6% on SWE-Bench Verified, ahead of models like Qwen3.5 397B A17B, while running on as few as 4 GPUs.
The reasoning effort is configurable per call, so you're not paying or waiting for deep reasoning on a simple reply, yet the same model can handle a multi-step agentic run.
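In practice, the per-call effort knob would sit in the request payload. A minimal sketch, assuming an OpenAI-style chat-completions body with a `reasoning_effort` field; both that field name and the model id are illustrative assumptions, not taken from Mistral's API docs:

```python
# Hypothetical payload builder for a per-request reasoning-effort knob.
# `reasoning_effort` and the model id are assumed names for illustration.
def build_request(prompt: str, effort: str) -> dict:
    """Build a chat-completions-style payload with a per-call effort level."""
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "mistral-medium-3.5",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # assumed field name
    }

# A quick reply gets low effort; a long agentic run gets high effort.
quick = build_request("What's the capital of France?", "low")
deep = build_request("Plan a multi-step refactor of this service.", "high")
```

The point is that the same weights serve both requests; only the per-call parameter changes.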
What makes it different: This is Mistral's first "merged" flagship model, meaning instruction-following, reasoning, and coding live in one set of weights rather than being split across specialized variants.
The open weights are released under a modified MIT license on Hugging Face, and it is already the default model in both Mistral Vibe and Le Chat.
The vision encoder was trained from scratch to handle variable image sizes and aspect ratios.
Key features:
- 128B dense model, 256k context window
- Configurable reasoning effort per request
- 77.6% on SWE-Bench Verified
- Open weights on Hugging Face under a modified MIT license
- Self-hostable on 4 GPUs
- API at $1.5/M input tokens and $7.5/M output tokens
- Powers Vibe remote coding agents and Le Chat Work mode (Pro/Team/Enterprise plans)
- Available on NVIDIA build.nvidia.com and as an NIM container
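To make the 4-GPU claim concrete, here is one way a self-hosted deployment might be launched, assuming the weights are served with vLLM under tensor parallelism. The Hugging Face repo id and the exact flags are assumptions for illustration, not from Mistral's release notes:

```python
# Sketch of a vLLM serve command sharding a 128B dense model across 4 GPUs.
# The repo id "mistralai/Mistral-Medium-3.5" is a hypothetical placeholder.
serve_cmd = [
    "vllm", "serve", "mistralai/Mistral-Medium-3.5",
    "--tensor-parallel-size", "4",   # shard weights across 4 GPUs
    "--max-model-len", "262144",     # the advertised 256k context window
]
print(" ".join(serve_cmd))
```

Tensor parallelism splits each layer's weight matrices across the GPUs, which is what makes a dense model this size fit on a 4-GPU node at all.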
Benefits:
- Run a frontier-class model on your own infrastructure without a large GPU cluster
- Tune reasoning depth at the API level, useful for cost-sensitive agentic pipelines
- A single model handles the full range from quick chat replies to long-horizon coding tasks
- Open weights mean fine-tuning, auditing, and on-prem deployment are all on the table
Who it's for: Backend and ML engineers evaluating open-weight alternatives to proprietary frontier models for agentic pipelines, coding tools, or self-hosted inference.
The interesting design choice here is the merged-weights architecture.
Most labs at this capability tier still ship separate reasoning and instruction models.
Collapsing them, with configurable effort per call, is a pragmatic tradeoff worth watching as other labs respond.