Mistral just shipped its most capable model yet, and it runs self-hosted on 4 GPUs.
What it is: Mistral Medium 3.5 is a 128B dense model that merges instruction-following, reasoning, and coding into a single set of weights, with a 256k context window and configurable reasoning effort per request.
Most frontier-class models either require massive infrastructure to self-host or lock you into proprietary APIs.
Mistral Medium 3.5 sits in an interesting position: it scores 77.6% on SWE-Bench Verified, ahead of models like Qwen3.5 397B A17B, while running on as few as 4 GPUs.
The reasoning effort is configurable per call, so you're not paying or waiting for deep reasoning on a simple reply, yet the same model can handle a multi-step agentic run.
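In practice, the per-call effort knob would sit in the request payload. A minimal sketch, assuming an OpenAI-style chat-completions body with a `reasoning_effort` field; both that field name and the model id are illustrative assumptions, not taken from Mistral's API docs:

```python
# Hypothetical payload builder for a per-request reasoning-effort knob.
# `reasoning_effort` and the model id are assumed names for illustration.
def build_request(prompt: str, effort: str) -> dict:
    """Build a chat-completions-style payload with a per-call effort level."""
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "mistral-medium-3.5",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # assumed field name
    }

# A quick reply gets low effort; a long agentic run gets high effort.
quick = build_request("What's the capital of France?", "low")
deep = build_request("Plan a multi-step refactor of this service.", "high")
```

The point is that the same weights serve both requests; only the per-call parameter changes.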
What makes it different: This is Mistral's first "merged" flagship model, meaning instruction-following, reasoning, and coding live in one set of weights rather than being split across specialized variants.
The open weights are released under a modified MIT license on Hugging Face, and it is already the default model in both Mistral Vibe and Le Chat.
The vision encoder was trained from scratch to handle variable image sizes and aspect ratios.
Key features:
- 128B dense model, 256k context window
- Configurable reasoning effort per request
- 77.6% on SWE-Bench Verified
- Open weights on Hugging Face under a modified MIT license
- Self-hostable on 4 GPUs
- API at $1.5/M input tokens and $7.5/M output tokens
- Powers Vibe remote coding agents and Le Chat Work mode (Pro/Team/Enterprise plans)
- Available on NVIDIA build.nvidia.com and as an NIM container
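To make the 4-GPU claim concrete, here is one way a self-hosted deployment might be launched, assuming the weights are served with vLLM under tensor parallelism. The Hugging Face repo id and the exact flags are assumptions for illustration, not from Mistral's release notes:

```python
# Sketch of a vLLM serve command sharding a 128B dense model across 4 GPUs.
# The repo id "mistralai/Mistral-Medium-3.5" is a hypothetical placeholder.
serve_cmd = [
    "vllm", "serve", "mistralai/Mistral-Medium-3.5",
    "--tensor-parallel-size", "4",   # shard weights across 4 GPUs
    "--max-model-len", "262144",     # the advertised 256k context window
]
print(" ".join(serve_cmd))
```

Tensor parallelism splits each layer's weight matrices across the GPUs, which is what makes a dense model this size fit on a 4-GPU node at all.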
Benefits:
- Run a frontier-class model on your own infrastructure without a large GPU cluster
- Tune reasoning depth at the API level, useful for cost-sensitive agentic pipelines
- A single model handles the full range from quick chat replies to long-horizon coding tasks
- Open weights mean fine-tuning, auditing, and on-prem deployment are all on the table
Who it's for: Backend and ML engineers evaluating open-weight alternatives to proprietary frontier models for agentic pipelines, coding tools, or self-hosted inference.
The interesting design choice here is the merged-weights architecture.
Most labs at this capability tier still ship separate reasoning and instruction models.
Collapsing them, with configurable effort per call, is a pragmatic tradeoff worth watching as other labs respond.