Model Lifecycle and Changelog
The NRP catalog rotates quickly to track the open-weights frontier. Newer, faster models replace obsolete ones, and GPU allocations are shifted toward what the community is actually using. This page explains the process and lists what’s changed recently.
How models are added and removed
Added — new models are added based on benchmarks (artificialanalysis.ai) and qualitative evidence (e.g. r/LocalLLaMA), with the final decision made by administrators in discussion with users.
Removed — obsolete models are removed when smaller models outperform them across the board, or when another model has clearly taken over their use case.
Deprecated — research groups that need a specific model for reproducibility can declare research usage. Deprecated models stay up until the research concludes, but their replacement is still encouraged. If your group depends on a model that has been deprecated or removed, please reach out via the Matrix Nautilus AI/ML channel.
GPU allocation is the limiting factor: larger models that require many GPUs are removed sooner if relative performance falls behind, while small or efficient models get more leniency. New-model decisions and retirement discussion happen in the same Matrix channel.
Recent changes
April 2026
- gemma3 was renamed to gemma and switched from google/gemma-3-27b-it to google/gemma-4-31B-it; the context size changed.
- qwen3-small switched from Qwen/Qwen3.5-27B to Qwen/Qwen3.6-27B; the context size changed; qwen3-27b was added as an alias.
- gemma-small / gemma-4-e4b (google/gemma-4-E4B-it) was added, with audio input support.
- minimax-m2 upgraded from MiniMax-M2.5 to MiniMax-M2.7.
- kimi upgraded from Kimi-K2.5 to Kimi-K2.6.
March 2026
- qwen3-embedding (Qwen/Qwen3-VL-Embedding-8B) was added on the AI Gateway.
- embed-mistral (intfloat/e5-mistral-7b-instruct) was decommissioned and replaced with qwen3-embedding due to incompatibilities with Jupyter AI.
- llama3-sdsc (Llama-3.3-70B-Instruct) was removed from the Envoy AI Gateway after a long deprecation.
- glm-v (the GLM-4.6V multimodal route) was removed from the Envoy AI Gateway. Use glm-4.7 for text and the other multimodal options for vision/video.
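Several of the changes above are pure renames or aliases (gemma3 to gemma, glm-4.6 to glm-4.7, embed-mistral to qwen3-embedding). Scripts pinned to an old name can be kept working with a small client-side lookup; this is a minimal sketch, not an official NRP helper, and the mapping below is just the renames listed in this changelog:

```python
# Hypothetical client-side helper (not part of the NRP API): maps retired
# catalog names from this changelog to their current replacements.
RENAMED_MODELS = {
    "gemma3": "gemma",                   # renamed April 2026
    "embed-mistral": "qwen3-embedding",  # replaced March 2026
    "glm-4.6": "glm-4.7",                # renamed December 2025
}

def resolve_model(name: str) -> str:
    """Return the current catalog name for a possibly-retired model name."""
    return RENAMED_MODELS.get(name, name)

print(resolve_model("gemma3"))  # -> gemma
print(resolve_model("kimi"))    # -> kimi (current names pass through unchanged)
```

Aliases such as qwen3-27b need no entry here, since the gateway serves them directly alongside the canonical name.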
Older changes
February 2026
- minimax-m2 changed from MiniMaxAI/MiniMax-M2.1 to MiniMaxAI/MiniMax-M2.5.
- qwen3 changed from Qwen3-VL-235B-A22B-Thinking-FP8 to Qwen3.5-397B-A17B-FP8.
January 2026
Added/Changed
- glm-4.7 switched to the official zai-org/GLM-4.7-FP8 and was allocated more GPUs due to its position in the catalog.
- kimi switched to the multimodal moonshotai/Kimi-K2.5.
Removed
- olmo (allenai/OLMo-2-0325-32B-Instruct) and gorilla (gorilla-llm/gorilla-openfunctions-v2) were removed; both had been broken for months without anyone reporting it.
December 2025
- glm-v upgraded to zai-org/GLM-4.6V-FP8, with a larger context size.
- glm-4.6 was renamed to glm-4.7 and changed to the QuantTrio/GLM-4.7-GPTQ-Int4-Int8Mix quant.
- minimax-m2 changed from MiniMaxAI/MiniMax-M2 to MiniMaxAI/MiniMax-M2.1.
November 2025
Added/Changed
- qwen3 changed to Qwen3-VL-235B-A22B-Thinking-FP8, adding state-of-the-art vision and video.
- kimi (moonshotai/Kimi-K2-Thinking) was added: a frontier programming model comparable to Claude Sonnet 4.5 / GPT-5.
- glm-4.6 (QuantTrio/GLM-4.6-GPTQ-Int4-Int8Mix) was added: comparable to Claude Sonnet 4 / Gemini 2.5 Pro.
- minimax-m2 (MiniMaxAI/MiniMax-M2) was added: comparable to Sonnet 4 / Gemini 2.5 Pro, and fits in four A100s.
- gpt-oss (openai/gpt-oss-120b) was added: a capable agentic model that runs on a single A100 or two RTX A6000 GPUs. It is an LTS candidate and supersedes the deprecated Llama3 models.
- gemma3 moved to 2× RTX A6000 GPUs; sliding-window attention allows the full context.
- glm-v switched to zai-org/GLM-4.5V-FP8 on 4× L40 GPUs.
Removed
- llama3 (meta-llama/Llama-3.2-90B-Vision-Instruct): consumed 4 A100s while being outperformed by single-GPU models.
- deepseek-r1: used 8 GPUs but was very slow (5–6 tokens/s) at larger contexts.
- watt (watt-ai/watt-tool-8B): removed for inactivity.
