
NRP-Managed LLMs

The NRP hosts a rotating catalog of open-weights LLMs, accessible either through hosted chat interfaces in the browser or programmatically via an OpenAI-compatible API. Models are added and retired as the open-weights frontier moves.

Pick a model

Every active model is compared side-by-side in the feature matrix, and each has a card with strengths, trade-offs, and recommended uses. A few quick recommendations:

  • Frontier reasoning, longest context, multimodal — qwen3
  • Agentic coding — kimi, glm-4.7, or minimax-m2
  • Multimodal with audio (ASR / speech-to-text) — gemma-small
  • Reproducible research / general-purpose LTS — gpt-oss
  • Embeddings for vector search — qwen3-embedding
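Because the catalog rotates, the authoritative list of active model ids comes from the API itself rather than this page. A minimal stdlib-only sketch of querying it (the `/models` path follows the OpenAI convention; the API key shown is a placeholder, not a real credential):

```python
import json
import urllib.request

API_BASE = "https://ellm.nrp-nautilus.io/v1"

# GET /v1/models lists the currently active model ids.
req = urllib.request.Request(
    f"{API_BASE}/models",
    headers={"Authorization": "Bearer YOUR_NRP_API_KEY"},  # placeholder key
)
# Uncomment once you have a valid key:
# with urllib.request.urlopen(req) as resp:
#     for m in json.load(resp)["data"]:
#         print(m["id"])
```

The same request works with `curl` or the OpenAI Python client pointed at the same base URL.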

Use the LLMs

For browser chat, the NRP Open WebUI is the most full-featured option. We also host LibreChat, and the same OpenAI-compatible API works with desktop apps like Cherry Studio and Chatbox.

For programmatic API access, the OpenAI-compatible endpoint at https://ellm.nrp-nautilus.io/v1 works with the OpenAI Python client, curl, or any compatible library. The API access page also covers cache_salt, recommended for any multi-tenant deployment where prompts shouldn’t be cached across users.
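A stdlib-only sketch of a chat completion request that scopes the prompt cache with cache_salt (the model id "gpt-oss" is an assumption drawn from the catalog above, and the API key is a placeholder; cache_salt is sent as a top-level field in the request body, as vLLM accepts):

```python
import json
import urllib.request

API_BASE = "https://ellm.nrp-nautilus.io/v1"
API_KEY = "YOUR_NRP_API_KEY"  # placeholder; obtain a real key from the NRP

def chat_request(prompt: str, cache_salt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request.

    cache_salt scopes vLLM's prompt cache so that cached prefixes are not
    shared across tenants. The model id below is illustrative; list live
    ids via GET /v1/models.
    """
    payload = {
        "model": "gpt-oss",
        "messages": [{"role": "user", "content": prompt}],
        "cache_salt": cache_salt,
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("Say hello.", cache_salt="tenant-42")
# Uncomment once you have a valid key:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

With the OpenAI Python client, the equivalent is passing `extra_body={"cache_salt": ...}` to `chat.completions.create`.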

For coding CLIs, ready-to-paste configs are provided for Claude Code, OpenCode, Crush, Kimi CLI, pi, and Copilot CLI.

Operate responsibly

The NRP applies a Fair Use Policy with per-model concurrency limits — please review it before sending high-volume traffic. SDSC and Internet2 affiliates get higher limits. For real-time capacity and health, the LLM Status page surfaces live Prometheus / vLLM metrics, with one card per Kubernetes container.

Deploy your own model

If the managed catalog doesn’t fit your need, the NRP supports running your own LLM via vLLM or SGLang. See Managing AI Models for the deployment path.
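As a rough sketch of the self-hosted path, vLLM's built-in server exposes the same OpenAI-compatible API as the managed endpoint; the model id and flags here are illustrative, and the NRP-specific pieces (GPU requests, storage, ingress) are covered in Managing AI Models:

```shell
# Serve an open-weights model with an OpenAI-compatible API on port 8000.
# Model id and context length are examples; adjust to your hardware.
vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000 --max-model-len 8192
```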

This work was supported in part by National Science Foundation (NSF) awards CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, OAC-2112167, CNS-2100237, CNS-2120019.