Fair Use Policy

The limits below are the current fair-use restrictions for each of the managed LLM models. They exist to keep the LLM service stable for everyone.

If you need to exceed these limits in exceptional circumstances (for example, a deadline or another high-volume requirement), contact the administrators in advance to make arrangements. Otherwise, if you need higher concurrency, we recommend deploying and running your own LLM models; see Managing AI Models.

The limits below are not yet enforced automatically, but automatic rate limiting in the LLM gateway is coming soon. If you observe lagging requests and high request volume in Grafana, please report it in the Nautilus Artificial Intelligence/Machine Learning channel.

| Maximum Per-User Concurrency | Models |
|---|---|
| 2 | kimi |
| 4 | glm-4.7, minimax-m2 |
| 8 | qwen3-small, gemma, gemma-small, olmo |
| 16 | qwen3, gpt-oss, qwen3-embedding |
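Until gateway-side rate limiting is in place, you can stay within your limit by capping in-flight requests on the client side. The sketch below is a minimal example, assuming an OpenAI-compatible endpoint; the base URL, API key, and concurrency value are placeholders for illustration, not the service's actual configuration.

```python
import asyncio

from openai import AsyncOpenAI

# Placeholder endpoint and key for illustration only; use the values
# provided for the managed LLM service you are targeting.
client = AsyncOpenAI(base_url="https://llm.example.org/v1", api_key="YOUR_KEY")

# Cap in-flight requests at the per-user concurrency limit for the model
# you are using (e.g. 4 for glm-4.7 in the table above).
MAX_CONCURRENCY = 4
semaphore = asyncio.Semaphore(MAX_CONCURRENCY)

async def ask(prompt: str) -> str:
    # The semaphore ensures at most MAX_CONCURRENCY requests run at once,
    # even if many tasks are scheduled together.
    async with semaphore:
        response = await client.chat.completions.create(
            model="glm-4.7",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

async def main() -> None:
    prompts = [f"Summarize item {i}" for i in range(20)]
    answers = await asyncio.gather(*(ask(p) for p in prompts))
    print(answers[0])

asyncio.run(main())
```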

The San Diego Supercomputer Center (SDSC) and Internet2 have contributed GPU nodes for managed LLM inference. Users affiliated with these organizations are therefore granted twice the limits above and can separately arrange higher-volume sessions.

The National Research Platform is non-profit and non-commercial. All usage of the cluster, including LLMs, is subject to the AUP.

This work was supported in part by National Science Foundation (NSF) awards CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, OAC-2112167, CNS-2100237, CNS-2120019.