Llama 3.1 Nemotron Ultra 253B v1 (free) Check detailed information and pricing for AI models

Context Length 131,072 tokens, nvidia from provided

131,072

Context Tokens

Free

Prompt Price

Free

Output Price

9/16

Feature Support

Model Overview

Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural Architecture Search (NAS), resulting in enhanced efficiency, reduced memory usage, and improved inference latency. The model supports a context length of up to 128K tokens and can operate efficiently on an 8x NVIDIA H100 node. Note: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more.

Basic Information

Developer

nvidia

Model Series

Llama3

Release Date

2025-04-08

Context Length

131,072 tokens

Variant

free

Pricing Information

This model is free to use

Data Policy

학습 정책

Supported Features

Supported (9)

Top K

Seed

Frequency Penalty

Presence Penalty

Repetition Penalty

Min P

Logit Bias

Logprobs

Top Logprobs

Unsupported (7)

Image Input

Response Format

Tool Usage

Structured Outputs

Reasoning

Web Search Options

Top A

Other Variants

Llama 3.1 Nemotron Ultra 253B v1

standard

$0.60 / $1.80

Actual Usage Statistics

No recent usage data available.

Models by Same Author (nvidia)

Nemotron Nano 9B V2 (free)

128,000 tokens

Free

View Details

Nemotron Nano 9B V2

131,072 tokens

$0.04 / $0.16

View Details

Llama 3.1 Nemotron Nano 8B v1

131,072 tokens

$0.00 / $0.00

View Details

Llama 3.3 Nemotron Super 49B v1 (free)

131,072 tokens

Free

View Details

Llama 3.3 Nemotron Super 49B v1

131,072 tokens

$0.00 / $0.00

View Details