Showing 82 of 82 inputs across 66 benchmark families.

Browse the shared inputs as a spreadsheet or grouped catalog. Related measurements stay together instead of reading like duplicate rows.

82 benchmarks, 66 benchmark families, 9 variant families, 73 linked citations

Money, payouts, and prices

Revenue anchors, deal values, labour rates, and inference pricing.

Money, payouts, and prices benchmarks with editable current values, metadata, and source details.
Input | Current value | Metadata | Source and usage
Main example

Reported annualized revenue run rate for OpenAI.

billions of dollars

About 25.00 billion dollars.

Scale
News
Yearly Revenue | Openai
Reuters citing The Information | Reviewed 2026-05-28

Reuters reported that The Information said OpenAI topped $25B in annualized revenue at the end of February 2026; Reuters noted that it could not verify the report.

1 scenario uses this
Main example

Public API price benchmark for GPT-4.1 mini input tokens.

dollars per 1M input tokens

About 0.40 dollars per 1M input tokens.

Scale
First-party report
Inference Price | Gpt 4 1 Mini
OpenAI model page | Reviewed 2026-05-28

GPT-4.1 mini input pricing is listed at $0.40 per 1M tokens.

Main example

Public API price benchmark for GPT-4o input tokens.

dollars per 1M input tokens

About 2.50 dollars per 1M input tokens.

Scale
First-party report
Inference Price | Gpt 4o
OpenAI model page | Reviewed 2026-05-28

GPT-4o input pricing is listed at $2.50 per 1M tokens.

Reported annualized revenue run rate for Microsoft's AI business.

billions of dollars

About 37.00 billion dollars.

Scale
First-party report
Yearly Revenue | Microsoft
Microsoft earnings release | Reviewed 2026-05-28

Microsoft said in April 2026 that its AI business had surpassed a $37B annual revenue run rate.

Reported annualized revenue run rate for Anthropic.

billions of dollars

About 30.00 billion dollars.

Scale
First-party report
Yearly Revenue | Anthropic
Anthropic | Reviewed 2026-05-28

Anthropic said in April 2026 that its run-rate revenue had surpassed $30B, up from about $9B at the end of 2025.

Annualized revenue run rate for an AI-cloud infrastructure company.

billions of dollars

About 8.31 billion dollars.

Scale
First-party report
Yearly Revenue | Coreweave
CoreWeave earnings release | Reviewed 2026-05-28

CoreWeave reported $2.078B of revenue for Q1 2026; this input annualizes that quarter to $8.312B.
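
The annualization above is a simple times-four extrapolation of one quarter. A minimal sketch, assuming a flat run rate across the year (the helper name `annualize_quarter` is hypothetical and introduced only for illustration):

```python
def annualize_quarter(quarterly_revenue_billions: float) -> float:
    """Annualize a single quarter by multiplying by four (assumes a flat run rate)."""
    return quarterly_revenue_billions * 4


coreweave_q1_2026 = 2.078  # billions of dollars, as reported for Q1 2026
run_rate = annualize_quarter(coreweave_q1_2026)
print(f"{run_rate:.3f}")  # 8.312
```

Because this ignores growth or seasonality within the year, it is best read as a scale anchor rather than a forecast.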

Public API price benchmark for GPT-4.1 mini output tokens.

dollars per 1M output tokens

About 1.60 dollars per 1M output tokens.

Scale
First-party report
Inference Price | Gpt 4 1 Mini
OpenAI model page | Reviewed 2026-05-28

GPT-4.1 mini output pricing is listed at $1.60 per 1M tokens.

Public API price benchmark for GPT-4o output tokens.

dollars per 1M output tokens

About 10.00 dollars per 1M output tokens.

Scale
First-party report
Inference Price | Gpt 4o
OpenAI model page | Reviewed 2026-05-28

GPT-4o output pricing is listed at $10.00 per 1M tokens.
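
As a rough illustration of how the four listed token prices combine, the sketch below estimates the cost of one request. The `PRICES_PER_1M` table and the `request_cost` helper are hypothetical names introduced here, and the dictionary keys are just labels; only the dollar figures come from the price entries in this section.

```python
# Listed prices in dollars per 1M tokens, copied from the catalog entries.
PRICES_PER_1M = {
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one request (hypothetical helper)."""
    price = PRICES_PER_1M[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Example: 10,000 input tokens and 2,000 output tokens on each model.
for name in PRICES_PER_1M:
    print(name, round(request_cost(name, 10_000, 2_000), 4))
```

For that example request, the listed prices work out to about $0.0072 on GPT-4.1 mini and about $0.045 on GPT-4o.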

Estimated total value of the OpenAI-News Corp content licensing agreement.

millions of dollars

About 250.00 million dollars.

Scale
News
Deal Value | Newscorp
Reuters citing WSJ reporting | Reviewed 2026-03-10

Reported as worth more than $250M over five years; stored here as a round-number benchmark.

A conservative benchmark for paying expert contributors to produce evaluation questions.

dollars per question

About 200.00 dollars per question.

Scale
First-party report
Wage Data | Phd
Scale AI Humanity's Last Exam leaderboard | Reviewed 2026-05-02

A $500k prize pool spread across 2,500 final public questions implies about $200 per retained question, excluding organizer and reviewer labor.

2 scenarios use this
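
The $200 figure is the prize pool divided evenly over the retained questions. A minimal sketch of that arithmetic (variable names are illustrative, and organizer and reviewer labor are excluded as noted above):

```python
prize_pool_dollars = 500_000   # total HLE prize pool
retained_questions = 2_500     # final public question count

dollars_per_question = prize_pool_dollars / retained_questions
print(dollars_per_question)  # 200.0
```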

A premium expert-question benchmark based on Humanity's Last Exam prize tiers.

dollars per question

About 500.00 dollars per question.

Scale
First-party report
Wage Data | Hle Runner Up Prize
Scale AI Humanity's Last Exam leaderboard | Reviewed 2026-05-02

Scale reports that HLE contributors competed for a $500,000 prize pool, with $500 awards for the next 500 questions after the top 50.

National mean hourly wage benchmark for U.S. family medicine physicians.

dollars per hour

About 122.99 dollars per hour.

Scale
First-party report
Wage Data | Family Medicine Physician
U.S. Bureau of Labor Statistics OEWS | Reviewed 2026-05-28

BLS reports a mean hourly wage of $122.99 for family medicine physicians in May 2025.

National mean hourly wage benchmark for U.S. general internal medicine physicians.

dollars per hour

About 128.46 dollars per hour.

Scale
First-party report
Wage Data | Physician
U.S. Bureau of Labor Statistics OEWS | Reviewed 2026-05-28

BLS reports a mean hourly wage of $128.46 for general internal medicine physicians in May 2025.

2 scenarios use this

The disclosed floor for the Microsoft-Taylor & Francis AI licensing agreement.

millions of dollars

About 10.00 million dollars.

Scale
First-party report
Deal Value | Taylorandfrancis Microsoft
Informa market update | Reviewed 2026-03-10

Informa disclosed a $10M+ initial fee plus recurring payments; the stored value is a conservative floor, not the full contract total.

A higher-end professional benchmark for commissioned writing labor.

dollars per word

About 0.090 dollars per word.

Scale
Third-party report
Wage Data | Generic Freelance Higher
Editorial Freelancers Association 2024 rate chart | Reviewed 2026-05-02

Uses the high end of EFA's book-proposal per-word rate range as a conservative paid-writing proxy.

1 scenario uses this

A lower-bound professional benchmark for paid per-word labor.

dollars per word

About 0.020 dollars per word.

Scale
Third-party report
Wage Data | Generic Freelance Lower
Editorial Freelancers Association 2024 rate chart | Reviewed 2026-05-02

Uses a low-end professional editorial benchmark as a rough floor for paid per-word labor.

Reported yearly value of the Google-Reddit data licensing deal.

millions of dollars

About 60.00 million dollars.

Scale
News
Deal Value | Reddit Google
Reuters | Reviewed 2026-03-10

Reuters reported the Google-Reddit licensing contract was worth about $60M per year.

2 scenarios use this

Training sizes and benchmark totals

Token counts, example counts, benchmark sizes, and other scale assumptions.

Training sizes and benchmark totals benchmarks with editable current values, metadata, and source details.
Input | Current value | Metadata | Source and usage
Main example

A rule-of-thumb conversion between English words and tokens.

words per token

About 0.75 words per token.

Scale
First-party report
Training Detail | Openai
OpenAI Help Center | Reviewed 2026-03-10

OpenAI's English rule of thumb is that one token is about three-quarters of a word.

1 scenario uses this
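
The rule of thumb can be applied in either direction. A small sketch, assuming English text and treating 0.75 as exact (the function names are hypothetical):

```python
WORDS_PER_TOKEN = 0.75  # OpenAI's English rule of thumb


def words_to_tokens(words: float) -> float:
    """Approximate token count for a given number of English words."""
    return words / WORDS_PER_TOKEN


def tokens_to_words(tokens: float) -> float:
    """Approximate English word count for a given number of tokens."""
    return tokens * WORDS_PER_TOKEN


print(words_to_tokens(1_500))  # 2000.0
print(tokens_to_words(1_000))  # 750.0
```

Actual ratios vary by tokenizer, language, and content type, so this is only suitable for order-of-magnitude estimates.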
Main example

Total tokens Meta reports using to pretrain Llama 3.

billions of tokens

About 15.00 trillion tokens.

Scale
First-party report
Dataset Size | Llama3
Meta Llama 3 model card | Reviewed 2026-03-10

Meta reports that Llama 3 was pretrained on about 15 trillion multilingual tokens.

2 scenarios use this
Main example

Total tokens Meta reports using to pretrain Llama 4 Scout.

billions of tokens

About 40.00 trillion tokens.

Scale
First-party report
Dataset Size | Llama4 Scout
Meta Llama 4 model card | Reviewed 2026-03-11

Meta reports that Llama 4 Scout was trained on 40 trillion tokens.

Main example

Approximate number of tokens used in the released OLMo 3 7B pretraining mix.

billions of tokens

About 5.93 trillion tokens.

Scale
First-party report
Dataset Size | Olmo3
AllenAI Dolma 3 mix card | Reviewed 2026-03-10

AllenAI reports a 5.93T-token mix for the released OLMo 3 7B recipe.

Main example

Approximate code-token count for The Stack v2.

billions of tokens

About 900.00 billion tokens.

Scale
First-party report
Dataset Size | The Stack V2
The Stack v2 dataset card | Reviewed 2026-03-11

The dataset card describes The Stack v2 as a roughly 900B-token code corpus.

Main example

Total tokens DeepSeek reports for DeepSeek-V3 training.

billions of tokens

About 14.80 trillion tokens.

Scale
First-party report
Dataset Size | Deepseek V3
DeepSeek-V3 repository | Reviewed 2026-03-11

DeepSeek reports that DeepSeek-V3 was pretrained on 14.8 trillion high-quality and diverse tokens.

Planning assumption for yearly physician time spent auditing, reviewing, and supervising medical-AI systems.

hours per physician per year

About 40.00 hours.

Scale
Other
Training Detail | Medical Ai Oversight
Scenario planning benchmark | Reviewed 2026-04-06

Uses one workweek of annual oversight per participating physician as the default assumption.

1 scenario uses this

Internal Meta comparison point for the size of a book-heavy corpus.

billions of tokens

About 30.00 billion tokens.

Scale
Third-party report
Dataset Size | Books3 Plus Gutenberg
Kadrey v. Meta Exhibit C | Reviewed 2026-03-11

An unsealed February 28, 2023 Meta email says Books3 plus Gutenberg is about 30B tokens.

Approximate size of the Common Pile open corpus.

terabytes

About 8.00 terabytes.

Scale
Third-party report
Dataset Size | Common Pile
Common Pile paper | Reviewed 2026-03-11

The paper describes Common Pile as an 8TB openly licensed corpus.

Approximate storage size for FineWeb2.

terabytes

About 20.00 terabytes.

Scale
First-party report
Dataset Size | Fineweb2
FineWeb2 dataset card | Reviewed 2026-03-11

The dataset card describes FineWeb2 as a 20TB corpus assembled from 96 Common Crawl snapshots.

Approximate total storage size for The Stack v2.

terabytes

About 67.53 terabytes.

Scale
First-party report
Dataset Size | The Stack V2
The Stack v2 dataset card | Reviewed 2026-03-11

The dataset card describes The Stack v2 as 67.53TB in total before deduplication.

Planning assumption for the share of physicians who would participate in an ongoing medical-AI oversight regime.

percent of physicians

About 1.00 percent.

Scale
Other
Training Detail | Medical Ai Oversight
Scenario planning benchmark | Reviewed 2026-04-06

Starts at 1 percent to reflect a deliberately macro-scale oversight regime rather than a small benchmark panel.

1 scenario uses this

Project-planning benchmark for physician time spent creating and reviewing a domain-specific eval set.

hours per physician

About 8.00 hours.

Scale
Other
Training Detail | Medical Eval
Project planning benchmark | Reviewed 2026-04-03

Uses one workday per physician reviewer as a default assumption for drafting examples, refining rubrics, and calibration.

1 scenario uses this

Number of DPO preference pairs in the released Tulu 3 70B preference mixture.

generation pairs

About 337.19 thousand generation pairs.

Scale
First-party report
Post Training Size | Tulu 3 70b
Tulu 3 70B preference mixture card | Reviewed 2026-03-10

The released Tulu 3 70B preference mixture contains 337,186 generation pairs.

Number of supervised fine-tuning examples in the public Tulu 3 SFT mixture.

examples

About 939.34 thousand examples.

Scale
First-party report
Post Training Size | Tulu 3
Tulu 3 SFT mixture card | Reviewed 2026-03-10

The released Tulu 3 SFT mixture contains 939,344 examples.

Approximate raw-token count across the published Comma v0.1 training stages.

billions of raw tokens

About 639.80 billion tokens.

Scale
First-party report
Dataset Size | Comma V0 1
Comma v0.1 training dataset card | Reviewed 2026-03-11

The dataset card lists 463.6B raw tokens in the main stage and 176.2B raw tokens in the cooldown stage; this input sums both stages.
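
The stored value is just the sum of the two stage counts. A short sketch of that sum, with illustrative variable names:

```python
main_stage_billions = 463.6      # raw tokens in the main stage
cooldown_stage_billions = 176.2  # raw tokens in the cooldown stage

# Round to one decimal place to avoid floating-point noise in the sum.
total_billions = round(main_stage_billions + cooldown_stage_billions, 1)
print(total_billions)  # 639.8
```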

Plaintiffs' estimate of the Z-Library and LibGen portion of Meta's alleged shadow-library downloads.

terabytes

About 35.70 terabytes.

Scale
Third-party report
Dataset Size | Zlibrary Libgen
Kadrey v. Meta unsealed filing appendix | Reviewed 2026-03-11

An unsealed appendix filed in February 2025 says 35.7TB of the downloads came from Z-Library and LibGen.

A small-network planning assumption for ongoing medical-AI oversight.

percent of physicians

About 0.010 percent.

Scale
Other
Training Detail | Medical Ai Oversight Specialty Network
Scenario planning benchmark | Reviewed 2026-05-02

Uses 0.01 percent of physicians as a deliberately small specialty-network comparison against the larger default oversight regime.

Internal planning target for the size of a hypothetical Stackipedia corpus.

billions of tokens

About 1.00 billion tokens.

Scale
Other
Target Metric | Stackipedia
Project default | Reviewed 2026-03-10

Internal target for a hypothetical public-knowledge corpus, not an observed empirical statistic.

Lower-bound count of books Anthropic was found to have copied from LibGen.

millions of books

About 5.00 million books.

Scale
Third-party report
Total Books | Libgen
Bartz v. Anthropic summary-judgment order | Reviewed 2026-03-11

Judge Alsup wrote that Anthropic downloaded at least five million books from LibGen.

Lower-bound count of books Anthropic was found to have copied from Pirate Library Mirror.

millions of books

About 2.00 million books.

Scale
Third-party report
Total Books | Pirate Library Mirror
Bartz v. Anthropic summary-judgment order | Reviewed 2026-03-11

Judge Alsup wrote that Anthropic downloaded at least two million books from Pirate Library Mirror.

Lower-bound count of pirated books the court says Anthropic copied into its central library.

millions of books

About 7.00 million books.

Scale
Third-party report
Total Books | Anthropic Pirated Library
Bartz v. Anthropic summary-judgment order | Reviewed 2026-03-11

Judge Alsup wrote that Anthropic assembled a central library of more than seven million pirated books.

1 scenario uses this

Approximate document count for FineWeb2.

billions of documents

About 5.00 billion documents.

Scale
First-party report
Dataset Size | Fineweb2
FineWeb2 dataset card | Reviewed 2026-03-11

The dataset card describes FineWeb2 as containing more than 5 billion documents.

Number of books in the Books3 dataset.

books

About 196.64 thousand books.

Scale
Third-party report
Total Books | Books3
Bartz v. Anthropic summary-judgment order | Reviewed 2026-03-11

The June 2025 order identifies the Books3 dataset as containing 196,640 books.

Total tokens Meta reports using to pretrain Llama 4 Maverick.

billions of tokens

About 22.00 trillion tokens.

Scale
First-party report
Dataset Size | Llama4 Maverick
Meta Llama 4 model card | Reviewed 2026-03-11

Meta reports that Llama 4 Maverick was trained on 22 trillion tokens.

Total tokens AllenAI reports using to train OLMo 2.

billions of tokens

About 5.00 trillion tokens.

Scale
First-party report
Dataset Size | Olmo2
AllenAI OLMo 2 model card | Reviewed 2026-03-10

AllenAI reports 5 trillion training tokens for OLMo 2.

Total tokens Alibaba reports using to pretrain Qwen3-Coder.

billions of tokens

About 7.50 trillion tokens.

Scale
First-party report
Dataset Size | Qwen3 Coder
Qwen3-Coder official blog | Reviewed 2026-03-11

Alibaba says Qwen3-Coder was pretrained on 7.5 trillion tokens.

Total tokens Alibaba reports using to pretrain Qwen3.

billions of tokens

About 36.00 trillion tokens.

Scale
First-party report
Dataset Size | Qwen3
Qwen3 official blog | Reviewed 2026-03-11

Alibaba says Qwen3 models were pretrained on about 36 trillion tokens across 119 languages and dialects.

The number of public benchmark questions in Humanity's Last Exam.

questions

About 2.50 thousand questions.

Scale
First-party report
Dataset Size | Hle
Scale AI Humanity's Last Exam leaderboard | Reviewed 2026-05-02

Scale's April 3, 2025 update says HLE was finalized to 2,500 questions.

1 scenario uses this

The main GPQA benchmark size, useful as a smaller expert-question benchmark.

questions

About 448.00 questions.

Scale
Third-party report
Dataset Size | Gpqa Main
GPQA paper and dataset card | Reviewed 2026-05-02

The GPQA paper and dataset card describe the main benchmark as 448 expert-written multiple-choice questions in biology, physics, and chemistry.

Plaintiffs' estimate of total shadow-library data discussed in an unsealed Meta filing appendix.

terabytes

About 81.70 terabytes.

Scale
Third-party report
Dataset Size | Meta Shadow Library
Kadrey v. Meta unsealed filing appendix | Reviewed 2026-03-11

An unsealed appendix filed in February 2025 says Meta downloaded 81.7TB from shadow libraries.

Approximate file count for The Stack v2.

billions of files

About 3.28 billion files.

Scale
First-party report
Dataset Size | The Stack V2
The Stack v2 dataset card | Reviewed 2026-03-11

The dataset card describes The Stack v2 as containing 3.28 billion unique files.

Approximate word count for FineWeb2.

trillions of words

About 3.00 trillion words.

Scale
First-party report
Dataset Size | Fineweb2
FineWeb2 dataset card | Reviewed 2026-03-11

The dataset card describes FineWeb2 as containing more than 3 trillion words.

People, groups, and distribution targets

Population, audience, workforce, and contributor counts used in per-person math.

People, groups, and distribution targets benchmarks with editable current values, metadata, and source details.
Input | Current value | Metadata | Source and usage

Rough benchmark for Wall Street Journal newsroom size.

people

About 2.00 thousand journalists.

Scale
Other
Deal Group Size | Wsj
Historic public reporting | Reviewed 2026-03-10

This is a rough historical benchmark for WSJ/Dow Jones newsroom size, not a freshly disclosed audited count.

Estimated global physician headcount, derived from WHO's worldwide doctor-density benchmark.

million physicians

About 13.92 million people.

Scale
First-party report
Group Size | World Physicians
WHO Global Health Observatory | Reviewed 2026-04-06

WHO reports a global density of 17.2 doctors per 10,000 people in 2022. This input turns that density into a rough headcount by multiplying it by the repo's January 1, 2025 world-population benchmark.

1 scenario uses this
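
The density-to-headcount conversion described above can be sketched as follows. This uses the rounded 8.09 billion population figure shown in this catalog; the stored benchmark presumably carries more digits, which would explain a small difference from the 13.92 million shown for this input.

```python
doctors_per_10k = 17.2     # WHO global doctor density for 2022
world_population = 8.09e9  # rounded population figure (assumption: the repo stores more digits)

physicians = doctors_per_10k / 10_000 * world_population
print(f"{physicians / 1e6:.2f} million")  # 13.91 million
```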

Approximate total employee count for News Corp.

people

About 23.90 thousand employees.

Scale
First-party report
Deal Group Size | Newscorp
News Corp annual report | Reviewed 2026-03-10

News Corp reported about 23,900 employees as of June 30, 2024.

Projected U.S. population.

millions of people

About 341.15 million people.

Scale
First-party report
Group Size | Usa
U.S. Census Bureau | Reviewed 2026-03-10

Uses the Census Bureau projection for the U.S. population on January 1, 2025.

Projected world population.

billions of people

About 8.09 billion people.

Scale
First-party report
Group Size | World
U.S. Census Bureau | Reviewed 2026-03-10

Uses the Census Bureau projection for world population on January 1, 2025.

1 scenario uses this

Number of source collections included in Common Pile.

source collections

About 30.00 sources.

Scale
Third-party report
Group Size | Common Pile
Common Pile paper | Reviewed 2026-03-11

The paper describes Common Pile as drawing from 30 sources.

Number of articles on the Taylor & Francis platform.

millions of articles

About 5.29 million articles.

Scale
First-party report
Deal Group Size | Taylorandfrancis
Taylor & Francis Online | Reviewed 2026-03-10

Uses the article count publicly shown on the Taylor & Francis platform homepage.

Number of physicians who partnered with OpenAI to build HealthBench.

physicians

About 262.00 people.

Scale
First-party report
Group Size | Healthbench
OpenAI HealthBench post | Reviewed 2026-04-03

HealthBench gives the exact count of 262 physicians, and OpenAI's January 8, 2026 healthcare announcement separately describes a global network of more than 260 licensed physicians across 60 countries.

1 scenario uses this

Average daily active unique users on Reddit.

millions of daily active users

About 121.40 million daily active users.

Scale
First-party report
Deal Group Size | Reddit
Reddit investor relations | Reviewed 2026-03-10

Uses Reddit's Q4 2025 daily active uniques as the latest official public scale benchmark.

1 scenario uses this

Data mix and dataset structure

Composition shares, source slices, document size, and related dataset structure.

Data mix and dataset structure benchmarks with editable current values, metadata, and source details.
Input | Current value | Metadata | Source and usage

Internal planning assumption for how many contributions it takes to assemble one Stackipedia document.

contributions per document

About 2.00 contributions per document.

Scale
Other
Conversion Rate | Stackipedia
Project default | Reviewed 2026-03-10

Hypothetical productivity assumption for Stackipedia.

Average number of tokens in a single contribution, using the average RedPajama document as a proxy.

tokens per contribution

About 1.41 thousand tokens per contribution.

Scale
Third-party report
Dataset Attribute | Redpajama
RedPajama-Data repository | Reviewed 2026-03-10

Derived from the English deduplicated counts reported by the project: 20.5T tokens over 14.5B documents.
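
The derivation is a single division of the reported token count by the reported document count. A minimal sketch, with illustrative variable names:

```python
tokens = 20.5e12    # 20.5T English deduplicated tokens
documents = 14.5e9  # 14.5B English deduplicated documents

tokens_per_document = tokens / documents
print(round(tokens_per_document))  # 1414
```

That is roughly 1.41 thousand tokens per document, matching the stored value.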

Rule-of-thumb average number of words per book.

words

About 80.00 thousand words.

Scale
Other
Average Length | Book
Penguin Books explainer | Reviewed 2026-03-10

This is an industry rule-of-thumb for a typical full-length book, not a census of Books3 itself.

Internal planning assumption for the typical size of a Stackipedia document.

tokens per document

About 1.50 thousand tokens per document.

Scale
Other
Dataset Attribute | Stackipedia
Project default | Reviewed 2026-03-10

Hypothetical default used for Stackipedia scenarios; no external public source is claimed.

Internal planning assumption for how much demand is needed to generate one Stackipedia contribution.

visitors per contribution

About 50.00 visitors per contribution.

Scale
Other
Conversion Rate | Stackipedia
Project default | Reviewed 2026-03-10

Hypothetical engagement assumption for Stackipedia.
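
One way the Stackipedia planning inputs might chain together is sketched below. The chaining itself is an assumption made for illustration; the catalog lists each input separately and does not state this formula. The values are the catalog's current defaults, including the 1.00 billion token target listed under training sizes.

```python
# Catalog defaults for the hypothetical Stackipedia corpus.
target_tokens = 1_000_000_000   # planning target: 1.00 billion tokens
tokens_per_document = 1_500     # typical document size
contributions_per_document = 2  # contributions needed per document
visitors_per_contribution = 50  # visitors needed per contribution

# Assumed chain: tokens -> documents -> contributions -> visitors.
documents = target_tokens / tokens_per_document
contributions = documents * contributions_per_document
visitors = contributions * visitors_per_contribution

print(f"documents: {documents:,.0f}")
print(f"contributions: {contributions:,.0f}")
print(f"visitors: {visitors:,.0f}")
```

Under these defaults the target implies roughly 667 thousand documents, 1.33 million contributions, and 66.7 million visitors.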

Number of Aya examples in the public Tulu 3 SFT mixture.

examples

About 100.00 thousand examples.

Scale
First-party report
Post Training Source | Tulu 3
Tulu 3 SFT mixture card | Reviewed 2026-03-10

Aya contributes 100,000 multilingual instruction examples in the released mixture.

Number of FLAN v2 examples in the public Tulu 3 SFT mixture.

examples

About 89.98 thousand examples.

Scale
First-party report
Post Training Source | Tulu 3
Tulu 3 SFT mixture card | Reviewed 2026-03-10

FLAN v2 contributes 89,982 examples in the released mixture.

Approximate share of Dolma v1.6 tokens that come from scientific papers.

percent of tokens

About 2.30 percent.

Scale
First-party report
Pretraining Composition | Dolma V1 6
AllenAI Dolma dataset card | Reviewed 2026-03-10

Derived from the published PeS2o paper count in Dolma v1.6.

Approximate share of Dolma v1.6 tokens that come from books.

percent of tokens

About 0.20 percent.

Scale
First-party report
Pretraining Composition | Dolma V1 6
AllenAI Dolma dataset card | Reviewed 2026-03-10

Derived from the Project Gutenberg token count in Dolma v1.6.

1 scenario uses this

Approximate share of Dolma v1.6 tokens that come from code.

percent of tokens

About 13.40 percent.

Scale
First-party report
Pretraining Composition | Dolma V1 6
AllenAI Dolma dataset card | Reviewed 2026-03-10

Derived from the published The Stack token count in Dolma v1.6.

Approximate share of Qwen3-Coder pretraining tokens that came from code.

percent of tokens

About 70.00 percent.

Scale
First-party report
Pretraining Composition | Qwen3 Coder
Qwen3-Coder official blog | Reviewed 2026-03-11

Alibaba says 70 percent of Qwen3-Coder pretraining data was code.

Approximate share of Dolma v1.6 tokens that come from Reddit.

percent of tokens

About 2.90 percent.

Scale
First-party report
Pretraining Composition | Dolma V1 6
AllenAI Dolma dataset card | Reviewed 2026-03-10

Derived from the published Reddit token count in Dolma v1.6.

Approximate share of Dolma v1.6 tokens that come from web crawls.

percent of tokens

About 81.00 percent.

Scale
First-party report
Pretraining Composition | Dolma V1 6
AllenAI Dolma dataset card | Reviewed 2026-03-10

Derived from the Common Crawl and C4 token counts in Dolma v1.6.

Estimated share of C4 URLs carrying AI-restrictive terms or robots exclusions.

percent of URLs

About 45.00 percent.

Scale
Third-party report
Dataset Attribute | C4
Consent in Crisis | Reviewed 2026-03-11

Longpre et al. estimate that about 45% of C4 URLs are restricted for AI use by terms or robots exclusions.

Number of synthetic persona-style examples in the public Tulu 3 SFT mixture.

examples

About 284.92 thousand examples.

Scale
First-party report
Post Training Source | Tulu 3
Tulu 3 SFT mixture card | Reviewed 2026-03-10

This combines the five Persona subsets in the released Tulu 3 SFT mixture.

Number of WildChat GPT-4 examples in the public Tulu 3 SFT mixture.

examples

About 100.00 thousand examples.

Scale
First-party report
Post Training Source | Tulu 3
Tulu 3 SFT mixture card | Reviewed 2026-03-10

WildChat GPT-4 contributes 100,000 examples in the released mixture.

Other inputs

Additional anchors that do not fit one of the main question buckets.

Other inputs benchmarks with editable current values, metadata, and source details.
Input | Current value | Metadata | Source and usage
Main example

Public training-compute benchmark for Llama 4 Scout.

millions of H100-80GB GPU hours

About 5.00 million H100-80GB GPU hours.

Scale
First-party report
Training Compute | Llama4 Scout
Meta Llama 4 model card | Reviewed 2026-03-11

Meta reports 5.0 million H100-80GB GPU hours for Llama 4 Scout training.

The ordinary minimum statutory damages amount for copyright infringement under 17 U.S.C. 504(c).

dollars per work

About 750.00 dollars per work.

Scale
First-party report
Settlement Value | Copyright Statutory Minimum
17 U.S.C. 504(c) | Reviewed 2026-05-02

The statute sets ordinary statutory damages at not less than $750 and not more than $30,000 per work, before willfulness or innocent-infringer adjustments.

Public full-training compute benchmark for DeepSeek-V3.

millions of H800 GPU hours

About 2.79 million H800 GPU hours.

Scale
First-party report
Training Compute | Deepseek V3
DeepSeek-V3 repository | Reviewed 2026-03-11

DeepSeek reports 2.788 million H800 GPU hours for the full DeepSeek-V3 training run.

Number of identified works referenced in the Anthropic books settlement papers.

works

About 482.46 thousand works.

Scale
Third-party report
Settlement Group Size | Anthropic Books
Bartz v. Anthropic settlement order | Reviewed 2026-03-11

The October 2025 order says the parties identified 482,460 works for settlement administration.

Public pretraining-compute benchmark for DeepSeek-V3.

millions of H800 GPU hours

About 2.66 million H800 GPU hours.

Scale
First-party report
Training Compute | Deepseek V3
DeepSeek-V3 repository | Reviewed 2026-03-11

DeepSeek reports 2.664 million H800 GPU hours for DeepSeek-V3 pretraining.

Proposed average payout per identified work in the Anthropic books settlement.

dollars per work

About 3.00 thousand dollars per work.

Scale
Third-party report
Settlement Value | Anthropic Books
Bartz v. Anthropic preliminary-approval memorandum | Reviewed 2026-03-11

The October 17, 2025 filing describes an expected per-work award of about $3,000 across 482,460 identified works.

1 scenario uses this
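
For scale, multiplying the rounded per-work award by the identified work count gives an implied pool size. This is plain arithmetic on the two figures above, not a number quoted from the filing, and the variable names are illustrative.

```python
per_work_award = 3_000      # approximate expected award, dollars per work
identified_works = 482_460  # works identified for settlement administration

implied_total = per_work_award * identified_works
print(f"${implied_total:,}")  # $1,447,380,000
```

Because the $3,000 figure is itself approximate, the product should be read as roughly $1.45 billion rather than an exact total.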

Public training-compute benchmark for Llama 4 Maverick.

millions of H100-80GB GPU hours

About 2.38 million H100-80GB GPU hours.

Scale
First-party report
Training Compute | Llama4 Maverick
Meta Llama 4 model card | Reviewed 2026-03-11

Meta reports 2.38 million H100-80GB GPU hours for Llama 4 Maverick training.