Inputs Library

Browse and edit the inputs behind the scenarios.

Browse the shared inputs as a spreadsheet or grouped catalog. Related measurements stay together instead of reading like duplicate rows.

82 benchmarks, 66 benchmark families, 9 variant families, 73 linked citations

Money, payouts, and prices

Revenue anchors, deal values, labour rates, and inference pricing.

Money, payouts, and prices benchmarks with editable current values, metadata, and source details.
Input	Current value	Metadata	Source and usage
Main example: Reported annualized revenue run rate for OpenAI.	billions of dollars: about 25.00 billion dollars.	News; Yearly Revenue; Openai	Reuters citing The Information. Reviewed 2026-05-28. Reuters reported that The Information said OpenAI topped $25B in annualized revenue at the end of February 2026; Reuters noted that it could not verify the report. Source. 1 scenario uses this.
Main example: Public API price benchmark for GPT-4.1 mini input tokens.	dollars per 1M input tokens: about 0.40 dollars per 1M input tokens.	First-party report; Inference Price; Gpt 4 1 Mini	OpenAI model page. Reviewed 2026-05-28. GPT-4.1 mini input pricing is listed at $0.40 per 1M tokens. Source.
Main example: Public API price benchmark for GPT-4o input tokens.	dollars per 1M input tokens: about 2.50 dollars per 1M input tokens.	First-party report; Inference Price; Gpt 4o	OpenAI model page. Reviewed 2026-05-28. GPT-4o input pricing is listed at $2.50 per 1M tokens. Source.
Reported annualized revenue run rate for Microsoft's AI business.	billions of dollars: about 37.00 billion dollars.	First-party report; Yearly Revenue; Microsoft	Microsoft earnings release. Reviewed 2026-05-28. Microsoft said in April 2026 that its AI business had surpassed a $37B annual revenue run rate. Source.
Reported annualized revenue run rate for Anthropic.	billions of dollars: about 30.00 billion dollars.	First-party report; Yearly Revenue; Anthropic	Anthropic. Reviewed 2026-05-28. Anthropic said in April 2026 that its run-rate revenue had surpassed $30B, up from about $9B at the end of 2025. Source.
Annualized revenue run rate for an AI-cloud infrastructure company.	billions of dollars: about 8.31 billion dollars.	First-party report; Yearly Revenue; Coreweave	CoreWeave earnings release. Reviewed 2026-05-28. CoreWeave reported $2.078B of revenue for Q1 2026; this input annualizes that quarter to $8.312B. Source.
Public API price benchmark for GPT-4.1 mini output tokens.	dollars per 1M output tokens: about 1.60 dollars per 1M output tokens.	First-party report; Inference Price; Gpt 4 1 Mini	OpenAI model page. Reviewed 2026-05-28. GPT-4.1 mini output pricing is listed at $1.60 per 1M tokens. Source.
Public API price benchmark for GPT-4o output tokens.	dollars per 1M output tokens: about 10.00 dollars per 1M output tokens.	First-party report; Inference Price; Gpt 4o	OpenAI model page. Reviewed 2026-05-28. GPT-4o output pricing is listed at $10.00 per 1M tokens. Source.
Estimated total value of the OpenAI-News Corp content licensing agreement.	millions of dollars: about 250.00 million dollars.	News; Deal Value; Newscorp	Reuters citing WSJ reporting. Reviewed 2026-03-10. Reported as worth more than $250M over five years; stored here as a round-number benchmark. Source.
A conservative benchmark for paying expert contributors to produce evaluation questions.	dollars per question: about 200.00 dollars per question.	First-party report; Wage Data; Phd	Scale AI Humanity's Last Exam leaderboard. Reviewed 2026-05-02. A $500k prize pool spread across 2,500 final public questions implies about $200 per retained question, excluding organizer and reviewer labor. Source. 2 scenarios use this.
A premium expert-question benchmark based on Humanity's Last Exam prize tiers.	dollars per question: about 500.00 dollars per question.	First-party report; Wage Data; Hle Runner Up Prize	Scale AI Humanity's Last Exam leaderboard. Reviewed 2026-05-02. Scale reports that HLE contributors competed for a $500,000 prize pool, with $500 awards for the next 500 questions after the top 50. Source.
National mean hourly wage benchmark for U.S. family medicine physicians.	dollars per hour: about 122.99 dollars per hour.	First-party report; Wage Data; Family Medicine Physician	U.S. Bureau of Labor Statistics OEWS. Reviewed 2026-05-28. BLS reports a mean hourly wage of $122.99 for family medicine physicians in May 2025. Source.
National mean hourly wage benchmark for U.S. general internal medicine physicians.	dollars per hour: about 128.46 dollars per hour.	First-party report; Wage Data; Physician	U.S. Bureau of Labor Statistics OEWS. Reviewed 2026-05-28. BLS reports a mean hourly wage of $128.46 for general internal medicine physicians in May 2025. Source. 2 scenarios use this.
The disclosed floor for the Microsoft-Taylor & Francis AI licensing agreement.	millions of dollars: about 10.00 million dollars.	First-party report; Deal Value; Taylorandfrancis Microsoft	Informa market update. Reviewed 2026-03-10. Informa disclosed a $10M+ initial fee plus recurring payments; the stored value is a conservative floor, not the full contract total. Source.
A higher-end professional benchmark for commissioned writing labor.	dollars per word: about 0.090 dollars per word.	Third-party report; Wage Data; Generic Freelance Higher	Editorial Freelancers Association 2024 rate chart. Reviewed 2026-05-02. Uses the high end of EFA's book-proposal per-word rate range as a conservative paid-writing proxy. Source. 1 scenario uses this.
A lower-bound professional benchmark for paid per-word labor.	dollars per word: about 0.020 dollars per word.	Third-party report; Wage Data; Generic Freelance Lower	Editorial Freelancers Association 2024 rate chart. Reviewed 2026-05-02. Uses a low-end professional editorial benchmark as a rough floor for paid per-word labor. Source.
Reported yearly value of the Google-Reddit data licensing deal.	millions of dollars: about 60.00 million dollars.	News; Deal Value; Reddit Google	Reuters. Reviewed 2026-03-10. Reuters reported the Google-Reddit licensing contract was worth about $60M per year. Source. 2 scenarios use this.
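
Two of the values in this table are derived rather than quoted: the CoreWeave run rate annualizes one quarter, and the $200-per-question benchmark spreads a prize pool over the retained questions. The sketch below reproduces that arithmetic from the figures given in the rows; the function names are hypothetical and are not part of the library itself.

```python
def annualize_quarter(quarterly_revenue: float) -> float:
    """Scale one quarter of revenue to a four-quarter run rate."""
    return quarterly_revenue * 4


def cost_per_question(prize_pool: float, retained_questions: int) -> float:
    """Spread a prize pool evenly across the retained questions."""
    return prize_pool / retained_questions


# CoreWeave: $2.078B of Q1 2026 revenue, expressed in billions of dollars.
coreweave_run_rate = annualize_quarter(2.078)

# Humanity's Last Exam: $500k prize pool over 2,500 final public questions.
hle_cost = cost_per_question(500_000, 2_500)

print(f"CoreWeave run rate: {coreweave_run_rate:.3f} billion dollars")
print(f"HLE cost per retained question: {hle_cost:.2f} dollars")
```

Note that the per-question figure excludes organizer and reviewer labor, as the row itself states, so it is a floor on cost rather than a full budget.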

Training sizes and benchmark totals

Token counts, example counts, benchmark sizes, and other scale assumptions.

Training sizes and benchmark totals benchmarks with editable current values, metadata, and source details.
Input	Current value	Metadata	Source and usage
Main example: A rule-of-thumb conversion between English words and tokens.	words per token: about 0.75 words per token.	First-party report; Training Detail; Openai	OpenAI Help Center. Reviewed 2026-03-10. OpenAI's English rule of thumb is that one token is about three-quarters of a word. Source. 1 scenario uses this.
Main example: Total tokens Meta reports using to pretrain Llama 3.	billions of tokens: about 15.00 trillion tokens.	First-party report; Dataset Size; Llama3	Meta Llama 3 model card. Reviewed 2026-03-10. Meta reports that Llama 3 was pretrained on about 15 trillion multilingual tokens. Source. 2 scenarios use this.
Main example: Total tokens Meta reports using to pretrain Llama 4 Scout.	billions of tokens: about 40.00 trillion tokens.	First-party report; Dataset Size; Llama4 Scout	Meta Llama 4 model card. Reviewed 2026-03-11. Meta reports that Llama 4 Scout was trained on 40 trillion tokens. Source.
Main example: Approximate number of tokens used in the released OLMo 3 7B pretraining mix.	billions of tokens: about 5.93 trillion tokens.	First-party report; Dataset Size; Olmo3	AllenAI Dolma 3 mix card. Reviewed 2026-03-10. AllenAI reports a 5.93T-token mix for the released OLMo 3 7B recipe. Source.
Main example: Approximate code-token count for The Stack v2.	billions of tokens: about 900.00 billion tokens.	First-party report; Dataset Size; The Stack V2	The Stack v2 dataset card. Reviewed 2026-03-11. The dataset card describes The Stack v2 as a roughly 900B-token code corpus. Source.
Main example: Total tokens DeepSeek reports for DeepSeek-V3 training.	billions of tokens: about 14.80 trillion tokens.	First-party report; Dataset Size; Deepseek V3	DeepSeek-V3 repository. Reviewed 2026-03-11. DeepSeek reports that DeepSeek-V3 was pretrained on 14.8 trillion high-quality and diverse tokens. Source.
Planning assumption for yearly physician time spent auditing, reviewing, and supervising medical-AI systems.	hours per physician per year: about 40.00 hours.	Other; Training Detail; Medical Ai Oversight	Scenario planning benchmark. Reviewed 2026-04-06. Uses one workweek of annual oversight per participating physician as the default assumption. 1 scenario uses this.
Internal Meta comparison point for the size of a book-heavy corpus.	billions of tokens: about 30.00 billion tokens.	Third-party report; Dataset Size; Books3 Plus Gutenberg	Kadrey v. Meta Exhibit C. Reviewed 2026-03-11. An unsealed February 28, 2023 Meta email says Books3 plus Gutenberg is about 30B tokens. Source.
Approximate size of the Common Pile open corpus.	terabytes: about 8.00 terabytes.	Third-party report; Dataset Size; Common Pile	Common Pile paper. Reviewed 2026-03-11. The paper describes Common Pile as an 8TB openly licensed corpus. Source.
Approximate storage size for FineWeb2.	terabytes: about 20.00 terabytes.	First-party report; Dataset Size; Fineweb2	FineWeb2 dataset card. Reviewed 2026-03-11. The dataset card describes FineWeb2 as a 20TB corpus assembled from 96 Common Crawl snapshots. Source.
Approximate total storage size for The Stack v2.	terabytes: about 67.53 terabytes.	First-party report; Dataset Size; The Stack V2	The Stack v2 dataset card. Reviewed 2026-03-11. The dataset card describes The Stack v2 as 67.53TB in total before deduplication. Source.
Planning assumption for the share of physicians who would participate in an ongoing medical-AI oversight regime.	percent of physicians: about 1.00 percent.	Other; Training Detail; Medical Ai Oversight	Scenario planning benchmark. Reviewed 2026-04-06. Starts at 1 percent to reflect a deliberately macro-scale oversight regime rather than a small benchmark panel. 1 scenario uses this.
Project-planning benchmark for physician time spent creating and reviewing a domain-specific eval set.	hours per physician: about 8.00 hours.	Other; Training Detail; Medical Eval	Project planning benchmark. Reviewed 2026-04-03. Uses one workday per physician reviewer as a default assumption for drafting examples, refining rubrics, and calibration. 1 scenario uses this.
Number of DPO preference pairs in the released Tulu 3 70B preference mixture.	generation pairs: about 337.19 thousand generation pairs.	First-party report; Post Training Size; Tulu 3 70b	Tulu 3 70B preference mixture card. Reviewed 2026-03-10. The released Tulu 3 70B preference mixture contains 337,186 generation pairs. Source.
Number of supervised fine-tuning examples in the public Tulu 3 SFT mixture.	examples: about 939.34 thousand examples.	First-party report; Post Training Size; Tulu 3	Tulu 3 SFT mixture card. Reviewed 2026-03-10. The released Tulu 3 SFT mixture contains 939,344 examples. Source.
Approximate raw-token count across the published Comma v0.1 training stages.	billions of raw tokens: about 639.80 billion tokens.	First-party report; Dataset Size; Comma V0 1	Comma v0.1 training dataset card. Reviewed 2026-03-11. The dataset card lists 463.6B raw tokens in the main stage and 176.2B raw tokens in the cooldown stage; this input sums both stages. Source.
Plaintiffs' estimate of the Z-Library and LibGen portion of Meta's alleged shadow-library downloads.	terabytes: about 35.70 terabytes.	Third-party report; Dataset Size; Zlibrary Libgen	Kadrey v. Meta unsealed filing appendix. Reviewed 2026-03-11. An unsealed appendix filed in February 2025 says 35.7TB of the downloads came from Z-Library and LibGen. Source.
A small-network planning assumption for ongoing medical-AI oversight.	percent of physicians: about 0.010 percent.	Other; Training Detail; Medical Ai Oversight Specialty Network	Scenario planning benchmark. Reviewed 2026-05-02. Uses 0.01 percent of physicians as a deliberately small specialty-network comparison against the larger default oversight regime.
Internal planning target for the size of a hypothetical Stackipedia corpus.	billions of tokens: about 1.00 billion tokens.	Other; Target Metric; Stackipedia	Project default. Reviewed 2026-03-10. Internal target for a hypothetical public-knowledge corpus, not an observed empirical statistic.
Lower-bound count of books Anthropic was found to have copied from LibGen.	millions of books: about 5.00 million books.	Third-party report; Total Books; Libgen	Bartz v. Anthropic summary-judgment order. Reviewed 2026-03-11. Judge Alsup wrote that Anthropic downloaded at least five million books from LibGen. Source.
Lower-bound count of books Anthropic was found to have copied from Pirate Library Mirror.	millions of books: about 2.00 million books.	Third-party report; Total Books; Pirate Library Mirror	Bartz v. Anthropic summary-judgment order. Reviewed 2026-03-11. Judge Alsup wrote that Anthropic downloaded at least two million books from Pirate Library Mirror. Source.
Lower-bound count of pirated books the court says Anthropic copied into its central library.	millions of books: about 7.00 million books.	Third-party report; Total Books; Anthropic Pirated Library	Bartz v. Anthropic summary-judgment order. Reviewed 2026-03-11. Judge Alsup wrote that Anthropic assembled a central library of more than seven million pirated books. Source. 1 scenario uses this.
Approximate document count for FineWeb2.	billions of documents: about 5.00 billion documents.	First-party report; Dataset Size; Fineweb2	FineWeb2 dataset card. Reviewed 2026-03-11. The dataset card describes FineWeb2 as containing more than 5 billion documents. Source.
Number of books in Books3.	books: about 196.64 thousand books.	Third-party report; Total Books; Books3	Bartz v. Anthropic summary-judgment order. Reviewed 2026-03-11. The June 2025 order identifies the Books3 dataset as containing 196,640 books. Source.
Total tokens Meta reports using to pretrain Llama 4 Maverick.	billions of tokens: about 22.00 trillion tokens.	First-party report; Dataset Size; Llama4 Maverick	Meta Llama 4 model card. Reviewed 2026-03-11. Meta reports that Llama 4 Maverick was trained on 22 trillion tokens. Source.
Total tokens AllenAI reports using to train OLMo 2.	billions of tokens: about 5.00 trillion tokens.	First-party report; Dataset Size; Olmo2	AllenAI OLMo 2 model card. Reviewed 2026-03-10. AllenAI reports 5 trillion training tokens for OLMo 2. Source.
Total tokens Alibaba reports using to pretrain Qwen3-Coder.	billions of tokens: about 7.50 trillion tokens.	First-party report; Dataset Size; Qwen3 Coder	Qwen3-Coder official blog. Reviewed 2026-03-11. Alibaba says Qwen3-Coder was pretrained on 7.5 trillion tokens. Source.
Total tokens Alibaba reports using to pretrain Qwen3.	billions of tokens: about 36.00 trillion tokens.	First-party report; Dataset Size; Qwen3	Qwen3 official blog. Reviewed 2026-03-11. Alibaba says Qwen3 models were pretrained on about 36 trillion tokens across 119 languages and dialects. Source.
The number of public benchmark questions in Humanity's Last Exam.	questions: about 2.50 thousand questions.	First-party report; Dataset Size; Hle	Scale AI Humanity's Last Exam leaderboard. Reviewed 2026-05-02. Scale's April 3, 2025 update says HLE was finalized to 2,500 questions. Source. 1 scenario uses this.
The main GPQA benchmark size, useful as a smaller expert-question benchmark.	questions: about 448.00 questions.	Third-party report; Dataset Size; Gpqa Main	GPQA paper and dataset card. Reviewed 2026-05-02. The GPQA paper and dataset card describe the main benchmark as 448 expert-written multiple-choice questions in biology, physics, and chemistry. Source.
Plaintiffs' estimate of total shadow-library data discussed in an unsealed Meta filing appendix.	terabytes: about 81.70 terabytes.	Third-party report; Dataset Size; Meta Shadow Library	Kadrey v. Meta unsealed filing appendix. Reviewed 2026-03-11. An unsealed appendix filed in February 2025 says Meta downloaded 81.7TB from shadow libraries. Source.
Approximate file count for The Stack v2.	billions of files: about 3.28 billion files.	First-party report; Dataset Size; The Stack V2	The Stack v2 dataset card. Reviewed 2026-03-11. The dataset card describes The Stack v2 as containing 3.28 billion unique files. Source.
Approximate word count for FineWeb2.	trillions of words: about 3.00 trillion words.	First-party report; Dataset Size; Fineweb2	FineWeb2 dataset card. Reviewed 2026-03-11. The dataset card describes FineWeb2 as containing more than 3 trillion words. Source.
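
Two pieces of arithmetic in this table lend themselves to a short sketch: the Comma v0.1 input sums two training stages, and the words-per-token rule of thumb converts between word and token counts. The helper name and the 750-word sample below are illustrative assumptions, not stored inputs, and the conversion rule is stated for English text only.

```python
WORDS_PER_TOKEN = 0.75  # OpenAI's English rule of thumb, as listed in the table


def words_to_tokens(words: float) -> float:
    """Estimate a token count from an English word count."""
    return words / WORDS_PER_TOKEN


# Comma v0.1 raw tokens, in billions: main stage plus cooldown stage.
comma_main_stage = 463.6
comma_cooldown_stage = 176.2
comma_total = round(comma_main_stage + comma_cooldown_stage, 1)

# Illustrative conversion only: a 750-word passage.
sample_tokens = words_to_tokens(750)

print(f"Comma v0.1 raw tokens: {comma_total} billion")
print(f"750 English words is roughly {sample_tokens:.0f} tokens")
```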

People, groups, and distribution targets

Population, audience, workforce, and contributor counts used in per-person math.

People, groups, and distribution targets benchmarks with editable current values, metadata, and source details.
Input	Current value	Metadata	Source and usage
Rough benchmark for Wall Street Journal newsroom size.	people: about 2.00 thousand journalists.	Other; Deal Group Size; Wsj	Historic public reporting. Reviewed 2026-03-10. This is a rough historical benchmark for WSJ/Dow Jones newsroom size, not a freshly disclosed audited count.
Estimated global physician headcount, derived from WHO's worldwide doctor-density benchmark.	million physicians: about 13.92 million people.	First-party report; Group Size; World Physicians	WHO Global Health Observatory. Reviewed 2026-04-06. WHO reports a global density of 17.2 doctors per 10,000 people in 2022. This input turns that density into a rough headcount by multiplying it by the repo's January 1, 2025 world-population benchmark. Source. 1 scenario uses this.
Approximate total employee count for News Corp.	people: about 23.90 thousand employees.	First-party report; Deal Group Size; Newscorp	News Corp annual report. Reviewed 2026-03-10. News Corp reported about 23,900 employees as of June 30, 2024. Source.
Projected U.S. population.	millions of people: about 341.15 million people.	First-party report; Group Size; Usa	U.S. Census Bureau. Reviewed 2026-03-10. Uses the Census Bureau projection for the U.S. population on January 1, 2025. Source.
Projected world population.	billions of people: about 8.09 billion people.	First-party report; Group Size; World	U.S. Census Bureau. Reviewed 2026-03-10. Uses the Census Bureau projection for world population on January 1, 2025. Source. 1 scenario uses this.
Number of source collections included in Common Pile.	source collections: about 30.00 sources.	Third-party report; Group Size; Common Pile	Common Pile paper. Reviewed 2026-03-11. The paper describes Common Pile as drawing from 30 sources. Source.
Number of articles on Taylor & Francis Online.	millions of articles: about 5.29 million articles.	First-party report; Deal Group Size; Taylorandfrancis	Taylor & Francis Online. Reviewed 2026-03-10. Uses the article count publicly shown on the Taylor & Francis platform homepage. Source.
Number of physicians who partnered with OpenAI to build HealthBench.	physicians: about 262.00 people.	First-party report; Group Size; Healthbench	OpenAI HealthBench post. Reviewed 2026-04-03. HealthBench gives the exact count of 262 physicians, and OpenAI's January 8, 2026 healthcare announcement separately describes a global network of more than 260 licensed physicians across 60 countries. Source. 1 scenario uses this.
Average daily active unique users on Reddit.	millions of daily active users: about 121.40 million daily active users.	First-party report; Deal Group Size; Reddit	Reddit investor relations. Reviewed 2026-03-10. Uses Reddit's Q4 2025 daily active uniques as the latest official public scale benchmark. Source. 1 scenario uses this.
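
The global physician headcount is the one derived value in this table: a density per 10,000 people multiplied by a population. A minimal sketch of that conversion follows, using the rounded 8.09 billion world-population figure shown above; because that figure is rounded, the result lands near the displayed 13.92 million rather than exactly on it. The function name is a hypothetical one chosen for illustration.

```python
def headcount_from_density(per_10k: float, population: float) -> float:
    """Convert a per-10,000-people density into an absolute headcount."""
    return per_10k / 10_000 * population


# WHO: 17.2 doctors per 10,000 people (2022).
# World population: about 8.09 billion (rounded display value from the table).
world_physicians = headcount_from_density(17.2, 8.09e9)

print(f"Estimated physicians worldwide: {world_physicians / 1e6:.2f} million")
```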

Data mix and dataset structure

Composition shares, source slices, document size, and related dataset structure.

Data mix and dataset structure benchmarks with editable current values, metadata, and source details.
Input	Current value	Metadata	Source and usage
Internal planning assumption for how many contributions it takes to assemble one Stackipedia document.	contributions per document: about 2.00 contributions per document.	Other; Conversion Rate; Stackipedia	Project default. Reviewed 2026-03-10. Hypothetical productivity assumption for Stackipedia.
Average number of tokens in a single 'contribution'.	tokens per contribution: about 1.41 thousand tokens per contribution.	Third-party report; Dataset Attribute; Redpajama	RedPajama-Data repository. Reviewed 2026-03-10. Derived from the English deduplicated counts reported by the project: 20.5T tokens over 14.5B documents. Source.
Average number of words per book.	words: about 80.00 thousand words.	Other; Average Length; Book	Penguin Books explainer. Reviewed 2026-03-10. This is an industry rule of thumb for a typical full-length book, not a census of Books3 itself. Source.
Internal planning assumption for the typical size of a Stackipedia document.	tokens per document: about 1.50 thousand tokens per document.	Other; Dataset Attribute; Stackipedia	Project default. Reviewed 2026-03-10. Hypothetical default used for Stackipedia scenarios; no external public source is claimed.
Internal planning assumption for how much demand is needed to generate one Stackipedia contribution.	visitors per contribution: about 50.00 visitors per contribution.	Other; Conversion Rate; Stackipedia	Project default. Reviewed 2026-03-10. Hypothetical engagement assumption for Stackipedia.
Number of Aya examples in the public Tulu 3 SFT mixture.	examples: about 100.00 thousand examples.	First-party report; Post Training Source; Tulu 3	Tulu 3 SFT mixture card. Reviewed 2026-03-10. Aya contributes 100,000 multilingual instruction examples in the released mixture. Source.
Number of FLAN v2 examples in the public Tulu 3 SFT mixture.	examples: about 89.98 thousand examples.	First-party report; Post Training Source; Tulu 3	Tulu 3 SFT mixture card. Reviewed 2026-03-10. FLAN v2 contributes 89,982 examples in the released mixture. Source.
Approximate share of Dolma v1.6 tokens that come from scientific papers.	percent of tokens: about 2.30 percent.	First-party report; Pretraining Composition; Dolma V1 6	AllenAI Dolma dataset card. Reviewed 2026-03-10. Derived from the published PeS2o paper count in Dolma v1.6. Source.
Approximate share of Dolma v1.6 tokens that come from books.	percent of tokens: about 0.20 percent.	First-party report; Pretraining Composition; Dolma V1 6	AllenAI Dolma dataset card. Reviewed 2026-03-10. Derived from the Project Gutenberg token count in Dolma v1.6. Source. 1 scenario uses this.
Approximate share of Dolma v1.6 tokens that come from code.	percent of tokens: about 13.40 percent.	First-party report; Pretraining Composition; Dolma V1 6	AllenAI Dolma dataset card. Reviewed 2026-03-10. Derived from the published The Stack token count in Dolma v1.6. Source.
Approximate share of Qwen3-Coder pretraining tokens that came from code.	percent of tokens: about 70.00 percent.	First-party report; Pretraining Composition; Qwen3 Coder	Qwen3-Coder official blog. Reviewed 2026-03-11. Alibaba says 70 percent of Qwen3-Coder pretraining data was code. Source.
Approximate share of Dolma v1.6 tokens that come from Reddit.	percent of tokens: about 2.90 percent.	First-party report; Pretraining Composition; Dolma V1 6	AllenAI Dolma dataset card. Reviewed 2026-03-10. Derived from the published Reddit token count in Dolma v1.6. Source.
Approximate share of Dolma v1.6 tokens that come from web crawls.	percent of tokens: about 81.00 percent.	First-party report; Pretraining Composition; Dolma V1 6	AllenAI Dolma dataset card. Reviewed 2026-03-10. Derived from the Common Crawl and C4 token counts in Dolma v1.6. Source.
Estimated share of C4 URLs carrying AI-restrictive terms or robots exclusions.	percent of URLs: about 45.00 percent.	Third-party report; Dataset Attribute; C4	Consent in Crisis. Reviewed 2026-03-11. Longpre et al. estimate that about 45% of C4 URLs are restricted for AI use by terms or robots exclusions. Source.
Number of synthetic persona-style examples in the public Tulu 3 SFT mixture.	examples: about 284.92 thousand examples.	First-party report; Post Training Source; Tulu 3	Tulu 3 SFT mixture card. Reviewed 2026-03-10. This combines the five Persona subsets in the released Tulu 3 SFT mixture. Source.
Number of WildChat GPT-4 examples in the public Tulu 3 SFT mixture.	examples: about 100.00 thousand examples.	First-party report; Post Training Source; Tulu 3	Tulu 3 SFT mixture card. Reviewed 2026-03-10. WildChat GPT-4 contributes 100,000 examples in the released mixture. Source.
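
As a rough sanity check on this table, the sketch below recomputes the RedPajama tokens-per-document average from the two counts quoted in its row, and adds up the five Dolma v1.6 shares listed above. The variable names are hypothetical; the observation that the listed shares total just under 100 percent is only arithmetic on the displayed values and says nothing about which sources make up the remainder.

```python
# RedPajama English deduplicated counts quoted in the table.
redpajama_tokens = 20.5e12
redpajama_documents = 14.5e9
tokens_per_document = redpajama_tokens / redpajama_documents

# Dolma v1.6 token shares listed in the table, in percent.
dolma_shares = {
    "web crawls": 81.0,
    "code": 13.4,
    "Reddit": 2.9,
    "scientific papers": 2.3,
    "books": 0.2,
}
dolma_listed_total = round(sum(dolma_shares.values()), 1)

print(f"RedPajama tokens per document: {tokens_per_document:.0f}")
print(f"Listed Dolma v1.6 shares total: {dolma_listed_total} percent")
```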

Other inputs

Additional anchors that do not fit one of the main question buckets.

Other inputs benchmarks with editable current values, metadata, and source details.
Input	Current value	Metadata	Source and usage
Main example: Public training-compute benchmark for Llama 4 Scout.	millions of H100-80GB GPU hours: about 5.00 million H100-80GB GPU hours.	First-party report; Training Compute; Llama4 Scout	Meta Llama 4 model card. Reviewed 2026-03-11. Meta reports 5.0 million H100-80GB GPU hours for Llama 4 Scout training. Source.
The ordinary minimum statutory damages amount for copyright infringement under 17 U.S.C. 504(c).	dollars per work: about 750.00 dollars per work.	First-party report; Settlement Value; Copyright Statutory Minimum	17 U.S.C. 504(c). Reviewed 2026-05-02. The statute sets ordinary statutory damages at not less than $750 and not more than $30,000 per work, before willfulness or innocent-infringer adjustments. Source.
Public full-training compute benchmark for DeepSeek-V3.	millions of H800 GPU hours: about 2.79 million H800 GPU hours.	First-party report; Training Compute; Deepseek V3	DeepSeek-V3 repository. Reviewed 2026-03-11. DeepSeek reports 2.788 million H800 GPU hours for the full DeepSeek-V3 training run. Source.
Number of identified works referenced in the Anthropic books settlement papers.	works: about 482.46 thousand works.	Third-party report; Settlement Group Size; Anthropic Books	Bartz v. Anthropic settlement order. Reviewed 2026-03-11. The October 2025 order says the parties identified 482,460 works for settlement administration. Source.
Public pretraining-compute benchmark for DeepSeek-V3.	millions of H800 GPU hours: about 2.66 million H800 GPU hours.	First-party report; Training Compute; Deepseek V3	DeepSeek-V3 repository. Reviewed 2026-03-11. DeepSeek reports 2.664 million H800 GPU hours for DeepSeek-V3 pretraining. Source.
Proposed average payout per identified work in the Anthropic books settlement.	dollars per work: about 3.00 thousand dollars per work.	Third-party report; Settlement Value; Anthropic Books	Bartz v. Anthropic preliminary-approval memorandum. Reviewed 2026-03-11. The October 17, 2025 filing describes an expected per-work award of about $3,000 across 482,460 identified works. Source. 1 scenario uses this.
Public training-compute benchmark for Llama 4 Maverick.	millions of H100-80GB GPU hours: about 2.38 million H100-80GB GPU hours.	First-party report; Training Compute; Llama4 Maverick	Meta Llama 4 model card. Reviewed 2026-03-11. Meta reports 2.38 million H100-80GB GPU hours for Llama 4 Maverick training. Source.
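
Some of these inputs combine naturally in per-work and per-stage math. The sketch below multiplies the approximate $3,000 per-work award by the 482,460 identified works, and subtracts DeepSeek-V3's pretraining compute from its full-run compute. Both results are implied by the rows above rather than stored as inputs, and the variable names are hypothetical; because the per-work award is itself approximate, the product is a rough total only.

```python
# Anthropic books settlement figures quoted in the table.
identified_works = 482_460
approx_award_per_work = 3_000  # dollars, described as "about $3,000"
implied_total_dollars = identified_works * approx_award_per_work

# DeepSeek-V3 compute, in millions of H800 GPU hours.
full_training_hours = 2.788
pretraining_hours = 2.664
non_pretraining_hours = round(full_training_hours - pretraining_hours, 3)

print(f"Implied settlement total: about {implied_total_dollars / 1e9:.2f} billion dollars")
print(f"Compute outside pretraining: about {non_pretraining_hours} million H800 GPU hours")
```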