Inputs Library
Browse and edit the inputs behind the scenarios.
Browse the shared inputs as a spreadsheet or grouped catalog. Related measurements stay together instead of reading like duplicate rows.
Money, payouts, and prices
Revenue anchors, deal values, labour rates, and inference pricing.
| Input | Current value | Metadata | Source and usage |
|---|---|---|---|
| Main example: Reported annualized revenue run rate for OpenAI. | Unit: billions of dollars. About 25.00 billion dollars. | News, Yearly Revenue, Openai | Reuters citing The Information. Reviewed 2026-05-28. Reuters reported that The Information said OpenAI topped $25B in annualized revenue at the end of February 2026; Reuters noted that it could not verify the report. 1 scenario uses this. |
| Main example: Public API price benchmark for GPT-4.1 mini input tokens. | Unit: dollars per 1M input tokens. About 0.40 dollars per 1M input tokens. | First-party report, Inference Price, Gpt 4 1 Mini | OpenAI model page. Reviewed 2026-05-28. GPT-4.1 mini input pricing is listed at $0.40 per 1M tokens. |
| Main example: Public API price benchmark for GPT-4o input tokens. | Unit: dollars per 1M input tokens. About 2.50 dollars per 1M input tokens. | First-party report, Inference Price, Gpt 4o | OpenAI model page. Reviewed 2026-05-28. GPT-4o input pricing is listed at $2.50 per 1M tokens. |
| Reported annualized revenue run rate for Microsoft's AI business. | Unit: billions of dollars. About 37.00 billion dollars. | First-party report, Yearly Revenue, Microsoft | Microsoft earnings release. Reviewed 2026-05-28. Microsoft said in April 2026 that its AI business had surpassed a $37B annual revenue run rate. |
| Reported annualized revenue run rate for Anthropic. | Unit: billions of dollars. About 30.00 billion dollars. | First-party report, Yearly Revenue, Anthropic | Anthropic. Reviewed 2026-05-28. Anthropic said in April 2026 that its run-rate revenue had surpassed $30B, up from about $9B at the end of 2025. |
| Annualized revenue run rate for an AI-cloud infrastructure company. | Unit: billions of dollars. About 8.31 billion dollars. | First-party report, Yearly Revenue, Coreweave | CoreWeave earnings release. Reviewed 2026-05-28. CoreWeave reported $2.078B of revenue for Q1 2026; this input annualizes that quarter to $8.312B. |
| Public API price benchmark for GPT-4.1 mini output tokens. | Unit: dollars per 1M output tokens. About 1.60 dollars per 1M output tokens. | First-party report, Inference Price, Gpt 4 1 Mini | OpenAI model page. Reviewed 2026-05-28. GPT-4.1 mini output pricing is listed at $1.60 per 1M tokens. |
| Public API price benchmark for GPT-4o output tokens. | Unit: dollars per 1M output tokens. About 10.00 dollars per 1M output tokens. | First-party report, Inference Price, Gpt 4o | OpenAI model page. Reviewed 2026-05-28. GPT-4o output pricing is listed at $10.00 per 1M tokens. |
| Estimated total value of the OpenAI-News Corp content licensing agreement. | Unit: millions of dollars. About 250.00 million dollars. | News, Deal Value, Newscorp | Reuters citing WSJ reporting. Reviewed 2026-03-10. Reported as worth more than $250M over five years; stored here as a round-number benchmark. |
| A conservative benchmark for paying expert contributors to produce evaluation questions. | Unit: dollars per question. About 200.00 dollars per question. | First-party report, Wage Data, Phd | Scale AI Humanity's Last Exam leaderboard. Reviewed 2026-05-02. A $500k prize pool spread across 2,500 final public questions implies about $200 per retained question, excluding organizer and reviewer labor. 2 scenarios use this. |
| A premium expert-question benchmark based on Humanity's Last Exam prize tiers. | Unit: dollars per question. About 500.00 dollars per question. | First-party report, Wage Data, Hle Runner Up Prize | Scale AI Humanity's Last Exam leaderboard. Reviewed 2026-05-02. Scale reports that HLE contributors competed for a $500,000 prize pool, with $500 awards for the next 500 questions after the top 50. |
| National mean hourly wage benchmark for U.S. family medicine physicians. | Unit: dollars per hour. About 122.99 dollars per hour. | First-party report, Wage Data, Family Medicine Physician | U.S. Bureau of Labor Statistics OEWS. Reviewed 2026-05-28. BLS reports a mean hourly wage of $122.99 for family medicine physicians in May 2025. |
| National mean hourly wage benchmark for U.S. general internal medicine physicians. | Unit: dollars per hour. About 128.46 dollars per hour. | First-party report, Wage Data, Physician | U.S. Bureau of Labor Statistics OEWS. Reviewed 2026-05-28. BLS reports a mean hourly wage of $128.46 for general internal medicine physicians in May 2025. 2 scenarios use this. |
| The disclosed floor for the Microsoft-Taylor & Francis AI licensing agreement. | Unit: millions of dollars. About 10.00 million dollars. | First-party report, Deal Value, Taylorandfrancis Microsoft | Informa market update. Reviewed 2026-03-10. Informa disclosed a $10M+ initial fee plus recurring payments; the stored value is a conservative floor, not the full contract total. |
| A higher-end professional benchmark for commissioned writing labor. | Unit: dollars per word. About 0.090 dollars per word. | Third-party report, Wage Data, Generic Freelance Higher | Editorial Freelancers Association 2024 rate chart. Reviewed 2026-05-02. Uses the high end of EFA's book-proposal per-word rate range as a conservative paid-writing proxy. 1 scenario uses this. |
| A lower-bound professional benchmark for paid per-word labor. | Unit: dollars per word. About 0.020 dollars per word. | Third-party report, Wage Data, Generic Freelance Lower | Editorial Freelancers Association 2024 rate chart. Reviewed 2026-05-02. Uses a low-end professional editorial benchmark as a rough floor for paid per-word labor. |
| Reported yearly value of the Google-Reddit data licensing deal. | Unit: millions of dollars. About 60.00 million dollars. | News, Deal Value, Reddit Google | Reuters. Reviewed 2026-03-10. Reuters reported the Google-Reddit licensing contract was worth about $60M per year. 2 scenarios use this. |
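
Two values in this table are derived rather than quoted: the CoreWeave run rate annualizes one quarter, and the $200-per-question benchmark divides a prize pool by a question count. The sketch below reproduces that arithmetic and shows how the per-1M-token prices might be applied to a workload. The function names and the 1M-in, 1M-out workload are illustrative assumptions, not part of the library.

```python
# Illustrative arithmetic for derived inputs in the table above.
# All function and variable names are hypothetical.

def annualize_quarter(quarterly_revenue_billions: float) -> float:
    """Annualize a single quarter by multiplying it by four."""
    return quarterly_revenue_billions * 4


def payout_per_question(prize_pool_dollars: float, retained_questions: int) -> float:
    """Spread a prize pool evenly across the retained questions."""
    return prize_pool_dollars / retained_questions


def workload_cost_dollars(input_tokens: int, output_tokens: int,
                          input_price_per_million: float,
                          output_price_per_million: float) -> float:
    """Price a workload from per-1M-token input and output rates."""
    return (input_tokens / 1_000_000 * input_price_per_million
            + output_tokens / 1_000_000 * output_price_per_million)


# CoreWeave Q1 2026 revenue, in billions of dollars.
coreweave_run_rate_billions = annualize_quarter(2.078)
# $500k HLE prize pool over 2,500 final public questions.
hle_dollars_per_question = payout_per_question(500_000, 2_500)
# Assumed workload: 1M input tokens and 1M output tokens at GPT-4o list prices.
gpt_4o_cost = workload_cost_dollars(1_000_000, 1_000_000, 2.50, 10.00)

print(round(coreweave_run_rate_billions, 3))  # 8.312
print(hle_dollars_per_question)               # 200.0
print(gpt_4o_cost)                            # 12.5
```

The per-question figure excludes organizer and reviewer labor, as the row notes, so it is best read as a floor on contributor payout rather than a full cost.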
Training sizes and benchmark totals
Token counts, example counts, benchmark sizes, and other scale assumptions.
| Input | Current value | Metadata | Source and usage |
|---|---|---|---|
| Main example: A rule-of-thumb conversion between English words and tokens. | Unit: words per token. About 0.75 words per token. | First-party report, Training Detail, Openai | OpenAI Help Center. Reviewed 2026-03-10. OpenAI's English rule of thumb is that one token is about three-quarters of a word. 1 scenario uses this. |
| Main example: Total tokens Meta reports using to pretrain Llama 3. | Unit: billions of tokens. About 15.00 trillion tokens. | First-party report, Dataset Size, Llama3 | Meta Llama 3 model card. Reviewed 2026-03-10. Meta reports that Llama 3 was pretrained on about 15 trillion multilingual tokens. 2 scenarios use this. |
| Main example: Total tokens Meta reports using to pretrain Llama 4 Scout. | Unit: billions of tokens. About 40.00 trillion tokens. | First-party report, Dataset Size, Llama4 Scout | Meta Llama 4 model card. Reviewed 2026-03-11. Meta reports that Llama 4 Scout was trained on 40 trillion tokens. |
| Main example: Approximate number of tokens used in the released OLMo 3 7B pretraining mix. | Unit: billions of tokens. About 5.93 trillion tokens. | First-party report, Dataset Size, Olmo3 | AllenAI Dolma 3 mix card. Reviewed 2026-03-10. AllenAI reports a 5.93T-token mix for the released OLMo 3 7B recipe. |
| Main example: Approximate code-token count for The Stack v2. | Unit: billions of tokens. About 900.00 billion tokens. | First-party report, Dataset Size, The Stack V2 | The Stack v2 dataset card. Reviewed 2026-03-11. The dataset card describes The Stack v2 as a roughly 900B-token code corpus. |
| Main example: Total tokens DeepSeek reports for DeepSeek-V3 training. | Unit: billions of tokens. About 14.80 trillion tokens. | First-party report, Dataset Size, Deepseek V3 | DeepSeek-V3 repository. Reviewed 2026-03-11. DeepSeek reports that DeepSeek-V3 was pretrained on 14.8 trillion high-quality and diverse tokens. |
| Planning assumption for yearly physician time spent auditing, reviewing, and supervising medical-AI systems. | Unit: hours per physician per year. About 40.00 hours. | Other, Training Detail, Medical Ai Oversight | Scenario planning benchmark. Reviewed 2026-04-06. Uses one workweek of annual oversight per participating physician as the default assumption. 1 scenario uses this. |
| Internal Meta comparison point for the size of a book-heavy corpus. | Unit: billions of tokens. About 30.00 billion tokens. | Third-party report, Dataset Size, Books3 Plus Gutenberg | Kadrey v. Meta Exhibit C. Reviewed 2026-03-11. An unsealed February 28, 2023 Meta email says Books3 plus Gutenberg is about 30B tokens. |
| Approximate size of the Common Pile open corpus. | Unit: terabytes. About 8.00 terabytes. | Third-party report, Dataset Size, Common Pile | Common Pile paper. Reviewed 2026-03-11. The paper describes Common Pile as an 8TB openly licensed corpus. |
| Approximate storage size for FineWeb2. | Unit: terabytes. About 20.00 terabytes. | First-party report, Dataset Size, Fineweb2 | FineWeb2 dataset card. Reviewed 2026-03-11. The dataset card describes FineWeb2 as a 20TB corpus assembled from 96 Common Crawl snapshots. |
| Approximate total storage size for The Stack v2. | Unit: terabytes. About 67.53 terabytes. | First-party report, Dataset Size, The Stack V2 | The Stack v2 dataset card. Reviewed 2026-03-11. The dataset card describes The Stack v2 as 67.53TB in total before deduplication. |
| Planning assumption for the share of physicians who would participate in an ongoing medical-AI oversight regime. | Unit: percent of physicians. About 1.00 percent. | Other, Training Detail, Medical Ai Oversight | Scenario planning benchmark. Reviewed 2026-04-06. Starts at 1 percent to reflect a deliberately macro-scale oversight regime rather than a small benchmark panel. 1 scenario uses this. |
| Project-planning benchmark for physician time spent creating and reviewing a domain-specific eval set. | Unit: hours per physician. About 8.00 hours. | Other, Training Detail, Medical Eval | Project planning benchmark. Reviewed 2026-04-03. Uses one workday per physician reviewer as a default assumption for drafting examples, refining rubrics, and calibration. 1 scenario uses this. |
| Number of DPO preference pairs in the released Tulu 3 70B preference mixture. | Unit: generation pairs. About 337.19 thousand generation pairs. | First-party report, Post Training Size, Tulu 3 70b | Tulu 3 70B preference mixture card. Reviewed 2026-03-10. The released Tulu 3 70B preference mixture contains 337,186 generation pairs. |
| Number of supervised fine-tuning examples in the public Tulu 3 SFT mixture. | Unit: examples. About 939.34 thousand examples. | First-party report, Post Training Size, Tulu 3 | Tulu 3 SFT mixture card. Reviewed 2026-03-10. The released Tulu 3 SFT mixture contains 939,344 examples. |
| Approximate raw-token count across the published Comma v0.1 training stages. | Unit: billions of raw tokens. About 639.80 billion tokens. | First-party report, Dataset Size, Comma V0 1 | Comma v0.1 training dataset card. Reviewed 2026-03-11. The dataset card lists 463.6B raw tokens in the main stage and 176.2B raw tokens in the cooldown stage; this input sums both stages. |
| Plaintiffs' estimate of the Z-Library and LibGen portion of Meta's alleged shadow-library downloads. | Unit: terabytes. About 35.70 terabytes. | Third-party report, Dataset Size, Zlibrary Libgen | Kadrey v. Meta unsealed filing appendix. Reviewed 2026-03-11. An unsealed appendix filed in February 2025 says 35.7TB of the downloads came from Z-Library and LibGen. |
| A small-network planning assumption for ongoing medical-AI oversight. | Unit: percent of physicians. About 0.010 percent. | Other, Training Detail, Medical Ai Oversight Specialty Network | Scenario planning benchmark. Reviewed 2026-05-02. Uses 0.01 percent of physicians as a deliberately small specialty-network comparison against the larger default oversight regime. |
| Internal planning target for the size of a hypothetical Stackipedia corpus. | Unit: billions of tokens. About 1.00 billion tokens. | Other, Target Metric, Stackipedia | Project default. Reviewed 2026-03-10. Internal target for a hypothetical public-knowledge corpus, not an observed empirical statistic. |
| Lower-bound count of books Anthropic was found to have copied from LibGen. | Unit: millions of books. About 5.00 million books. | Third-party report, Total Books, Libgen | Bartz v. Anthropic summary-judgment order. Reviewed 2026-03-11. Judge Alsup wrote that Anthropic downloaded at least five million books from LibGen. |
| Lower-bound count of books Anthropic was found to have copied from Pirate Library Mirror. | Unit: millions of books. About 2.00 million books. | Third-party report, Total Books, Pirate Library Mirror | Bartz v. Anthropic summary-judgment order. Reviewed 2026-03-11. Judge Alsup wrote that Anthropic downloaded at least two million books from Pirate Library Mirror. |
| Lower-bound count of pirated books the court says Anthropic copied into its central library. | Unit: millions of books. About 7.00 million books. | Third-party report, Total Books, Anthropic Pirated Library | Bartz v. Anthropic summary-judgment order. Reviewed 2026-03-11. Judge Alsup wrote that Anthropic assembled a central library of more than seven million pirated books. 1 scenario uses this. |
| Approximate document count for FineWeb2. | Unit: billions of documents. About 5.00 billion documents. | First-party report, Dataset Size, Fineweb2 | FineWeb2 dataset card. Reviewed 2026-03-11. The dataset card describes FineWeb2 as containing more than 5 billion documents. |
| Number of books in Books3. | Unit: books. About 196.64 thousand books. | Third-party report, Total Books, Books3 | Bartz v. Anthropic summary-judgment order. Reviewed 2026-03-11. The June 2025 order identifies the Books3 dataset as containing 196,640 books. |
| Total tokens Meta reports using to pretrain Llama 4 Maverick. | Unit: billions of tokens. About 22.00 trillion tokens. | First-party report, Dataset Size, Llama4 Maverick | Meta Llama 4 model card. Reviewed 2026-03-11. Meta reports that Llama 4 Maverick was trained on 22 trillion tokens. |
| Total tokens AllenAI reports using to train OLMo 2. | Unit: billions of tokens. About 5.00 trillion tokens. | First-party report, Dataset Size, Olmo2 | AllenAI OLMo 2 model card. Reviewed 2026-03-10. AllenAI reports 5 trillion training tokens for OLMo 2. |
| Total tokens Alibaba reports using to pretrain Qwen3-Coder. | Unit: billions of tokens. About 7.50 trillion tokens. | First-party report, Dataset Size, Qwen3 Coder | Qwen3-Coder official blog. Reviewed 2026-03-11. Alibaba says Qwen3-Coder was pretrained on 7.5 trillion tokens. |
| Total tokens Alibaba reports using to pretrain Qwen3. | Unit: billions of tokens. About 36.00 trillion tokens. | First-party report, Dataset Size, Qwen3 | Qwen3 official blog. Reviewed 2026-03-11. Alibaba says Qwen3 models were pretrained on about 36 trillion tokens across 119 languages and dialects. |
| The number of public benchmark questions in Humanity's Last Exam. | Unit: questions. About 2.50 thousand questions. | First-party report, Dataset Size, Hle | Scale AI Humanity's Last Exam leaderboard. Reviewed 2026-05-02. Scale's April 3, 2025 update says HLE was finalized to 2,500 questions. 1 scenario uses this. |
| The main GPQA benchmark size, useful as a smaller expert-question benchmark. | Unit: questions. About 448.00 questions. | Third-party report, Dataset Size, Gpqa Main | GPQA paper and dataset card. Reviewed 2026-05-02. The GPQA paper and dataset card describe the main benchmark as 448 expert-written multiple-choice questions in biology, physics, and chemistry. |
| Plaintiffs' estimate of total shadow-library data discussed in an unsealed Meta filing appendix. | Unit: terabytes. About 81.70 terabytes. | Third-party report, Dataset Size, Meta Shadow Library | Kadrey v. Meta unsealed filing appendix. Reviewed 2026-03-11. An unsealed appendix filed in February 2025 says Meta downloaded 81.7TB from shadow libraries. |
| Approximate file count for The Stack v2. | Unit: billions of files. About 3.28 billion files. | First-party report, Dataset Size, The Stack V2 | The Stack v2 dataset card. Reviewed 2026-03-11. The dataset card describes The Stack v2 as containing 3.28 billion unique files. |
| Approximate word count for FineWeb2. | Unit: trillions of words. About 3.00 trillion words. | First-party report, Dataset Size, Fineweb2 | FineWeb2 dataset card. Reviewed 2026-03-11. The dataset card describes FineWeb2 as containing more than 3 trillion words. |
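
Three rows in this table rest on simple arithmetic: the words-per-token rule of thumb, the Comma v0.1 total (a sum of two training stages), and the seven-million-book floor, which matches the sum of the LibGen and Pirate Library Mirror floors. The sketch below works through each, assuming hypothetical names and a made-up 750-word document for the conversion example.

```python
# Illustrative arithmetic for the table above; names are hypothetical.

WORDS_PER_TOKEN = 0.75  # OpenAI's English rule of thumb


def words_to_tokens(words: float) -> float:
    """Convert an English word count to an approximate token count."""
    return words / WORDS_PER_TOKEN


# Comma v0.1: main stage plus cooldown stage, in billions of raw tokens.
comma_raw_tokens_billions = 463.6 + 176.2

# Court-stated lower bounds, in millions of books.
anthropic_books_floor_millions = 5 + 2  # LibGen + Pirate Library Mirror

# Assumed example: a 750-word English document.
example_tokens = words_to_tokens(750)

print(round(comma_raw_tokens_billions, 1))  # 639.8
print(anthropic_books_floor_millions)       # 7
print(example_tokens)                       # 1000.0
```

The conversion is stated for English only, so applying it to multilingual corpora such as FineWeb2 would be a rough approximation at best.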
People, groups, and distribution targets
Population, audience, workforce, and contributor counts used in per-person math.
| Input | Current value | Metadata | Source and usage |
|---|---|---|---|
| Rough benchmark for Wall Street Journal newsroom size. | Unit: people. About 2.00 thousand journalists. | Other, Deal Group Size, Wsj | Historic public reporting. Reviewed 2026-03-10. This is a rough historical benchmark for WSJ/Dow Jones newsroom size, not a freshly disclosed audited count. |
| Estimated global physician headcount, derived from WHO's worldwide doctor-density benchmark. | Unit: million physicians. About 13.92 million people. | First-party report, Group Size, World Physicians | WHO Global Health Observatory. Reviewed 2026-04-06. WHO reports a global density of 17.2 doctors per 10,000 people in 2022. This input turns that density into a rough headcount by multiplying it by the repo's January 1, 2025 world-population benchmark. 1 scenario uses this. |
| Approximate total employee count for News Corp. | Unit: people. About 23.90 thousand employees. | First-party report, Deal Group Size, Newscorp | News Corp annual report. Reviewed 2026-03-10. News Corp reported about 23,900 employees as of June 30, 2024. |
| Projected U.S. population. | Unit: millions of people. About 341.15 million people. | First-party report, Group Size, Usa | U.S. Census Bureau. Reviewed 2026-03-10. Uses the Census Bureau projection for the U.S. population on January 1, 2025. |
| Projected world population. | Unit: billions of people. About 8.09 billion people. | First-party report, Group Size, World | U.S. Census Bureau. Reviewed 2026-03-10. Uses the Census Bureau projection for world population on January 1, 2025. 1 scenario uses this. |
| Number of source collections included in Common Pile. | Unit: source collections. About 30.00 sources. | Third-party report, Group Size, Common Pile | Common Pile paper. Reviewed 2026-03-11. The paper describes Common Pile as drawing from 30 sources. |
| Number of Taylor & Francis articles. | Unit: millions of articles. About 5.29 million articles. | First-party report, Deal Group Size, Taylorandfrancis | Taylor & Francis Online. Reviewed 2026-03-10. Uses the article count publicly shown on the Taylor & Francis platform homepage. |
| Number of physicians who partnered with OpenAI to build HealthBench. | Unit: physicians. About 262.00 people. | First-party report, Group Size, Healthbench | OpenAI HealthBench post. Reviewed 2026-04-03. HealthBench gives the exact count of 262 physicians, and OpenAI's January 8, 2026 healthcare announcement separately describes a global network of more than 260 licensed physicians across 60 countries. 1 scenario uses this. |
| Average daily active unique users on Reddit. | Unit: millions of daily active users. About 121.40 million daily active users. | First-party report, Deal Group Size, Reddit | Reddit investor relations. Reviewed 2026-03-10. Uses Reddit's Q4 2025 daily active uniques as the latest official public scale benchmark. 1 scenario uses this. |
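
The physician headcount is the one derived value in this table: a density per 10,000 people multiplied by a population. The sketch below reproduces it from the rounded figures shown here and adds one example of the per-person math these counts feed, dividing the reported Google-Reddit deal value by Reddit's daily active users. Names are hypothetical, and the per-user figure is an illustration, not a value from the library.

```python
# Illustrative per-person arithmetic; names are hypothetical.

def headcount_from_density(per_10k: float, population: float) -> float:
    """Turn a per-10,000-people density into an absolute headcount."""
    return per_10k / 10_000 * population


def value_per_person(total_dollars: float, people: float) -> float:
    """Spread a dollar amount evenly across a group."""
    return total_dollars / people


# WHO density (17.2 doctors per 10,000) times the rounded 8.09 billion population.
world_physicians = headcount_from_density(17.2, 8.09e9)

# About $60M per year spread over 121.4M daily active users.
reddit_dollars_per_user = value_per_person(60e6, 121.4e6)

print(round(world_physicians / 1e6, 2))   # 13.91
print(round(reddit_dollars_per_user, 2))  # 0.49
```

With the rounded 8.09 billion population this gives about 13.91 million, slightly under the 13.92 million shown in the table; the gap is presumably because the stored population benchmark is not rounded to two decimals.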
Data mix and dataset structure
Composition shares, source slices, document size, and related dataset structure.
| Input | Current value | Metadata | Source and usage |
|---|---|---|---|
| Internal planning assumption for how many contributions it takes to assemble one Stackipedia document. | Unit: contributions per document. About 2.00 contributions per document. | Other, Conversion Rate, Stackipedia | Project default. Reviewed 2026-03-10. Hypothetical productivity assumption for Stackipedia. |
| Average number of tokens in a single contribution. | Unit: tokens per contribution. About 1.41 thousand tokens per contribution. | Third-party report, Dataset Attribute, Redpajama | RedPajama-Data repository. Reviewed 2026-03-10. Derived from the English deduplicated counts reported by the project: 20.5T tokens over 14.5B documents. |
| Average number of words per book. | Unit: words. About 80.00 thousand words. | Other, Average Length, Book | Penguin Books explainer. Reviewed 2026-03-10. This is an industry rule of thumb for a typical full-length book, not a census of Books3 itself. |
| Internal planning assumption for the typical size of a Stackipedia document. | Unit: tokens per document. About 1.50 thousand tokens per document. | Other, Dataset Attribute, Stackipedia | Project default. Reviewed 2026-03-10. Hypothetical default used for Stackipedia scenarios; no external public source is claimed. |
| Internal planning assumption for how much demand is needed to generate one Stackipedia contribution. | Unit: visitors per contribution. About 50.00 visitors per contribution. | Other, Conversion Rate, Stackipedia | Project default. Reviewed 2026-03-10. Hypothetical engagement assumption for Stackipedia. |
| Number of Aya examples in the public Tulu 3 SFT mixture. | Unit: examples. About 100.00 thousand examples. | First-party report, Post Training Source, Tulu 3 | Tulu 3 SFT mixture card. Reviewed 2026-03-10. Aya contributes 100,000 multilingual instruction examples in the released mixture. |
| Number of FLAN v2 examples in the public Tulu 3 SFT mixture. | Unit: examples. About 89.98 thousand examples. | First-party report, Post Training Source, Tulu 3 | Tulu 3 SFT mixture card. Reviewed 2026-03-10. FLAN v2 contributes 89,982 examples in the released mixture. |
| Approximate share of Dolma v1.6 tokens that come from scientific papers. | Unit: percent of tokens. About 2.30 percent. | First-party report, Pretraining Composition, Dolma V1 6 | AllenAI Dolma dataset card. Reviewed 2026-03-10. Derived from the published PeS2o paper count in Dolma v1.6. |
| Approximate share of Dolma v1.6 tokens that come from books. | Unit: percent of tokens. About 0.20 percent. | First-party report, Pretraining Composition, Dolma V1 6 | AllenAI Dolma dataset card. Reviewed 2026-03-10. Derived from the Project Gutenberg token count in Dolma v1.6. 1 scenario uses this. |
| Approximate share of Dolma v1.6 tokens that come from code. | Unit: percent of tokens. About 13.40 percent. | First-party report, Pretraining Composition, Dolma V1 6 | AllenAI Dolma dataset card. Reviewed 2026-03-10. Derived from the published The Stack token count in Dolma v1.6. |
| Approximate share of Qwen3-Coder pretraining tokens that came from code. | Unit: percent of tokens. About 70.00 percent. | First-party report, Pretraining Composition, Qwen3 Coder | Qwen3-Coder official blog. Reviewed 2026-03-11. Alibaba says 70 percent of Qwen3-Coder pretraining data was code. |
| Approximate share of Dolma v1.6 tokens that come from Reddit. | Unit: percent of tokens. About 2.90 percent. | First-party report, Pretraining Composition, Dolma V1 6 | AllenAI Dolma dataset card. Reviewed 2026-03-10. Derived from the published Reddit token count in Dolma v1.6. |
| Approximate share of Dolma v1.6 tokens that come from web crawls. | Unit: percent of tokens. About 81.00 percent. | First-party report, Pretraining Composition, Dolma V1 6 | AllenAI Dolma dataset card. Reviewed 2026-03-10. Derived from the Common Crawl and C4 token counts in Dolma v1.6. |
| Estimated share of C4 URLs carrying AI-restrictive terms or robots exclusions. | Unit: percent of URLs. About 45.00 percent. | Third-party report, Dataset Attribute, C4 | Consent in Crisis. Reviewed 2026-03-11. Longpre et al. estimate that about 45% of C4 URLs are restricted for AI use by terms or robots exclusions. |
| Number of synthetic persona-style examples in the public Tulu 3 SFT mixture. | Unit: examples. About 284.92 thousand examples. | First-party report, Post Training Source, Tulu 3 | Tulu 3 SFT mixture card. Reviewed 2026-03-10. This combines the five Persona subsets in the released Tulu 3 SFT mixture. |
| Number of WildChat GPT-4 examples in the public Tulu 3 SFT mixture. | Unit: examples. About 100.00 thousand examples. | First-party report, Post Training Source, Tulu 3 | Tulu 3 SFT mixture card. Reviewed 2026-03-10. WildChat GPT-4 contributes 100,000 examples in the released mixture. |
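
The tokens-per-contribution value is derived by dividing RedPajama's reported token count by its document count, and the Stackipedia rows are conversion rates meant to be chained. The sketch below reproduces the division and then chains the Stackipedia defaults against the 1-billion-token planning target. How the library actually combines these inputs is not shown in this catalog, so the chain and every name in it are assumptions.

```python
# Illustrative chain of the Stackipedia planning assumptions; names are hypothetical.

# RedPajama English deduplicated counts: 20.5T tokens over 14.5B documents.
redpajama_tokens_per_document = 20.5e12 / 14.5e9

# Stackipedia project defaults from the tables.
TARGET_TOKENS = 1e9                 # internal corpus-size target
TOKENS_PER_DOCUMENT = 1_500
CONTRIBUTIONS_PER_DOCUMENT = 2
VISITORS_PER_CONTRIBUTION = 50

# Assumed chain: target tokens -> documents -> contributions -> visitors.
documents_needed = TARGET_TOKENS / TOKENS_PER_DOCUMENT
contributions_needed = documents_needed * CONTRIBUTIONS_PER_DOCUMENT
visitors_needed = contributions_needed * VISITORS_PER_CONTRIBUTION

print(round(redpajama_tokens_per_document))  # 1414
print(round(documents_needed))               # 666667
print(round(contributions_needed))           # 1333333
print(round(visitors_needed / 1e6, 1))       # 66.7
```

Because every Stackipedia input is a hypothetical default, the roughly 66.7 million visitors is only as meaningful as those defaults.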
Other inputs
Additional anchors that do not fit one of the main question buckets.
| Input | Current value | Metadata | Source and usage |
|---|---|---|---|
| Main example: Public training-compute benchmark for Llama 4 Scout. | Unit: millions of H100-80GB GPU hours. About 5.00 million H100-80GB GPU hours. | First-party report, Training Compute, Llama4 Scout | Meta Llama 4 model card. Reviewed 2026-03-11. Meta reports 5.0 million H100-80GB GPU hours for Llama 4 Scout training. |
| The ordinary minimum statutory damages amount for copyright infringement under 17 U.S.C. 504(c). | Unit: dollars per work. About 750.00 dollars per work. | First-party report, Settlement Value, Copyright Statutory Minimum | 17 U.S.C. 504(c). Reviewed 2026-05-02. The statute sets ordinary statutory damages at not less than $750 and not more than $30,000 per work, before willfulness or innocent-infringer adjustments. |
| Public full-training compute benchmark for DeepSeek-V3. | Unit: millions of H800 GPU hours. About 2.79 million H800 GPU hours. | First-party report, Training Compute, Deepseek V3 | DeepSeek-V3 repository. Reviewed 2026-03-11. DeepSeek reports 2.788 million H800 GPU hours for the full DeepSeek-V3 training run. |
| Number of identified works referenced in the Anthropic books settlement papers. | Unit: works. About 482.46 thousand works. | Third-party report, Settlement Group Size, Anthropic Books | Bartz v. Anthropic settlement order. Reviewed 2026-03-11. The October 2025 order says the parties identified 482,460 works for settlement administration. |
| Public pretraining-compute benchmark for DeepSeek-V3. | Unit: millions of H800 GPU hours. About 2.66 million H800 GPU hours. | First-party report, Training Compute, Deepseek V3 | DeepSeek-V3 repository. Reviewed 2026-03-11. DeepSeek reports 2.664 million H800 GPU hours for DeepSeek-V3 pretraining. |
| Proposed average payout per identified work in the Anthropic books settlement. | Unit: dollars per work. About 3.00 thousand dollars per work. | Third-party report, Settlement Value, Anthropic Books | Bartz v. Anthropic preliminary-approval memorandum. Reviewed 2026-03-11. The October 17, 2025 filing describes an expected per-work award of about $3,000 across 482,460 identified works. 1 scenario uses this. |
| Public training-compute benchmark for Llama 4 Maverick. | Unit: millions of H100-80GB GPU hours. About 2.38 million H100-80GB GPU hours. | First-party report, Training Compute, Llama4 Maverick | Meta Llama 4 model card. Reviewed 2026-03-11. Meta reports 2.38 million H100-80GB GPU hours for Llama 4 Maverick training. |
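
Several of these anchors are meant to be multiplied or subtracted against each other. As a rough sketch, the code below multiplies the per-work settlement figure by the identified-work count, applies the statutory minimum to the same count for comparison, and takes the gap between DeepSeek-V3's full-run and pretraining compute. The variable names are hypothetical and the comparisons are illustrative, not scenarios defined by the library.

```python
# Illustrative combinations of the anchors above; names are hypothetical.

IDENTIFIED_WORKS = 482_460            # Anthropic books settlement
EXPECTED_AWARD_PER_WORK = 3_000       # dollars, approximate per the filing
STATUTORY_MINIMUM_PER_WORK = 750      # dollars, 17 U.S.C. 504(c) ordinary floor

# Implied total if every identified work received the expected award.
implied_settlement_total = IDENTIFIED_WORKS * EXPECTED_AWARD_PER_WORK

# The same work count priced at the ordinary statutory minimum.
statutory_floor_total = IDENTIFIED_WORKS * STATUTORY_MINIMUM_PER_WORK

# DeepSeek-V3: full training run minus pretraining, in millions of H800 GPU hours.
deepseek_non_pretraining_hours = round(2.788 - 2.664, 3)

print(implied_settlement_total)        # 1447380000
print(statutory_floor_total)           # 361845000
print(deepseek_non_pretraining_hours)  # 0.124
```

The implied total is a back-of-envelope product of two rounded inputs, so it should not be read as the settlement amount itself.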