Interactive calculator
Interactive napkin math, back-of-the-envelope estimates, and ballpark figures for training-data value, compensation, and distribution questions. Start with a concrete public benchmark, then inspect the assumptions and share the exact state you used.
The full workspace brings the calculator, inputs library, inspector, trust filters, and shareable URLs into one place.
Section 1
Start with plain numbers from public sources, grouped by the kind of question you want to ask. This home page surfaces a few anchor inputs; the full editable library lives in the calculator workspace.
I'm curious about money
Revenue anchors, deal values, labour rates, and inference pricing.
Reported annualized revenue run rate for OpenAI.
Reported yearly value of the Google-Reddit data licensing deal.
I'm curious about training sizes
Token counts, example counts, benchmark sizes, and other scale assumptions.
Total tokens used to pre-train a model
Approximate number of tokens used in the released OLMo 3 7B pretraining mix.
A rule-of-thumb conversion between English words and tokens.
The number of public benchmark questions in Humanity's Last Exam.
Number of supervised fine-tuning examples in the public Tulu 3 SFT mixture.
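The words-to-tokens conversion above can be sketched as a pair of helpers. The 0.75 words-per-token ratio below is an illustrative assumption for English text, not a value taken from the site's inputs library:

```python
# Rough conversion between English words and tokens, assuming the
# common rule of thumb of roughly 0.75 English words per token.
# The ratio is illustrative, not a site-confirmed value.
WORDS_PER_TOKEN = 0.75

def words_to_tokens(words: float) -> float:
    """Estimate token count from a word count."""
    return words / WORDS_PER_TOKEN

def tokens_to_words(tokens: float) -> float:
    """Estimate word count from a token count."""
    return tokens * WORDS_PER_TOKEN
```

Under this assumption, 750,000 words correspond to about one million tokens.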
I'm curious about people or audience size
Population, audience, workforce, and contributor counts used in per-person math.
Average daily active unique users on Reddit.
Projected world population.
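The per-person math these counts feed into is a single division. A minimal sketch with invented placeholder values (not the site's cited anchors):

```python
# Per-person napkin math: spread a money anchor across an audience
# count. Both figures below are invented placeholders, not the
# site's cited anchors.
deal_value_usd_per_year = 6.0e7   # hypothetical data-licensing deal value
daily_active_users = 1.0e8        # hypothetical audience size

usd_per_user_per_year = deal_value_usd_per_year / daily_active_users
print(f"~${usd_per_user_per_year:.2f} per user per year")
```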
I'm curious about data mix
Composition shares, source slices, document size, and related dataset structure.
Approximate share of Dolma v1.6 tokens that come from web crawls.
Need the full set? Open the library to search by scenario, filter to official or primary sources, and inspect every cited assumption in one place.
Section 2
Try the live scenarios right here, then open the full workspace when you want the entire inputs library, source filters, inspector details, and shareable state.
Best path for first-time readers: pick a starter scenario, tweak one or two assumptions, then inspect the source notes or switch to the inputs library for deeper work.
Change the assumptions directly on each card. Inline comparison menus let you swap in related public benchmarks without leaving the page, and every input card links to its cited source.
Build a custom formula from the shared input library when the curated scenarios do not quite match the question you want to ask.
Enter a formula to start exploring.
Keep this advanced builder tucked away until the curated scenarios stop being enough.
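A custom formula over the shared input library amounts to evaluating an expression with named inputs in scope. A minimal sketch, with input names and values invented for illustration:

```python
# Minimal sketch of a custom formula over a library of named inputs.
# The input names and values are invented for illustration.
inputs = {
    "pretraining_tokens": 6.0e12,   # hypothetical token count
    "usd_per_1k_tokens": 0.02,      # hypothetical per-1k-token rate
}

def evaluate(formula: str, library: dict) -> float:
    # Evaluate with only the named inputs in scope. eval() is fine
    # for a sketch, not for untrusted input.
    return eval(formula, {"__builtins__": {}}, dict(library))

cost = evaluate("pretraining_tokens / 1000 * usd_per_1k_tokens", inputs)
```

With these placeholder values the formula prices the hypothetical token count at 120 million dollars.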
Section 3
Keep the calculator close to the evidence: working notes, highlighted papers, live indicators, and concrete asks all sit in one supporting section instead of competing with the calculator.
This working note traces one unresolved tension in the calculator: replacement-cost estimates for commissioned LLM-scale data climb quickly, while current public frontier-AI revenue anchors remain much smaller.
At the current defaults, the commissioning estimate is roughly 59.6x the combined OpenAI-plus-Anthropic revenue anchor. That mismatch is the core tension this placeholder post should unpack.
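The shape of that comparison is one division: replacement cost over the sum of the revenue anchors. A sketch with invented placeholder values (the real defaults live in the calculator and do not match these numbers):

```python
# Recomputing the headline ratio: replacement-cost estimate divided
# by the combined revenue anchors. All three values below are
# placeholders, not the calculator's actual defaults.
commissioning_cost_usd = 1.0e12   # hypothetical replacement-cost estimate
openai_run_rate_usd = 1.2e10      # hypothetical annualized revenue
anthropic_run_rate_usd = 5.0e9    # hypothetical annualized revenue

ratio = commissioning_cost_usd / (openai_run_rate_usd + anthropic_run_rate_usd)
print(f"commissioning estimate is {ratio:.1f}x the combined revenue anchor")
```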
Reading list
These are the sources doing real work for the numbers on this site.
Ruoxi Jia et al., NeurIPS 2025 position paper
Position paper plus public tracker on AI data deals, useful for grounding bargaining-power and transparency questions in a concrete set of disclosed agreements.
Nikhil Kandpal and Colin Raffel, ICML 2025 position paper
Strong anchor for the claim that scarce, high-quality data should be treated as the costly input in the AI stack rather than an effectively free byproduct.
Peter Henderson et al., 2023
Early and still-useful argument that fair use is not automatic for model training, especially when developers can foresee substitution, market harm, or memorization.
Martin Senftleben, 2024
Clear legal-policy proposal for paying authors without making ex ante licensing the only path for model training.
Alexander Peukert et al., 2025
Important economics paper arguing that training-data rules feed back into future creator incentives and therefore into the long-run supply of high-quality data.
Shayne Longpre et al., 2024
Best empirical anchor I found for the claim that the open web is becoming less permissive for AI training, with large-scale evidence from Common Crawl and robots restrictions.
Watch list
The calculator stays useful only if disclosures and public reporting around these numbers keep improving.