"Awaara" numbers are all over the internet. Here's how to stay away from them.
The reckless sprinkling of data everywhere—news articles, corporate decks and social media—now threatens the very integrity that numbers stand for. This post explains how to guard against it.
Have you ever heard of awaara numbers? Come, let me introduce you to them.

If your boss obsessively asks for data in everything, they’re not alone. Most editors now want articles and books to be sprinkled with numbers, consultants feel smart by doing the same in slide decks, and researchers use data to show they have done their homework.
Sadly, the reckless sprinkling of data is now a big threat to the very integrity that numbers (seem to) stand for.
AI tools, ironically, are just terrible at interpreting such nuggets of data, and our armchair economy of consultants, commentators and policy junkies is staring at a real crisis.
Let’s start with an example. My partner recently saw a claim that India has 600 million people aged 18-35. The place that said so didn’t name a source, so we set off on a journey to find it. We found three reputed consulting agencies and a Union minister using the number in the past three years—none with a source. An op-ed on a top news website mentioned the number in 2023, again without a source. Since we couldn’t find any source predating this article, it’s likely to be the origin of the 600 million figure.
The only two reliable sources for this, the United Nations and India’s official Census projections, put the figure between 450 million and 500 million. So 600 million neither cuts any ice nor can be traced to a clear source.
I once fell for one such claim. In a story I wrote in 2020, I relied on a widely quoted “ideal doctors-to-population” ratio, which the World Health Organization supposedly says should be at least 1:1,000. A misattribution by none other than the Indian government has lent credence to this ratio. The WHO gives no such advice, and The Hindu’s Devyanshi Bihani recently debunked it. Embarrassed now, I recall trying hard to find a primary source back then, but ultimately trusted the government and assumed that the WHO’s page with the “recommendation” had vanished.
So, welcome to the world of awaara numbers: numbers that float around the internet, and silently find their way into your slides, articles, emails, and arguments.

Often awaara numbers serve well to fill a placeholder and please the data-obsessed boss with minimal effort; at other times they make you sound cool and complete in a social media post. But data deserves more dignity.
Here’s a primer on how to talk about numbers correctly. I describe six types of numbers that we generally encounter in regular research work, and each needs a different approach.
The first type is the only one where you can (somewhat) get away without attribution: data where the source is unique and obvious.
Saying inflation is 6.7% can do without a source because there’s no one but the government that’s in the business of reporting inflation. The same could be said for GDP. A company’s revenue in a year falls in the same category, since the most credible source for it is the company itself. Or the party-wise seat shares in a state election (Election Commission). But the pool of such datasets is tiny, and even in these cases, it’s desirable that you attribute.
There’s a corollary to this. When the data source is obvious and unique, avoid citing a secondary source. For example, you don’t say “SBI’s interest rate for FDs is currently 6-7.5%, as per so-and-so fintech blog”, unless the SBI’s interest rate tables are really hidden from the public and that blog did the hard work of making it available to you. Saying so shows poor knowledge of how sourcing works.
The second type is data where the source is unique, but not obvious. Some datasets are released only by specific entities, and no one else.
Sometimes, it’s proprietary data, and not identifying the source could even be illegal. Until recently, the only source for monthly data on unemployment was the surveys conducted by a private agency called the Centre for Monitoring Indian Economy. This data was immensely useful to gauge the impact of Covid-19 lockdowns on the job market. But you couldn’t mention those numbers just like that because: (A) the data was proprietary and no one else had it, and (B) by not naming the agency, you’d wrongly imply that the data is sourced from the government.
Another example is the UN’s population projections for a future year. No other entity in the world is credibly known for such projections, and in that sense, the UN is a unique source, but not an obvious one. It needs to be attributed. (Note that India does have official projections till 2036, which vary widely from the UN’s. That’s an added reason why such projections should be carefully attributed, to avoid confusion and to ensure that one can verify your data.)
The third type is data for which multiple sources exist (i.e., the source is neither unique nor obvious).
India’s population projections till 2036 mentioned above are one example, as both the UN and the Indian government give them. GDP expressed in dollars is also an example: one could use India’s official data and convert it into dollars, or just take it from the IMF.
Then there are agencies that love to conduct needless surveys to gauge data that’s already known through far more credible sources. The news media is quick to pick up such stories. When I was working on my story on how much Indians sleep in a day, I found news reports about multiple private surveys with tiny, unrepresentative samples that tried to find the answer (and often arrived at bizarre ones because of their bad samples), even though the government already conducts a large-scale, nationally representative survey on time-use (the source for my story).
In such cases, the less credible survey should be ignored—but if you choose to use it, attribution is critical to identify your choice out of the many sources that may exist.
The fourth type is data for which the source is, in a way, obvious but not unique.
India’s per capita consumption is a famous example. It’s kind of obvious that the data would reliably come only from the government. But did you know that this data could be available from two sources from within the same government?
The national accounts data (which gives you the GDP numbers) have a component called “private final consumption expenditure”, and a government survey on household consumption can also be used to get an estimate. (In fact the two have wide gaps.)
The road transport ministry and the National Crime Records Bureau also give different numbers for India’s annual accident statistics, since the two rely on different methods of aggregation. (The NCRB comes under the home ministry.)
That’s why attribution is critical.
The fifth type is projections, which have become quite fashionable to make.
A whole economy today sustains itself by issuing forecasts for which the forecasters can never be held to account. You’ll see bunkum forecasts by research agencies in financial newspapers every day, predicting how much random sectors will be worth by 2030. These forecasts are typically useless, and are best left untouched.
But some forecasts and projections are made with robust assumptions and are genuinely meant to guide policy or decision-making. Any projection is heavily dependent on who made it and using what assumptions. That makes clear attribution critical.
The sixth type is independent (sometimes painstaking) analysis of any of the above types of numbers.
I already gave one example: to put GDP in dollar terms, you lift the government’s rupee estimate, apply the exchange rate from a second source, and make your calculation. The attribution should include the name of the person who did the calculation, even if it’s you (something like “Author’s calculations based on statistics ministry data”).
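A minimal Python sketch of that calculation, in which the result carries its attribution along with it. The `SourcedValue` type, the function name, the "central bank reference rate" label and both figures are hypothetical placeholders chosen for illustration, not real statistics or a prescribed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedValue:
    """A number that never travels without its unit and its source."""
    value: float
    unit: str
    source: str


def gdp_in_dollars(gdp_rupees: SourcedValue,
                   rupees_per_dollar: SourcedValue,
                   analyst: str) -> SourcedValue:
    """Convert a rupee GDP estimate to dollars and credit both inputs and the analyst."""
    usd = gdp_rupees.value / rupees_per_dollar.value
    attribution = (
        f"{analyst}'s calculations based on "
        f"{gdp_rupees.source} and {rupees_per_dollar.source}"
    )
    return SourcedValue(usd, "USD", attribution)


# Placeholder inputs (assumptions, NOT real data).
gdp_inr = SourcedValue(1000.0, "INR", "statistics ministry data")
rate = SourcedValue(80.0, "INR per USD", "central bank reference rate")

gdp_usd = gdp_in_dollars(gdp_inr, rate, "Author")
print(gdp_usd.value, gdp_usd.unit)
print(gdp_usd.source)
```

The point of the design is that the derived number cannot be created without naming who derived it and from what.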
I’ve earlier described a specific example of independent analysis using raw data of government surveys: the data is from the government, but is not readily available in official tables or reports, and can only be arrived at by accessing the raw data and performing an analysis. When you see any such analysis, do not cite the primary source of the data alone, but also recognise the hard work of the person who processed it.
Another major example is the Association for Democratic Reforms’ mammoth effort to publish details from affidavits filed by election candidates, or historical electoral data put together by the Trivedi Centre for Political Data. The primary source for both is the Election Commission, but the data wouldn’t be usable but for these organisations’ efforts in processing thousands of scanned, often illegible pages, many in unpredictable formats. So just naming the primary source is not enough. Ideally, name both (e.g. “Association for Democratic Reforms’ analysis of candidates’ affidavits on Election Commission website”).
Bonus: Sometimes we find good data in research compiled by others (think of it as “second-hand data”), and now want to use it “third-hand”. Like the annual Economic Survey, and many other research papers and news reports that cite several other studies or reports to make a fresh point. The good ones cite their primary sources well.
And so when you want to use the same data, don’t just cite your immediate source. Either go to the primary source and cite that, or cite both the primary and secondary source (e.g., “IMF 2025 paper by XYZ, via Economic Survey”), but not just the latter.
The same holds when, say, someone compiles a complete list of Donald Trump’s social media posts in which he used words in all-caps. The primary source is, of course, Trump himself, and the list is easily verifiable. But if you’re lifting the compilation directly from a secondary source (such as a news outlet), you should credit it.
So that’s about it. As you can see, adding credits isn’t just good manners used to say thanks or well done. The source is an inalienable part of a data point—part of what gives it dignity and identity.
The next time your boss leaves behind a comment “Fill XXX” in your document, asking you to add a number that doesn’t reliably exist, tell them it’s not so easy to find a number that can be trusted or that is worth the effort.
Or better still, surprise them with a nuanced take—where the number is accompanied by a long boring description of everything that constitutes it: the year it pertains to, the assumptions that went behind it, the exact definition of the term used to describe the metric, and the sources that made it possible.
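One way to picture that "long boring description" is as a record that refuses to be quoted without a source. This is only a sketch: the `DataPoint` class, its field names and the `describe` method are hypothetical, and "XXX" stands in for the number, just as in the boss's comment.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    value: str                      # the number itself, kept as text here
    metric_definition: str          # exact definition of the metric
    year: int                       # the year the number pertains to
    sources: list[str]              # who produced or processed it
    assumptions: list[str] = field(default_factory=list)

    def describe(self) -> str:
        """Return the number with everything that constitutes it, or fail loudly."""
        if not self.sources:
            raise ValueError("awaara number: no source given")
        text = f"{self.value} ({self.metric_definition}, {self.year})."
        if self.assumptions:
            text += " Assumptions: " + "; ".join(self.assumptions) + "."
        text += " Source: " + "; ".join(self.sources) + "."
        return text


# Placeholder entry (NOT real data).
point = DataPoint(
    value="XXX",
    metric_definition="people aged 18-35",
    year=2025,
    sources=["name of the primary source"],
    assumptions=["any projection assumptions go here"],
)
print(point.describe())
```

A number with an empty `sources` list raises an error instead of quietly slipping into a slide.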

Data looks trustworthy, and that’s what makes it dangerous. When faced with a number, you must ask the messenger for its source, or trace its origin yourself, or both. When you use it in official work, name that source. We’re better off without any data than with data thrown around just for the sake of it. Avoid shortcuts, and stay away from awaara numbers.
Until next time…

