Quantifying Expertise Inflation
From Satire to Scientific Measurement
Artificial intelligence (AI) has ignited a boom in online "expertise." Across blogs, newsletters, and social feeds, individuals present technical opinions with confidence that often outstrips their verifiable credentials. This phenomenon, which we term expertise inflation, combines human cognitive biases with the mediation of algorithmic platforms. When people overestimate their own knowledge, they can lose objective perspective and develop unrealistic expectations. Digital search engines reinforce this by acting as algorithmic authorities that users trust to judge credibility. Meanwhile, generative language models can now produce texts that simulate authority without a verifiable source, creating what Startari (2025) calls synthetic ethos.
This article re-imagines the original Expertise Inflation Index (EII) satire as a scientific testbed for measuring rhetorical inflation in AI discourse. Drawing on psycholinguistics, algorithmic accountability, and natural-language processing, we describe how large language models (LLMs) can be used to quantify overconfidence, jargon, and originality in technical writing and propose a research agenda for studying expertise inflation. We also introduce an interactive demonstration of the EII pipeline, available for public exploration, to provide a tangible understanding of the methodology and findings.
Cognitive and Algorithmic Drivers of Expertise Inflation
Social psychology attributes overconfidence to a cognitive bias whereby people think they are more capable than average. This bias not only leads individuals to overestimate their abilities but also reinforces other decision-making biases, making them vulnerable to disappointment and misjudgment (Scribbr, 2025). In the context of AI, where few norms exist and jargon proliferates, overconfidence can manifest as inflated claims about models or frameworks. Fraud detection research shows that deception correlates with linguistic obfuscation; fraudulent papers exhibit lower readability and higher rates of jargon, and obfuscation increases with the number of references (Markowitz & Hancock, 2016). Such findings suggest that measuring jargon density and readability can reveal when writing is designed to mask a lack of substance.
Algorithmic systems also shape perceptions of expertise. Search engines and recommendation algorithms increasingly mediate how people find information. Ståhl et al. (2021) use the term "algorithmic authority" for the trust users place in algorithms to assess relevance and credibility; a search engine acting as an algorithmic authority directs human action by deciding which sources appear trustworthy. This dynamic can amplify voices that appeal to algorithms rather than to genuine expertise. On top of this, generative LLMs can now create fluent texts that mimic credible voices. Startari (2025) argues that synthetic ethos, credibility constructed without human expertise, emerges because language models are trained to produce persuasive discourse. The risk is that AI-generated content may further inflate perceived expertise by simulating authority without accountability.
Methodology: Building a Quantitative EII Pipeline
To move from satire to science, we operationalize expertise inflation through quantifiable linguistic signals. Our updated EII pipeline comprises four stages, each of which is illustrated in a live demonstration.
Data Collection: Using the Firecrawl scraping library, we gather publicly available articles on AI from sources such as Hacker News, Medium, Substack, and Reddit. Each article is timestamped, and metadata such as author, platform, and engagement metrics are stored.
Pre-processing and Dual-Model Analysis: Texts are normalized and passed through a dual-LLM scoring system. We use models from OpenAI (GPT-4) and Anthropic (Claude) to evaluate each article across seven dimensions of rhetorical inflation:
Confidence Inflation: The ratio of definitive claims (e.g., "will," "must," "is") to hedging terms (e.g., "might," "could," "may").
Jargon Density: The proportion of unexplained technical terms relative to overall word count (Wu et al., 2024; Huang et al., 2022).
Self-Reference: The frequency of first-person pronouns and references to the author's identity or experience.
Originality: Whether content presents novel arguments or repackages existing ideas.
Readability/Obfuscation: Measured using Flesch Reading Ease scores and average sentence length. A lower Flesch Reading Ease score means the text is harder to read, which we treat as a sign of greater obfuscation.
Citation Behavior: Presence and density of outbound citations, especially to primary sources.
Ethos Markers: Presence of rhetorical strategies that simulate credibility (e.g., appeals to authority, confident tone, lack of attribution), often indicative of synthetic ethos.
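The Confidence Inflation ratio defined above can be approximated with a simple lexical count. The sketch below is a minimal baseline, not the LLM-based scoring the pipeline uses: the two word lists contain only the six example terms named in the list, the function name `confidence_ratio` is hypothetical, and adding one to the denominator is an assumption made here to avoid division by zero.

```python
import re

# Illustrative lexicons only: these are the example terms from the article,
# not a published word list.
DEFINITIVE = {"will", "must", "is"}
HEDGING = {"might", "could", "may"}


def confidence_ratio(text: str) -> float:
    """Return the count of definitive terms divided by the count of hedges.

    One is added to the denominator so that a text with no hedging terms
    does not divide by zero (an assumption of this sketch).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    definitive = sum(token in DEFINITIVE for token in tokens)
    hedging = sum(token in HEDGING for token in tokens)
    return definitive / (hedging + 1)
```

A purely lexical count like this cannot tell "may" the modal verb from "May" the month, which is one reason a model-based judgment might be preferred for the real score.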
Each dimension is scored, and a weighted combination yields an overall EII score. This dual-model approach loosely echoes the principle of cross-validation in machine learning (where a model is evaluated across multiple partitions of data to estimate how well it generalizes): agreement between two independent scorers is intended to make our measurements more reliable. We also perform multiple runs on each model and compute the average score, addressing concerns that single prompts can yield misleading results due to randomness (Juzek & Ward, 2024).
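The weighted combination can be sketched as a weighted mean over the seven dimensions. The article does not state the actual weights, so the equal weights below are placeholders, and the dictionary keys are hypothetical names. The sketch also assumes every dimension has already been oriented so that a higher score means more inflation (for example, low originality mapped to a high score).

```python
DIMENSIONS = (
    "confidence_inflation",
    "jargon_density",
    "self_reference",
    "originality",
    "readability",
    "citation_behavior",
    "ethos_markers",
)

# Hypothetical equal weights; the real weights are not given in the article.
WEIGHTS = {name: 1.0 for name in DIMENSIONS}


def eii_score(scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted mean of per-dimension scores.

    Each score is assumed to be on a common scale (for example 1-10) where
    higher means more inflated.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total_weight
```

Dividing by the total weight keeps the result on the same scale as the inputs, so changing the weights shifts emphasis without changing the range of the index.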
Cross-Model Calibration and Synthetic-Ethos Detection: Comparing outputs from multiple LLMs helps identify discrepancies and calibrate bias. We incorporate a classifier trained on Startari’s (2025) synthetic ethos dataset to flag articles whose tone and structure mimic authority without source references. This classifier uses features such as depersonalized tone, adaptive register, and lack of citations.
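One simple way to surface discrepancies between two models is to compare their scores dimension by dimension and report the ones that differ by more than a threshold. This is a sketch under assumptions: the function name, the dictionary keys, and the default threshold of 2.0 points are all hypothetical, and it does not reproduce the synthetic-ethos classifier described above.

```python
def disagreement_report(model_a: dict, model_b: dict, threshold: float = 2.0) -> dict:
    """Return the dimensions where two models' scores differ by at least `threshold`.

    Only dimensions scored by both models are compared. The threshold is an
    illustrative default, not a value taken from the article.
    """
    shared = sorted(model_a.keys() & model_b.keys())
    gaps = {name: abs(model_a[name] - model_b[name]) for name in shared}
    return {name: gap for name, gap in gaps.items() if gap >= threshold}
```

An article with a non-empty report could then be queued for manual review rather than being assigned a score automatically.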
Storage and Dashboarding: Results are stored in a DynamoDB table for longitudinal analysis and served via a dashboard. Researchers can filter by date, platform, or author, and trend metrics across time. Integration with a human-in-the-loop interface allows manual review to correct misclassifications and refine prompts.
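Storing a result in DynamoDB could look roughly like the sketch below. The attribute names and key layout are assumptions, since the article does not describe the table schema. The code takes any object with a `put_item(Item=...)` method, which is the call exposed by a boto3 DynamoDB `Table` resource, so the sketch itself does not need AWS credentials.

```python
from decimal import Decimal


def build_item(article_url: str, analyzed_at: str, platform: str,
               author: str, eii: float) -> dict:
    """Shape one analysis result as a DynamoDB item.

    The boto3 resource API does not accept Python floats, so the score is
    converted to Decimal via str() to avoid binary floating-point noise.
    """
    return {
        "article_url": article_url,  # hypothetical partition key
        "analyzed_at": analyzed_at,  # hypothetical sort key (ISO 8601 string)
        "platform": platform,
        "author": author,
        "eii": Decimal(str(eii)),
    }


def store_result(table, item: dict) -> None:
    """Write the item through any object exposing put_item(Item=...)."""
    table.put_item(Item=item)
```

Keying on the article URL plus an analysis timestamp is one way to keep repeated analyses of the same article side by side for longitudinal comparison.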
Guided Tour: The EII Interactive Demonstration
To provide a concrete understanding of the EII pipeline, we have developed a public, interactive demonstration. This demo isn't just a marketing tool; it's an integral part of the research, emphasizing transparency, reproducibility, and the practical application of our methodology. You can explore the source code in our public GitHub repository, Expertise Inflation Index.
Main Dashboard: The landing page provides an introduction to the Expertise Inflation Index and offers several entry points: Article Analysis to evaluate an individual piece, Discovery Report to scan articles across the AI ecosystem, and an AI Industry Championship that ranks authors by their average EII. The "Launch Demo" button initiates a guided tour.
Research Methodology Presentation: The demo begins with an overview of our scientific framework, emphasizing that this is a serious piece of research, not a humorous side project. Subsequent slides outline our research objectives and point to our public GitHub repository, where the methodology and code are available. Transparency is a core value: anyone can review our scoring rules and reproduce our results. You can access the GitHub repository at https://github.com/dp-pcs/expertise-inflation-index.
Defining "Expertise Inflation": This section clarifies the three constituent terms: Expertise ("the skill or knowledge in a particular area that makes someone an expert"), Inflation ("a general increase in prices and fall in purchasing value", applied here to confidence and jargon), and Index ("a system used to measure changes in a specified group"). Combined, these define the Expertise Inflation Index: a system to measure the inflation of confidence, jargon, and authority claims in AI-related content.
The Seven-Dimension Scoring System: This slide introduces the seven scoring dimensions outlined in the methodology, each with a short description and illustrative scale. For example, on the Confidence Inflation axis, a score of 1 represents humility ("I don’t know everything"), whereas a score of 10 signals oracle-like certainty. On Flesch Reading Ease, we target prose that is accessible to high-school readers; for reference, a score of 90–100 corresponds to fifth-grade readability, and lower scores indicate harder text. Combining subjective dimensions (confidence, jargon, self-reference) with a quantitative metric (readability) helps provide a balanced view.
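For readers who want to see how the readability metric is computed, the standard Flesch Reading Ease formula is 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The sketch below implements it with a rough vowel-group syllable counter; that heuristic and the function names are assumptions of this example, and dedicated readability libraries count syllables more carefully.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(word) for word in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

Short sentences built from one-syllable words score high, while long sentences packed with polysyllabic terms score low, which is why the metric serves as a proxy for obfuscation.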
Our research extends beyond individual articles. We compute aggregate statistics across hundreds of pieces from dozens of AI commentators and publish a leaderboard. The AI Industry Championship page ranks authors by average EII score and highlights extremes: the most inflated experts, the most humble experts, the biggest jargon users, and the most confident writers. Our own writing appears on the board for transparency: the system ranks the primary author of this article, David Proctor, fifteenth out of thirty-two AI experts, with an average EII of 5.3 (moderately inflated!).
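The ranking step behind a leaderboard of this kind can be sketched as a group-by-author average followed by a sort. The input shape (pairs of author name and article score) and the choice to list the highest average first are assumptions of this example.

```python
from collections import defaultdict
from statistics import mean


def leaderboard(results: list) -> list:
    """Rank authors by mean EII, highest (most inflated) first.

    `results` is a list of (author, eii_score) pairs, one per analyzed
    article. Ties are broken alphabetically by author name.
    """
    by_author = defaultdict(list)
    for author, score in results:
        by_author[author].append(score)
    averages = [(author, mean(scores)) for author, scores in by_author.items()]
    return sorted(averages, key=lambda row: (-row[1], row[0]))
```

Reversing the returned list gives the "most humble" end of the board; the same grouping could be repeated on a single dimension to find the heaviest jargon users.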
Content Discovery & Selection: For users who want to run their own analyses, the discovery module aggregates articles from Substack, Medium, Kaggle, Semantic Scholar, and other AI portals. Each article is displayed with its source, a relevance score, and a short excerpt. Users can filter by source or search for specific authors and then click "Analyze" to generate an EII report.
Live Analysis Results: When an article is analyzed, the system calls both GPT-4 and Claude to score it along the seven dimensions. A modal window displays the EII score, a breakdown by dimension, and a simple label such as "Balanced" or "Inflated." Key phrases extracted from the text help explain why the system assigned those scores. Behind the scenes, we perform multiple runs on each model and compute the average; this repetition is crucial because generative AI outputs vary from run to run.
The interactive demo isn’t just a showcase; it’s part of the research methodology itself. It ensures:
Transparency & Reproducibility: Every number you see in the demo is generated by running the same open-source scripts documented in our GitHub repository. If you disagree with the scoring weights, you can fork the code and adjust them.
Cross-Model Validation: The demo emphasizes the use of two different LLMs to score each article. This practice echoes the general concept of cross-validation in machine learning. Agreement between them increases our confidence in the result.
Repeatability: Each analysis triggers multiple runs per model. This addresses concerns raised by researchers that single prompts can yield misleading results due to randomness (Juzek & Ward, 2024). By averaging over repetitions, we obtain a more stable estimate of the EII score.
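The repetition-and-averaging step can be sketched as follows: average the repeated runs within each model, then average the per-model means, and keep the run-to-run standard deviation as a stability signal. The function name, the input shape, and the decision to report the spread are assumptions of this example rather than details confirmed by the article.

```python
from statistics import mean, stdev


def summarize_runs(runs_by_model: dict) -> dict:
    """Average repeated runs per model, then average the model means.

    `runs_by_model` maps a model label to the list of EII scores from its
    repeated runs. The per-model standard deviation shows how much a single
    run could have differed from the average.
    """
    model_means = {model: mean(runs) for model, runs in runs_by_model.items()}
    run_stdev = {
        model: stdev(runs) if len(runs) > 1 else 0.0
        for model, runs in runs_by_model.items()
    }
    return {
        "eii": mean(list(model_means.values())),
        "model_means": model_means,
        "run_stdev": run_stdev,
    }
```

Averaging within each model first keeps a model that happened to be run more often from dominating the combined score.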
As a proof of concept, we applied the EII pipeline to a sample of more than 50 blog posts about AI published between January and June 2025. Posts with the highest confidence inflation often contained numerous acronyms and unexplained jargon. Readability scores were inversely correlated with confidence inflation, supporting the hypothesis that obfuscation accompanies bold claims. Articles with high self-reference counts tended to promote personal frameworks rather than present empirical evidence. Cross-model disagreement flagged several pieces for synthetic ethos review; manual inspection revealed that these posts were likely AI-generated summaries of other articles.
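An inverse relationship such as the one between readability and confidence inflation can be checked with a Pearson correlation coefficient. The sketch below implements the standard formula in plain Python; it contains no data from the study, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / sqrt(var_x * var_y)
```

Passing per-article readability scores as `xs` and confidence inflation scores as `ys` would return a value near -1 for a strong inverse relationship and near 0 for none. With a sample of about 50 posts, the coefficient is best read alongside a confidence interval or a significance test.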
These preliminary findings illustrate how quantitative measures can uncover patterns in technical discourse. By tracking shifts over time, we can test whether increases in model capabilities coincide with changes in how people write about AI, and whether algorithmic platforms amplify particular writing styles.
Discussion: Implications and Ethical Considerations
Quantifying expertise inflation has broader implications for science communication, education, and platform governance. Overconfidence bias can mislead novices and discourage open inquiry. Algorithmic authority may inadvertently elevate persuasive but unsubstantiated voices. Synthetic ethos generated by LLMs threatens to erode trust if users cannot distinguish AI-generated content from human expertise. By making these phenomena measurable, researchers and platform designers can develop interventions, such as emphasizing source transparency, demoting overly obfuscated content, or prompting authors to define terms.
The seven dimensions of rhetorical inflation used in the EII pipeline reflect diverse signals, from citation behavior and self-reference to obfuscation and ethos markers, that together create a richer diagnostic lens for digital discourse.
However, the pipeline itself relies on LLMs whose reliability can vary across tasks. Models may hallucinate or misclassify subtle rhetorical cues. Human oversight remains essential to interpret scores and refine prompts. Future work should involve multidisciplinary collaboration: linguists can refine definitions of confidence and jargon, ethicists can assess the fairness of algorithmic evaluations, and sociologists can examine how metrics influence author behavior.
Possible Use Cases: From Classrooms to Content Platforms
The quantification of expertise inflation has practical applications across multiple domains, with education standing out as a primary beneficiary. By making rhetorical inflation measurable, the EII pipeline can help learners, educators, and organizations engage more critically with digital content.
Educational Curriculum and Digital Literacy
Integrating the EII framework into school and university curricula can help students develop critical digital literacy skills. By analyzing real-world articles and social media posts, students learn to identify overconfidence, jargon, and synthetic ethos, equipping them to navigate a landscape where AI-generated content is increasingly prevalent.
Teacher Training and Professional Development
Educators can use the EII pipeline as a tool for professional growth, learning to recognize and address expertise inflation in the materials they select or create. Workshops and training sessions can demonstrate how to apply EII metrics to classroom resources, fostering a culture of transparency and evidence-based teaching.
AI-Powered Educational Tools
The EII methodology can be embedded in browser extensions or educational apps that assist students and teachers in real time. Such tools could flag overconfident statements, highlight unexplained jargon, or prompt users to seek out original sources, supporting more informed research and writing practices.
Research and Policy in Education
School administrators and policymakers can leverage EII analytics to monitor trends in student-accessed media, identify shifts in the quality of information, and design interventions to promote media literacy. Longitudinal analysis of expertise inflation can inform curriculum updates and public awareness campaigns.
Science Communication and Public Outreach
Beyond formal education, the EII pipeline can support science communicators, journalists, and public institutions in evaluating the credibility of technical content. By quantifying rhetorical inflation, organizations can promote more transparent communication and help audiences distinguish between genuine expertise and persuasive mimicry.
Platform Governance and Content Moderation
Digital platforms and publishers can use EII metrics to inform content moderation strategies, demoting articles that exhibit excessive obfuscation or synthetic ethos. This can help elevate trustworthy voices and reduce the spread of misinformation, especially in fast-moving fields like AI.
Use Case Summary
By applying the EII framework across these contexts, we can encourage a more discerning approach to digital information, one that values clarity, evidence, and genuine expertise over rhetorical flourish.
Conclusion
The Expertise Inflation Index started as a satirical mirror held up to AI hype. Reimagined through the lens of cognitive bias and algorithmic authority, it becomes a research tool for measuring the rhetorical inflation that pervades technical discourse. By combining web scraping, dual-LLM analysis, and a seven-dimensional scoring system grounded in psycholinguistics and media studies, we can begin to map where confidence outpaces evidence. In doing so, we hope to encourage more transparent communication and to provide a scalable methodology for studying how expertise is constructed in the age of artificial intelligence.
References
Markowitz, D. M., & Hancock, J. T. (2016). Linguistic Obfuscation in Fraudulent Science. Stanford Social Media Lab.
Ståhl, T., et al. (2021). Epistemic Beliefs and Internet Reliance: Is Algorithmic Authority Part of the Picture? Emerald Publishing. https://pdfs.semanticscholar.org/0c24/fcb4771f5368c4d0cf602ae44178adcb1e7a.pdf
Startari, A. V. (2025). Ethos Without Source: Algorithmic Identity and the Simulation of Credibility. Aacademica.
Wu, B., Wang, Y., et al. (2024). Explain Less, Understand More: Jargon Detection via Personalized Parameter-Efficient Fine-tuning. arXiv preprint.
Huang, Y., et al. (2022). MedJEx: A Medical Jargon Extraction Model with Wiki’s Hyperlink Span and Contextualized Masked Language Model Score. PubMed Central.
Juzek, T., & Ward, D. (2024). Why Does ChatGPT “Delve” So Much? Exploring the Sources of Lexical Overrepresentation in LLMs. arXiv preprint.
Monte Carlo Data (2024). Are We In An AI Bubble? The Top 5 Reliability Pitfalls of Generative AI.
Scribbr (2025). What Is Overconfidence Bias? Definition & Examples.














