
Is AI a Research Method?
Question
On December 5, 2025, at the invitation of Vice Dean Fan Zhenjia, I returned to my alma mater, Nankai University, to give a talk on “AI-Assisted Research” to faculty and students from the School of Information and Communication and the Business School.

During the Q&A session after the lecture, Professor Li Ying, who was hosting the event, posed a question (I’ve tried to reproduce her words as accurately as possible):
In the past, the research methods we were all familiar with—such as those outlined in standard social science methodology texts like Earl Babbie’s—were developed step by step through verification across many disciplines over a long period of time. We recognize them, and the entire academic community—domestic or international, across disciplines—accepts them as standardized methods.
But now, tools like ChatGPT have become impossible to ignore in our research. In reality, they are being used extensively—from topic selection all the way to final submission, with revisions throughout the entire process. But from a research standpoint, is this kind of intervention considered a standardized method? How can its compliance and legitimacy be recognized? Some scholars now argue that it is not a normative research method. I wonder what Professor Wang thinks of this issue?
I think Professor Li’s question is excellent and reflects the confusion many researchers are feeling today. To summarize: in scientific research, does AI count as a research method? And where are its boundaries?
At the time, the lecture had already run overtime (my fault—I had updated too much material), so I wasn’t able to give a full response. But I believe this is an important question and deserves a separate article. Here, combining my on-site response and my reflections afterward, I offer a more complete version of my thoughts.
Clarification
Before answering the question directly, I want to do one thing first: clarify the concepts.
Think about it—when we say “using AI for research,” we’re actually referring to at least two completely different scenarios. The first is using AI to analyze data—for example, you have ten thousand user comments and you ask AI to perform sentiment analysis or topic labeling. The second is using AI to generate data—for instance, instead of recruiting participants for a survey, you simply let ChatGPT simulate a thousand “virtual respondents” to fill it out.
Both look like “using AI,” but their nature couldn’t be more different. In the first case, AI is a “microscope” in your hand, helping you better observe the real world. In the second, AI becomes a “perpetual motion machine,” creating an entirely fabricated world for you out of nothing.
If we don’t distinguish between these two situations, the discussion will spiral into confusion. If you say “AI is unreliable,” supporters will counter, “But it analyzes text quickly and accurately.” If you say “AI can be a research tool,” critics will ask, “Then isn’t using it to simulate participants basically academic fraud?” Both sides talk past each other, and the debate never goes anywhere.
Therefore, my first point is this: “generating data” and “analyzing data” are two different things. Using AI as research subjects indeed raises ethical and methodological concerns, but using AI to process massive amounts of text or assist in coding is simply an efficient research instrument. Rejecting the former does not invalidate the latter.
Once we establish this foundation, then we can move forward with the discussion.
Root Cause
Now that we’ve clarified the concepts, let’s look at the “underlying logic” of AI.
To determine whether AI can be considered a “research method,” we shouldn’t focus only on what it can do, but on how it does it. If the fundamental logic of a tool runs counter to the spirit of science, it is difficult to call it a “method.”
What lies at the core of the scientific spirit? Two things: seeking truth, and reproducibility. If you run an experiment once and get a certain result, I run it again and get the same result, and another lab runs it and still gets the same result—that is science.
AI has inherent “hard flaws” in both of these respects.
The first hard flaw: it is probabilistic, not logical.
A large language model is essentially a “text autocomplete machine.” Researchers scrape enormous amounts of text from the internet—web pages, books, code, papers—and train the model to learn: given the preceding tokens, which token is most likely to come next. And with this mechanism, one token at a time, the model “generates” text.
It may sound unbelievable that a model capable of writing essays and code is trained in such a simplistic way. But in fact, this is a practical compromise. To teach a model anything, we need to provide correct training materials (inputs and labels). The problem is that at this scale, no amount of human annotation could supply enough labels. So researchers came up with a clever trick: any piece of text can be turned into training material by using the preceding tokens as the input and the next token as the label. This way, the dataset can be fully exploited without any additional annotation.
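If that sounds abstract, here is a tiny sketch of the idea (my own illustration in Python, not code from any real training pipeline): a single sentence becomes many (input, label) pairs, with no human annotation required.

```python
# Toy illustration: one sentence yields many (input, label) training pairs.
# Real systems split text into subword tokens and work over enormous corpora;
# splitting on spaces is enough to show the idea.
sentence = "large language models predict the next token"
tokens = sentence.split()

training_pairs = []
for i in range(1, len(tokens)):
    context = tokens[:i]   # everything seen so far is the input
    label = tokens[i]      # the very next token is the label
    training_pairs.append((context, label))

for context, label in training_pairs:
    print(context, "->", label)
```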

So, large models aren’t magical. They’re essentially just predicting what comes next.
What does this mean? It means they’re not outputting “truth,” but “the most probable next token.” Even with the exact same input, AI may give different outputs at different times. A “black box” whose results cannot be stably reproduced is difficult to regard as a rigorous scientific method. This is the fundamental reason why AI struggles to qualify as an independent “scientific method”—it lacks determinism.
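Where does this instability come from? At each step, the model samples from a probability distribution over candidate next tokens rather than always taking the single most likely one. Here is a toy sketch of that sampling step (purely illustrative; real vocabularies contain tens of thousands of tokens, and sampling is further shaped by a temperature setting):

```python
import numpy as np

# A toy "next-token" distribution over three candidate words.
vocab = ["significant", "insignificant", "ambiguous"]
probs = [0.6, 0.3, 0.1]

rng = np.random.default_rng()   # no fixed seed, mimicking a live model call
for run in range(5):
    token = rng.choice(vocab, p=probs)
    print(f"run {run}: the effect is {token}")
# The same input can yield different continuations on different runs.
```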
The second hard flaw: it suffers from severe “people-pleasing.”
Predicting the next token isn’t enough. To make model outputs sound “more human,” researchers introduced RLHF (Reinforcement Learning from Human Feedback). Put simply, human annotators score the model’s answers: good answers get a reward, bad answers get punished. Through this reward–punishment cycle, the model learns how to please humans.
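For readers who want the nuts and bolts: the reward model at the heart of RLHF is usually trained on pairwise comparisons, where annotators pick the better of two answers and the model is pushed to score the chosen answer higher than the rejected one. Below is a minimal sketch of that pairwise objective (a deliberate simplification; real systems train a neural reward model, not two hand-set numbers):

```python
import numpy as np

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: smaller when the answer annotators chose
    is scored higher than the answer they rejected."""
    margin = reward_chosen - reward_rejected
    return float(-np.log(1.0 / (1.0 + np.exp(-margin))))

print(preference_loss(reward_chosen=2.0, reward_rejected=0.5))  # small loss: ranking matches the annotators
print(preference_loss(reward_chosen=0.5, reward_rejected=2.0))  # large loss: ranking contradicts them
# The language model is then tuned to produce whatever this reward model scores highly.
```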
And this is where the problem begins.
A paper published at ICLR 2024—Towards Understanding Sycophancy in Language Models—shows that RLHF training induces a tendency toward sycophancy in large language models. Researchers found that five leading AI assistants displayed this behavior across four different types of tasks: answers aligning with the user’s viewpoint were more likely to receive higher scores. Even more concerning, both human annotators and preference models frequently rated “fluent but wrong” answers higher than “correct but less agreeable” ones.

What does this people-pleasing lead to? As I said in the lecture: “It would rather give a wrong answer than disappoint the user.”
Why? The model “remembers”: “When I told you honestly that I didn’t know, you slapped me. So I learned—I shouldn’t be honest next time.” This becomes the AI’s “childhood psychological trauma.” How can a tool that adapts its answers to whoever is asking serve as a method for “seeking truth”?
The third hard flaw—and the most fatal one: model collapse.
What happens if you let AI generate data, and then use AI again to analyze that same data?
In 2024, Nature published a major cover paper titled AI models collapse when trained on recursively generated data, presenting a stark warning:
“If model-generated data is used for training without distinction, the model will undergo irreversible degradation, and the rich complexity of human reality will be replaced by a ‘bland probability distribution.’”

What does this mean? It means AI does not possess the ability to produce “new knowledge.” It can only re-chew the knowledge it has already ingested. Even worse, if you train new AI systems on data produced by earlier AI systems, this “regurgitation” compounds. Eventually, the model collapses—it gradually forgets the richness and diversity of the human world, leaving only a kind of “mediocre average.”
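A toy simulation makes the mechanism easy to see. Start with “real” data that includes rare categories, fit a simple model to it, and then let each new generation learn only from samples drawn from the previous generation’s model. This is just my illustration of the effect the Nature paper describes, not a reproduction of their experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: the "real" distribution over five categories, rare tails included.
probs = np.array([0.55, 0.25, 0.12, 0.05, 0.03])

for generation in range(1, 16):
    sample = rng.choice(len(probs), size=30, p=probs)  # finite sample from the current model
    counts = np.bincount(sample, minlength=len(probs))
    probs = counts / counts.sum()                      # the next "model" learns only from that sample
    print(f"generation {generation:2d}: {np.round(probs, 2)}")
# Once a rare category draws zero counts, its probability becomes zero and it can
# never return: the tails of the distribution vanish first, and diversity keeps shrinking.
```

The tails are exactly where the interesting human variation lives, which is why the end state is that “bland probability distribution.”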
The Red Line
Once we understand AI’s “temperament,” we can draw the single most important red line.
Right now, the most dangerous practice in academia is what’s called “Silicon Sampling”—letting AI act as human subjects to fill out surveys or participate in experiments.
In the lecture, I specifically pointed out this trend: “Some researchers are now trying to treat AI as real humans and reproduce results from psychology literature as if AI were actual participants.”
A paper published in PNAS in June 2025, Take caution in using LLMs as human surrogates, issued a clear warning:
LLMs rely solely on probabilistic patterns and lack embodied human experience. Their simulations exhibit idiosyncrasy and inconsistency, fundamentally failing to reproduce the true distribution of human behavior, with failure modes that are diverse and unpredictable.

What does this distortion in simulation actually mean? It means that although AI’s responses look like decision-making, they are essentially “idiosyncratic” outputs of a probability model.
Real human behavior is organic, driven by survival instincts, full of complex noise and variance grounded in lived reality. AI lacks this embodied experience, and its generated data distributions often present a distinctly non-human “strangeness”—a qualitative mismatch that itself demonstrates why AI cannot serve as a substitute for real people.
Although a July 2025 Stanford study, Social science researchers use AI to simulate human subjects, found that AI can show surprisingly high accuracy in certain simulations (correlation up to 0.85), the authors stressed that without validation against real human data, AI-generated outcomes cannot stand as scientific evidence. And a November 2025 PNAS paper, Counterfeit judgments in large language models, argued that AI’s judgments are “counterfeit”—they mimic the surface form of human evaluation (fluency, formatting) while missing the psychological mechanisms behind human judgment entirely. A May 2025 Carnegie Mellon study, Can Generative AI Replace Humans in Qualitative Research Studies?, put it even more bluntly: “No. The subtle contributions of human participants are fundamentally irreproducible by LLMs.”
In my lecture, I described this approach as “a bit of a joke.” Professor Liang Xingkun at Peking University once noted: “If later experiments cannot replicate earlier ones, that doesn’t mean the earlier experiments were low-quality. It likely means the research population itself is changing.” AI might be able to perfectly learn how people thought 20 years ago and reproduce it consistently every time—but so what? For real-world research, that’s like the old fable of carving a notch in a moving boat to mark where your sword fell overboard.
If you let AI generate data, then use AI to analyze that data, and finally use AI to write the report—you’re not studying human society; you’re studying the probability distribution of a language model. Combined with the model collapse theory mentioned earlier, this kind of closed-loop self-validation is not only academically improper—it actively accelerates the degradation of AI systems.
Therefore, this red line must be drawn clearly: AI cannot serve as research subjects.
If you use AI to generate data, your research is no longer about “human society”—it is about “the probability distribution of a large model.” Using AI for quick exploratory simulations is acceptable; but using AI-generated data as legitimate evidence in a research method is not.
The Green Zone
After talking so much about what AI cannot do, you may wonder: then what can AI do in research?
This takes us back to the conceptual distinction I made at the beginning: “analyzing data” and “generating data” are two different things.
In the field of Computational Social Science (CSS), the use of LLMs to assist with text coding, sentiment analysis, and data cleaning is gradually becoming accepted. As I said in the lecture:
“Traditional data-driven methods—such as linear regression and other classic modeling approaches—are not fundamentally changed by AI. What has changed is that many of the basic, standardized, and mechanically tedious steps that previously required humans to manually encode or operate tools—from data cleaning to modeling, prediction, and producing preliminary standardized reports—can now be done by AI. For these simple, data-driven processes, AI is sometimes even more accurate than humans.”
We should not idealize human researchers—humans can make mistakes too.
In other words, AI can execute existing standardized methods, but it is not itself a new research method. It is a “microscope” in your hand that helps you see patterns in data; it is not a “perpetual motion machine” that creates data out of thin air.
What is the prerequisite for using AI tools in research? Humans must remain in the loop. You must sample-check, you must validate, and you must take responsibility for the results.
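Concretely, “humans in the loop” often means something as mundane as this: let the model label everything, have a human independently code a random subsample, and trust the model’s output only if the two agree. A minimal sketch, with made-up labels purely for illustration:

```python
import random

def spot_check(ai_labels: dict, human_labels: dict, sample_size: int = 5) -> float:
    """Agreement rate between AI labels and human labels on a random subsample."""
    ids = random.sample(sorted(human_labels), k=min(sample_size, len(human_labels)))
    agreed = sum(ai_labels[i] == human_labels[i] for i in ids)
    return agreed / len(ids)

# Hypothetical data: the AI coded every comment; a human independently coded a few.
ai_labels    = {1: "positive", 2: "negative", 3: "neutral",  4: "positive", 5: "negative"}
human_labels = {1: "positive", 2: "negative", 3: "negative", 4: "positive", 5: "negative"}

print(f"agreement on the human-coded subsample: {spot_check(ai_labels, human_labels):.0%}")
# If agreement is low, fix the prompt or the codebook and re-check before trusting
# the AI's labels on the full dataset, and report the check in your methods section.
```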
Guidelines
So how should we control the use of AI in research? Based on the lecture and current policies from major academic publishers, I’ve organized a tiered framework for your reference.
Situations where AI can be used with confidence include code writing and debugging, language polishing, and data format conversion or cleaning. These are productivity-enhancing tasks in which AI acts as a “super engineer” or “language editor.” According to the policies of major publishers such as Elsevier and Springer Nature, such uses only require disclosure in the acknowledgements or endnotes.
Situations requiring human verification include preliminary literature review aggregation, assisted qualitative coding, and brainstorming research hypotheses. AI can speed up these processes, but humans must conduct sampling checks and validation. A critical rule: AI-generated citations must never be used directly—its tendency to fabricate references is alarmingly high. A 2025 policy review, Policy of Academic Journals Towards AI-generated Content, reported that the consensus among major academic publishers is that generative AI tools cannot be listed as authors or co-authors.

Now, here’s the evidence, sources and links included. My explanation sounds solid, and you’re nodding along, right?
Not so fast. The article I just cited isn’t hallucinated (it’s a real reference), but it was written by AI.
This article comes from The AI Scientist project (Project Rachel / Rachel So). “Rachel So” is not a real person but an AI academic identity created by researchers (including teams at Sakana AI), and the project’s purpose is to test whether AI can generate academic papers autonomously. The review I cited is one of its outputs.
But because the paper really exists, if you’re not aware of this background, you might easily include such sources in your own literature review—and even standard link checking might not reveal the issue. If you recently submitted a manuscript without thoroughly reading your sources, you might be sweating right now.
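A minimal first line of defense is to verify that every identifier an AI hands you actually resolves in an external registry, rather than taking the model’s word for it. Here is a rough sketch against Crossref’s public API (the DOI below is only a placeholder, and as the Rachel So example shows, a reference can pass this check and still deserve a careful read):

```python
import requests

def doi_exists(doi: str) -> bool:
    """Return True if the DOI is registered with Crossref."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return resp.status_code == 200

# Placeholder DOI for illustration; substitute the DOIs from your own reference list.
print(doi_exists("10.1000/xyz123"))
```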
Red-line scenarios that must never be touched include letting AI simulate human subjects to fill out surveys, using AI to patch missing data in experiments, or asking AI to write the core argumentative sections of your paper. Such actions constitute data fabrication in most empirical research fields and contribute to “model collapse.”
In late 2025, China’s Ministry of Education Expert Committee on Teacher Development officially released the Guidelines for the Application of Generative Artificial Intelligence by Teachers (Version 1)—the nation’s first AI-use standard specifically aimed at educators. Regarding research, the Guidelines emphasize:
Key components that reflect originality—topic selection, core research design, data interpretation, and argumentation—must be led by the teacher.
It is prohibited to submit or publish as personal academic output any papers, project proposals, or research reports that are directly generated by AI or only minimally modified.
The core spirit of these Guidelines aligns completely with what I emphasized repeatedly during the lecture: AI can be your assistant, but it must never become your ghostwriter.
Summary
Let’s return to the question posed by Professor Li Ying at the beginning of this article: In academic research, does AI count as a research method?
My conclusion is this: AI itself is not an independent methodology, because it lacks determinism and is not responsible for truth. But it is rapidly becoming an indispensable meta-tool across all research methods.
It is like a remarkably capable—but occasionally dishonest—“super intern.” If you treat it as an assistant, it can free you from tedious work; but if you turn it into a ghostwriter and rely on it to replace authentic thinking and field research, then you are not only crossing the red line—you are relinquishing the most precious quality a scholar possesses: intellectual agency.
In my lecture, I said something that can serve as a summary of this issue: “AI is extraordinarily capable, but it does not bear responsibility for its mistakes—it is a super intern. It signs no contracts, assumes no legal liability, and therefore all decision-making risks and responsibility ultimately remain with the human user.”
The value of a tool always depends on the pair of hands using it—hands that must continue to think. Do you agree?
Feel free to share your thoughts in the comments; let’s explore this together.
If you found this article helpful, please consider supporting it.
If you think it might help your friends, please share it with them.