The Surprising Synergy Between Acupuncture and AI

Now that I work in machine learning, I’m often struck by the parallels between this cutting-edge technology and traditional Chinese medicine.

I used to fall asleep at night with needles in my face. One needle shallowly planted in the inner corner of each eyebrow, one per temple, one in the middle of each eyebrow above the pupil, a few by my nose and mouth. I’d wake up hours later, the hair-thin, stainless steel pins having been surreptitiously removed by a parent. Sometimes they’d forget about the treatment, and in the morning we’d search my pillow for needles. My very farsighted left eye gradually became only somewhat farsighted, and my mildly nearsighted right eye eventually achieved a perfect score at the optometrist’s. By the time I was six, my glasses had disappeared from the picture albums.

The story of my recovered eyesight was the first thing I’d think to mention when people found out that my parents are specialists in traditional Chinese medicine (TCM) and asked me what I thought of the practice. It was a concrete and rather miraculous firsthand experience, and I knew what it meant—to begin to see the world more clearly while under my mother and father’s care.

Otherwise, I rarely knew what to say. I would recall hearing TCM mentioned in relation to “poor evidence” or “badly designed studies” and feel challenged to provide some defense for a line of work seen as illegitimate. I would feel a pull of obligation to defend Chinese medicine as a way to protect my parents, their care and toils, but also an urge to resist shouldering that obligation for the sake of someone else’s fleeting curiosity and perhaps entertainment. 

Mostly, I wished I had a better understanding of TCM, even just for myself. Now that I work in machine learning (ML), I’m often struck by the parallels between this cutting-edge technology and the ancient practice of TCM. For one, I can’t quite explain either satisfactorily.

It’s not that there aren’t explanations for how the field of Chinese medicine works. I, and many others, just find the theories dubious. According to both classical and modern theory, blood and qi—pronounced “chi,” variously interpreted to mean something like vapor—move around and regulate the body, which itself is not considered separate from the mind.

Qi flows through channels called meridians. The anatomical charts hanging on the walls of my parents’ clinics feature meridians scoring the body in neat, straight lines—from chest to finger, or from the waist to the inner thigh—overlaid on diagrams of the bones and organs. At various points along these meridians, needles can be inserted to remove blockages, improving the flow of qi. All TCM treatments ultimately revolve around qi: Acupuncture banishes unhealthy qi and circulates healthy qi from the outside; herbal medicines do so from the inside.

On my parents’ charts, the meridians and acupuncture points are depicted like a subway map and seem to float slightly upward, tethered only loosely to the recognizable shapes of intestines and joints underneath. This lack of visual correspondence is reflected in the science; little evidence has been found for the physical existence of meridians, or of qi. Studies have investigated whether meridians are special conduits for electrical signals—but these experiments were badly designed—or whether they are related to fascia, the thin stretchy tissue that surrounds almost all internal body parts. All of this work is recent, and results have been inconclusive.

In contrast, the effectiveness of acupuncture, particularly for ailments like neck disorders and low back pain, is well-supported in modern scientific journals. Insurance companies are convinced; most of my mother’s patients come to her for acupuncture because it’s covered by New Zealand’s national insurance plan.

In other words, acupuncture works, but we aren’t sure why. Langevin and Wayne, both Harvard Medical School researchers, have suggested that although acupuncture has become more empirically legitimized, it is held back by the theory behind it. The idea of qi flow as the essential variable, a body-society whose health depends on the state of its networks, is an elegant but inadequate metaphor.

The gold standard of evidence is the randomized controlled trial (RCT), believed to be the most reliable way to establish whether an intervention causes an outcome. All participants in an RCT are randomly assigned to either the intervention or the control group, and ideally neither the administrators nor the participants know who is getting which treatment. This minimizes biases and allows researchers to establish that the intervention, and only the intervention, has caused the difference in outcome between the two groups.
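The random assignment step can be sketched in a few lines of Python. This is a minimal illustration rather than a trial protocol: the function names `assign_groups` and `mean_difference`, the participant IDs, and the fixed seed are all assumptions made for the example.

```python
import random
from statistics import mean


def assign_groups(participants, seed=0):
    """Randomly split participants into intervention and control arms."""
    rng = random.Random(seed)      # fixed seed only so the sketch is repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_difference(outcomes, intervention, control):
    """Estimate the effect as the gap between the two arms' average outcomes."""
    return (mean(outcomes[p] for p in intervention)
            - mean(outcomes[p] for p in control))


if __name__ == "__main__":
    ids = [f"participant_{i}" for i in range(10)]   # hypothetical IDs
    intervention, control = assign_groups(ids)
    print(len(intervention), len(control))
```

Because chance alone decides who lands in which arm, any systematic gap in outcomes can be attributed to the intervention. Blinding is the part that no code can supply.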

To design an RCT for acupuncture requires answering the crucial question: What is acupuncture?

Is it just a needle puncturing the skin at fixed spots, or is it more than that? If the researcher thinks it’s just skin-puncturing, they can isolate that very specific effect and make everything else exactly the same by using “sham needles” that don’t actually penetrate the skin (or do so only a tiny bit) on the control group. The doctor can also try to isolate the effects by needling only the acupuncture points in the intervention group and needling non-acupuncture points in the control group. This “blinds” the patient and/or the doctor to which intervention is being administered: if there are needles involved in both, it’s harder to tell.

But people can often see the fake needle for what it is: a sham. Perhaps I, the control-group patient, am not fully blinded to the treatment I get because I know what it’s like to receive acupuncture—it’s pretty easy to tell if there is a needle in one’s skin. Likewise, the doctor will probably know what treatment they’re administering because they are familiar with the sensation of inserting a needle.

What counts as acupuncture also varies between traditions. Many Japanese acupuncturists say that even superficial needling, or needling points close to the true acupuncture points, produces an effect. Empirically, the placebo effect from sham acupuncture is higher than that from placebo pills; it’s possible that the sham intervention has some real effect that then becomes nullified as placebo. In one review, the acupuncture trials studied were so varied that the authors felt it was impossible to say whether different sham techniques are associated with specific results. They even said that summarizing all of them as “placebo” seemed “misleading and scientifically unacceptable.”

There’s little evidence so far for the theory underpinning acupuncture, but there is decent empirical evidence for acupuncture itself. This is surprisingly similar to AI. We don’t really understand it, the theory is slim and unsatisfying, but it indisputably “works” in many ways.

When people ask me what artificial intelligence really is, I explain that “AI” usually refers to a specific ML technique called deep learning, which is the practice of making artificial “neural networks” that can solve problems by iteratively improving at a task. Here are a thousand pictures of dogs and not-dogs labeled as such, please figure out what dogs look like. The neural network tries its best to classify a given dog picture, receives feedback on how well it has performed, and is updated accordingly. Trial and error continues with each piece of training data: another dog picture, a picture that’s not of a dog, and so forth. The AI “learns.”

If you look at the code for what the neural network actually is, it’s numbers, called parameters, arranged in matrices. The input, such as a dog picture, is multiplied by these parameters to spit out some well-formed answer: yes, that’s a dog! In large neural networks like GPT-3 or Bloom, there are well over a hundred billion numerical parameters. From this, what can we deduce of the actual mechanics of the rule-set unearthed by the iterative learning loop, the reasoning steps by which the perfected neural network performs its logic? As with the causal pathways through which Chinese medicine might exert its effects, we have no idea.
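To make the trial-and-error loop concrete, here is a minimal sketch in Python of a single-layer classifier, far smaller and simpler than any real neural network. The training data, the two numeric "features" standing in for a picture, and the names `score`, `train`, and `is_dog` are all invented for illustration.

```python
import math

# Made-up training data: (features, label). Label 1 means "dog", 0 means "not a dog".
# The two features are hypothetical numeric summaries of a picture.
DATA = [
    ((0.9, 0.2), 1), ((0.2, 0.9), 0),
    ((0.8, 0.1), 1), ((0.1, 0.8), 0),
    ((0.7, 0.3), 1), ((0.3, 0.7), 0),
]


def score(params, features):
    """Multiply the input by the parameters and squash the result into 0..1."""
    *weights, bias = params
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=1000, learning_rate=0.5):
    """Trial and error: guess, measure the error, nudge every parameter."""
    params = [0.0, 0.0, 0.0]            # two weights and a bias, all just numbers
    for _ in range(epochs):
        for features, label in data:
            error = label - score(params, features)   # feedback on this guess
            for i, x in enumerate(features):
                params[i] += learning_rate * error * x
            params[-1] += learning_rate * error
    return params


def is_dog(params, features):
    """Classify an input as a dog when the score is above one half."""
    return score(params, features) > 0.5


if __name__ == "__main__":
    params = train(DATA)
    print(params)                       # three numbers with no obvious meaning
    print(is_dog(params, (0.85, 0.15)))
```

Even here, the trained model is just three numbers. Nothing in them reads as a rule about dogs, and the opacity only grows as the parameter count climbs into the billions.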

You would be hard-pressed to find a TCM practitioner who thinks that acupuncture is as simple as puncturing a fixed region of skin. Acupuncture is traditionally performed as a complex intervention, specialized to the patient, and considered part of a broader treatment package. Its effects also depend on the practitioner’s needle manipulation techniques and treatment dosage, and there are cultural and geographical differences in administration that are difficult to capture.

The tradition’s philosophical approach is not to treat everyone in a standardized way—and such therapies are difficult for science to pin down. In Chinese medicine, there is a difference between identifying symptoms and arriving at a diagnosis. Diseases are believed to show up differently depending on the person’s body constitution and the environment. And in contrast to Western medicine’s more standardized treatments, symptom differentiation greatly impacts which treatment is chosen. Efforts to faithfully replicate this in a trial while controlling for variations in the prescription and recruiting enough people with the same diagnosis and symptom pattern set would invite mayhem.

All of this becomes simplified to a basic version of the practice in controlled trials. Variations that are difficult to control (such as the specific way the practitioner inserts and twists the needle) are often poorly documented. So, not only are fixed treatment protocols oversimplified, we really don’t know what specifically was administered from reading the research.

Our methods for determining causality—the most important relationship in science—are quite limited. RCTs are useful precisely because they are standardized, replicable, and impersonal. But in attempting to separate, isolate, and control the complexity of human experience—and cut out variation—we end up with a trial that looks very different from the experiences we have with our bodies and in health care. Some types of variation are undesirable noise; others are valuable context and detail not captured by the model. 

The cherry on top: While the blinded RCT attempts to exorcise the subjectivity of the human mind in order to pursue objective truth, Ted Kaptchuk, a placebo researcher at Harvard Medical School, has shown that the RCT apparatus itself can generate sources of potential bias. We painstakingly banish bias, through great sacrifice, but in doing so we introduce new biases.

I wonder if the drug-forward, prescription-driven approach to medicine has developed partly as a result of pills being the easiest kind of intervention to run RCTs with. Techniques that are easy to construct a placebo control for can lay claim to the gold standard test.

The theoretical underpinnings of TCM may be nebulous, but the techniques aim to capture and consider the interconnectedness of the body, mind, and environment, about which we still have much to learn. TCM is flawed and less rigorous than an RCT, but it does try to account for, rather than cut out, the complexity we don’t understand. Given our limited statistical tools, interventions that are too complex to be isolated and mathematically captured are more subject to doubts—but what is the human body if not a supremely complex system?

Unlike RCTs, machine learning models are developed to embrace complexity. Language models like OpenAI’s ChatGPT are trained on massive amounts of data to predict the most likely continuation of a text sequence. A language model will chew up the internet, learning billions of subtle correlations and digesting the cyberspace stew we feed it in enigmatic ways. 
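The idea of predicting the most likely continuation can be shown with a toy stand-in. The Python sketch below counts which word follows which in a tiny made-up corpus and returns the most frequent follower. This is a simple word-pair counter, not how ChatGPT or any neural language model works internally; the corpus and the names `train_bigrams` and `predict_next` are assumptions for the example.

```python
from collections import Counter, defaultdict

# A made-up corpus standing in for "the internet".
CORPUS = "the dog barked . the dog slept . the cat slept ."


def train_bigrams(text):
    """Count, for every word, which words were seen right after it."""
    tokens = text.split()
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following


def predict_next(model, word):
    """Return the most frequently observed continuation, or None if unseen."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    print(predict_next(model, "the"))
```

The counter holds correlations and nothing else: it "knows" that "dog" tends to follow "the" without any notion of what a dog is.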

If you prompt it to say something, it will somehow let out an impressive, strangely specific burp, giving you answers that are plausible, and often even appropriate or correct. For example, if you give ChatGPT the now famous prompt asking for a “Biblical verse explaining how to remove a peanut butter sandwich from a VCR,” it will, having surely never seen this in its training data, execute the request perfectly. It works—but it can’t provide an explanation for the reasoning behind these computed probabilities, much as we can’t provide anatomical correspondence for qi and meridians.

In statistics, there is a known distinction between explainability and prediction, as well as a trade-off: The most accurate explanatory model of a phenomenon is not always the best predictive model. Machine learning is an approach that accepts the predictive power of the Faustian bargain, trading away explainability.

Following a surprisingly similar framework, the herbal or acupuncture prescriptions that my father gives me are based on a holistic evaluation of my personal and environmental data points (e.g. seasonal changes in the weather, my diet, my stress levels), information that I’ve not seen conventional doctors ask for. The prescriptive outputs are also strangely specific yet mysterious in origin, just like language model generations. Dad will prescribe me mixtures of 10 or 15 different herbs, and I’ll ask him how he came up with the formulation, but I won’t understand the explanation—something about qi interacting in various parts of my body. My poor comprehension might be attributed to my limited grasp of TCM or the fact that this system, like ML, isn’t built for easy mechanistic understanding. Then again, the explanation may not actually correspond to whatever mechanisms the herbs act through.

The field of AI used to have easier-to-interpret algorithms, ones that don’t depend on inscrutably entangled learnings. People used to start with logical building blocks to put together more complex systems, assuming humans and machines can reason by following a set of formal rules that specify what they should do in any circumstance. This is now known as “good old-fashioned AI”; by and large, researchers have abandoned this in favor of computational systems that can, through trial and error, cybernetically correct themselves toward the best outcome. We embrace the messy, emergent logic that comes out of this process, rather than building up a modular logic, because we’re won over by the great predictive power of this approach.

We now try to squeeze explanations out of the more complex model, hoping that we can find some understandable structure within. But there’s no reason to expect a simple logic to fall out, and current approaches to “explainable AI” aren’t guaranteed to correspond to the real internal “reasoning” of the model. As with the current theories of acupuncture that Langevin and Wayne criticized, a false theory is worse than no theory.

My father has always seemed unbothered by the lack of anatomical correspondence between meridians and human tissue. Nor is he compelled to figure out how to harmonize the two discordant ontologies into one consistent system. He is happy to accept the coexistence of very different philosophies of health. Back in China, before we immigrated to New Zealand, he did general surgery (i.e. Western medicine) in a hospital in Guangzhou. He would often say that Western and Chinese medical treatments are complementary, useful in different circumstances.

Chinese medicine as an ancient practice evolved millennia before anybody discovered the cell. TCM stumbled into some interesting discoveries; variolation, which is a precursor to vaccination, was practiced as early as the 10th century, when a powder made of smallpox scabs was blown up a statesman’s nose, rewarding him with immunity. According to Chinese medicine historian Paul Unschuld, British and American missionaries brought Western medicine over to China in the 1830s. It took off among locals. Christianity felt incomprehensible to them, but the foreign medical system made sense and seemed useful.

By the early 20th century, people were quarreling over Chinese medicine’s role in China’s future, especially in the face of Western public health and epidemiology. Numerous ministers decided to start abolishing Chinese medicine altogether, to the chagrin of locals. At the same time, journalist James Reston had just discovered and reported on acupuncture for The New York Times, so the Chinese ministers were surprised to find themselves receiving enthusiastic queries about this ancient practice that they saw as boring. In the ’70s, the Chinese government was fielding interest in traditional medicine from other countries as it grappled with the need to modernize its medical system in a way that respected traditional ways of life and accepted remedies.

Thus began a process of standardizing, adapting, and keeping the most rational elements of traditional medicine while making TCM compatible with Western medicine. While Westerners saw this as an alternative to their system, Chinese politicians badly wanted TCM to be valued as part of modern Western science. They tried to get legal approval to export herbal remedies to Europe and praised molecular biology as the future of TCM. They loved the (mis)translation of qi as “energy” because energy sounded scientific.

Their hopes for assimilation are perhaps being fulfilled. In recent years, many centers for “integrative medicine” have opened in the West, aiming to combine mainstream treatments with therapies such as TCM in an evidence-based way. Acupuncture tends to feature prominently, as does the study of the interrelatedness of body phenomena, complex packages of care, and the patient-practitioner relationship.

On a cultural level, the conflict between integrative and mainstream medicine in the US has reportedly decreased over time. David Spiegel, the director of the Stanford Center for Integrative Medicine, said physicians are much less opposed to such therapies now, as long as they feel alternative treatments complement conventional approaches. I do not think this would surprise my father.

Recently, I found a startup that is looking to bring acupuncture to the mass market by training—you guessed it—a machine learning model on pairs of data points that link symptoms and treatments. The ML model then recommends acupuncture points for your symptoms. On their demo website, you can type in your deafness, coughing, painful belching, saliva deficiency, thigh swelling, or any other abnormality that might be tagged in their database of thousands of anonymous case studies. Then you’d check the mandatory “This is not a substitute for a doctor” box and receive one or a few acupuncture points (LU09 for asthma and a cold abdomen), alongside a “confidence level” for the treatment, which reflects how cleanly your symptoms pattern onto the known data points.

The startup’s strategy involves selling not just access to this recommendation system, but also press needles for self-application and, eventually, herbal formulations and massage chairs. Then, people can cheaply and easily apply needles to themselves, without the inconvenience of having to seek a clinic. It removes the patient-practitioner relationship, one of the pesky elements of complexity that RCTs have to contend with. While the website says this is not a substitute for a doctor, functionally, it is.

This approach concedes defeat to the inscrutability of Chinese acupuncture, giving up on attempts at explanation altogether. It does away with nebulous, finicky terms like qi in favor of the theoryless, correlation-driven method of similarity-matching your reported symptoms against other people’s reported treatments.
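One way to picture similarity-matching of this kind is the Python sketch below. It is not the startup’s actual method, which the text does not describe in detail. It assumes that each past case is a set of symptoms paired with a recorded point, and that the “confidence level” is the overlap between your symptoms and the closest case. The case records are made up, and `POINT_A` and `POINT_B` are placeholders rather than real point names.

```python
# Made-up case records: (reported symptoms, recorded acupuncture point).
# "LU09" with asthma and a cold abdomen is the pairing quoted in the text;
# POINT_A and POINT_B are placeholders, not real point names.
CASES = [
    ({"asthma", "cold abdomen"}, "LU09"),
    ({"coughing", "asthma"}, "POINT_A"),
    ({"thigh swelling"}, "POINT_B"),
]


def jaccard(a, b):
    """Overlap between two symptom sets: shared symptoms / all symptoms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(symptoms, cases=CASES):
    """Return the point from the most similar past case and the similarity."""
    symptoms = set(symptoms)
    best_symptoms, best_point = max(
        cases, key=lambda case: jaccard(symptoms, case[0])
    )
    return best_point, jaccard(symptoms, best_symptoms)


if __name__ == "__main__":
    point, confidence = recommend({"asthma", "cold abdomen"})
    print(point, confidence)
```

Nothing in the sketch mentions qi, meridians, or any mechanism at all. The recommendation is only as good as the pattern of past cases it is matched against.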

The data scientists breathe a sigh of relief as they plug their neural networks into the intractable problem. Prediction is so much easier, after all. But the void left behind by missing explanations remains, and it can hardly be filled by anything else.