Curated by Marina Chao, the International Center for Photography’s day-long symposium explored technology’s manifold interactions with image and language
On March 19th, the International Center for Photography (ICP) hosted Seeing Meaning: From Pictographs to AI. A symposium sponsored by the Andy Warhol Foundation for the Visual Arts and broadly organized—as stated in the promotional materials—around the “intersections between image and language,” the event featured an earnestly eclectic roster of speakers eager to probe this liminal space. One played a saxophone. Another read an autofiction-adjacent account of his experience with photography and psychoanalysis. Charles Broskoski—co-founder and CEO of Are.na (an artsy, pseudo-erudite cousin of Pinterest)—live-blogged the event. The program’s curator, Marina Chao, thanked those in attendance for coming out “on a school night.” In the face of a wider, hotly contested societal debate about AI, the day maintained a whimsical, academic vibe.
Yet a sober opening presentation by Fred Ritchin couldn’t help but color the more artistic and theoretical talks that followed. “Exiting the Photographic Universe” was a nihilistic account of photojournalism in the wake of text-to-image AI. Ritchin, dean emeritus of the ICP and a former NYU professor of photography, has for decades focused on documenting human rights abuses. As such, it was disturbing when he claimed that the age of photography was dead. It was even more disturbing when he claimed its death ushered in a world where human rights abuses cannot be reliably documented, and therefore believed. “We’ve overwritten the photograph,” he said. In a world where around 40 percent of people cannot distinguish AI-generated faces from real ones, there will be no more iconic pictures like Tiananmen Square’s “Tank Man.”
To underscore this point, he cited Amnesty International’s use of AI to depict the 2021 protests in Colombia. Even reputable organizations have begun to utilize synthetic images. Adobe—ironically the head of the Content Authenticity Initiative—was caught selling fake depictions of Israel’s bombardment of Gaza. Ritchin, along with Dr. Yotam Ophir, a communication professor at the University at Buffalo who spoke after him, went on to show viral, AI-generated renderings of allegedly dead Israeli and Palestinian children. These images are deeply troubling and, at first glance, realistic. They’ve also been viewed millions of times. With or without a noble cause, anyone can now produce passable, engaging visual evidence in support of their ideas. We now have photos that can “prove” anything, be it for or against Palestine, Trump’s huge Black voter turnout, or Pope Francis’ uncharacteristic ownership of a Moncler-style puffer jacket.
Evidence is a thing of the past when it can be faked.
The public hungrily, uncritically consumes AI images. It seems odd that, amid an incredibly well-documented genocide, these “photos” can still find footing when real photographic evidence of the same subject matter already exists. Something about the generated images, even in the face of documented realities, is enticing.
Maybe it’s the uncannily fluid, almost painted quality the images have. They’re certainly eye-catching. Regardless, Ritchin has been sounding the alarm on doctored images since the ’80s, before Photoshop, much less DALL-E, existed. Dr. Ophir noted that editors with enough skill and time on their hands have always been able to produce believable fakes. Those, too, have been peddled by major legacy news organizations, from Reuters to the LA Times.
Still, there are a few obvious reasons this new crop of image-generating AI is uniquely scary. First, it poisons the well, making any evidence subject to scrutiny. In the public imagination, the camera’s (somewhat) objective lens has been replaced by an infinitely manipulable algorithm. Artificial intelligence also makes doctored images much quicker and easier to produce. But there’s a more subtle conceptual, structural, and linguistic departure that none of the speakers explicitly mentioned.
Diffusion models (the engines behind DALL-E, Stable Diffusion, and most of today’s text-to-image systems) craft their output from Gaussian noise: pure visual nonsense. Over many small increments, this TV-static-esque gibberish becomes clearer, eventually resulting in the classic (almost) photorealistic results. The algorithm takes random pixels (imagine a grayish soup) and tweaks their values, de-noising their aggregate under the guidance of the text prompt. The earlier generation of models, GANs, arrive at a similar place by a different route: during training, the picture volleys back and forth between a generator (which makes and adjusts the pixels) and a discriminator (an algorithm that, having looked at many images, determines whether an image is satisfactory) until the output is deemed realistic and close enough to the prompt.
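The mechanics are simpler than the results suggest. Here is a minimal sketch of the diffusion side of the idea in Python, with a stand-in denoising network in place of the billions of trained parameters a real system carries; the names and shapes are illustrative assumptions, not any particular model’s API:

```python
import torch

# Stand-in for a trained network that predicts the noise present in an
# image, conditioned on an embedding of the text prompt. In a real system
# this is a large U-Net or transformer; here it is a placeholder.
def denoiser(image, step, prompt_embedding):
    return torch.zeros_like(image)

prompt_embedding = torch.randn(1, 512)   # stand-in for a CLIP-style text vector
image = torch.randn(1, 3, 64, 64)        # pure Gaussian noise: the "grayish soup"

steps = 50
for step in reversed(range(steps)):
    # Each increment removes a little predicted noise, nudging
    # TV static toward a picture that matches the prompt.
    predicted_noise = denoiser(image, step, prompt_embedding)
    image = image - predicted_noise / steps

# After the loop, `image` is the model's rendering of the prompt.
```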
This structure means that, in the beginning, there is only the textual prompt and the visual nothingness. Whereas language traditionally seeks to distill experience into a linear representation system, in the generative case, linguistic representation precedes the image. The specter of language, and as a result an authorial viewpoint, lurks behind each of the model’s products. Structured language has subjects, objects, and narrative forces driving it. Syntactically, prompts have clear-cut meaning. They’re spelled out.
“With text-to-image AI, this repression achieves its totalizing aim. Language becomes the basis for pictographic representation.”
The straightforward meaning behind each generated image sat in discordance with the research presented by Dr. Maria Varkanitsa, a linguist and neuroscientist who studies patients suffering from stroke-induced language disorders. In one of her tests, she shows patients a line drawing of a cat and a mouse. There are two ways to describe the image: “the cat is chasing the mouse” or “the mouse is being chased by the cat.” When shown the drawing, healthy minds can produce both linguistic points of view. Ideally, one can extrapolate nuance and subjectivity from situations and from the images used to represent them. Yet in programmatically generated images, the description precedes the representation. An image of a cat chasing a mouse and an image of a mouse being chased by a cat are, to a neural network, different. Encoded at their genesis is an explicit and legible perspective that the prompt—and, as a result, the image—seeks to express. It’s not difficult to reverse-engineer the input from the output.
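That distinctness is measurable. A minimal sketch using the CLIP text encoder from Hugging Face’s transformers library (the kind of encoder that conditions many text-to-image models) shows the two framings landing on different, if neighboring, vectors:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

def embed(sentence):
    tokens = tokenizer(sentence, return_tensors="pt")
    return encoder(**tokens).pooler_output  # one vector per sentence

active = embed("the cat is chasing the mouse")
passive = embed("the mouse is being chased by the cat")

# High similarity, but not 1.0: the two points of view occupy
# distinct positions in the model's representation space.
print(torch.nn.functional.cosine_similarity(active, passive).item())
```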
Images have always framed their subjects, unable to escape their creator’s bias. In 1994’s The Vision Machine, theorist Paul Virilio identifies the flattening of lived experience into a 2D frame as a rhetorical device, smashing together space and time. In his conception of the photograph, there is a totalitarian ambition to repress the invisible, hidden by the borders of the frame. With text-to-image AI, this repression achieves its totalizing aim. Language becomes the basis for pictographic representation. In a significant structural shift, the invisible is no longer merely restrained; it is rendered nonexistent. The generative gaze is myopic and inward. Each image has a singular, god-like speaker. Underneath the image is only one concept, already named.
In the AI images of Gaza that Ritchin and Dr. Ophir referenced, we see children’s exaggerated, pleading eyes. Some have Palestinian flags painted on their bodies. The same is true of their Israeli counterparts. Cloyingly, any messiness or harsh ambiguity is erased. With clear textual backing, the images are, despite their disturbing content, conceptually easy to read. Rather than showing the reality of the horror in Gaza, AI makes a simple, convincing rhetorical case. Compounding this legibility is the dataset on which a model is trained. As a machine raised solely on existing images, its imagination is limited. Algorithms cannot conceptualize novelty (and thus situational nuance) because they have no precedent to draw on. The uniquely disturbing reality found in Gaza collapses into each preceding humanitarian crisis the moment someone feeds DALL-E or the like a written prompt. Traditional photography attempts to capture real, inhabited lives (with their differences, foibles, and unintelligibility), forcing the viewer to confront the human on the other side of the lens. Text-to-image AI, however close its output sits to reality, merely restates a simple prompt, and prompts, limited in length and complexity, can only convey so much.
The same textual forces that make AI photojournalism so reductive make AI art so mushy and unprovocative. Writer and photographer Nicholas Muellner prefaced his presentation by admitting, “Despite holding the urgent concern for AI imagery, I found myself easily bored by the discourses around artificial intelligence, art, and photography.” Later in the day, conceptual artist Chloë Bass staged a piece (“To Quote, To Praise, To Summon”) she originally performed at MoMA last November. When Bass asked for audience engagement, one elderly woman explained that when she first encountered photography years ago, it challenged her, encouraging her to empathize with emotions she’d never felt before. It’s difficult to imagine AI art providing that challenge.
As the symposium’s focus progressed from photojournalism to artistic practice, Chao’s argument became clearer: at their best, when images and language come into contact, they are disruptive forces.
JJJJJerome Ellis, another speaker, manipulates typography to represent his stutter, a challenge to normative language systems. A totalizing force itself, traditional syntax erases his speech patterns. By freeing the letter from the line and repeating characters, he advocates for his stutter’s communicative importance. In The Clearing, a book that transcribes an album of the same name, he writes a poem dedicated to the disfluency of speech, interspersing stanzas with dozens of D’s, repeated like a keyboard smash. A disruption to normative cultural forces, his practice is antithetical to the syntactically fluent, smoothed-over ethos AI models embody.
Entering non-traditional text like JJJJJerome’s into DALL-E creates no meaningful visual accompaniment. It glitches; AI is not designed to create meaning outside existing systems of knowledge. Prompts must be linear and immediately accessible, and his words are not. The model balks at the repeated letters and, unsure what to do, goes the linguistically literalist route, invoking the great unknown: a nondescript, unrelated image of outer space. But, as Ellis says, disruptions open new ways of thinking. They provoke thought.
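The experiment is easy to rerun. A minimal sketch using OpenAI’s Python client follows; the stuttered prompt below is a rough stand-in for Ellis’s typography, not his actual text, and the output will vary from run to run:

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in the environment

# A rough stand-in for Ellis's disfluent typography, not his actual text.
stuttered_prompt = "D d D d D d D d D d D d the clearing D d D d D d D d"

response = client.images.generate(
    model="dall-e-3",
    prompt=stuttered_prompt,
    n=1,
    size="1024x1024",
)

# The model returns *something*: typically a literalist guess with no
# connection to the disfluency that structured the prompt.
print(response.data[0].url)
```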
Many of the other artists shared his experimental, tradition-irreverent ethos. Shannon Ebner presented “HYPER-GRAPHIC-STATES,” showcasing her work recording sounds through experimental mediums: poetry spoken through a trumpet, “sounds” captured in jars, visualizations of hydrophone recordings, and more. Her experiments drew out how our ways of recording shape our interactions with an environment and with others. Finnegan Shannon and Bojana Coklyat looked to HTML image alt-text (an often utilitarian device) as a poetic medium; their work investigated the way technical design affects how people with limited sight experience the internet. Jennifer Daniel, chair of the Unicode Consortium’s Emoji Subcommittee, discussed the modularity of her pictographs and the manifold meanings encoded in their combinations; emoji form a new pictographic language with no fixed rules, and users have become creative in expressing less easily translated ideas. One of her favorites is .
Seeing Meaning attempted to peek through the cracks of graphic representation. In disjoint unions, the limitations of our technologies (whether language or sophisticated digital algorithms) become visible and, as a result, open to interrogation. With AI images, on the other hand, mainly dispensed through corporate programs like DALL-E or Microsoft’s Image Creator, textual prompts, social conditions, and the resulting synthetic output have a perfect 1:1:1 conceptual overlap. Producing ideologically consistent, ultra-decipherable ur-images, these text-based generators cannot produce meaningful critique unless pushed to their breaking point (Jon Rafman’s recent work comes close). What usually comes out is a perfect, consistent reverberation of the user’s input, provided the prompt is simple enough. The machine creates its output from uncritically legible texts that affirm the status quo and the user’s point of view. Seeing Meaning’s lineup had no interest in that.