SG-art: Forced choice over stated preference

Taste isn't a query term.

Role

Design, Engineering, Product

Type

Web App, Agent-Native

The problem

MoMA's catalogue runs to tens of thousands of works. Filter-and-search assumes a visitor arrives with a taxonomy ('abstract expressionism', 'European', 'pre-1950') and ranks works against those stated preferences.

But taste is relational: a property of how a work feels against other works. The right mechanic isn't query refinement. It's forced choice between two works you can see.

The old mechanic

Movement: Abstract Expressionism · Minimalism · Cubism · ...

Era: Pre-1900 · 1900–1950 · 1950–2000 · Post-2000

Region: European · American · Asian · ...

Medium: Oil · Acrylic · Sculpture · Works on Paper

Show results (14,382)

— replaced by —

The new mechanic

Show two things, watch which one gets picked, infer from the pick.

What it does

Six mechanics.

Pair-wise duel interface

Two works on screen, forced choice, no preference sliders.

Six-axis psychological taste profile

Structural Order · Materiality · Abstraction · Arousal · Emotional Locus · Temporal — derived from choice patterns rather than stated preferences.

Personalized museum guide

An editorial narrative grounded in the specific works the quiz surfaced, not generic taste descriptions.

Multi-axis pair selection

The next duel is chosen to separate the visitor on two or more dimensions at once, prioritizing pairs that move axes the system isn't yet confident about.

Anti-Barnum validation

Refuses to ship horoscope-style generic prose; every description must cite specific works from the current duel sequence.

Department-routed curator voice

The narrative is voiced by one of three composite curator personas — Painting & Sculpture, Photography, or Architecture & Design — chosen by the modal department of the visitor's selected works. The profile decides what to say; the artworks decide who says it.

What the AI does

The agent loop is a tool in service of the guide, not the story.

A short sequence of pair-wise duels — ten rounds, twenty works. The visitor isn't asked what they like; they pick between two things they can see. Each choice nudges a profile across six psychological axes.
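The case study doesn't spell out how one pick moves the profile. A minimal sketch, assuming every work carries a 0 to 100 score on each axis and that the profile is a running mean of the chosen works' scores, weighted by how far apart the pair sat on each axis, so a pair that doesn't differ on an axis teaches nothing about it. The axis keys, `recordChoice`, and the weighting rule are hypothetical.

```typescript
// Hypothetical keys for the six axes named in the case study.
const AXES = [
  "structuralOrder",
  "materiality",
  "abstraction",
  "arousal",
  "emotionalLocus",
  "temporal",
] as const;
type Axis = (typeof AXES)[number];
type AxisScores = Record<Axis, number>; // each axis scored 0 to 100

interface Work {
  title: string;
  scores: AxisScores;
}

// Running estimate per axis: a weighted mean plus the total evidence weight behind it.
interface AxisEstimate {
  value: number;
  weight: number;
}
type Profile = Record<Axis, AxisEstimate>;

function emptyProfile(): Profile {
  const profile = {} as Profile;
  for (const axis of AXES) profile[axis] = { value: 50, weight: 0 }; // neutral start, no evidence
  return profile;
}

// Assumed update rule: the chosen work's score counts toward each axis in
// proportion to how strongly the two works contrasted on that axis.
function recordChoice(profile: Profile, chosen: Work, rejected: Work): Profile {
  const next = {} as Profile;
  for (const axis of AXES) {
    const { value, weight } = profile[axis];
    const contrast = Math.abs(chosen.scores[axis] - rejected.scores[axis]) / 100; // 0..1
    const newWeight = weight + contrast;
    next[axis] =
      newWeight === 0
        ? { value, weight } // no contrast and no prior evidence: nothing learned
        : {
            value: (value * weight + chosen.scores[axis] * contrast) / newWeight,
            weight: newWeight,
          };
  }
  return next;
}
```

Under this assumption the accumulated `weight` doubles as a per-axis evidence count, which is the kind of signal a selector would need to find the weakest axes.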

Between rounds, the agent picks the next pair to maximize what's still unknown — works that contrast on the visitor's weakest-signal axes. What comes out is an editorial paragraph with actual titles and artists, written against the profile and grounded in the works that revealed it.

Key design decisions

Five moves.

Five choices that shape the system.

Forced choice over stated preference

Ten rounds of 'which of these two do you prefer?' instead of filter panels and preference sliders. The visitor picks. They don't explain. The next pair shows up.

Filters ask a visitor to declare their taste before they know it. Taste-driven catalogues defeat that contract — you can't filter for a feeling you haven't named. Forced choice inverts the mechanic: show two things, watch which one gets picked, infer from the pick. If the visitor has to name it, the product failed.

Latent axes, not genre tags

Taste is measured on six psychological axes — not on museum-curator categories like 'Impressionism vs. Cubism.'

Materiality axis (0 to 100): Minimalism 28 · AbEx 81 · Δ 53

Genre taxonomies describe how collections are organized, not how people respond to what they see. A visitor who loves restrained geometric work and one who loves maximalist figurative work can both 'like modern art.' The axis system separates them with arithmetic: Abstract Expressionism clusters at Materiality 81 versus Minimalism's 28 — a 53-point gap on a single axis that no genre label captures. Taste lives below the genre layer; the axes are how you measure what's down there.
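The arithmetic in that paragraph can be sketched as: average each movement's works into a centroid, subtract centroids axis by axis, and report the axis with the widest gap. This assumes centroids are plain means; `centroid`, `axisGaps`, and `widestGap` are illustrative names, and only the Materiality figures (81 and 28) come from the text.

```typescript
const AXES = [
  "structuralOrder",
  "materiality",
  "abstraction",
  "arousal",
  "emotionalLocus",
  "temporal",
] as const;
type Axis = (typeof AXES)[number];
type AxisScores = Record<Axis, number>;

// Mean score per axis across a movement's works (assumed meaning of "clusters at").
function centroid(works: AxisScores[]): AxisScores {
  if (works.length === 0) throw new Error("empty movement");
  const out = {} as AxisScores;
  for (const axis of AXES) {
    out[axis] = works.reduce((sum, w) => sum + w[axis], 0) / works.length;
  }
  return out;
}

// Absolute gap on every axis between two centroids.
function axisGaps(a: AxisScores, b: AxisScores): AxisScores {
  const out = {} as AxisScores;
  for (const axis of AXES) out[axis] = Math.abs(a[axis] - b[axis]);
  return out;
}

// The single axis that best separates two movements.
function widestGap(a: AxisScores, b: AxisScores): { axis: Axis; gap: number } {
  const gaps = axisGaps(a, b);
  let best: Axis = AXES[0];
  for (const axis of AXES) if (gaps[axis] > gaps[best]) best = axis;
  return { axis: best, gap: gaps[best] };
}

// Illustrative centroids: only the Materiality values come from the case study.
const neutral: AxisScores = {
  structuralOrder: 50, materiality: 50, abstraction: 50, arousal: 50, emotionalLocus: 50, temporal: 50,
};
const abEx = { ...neutral, materiality: 81 };
const minimalism = { ...neutral, materiality: 28 };
console.log(widestGap(abEx, minimalism)); // axis "materiality", gap 53
```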

Pairs that do double duty

The pair optimizer prefers pairs that contrast on two or more weak axes at once. Ten rounds isn't many rounds against six axes; the system can't afford single-purpose pairs.

Example pair: Structural Order Δ 70 (targeted) · Arousal Δ 74 (free signal)

Ten rounds against six axes is too thin for single-axis sharpening to converge. A 'deep-and-structured' pair carries information about Structural Order and Abstraction Tolerance, not just whichever axis the selector was targeting. Multi-axis scoring fixes the arithmetic: pairs that retire two axes in one round beat pairs that only move one.

Specific enough to be wrong

Every taste description has to reference at least one specific artwork the visitor actually saw in their duel sequence — by title, not genre.

REJECTED

"You appreciate art that balances form and feeling."

SHIPPED

"Your pull toward Rothko's No. 10 over Mondrian's grid placed you 81 on Materiality — you read color fields as weather, not diagrams."

Taste descriptions are the format most prone to Barnum output, prose so vague anyone could accept it as their own. The specificity gate is cheap: require the description to cite works from the current session's diagnostic pairs, the works this particular visitor actually chose between. A description that names Rothko's No. 10, when No. 10 was the pair that revealed the visitor's Materiality leaning, is falsifiable: specific enough to be wrong. Generic descriptions aren't wrong; they're just useless.
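A minimal sketch of the gate, assuming the session keeps a list of the works shown in its duels and that a citation means the title appears in the text. `SessionWork`, `citedTitles`, and `passesSpecificityGate` are hypothetical names, not the shipped code.

```typescript
interface SessionWork {
  title: string;
  artist: string;
}

// Titles from the visitor's own duel sequence that appear in the description.
// Case-insensitive substring match: a deliberately naive assumption.
function citedTitles(description: string, sessionWorks: SessionWork[]): string[] {
  const text = description.toLowerCase();
  return sessionWorks
    .filter((work) => text.includes(work.title.toLowerCase()))
    .map((work) => work.title);
}

// The gate: at least one cited title, or the description is rejected as Barnum prose.
function passesSpecificityGate(description: string, sessionWorks: SessionWork[]): boolean {
  return citedTitles(description, sessionWorks).length > 0;
}

const session: SessionWork[] = [{ title: "No. 10", artist: "Mark Rothko" }];
passesSpecificityGate("You appreciate art that balances form and feeling.", session); // false
passesSpecificityGate("Your pull toward Rothko's No. 10 over Mondrian's grid placed you 81 on Materiality.", session); // true
```

A substring check would also accept "No. 100" for "No. 10", so a real gate would want stricter matching than this sketch.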

Voice routes to the work

The narrative isn't voiced by a single house writer. It's voiced by one of three composite curator personas — Painting & Sculpture, Photography, or Architecture & Design — chosen by the modal department of the visitor's selected works.

P&S · Painting & Sculpture: sustained looking

Photography: conditions of the image

A&D · Architecture & Design: function before meaning

A guide built around chairs and posters shouldn't sound like a guide built around oil paintings. Painting curators reach for sustained looking and the painter's intent; design curators talk about how a thing works before what it means; photography curators talk about the conditions of the image. Voicing every visitor's guide in the same editorial register collapses that — every output sounds like the same magazine. Routing voice to the modal department keeps the prose honest to the work the visitor actually pulled toward.
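Routing on the modal department can be sketched in a few lines. This assumes every pick is tagged with one of the three desks' departments; the type names, `routeVoice`, and the tie-break rule (first department seen wins) are hypothetical, since the case study doesn't say how ties are handled.

```typescript
type Department = "Painting & Sculpture" | "Photography" | "Architecture & Design";

interface Pick {
  title: string;
  department: Department;
}

// One-line brief per composite persona, taken from the desk descriptions above.
const DESK_BRIEF: Record<Department, string> = {
  "Painting & Sculpture": "sustained looking",
  Photography: "conditions of the image",
  "Architecture & Design": "function before meaning",
};

// Most frequent department among the picks. Ties keep the department that
// appeared first in the pick sequence (assumed rule).
function modalDepartment(picks: Pick[]): Department {
  if (picks.length === 0) throw new Error("no picks to route on");
  const counts = new Map<Department, number>(); // Map iterates in first-insertion order
  for (const p of picks) counts.set(p.department, (counts.get(p.department) ?? 0) + 1);
  let best: Department = picks[0].department;
  counts.forEach((n, dept) => {
    if (n > (counts.get(best) ?? 0)) best = dept; // strictly greater, so ties keep the earlier desk
  });
  return best;
}

function routeVoice(picks: Pick[]): { desk: Department; brief: string } {
  const desk = modalDepartment(picks);
  return { desk, brief: DESK_BRIEF[desk] };
}
```

Note that `routeVoice` never reads the taste profile, which keeps the choice of voice independent of the six-axis model.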

A builder's account

How the axis system broke, and what fixed it.

Adding dimensions to a taste model can break the elicitation system that feeds it. Taste elicitation is a chain, not a stack.

The pool itself is stratified by movement cluster, so the separations the axes find aren't artifacts of which works happened to surface together.
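Stratifying the pool can be sketched as a per-cluster draw: group works by movement cluster, shuffle within each group, and take the same number from each. The `cluster` field, `perCluster`, and the injectable `rng` are assumptions for illustration; the case study doesn't say how many works each cluster contributes.

```typescript
interface PoolWork {
  title: string;
  cluster: string; // movement cluster label, e.g. "Cubism"
}

// Fisher-Yates shuffle on a copy; rng is injectable so runs can be reproduced.
function shuffled<T>(items: T[], rng: () => number): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Draw up to perCluster works from every cluster, so no movement dominates the pool.
function stratifiedPool(
  works: PoolWork[],
  perCluster: number,
  rng: () => number = Math.random,
): PoolWork[] {
  const byCluster = new Map<string, PoolWork[]>();
  for (const w of works) {
    const group = byCluster.get(w.cluster);
    if (group) group.push(w);
    else byCluster.set(w.cluster, [w]);
  }
  const pool: PoolWork[] = [];
  byCluster.forEach((group) => pool.push(...shuffled(group, rng).slice(0, perCluster)));
  return pool;
}
```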

The first assumption was that four psychological axes — Abstraction, Arousal, Emotional Locus, and Temporal Orientation — would carry the profile. They drew on five of the fifteen visual dimensions I scored every artwork on; the other ten felt descriptive, not diagnostic.

Then I ran the axis system against the MoMA collection baseline and Cubism came out flat: a spread of just 6 points across the four axes, functionally indistinguishable from a random visitor. Its two strongest discriminators against baseline (geometric_vs_organic +8.9, spatial_depth −7.6) fed no psychological axis.

Constructivism and Surrealism had the same failure. A Cubism-lover would have walked out matched to the wrong movement, and no amount of duel data would have surfaced the bug.

I added two new axes: Structural Order and Materiality. Twelve of fifteen visual dimensions got promoted into the taste layer. Cubism now separates at Structural Order 59 versus Expressionism's 37.
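The failure has simple arithmetic behind it if each psychological axis is read as a weighted combination of visual dimensions. A sketch, assuming that linear form: project a movement's per-dimension deviation from the collection baseline onto each axis. Only `geometric_vs_organic`, `spatial_depth`, and Cubism's +8.9 / −7.6 come from the text; every weight and every other dimension name is hypothetical. Because the toy deviation holds only those two dimensions, the four-axis result is exactly zero rather than the small spread the real run showed.

```typescript
type DimensionDeviation = Record<string, number>; // movement mean minus collection baseline
type AxisWeights = Record<string, Record<string, number>>; // axis -> dimension -> weight

// Weighted average of the dimension deviations that feed each axis.
function projectOntoAxes(deviation: DimensionDeviation, weights: AxisWeights): Record<string, number> {
  const out: Record<string, number> = {};
  for (const axis of Object.keys(weights)) {
    let sum = 0;
    let totalWeight = 0;
    for (const dim of Object.keys(weights[axis])) {
      const w = weights[axis][dim];
      sum += w * (deviation[dim] ?? 0); // dimensions with no deviation contribute nothing
      totalWeight += Math.abs(w);
    }
    out[axis] = totalWeight === 0 ? 0 : sum / totalWeight;
  }
  return out;
}

// Cubism's two strongest discriminators against baseline, from the case study.
const cubism: DimensionDeviation = { geometric_vs_organic: 8.9, spatial_depth: -7.6 };

// Hypothetical four-axis wiring (five dimensions): none reads the two above.
const fourAxis: AxisWeights = {
  abstraction: { figuration: 1 },
  arousal: { color_intensity: 1, gestural_energy: 1 },
  emotionalLocus: { inward_vs_outward: 1 },
  temporal: { historical_reference: 1 },
};

// Hypothetical six-axis wiring: Structural Order now reads both dimensions
// (the negative weight treats flatter space as more ordered).
const sixAxis: AxisWeights = {
  ...fourAxis,
  structuralOrder: { geometric_vs_organic: 1, spatial_depth: -1 },
  materiality: { surface_texture: 1 },
};

console.log(projectOntoAxes(cubism, fourAxis)); // every axis 0: Cubism looks like the baseline
console.log(projectOntoAxes(cubism, sixAxis).structuralOrder); // 8.25
```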

Cubism vs. baseline

Signal restored. Structural Order pulls Cubism away from Expressionism by 22 points. Temporal separates further.

Movements separate on the new axes

Structural Order: Expressionism 37 · Cubism 59 · Δ 22

Materiality: Minimalism 28 · AbEx 81 · Δ 53

Temporal: Impressionism 20 · Minimalism 68 · Δ 48

The fix broke the next thing

Six axes across ten rounds was 1.67 pairs per axis — too thin for the original sufficiency model to trust.

The original pair-selection mechanic treated a deep-and-structured pair as information about Structural Order only, ignoring the simultaneous signal on Abstraction Tolerance.

I rewrote the pair optimizer to prefer pairs that clear the delta threshold on two or more weak axes at once — thresholdClearCount × MULTI_AXIS_BONUS + avgWeightedDelta — with single-axis sharpening kept as a named fallback.
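The scoring expression and its fallback can be sketched as below. The source gives the formula and names the fallback but not the constants or the weighting, so `DELTA_THRESHOLD`, the value of `MULTI_AXIS_BONUS`, the per-axis weights (read: how unconverged each weak axis is), and the function names are assumptions.

```typescript
type AxisScores = Record<string, number>; // axis -> 0..100

interface CandidatePair {
  left: AxisScores;
  right: AxisScores;
}

// Hypothetical constants; the case study names MULTI_AXIS_BONUS but not its value.
const MULTI_AXIS_BONUS = 100;
const DELTA_THRESHOLD = 30;

// weakAxes: axis -> weight, higher meaning less converged (assumed shape).
function scorePair(pair: CandidatePair, weakAxes: Record<string, number>): { score: number; cleared: number } {
  const axes = Object.keys(weakAxes);
  let thresholdClearCount = 0;
  let weightedDeltaSum = 0;
  for (const axis of axes) {
    const delta = Math.abs(pair.left[axis] - pair.right[axis]);
    if (delta >= DELTA_THRESHOLD) thresholdClearCount++;
    weightedDeltaSum += weakAxes[axis] * delta;
  }
  const avgWeightedDelta = axes.length === 0 ? 0 : weightedDeltaSum / axes.length;
  return { score: thresholdClearCount * MULTI_AXIS_BONUS + avgWeightedDelta, cleared: thresholdClearCount };
}

type Selection = { index: number; mode: "multiAxis" | "singleAxisSharpening" };

function selectPair(candidates: CandidatePair[], weakAxes: Record<string, number>): Selection {
  if (candidates.length === 0) throw new Error("no candidate pairs");
  // Preferred path: best score among pairs that clear two or more weak axes.
  let best = -1;
  let bestScore = -Infinity;
  for (let i = 0; i < candidates.length; i++) {
    const { score, cleared } = scorePair(candidates[i], weakAxes);
    if (cleared >= 2 && score > bestScore) {
      best = i;
      bestScore = score;
    }
  }
  if (best >= 0) return { index: best, mode: "multiAxis" };

  // Named fallback: widest contrast on the single weakest axis.
  const axes = Object.keys(weakAxes);
  if (axes.length === 0) return { index: 0, mode: "singleAxisSharpening" };
  const weakest = axes.reduce((a, b) => (weakAxes[b] > weakAxes[a] ? b : a));
  const gap = (p: CandidatePair) => Math.abs(p.left[weakest] - p.right[weakest]);
  let fallback = 0;
  for (let i = 1; i < candidates.length; i++) {
    if (gap(candidates[i]) > gap(candidates[fallback])) fallback = i;
  }
  return { index: fallback, mode: "singleAxisSharpening" };
}
```

With Structural Order and Arousal both weak, a pair like the Δ 70 / Δ 74 example earlier clears two thresholds and collects the bonus twice, so it outranks a pair with a larger gap on Structural Order alone.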

The two revisions only work together: the richer profile needs pairs that do double duty, and pairs only do double duty because the profile got richer.

And then the voice was wrong

Six axes and a multi-axis selector produced clean profiles, but every visitor's guide arrived in the same editorial register — patient, painterly, comfortable with sustained looking. That voice fits a Rothko walkthrough. It does not fit a tour of Eames chairs and Castiglioni stools. A&D-modal visitors got prose that treated their picks the way the painting desk treats picks, and the mismatch read as the system not actually understanding what they had pulled toward. The fix routes voice to the modal department — three composite curator personas, chosen by the works the visitor's choices surfaced. The profile decides what to say. The artworks decide who says it.

The ten-round budget bothered me afterward. Visitors who weren't done at round 10 were getting cut off, and the binary sufficiency gate handed them a pass-or-fail verdict on a profile that was still in motion. I replaced sufficiency with a confidence-weighted distance: every axis contributes in proportion to how converged it is, not according to whether it crossed a threshold. The ten-round budget became a research instrument: visitors can continue past round 10, and the system measures whether the profile keeps moving or has actually converged.
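A sketch of the difference, assuming the distance is a root-mean-square gap between the visitor's profile and some reference profile (a movement centroid, say) and that each axis carries a confidence in [0, 1]. The case study doesn't define either term precisely; `sufficiencyDistance`, `confidenceWeightedDistance`, and the 0.6 threshold below are hypothetical.

```typescript
type AxisScores = Record<string, number>;

// Old style (assumed): an axis counts fully once its confidence clears the threshold, otherwise not at all.
function sufficiencyDistance(
  visitor: AxisScores,
  reference: AxisScores,
  confidence: AxisScores,
  threshold: number,
): number {
  const gated: AxisScores = {};
  for (const axis of Object.keys(confidence)) gated[axis] = confidence[axis] >= threshold ? 1 : 0;
  return confidenceWeightedDistance(visitor, reference, gated);
}

// New style: every axis contributes in proportion to its confidence.
function confidenceWeightedDistance(visitor: AxisScores, reference: AxisScores, confidence: AxisScores): number {
  let weightedSq = 0;
  let totalWeight = 0;
  for (const axis of Object.keys(confidence)) {
    const w = Math.min(1, Math.max(0, confidence[axis])); // clamp to [0, 1]
    const gap = visitor[axis] - reference[axis];
    weightedSq += w * gap * gap;
    totalWeight += w;
  }
  return totalWeight === 0 ? NaN : Math.sqrt(weightedSq / totalWeight); // NaN: no usable evidence yet
}

const visitor = { structuralOrder: 60, materiality: 40 };
const reference = { structuralOrder: 50, materiality: 40 };
const confidence = { structuralOrder: 0.5, materiality: 1 };
sufficiencyDistance(visitor, reference, confidence, 0.6); // 0: the half-converged axis is ignored entirely
confidenceWeightedDistance(visitor, reference, confidence); // about 5.77: it still counts, at half weight
```

The same function could compare a visitor's round-10 profile with a later one to ask whether it is still moving; that use is likewise an assumption.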

What ships, what doesn't

V1 ships behind a leakage harness: 20 fixture runs scored against a 14-phrase ban list, latest run 2/20 hits — both single-instance ("a testament to", "speaks to"), both flagged for V1.1 directive tightening.
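The harness's counting step can be sketched as a phrase scan over fixture outputs. Only two of the fourteen banned phrases are named in the case study, so the list stops at those two. `scanOutput`, `runHarness`, the word-boundary matching, and reading "2/20" as runs with at least one hit over runs scored are all assumptions.

```typescript
// The two banned phrases the case study names; the other twelve aren't listed there.
const BAN_LIST = ["a testament to", "speaks to"];

interface Hit {
  run: number;
  phrase: string;
  count: number;
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Count whole-phrase, case-insensitive occurrences of each banned phrase in one output.
function scanOutput(text: string, banList: string[]): { phrase: string; count: number }[] {
  const found: { phrase: string; count: number }[] = [];
  for (const phrase of banList) {
    const re = new RegExp(`\\b${escapeRegExp(phrase)}\\b`, "gi");
    const count = (text.match(re) || []).length;
    if (count > 0) found.push({ phrase, count });
  }
  return found;
}

// Score every fixture run; summary is "runs with hits / total runs".
function runHarness(outputs: string[], banList: string[] = BAN_LIST): { hits: Hit[]; summary: string } {
  const hits: Hit[] = [];
  const runsWithHits = new Set<number>();
  outputs.forEach((text, run) => {
    for (const h of scanOutput(text, banList)) {
      hits.push({ run, ...h });
      runsWithHits.add(run);
    }
  });
  return { hits, summary: `${runsWithHits.size}/${outputs.length}` };
}
```

The word boundaries keep "speaks today" from counting as a hit on "speaks to".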

“Shipping a voice system without a way to measure when it drifts is shipping a feeling, not a system.”

— Voice harness

What happened

Same six-axis profile. Two visitors. Two desks.

The visitor walks out with a personalized museum guide — an editorial paragraph with specific artists and works, voiced by the desk that matches the modal department of their picks.

Your guide · session #1147

10 / 10 rounds

P&S

voiced by the Painting & Sculpture desk

Your picks pulled toward the quieter rooms, so I'd start with Agnes Martin in Gallery 402 and stand with one painting for at least five minutes. The grids look like nothing for the first thirty seconds; that's the work. Skip the Rothko antechamber — your Materiality reading (38) says the color fields will read as weather you don't want to stand in. Judd's stacks across the hall will reward you. They read as cold from across the room and warm at arm's length.

Profile axes: Structure · Material · Abstract · Arousal · Locus · Temporal

Your guide · session #2103

10 / 10 rounds

A&D

voiced by the Architecture & Design desk

Your picks were almost all from A&D, so go to the third floor first. The Eames LCW: pick it up with one hand; the plywood is bent in two directions, which sounds easy and isn't. Then the Castiglioni Mezzadro: the seat is a stamped-steel tractor seat, mounted on a single curved rod. Your Structural Order reading (78) means you'll like that the wing bolt isn't decorative; you can disassemble the whole thing in eight seconds. Skip the textile gallery this trip; your Materiality reading (38) says the surface argument won't carry.

Profile axes: Structure · Material · Abstract · Arousal · Locus · Temporal

Same six-axis profile in both cards

Same profile, different modal department. The system routes voice — patience and "stand with it" for the painting room, function-first analysis for the design floor — without changing the underlying taste model.

The guide runs on the same agent architecture as SG Resale, repointed at editorial narrative instead of product listings — and routed through one of three composite curator personas chosen by the modal department of the visitor's picks.

Collection data: The Museum of Modern Art