
SG-art: Forced choice over stated preference

A pair-wise taste quiz over MoMA's collection

Web App · Design, Engineering, Product

The Problem

MoMA's collection runs to tens of thousands of works. Filter-and-search UX assumes a visitor arrives with a taxonomy already in mind ('abstract expressionism', 'European', 'pre-1950') and ranks works against those stated preferences. But taste isn't a query term. It's relational: a property of how a work feels against other works. For a catalogue this size, the right mechanic isn't query refinement. It's forced choice between two works you can see.

What It Does

1. Pair-wise duel interface: two works on screen, forced choice, no preference sliders
2. Six-axis psychological taste profile (Structural Order, Materiality, Abstraction, Arousal, Emotional Locus, Temporal), derived from choice patterns rather than stated preferences
3. Personalized museum guide: an editorial narrative grounded in the specific works the quiz surfaced, not generic taste descriptions
4. Multi-axis pair selection: the next duel is chosen to separate the visitor on the dimensions where the system still has the weakest signal about their taste
5. Anti-Barnum validation on taste descriptions: refuses to ship horoscope-style generic prose; every description must cite specific works from the current duel sequence

What the AI Does

The quiz runs a short sequence of pair-wise duels — ten rounds, twenty works. The visitor isn't asked what they like; they're asked which they prefer, right now, between two things they can see. Each choice nudges a taste profile across six psychological axes. The agent's job between rounds is to pick the next pair to maximize what the system learns — choosing works that contrast on the visitor's weakest-signal axes so ten rounds produce a converged profile rather than a noisy one.
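One way the per-choice nudge could work, sketched in TypeScript under stated assumptions: each work carries a score from 0 to 100 on each of the six axes, and a pick only counts as evidence on axes where the two works differ by more than a threshold. The axis keys, recordChoice, LEARNING_RATE, and DELTA_THRESHOLD are hypothetical names and values, not SG-art's code.

```typescript
// Hypothetical keys for the six axes named in the write-up.
const AXES = [
  "structuralOrder",
  "materiality",
  "abstraction",
  "arousal",
  "emotionalLocus",
  "temporal",
] as const;

type Axis = (typeof AXES)[number];
type AxisScores = Record<Axis, number>; // assumed 0 to 100 per axis

interface TasteProfile {
  estimate: AxisScores; // current best guess on each axis
  evidence: Record<Axis, number>; // informative duels seen on each axis
}

// Assumed tuning constants, not SG-art's values.
const LEARNING_RATE = 0.3;
const DELTA_THRESHOLD = 10;

// Start every axis at the midpoint with no evidence.
function emptyProfile(): TasteProfile {
  const estimate = {} as AxisScores;
  const evidence = {} as Record<Axis, number>;
  for (const axis of AXES) {
    estimate[axis] = 50;
    evidence[axis] = 0;
  }
  return { estimate, evidence };
}

// Move the estimate toward the chosen work on each axis where the two works
// differ enough for the pick to carry information; leave other axes alone.
function recordChoice(
  profile: TasteProfile,
  chosen: AxisScores,
  rejected: AxisScores,
): TasteProfile {
  const estimate = { ...profile.estimate };
  const evidence = { ...profile.evidence };
  for (const axis of AXES) {
    if (Math.abs(chosen[axis] - rejected[axis]) < DELTA_THRESHOLD) continue;
    estimate[axis] += LEARNING_RATE * (chosen[axis] - estimate[axis]);
    evidence[axis] += 1;
  }
  return { estimate, evidence };
}
```

The threshold is what keeps the profile honest: if both works sit near the middle of Arousal, the pick says nothing about Arousal and is not counted as if it did.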

What comes out is a personalized museum guide: an editorial paragraph written against the profile and grounded in the specific works that revealed it. Not 'you like bold, expressive art' — a sentence with actual titles and artists in it, the ones the visitor's choices pulled toward. The agent loop is a tool in service of that output, not the story.

Key Design Decisions

Forced choice over stated preference

Ten rounds of 'which of these two do you prefer?' instead of filter panels and preference sliders. The visitor picks. They don't explain. The next pair shows up.

Filters ask a visitor to declare their taste before they know it. Taste-driven catalogues defeat that contract — you can't filter for a feeling you haven't named. Forced choice inverts the mechanic: show two things, watch which one gets picked, infer from the pick. If the visitor has to name it, the product failed.

Latent axes, not genre tags

Taste is measured on six psychological axes — Structural Order, Materiality, Abstraction, Arousal, Emotional Locus, Temporal — not on museum-curator categories like "Impressionism vs. Cubism" or "pre-1950 vs. post-1950." The axes are properties of how a work feels, not what bucket it belongs to.

Genre taxonomies describe how collections are organized, not how people respond to what they see. A visitor who loves restrained geometric work and one who loves maximalist figurative work can both 'like modern art.' The axis system separates them with arithmetic: Abstract Expressionism clusters at Materiality 58.1 versus Minimalism's 39.7 — an 18-point gap on a single axis that no genre label captures. Taste lives below the genre layer; the axes are how you measure what's down there.

Pairs that do double duty

The pair optimizer doesn't just pick the sharpest contrast on the visitor's weakest axis — it prefers pairs that contrast on two or more weak axes at once. Ten rounds isn't many rounds against six axes; the system can't afford single-purpose pairs. Every duel should narrow the profile along multiple dimensions simultaneously.

Budget thinking: six axes and ten pairs give 1.67 pairs per axis on average, well below the sufficiency threshold of three at which a profile becomes trustworthy. Single-axis sharpening was myopic. A 'deep-and-structured' pair carries information about Structural Order *and* Abstraction Tolerance, not just whichever axis the selector was targeting. Multi-axis scoring fixes the arithmetic: a pair that moves two axes toward sufficiency in one round beats a pair that moves only one.
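The budget arithmetic, worked through under one simplifying assumption: a pair counts as one observation on every axis where it clears the delta threshold, and observations spread evenly across axes. The figure of three per axis is the sufficiency threshold the matching system uses.

```latex
\begin{aligned}
\frac{10 \text{ pairs}}{6 \text{ axes}} &\approx 1.67 \text{ pairs per axis} < 3 \\
6 \text{ axes} \times 3 &= 18 \text{ axis-observations needed} \\
10 \text{ single-axis pairs} \times 1 &= 10 < 18 \\
10 \text{ two-axis pairs} \times 2 &= 20 \geq 18
\end{aligned}
```

If no pair counts on more than two axes, reaching 18 in ten rounds requires at least eight two-axis pairs (8 × 2 + 2 × 1 = 18), which is why single-purpose pairs are unaffordable.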

Specific enough to be wrong

Every taste description has to reference at least one specific artwork the visitor actually saw in their duel sequence — by title, not genre. If the LLM tries to ship prose like 'you appreciate art that balances form and feeling', the validator rejects it and asks for a rewrite grounded in an actual work from the session.

Taste descriptions are the format most prone to Barnum output: prose so vague anyone could accept it as their own. The specificity gate is cheap: require the description to cite works from the diagnostic pairs of the current session, which ties it to this visitor's choices rather than to visitors in general. A description that names Rothko's No. 10, when No. 10 was in the pair that revealed the visitor's Materiality leaning, is falsifiable: specific enough to be wrong. Generic descriptions aren't wrong; they're just useless.
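A minimal sketch of the specificity gate, assuming it receives the titles of the works from the session's diagnostic pairs and does a normalized substring match. The function name specificityGate and its return shape are hypothetical; SG-art's validator may match differently.

```typescript
// Result of the hypothetical specificity gate.
interface GateResult {
  ok: boolean; // true if at least one session work is cited
  cited: string[]; // session titles found in the description
}

// Lowercase, drop straight and curly quotes, collapse whitespace.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/['"\u2018\u2019\u201C\u201D]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Reject descriptions that name no work from this session's diagnostic pairs.
// Only session titles count; genre words are not checked at all.
function specificityGate(description: string, diagnosticTitles: string[]): GateResult {
  const text = normalize(description);
  const cited = diagnosticTitles.filter((title) => {
    const t = normalize(title);
    return t.length > 0 && text.includes(t); // ignore empty titles
  });
  return { ok: cited.length > 0, cited };
}
```

Plain substring matching is deliberately naive: a short title such as No. 10 would also match inside No. 100, and common titles like Untitled would pass too easily, so a production gate would want word-boundary matching and probably artist names as well.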

How the axis system broke, and what fixed it

Adding dimensions to a taste model can break the elicitation system that feeds it — taste elicitation is a chain, not a stack. This is a builder's account, not a user study; the evidence below is the axis data SG-art produced against known-answer inputs, and the two revisions that followed from it.

The first assumption was that four psychological axes (Abstraction, Arousal, Emotional Locus, and Temporal Orientation) would carry the profile. They drew on five of the fifteen visual dimensions I scored every artwork on; the other ten felt descriptive, not diagnostic. Then I ran the axis system against the MoMA collection baseline, and Cubism came out flat: a spread of just 6 points across the four axes, functionally indistinguishable from a random visitor. Its two strongest discriminators against baseline, geometric_vs_organic (+8.9) and spatial_depth (−7.6), fed no psychological axis. Constructivism and Surrealism failed the same way. A Cubism-lover would have walked out matched to the wrong movement, and no amount of duel data would have surfaced the bug.
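The Cubism failure is the kind of bug a coverage check can catch before any visitor plays. A minimal sketch, assuming per-movement and collection-wide means for each visual dimension and a hypothetical mapping from psychological axes to the dimensions they read: rank a movement's dimensions by distance from baseline and flag any strong discriminator that no axis consumes. orphanedDiscriminators and the mapping shape are illustrative names, not SG-art's code.

```typescript
// Hypothetical: psychological axis name -> visual dimensions it reads.
type AxisMapping = Record<string, string[]>;

interface Discriminator {
  dimension: string;
  delta: number; // movement mean minus collection baseline
  orphaned: boolean; // true if no axis reads this dimension
}

// Return a movement's top-K dimensions by |delta| from baseline,
// flagging the ones that feed no psychological axis.
function orphanedDiscriminators(
  movementMeans: Record<string, number>,
  baselineMeans: Record<string, number>,
  mapping: AxisMapping,
  topK = 3,
): Discriminator[] {
  const fed = new Set<string>();
  for (const dims of Object.values(mapping)) {
    for (const dim of dims) fed.add(dim);
  }
  return Object.keys(movementMeans)
    .filter((dim) => dim in baselineMeans)
    .map((dim) => ({
      dimension: dim,
      delta: movementMeans[dim] - baselineMeans[dim],
      orphaned: !fed.has(dim),
    }))
    .sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta))
    .slice(0, topK);
}
```

Fed Cubism's means and the original five-dimension mapping, a check like this should put geometric_vs_organic and spatial_depth at the top with orphaned set to true, and it needs no duel data to do so.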

I added two new axes: Structural Order and Materiality. Twelve of fifteen visual dimensions got promoted into the taste layer. Cubism now separates at Structural Order 54.6 versus Expressionism's 41.6.

That fix broke the next thing. Six axes across ten rounds is 1.67 pairs per axis, below the sufficiency threshold of three that the matching system needs. The original pair-selection mechanic, which picked whichever pair maximized delta on the weakest-signal axis, was myopic. It treated a deep-and-structured pair as information about Structural Order only, ignoring the simultaneous signal on Abstraction Tolerance. The signal budget wasn't a gap I could rank my way out of.

I rewrote the pair optimizer to prefer pairs that clear the delta threshold on two or more weak axes at once, scoring each candidate as thresholdClearCount × MULTI_AXIS_BONUS + avgWeightedDelta, with single-axis sharpening kept as a named fallback. The two revisions only work together: the richer profile needs pairs that do double duty, and pairs only do double duty because the profile got richer.
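A sketch of that scoring rule under stated assumptions: an axis is weak while it has fewer than three informative duels, a pair clears an axis when its two works differ by at least a fixed delta, and weaker axes get more weight in the average. DELTA_THRESHOLD, the MULTI_AXIS_BONUS value, the weighting, and pickNextPair are hypothetical; only the shape of the score (thresholdClearCount × MULTI_AXIS_BONUS + avgWeightedDelta) and the single-axis fallback come from the description above.

```typescript
type Scores = Record<string, number>; // axis name -> value, assumed 0 to 100

interface Work {
  id: string;
  scores: Scores;
}

const SUFFICIENCY = 3; // informative duels an axis needs before it stops being weak
const DELTA_THRESHOLD = 10; // hypothetical: contrast needed to "clear" an axis
const MULTI_AXIS_BONUS = 100; // hypothetical: weights the clear count heavily

interface Pick {
  pair: [Work, Work];
  strategy: "multi-axis" | "single-axis-fallback";
}

function pickNextPair(candidates: Array<[Work, Work]>, evidence: Scores): Pick | null {
  const weak = Object.keys(evidence).filter((axis) => evidence[axis] < SUFFICIENCY);
  if (candidates.length === 0 || weak.length === 0) return null;

  // Weight 1.0 at zero evidence, shrinking toward 0 as an axis nears sufficiency.
  const weight = (axis: string) => (SUFFICIENCY - evidence[axis]) / SUFFICIENCY;

  // Score = thresholdClearCount * MULTI_AXIS_BONUS + avgWeightedDelta over weak axes.
  let best: [Work, Work] | null = null;
  let bestScore = -Infinity;
  let bestClears = 0;
  for (const pair of candidates) {
    const [a, b] = pair;
    let clears = 0;
    let weightedSum = 0;
    for (const axis of weak) {
      const delta = Math.abs(a.scores[axis] - b.scores[axis]);
      if (delta >= DELTA_THRESHOLD) clears += 1;
      weightedSum += weight(axis) * delta;
    }
    const score = clears * MULTI_AXIS_BONUS + weightedSum / weak.length;
    if (score > bestScore) {
      best = pair;
      bestScore = score;
      bestClears = clears;
    }
  }
  if (best !== null && bestClears >= 2) return { pair: best, strategy: "multi-axis" };

  // Fallback: widest contrast on the single weakest axis.
  const weakest = weak.reduce((m, axis) => (evidence[axis] < evidence[m] ? axis : m));
  const contrast = (p: [Work, Work]) => Math.abs(p[0].scores[weakest] - p[1].scores[weakest]);
  let fallback = candidates[0];
  for (const pair of candidates) {
    if (contrast(pair) > contrast(fallback)) fallback = pair;
  }
  return { pair: fallback, strategy: "single-axis-fallback" };
}
```

With the bonus this large relative to 0 to 100 deltas, clearing a second weak axis outweighs any gain in raw contrast on a single axis in this sketch, so a pair with moderate contrast on two weak axes beats a pair with a huge contrast on one.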

What Happened

What the visitor walks out with is a personalized museum guide — an editorial paragraph with specific artists and works, not a taste label. 'Your restraint toward the Minimalist room suggests that Judd's specific objects will read as inhabitable rather than austere' — not 'you like minimalism.'

The guide runs on the same agent loop behind SG Resale — now pointed at editorial narrative instead of product listings.

Building something complex? Let's talk.

Looking for my next role designing and building AI products.


Designed & Built by Drew Miller

© 2026. Version 3.3