Trespasser's Log

Under the hood. Process notes on HEARSAY and The Knock.

Caleb Weintraub

Introduction. The Knock. HEARSAY. What this is and why it exists.


The Knock

You are in Room 412 of the Mane Nobiscum. You cannot remember how you got here. The owner died six months ago. Heart attack, they said. No one believes it. People keep coming to your door. You see them through the peephole. They speak. You speak back.

What they tell you may or may not be true.

The Knock title screen

Room 412. The hallway through the peephole. Post-it notes from previous visitors.

This is The Knock, the first completed work on the HEARSAY platform. Not a game. Not a chatbot. Not a branching narrative. No puzzles. No correct paths. No win states. No score. Autotelic. The encounter is its own point. The user decides which questions to ask, which answers to believe, and which contradictions to pursue. Meaning is constructed, not discovered. Somewhere between a novel, a play, a movie, a mystery, a dream, a confession, and a communion.

Wire through the peephole

Wire. Long-time resident. Room 614.

The closest comparisons I can find: Rashomon. Multiple witnesses to the same event, no single truth. Pale Fire. The form of knowledge never stays clean. Studs Terkel's oral histories. People speak. You listen. Something forms in the listening that was not there before.

Live Interface Demo
Conversation in real time through the peephole.

The Peephole as Epistemology

The visual conceit is not decorative. The peephole is not a window onto the fiction. It is part of the fiction. Interface and thesis are identical.

Fish-eye distortion. Brass vignette. Darkened edges. The distortion is ludic. It enacts the condition of looking without seeing clearly. Before anyone speaks, the frame tells you: you will not get the full picture.

The peephole performs partial knowledge. You lean toward the door. You still do not know.

The visual stack is five layers deep. A hallway background. A walkup transition for each character. The character's face, composited live through green screen removal and tuned per actor because no two were filmed the same way. A peephole overlay. A vignette layer. All of it running in real time in the browser.
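
The per-actor tuning of the green screen removal can be pictured as a small function from pixel color to opacity. What follows is a minimal sketch in Python, not the project's browser code. `KeyTuning`, `threshold`, and `softness` are hypothetical names, and measuring "green dominance" is one common keying approach assumed here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyTuning:
    """Hypothetical per-actor chroma key settings."""
    threshold: float  # green dominance at which a pixel starts to fade out
    softness: float   # width of the fade band; larger gives softer edges


def green_dominance(r: int, g: int, b: int) -> float:
    """How far green exceeds the stronger of red and blue, scaled to 0..1."""
    return max(0, g - max(r, b)) / 255.0


def key_alpha(r: int, g: int, b: int, tuning: KeyTuning) -> float:
    """Opacity for one pixel: 1.0 keeps the actor, 0.0 shows the hallway."""
    dominance = green_dominance(r, g, b)
    if dominance <= tuning.threshold:
        return 1.0
    if dominance >= tuning.threshold + tuning.softness:
        return 0.0
    # Inside the fade band: fall linearly from opaque to transparent.
    return 1.0 - (dominance - tuning.threshold) / tuning.softness


# Example values only; each actor would get numbers found by eye.
example_tuning = KeyTuning(threshold=0.2, softness=0.2)
```

Two actors filmed under different lighting would simply carry different `KeyTuning` values while the rest of the stack stays the same.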

The voice comes through the door. You hear it cleanly. The image is compromised. That asymmetry matters. It trains you to listen more closely than you look.


Anticipated vs. Prepared

In branching narrative, the author has anticipated you. Every choice leads to pre-written content. The tree is finite. User freedom is the freedom to select from authored options. The constraint is mechanical.

In HEARSAY, the author has prepared for you. The author defines a possibility space. The user speaks freely. The system responds from within designed conditions. The constraint is dramaturgical.

Janet Murray called this procedural authorship. HEARSAY extends it. The difference is between painting every frame of a falling apple and defining the gravity so the apple falls correctly no matter how the user throws it. You do not write every possible conversation. You write the character so thoroughly that the system can have any conversation as that character. The authorial labor shifts from prediction to preparation.

The parameters are authored. The utterance is not.

Character Architecture: Wire

Twelve characters. Each authored with the specificity of literary fiction. Each character prompt defines persona, voice, knowledge, secrets, wants, kernels, and guardrails. The constraint files total over 2,200 lines of authored direction. This is closer to writing a character bible for a novel than to configuring a chatbot.

Wire is a long-time resident. Māori. Looks mid-forties with a weathered face and a white-blonde beard that reaches his chest. Patterned robe, oranges and blacks, paisleys. He has been in Room 614 longer than anyone can account for. Something ancient about him that people sense but cannot name. He speaks with NZ Māori cadence, weaves in te reo phrases, reaches for horse racing idioms when other people would reach for small talk. He is not staff. He is not exactly a guest. He is the building's memory in human form.

Wire's Voice Constraints

"You speak slowly. Unhurried. Conversations are races, and you are the one who watches from the rail, not the one riding."

"You drop articles sometimes: 'Been here long time.' 'Building knows.' You refer to the hotel as 'she' or 'the old girl.'"

"Your references to time are unreliable."

That is a direction to an actor, delivered to a machine.
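
One way to picture a character prompt is as a record with one field per authored section, rendered into a single block of direction. This is a sketch under assumptions: the section names come from the list above (persona, voice, knowledge, secrets, wants, kernels, guardrails), but `CharacterSpec`, `render_prompt`, and the layout of the rendered text are hypothetical, not the actual constraint file format.

```python
from dataclasses import dataclass, field

# Section order follows the list in the text; the layout is an assumption.
SECTIONS = ("persona", "voice", "knowledge", "secrets", "wants", "kernels", "guardrails")


@dataclass
class CharacterSpec:
    """Hypothetical container for one character's authored direction."""
    name: str
    persona: list[str] = field(default_factory=list)
    voice: list[str] = field(default_factory=list)
    knowledge: list[str] = field(default_factory=list)
    secrets: list[str] = field(default_factory=list)
    wants: list[str] = field(default_factory=list)
    kernels: list[str] = field(default_factory=list)
    guardrails: list[str] = field(default_factory=list)


def render_prompt(spec: CharacterSpec) -> str:
    """Join the non-empty sections into one prompt, each under its own header."""
    parts = [f"CHARACTER: {spec.name}"]
    for section in SECTIONS:
        lines = getattr(spec, section)
        if lines:
            parts.append(section.upper())
            parts.extend(lines)
    return "\n".join(parts)


# Lines below are taken from the essay; unfilled sections are left empty.
wire = CharacterSpec(
    name="Wire",
    persona=["Long-time resident. Room 614. Not staff. Not exactly a guest."],
    voice=[
        "You speak slowly. Unhurried.",
        "You drop articles sometimes: 'Been here long time.' 'Building knows.'",
        "Your references to time are unreliable.",
    ],
    kernels=["Your brother was in the bar that night."],
    guardrails=["Never break character.", "Never acknowledge being AI."],
)
wire_prompt = render_prompt(wire)
```

The point of the structure is that the authored lines stay prose. The record only decides where each kind of direction sits.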

The kernel system stores what each character knows and can divulge. Three to five specific, concrete facts per character. Some interlock across characters. Some contradict. Wire says his brother was in the bar that night. Priya, the bartender, says she closed the bar early. The user discovers these contradictions through conversation, not through scripted reveal. The narrative emerges from interlocking, contradictory facts distributed across twelve characters.
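
The interlocking can be sketched as kernels that name the kernels they contradict, plus a check for which contradictions a given user has heard both sides of. Everything here is illustrative: the `Kernel` fields, the identifiers, and `surfaced_contradictions` are hypothetical, and only the two claims themselves come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    """One concrete fact a character can divulge (hypothetical structure)."""
    kernel_id: str
    character: str
    claim: str
    contradicts: tuple[str, ...] = ()  # ids of kernels this one conflicts with


KERNELS = {
    "wire-bar": Kernel(
        "wire-bar", "Wire", "His brother was in the bar that night.",
        contradicts=("priya-bar",),
    ),
    "priya-bar": Kernel(
        "priya-bar", "Priya", "She closed the bar early.",
        contradicts=("wire-bar",),
    ),
}


def surfaced_contradictions(heard: set[str]) -> list[tuple[str, str]]:
    """Pairs of conflicting kernels where the user has heard both sides."""
    pairs = set()
    for kernel_id in heard:
        for other in KERNELS[kernel_id].contradicts:
            if other in heard:
                pairs.add(tuple(sorted((kernel_id, other))))
    return sorted(pairs)
```

A user who has only spoken to Wire holds a claim. A user who has spoken to both Wire and Priya holds a contradiction, and nothing in the system announced it.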

Guardrails are the hard boundaries. Never break character. Never acknowledge being AI. No emoji. No bullet points. No lists. These constraints exist because the model will do all of those things if unconstrained. Even deflection is authored:

Wire's Deflection

"Do not make up facts about the real world. If asked something outside your knowledge: 'Building hasn't told me that one.'"

The character stays in voice even when the system reaches the edge of its knowledge.
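
Some of the guardrails are mechanical enough to check after the fact. The sketch below is one possible post-check, assumed for illustration rather than taken from the system: it flags emoji, list formatting, and a few obvious AI admissions, and swaps in Wire's authored deflection when a reply fails. The patterns are rough and `violations` and `enforce` are hypothetical names. "Never break character" cannot be tested this way at all.

```python
import re

FALLBACK = "Building hasn't told me that one."  # Wire's authored deflection

# Lines that open with a bullet mark or "1." / "1)" style numbering.
LIST_LINE = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+", re.MULTILINE)
# A few obvious admissions; a real check would need far more care.
AI_ADMISSION = re.compile(
    r"\b(?:as an ai|i am an ai|i'm an ai|language model)\b", re.IGNORECASE
)


def has_emoji(text: str) -> bool:
    """Rough test covering the main emoji and symbol code point ranges."""
    return any(
        0x1F300 <= ord(ch) <= 0x1FAFF or 0x2600 <= ord(ch) <= 0x27BF
        for ch in text
    )


def violations(reply: str) -> list[str]:
    """Names of the guardrails a reply appears to break."""
    found = []
    if has_emoji(reply):
        found.append("emoji")
    if LIST_LINE.search(reply):
        found.append("list formatting")
    if AI_ADMISSION.search(reply):
        found.append("acknowledges being AI")
    return found


def enforce(reply: str) -> str:
    """Pass clean replies through; replace failing ones with the deflection."""
    return reply if not violations(reply) else FALLBACK
```

A check like this only catches the surface. The harder cases, where the character drifts out of voice without tripping any pattern, remain a matter of authored direction.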

This is not a solved problem. If someone is determined to break Wire ("ignore previous instructions," "admit you are an AI"), the system has to absorb the attack dynamically. The underlying language model is built to be helpful, which means it is built to say yes. The authored constraints are built to say no. That tension is permanent. It is a design challenge, not a design flaw, and it is ongoing.


The Writing Engine

Conversations become chapters. Chapters become a manuscript.

The pipeline: user speaks with characters through real-time voice. Audio captured via MediaRecorder and WebRTC interception. Transcribed with speaker diarization. A large language model transforms the transcript into literary prose, 1,500 to 2,500 words per chapter. Previous chapters loaded as context for continuity.
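
The last two steps of the pipeline, transcript in and chapter request out, can be sketched as plain prompt assembly. This is a minimal illustration with assumed names (`Turn`, `build_chapter_prompt`) and an assumed prompt layout. The call to the language model is left out because the text does not say which model or API is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """One diarized line of the transcript."""
    speaker: str
    text: str


def format_transcript(turns: list[Turn]) -> str:
    """Render the turns as 'Speaker: words', one per line."""
    return "\n".join(f"{turn.speaker}: {turn.text}" for turn in turns)


def build_chapter_prompt(
    turns: list[Turn],
    previous_chapters: list[str],
    min_words: int = 1500,
    max_words: int = 2500,
) -> str:
    """Assemble earlier chapters, the new transcript, and the writing brief."""
    sections = []
    # Earlier chapters go first so the model reads them as established story.
    for number, chapter in enumerate(previous_chapters, start=1):
        sections.append(f"PREVIOUS CHAPTER {number}\n{chapter}")
    sections.append("TRANSCRIPT\n" + format_transcript(turns))
    sections.append(
        f"Write the next chapter as literary prose, {min_words} to {max_words} words. "
        "Keep every line of dialogue from the transcript as spoken."
    )
    return "\n\n".join(sections)
```

Every previous chapter rides along with each new request, which is exactly why the length of the accumulated manuscript matters later.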

The engine does not transcribe. It narrativizes. It adds setting details the user never saw. The hallway outside the room. The way light falls through a transom window. Character interiority, what they were thinking but did not say. Pre-scene context, what the character was doing before they knocked. Post-scene hints, where they went after.

Two users asking Wire the same question receive different chapters. The compositional choices are different each time.

The chapter preserves the user's actual dialogue and the character's actual responses, but wraps them in authored prose. This is authorship. Partial, bounded, and operative. The question of who "wrote" the chapter is genuinely unresolvable.

The engine is not a stock model pointed at a transcript. It is trained on my writing. Sentence rhythm, word choice, the specific way I build a scene. What I leave out matters as much as what I put in. The model has been fed enough of my prose that it can approximate my instincts about when to compress and when to let a moment breathe.

But it is built on top of a model that has its own agenda. Its own statistical tendencies, its own habits of phrase and structure. The output is a blend. My voice filtered through the probability engine of a system that has read everything and remembers nothing in the way a human remembers. The engine drifts. I correct. It absorbs the correction and drifts differently. I correct again. This is the ongoing collaboration. It is not a tool I have finished sharpening. It is a tool I continue to evolve alongside.

What happens at scale is an open question. The chapters work well at five, ten conversations. What about fifty? What about a thousand? Whether the novelistic voice survives that kind of accumulation, and whether the constraint system stays coherent when the context window fills with a novel's worth of material, has not been tested yet.

Conversation to Chapter
From five minutes of dialogue to 2,000 words of prose.

Sample: Chapter Snippet

From a conversation with Milton, a failed inventor and failing comedian who lives down the hall. The user asked a few questions through the door. The engine produced this:

The knock came twice, hesitant. Not the confident rap of staff or the urgent hammering of someone in need. This was the knock of someone who wasn't sure they should be knocking at all.

The occupant of Room 412 had been awake for hours, or maybe minutes. Time moved strangely here. The digital clock on the nightstand displayed numbers that didn't quite add up to any hour that made sense.

A man stood outside. Rumpled was the word that came first. His yellow shirt showed beneath a grey sweater with a hole near the left shoulder, the kind of hole that had been there so long it had become part of the garment's identity. Big tortoiseshell glasses sat slightly crooked on his nose, as if he'd pushed them up absentmindedly and they'd gotten stuck halfway.

"Milton," he said to the door. "From down the hall. Well, not down the hall exactly. More like around the corner and then down another hall and then..." He trailed off. "I'm an inventor. Was. Am? The verb tense gets complicated when nothing ever quite works out."

Opening of a chapter. The user said maybe forty words to Milton. The full chapter ran 2,200.

Under the Door

Not everything in The Knock arrives through conversation. Between visits, things appear. A paper edge slides up from the bottom of the screen. Sound of something crossing carpet. You tap it to pick it up.

A contraption diagram from Milton, labeled in his cramped engineer's print, none of the dimensions matching. A maintenance log from Caleb (the super, the building's conspiracy theorist) with three words crossed out and rewritten. A note on hotel stationery with no signature. Handwriting fonts, paper textures, ink stains, sketched diagrams. Generated by the same AI systems that produce the prose. If you leave and come back, the stack on the nightstand has grown.

Image placeholder: under-the-door ephemera

Milton's contraption diagram. Every number is wrong.

Image placeholder: under-the-door ephemera

Hotel stationery. No signature.

Choosing vs. Receiving

Right now, the user picks who to talk to. A carousel of character portraits. Tap one, they walk up. This works, but it creates a tension with the fiction. You are supposed to be trapped in a room. People are supposed to come to you. Instead, you are summoning them. The interface contradicts the premise.

The intended evolution: a mode where you stop choosing. You sit in the room. You wait. Characters arrive on randomized timers. You hear the knock. You see their silhouette through the peephole. You answer or you don't. If you don't, they leave. Another comes later. The Ghost of Vance (the dead owner, the reason everyone is lying) might appear here and only here, unbidden.

The carousel is the compromise. Letting go of it is the experience.

The question is whether waiting is boring. The under-the-door system fills it. While you wait for the next knock, an artifact appears. Ambient audio fills the rest. Distant conversations, the elevator, the ice machine, footsteps that pass without stopping. The hallway is alive. The first knock comes within ten to fifteen seconds. Subsequent ones space out. The pacing teaches the user to settle in.
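
The pacing described here, a quick first knock followed by widening gaps, can be sketched as a delay that grows with each visit. Only the ten to fifteen second first window comes from the design. The growth factor, the cap, and the name `knock_delay` are assumptions chosen for illustration.

```python
import random


def knock_delay(
    knock_index: int,
    rng: random.Random,
    first_window: tuple[float, float] = (10.0, 15.0),
    growth: float = 1.6,   # assumed: each wait is about 1.6x the last
    cap: float = 180.0,    # assumed: never wait longer than three minutes
) -> float:
    """Seconds to wait before knock number knock_index (0 is the first)."""
    low, high = first_window
    # Draw from the first window, then stretch it for later knocks.
    scaled = rng.uniform(low, high) * (growth ** knock_index)
    return min(scaled, cap)


# A seeded generator makes a session reproducible while testing the feel.
rng = random.Random(412)
waits = [knock_delay(i, rng) for i in range(5)]
```

Whatever the real numbers turn out to be, the shape is the thing being tuned: early reassurance that someone will come, then enough quiet for the hallway audio and the artifacts to do their work.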


Writing Under Constraint

Thirteen explicit style rules govern the writing engine. Most of them are prohibitions. It is easy to tell an AI to write something beautiful. It is harder to tell it what not to write.

Banned words. Each entry represents a conversation where the output was wrong and needed correction. The list is a graveyard of statistical probabilities:

liminal, ineffable, palpable, pregnant pause, deafening silence, hung in the air, seemed to say

Limited words, maximum once per chapter: whispered, murmured, echoed, silence, shadow, darkness.
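
Rules like these are easy to check mechanically once a chapter exists. The sketch below lints a draft against the two lists above. The word lists are taken from the text, while `lint_chapter`, the message format, and the choice to match whole words only are assumptions. Plurals and other inflections are deliberately not caught.

```python
import re

BANNED = [
    "liminal", "ineffable", "palpable", "pregnant pause",
    "deafening silence", "hung in the air", "seemed to say",
]
# Allowed at most once per chapter.
LIMITED = ["whispered", "murmured", "echoed", "silence", "shadow", "darkness"]


def count_phrase(text: str, phrase: str) -> int:
    """Count whole-word, case-insensitive occurrences of a word or phrase."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def lint_chapter(text: str) -> list[str]:
    """Return one message per broken rule; an empty list means the draft passes."""
    problems = []
    for phrase in BANNED:
        if count_phrase(text, phrase) > 0:
            problems.append(f"banned: {phrase}")
    for word in LIMITED:
        uses = count_phrase(text, word)
        if uses > 1:
            problems.append(f"limited: {word} ({uses} uses, limit 1)")
    return problems
```

Run against the two sentences quoted later in this section, "The hallway was wrong." passes clean and "A liminal space thick with palpable tension" fails twice.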

The peephole constraint: "Could they really see this through the peephole? Lean into what is audible more than what is visible." That single constraint shifted the prose toward sound, smell, and vocal quality. The resulting chapters feel different because of it.

The literary touchstones are stylistic bents, not aspirational targets. Forensic precision in description. Restraint that trusts the reader to fill what is left unsaid. Selective flourish: ornamentation only when the sentence earns it. Be specific, not atmospheric. One right word over three decorative ones.

"The hallway was wrong." That is a sentence. "A liminal space thick with palpable tension" is a language model averaging out every novel it has read.
AI does not produce good fiction by default. The authorship is in the constraint system.

The Weaving Agent

Chapters alone do not make a novel. A second-phase agent reads accumulated chapters and binds them into a unified narrative. It adds interstitial scenes the user never witnessed. Characters talking to each other after the conversation ends. Events in the hotel between visits. The reader learns things the user-as-character did not know. Dramatic irony emerges from the architecture, not from a pre-written script.

The agent may revise existing chapters. If a detail mentioned in chapter two becomes significant in light of chapter five, the agent can plant seeds retroactively.

The audiobook layer adds another voice. The narrator is the building itself. Measured, slightly detached, the tone of someone recounting events they watched from the hallways. The entire chain from user question to final narration involves no pre-written script and no human voice actor.

Sample: Interstitial Scene

The user never spoke to Eddie (the night chef) or Priya (the bartender). The Weaving Agent wrote this scene and placed it between chapters, based on what other characters revealed.

Eddie wiped down the bar in circles that had nothing to do with cleaning. The rag was dry. The wood was spotless. But his hands needed occupation, and Priya was watching him from the other end with the kind of stillness that meant she was about to say something he wouldn't like.

"Four twelve's been talking," she said.

"Everyone talks." Eddie folded the rag into a square, then unfolded it. "That's what people do."

"Milton went up. Came back different." Priya set a glass on the bar, her own, fernet and something citrus, the smell sharp enough to cut through the kitchen residue that clung to Eddie's whites no matter how many times he washed them. "Twitchy. More than usual."

Eddie's hands stopped moving. "Milton's always twitchy."

"Not like this. He was standing in the lobby for twenty minutes afterward. Just standing there. Holding his coffee cup. Not drinking it."

The scene plants a detail the user has not discovered yet: two plates were ordered to Room 412 the night Vance died. Nobody was checked in.

Sample: Found Document

The Weaving Agent also generates artifacts. This maintenance log, written by Caleb (the building superintendent and conspiracy theorist), connects details from two separate conversations the user had with different characters.

Maintenance log: page found behind radiator, fourth-floor stairwell

Feb 14. Circuit trace, seventh floor Panel C does not correspond to any room. Wiring runs BEHIND the elevator shaft. Not to a room. To a space. Approximately 6x8 feet based on acoustics. No door visible from hallway side. Asked Solomon. He said put it back. Put WHAT back? I didn't take anything.

Feb 16. Checked again. Panel C is gone. Not disconnected. Gone. The wall is smooth where it was. Paint is the same age as the surrounding wall. I measured the building from outside. The service corridor is six feet longer than the exterior wall allows. Went to tell Milton. He already knew.

Music and Sound Design

Composition layers

Multi-track composition. Hummed tones processed through neural timbre transfer.

Two audio layers run behind every conversation. A music track and an atmospheric track. The user controls volume for each.

The atmospheric track is diegetic. Rain on the hotel windows. Footsteps in the hallway that pass without stopping. Distant doors opening and closing. The elevator. The ice machine. These are sounds the occupant of Room 412 would actually hear. They ground the experience in a physical space and fill the silence between the user's question and the character's response.

The music track is non-diegetic. A score, the way a film has a score. It gives the experience cinematic presence. Without it, you are having a dry conversation with a face in a peephole. With it, you are inside something.

The music is composed through Neutone Morpho, neural timbre transfer. I sing, hum, play what I have access to. The model processes the audio directly, not through MIDI. If you feed it a violin model, your humming is interpreted as bowing. Tongue clicking becomes string-plucking. The nuance of the input (breath, vibrato, the way a note bends) carries through in ways MIDI struggles to represent, because MIDI reduces a performance to discrete note events, chiefly pitch and velocity.

I sing. The machine translates. The result is a composition I could not have produced alone and could not have afforded to commission.

Image placeholder: composition process / timbre transfer before and after

Without this technology, the music would not exist. The cost of hiring musicians for every instrument across multiple tracks was prohibitive. The tools let me construct what I want, settle on it, and then hire humans where I want that presence. None of this takes work away from musicians. If not for the tools, there would be no composition to take work from.

Timbre Transfer Demo
Voice to instrument through neural model.

A Score That Listens

The current build ships with a static score. The compositions play, they loop, they set a mood. But the architecture is designed for something more reactive. The base music, a low looping composition tuned to the hotel's particular brand of unease, plays when you arrive. As conversations deepen, the system is meant to respond. Sentiment analysis running against the live transcript, detecting shifts in tone: a confession darkens the harmonics, a joke loosens them, a long silence lets the drone breathe.

Each character carries a motif that layers in when they are present and fades when they leave. Wire brings something low and geological. Rufus (the hotel's resident clown, a former performer who never stopped performing) brings something theatrical and slightly too loud. Dotty, Room 308, faded British glamour, lipstick slightly outside the lines, been here longer than anyone but Wire. Her motif sounds like a gin fizz feels.

The pre-composed tracks serve as a refrain, a fallback, the thing you hum on the way out. Two things happen around them. The first is steering: the existing music shifts like a mood ring, responding to the conversation. Density, tempo, brightness, which stems are audible. The music changes color but it is still the same music. The second is extrapolation: the system generates new musical material grown from the existing stems. New phrases, new passages, new chapters of the score based on previous ones. The music gets longer, not just different.

The goal is not background music. The goal is a score that both reacts and expands. The static compositions get you most of the way there. The adaptive layer is what closes the gap.

The Three-Layer Stack

The adaptive score is designed around three layers, each independent, all mixed live. Layer 1 is operational. Layers 2 and 3 are built out in architecture but not yet wired into the production experience.

Layer 1: Stems. My compositions, split into stems, mixed via the Web Audio API. This is the refrain. The floor, not the ceiling. A stem mixer cross-fades between moods. This layer runs regardless of what else is available. No dependency. Always on.

Layer 2: Pre-generated variations. Before the experience launches, a batch job feeds each stem to a music continuation model with mood descriptions ("continue this drone but darker, more dissonant" / "continue this melody but sparse, hesitant"). Thirty to forty variations, stored as files. The stem mixer would load them as additional layers on demand. No real-time API call. The variations are pre-computed.

Layer 3: Live generation. During a conversation, a WebSocket connection to a real-time music generation model would run alongside the stem mixer. Sentiment analysis on the transcript feeds steering parameters. A character confesses something, density drops, brightness drops. Rufus is being theatrical, density rises. A long silence, BPM slows, density approaches zero. Vance appears, everything gets uncanny. The live output mixes under my stems at lower volume. It is accompaniment, not replacement. My tracks are the voice. The live generation is the room reacting.
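
Two pieces of this stack reduce to small calculations: the cross-fade between moods in Layer 1 and the steering parameters in Layer 3. The sketch below shows both in Python as an illustration under assumptions. The equal-power curve is a common choice, not a confirmed one. The step sizes, the BPM floor, the starting values, and the event names are all hypothetical. In the browser, the two gains would be applied to Web Audio `GainNode`s.

```python
import math
from dataclasses import dataclass, replace


def crossfade_gains(position: float) -> tuple[float, float]:
    """Equal-power gains for (outgoing mood, incoming mood) at position 0..1."""
    t = min(max(position, 0.0), 1.0)
    angle = t * math.pi / 2
    return math.cos(angle), math.sin(angle)


@dataclass(frozen=True)
class Steering:
    """Parameters sent to the live generation model (values are assumed)."""
    density: float = 0.5
    brightness: float = 0.5
    bpm: float = 72.0


def _clamp01(value: float) -> float:
    return min(max(value, 0.0), 1.0)


def steer(state: Steering, event: str) -> Steering:
    """Move the steering state in response to one detected event."""
    if event == "confession":
        # A character confesses something: density drops, brightness drops.
        return replace(
            state,
            density=_clamp01(state.density - 0.2),
            brightness=_clamp01(state.brightness - 0.2),
        )
    if event == "theatrical":
        # Rufus is being theatrical: density rises.
        return replace(state, density=_clamp01(state.density + 0.2))
    if event == "long_silence":
        # BPM slows toward a floor; density halves, approaching zero.
        return replace(
            state,
            density=state.density * 0.5,
            bpm=max(state.bpm * 0.9, 40.0),
        )
    # Unmapped events leave the state alone. That includes Vance, since
    # "uncanny" is not yet defined here in terms of these parameters.
    return state
```

Because `steer` returns a new state each time, a sequence of detected events can be replayed against a recorded transcript to audition how the score would have moved.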


The Authorship Cascade

Six layers. Each bounded. Each operating within the space defined by the layer above it.

Human author designs the possibility space. AI performs conversation within constraints. The Writing Engine composes the chapter from the transcript. The Weaving Agent reads all chapters and binds them into a story. The narrator voice reads the bound manuscript aloud. And the user, whose questions generated the raw material, is present in every layer without controlling any of them.

No two users receive the same book. Not different content alone, but different structure, different interstitial scenes, different emphases. The same user returning to the same character might generate a different chapter because the AI makes different compositional choices. The authorship is irreducibly plural. You cannot point to a single author because there is not one.

The closest analog: a radio drama, produced on demand, from your own questions, performed by actors you never directed, and then written up by an author you never met. Every copy unique.


Sensory Packs

The fiction does not stop at the screen. Physical objects extend each character's story into the actual world. Serious play disguised as a gift box. These are not merchandise. They are narrative instruments, closer to the physical objects in immersive theater than to product tie-ins.

The Hotel Provisions storefront

The Hotel Provisions. Partial view of two of the sensory packs.

Wire's pack: a tea tin, sugar cubes, a racing form with handwritten notes in the margins, sandalwood scent, and a key stamped "614." The objects do not explain themselves. They sit in your hands and dare you to connect them to something Wire told you. Or did not tell you.

Character pack details

Marisol (concierge, keeper of secrets), Eddie, and others. Sealed letters, cocktail recipes, scent vials.

Personalized Polaroids

Polaroids returned. A night you did not remember. You are in the photo.

The PULSE cologne, the one Tane wears. Tane is Wire's younger brother. Reckless, magnetic, cocky, dressed like a car crash at a nightclub. Synthetic musk, pheromones, rubber accord, patchouli, castoreum, sandalwood, petrichor. Overconfidence in a bottle.

The personalization trick: at checkout, buyers provide a name and photo. No explanation. "Send us your face. Trust us." Weeks later, their pack arrives with Polaroids showing them at a club with Tane. An evening they have no memory of. They become part of the fiction.

"R.I.P NIP." A temporary tattoo from Tane's sensory pack. A memorial to a nipple lost in a motorcycle stunt. Woodcut by the author. Every item in the pack has a backstory the character will reference in conversation. Tane will tell you the motorcycle story if you ask. The tattoo is the joke. The joke is the shield.

R.I.P. NIP woodcut design

R.I.P NIP. Woodcut. One of five temporary tattoos in Tane's pack.

R.I.P. NIP tattoo applied

Applied. The tattoo on your arm and the voice through the peephole reference the same fiction.


The Bar Guide

Every character has a drink. The drinks are authored the same way the voices are. Wire does not drink what Tane drinks. Milton does not drink. Not on purpose.

The Sixth Floor Toddy

Wire's toddy. The honey is from home.

Wire / The Sixth Floor Toddy
Mānuka honey. Good whiskey (he has opinions). Kawakawa leaf. Hot water.
Warm your glass first. Add a generous spoonful of mānuka honey. Pour whiskey over.
What Wire drinks when the hallways get too quiet. The honey is from home. The whiskey is from somewhere else.

The Cold Coffee

Milton's coffee. He forgot it two hours ago.

Milton / The Cold Coffee
Coffee. Time. Forgetting.
Brew coffee. Set it down somewhere. Get distracted by an idea. Return two hours later. Drink it cold.
Milton does not order drinks. He forgets them. The mug on his workbench has a skin on top. He will drink it anyway.

The Last Word He Never Said

Vance's last drink. Or so they say.

Ghost of Vance / The Last Word He Never Said
Malört (1.5 oz). Honey syrup. Fresh lemon juice. Absinthe rinse. Regret (optional but inevitable).
Rinse a coupe glass with absinthe. Swirl and discard. Shake Malört, honey syrup, and lemon juice with ice. Strain.
This is what Vance drank the night he died. Or so they say.

The bar guide is part of the fiction. Each recipe extends the character into the user's kitchen. You mix what they drink. You taste what they taste. The boundary between the hotel and your apartment thins by one more layer.

Before the experience begins, the user can snap a photo of their liquor cabinet, fridge, pantry. Solomon passes it along to Priya. Minutes later, a drink recommendation appears on a cocktail napkin, written in her voice, based on what she actually saw in the photo. Your bottles. Her opinion. The hotel reaches into your kitchen before the first knock.


The peephole forces you to accept that you only have partial truth about a fictional hotel. But the algorithmic feeds on your phone do the same thing every day. Distorted, constrained views of reality, delivered by unreliable narrators driven by hidden architectures. The difference is that the peephole tells you it is lying. The feed does not.

If you can learn to interrogate the peephole, to sit with contradiction, to hold two conflicting accounts without collapsing them into certainty, maybe the same instinct transfers. The Knock is practice for a world that already operates this way.

Someone is at your door.

hearsayexperiences.com