# Development log

# November 19th, 2025

## Rhythm

Developing a theory of rhythm for unheard. There is a natural tension between relative and absolute positioning of time objects in time.

Let me try to define both. Absolute positioning is positioning a time object at an absolute location on a timeline. For example, an absolutely positioned time object might span [1 4]. Part of the definition might include the constraint that the position of absolutely-positioned time objects is known at compile time. A relatively positioned time object is positioned relative to some other entity (a container or another time object, for example). A time object is relatively positioned if the related entity's position is not known at compile time.

Maybe a more interesting differentiation is time objects that are known to exist at compile time vs time objects that are not.

Relative positioning makes the following challenging:

- Seeking, since you have to play through the timeline to get to an event. It is also not obviously clear how temporal indexing would work with relatively positioned time objects.

Absolute positioning is substantially less flexible.

How do loops work?

I think this is probably one of the hardest pieces for me to get right. I'll be doing a lot of writing to help me better understand the problem space. Here are some of my high-level goals:

- A composer should be able to decompose their composition into phrases, just like a programmer can decompose a program into functions. (Hint: phrases are functions!) A composer should be able to work on a phrase in isolation, at times iterating upon it without regard for the entire composition. At other times, it should be easy for the composer to hear a phrase in concert with other phrases.
- It should be intuitive for a composer to write a piece that changes between time signatures.
- It should be intuitive for a composer to express polyrhythms.
- The playback engine should provide affordances for starting loops or phrases at the next downbeat. This interacts with the polyrhythm constraint in various ways (see the sketch at the end of this entry). Regardless, I want to minimize surprise.
- Seeking (that is, jumping to a point in a composition) should be instantaneous. This has far-reaching consequences: in particular, it precludes certain kinds of iterative composition, where state(t2) = f(state(t1)).
- Looping should be easy and flexible. Note, though, that the ban on iterative composition imposed by the previous bullet point seems to imply that looping can't exist at all! Fortunately, I have some theories for how to resolve this conflict. They need to be worked through, though.
- It should be possible to slow down the tempo of a song to (nearly) arbitrarily small values. Same for speeding up. This would allow for some kind of rhythmic fractals, where tempo slows down forever (while new subdivisions of tempo appear continuously). I've been looking into how "infinite zoom" fractal renderers represent zoom steps numerically, and hope to lift some of that into the playback engine.
- Related to the above, I want to provide some kind of mechanism for expressing recursive rhythms – that is, a rhythm that plays concurrently at tempo t, 2t, 1/2t, 4t, 1/4t, etc. (A composer would specify the number of iterations they want to play above and below t.) This mechanism would account for tempo zooming automatically.

As the theory of rhythm develops, I'll find that some of the above goals are fundamentally incompatible, so I'll have to make choices. That will be part of the fun.
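To make the downbeat goal a bit more concrete, here is a minimal sketch of computing polyrhythmic co-downbeats. Everything in it is an assumption for illustration: the `{:offset .. :numerator ..}` shape, the names `downbeat?` and `co-downbeats`, and the choice to measure offsets and numerators in whole beats.

```
;; Hypothetical sketch (not engine code): a time-signature layer is described by
;; the beat offset at which its bar pattern starts and its numerator (beats per
;; bar). A co-downbeat is a position that is a downbeat for every layer at once.
(defn downbeat? [pos {:keys [offset numerator]}]
  (and (>= pos offset)
       (zero? (mod (- pos offset) numerator))))

(defn co-downbeats
  "Lazy seq of beat positions that are downbeats for every signature in sigs."
  [sigs]
  (let [start (apply max (map :offset sigs))]
    (filter (fn [pos] (every? #(downbeat? pos %) sigs))
            (iterate inc start))))

;; 3/4 against 4/4, both starting at beat 0; co-downbeats land every 12 beats:
;; (take 3 (co-downbeats [{:offset 0 :numerator 3}
;;                        {:offset 0 :numerator 4}]))
;; => (0 12 24)
```

Under this framing, starting a phrase "at the next co-downbeat" would mean taking the first element of this seq that lands at or after the current playhead position.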
## Key questions

- The monotonic clock is ticking. That's cool. But what is the relationship between monotonic clock time and a note's actual input? That is, how does a clock (wall or pulse) query a note's presence? Especially when they are composed?

## Things I'm pretty sure about

- Base unit should be a beat, and durations should be expressed as fractional beats (e.g. 1/4, 2/1).
- Take a look at my "static" topology diagram, and consider that flows can bind to positions. Given that, maybe it is possible to have dynamic topologies. Put differently: as long as every dynamic topology input flow is continuously defined, then going "back in time" can mean "reversing the current state". This _DOES NOT_ mean reversing time, or even guaranteeing that what you play backward is the same as what you play forward.

What might happen when you hit play under this model?

- You pass a flow into a player
- The player sets time to init-time
- The pulse clock starts pulsing

## Random notes

What is the "smallest time unit" that we want to represent? We have to specify the number of "smallest time units" in a pulse. I _think_ this is tightly tied to recursive zooming in.

- We can specify durations as [128th notes, zoom level]
- We increment zoom level when zooming in

We can set a "max frequency" for control information (say 100 Hz). Then, our time-recursive function can short-circuit any recursive objects whose children have a minimum frequency that falls below the max frequency. This will require that time objects emit min-frequency metadata, which can be derived by finding the longest child. This is a wild theory - test this tomorrow.

Actually, a better solution would be to just put the composer in control. The composer should specify that they want n doublings or halvings of a given phrase played concurrently.

## Solving for:

### Time signature changes

How does this interact with looping?

### Concurrent phrases with differing time signatures

It should be possible for concurrent time signatures to emit their:

- Offset
- Numerator

They can then be merged together to create a data structure containing lazy seqs of "beats" at each combination. Actually, just provide a function that takes offsets, numerators, and the "combination" that you're looking for, and returns a lazy seq of beating indices.

### Tick frequency

Monotonic clock frequency is tied to doubling or halving of tempo, along with "discrete Nyquist".

### Looping

What would it mean for each "phrase" to have its own derived timeline? (Maybe not literally a phrase, maybe some container designed for this.) And what would it mean if these phrases could have timeline offsets that repeat?

### Jump back

Similar to a repeat in music theory. A timeline GOTO.

### Static repeat

Not actually a loop. This is calling a phrase again and again, adding the phrase's length to the offset at each repetition.
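Here's a quick sketch of what static repeat could mean in data terms. It assumes a phrase is a map with a `:length` in beats and a collection of `:events` that each carry a beat `:offset`; none of those keys are settled, and `static-repeat` is a name invented for this note.

```
;; Hypothetical sketch: unroll a phrase into n back-to-back copies by shifting
;; each copy's event offsets by a whole phrase length. No looping machinery,
;; just more events further along the timeline.
(defn static-repeat
  [{:keys [length events] :as phrase} n]
  (assoc phrase
         :length (* n length)
         :events (vec (for [i (range n)
                            e events]
                        (update e :offset + (* i length))))))

;; (static-repeat {:length 4 :events [{:offset 0 :pitch 60} {:offset 2 :pitch 64}]} 2)
;; => {:length 8
;;     :events [{:offset 0 :pitch 60} {:offset 2 :pitch 64}
;;              {:offset 4 :pitch 60} {:offset 6 :pitch 64}]}
```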
## Phrase isolation

It should be easy to play a phrase by itself. In fact, it should be possible to loop a phrase so you can just hear it while you're working on it.

Should create repl.play and repl.repeat:

```
(play phrase)
(play phrase 130)
(repeat phrase)
(repeat phrase 130)
(stop)
```

### Start a phrase at the next downbeat

### Start a phrase at the next polyrhythmic co-downbeat

### Abstract phrases away as functions

### Make seeking both easy to reason about and performant

### Sensible reverse playback (rolling a tempo from pos to neg)

### Express fractional notes sensibly

# November 20th, 2025

## Summary of yesterday's "theory of rhythm" noodling:

Must support:

- Playing a phrase in isolation
- Looping
- Seeking
- Time signatures and time signature changes
- Coming in at the next downbeat

Don't box out:

- Polyrhythms
- Unlimited slowing and speeding

Looping means a few things:

- Jumping back, identical to a repeat in music notation. Timeline goto.
- Static repeat. That is, play phrase x n times. Note that this is repeating, not looping.

Insights:

- Don't support iterative composition, where state(t2) = f(state(t1)). Iterative composition makes seek time linear with composition length.
- The monotonic clock frequency is closely related to the maximum number of events per second that we want to emit.
- Theoretically, infinite zoom could automatically derive the number of doublings and halvings to play by looking at the monotonic clock frequency and the longest event in a phrase. Doublings that would result in no event changes, due to all events falling below the monotonic clock sampling rate, could stop upward recursion. This seems complicated to implement in practice, but would technically work. (Think Nyquist.)
- Concurrent time signatures could emit their start offset and numerator. We could provide a function that takes offsets and numerators and returns a lazy seq of polyrhythmic downbeat positions.

Open questions:

- Should the base unit be a beat, with durations expressed as fractional units of a beat (e.g. 1/4, 4/1)? Or should the base unit account for the time signature denominator? My hunch is that the internal representation should be fractional beats, with helper functions that convert the current time signature to fractional beats.

## More brainstorming

Let me try to describe how the various decisions suggested above could compose. I'll start at the end and work backward.

You have a musical phrase representing your composition or a part of it, and you want to play that phrase. The phrase is a `timeline` object. The playback engine will query the timeline object. First, you connect it to the playback engine:

```
(on-deck phrase input-map-f output-map-f)
```

`input-map-f` is a function that accepts the playback engine's available input subsystems (initially just `{:midi midi-message-flow}`, but potentially containing things like dmx and osc) and returns a flow of a map wiring up the input arguments of the composition.

Now, play the song:

```
(play 120)
```

`play` starts the clock, querying the timeline object as needed.

```
(pause)
```

`pause` pauses the piece.

```
(resume)
```

`resume` resumes the piece.

```
(play)
```

`play` with no arguments starts the piece from the beginning.

```
(loop)
```

`loop` with no args automatically resets the playback position to the beginning at the end of the piece.

```
(loop start end)
```

This form of `loop` loops a segment of the composition.

(Aside: A pulse clock emits integers at the pulse rate. The emitted integer represents the current _zoom level_.)
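As a sanity check on the control surface above, here is a toy sketch of the transport state those calls would manipulate. It is pure bookkeeping under assumed semantics: an atom instead of a real clock or missionary flow, placeholder keys, and `loop!` instead of `loop` only because `loop` is a Clojure special form.

```
;; Toy sketch of the transport state behind play/pause/resume/loop. The real
;; engine would drive a clock flow; this only records the intended semantics.
(def transport
  (atom {:status   :stopped   ; :stopped, :playing, or :paused
         :tempo    120        ; BPM
         :position 0          ; playhead position in beats
         :loop     nil}))     ; nil, :whole-piece, or [start end]

(defn play
  ([]      (swap! transport assoc :status :playing :position 0))
  ([tempo] (swap! transport assoc :status :playing :position 0 :tempo tempo)))

(defn pause  [] (swap! transport assoc :status :paused))
(defn resume [] (swap! transport assoc :status :playing))

(defn loop!
  ([]          (swap! transport assoc :loop :whole-piece))
  ([start end] (swap! transport assoc :loop [start end])))
```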
Open question: The playback engine queries the timeline at clock positions. How does this relate to zoom levels? Is zoom level part of the query?

### Random thoughts

Right now, phrases are eagerly composed together. This means that if you redefine part of a piece, you have to redefine all of its dependents, too. This might make interactive development annoying.

# November 21st, 2025

## Key insights

There is no need for a minimum rhythmic division! Just use fractions of a beat, all the way down to the time-object / interval tree. It seems obvious in hindsight.

## New ideas

Phrases can be named (at invocation time, not definition time). This will allow you to quickly jump to a phrase. Then, in the UI, we can tell you where you are and where you can jump to. Note that since phrases can be nested, phrase names are concatenated into a vector of [outer inner], arbitrarily many deep.

Looping can be enabled / disabled for a named phrase. When the playhead rolls into an enabled loop, it will play until the end of the phrase, at which point it will jump the timeline back to the start of the looped phrase. This works with nested phrases / loops. If nested loops are enabled, the innermost loop takes precedence.

## TODO:

- Document the bug that could occur if two time object flow invocations shared an identity
- Write down the idea about the difference between a tree and a dag
- Write down the static topology decision / summarize the theory of time

# November 22nd, 2025

To think about:

- What about wanting to change note durations live? Coupling notes directly to time objects seems like it might be too strict
- What if phrases / groups were also time objects? I feel like I'm fundamentally questioning everything :/
- Maybe create "static-note", which is a time object, and "dynamic-note", which is not?

Maybe this panic is not a big deal. I think I should go ahead and continue with "note as fixed duration". After all, a note doesn't _have to_ play during the entire duration. It's more like it _can_. Swing, variability, etc. can all still be accomplished via other means.

# November 25th, 2025

Okay, time to write down some confusion. What do all the time objects eventually get merged into? Something with the following qualities:

- It is a data structure representing various output effect providers, e.g. MIDI
- Each output effect provider has its own... merging semantics?
- Time objects are created and destroyed?

Is it possible / does it make sense to differentiate over the stream of all events emitted from a phrase? One dimension to differentiate over would be the flow associated with a particular time object. I already know how to do this - I could modify the reconcile-merge function to accomplish this. I think there is probably another dimension that I'm not thinking of.

In my original group operator, I emitted a value for content in the "enabled" state, and I emitted the empty set in the "does not exist" state. My original `poly` performed a set union of all active notes. This would then have been differentiable; I could have used the differentiate function from my new missionary.util.

What is less clear to me is how to achieve the same behavior with time objects. I think that I could modify my note function to return e.g. the empty set. But how would I then group them? Would I want to couple the grouping context with note?

OH! Maybe we want to group-by e.g. :note? That is, each time-object-producing function would be able to direct its contents to a differentiable grouping operator for that particular type?
Remember that this was the original definition of poly and group:

```
;; Assumes (:require [missionary.core :as m]
;;                   [clojure.set :refer [union]])
(defn poly [& notes]
  (m/signal
    (m/cp (apply union (m/?< (apply m/latest vector notes))))))

;; TODO: Group could actually wrap note, rather than being used explicitly.
;; Will introduce a lot of GC churn, though.
(defn group [clock start end content]
  (m/cp
    (let [content (m/signal content)]
      (if (m/?< (m/latest #(<= start % end) clock))
        (m/?< content)
        (m/amb #{})))))
```

Oh, here's an idea. What if we merged the union semantics from poly with the lifecycle semantics of reconcile-merge? That is, rather than emitting from each time object's flow, we instead unioned the latest elements from each time object? Can I do that?

Why did poly originally work? Each note emitted either the empty set or a set containing its value. Multiple notes' groups were merged emit-wise:

```
#{1} #{}  #{}  -> #{1}
#{1} #{2} #{}  -> #{1 2}
#{}  #{2} #{3} -> #{2 3}
```

That is, _each note's state_ was sampled on each emit. This is due to the behavior of m/latest. latest is fundamentally not a differentiable operator.

The behavior of reconcile-merge is more like:

```
#{1} -> #{1}
#{}  -> #{}   ;; Failure! We forgot about the presence of #{1}
```

Could we turn that into:

```
{}        -> #{}
{1 true}  -> #{1}
{2 true}  -> #{1 2}
{1 false} -> #{2}
{3 true}  -> #{2 3}
```

Yes! We could! And in fact this is what the differentiate function does.

Now, how do we get {1 true} and {1 false}? That is, where do we define that 1 has been created and then destroyed?

One of the things that m/latest relies on to work is the notion of "being able to sample everything at once." That is, "sampling all notes at once."

The set-events function is allowing us to group lifecycle events by time-object flow. We could use time-object lifecycle information to emit {id true} at the start of a time object's lifecycle and {id false} at the end, but this would require that we have a notion of identity for each time object. The question is, do we always?

Imagine (note ... (m/ap (m/?< clock))), e.g. the value of a note is dynamic. Then, what is its identity?

Ah! So this is one of the core differences between emit-wise grouping using m/latest and differentiated lifecycles. The former allows for _anonymous_ object identities, while in the latter, objects really do need IDs.

But do the IDs need to be visible? I think maybe not. (Might be wrong, though.) It might be possible to create identities for them within the body of reconcile-merge. (Generate one, then create :up, then include on emit, and then :down.) I like this idea.
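To convince myself the lifecycle-map idea holds together, here is a tiny sketch of the diff-to-state direction: folding a stream of `{id present?}` maps into the set of live ids. `apply-diff` and `replay` are names made up for this note, and whether this matches the actual differentiate in my missionary.util (which I believe goes the other way, from states to diffs) is something to check.

```
;; Sketch: reconstruct the sequence of "live" sets from lifecycle maps,
;; mirroring the {} / {1 true} / {1 false} example above.
(defn apply-diff [live diff]
  (reduce-kv (fn [s id present?]
               (if present? (conj s id) (disj s id)))
             live
             diff))

(defn replay [diffs]
  (rest (reductions apply-diff #{} diffs)))

;; (replay [{} {1 true} {2 true} {1 false} {3 true}])
;; => (#{} #{1} #{1 2} #{2} #{2 3})
```

If reconcile-merge generated hidden ids internally, it could emit exactly this kind of diff stream, and the anonymous-identity problem would be handled without the ids ever being user-visible.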