
Reasons for modules #569

masak opened this issue Oct 6, 2021 · 69 comments


masak commented Oct 6, 2021

Issue #302, that enormous discussion that hovers around fexprs and Kernel but sometimes veers out into the unknown and asks really deep and important questions about computation itself (and how it's managed to manifest itself in our timeline's history of programming languages), contains the following paragraph (by me, replying to @raiph):

This week, as I've been pondering writing this down, I've been struck by something else, which I'm at even more of a disadvantage to express clearly: namely, how practically every major innovation or focal point in computer science manages to place itself on a spectrum between lambda and process: functions, co-routines, libraries, modules, tasks, threads, OS processes, cores, processors, network nodes, services... all of them "proto-actors" in the sense that they stake out some things that are separate, and others that are shared. Actors go the whole hog, jumping to the obvious end point where everything's separate. Only asynchronous messaging remains. The proto-actors are of course still interesting to study on their own, but maybe partly because they help tease out different aspects of actor separation.

I'm on vacation, so I thought I would tackle this "tower" of abstractions, describing them one by one. I will try to do all of them justice, extracting both their angelic "essence" or perfect form of each feature — what remains, in a sense, when you don't have to be all that concerned with the inconveniences of actual implementation — and also their demonic "substance", the imperfect form of each feature — what inevitably happens when you try to trap it in a machine.

(As a concrete example: the angelic essence of functions is something like the isolation guarantees and re-use that they give you, whereas the demonic substance is what leads to stack overflows or references outliving their stack-allocated data.)

As much as possible, I will also refer to real actual languages out there, especially when they make for pure exemplars of the feature.

I will try to get through all this in a single evening, aiming for coherence of thought rather than comprehensiveness. I'm not 100% sure what I will get out of this, except that I'm hoping to have a better overview afterwards of these different rungs on the abstraction ladder:

There is still a clear connection with Alma: during the design of Alma I've sometimes encountered parts of the challenge of introducing these things into a language. (Maybe most keenly felt with modules and types, but also to some extent with coroutines.) I'm not saying that this issue will provide any insights that I wish I'd had earlier, but... it's something to aim towards, I guess. Or if not insights, then at least a much-needed vantage point.

This issue is called "reasons for modules", which is not a particularly good name. But it's how I've been thinking about it — modules sit somewhere towards the middle of the ladder of abstractions, and the word "module" means different things to different people, but they are the prototypical thing which helps bring structure and separation to a program. In some vague sense, all the features on the abstraction ladder are modules, of one kind or another.


masak commented Oct 6, 2021

Functions

I've written "Functions (units of functionality)" in my notes. I'll have "units of <noun>" written down for each of the rungs on the abstraction ladder, but I realize that those easily become just empty words, so let me try to back it up: you put code inside a function, and that code forms a unit.

In the early days of assembly code, the whole code section was a single thing, and there was no strong syntactic boundary separating a function from its outside. It all ran on convention. When you went through the motions of jumping back to what called you, that's when the function ended. It was an easy thing to jump into the function at different starting points — somewhat weakening (what we now think of as) the integrity of the function boundary.

I recall vividly that in Knuth's TAoCP books, the MIX machine uses as its calling convention a kind of self-modifying code which sets the return address of the caller on the jump instruction at the end of the function. (Yes, really!) MIX is an old standard, and that particular design decision didn't stand the test of time — indeed, the newer MMIX architecture uses a more traditional stack-based approach. More on this under "First-class functions", below.

Functions provide a rudimentary separation of concerns. Many of the programs in "BASIC Computer Games" just ran on and did what they did with GOTOs. Maybe there was the occasional GOSUB and RETURN; I'll need to go back and check. If there was, it was still not fundamentally more "bundled up" into individual functional units than assembly code.

In the paper The Use of Sub-Routines in Programmes, David Wheeler begins with this sentence: "A sub-routine may perhaps be described as a self-contained part of a programme, which is capable of being used in different programmes." Already from the start, the motivation was a kind of modularity.

The separation of concerns that functions provide takes many forms, each helped by the pure idea of a function:

  • Isolation. What happens in Vegas stays in Vegas. The added idea of lexical scope helps splendidly with this.

  • Re-use. If two parts of a program (or larger system) both want to call the same function, they can do so. The cost of more and more callers of a function scales linearly with the number of callers, and scales more slowly than (say) re-writing the function by hand at each call site. (This is true whether we refer to program size or programmer time.)

  • Coupling/cohesion. By encouraging re-use, functions can help decrease coupling. That's very abstract when put like that; sorry. I guess the coupling they remove is the kind that would have appeared in the form of duplicated code... Heaven knows different forms of coupling and other problems are still possible, though. (Wikipedia lists a large flora of couplings.) Similarly, functions encourage but do not guarantee cohesion. There's a rabid "clean code" posse out there that insist methods should be small, sometimes indefensibly so. I'm pretty sure what they're after is this kind of "do one thing" ideal, poorly expressed.

  • By existing, a function establishes a consumer/provider relation. The ideal form of a function takes parameters as input and gives a return value as output. (We'll stay silent on whether a function is "allowed" to perform side effects, by which I mean we'll skirt the issue and implicitly allow it. Part of the reason is that the story about side effects is still very much being written/explored, by languages like Rust and Koka.)

  • Building on that last point, functions are also what opens the door to Design by Contract, and specifications. Whether these are specified formally and machine-checked, or informally with documentation and targeting the human reader, the main idea is that the function is no longer just its code, but also a set of assumptions and guarantees. This idea will come back later, several times.

Why does lexical scoping (also known as "static scoping") fit so well with functions? From the angelic perspective, it's because this scoping discipline provides a form of stability, tying variable scopes to "blocks" in the code — that is, in the written program text. It doesn't get much clearer than that: your variable lives until the } of the block in which it was declared. From the demonic perspective, it doesn't fit particularly well, and is not the first scoping discipline that tends to come to mind — dynamic scoping is.

Or rather, dynamic scoping is the natural thing for an interpreter to adopt. The interpreter follows the control flow of the program itself; if it carries around a shallow ("one level deep") set of name bindings, then dynamic scoping is pretty much inevitable. Lexical scoping is the perspective taken by the compiler, which follows the program text rather than the control flow. It took from the late 50s to the 80s for the Lisp world to make the transition from one to the other; indeed, they started with interpreters and only gradually became keen on compilation.

Functions provide a guarantee which will be cast in a sharper light when it is broken later. It doesn't have a name in the literature that I know of, so let's call it the dynamic embedding guarantee. It has two parts:

  • The entire machine has a single thread of control.
  • When a function is called, the single thread of control is transferred/borrowed from caller to callee. When the function returns, it is handed back.

Think of it as a "spark" running around executing your program. (This is the entity that debugging/stepping through code makes manifest.) Calling a function means transferring that spark. Because the dynamic extent of the function's execution (the interval on the timeline when the function was active) is strictly contained in that of the caller's execution, the whole program run forms a perfectly nested structure, a tree of calls.

It is at this end of the spectrum that functions are perfectly oblivious, not "aware" at all of the process that's running them. Looking ahead towards the other end of the spectrum: actors are the total opposite — not only are they aware of the process running them, they've taken it over to the extent that they are the process that's running them.

Function calls compose well. This is why we can talk of call trees, because a function can take on the role both as a callee and as a caller. I'll save the discussion about re-entrant functions till the "First-class functions" section. Same goes for when functions compose in two other ways. Here I'll just note that it's interesting that the angelic nature of functions demands a lot of them, and functions, it seems, are up to the task.

There is a slight asymmetry between parameters and return values: in almost all languages, you pass 0 or more of the former, but receive exactly one of the latter. The introduction of tuples means there's a way to reconcile this apparent conflict — if what you're passing in is always a tuple, it can be both a single thing (to the implementation) and 0 or more things (to the user). It's a really nice idea, but I can't recall seeing it implemented perfectly in any language in practice. The same trick allows a language to emulate multiple return values — for some reason, I see this implemented well in many languages (C++, Perl, Raku, Bel).
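
To make the tuple trick concrete, here is a minimal sketch in Python (not one of the languages listed above, but one where the idea is easy to show): the function syntactically returns several values, the implementation sees a single tuple, and the caller unpacks it back into several names.

def divide(a, b):
    return a // b, a % b               # syntactically two values; really one tuple

quotient, remainder = divide(17, 5)    # unpacked back into two names: 3 and 2
pair = divide(17, 5)                   # or kept as the single tuple (3, 2)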


masak commented Oct 6, 2021

Coroutines

Known under many names: fibers, green threads. (The term "green threads" comes from the fact that some part of the Java implementation was called "Green" at some point.)

I've written "units of cooperative multitasking" in my notes. The operative word is "cooperative" — if one agent refuses to pass on the baton, the multitasking doesn't work.

Recall the dynamic embedding guarantee. We now weaken this in the smallest way possible: there is still only a single thread of control, but there is now a way to transfer it between active functions. ("Active" here simply means "currently being called", that is, we're "in" that function's body; there's an activation frame for it.)

This breaks the LIFO assumption of call completion: functions now get the ability to "pause" execution and hand the baton to some other active function.

In a sense, it also breaks the assumption of "single thread of control" — but it depends what you mean, actually. It does break it if you mean "we have one and only one call stack". It doesn't break it if you mean "at any given moment in time, exactly one function is executing". (Real threads break that, but not green threads.)

Interestingly, this affords a greater separation of some concerns, even though it surely feels as if we're making things more mixed-up. The reasons for this I cannot verbalize well, but it has something to do with separating concerns along producer/consumer lines.

Coroutines enable stream processing. A producer can generate data in individual sequence items, and the consumer can accept those sooner than if it had to wait for the whole sequence to be computed. The total "delay" involved is the same if we don't have real threads, but we might see partial results sooner.
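
As a minimal sketch of this (in Python, which comes up again below), a generator-based producer hands items to a consumer one at a time; neither side needs the whole sequence in memory, and control cooperatively bounces between them:

def producer():
    for i in range(3):
        print("producing", i)
        yield i                        # pause here, hand the baton to the consumer

def consumer(items):
    for item in items:                 # each iteration resumes the producer
        print("consuming", item)

consumer(producer())                   # producing/consuming lines interleave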

Unix pipelines are an excellent example of this. (Except that they can sometimes use more than cooperative multitasking.)

Russ Cox has a good article on the original reason for CSP (communicating sequential processes) being separation of concerns. That is, just as functions help us separate some concerns, coroutines help us separate others.

A demonic implementation concern that gets mentioned here is a "spaghetti stack" (or "cactus stack"). This is just the idea that the stack is no longer a singly linked list of call frames, but a tree of them. The leaves of this tree correspond to the active-or-paused functions.

Since this is our first encounter with suspending the execution of a function, we also need to bring forth what (demonically) needs to be saved for the function to be able to resume later:

  • The "instruction pointer" (which statement or part of expression we should continue on later)
  • Local state (local variables if you're the human reader, local registers if you're the machine)

This is enough as far as "information that needs to be saved" goes, but there's one additional detail that matters from an angelic perspective: assume that the keyword for "cede control to caller" is yield. What would be an appropriate value of a yield expression? The question at first appears not to make sense — we've just ceded control. But at some point, it will be handed back to us. And at that point, it might be handed back with a value, much as a function is called with an argument.

Python implements exactly this model. I think it was Erik Meijer who said that the bidirectional aspect of yield is what makes it especially difficult to give Python coroutines a sensible type.
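
A small sketch of that bidirectional model, using Python's generator protocol (yield as an expression, plus .send() on the caller's side):

def accumulator():
    total = 0
    while True:
        amount = yield total           # cede control; resume with a value sent by the caller
        total += amount

acc = accumulator()
next(acc)                              # run to the first yield
print(acc.send(10))                    # 10
print(acc.send(5))                     # 15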

Coroutines anticipate CSP (Communicating Sequential Processes) — more on which later — in the sense that you can implement the latter with the former. Or maybe better said: coroutines look like a line or a singly linked list (like Unix pipelines), while CSP looks more like a directed graph.


masak commented Oct 6, 2021

First-class functions

Taking a step in an orthogonal direction: given how absolutely awesome functions are, is there any way they could be made more awesome?

According to Christopher Strachey, yes: we can make functions first-class, that is, reified values in the same sense booleans and numbers are values.

"First-class" is not a complicated concept. It means we're allowed to do these things:

  • Pass as a parameter
  • Store as a variable
  • Return as a return value

For some reason, Computer Science rewards those who make concepts first-class citizens. My best guess as to why: Denotational Semantics already talks about "valuations" of program fragments: turning a bit of syntax into an actual value. By making those values first-class, we are allowing the program to hold and refer to those valuations directly. This tends to bring more direct power into the language. I've written elsewhere about the drawbacks of making things first-class.

The Lisp family of languages ended up being the ones who struggled with these questions. With the comfort of hindsight, it was simply a transition from dynamic scoping to lexical scoping... but in the coincidences of actual history, they were called "downward funargs" (function values passed as arguments) and "upwards funargs" (function values returned as return values). (John Shutt remarks on the fact that "funargs" implies that the "arguments"/downwards case somehow takes priority, which is indeed how the Lisp world thought of it.)

Take the "downwards funarg" case first. This is what happens when you give an anonymous function to map. Lexical scoping and dynamic scoping actually agree on what should happen here, which is what makes this the easier case: variables accessed that were declared in the outer function should still be accessible normally. Lambda calculus and the semantics of binders makes this a non-issue; it is the demonic aspects of making this work on a machine that causes any trouble at all.

Now then, "upwards funargs". These were, historically, the real problem. Lexical scoping says it should just work: the function value returned should still have access to the outer function's variables. But from the perspective of dynamic scoping, and from the demonic perspective of "returning from a function means de-allocating its activation frame", upwards funargs are a real problem.

Upwards funargs "escape" from the function that defined them. This is the first push from the world of stack allocation to the world of heap allocation.

The historical answer to all this is closures: the joining of a function with the (lexical) environment in which it was created. The environment is what gives the closure access to the necessary outer variables.
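
A minimal sketch of an upwards funarg in Python: the returned function outlives the call that created it, yet still sees the defining environment, because the closure carries that environment with it.

def make_adder(n):
    def add(x):
        return x + n                   # n lives in make_adder's already-finished activation
    return add                         # the closure escapes "upwards"

add5 = make_adder(5)
print(add5(3))                         # 8 — the environment survived the return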

In the history of Computer Science, there is no greater story than that of FP and OOP setting off in two different directions and yet reaching the same goal. From the point of view of closures, the story is how closures accidentally take on all the power of objects. The koan about Qc Na recounts this. From a demonic perspective, the act of returning a closure from its active function means the closure has to save the active function's environment, thus turning into something the shape of an object.
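
The closures-take-on-the-power-of-objects point fits in a few lines. A sketch in Python (names made up for illustration): a closure over mutable state, dispatching on a message name, behaves just like an object with two methods.

def make_counter():
    count = 0                          # "instance state", captured by the closure
    def dispatch(message):
        nonlocal count
        if message == "increment":
            count += 1
            return count
        if message == "value":
            return count
        raise ValueError(message)
    return dispatch

counter = make_counter()
counter("increment")
counter("increment")
print(counter("value"))                # 2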

To me personally, JavaScript was the language that demonstrated this, since its first-class functions are faithful enough... but historically, Scheme was the language in which these realizations were made, and embodied.

Next up is continuations. I would say that the closures⇄objects analogy gets even stronger in the form of continuations⇄actors. But all in due time.


masak commented Oct 6, 2021

Continuations

I have nothing in my notes about continuations. Fortunately, I've spent the last week implementing them, again, in Bel. I needed them for my "fastfuncs" mechanism, which is a cheaty way to make the Bel interpreter faster without having a full-fledged compiler. (I had falsely believed the fastfuncs could make calls of their own freely, but this ends up interacting badly with other continuation-based Bel functionality (such as exceptions), and so in the end, the fastfuncs need to implement continuations as well.)

Continuations are what you get if you start from regular second-class functions, and then add both coroutine functionality and first-class-ness:

  • Continuations store exactly the same data as coroutines: the instruction pointer, and the environment to use at that point. (And they accept a single value as a parameter.)

  • Continuations don't have to be first-class, but when they are, they are indistinguishable from first-class functions.

Holey manoley! I've never seen anyone describe it like that in the literature, but I believe it's true: continuations are simply first-class coroutines. Categorically, they are the pushout of coroutines and first-class functions, taking plain functions as their common base.

A corollary of this is that if your language has first-class continuations, you don't separately need it to have coroutines; they are but a special case.

My mental model is this: functions abandon their role as the fundamental unit of computation, and hand it to a smaller entity: what typically is called a "basic block", characterized by the fact that jumps/state transitions can only happen between basic blocks, never within them.

If it gives you warmth and comfort, you can think of these basic blocks as "mini-functions". They have the additional restrictions that they do not contain internal calls — a call counts as a jump, and so needs to happen at the end of a basic block.

There is a one-to-one correspondence between the "control-flow graphs" people used to draw in the 70s, and basic blocks/"mini-functions".

Continuations turn the call tree into a directed graph. In doing so, they also erase any distinction between "calling a function/passing a parameter" and "returning from a function/passing a return value" — these are both just instances of invoking a continuation. This seems to be a difference that stems from turning coroutines first-class: by mixing "control" and "values", we destroy the idea of "caller".

Continuations are useful as a basis for advanced language features: coroutines, exceptions, finally semantics, backtracking semantics... The implementation can still choose to keep continuations hidden and "second-class" in userspace, if it does not wish to confer such awesome power on the language implemented.

Coming from the other direction, even a language which itself does not expose first-class continuations could use a programming style which imitates them (using first-class functions). This is known as "Continuation-Passing Style" (CPS). The clearest example of this that I know about is Node.js with its callback parameters, circa 2009. It's now 12 years later, and async/await is recommended instead, but there was a point when CPS was your best bet.

Since I've spent the past week writing (Perl) code in continuation-passing style, I feel I'm somewhat of an authority on this style, more than I ever was when I wrote Node.js code in that style. You're writing your basic blocks as functions, which either return normally, or invoke another basic block. All you need to make sure is that all the basic blocks you need from the current one are lexically in scope. You can do that like this:

  • Declare all the basic blocks that mutually call each other (but don't initialize them)
  • Initialize them (in any order)
  • Call one of them to get things started

This covers the hardest case, that of looping and forward jumps. The cases of sequencing and choice are easier, and I won't cover them. Altogether, we can express things in such a CPS style, using calls for anything that previously required if statements, while loops, or even more exotic control flow.
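
As a small sketch of that recipe (in Python rather than the Perl I've been writing, with made-up names), here is a countdown loop written with basic blocks as local functions and a final continuation k. Note that mutation is avoided by passing the loop state along as a parameter — see also the Very Late Edit below.

def count_down(n, k):
    # Each basic block is a local function; the blocks are all lexically in scope
    # of each other (Python resolves the names at call time, so the "declare first,
    # initialize later" dance isn't strictly needed here).
    def loop_block(i):
        if i == 0:
            return done_block()
        print(i)
        return loop_block(i - 1)       # a "jump": state is passed along, not mutated

    def done_block():
        return k("lift-off")           # hand the result to the caller's continuation

    return loop_block(n)               # call one block to get things started

count_down(3, print)                   # prints 3, 2, 1, lift-off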

Pro tip: if you try this at home, you should really do it in a language that implements Tail Call Optimization. The reason is that, since continuations don't believe there's a call stack at all, they will be sorely disappointed if there is one in the implementing language, and it overflows. Amusingly, the way this resolves itself in Node.js (and in my Bel fastfuncs) is that the stack gets reset regularly by async calls.

(Very Late Edit: And, whatever you do, don't mutate your local variables! (Something like a for (int i = 0; i < L; i++) { ... } is a big no-no.) What you want to do is emulate mutation by passing such values around, kind of like the SSA approach to variables. If you don't do this, someone will take a continuation and catch you red-handed. I learned this the hard way.)

It is perhaps here it's also worthwhile to mention that, as part of the development of Scheme, Steele and Sussman found out (by implementing it) that continuations and actors are equivalent... under certain other reasonable assumptions. More precisely, passing a value to a continuation and passing a message to an actor can be seen as equivalent in power. Perhaps the biggest difference in practice is that actors are fundamentally isolated on the process level (and therefore asynchronous), whereas continuations are agnostic in this dimension: they allow it, but don't require it. To Steele and Sussman, that didn't matter much; to Hewitt, it did.

I feel I could write a lot more about continuations. I also have literally dozens of papers queued up about continuations, most of which I have not even skimmed. It's clear to me that they are powerful and fundamental, and that I still do not understand them well enough.

The removal of the basic block's power to call other functions is significant. Here is the first time we're saying "you can do many things, but you cannot do this". Continuations do not work properly if we also allow normal function calls. It is by limiting them that we make them work as they should. This act of limitation is the start of removing control from functions, and making it a global/distributed concern rather than a local one.


masak commented Oct 6, 2021

This is as far as I got tonight. Looking back at my progress, I think I would need two more evenings to finish up. I'll try to rise to that challenge.

@vendethiel

The same trick allows a language to emulate multiple return values — for some reason, I see this implemented well in many languages (C++, Perl, Raku, Bel).

That's a weird list of languages with tuple support. I'm not sure what those share in common wrt multiple return values?


masak commented Oct 11, 2021

That's a weird list of languages with tuple support. I'm not sure what those share in common wrt multiple return values?

Aye, I fear I did not do that bit justice. Not sure tuples are quite what I'm after. Let me unpack (pun totally intended) my given list of language examples in search of some unifying trend:

  • C++ actually does have tuples. Someone with a better understanding of compiler optimizations might do a better job explaining them, but my impression is that they are essentially zero-cost; the tuple's existence on the returning and receiving side of a return can be completely eliminated, and only the values themselves transferred. I guess this is only really true with a number of caveats involving inlining and whatnot. My main point is that under some circumstances, C++ truly allows multiple values to be returned with zero overhead.

  • Perl 5 does not have tuples, and makes none of the claims to efficiency that C++ makes. But what it does have is an admirable symmetry in its functions' inputs and outputs. The inputs are found in a dedicated array @_; the most general form of the outputs is a list of values. This list can easily be assigned en masse on the receiving side. Even without the efficiency, it gets full points for allowing multiple values to leave the function and be taken care of on the other side.

  • Raku is much the same (although it got civilized parameter lists on the way). I honestly forget whether what return returns in Raku is a List or a Seq, but at least it can contain multiple values. What's more, the binding operator can make it absolutely clear that passing values into a function and passing them out of a function are symmetric acts.

  • I can't speak for Common Lisp, which I know does something with multiple-value returns as well, but... Scheme has a mechanism called values which seems absolutely bizarre to me, like someone had an honest momentary lapse of language design sense. The Kernel spec similarly rips into R5RS on this point, pointing out how only a confusion of ideas could have led to something like values. Interestingly, the solution both Kernel and Bel arrive at instead is just good ol' destructuring — even JavaScript has that nowadays.

Python doesn't have destructuring as such, although it does have some mitigating ways to assign from an iterator into a tuple of lvalues.

Which only leaves Java among the more popular languages. Java has absolutely nothing in this area of design space, which seems strange until you realize that Java was put on this earth to make the von Neumann bottleneck as narrow and uncomfortable as possible.


vendethiel commented Oct 11, 2021

Common Lisp and Scheme share that values mechanism. I'm not sure why they thought stealing that particular feature was a good idea, but they did nevertheless.
If you don't explicitly ask for all the return values, you only get one.

This is similar to how it works in Lua, which is the only language with semantics for "return values" different from the plain unpacking that lots of languages have:

In Lua, a multiple value collapses to a single value in most positions, but it "auto-splats" when used as the last argument of a function call or inside a table constructor, and it is blocked by parentheses.

function f()
  return 2, 3
end
function g(...) end

local x = f() -- Cannot splat: assigns x = 2
local v = { f() } -- Can splat: assigns v = {2,3}

g(f()) -- No parentheses: calls g(2, 3)
g((f())) -- Parenthesized: calls g(2)

g(1, f()) -- Last arg: calls g(1, 2, 3)
g(f(), 1) -- Not last arg: calls g(2, 1)


masak commented Oct 12, 2021

If you don't explicitly ask for all the return values, you only get one.

This isn't even my main beef with the feature (although that sounds like a pretty horrible default).

My main beef is that it feels like their goal was to deliberately create a second-class value in the language just for passing many values... OK in general maybe, but not in the context of a language whose central feature is to build rich tree structures out of a universal structure type. It's like the values feature was designed by someone who hadn't used Lisp/Scheme for an entire day.

In Lua, a multiple value is collapsed in several positions

hnn 😞

but it "auto-splats" when used as the last argument of a function, inside an object

eww 😱

and is blocked by parentheses

One thing that is really really hard in language design, for some reason, is to make parentheses in expressions keep meaning "just grouping, nothing else".


vendethiel commented Oct 12, 2021

It's like the values feature was designed by someone who hadn't used Lisp/Scheme for an entire day.

Or most programming languages, really. I'm just describing it for the sake of completeness, but I certainly won't say I like it at all.

One thing that is really really hard in language design, for some reason, is to make parentheses in expressions keep meaning "just grouping, nothing else".

It'd be better, yes. Well, the language with the worst example of this that comes to mind right away is C++ (if a function is decltype(auto), the deduced type is different for return 3; and return (3);).


masak commented Dec 25, 2021

I haven't forgotten about this thread, but getting back into it is going to require bigger chunks of time — realistically, the next time I'll have those will be around Spring Festival.

In the meantime, I just found From Procedures, Objects, Actors, Components, Services, to Agents – A Comparative Analysis of the History and Evolution of Programming Abstractions. It comes without a strong recommendation — I have only skimmed it very lightly, not read it in detail — but the title itself was similar enough to the thrust of this issue that I wanted to mention it.

Might as well mention Sans-Papiers as First-Class Citizens, which I will need to pick apart and review in detail at some point as well.


masak commented Feb 24, 2022

On Abstraction by Eric Scrivner:

Substituting a call to a subroutine for the contents of a subroutine builds a lexicon which can be used to reduce duplication and heterogeneity. Using a pointer or reference instead of a value reduces duplication and copying. Both facilitate compression of data as well as compression of semantics. Both of these give us leverage - the ability to do a lot with a little.

It is precisely here that a risk arises. We can view the whole purpose of abstraction as the dissection of a program into its conceptual categories. This was the profound error and misstep of object-oriented programming (OOP). This style of abstraction leads to the traditional problems of reason, logic, and language as thoroughly covered by Beiser. Alternatively, we can view the purpose of abstraction as compression guided by resemblance and analogy. This style of abstraction has poetry and literature as its exemplary arts. We are given these techniques of substitution in order find the boundaries and limits of resemblance and analogy in pursuit of compression. The choice of path on this fork is the momentous decision that determines the whole of the method we pursue in construction.

This reminds me of two things. One is the point about premature abstraction that sometimes pops up when discussing abstraction with @jnthn — the hard part isn't creating an abstraction, the hard part is the mental work necessary to arrive at the right/useful abstraction. In the limit, refactoring makes us brave in constantly re-evaluating and re-creating abstractions as needed.

The other point is that Bel's predicate-based type system seems to fit the description about "compression guided by resemblance and analogy". Unlike in OOP, there's not really a reified "thing" corresponding to a class or type. Just a predicate that answers t or nil when you give it an object.


masak commented Mar 9, 2022

First-class functions

I just realized one thing.

Here's a sentence from cppreference:

When a user-defined class overloads the function call operator, operator(), it becomes a FunctionObject type.

So, a "function object" (which I've heard people call a "functor") is an object that acts like a function, in that you can use the () function call operator on it.

This is approaching first-class functions from the other direction, adding "it's a function" to an object instead of (as is traditional) adding "it's a first-class value" to a function. It's first-class functions from an OO perspective.

It clicked for me when reading this SO answer:

There are a couple of nice things about functors. One is that unlike regular functions, they can contain state.

This is a very C/C++ perspective, where "regular functions" means "lower-order functions" by default, and these don't contain state. (Um. Modulo the static keyword, of course. The thing they don't contain is the kind of "instance-level state" that comes from evaluating a function declaration into a closure at runtime.)
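
Here is a sketch of the same idea in Python (standing in for C++'s operator()): overloading the call operator on a class gives an object that is invoked like a function but carries instance state between calls.

class Averager:
    """A 'function object': callable, but with state between calls."""
    def __init__(self):
        self.total = 0
        self.count = 0

    def __call__(self, value):         # the () operator, à la C++ operator()
        self.total += value
        self.count += 1
        return self.total / self.count

avg = Averager()
print(avg(10))                         # 10.0
print(avg(20))                         # 15.0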

Of course, I realized all this on some rough level when writing the original comment about first-class functions:

In the history of Computer Science, there is no greater story than that of FP and OOP setting off in two different directions and yet reaching the same goal.

Despite this, there's something concrete and satisfying about seeing the path actually walked in an objects-and-classes language, adding () to an object just because we can, and seeing a closure pop out.

JavaScript doesn't have operator overloading, but the JavaScript MOP consists only of Object and Function, and the only real difference is that Function allows you to () on the value. Instead of directly overriding the () on a class, you just make sure to create it as a function declaration, and then you can treat it as a regular object with properties. If you squint, that's actually a kind of operator overloading, too.


masak commented Mar 16, 2022

My mental model is this: functions abandon their role as the fundamental unit of computation, and hand it to a smaller entity: what typically is called a "basic block", characterized by the fact that jumps/state transitions can only happen between basic blocks, never within them.

I don't know if it helps in any way, but "jumps/state transitions can only happen between basic blocks, never within them" — or, equivalently, "basic blocks are stretches of sequentially executed instructions" — is a kind of quotient construction; roughly a way to decrease resolution, making the notion of "instruction" bigger and more inclusive, and the notion of "(conditional) jump" more universal. Something related happens when looking for strongly connected components in a directed graph; the result is lower-resolution, but a part of the structure remains.

If it gives you warmth and comfort, you can think of these basic blocks as "mini-functions". They have the additional restrictions that they do not contain internal calls — a call counts as a jump, and so needs to happen at the end of a basic block.

There is a one-to-one correspondence between the "control-flow graphs" people used to draw in the 70s, and basic blocks/"mini-functions".

Continuations turn the call tree into a directed graph. In doing so, they also erase any distinction between "calling a function/passing a parameter" and "returning from a function/passing a return value" — these are both just instances of invoking a continuation. This seems to be a difference that stems from turning coroutines first-class: by mixing "control" and "values", we destroy the idea of "caller".

All of this is true, but (as I come back to re-read this) applies equally well to coroutines. I think the CPS transform has already happened, conceptually at least, the moment we introduce yield into a language.


masak commented Apr 2, 2022

The removal of the basic block's power to call other functions is significant. Here is the first time we're saying "you can do many things, but you cannot do this". Continuations do not work properly if we also allow normal function calls.

It bothers me a little bit that we need to talk about a removal of power here; continuations in general feel incredibly enabling, in that they literally enable other features like generators, exceptions, nondeterminism, and other wonderful types of control flow.

Note that in Continuation-Passing Style it is the style itself which imposes a restriction, and with basic blocks it is our definition of what a basic block is that imposes the restriction. It doesn't come from continuations themselves, which are highly unconstrained and non-constraining.

It's also important that the "stack paradigm" naturally suggested by regular LIFO function calls is itself very restrictive, and that by introducing continuations, we're moving to a "heap paradigm". The non-obviousness and pain of doing so was the essence of the funarg problem, especially the upwards half, where the stack is insufficient.

The stack paradigm is great, not just because it's faster in practice but because it ties an important kind of RAII thinking into the structure of the code itself. One of the tricky parts of good compilation is to get enough static knowledge of the data flow inside a function to determine that nothing continuation-shaped escapes, so that things can be stack-allocated and therefore faster. The nature of such determinations means that we can do this less often than we'd like (and even when we can, it takes a bit of work).


raiph commented Apr 2, 2022

I'm curious if you get anything useful out of this.

(And/or a reddit discussion about that article.)


masak commented Apr 8, 2022

I'm curious if you get anything useful out of this.

Yes! My immediate reaction is that "substructural type systems are cool and useful" — vide Rust, and Perceus, and, I dunno, unique_ptr (for all its flaws and weaknesses). Linear types can change the world, and all that.

But I think I'll leave it at that. There's a sense in which the whole substructural thing falls under the heading "Types", which I almost didn't even include in the abstraction ladder/table of contents/topic plan for this issue. I have an irrational fear of type systems, equal and opposite to Bob Harper's love of them. I mean, clearly the regimentation they impose ends up helping and helps us see further. But on whose authority except its own does the type checker reject my program? I'm not hating on type checkers, I just don't think we should be slaves under them.

I have more to say/write about well-foundedness, and the distinction between what Harper calls T++ and PCF++ (which mirrors what Hofstadter calls BLooP and FLooP) — it seems to me that ownership-tracking is to memory what T++/BLooP is to termination/totality/decidability. Possibly there's even a direct connection — but even if there isn't, the topics feel a little bit related.


masak commented Apr 8, 2022

Relevant to the way continuations generalize regular functions and the stack paradigm (from a 2012 blog post by John Shutt about guarded continuations):

When a function f calls another function g, f gives up some of its ability to regulate the future evolution of the computation. The programmer tends to assume that not all such power has been lost — that (1) the call to g won't terminate without control passing back through the point at which f called it, and (2) that particular call to g won't return through that point more than once. In the presence of first-class continuations, though, even these weak assumptions needn't hold.

I called (1) above the "dynamic embedding guarantee". I don't think I spelled out that continuations obliterate the dynamic embedding guarantee.

Also note how (1) is about (returning) "at least once", while (2) is about "at most once". Continuations but with (2) are sometimes called "one-shot continuations", I think.


masak commented May 25, 2022

A quote I just ran into, which corresponds to where I was planning to make this issue end up:

"The most important problem right now in computing is — how do we deal with concurrency and distribution?" — Philip Wadler - Propositions as Types (Q&A)


masak commented Jul 12, 2022

Parts of this story gain a sharper edge by more carefully separating the pure/applicative/expression world from the monadic/heap-memory/sequential world, like PFPL does, which I'm currently reading with great appreciation.

On page 307 (of the paper book, second edition), a bit after the midpoint of the book, things culminate in this factorial procedure declaration:

proc (x:nat) {
  dcl r := 1 in
  dcl a := x in
  { while ( @ a ) {
      y ← @ r
    ; z ← @ a
    ; r := (x-z+1) × y
    ; a := z-1
    }
    ; x ← @ r
    ; ret x
  }
}

Given that what we're expressing here could be written in Raku as sub ($x) { my $r = 1; for 1..$x -> $i { $r *= $i }; $r }, some explanation of Harper's Modernized Algol syntax is in order:

  • dcl introduces a (stack-allocated) assignable
  • @ reads from an assignable (into the monadic world)
  • ← binds the result of a monadic computation to a variable (in the pure world)
  • := outside of a dcl evaluates its (pure) rhs and writes the value to an assignable
  • ret is the only way to communicate a value from the monadic world back to the pure world

Assignables only exist in the monadic part. Variables are typically associated with the pure part.

I wrote this earlier:

(We'll stay silent on whether a function is "allowed" to perform side effects, by which I mean we'll skirt the issue and implicitly allow it. Part of the reason is that the story about side effects is still very much being written/explored, by languages like Rust and Koka.)

The entire point of Harper/PFPL's separation is that side effects are locked into the monadic half of the language, letting the pure part keep many of its algebraic/equational properties. This is nothing new, of course; Haskell does exactly this with monads, and Koka does exactly this with effects.

All the talk about side effects anticipates the later talk about process isolation and actors. It's by making the side effects and the heap memory explicit (like PFPL does) that we can later isolate those things, and hide/encapsulate them inside something.


raiph commented Jul 13, 2022

It's by making the side effects and the heap memory explicit ... that we can later isolate those things, and hide/encapsulate them inside something.

In recent years I've been thinking of 6model as "actor model ready". Do you get what I mean? Am I deluded? Have I mentioned / are you aware of Pony and ORCA?


masak commented Jul 14, 2022

In recent years I've been thinking of 6model as "actor model ready". Do you get what I mean?

I think so, but to the extent I do, I'm not sure I agree. I'll quickly add that this might be because you have pieces of the 6model puzzle that I lack.

In order to answer in more detail, I need to anticipate a few points from the future of this thread, specifically the last two topics in the projected list of "module-like things":

  • Communicating Sequential Processes
  • Actors

But as a baseline, the story must start with the status quo of shared memory, threads, and locks. I'll take all those mechanisms as given — there's oodles of literature on it — but I will mainly point out here that locks don't compose and threads don't scale. This book, specifically, spends the early parts showing how a perfectly well-designed class for single-threaded use can break under many-threaded use — it's humbling — the problem is a bit like "monsters from the 9th dimension" in that it manifests orthogonally to the correct code you wrote — and then spends the later parts advocating safer/higher-order building blocks, like atomic data, built-in concurrent data structures, and architectures built on the producer-consumer metaphor. (Further confirmed with BG's answer here of also covering "fork-join, parallel decomposition, and the new parallel bulk data operations".)

Communicating Sequential Processes takes the interesting initial step of making communication, not assignment (to shared memory), the primitive operation. Specifically, this is a two-party action, so every such communication is also a synchronization point, and establishes a happened-before relation à la Lamport. This simple shift in perspective makes things scalable, and the focus now shifts to the (quite possibly dynamic) structure of communication channels between processes. Shared mutable memory is no longer a problem, because there's no global memory to share; you can only share along channels.
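
As a minimal sketch of that shift (in Python, using queue.Queue as a stand-in for a CSP channel): the two sides never touch shared mutable state directly, and every put/get is both a communication and a synchronization point.

import queue
import threading

channel = queue.Queue(maxsize=1)       # a stand-in for a CSP channel

def producer():
    for i in range(3):
        channel.put(i)                 # communicate; blocks until the consumer is ready
    channel.put(None)                  # sentinel: no more values

def consumer():
    while (item := channel.get()) is not None:
        print("got", item)

threading.Thread(target=producer).start()
consumer()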

Actors similarly take a new primitive operation: (asynchronous) messaging. Actors are by construction so well-protected from each other that not only is the "shared mutable memory" problem completely defined away, but on their own, actors don't seem to be able to synchronize anything among themselves. I'm still learning about this, but one solution seems to be synchronizers, which could maybe be seen as a common region in which synchronization on shared resources between actors can happen.

These two models are clearly related. In fact, we can encode either model using the building blocks of the other. Which means we can zoom out to a sufficiently high-altitude perspective where they both look the same, and present a common solution next to threads/locks: while threads and locks start out with a fundamentally un-synchronized resource (shared mutable memory) and then try, unsuccessfully, to apply massive amounts of cleverness and discipline to add synchronization in just enough places to restore correctness, CSP and actors start out with a fundamentally synchronized resource, and then spend the rest of their lives happily ever after. (Although actors also need to add synchronizers as an extra component.)

Threads and locks are still popular despite being fundamentally unworkable, because people naturally tend to think of a solution as a concrete feature. CSP and actors start out by removing and restricting; more of a "negative feature". There's a quote about Dijkstra somewhere about how his main insight about concurrency is that it's not about adding something (such as "threading"), it's about removing a restriction/assumption (such as "the order of the statements (alone) determines the sequencing of the operations at runtime").

The story doesn't end there, either. There's something really interesting going on with what Rust and Koka are doing, basically exploiting the fact that "has a unique owner" is a necessary and sufficient condition for mutability. This is like carving out a "safe space" from the shared mutable memory, and the fact that something shaped a bit like a type system can do that is... interesting. You don't need to write actor { ... } around things to protect them, you just show statically that unique ownership is maintained.

Tying everything back to whether 6model is "actor model ready" — it would need to be in terms of some of the above primitives, I think. CSP/channels, or asynchronous messaging with provably no sharing, or (à la Rust) just provably no sharing. I'm not sure if 6model is more or less ready for any of those than other object systems.


masak commented Jul 14, 2022

Have I mentioned / are you aware of Pony and ORCA?

@raiph I don't think so, on both counts.

Skimming the linked page, it sounds amazing, but it's also extremely short on specifics. I would be a thousand times less doubtful of the claims (sound + complete + concurrent) if there was a link to a paper containing algorithms and a number of proofs. Several questions arise, such as:

  • How can a dedicated actor collect other actors using just message-passing?
  • How, specifically, are the very different form factors of reference counting (which is usually centralized) and actor distribution (which has no built-in synchronization mechanism) reconciled?
  • When the word "transitively" is used, that implies a non-local property that needs checking in non-constant time. How is that property checked in a world where a number of synchronization mechanisms are proudly enumerated as not being used as the solution? (I.e. once you have traversed the object graph in order to confirm the "transitively blocked" property, how do you know that the nodes you checked at the start are still unchanged? "Blocked" implies "has no pending messages in its queue" — what prevents new messages from arriving anywhere in the graph while we are traversing it to assert all actors have empty queues?)

MoarVM's garbage collector is pretty cool, but it's far from this design along several dimensions: it's tracing/generational, not reference-counting; and it's based around threads and condvars, not actors and a mysterious lack of a need for synchronization.


masak commented Jul 14, 2022

There's a quote about Dijkstra somewhere about how his main insight about concurrency is that it's not about adding something (such as "threading"), it's about removing a restriction/assumption (such as "the order of the statements (alone) determines the sequencing of the operations at runtime").

Ah; the quote I thought about is from this blog post:

Non-determinism

In 1967 R.W. Floyd had introduced non-determinism as an additional language feature. The proposal concentrated on how to implement it. For EWD it was a subtraction: by avoiding what he considered overspecification, it became a natural consequence of the semantics of the guarded-command language [footnote].


masak commented Jul 19, 2022

A further thrust at the question of 6model's actor-model-readiness. Here's a quote by Robin Milner from his Turing award lecture:

Now, the pure lambda-calculus is built with just two kinds of thing: terms and variables. Can we achieve the same economy for a process calculus? Carl Hewitt, with his actors model, responded to this challenge long ago; he declared that a value, an operator on values, and a process should all be the same kind of thing: an actor.

This goal impressed me, because it implies the homogeneity and completeness of expression ... But it was long before I could see how to attain the goal in terms of an algebraic calculus...

So, in the spirit of Hewitt, our first step is to demand that all things denoted by terms or accessed by names—values, registers, operators, processes, objects—are all of the same kind of thing; they should all be processes.

I think this pinpoints something important: when Hewitt says "everything is an actor", he means something concrete/specific about his imagined system. 6model can reasonably claim that "everything is an object", and back that up with evidence in terms of abstraction/encapsulation boundaries. For it to take the step to claiming that "everything is an actor", it would need to witness that in terms of process boundaries and inviolable message-passing mechanisms, as the foundation of the whole model. (But when it's put like that, I'm not sure that is, or should be, the goal of 6model.)

If what we mean is "some things can be actors" (rather than the whole system being actor-based), then I believe this was conclusively asserted with the release of jnthn's oo-actors module in 2014. 😄


masak commented Jul 21, 2022

Here's another reference for future reading and incorporation into the themes of this thread: The Purely Functional Software Deployment Model (Eelco Dolstra's PhD thesis). It's the basis for the Nix package manager, whose main selling point is that identical inputs give identical outputs — that is, builds are reproducible.

As a relevant aside, it would appear to me that golang is following a convergent evolution here, independently discovering the value of reproducible builds [1] [2]. I hope to get back to both Nix and golang/vgo when I write about "Packages", listed in the OP as one of the tour destinations. The thrust of my argument will be just this, that as an abstraction, packages are not stable/usable unless they guarantee reproducible builds. It's a wonder we get anything done in mainstream software engineering, absent that guarantee.


masak commented Jul 29, 2022

The abstraction ladder, identified in the OP, also spans another axis: that between values and resources. Computer science tends to talk about values, but software engineering has an understandable focus on resources.

What's a resource? A database handle is a resource, a file handle is a resource — or they stand in for/represent resources, maybe. It depends how you categorize things. Threads and green threads are resources. Memory (including too-big-to-copy values in memory) is a resource. Input/output on a port is a resource (or the port itself is a resource?). Communication bandwidth is a resource. Network access is a resource. The abstract concept of compute (as in, computational throughput) is a resource. If I plugged a quantum co-processor from 2043 into my machine, it could constitute a resource.

I can give these specific examples, and I feel there is something that they have in common, but I can't clearly delineate that something. Values can avoid being about "the real world" in some sense, but resources are anchored in the real, physical world, and inherit limitations/properties from it. There are probably physicists who love that connection, whereby physics kind of bleeds into computer science.

It feels like there would be some straightforward connection to effects (and coeffects). Are effects simply about acting on a resource? The resource is the noun, the effect is the verb?

At its most pure, lambda calculus can be all about values. At their most expansive, CSP and actor systems can be all about resources. The abstraction ladder seems to straddle the values-resources axis. I wish I understood better how these things are connected.


masak commented Jul 30, 2022

But as a baseline, the story must start with the status quo of shared memory, threads, and locks. I'll take all those mechanisms as given — there's oodles of literature on it — but I will mainly point out here that locks don't compose and threads don't scale.

I just stumbled over the userver framework, which looks nicely designed and quite flexible/powerful. This is what met me in the first tutorial I decided to click on:

Warning

Handle* functions are invoked concurrently on the same instance of the handler class. Use synchronization primitives or do not modify shared data in Handle*.

In other words:

  1. Here be dragons (race conditions).
  2. By notifying you (the framework user) of these risks, we (the framework authors) have done our due diligence; race-free semantics is not part of the framework itself, it's part of a general vigilance/discipline wherein sharing/modification is always guarded by the appropriate synchronization.
  3. What, you thought this would be easy?

Later edit: It's all well and good to be sarcastic, but coming back and reading the above, I didn't want that to be my main point/final word. Specifically, the authors of the userver framework are operating well within the expectations given to them. Concurrent access is just that big of a challenge; I think it's fair to say none of us knows the final answer. People in the Rust and Koka communities can see a bit further, sure, but in the end we're all still hunting for solutions, trying them out, and learning for the next iteration.

@masak
Copy link
Owner Author

masak commented Jul 30, 2022

I want to call attention to this polemic post whose main message is this:

Provided your problem admits certain concessions that let you retreat from full-on garbage collection into reference counting, you're going to come out way ahead in the bargain.

I realized that I agree with this particular sentence because it's qualified, but I disagree with the message of the rest of the post, because I don't believe the qualification ("...your problem admits certain concessions...") holds in a general-purpose language.

In the post above about first-class functions, we showed that the first-class functions that people historically called "upwards funargs" have a tendency to escape their location of birth, thereby bringing their lexical context with them in a closure:

Upwards funargs "escape" from the function that defined them. This is the first push from the world of stack allocation to the world of heap allocation.

Of course, I should have also spelled out the static embedding guarantee that this breaks along the way:

  • When a function is called, the environment of the caller is a compatible subset of the environment of the called function.

This is true (a) if there's only ever one global environment and not many small lexical ones, or (b) if functions are not first-class/mobile. I know that's a little bit abstract, so let's break the guarantee three times:

In Bel:

$ perl -Ilib bin/bel
Language::Bel 0.58 -- msys.
> (def make-closure () (let env "closure" (fn () env)))
> (let c (make-closure) (let env "caller" (c)))
"closure"

In Perl 5:

$ perl -wle'
    sub make_closure {
        my $env = "closure";
        sub { $env }
    }

    my $c = make_closure;
    my $env = "caller";
    print($c->());
'
closure

In Alma (untested, but I have no reason to think it wouldn't work; #577):

func makeClosure() {
    my env = "closure";
    return func () { return env };
}

my c = makeClosure();
my env = "caller";
print(c());            // closure

Most modern mainstream languages have "given in" and introduced a mechanism that allows this kind of freedom of movement, whether via escaping first-class functions, or via heap objects being allowed to reference other heap objects. This is where the abstract structure of objects and references in memory turns from a tree into a general graph, specifically one that can contain cycles.

That's also the point where reference counting becomes problematic, and garbage collection turns into a more robust solution. The qualification in the post ("Provided your problem admits certain concessions...") is no more nor less than the static embedding guarantee: things are not allowed to escape their location of birth, or mutually reference each other, directly or transitively.

(Wait, what? Where did the "mutually reference" condition come from, weren't we talking about functions and lexical scopes? It comes from the Y combinator, which can be seen as a device that transforms firstclasshood of functions into function self-reference. Similarly, any "tying the knot" technique transforms freedom of heap reference (during object construction) into a cyclical object graph.)
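To make the cycle problem concrete, here's a hand-rolled sketch of naive reference counting in TypeScript (the Rc class and Node shape are made up for illustration; this models no real runtime's API). Once two objects point at each other, dropping all the outside references still leaves each count at 1, so nothing is ever freed:

class Rc<T extends object> {
    count = 1;
    constructor(public value: T) {}
    retain(): this { this.count++; return this; }
    release(): void {
        if (--this.count === 0) console.log("freed");
    }
}

type Node = { other?: Rc<Node> };

const a = new Rc<Node>({});
const b = new Rc<Node>({});
a.value.other = b.retain();     // a -> b
b.value.other = a.retain();     // b -> a: the object graph now has a cycle

a.release();                    // drop the "outside" references...
b.release();
console.log(a.count, b.count);  // 1 1 -- the counts never reach zero

A tracing collector has no such problem: it starts from the roots, never reaches the cycle, and reclaims it.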

The HN discussion seems pretty balanced. The top comment right now is saying that this is a never-ending and unwinnable debate. The top answer to it is reporting spectacular success with (mostly) reference counting.

The abstraction ladder also spans a spectrum between the specific (like reference counting) and the general (like garbage collection). Maybe the real lesson is that there is no one-size-fits-all point on that spectrum.

@masak
Copy link
Owner Author

masak commented Aug 9, 2022

I'm getting ahead of myself here, but not by a whole lot. I realized on the way to work today that there's a great transition between the idea of functions and the idea of modules.

Briefly, it's this: thanks to lexical scoping, functions own much of their data, and it's perfectly hidden from the world. (We'll sweep complexities around stack/heap allocation, first-class closures, and escaping under the rug right now. Just assume they are appropriately taken care of.) Modules are primarily that, a "perfectly hidden" guarantee for some data, but wrapping a group of functions. Keywords like export tend to switch off this perfect hiding; sometimes, inspired by the OO vocabulary, we'll be speaking of private and public instead, even though it's a module.

Various related concepts make this effect clearer. Separate compilation for optimizing the data parts that are private to the module. Types for communicating (statically) across module boundaries. Contract-based programming for refining requires/ensures relationships even further, and separation logic to reason about what memory is owned by what module.

That's it. Modules are, in this sense, a kind of "jumbo function" — the "jumbo" concept borrowed from Levy — they are the answer to the question "what is the biggest programming language feature we can imagine with the features we appreciate in functions?".

@masak
Copy link
Owner Author

masak commented Apr 26, 2023

So "nondeterministic [algorithm]" really means "not necessarily deterministic [algorithm]".

I'm almost sure a similar relationship holds between "parallel" and "sequential". Rather than those two being opposites, "sequential" is a special case of "parallel".

I forget the source, but I'm quite sure I've heard some famous person state this, in a lecture video or in text. Something like "a sequential run is a perfectly good ordering of a parallel program". I thought it would be in Parallelism is not Concurrency by Bob Harper, but it seems I misremember. (Edit: It's in Harper's summer course on YouTube, part 2, at the 1:22:40 mark: "The essence of parallelism is sequentiality. (What?) Because, how do I give rise to parallel execution? [By saying, these two processes P and Q are sequenced before their common join point, but not sequenced wrt each other.]")

Conceptually, race conditions in parallel programs can be eliminated by judicious insertion of synchronization, which makes the code "more sequential". Making the code fully sequential is just the ultimate or non-judicious form of inserting synchronization.

I did find this course, though, with a relevant quote from Dijkstra:

From the past terms such as "sequential programming" and "parallel programming" are still with us, and we should try to get rid of them, for they are a great source of confusion. They date from the period that it was the purpose of our programs to instruct our machines, now it is the purpose of the machines to execute our programs. Whether the machine does so sequentially, one thing at a time, or with considerable amount of concurrency, is a matter of implementation, and should not be regarded as a property of the programming language.

(I think Dijkstra is using first "parallel programming" and then "concurrency" but meaning the same concept both times. Which is a bit ironic/dissonant, because the section I linked to is all about explaining the difference between the concepts "parallel" and "concurrent".)

@masak
Copy link
Owner Author

masak commented May 9, 2023

I wanted to write down a kind of taxonomy about concepts like "nondeterministic", "probabilistic", "concurrent", "parallel", and "distributed" — these are all different concepts, and distinguishable ones, but I have the nagging feeling that people use them in inconsistent and/or overlapping ways.

It gets worse.

In this 1961 paper, John McCarthy introduces amb, an "ambiguity operator".

Ambiguous functions are not really functions. For each prescription of values to the arguments the ambiguous function has a collection of possible values.

In modern terminology, McCarthy is describing a multivalued function. (Which, by the red herring principle, isn't technically a kind of function, although the converse holds true.)

I decree that this usage of "ambiguous" is exactly what Floyd and Lamport mean when they say "nondeterministic". It's all about the ability to transition to a set of N states, instead of at most one state.

Paul Levy uses the term "erratic choice" in his dissertation (section 6.5), but I think this is simply the same as probabilistic choice.

The meaning of choose { ..., i.M_i, ... } is "choose some i ∈ I, then execute M_i".

I can imagine some people meaning the same thing by "distributed" as they mean by "parallel". Then again, there are other people who don't.

I thought I would be able to say something coherent about the difference between "concurrent programs" and "probabilistic programs", but it turns out that when I try, I can't.

@masak
Copy link
Owner Author

masak commented May 17, 2023

Before the #302 quote that launched this whole issue, there's another paragraph that name-drops Newton and Einstein, pointing to a similar shift from "absolute" to "relative" in models of computing:

Special relativity replaces Newton's concept of an absolute, unmoving space, showing that (at high speeds) observations are dependent on one's frame of reference. A similar revolution happened in mathematics (much more silently) when Grothendieck introduced his "relative point of view" (of which I won't say more since I'm not an expert). Quantum mechanics replaces Newtonian mechanics, showing that (at small scales) observations are always participatory and invasive into the observed system. All of these examples take a simple, "obvious" paradigm that's worked for a long time, and complicate it irreversibly in a way that turns out to be necessary. All of them, incidentally, complicate the simple model in just about the same way: by suggesting that there's a relational view, a non-solipsistic perspective, an "outside" or second party which cannot be ignored in the modeling. It's such a common trend that it almost feels a bit like a common failure mode of western science.

(The me of two years later agrees with all of the above, except perhaps "failure mode" as a choice of words — starting from the monistic and absolute is not a sign of failure, it's just a natural application of Occam's razor.)

I explore an aspect of that in this Bel issue comment, asking "what if calling a function also switches the interpreter"? The result, it seems to me, is the actor model.

Any number of normal metacircular interpreters, from LISP 1.5's half-page classic to Reynolds' "Definitional Interpreters" gem, are Newtonian/absolute in the sense that they don't switch the interpreter when going into the called function. The interpreter is outside the system being interpreted; just as in classical physics, there's a clean separation between the observer and what's being observed.

Function calls in Dodo are not a single piece of code. Instead, they initiate a message-passing session between two (possibly different) interpreters, which we can think of as actors. There's no "one true interpreter" orchestrating all that; there's just the caller's interpreter sending a message to the callee's interpreter. (One thing that immediately falls out of that is the possibility of asynchronous calls. We talked about coroutines/green threads earlier in this thread. This is similar but distinct; this is more like shared-nothing "true concurrency", something like what we'd call processes, perhaps. Async is the default between those, and synchronization takes extra work.)

I dunno, I just think that all this is fascinating. There really is a bigger story here. And I haven't even mentioned positive types and negative types about which I've learned a lot recently; they seem to follow the same kind of absolute/relative subdivision. (Positive types are like trees or abstract data types or syntax or least fixpoints. Negative types are like streams or actors or FSMs or graphs or greatest fixpoints.)

@masak
Copy link
Owner Author

masak commented May 27, 2023

I wanted to write down a kind of taxonomy about concepts like "nondeterministic", "probabilistic", "concurrent", "parallel", and "distributed" — these are all different concepts, and distinguishable ones, but I have the nagging feeling that people use them in inconsistent and/or overlapping ways.

Here's another stab at this nomenclatorial behemoth.

This time starting out from my own conception (correct or not) that "concurrent" means "several coroutines", and "parallel" means "several processes". Going to check that against jnthn's definition in his 2019 keynote:

Concurrency

Trying to get the right result when we have multiple, possibly competing, tasks with overlapping start/end times

We don't choose concurrency
Concurrency chooses us

Parallelism

Exploit multi-core hardware to do the same task, and deliver equivalent results, but in less wallclock time

Concurrency is part of the problem domain.
Parallelism is part of the solution domain.

With concurrency, correctness is domain specific.
With parallelism, correctness is just equivalence.

Looking closely at these definitions, I would say my definition of "concurrency" as "several coroutines" simply doesn't hold up. I mean, having several coroutines places us in a situation where we have "multiple possibly competing tasks", but here's a non-exhaustive list of non-coroutine scenarios that also do that:

  • Server/client communication
  • CSP
  • Kahn networks
  • Actors
  • Threads, for goodness' sake
  • The user pressing a key as some JavaScript code is running
  • The next value of a stream showing up
  • Generator functions
  • Several read/write requests to the same database

Concurrency is part of the problem domain. We don't choose it, it chooses us. (Well, some of the above are implementation strategies, but still.)

I think I would summarize concurrency as "things happen at the same time". It's not really a kind of "instantaneous simultaneity", though — it's more like overlapping start/end times.

Parallelism is easier to pin down, and I think I already had a correct understanding of it. It's all about using multicore or multi-something else in order to reduce wallclock time. Yes, there are variants such as "data parallelism" and "task parallelism", but those are details.

@masak
Copy link
Owner Author

masak commented May 27, 2023

Found this on Wikipedia:

Concurrent data structures are significantly more difficult to design and to verify as being correct than their sequential counterparts.

This is true. Still, I wonder what it would be like to live in an alternative time line, where concurrency was assumed from the start, and all data structures were concurrent (thread-safe, etc.), otherwise they were considered broken.

As it is, we all kind of start with the unsafe ones because they are easy to think about and usually much more within reach (grabbing HashMap before even thinking about ConcurrentHashMap), and then we act all surprised when later, things go wrong and we get data races.

@masak
Copy link
Owner Author

masak commented Jun 16, 2023

While lurking around on nlab a while ago, I found this:

  • A partial function f: A → B is like a function from A to B except that f(x) may not be defined for every element x of A. (Compare a multi-valued function, where f(x) may have several possible values.)
  • A multi-valued function f: A → B is like a function from A to B except that there may be more than one possible value f(x) for a given element x of A. (Compare a partial-function, where f(x) may not exist at all.)

I want to say several things at once. (Heh.) First off, I think that this contrasting formulation was genuinely new to me — where previously I had been fully aware of partial functions and (separately) multi-valued functions, I had considered them completely separate models/problems, and had not drawn any parallels between them. I had seen them show up (separately) in many places in both PL theory and programming practice, but I hadn't considered them part of the same "continuum", as it were.

Secondly, both "partial function" and "multi-valued function" are instances of the red herring principle. To spell it out, a function maps each element of the domain to exactly one element of the codomain. This "exactly one" is the characteristic feature that makes a certain small class of relations functions. But the "partial" in "partial function" changes "exactly one" to "zero or one", or "at most one", or "lone". A partial function is not a type of function, it's a widening of the concept of function (that is, all functions are (unintuitively) partial functions); it's a loosening of the characteristic constraint.

Exactly analogously, the "multi-valued" in "multi-valued function" changes "exactly one" to "any number of", "zero or more", or "set of". A multi-valued function is not a type of function, it's a widening of the concept of function (that is, all functions are (unintuitively) multi-valued functions, as was pointed out above); the "exactly one" in the original constraint gets re-imagined as a singleton set, which is a special case of all possible subsets of the codomain.

Thirdly, I now think I see how these two conceptual models (partial functions and multi-valued functions) keep popping up as language features. Just two examples:

  • Optional chaining is a way to manage failure; specifically to turn mindless nested "pyramid of doom"-style reasoning about possible failure at many points into a flat, linear access path that joins the rails into a monochromatic failure rail that can be handled uniformly at the end.
  • Imagine you're implementing a function to generate all the permutations of a list of values. You'll probably settle on some backtracking solution, maybe using recursion, or maybe some loop-and-stack solution. And then Curry comes along, and implements it like this:
perm :: [a] -> [a]
perm []     = []
perm (x:xs) = insert (perm xs)
 where insert ys     = x : ys
       insert (y:ys) = y : insert ys

Note there's no backtracking logic at all. Instead, the insert function has "overlapping clauses", in that for a list with at least one element, both clauses can match. Curry's semantics "takes all the options", non-deterministically. It doesn't really matter how it does it, and that's the point — this code is so beautiful because it only states what should be done, and it doesn't encode a backtracking tree search or anything. It's just naturally multi-valued/nondeterministic.

Somewhere else among the Alma issues I wrote about "changing the meaning of expressions/evaluation". It feels to me that that's what's going on in both of these examples. There's a wrapper, enabling/encapsulating either partiality or multi-valuedness/nondeterminism, and then there are individual features inside which (like ?. or the overlapping clauses) tweak the behavior to take advantage of the altered evaluation.

I think I haven't seen it this clearly before, partly because the wrappers are not always very explicit. In JavaScript, there's like an invisible wrapper around the whole expression, or at least to the nearest coalescing ??. In Curry, well, I'm not sure exactly how the wrapping works, although I notice findAll from the standard Prelude can convert a nondeterministic computation into a list of results.
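Here's a small TypeScript illustration of that invisible wrapper (the User shape is made up for the example): every ?. may bail out of the access path with undefined, and a single ?? at the end handles all of those bail-outs uniformly; that's the "monochromatic failure rail" from the optional-chaining bullet above.

type User = { profile?: { address?: { city?: string } } };

function cityOf(user: User): string {
    // any missing link short-circuits to undefined; ?? catches it at the end
    return user.profile?.address?.city ?? "unknown";
}

console.log(cityOf({ profile: { address: { city: "Lund" } } }));  // "Lund"
console.log(cityOf({ profile: {} }));                             // "unknown"
console.log(cityOf({}));                                          // "unknown"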

Fourthly, even though the partial case and the multi-valued case are "opposites", in the sense that they mostly care about different sides of "exactly one", it's also true that the multi-valued case subsumes/includes the partial case. However, I think that's theoretically interesting, but not always called out as relevant in a practical design.

@masak
Copy link
Owner Author

masak commented Aug 4, 2023

It's by making the side effects and the heap memory explicit ... that we can later isolate those things, and hide/encapsulate them inside something.

In recent years I've been thinking of 6model as "actor model ready". Do you get what I mean? Am I deluded? Have I mentioned / are you aware of Pony and ORCA?

@raiph Hat tip to Deny Capabilities for Safe, Fast Actors. I wasn't aware of this paper before, even though it's from 2015. There's a talk video, where the connection is made with Pony (which it isn't in the paper, from what I can see).

(Edit: This page on the pony website also mentions a blog post:)

If you are looking for a slightly more academic take, Adrian Colyer did an excellent write up of Deny Capabilities for Safe, Fast Actors. It’s an overview of Pony’s reference capabilities.

The idea of deny capabilities (and specifically, saying what aliased references are not allowed to do) seems both non-obvious and "obvious in retrospect" — as in, I think some variant of this might well be the way forward for the ever-vexing challenge of sharing mutable state while also preventing data races "by construction".

@masak
Copy link
Owner Author

masak commented Aug 4, 2023

So what is a nondeterministic algorithm? I make here a provisional attempt at a definition.

A transition relation describes the transitions that are allowed between states inside the system. If this relation is univalent (or functional), that is, from each state (and event) the relation maps to exactly one target state, then the algorithm described by this transition relation is deterministic. Whether or not the univalence restriction holds, the algorithm is nondeterministic.

I believe that's weird but true. Nondeterminism (as defined and used in CS) is not the opposite of determinism; rather, it's a generalization of it.

I stand by this definition, and I don't wish to change it. But (as mortals embedded in a single time line), our perspective on running nondeterministic algorithms is interesting.

This is where people, like Floyd, jump directly to backtracking.

Rather, [nondeterministic algorithms] are convenient representations of systematic search procedures.

So, people read "nondeterministic choice" in the algorithm, and their brain immediately codes up the corresponding Floyd-style backtracking search.

This is not exactly wrong, but it's a bit of a false equivalence. Instead, imagine this:

my choices = [];
for 1..8 -> row {
    my col = chooseCorrect(1..8);
    assertNoThreat(row, col, choices);    # implementation elided
    choices.push(col);                    # record this row's column choice
}
print(choices);

This program makes 8 iterations, doesn't backtrack anywhere, and always prints one of the 92 solutions to the 8-queens puzzle. Its only minor flaw is that (as far as we know) it doesn't run on a universal Turing machine.

The secret is in the implementation of chooseCorrect: it picks a number out of 1..8 uniformly at random, except that it never picks anything that would make assertNoThreat fail, in this iteration or any future one. If you will, you can think of chooseCorrect as infinitely lucky; via a process neatly hidden from view, it always makes good choices and never bad ones.

Or, you know, lest someone accuse you of not thinking 4th-dimensionally: imagine all possible runs of the program as a directed graph of states and transitions. (It's actually not that big. A couple of thousand states? 15721 states.) Now prune this graph, removing any choices that always lead to assertNoThreat failing. (551 states.) chooseCorrect makes uniform choices based on this graph, always picking a choice that wasn't pruned. Most of those uniform choices are "forced", in the sense that there is only one possible choice; it's only the first two-or-three choices that exercise any actual nondeterminism.

Yet another pons asinorum explanation: chooseCorrect "knows what is going to happen", seemingly for free. It can see the future, presented to it as an already-available thing, just like the AI elevators in The Hitch-Hiker's Guide to the Galaxy, or like the Predictor in Newcomb's problem.

That is what nondeterminism is about. I've heard this particular style of it called angelic,

from the Christian religious conventions of angels being benevolent and acting on behalf of an omniscient God.

Just like you can't embed a Klein bottle into 3-space, you can't embed an angelic-nondeterministic program into determinism-space. I believe this statement is ultimately equivalent to "P ≠ NP".

@masak
Copy link
Owner Author

masak commented Aug 4, 2023

...of course, this true nondeterminism, with a magical chooseCorrect acting as your personal Gladstone Gander, is closely connected to Floyd's backtracking.

Namely, the backtracking approach is the canonical way to compensate for not having a magical chooseCorrect. Instead, we have to guess, and (assuming P ≠ NP), we will guess wrong much of the time. Backtracking represents saying "oops, this was the wrong branched-universe to be in, let's back up and try another branch". It's an (exponential-time) "tax" we pay when imperfectly embedding a nondeterministic algorithm into our one-branch-at-a-time trial-and-error timeline.
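As a sketch of that tax, here is a standard backtracking version of the 8-queens program above, written in TypeScript. Where chooseCorrect "just knows", this version tries each column in turn, recurses, and undoes the wrong guesses on the way back:

function queens(n: number): number[][] {
    const solutions: number[][] = [];
    const cols: number[] = [];                  // cols[row] = column chosen for that row

    const safe = (row: number, col: number) =>
        cols.every((c, r) => c !== col && Math.abs(c - col) !== row - r);

    const place = (row: number) => {
        if (row === n) { solutions.push([...cols]); return; }
        for (let col = 0; col < n; col++) {
            if (safe(row, col)) {
                cols.push(col);                 // guess...
                place(row + 1);
                cols.pop();                     // ...and back up when the branch is exhausted
            }
        }
    };

    place(0);
    return solutions;
}

console.log(queens(8).length);                  // 92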

@masak
Copy link
Owner Author

masak commented Aug 8, 2023

I've heard this particular style of it called angelic

Although whether your program is in the presence of an angel or a demon/adversary is, of course, domain-dependent.

I just came upon the quote from this paper again:

The possible instantiations of our generic framework include non-deterministic systems and probabilistic systems, but do not yet include systems with both non-deterministic and probabilistic branching. The importance of having both of these branchings in system verification has been claimed by many authors e.g. [48, 60], with an intuition that probabilistic branching models the choices “made by the system, i.e. on our side,” while (coarser) nondeterministic choices are “made by the (unknown) environment of the system, i.e. on the adversary’s side.” A typical example of such systems is given by probabilistic automata introduced by Segala [48].

In the case of chooseCorrect, a supernatural angel/oracle is helping you make choices which ultimately depend on as-yet unseen parts of the trace graph. In cases which are more like proofs or system-level game semantics, an adversary/opponent is literally trying to make choices that cause you to abort.

Whether you are playing against someone who is helping or hindering you depends on what phenomenon you are studying, I guess. In the case of nondeterministic search, the 92 solutions are what we are interested in, so it makes sense to have an oracle push you in that direction. In many other situations where a real world imposes itself whether we like it or not, it makes sense to model this real world as either non-aligned with our interests, or even actively against them.

@masak
Copy link
Owner Author

masak commented Sep 1, 2023

From You Don't Know JS: Async & Performance:

It's very common to conflate the terms "async" and "parallel," but they are actually quite different. Remember, async is about the gap between now and later. But parallel is about things being able to occur simultaneously.

The most common tools for parallel computing are processes and threads. Processes and threads execute independently and may execute simultaneously: on separate processors, or even separate computers, but multiple threads can share the memory of a single process.

An event loop, by contrast, breaks its work into tasks and executes them in serial, disallowing parallel access and changes to shared memory. Parallelism and "serialism" can coexist in the form of cooperating event loops in separate threads.

The interleaving of parallel threads of execution and the interleaving of asynchronous events occur at very different levels of granularity.

This distinction between "async" and "parallel" meshes well with what, in my head, I had as the distinction between "concurrent" and "parallel" (but I'll think of it as "async vs parallel" from now on): namely, there are two fundamental kinds of multitasking:

  • Cooperative multitasking: On a single event loop/scheduler, tasks get queued up and then handled in FIFO order. A built-in sense of "fairness" is necessary in this scheme, as a bad player might hog the spotlight and not let go, which causes other tasks to not get handled (see the sketch at the end of this comment). Preemptive multitasking is a fix for this, but at some cost in the form of complexity: in preemptive multitasking, the scheduler gains the capability not just to make a task the "currently running task", but also to take it off that role by force.
  • Multiprocessing fundamentally means more than one event loop at the same time ("simultaneously"); this requires several cores, or several processors, or several network nodes, each doing computational work. In a way, even the term "event loop" now becomes a distraction, because it was introduced in the above point in order to get multitasking on a single CPU or core; here we get multitasking by having several.

This distinction also puts into perspective the difference between "green thread" and "thread"; the former is always handled by a scheduler, the latter has the opportunity to run on a separate core, for example.
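As a minimal sketch of the first kind, here are two tasks sharing a single JavaScript event loop (written as TypeScript). Each await is an explicit yield point; a task that never awaits would hog the loop, which is exactly the fairness problem mentioned above:

async function task(name: string, steps: number) {
    for (let i = 0; i < steps; i++) {
        console.log(`${name} step ${i}`);
        await Promise.resolve();   // yield control back to the scheduler
    }
}

task("A", 3);
task("B", 3);
// Prints A step 0, B step 0, A step 1, B step 1, ... -- the tasks interleave
// even though everything runs on one thread: multitasking without multiprocessing.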

@masak
Copy link
Owner Author

masak commented Oct 2, 2023

Modules

You know, it's funny. A module, at its barest definition, is a "managed namespace" — literally a set of statically known key/value bindings. But that also means that it's the "step up" that we're looking for, from individual functions (or other types of "behavior", including continuations) to groups of them.

Why would you want to group functions together? I think quite possibly it's a human thing. The computer doesn't much care — the code all gets bundled together in the end, and damn the abstraction boundaries. But when I code up (say) a red-black tree, then there are functions that belong to the red-black tree abstractions, and functions that don't. From that point of view, modules are bounded contexts. No-one's forcing me to; I want to. I like boundaries.

The perfect storm that was ES3 could do modules, but it did it with a function (acting as a "module constructor" and as a privacy boundary) and a returned object (acting as the "managed namespace" of public exports). I'm not saying that's the only way to do it; but seeing it done with as little as functions and objects makes you think, doesn't it?
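For the record, here is that pattern sketched in TypeScript (the ES3 original is the same shape, minus the type annotations): a function as the privacy boundary, a returned object as the managed namespace of exports.

function makeCounterModule() {
    let count = 0;                        // private: unreachable from the outside
    return {                              // public: the "managed namespace"
        increment(): number { return ++count; },
        current(): number { return count; },
    };
}

const counter = makeCounterModule();
counter.increment();
console.log(counter.current());           // 1
console.log((counter as any).count);      // undefined -- the representation stays hidden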

I think the one who got forever associated with this idea was David Parnas, with his 1971 paper "On the criteria to be used in decomposing systems into modules". It's still a good text, even though it shows its age. tl;dr: Don't modularize according to the "flow chart" (control flow and/or data flow) of your project — modularize according to separation of concerns and "information hiding" (that is, the separation of non-public implementation details from public API details).

The ulterior motive when grouping my red-black tree functions together into a module isn't just that I like to group things that I think thematically go together. It's that they share a common representation of the red-black tree that they are operating on. Any change to that representation, or assumptions surrounding it, would affect those functions within the module boundary together, and no functions outside of the boundary.

Things like types and interfaces, contracts and invariants, further help to make explicit those assumptions — which can serve both the module author inside of the boundary, and module users outside of it. I still think something like GOSPEL should be made part of the "you have to be at least this tall to ride" criterion of publishing a module. What especially thrills me about it is that it does for static analysis and interaction what classes do for API/language — it creates a new level of abstraction on which to be absolutely precise.

@masak
Copy link
Owner Author

masak commented Oct 12, 2023

But as a baseline, the story must start with the status quo of shared memory, threads, and locks. I'll take all those mechanisms as given — there's oodles of literature on it — but I will mainly point out here that locks don't compose and threads don't scale.

Case in point: this blog post by Chris Wellons, Emacs 26 Brings Generators and Threads, spends most of its expository text about the new Emacs threads pointing out that they don't work:

ThreadSanitizer (TSan) quickly shows that Emacs’ threading implementation has many data races, making it completely untrustworthy. Until this is fixed, nobody should use Emacs threads for any purpose, and threads should disabled at compile time.

As of this writing, we're up to Emacs 29.1, so the situation might for all I know have improved. But the bigger question is why this kind of thing happens with thread implementations — Emacs is just an example here. My main point is — nobody sets out to write a broken thread implementation! And still it happens all over the place.

On the way back to lunch, I got a mental image of a Turing Machine with one tape and N (uncoordinated) read/write heads, all competing and sterically clashing to do their work on the single tape. That's threads. It's a fundamentally unworkable idea.

The solutions, while obvious, are more complicated, or have more moving parts, or require more maintenance.

  • Processes (or actors) are like N Turing machines each with their own tape. There's no conflict and there can be no conflict. Pretty quickly there is a need for sharing and coordination, though.

  • Ownership/borrowing/immutability/uniqueness solutions do more to track, statically, the read/write capabilities of some data and references to it, as well as who (uniquely) owns the data or has borrowed it. This clearly seems like a good idea, and leads to a nice "shared XOR mutable" state of affairs; it's just more work, kind of like how maintaining your static types and keeping them consistent is more work.

@masak
Copy link
Owner Author

masak commented Oct 17, 2023

On the way back to lunch, I got a mental image of a Turing Machine with one tape and N (uncoordinated) read/write heads, all competing and sterically clashing to do their work on the single tape. That's threads. It's a fundamentally unworkable idea.

As a further (damning) data point on this, check out the opening paragraph of this recent blog article by Rachel by the Bay:

There are more than a few bear traps in the larger Unix environment that continue to catch people off-guard. One of the perennial favorites is thread safety, particularly as it applies to the environment manipulation functions under glibc. The usual warning is that if you run multiple threads, you'd best not call setenv, because if someone else calls getenv, you run a decent chance of segfaulting.

"If you do threads, you'd best not write to shared memory." Wise advice; what's stupid is that we created the possibility in the first place.

@masak
Copy link
Owner Author

masak commented Oct 19, 2023

Functions provide a guarantee which will be cast in a sharper light when it is broken later. It doesn't have a name in the literature that I know of, so let's call it the dynamic embedding guarantee. It has two parts:

  • The entire machine has a single thread of control.
  • When a function is called, the single thread of control is transferred/borrowed from caller to callee. When the function returns, it is handed back.

I'm watching Kevlin Henney's 2017 talk Procedural Programming: It's Back? It Never Went Away. He mentions the book Software Architecture in Practice which outlines an architecture style called "Main program and subroutine" (in other words, procedural programming), and which has this quote:

There is typically a single thread of control and each component in the hierarchy gets this control (optionally along with some data) from its parent and passes it along to its children.

(Emphasis mine, not Henney's or the authors'.)

Henney says as an aside about the "single thread of control": "yeah, that's gonna come back and bite us later".

The quote doesn't mention the children handing control back (in LIFO order) to their parents. Which, fair enough, isn't always the case, I guess; there's such a thing as tail calls. Other than that, it's quite stunningly similar to what I described as the "dynamic embedding guarantee" above.

@masak
Copy link
Owner Author

masak commented Oct 19, 2023

Whoa. And then later, towards the end of that same talk, Henney quotes a paper by David Gelernter and Nicholas Carriero named Coordination Languages and their Significance:

We can build a complete programming model out of two separate pieces — the computation model and the coordination model.

That is, lambda calculus in the small, pi calculus (or actors) in the large.

I'd call that a pithy restatement of the point of this whole thread.

(Edit: Two further good, clarifying quotes from the same paper.)

The computation model allows programmers to build a single computational activity: a single-threaded, step-at-a-time computation.

The coordination model is the glue that binds separate activities into an ensemble.

That is, mutability + sharing are bad when too close together; but tease them apart into computation and (separately) coordination, and they become manageable.

(Another edit: I just feel I need to quote verbatim from the conclusions section, which again aligns with the goals of this whole thread.)

A broad research effort aimed at the development of general-purpose coordination languages is long overdue. The tangible result would be a tool of great power and significance. The intangible one would be a better understanding of the root problems of computer science. There appears to be an unspoken consensus in much of the research community that every twist and turn in the hardware development path, particularly where parallel machines or networks are concerned, calls for a new language or programming model, a new design, new implementation and new coding methods. In the long run, this approach is intellectually crippling. What are the fundamental questions here?

@masak
Copy link
Owner Author

masak commented Oct 31, 2023

On the topic of modules, one of the most significant bits in the design of a module system is whether to allow cyclic imports. This even hit Alma's very own module issue, #53, and is in some sense still unresolved.

  • The case for forbidding cyclic imports: They make the module system more complicated. More specifically, the naive/straightforward idea of a module fully binding its dependencies before being loaded itself doesn't quite work.
  • The case for allowing cyclic imports: Mother Nature doesn't care if your modules are mutually dependent or not. More precisely, such a mutual dependency will happen sooner or later (for example, I have one here and here). And yes, they can always be eliminated using enough architectural contortions and dependency injection and stuff. But at that point we're talking about punishing the user on behalf of the implementor.

As you can probably tell, I'm currently leaning towards allowing cycles. In a sense, allowing cycles between modules is tantamount to defining modules via letrec; and when you put it like that, it doesn't sound so bad.
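To make the letrec analogy concrete, here's a sketch of a module cycle that does work, assuming an ES-modules-style setting with live bindings (the file names are made up for the example):

// even.ts
import { isOdd } from "./odd";
export function isEven(n: number): boolean { return n === 0 || isOdd(n - 1); }

// odd.ts
import { isEven } from "./even";
export function isOdd(n: number): boolean { return n !== 0 && isEven(n - 1); }

Neither module needs the other's binding until one of the functions is actually called, which is the letrec discipline in a nutshell: bind all the names up front, and only demand the values later. The cycle only becomes a problem when a module uses an imported binding at load time, before the exporting module has finished initializing.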

@masak
Copy link
Owner Author

masak commented Nov 2, 2023

While lurking around on nlab a while ago, I found this:

  • A partial function f: A → B is like a function from A to B except that f(x) may not be defined for every element x of A. (Compare a multi-valued function, where f(x) may have several possible values.)
  • A multi-valued function f: A → B is like a function from A to B except that there may be more than one possible value f(x) for a given element x of A. (Compare a partial-function, where f(x) may not exist at all.)

A while after writing the above, I took the time to understand what nlab means by span, and how that ties into the understanding (or category-theoretical representation/implementation, perhaps) of partial functions and multivalued functions.

It's actually quite simple. No, really.

Start with a normal function, f: A → B; drawn as a category diagram, this looks like two objects A and B, with an arrow f going between them.

Now "split" this diagram into A ← D → B; the new object D in the middle (for "domain") is identical with A for the time being, and so the "→" arrow is the same f as before. The "←" arrow is an identity, since A and D are equal; being an identity, this arrow is both surjective and injective by definition. We have turned the thing into a span, but we've also effectively done nothing.

Now we're set up to implement partial functions and multivalued functions:

  • Partial function: by making A bigger than D — or perhaps better said, by making the A ← D arrow an embedding — we're saying that the span represents a function that only maps some elements of A (namely, those mapped to from D, the domain) to B. In other words, a partial function. A ← D is no longer surjective.
  • Multi-valued function: by making the A ← D arrow no-longer-injective — in other words, we allow distinct D elements to map to the same A element — we're saying that the span represents a function that maps certain elements of A to more than one element of B. In other words, a multi-valued function.

Sweet. I particularly like how we create a new backwards arrow from nothing, and it is then "tuned" to successfully explain the otherwise-baffling partial and multivalued functions.

Maybe a good way to say it is that the "→" arrow corresponds to a normal function, while the new factored-out "←" arrow corresponds to a kind of "multiplicity" knob we can tweak.

@masak
Copy link
Owner Author

masak commented Nov 13, 2023

Actors similarly take a new primitive operation: (asynchronous) messaging. Actors are by construction so well-protected from each other that not only is the "shared mutable memory" problem completely defined away — but by themselves, actors don't seem to be able to synchronize anything between themselves. I'm still learning about this, but one solution seems to be synchronizers, which could maybe be seen as a common region in which synchronization on shared resources between actors can happen.

Allow me to throw in here, without further comment, the first paragraph of the abstract of the paper When Concurrency Matters: Behaviour-Oriented Concurrency which I just found:

Expressing parallelism and coordination is central for modern concurrent programming. Many mechanisms exist for expressing both parallelism and coordination. However, the design decisions for these two mechanisms are tightly intertwined. We believe that the interdependence of these two mechanisms should be recognised and achieved through a single, powerful primitive. We are not the first to realise this: the prime example is actor model programming, where parallelism arises through fine-grained decomposition of a program's state into actors that are able to execute independently in parallel. However, actor model programming has a serious pain point: updating multiple actors as a single atomic operation is a challenging task.

@masak
Copy link
Owner Author

masak commented Nov 25, 2023

Inserting a small reminder here to myself, to later expand this comment into a small discussion about progress indicators, which are a nice illustration of... something. I guess independent processes (UI and task) with bidirectional communication.

As part of researching this, I also found a concept by Donald Norman called Gulf of Execution, a kind of reification of the lack of insight the user has of what the system is currently doing. That feeling when you hit a submit button and... nothing. Is the page submitting my form data now, or did my click accidentally miss the button? There's no feedback, so there's no way to know.

There's also the labor illusion, which arises from the UI and the task being two disconnected activities. The psychological effect of seeing the progress bar move (either in real percentage or some indeterminate animation) is broken the moment we understand that the task will not progress any more, even though the UI claims that it will, or that it is. A bad or unreliable "remaining time" estimate is like a watered-down version of this.

@raiph
Copy link

raiph commented Jan 6, 2024

Hi, and happy 2024,

I've been disciplined about remaining in pure lurk mode until I'm confident of contributing with a healthy signal-to-not-even-a-tangent ratio, so I will not mention stuff such as concurrency, actors / independent processes with communication, in case I accidentally mention other stuff like what relationship there is, if any, between things like the busy beaver game, unbounded indeterminacy, and quantum indeterminacy.

Instead I am convinced you are exactly the right person, and this is exactly the right thread to be in, to ask that, if you have time this year, please consider reading and doing justice to what is being discussed in this article that was published yesterday: scheme modules vs whole-program compilation: fight, with commentary like this:

This is an annoying result! What do other languages do? Well, mostly they aren’t programmable, in the sense that they don’t have macros. There are some ways to get programmability using e.g. eval in JavaScript, but these systems are not very amenable to “offline” analysis of the kind needed by an ahead-of-time compiler.

For those declarative languages with macros, Scheme included, I understand the state of the art is to expand module-by-module and then stitch together the results of expansion later, using a kind of link-time optimization. You visit a module’s definitions twice: once to evaluate them while expanding, resulting in live definitions that can be used by further syntax expanders, and once to residualize an abstract syntax tree, which will eventually be spliced into the compilation unit.

@masak
Copy link
Owner Author

masak commented Jan 8, 2024

Happy 2024, @raiph.

what is being discussed in this article that was published yesterday: scheme modules vs whole-program compilation: fight

Interesting. I will read. And yes, that does feel relevant here to this thread, maybe even to #302.

Will get back to you once I've read the article.

@masak
Copy link
Owner Author

masak commented Jan 8, 2024

On the way back to lunch, I got a mental image of a Turing Machine with one tape and N (uncoordinated) read/write heads, all competing and sterically clashing to do their work on the single tape. That's threads. It's a fundamentally unworkable idea.

I'm not the first one to rage against threads, of course. How about, for example, this slide deck Why Threads Are A Bad Idea by John Ousterhout from nineteen ninety five? (Also summarized here.)

Slides are two a page, so on page 3/slide 6, after some contextualization of threads, you finally find out why they are a bad idea:

  • Synchronization:
    • Must coordinate access to shared data with locks.
    • Forget a lock? Corrupted data.

"...one tape and N (uncoordinated) read/write heads, all competing and sterically clashing to do their work..."

  • Deadlock:
    • Circular dependencies among locks
    • Each process waits for some other process: system hangs.

"Use locks, they said. Your threads will be fine if you just use locks, they said." How utterly unfair! Not only do locks, as their primary job, cancel out the concurrency that was the entire point in the first place; there are also ample opportunities to simply hold them wrong and introduce new and painful ways to shoot yourself in the foot. As I wrote in an earlier comment:

locks don't compose and threads don't scale

It doesn't have to be this way. Threads and locks are just poor building blocks. Take something utterly comparable, Kahn networks, and see them rush away towards the horizon, accelerating all the while, leaving threads and locks in the dust. Kahn networks scale. Kahn himself is highly aware of this, as seen in his final words in the article:

Our last conclusion is to recall a principle that has been so often fruitful in Computer Science and that is central in Scott’s theory of computation: a good concept is one that is closed

  1. under arbitrary composition
  2. under recursion.

This (well-earned) pride in how well Kahn networks compose still haunts me. How can we make sure to do more of that, instead of investing in the castle built in a swamp that is locks and threads?
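Here's a loose sketch of that closure-under-composition property, using TypeScript async generators as the processes and the iterator protocol as the FIFO channel between them (an approximation of Kahn's model, not a faithful implementation):

async function* numbers(limit: number) {
    for (let i = 1; i <= limit; i++) yield i;
}

async function* scale(factor: number, input: AsyncIterable<number>) {
    for await (const x of input) yield x * factor;
}

async function* pairSums(input: AsyncIterable<number>) {
    let prev: number | undefined;
    for await (const x of input) {
        if (prev !== undefined) yield prev + x;
        prev = x;
    }
}

async function main() {
    // Processes compose like functions, and the composition is itself a process.
    for await (const y of pairSums(scale(10, numbers(5)))) console.log(y);
}
main();  // 30, 50, 70, 90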

@masak
Copy link
Owner Author

masak commented Jan 22, 2024

So what is a nondeterministic algorithm? I make here a provisional attempt at a definition.

A transition relation describes the transitions that are allowed between states inside the system. If this relation is univalent (or functional), that is, from each state (and event) the relation maps to exactly one target state, then the algorithm described by this transition relation is deterministic. Whether or not the univalence restriction holds, the algorithm is nondeterministic.

The inimitable Stephen Wolfram writes about something very similar which he calls multicomputation. The rough idea being this: when we deal with deterministic algorithms, running them is an unambiguous, stepwise activity. But when we deal with a nondeterministic algorithm, the interesting thing is the entire global transition graph — since, almost by definition, most of that graph constitutes the road not taken in any particular run.

Put a bit more simply or poetically, deterministic computation is narrow, but nondeterministic "multicomputation" is wide. There are things going on which involve choices of paths. Things such as winning a chess endgame, or angelic nondeterminism, take as their starting point that we can make informed choices now based on the desired result we want later.

@masak
Copy link
Owner Author

masak commented Jan 24, 2024

I find I keep thinking about concurrency. I just stumbled over Wikipedia's article about it, with a surprisingly cogent first two paragraphs, which I will quote verbatim (stripped of links and footnotes, though):

In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the outcome. This allows for parallel execution of the concurrent units, which can significantly improve overall speed of the execution in multi-processor and multi-core systems. In more technical terms, concurrency refers to the decomposability of a program, algorithm, or problem into order-independent or partially-ordered components or units of computation.

According to Rob Pike, concurrency is the composition of independently executing computations, and concurrency is not parallelism: concurrency is about dealing with lots of things at once but parallelism is about doing lots of things at once. Concurrency is about structure, parallelism is about execution, concurrency provides a way to structure a solution to solve a problem that may (but not necessarily) be parallelizable.

Several takeaways, in some order:

  • The description vibes well with jnthn's description (reproduced above), that "concurrency is in the problem domain, parallelism is in the solution domain". Also Russ Cox's piercing comment that concurrency "is interesting for reasons not of efficiency but of clarity", that is, it's the decomposition that matters here; a kind of design/analysis of the problem.
  • The whole thing reminds me of samefringe. Summarizing that problem, quickly: you have two binary trees. Define a tree's fringe as the sequence of its leaves in traversal order. We want to find out if the two trees' fringes are equal. A non-lazy solution will just generate two lists in memory, and compare them elementwise. A lazy solution can do traversal (of both trees) and comparison in an interleaved fashion, using less memory and breaking off earlier upon finding a difference; see the sketch at the end of this comment. But the beauty is indeed in the decomposition: the two traversals, and the task that governs them and does that equality comparison. A lazy language like Haskell will have a solution that is basically two lines of code, and which to the untrained eye looks sequential (because laziness/concurrency is baked in). In C#, the concurrency is moderated via the IEnumerable interface, and operators like Zip and All. Whereas Ada... seems to throw reams of code at the problem, involving tasks and whatnot, while in the end still not getting a solution that stops at the first detected difference.
  • Finally, this put me in mind of Applicative vs Monad from Haskell. Conor McBride has a good description here (section 4.7):

I came to Haskell from Standard ML, where I was used to writing programs in direct style even if they did naughty things like throw exceptions or mutate references. What was I doing in ML? Working on an implementation of an ultra-pure type theory (which may not be named, for legal reasons). When working in that type theory, I couldn’t write direct-style programs which used exceptions, but I cooked up the applicative combinators as a way of getting as close to direct style as possible.

When I moved to Haskell, I was horrified to discover the extent to which people seemed to think that programming in pseudo-imperative do-notation was just punishment for the slightest semantic impurity (apart, of course, from non-termination). I adopted the applicative combinators as a style choice (and went even closer to direct style with "idiom brackets") long before I had a grasp of the semantic distinction, i.e., that they represented a useful weakening of the monad interface. I just didn’t (and still don’t) like the way do-notation requires fragmentation of expression structure and the gratuitous naming of things.

That’s to say, the same things that make functional code more compact and readable than imperative code also make applicative style more compact and readable than do-notation. I appreciate that ApplicativeDo is a great way to make more applicative (and in some cases that means faster) programs that were written in monadic style that you haven’t the time to refactor. But otherwise, I’d argue applicative-when-you-can-but-monadic-when-you-must is also the better way to see what’s going on.

Pure expressions are tree-like, and the data dependencies are more like the "partial order" mentioned in the Wikipedia article. (That is, there are usually many possible ways to topologically sort and execute the individual subexpressions; as long as everything's pure, the result comes out the same.) This expresses a kind of concurrency, a decomposition of the problem where "happens-before" relationships do not appear exactly everywhere. But in monads and strictly sequential computation, they do — the Applicative viewpoint represents a kind of loosening-up of the strict sequentiality, which can lead both to more beautiful decomposition of subproblems, and perhaps even to opportunities for optimization.
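(Here's the samefringe sketch promised above, using TypeScript generators; the Tree type is made up for the example. The two traversals and the comparison are separate, lazily interleaved pieces, and the comparison stops at the first difference.)

type Tree<T> = { leaf: T } | { left: Tree<T>; right: Tree<T> };

// Lazily yield the leaves of a tree in traversal order (its "fringe").
function* fringe<T>(t: Tree<T>): Generator<T> {
    if ("leaf" in t) { yield t.leaf; }
    else { yield* fringe(t.left); yield* fringe(t.right); }
}

// Compare two fringes element by element, bailing out on the first difference.
function sameFringe<T>(a: Tree<T>, b: Tree<T>): boolean {
    const xs = fringe(a), ys = fringe(b);
    while (true) {
        const x = xs.next(), y = ys.next();
        if (x.done || y.done) return x.done === y.done;
        if (x.value !== y.value) return false;
    }
}

const t1: Tree<number> = { left: { leaf: 1 }, right: { left: { leaf: 2 }, right: { leaf: 3 } } };
const t2: Tree<number> = { left: { left: { leaf: 1 }, right: { leaf: 2 } }, right: { leaf: 3 } };
console.log(sameFringe(t1, t2));  // true -- different shapes, same fringe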

@jnthn
Copy link

jnthn commented Jan 24, 2024

While a succinct description can't be expected to get into subtleties, the final clause of this:

In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the outcome.

Is glossing over the heart of why I say concurrency is part of the problem domain: how do we define an outcome as "not affected"? That's very much domain specific.

When I used to teach on these topics, I had an example involving a data structure where we first applied locking to all of it, and then switched to fine-grained locking to get better throughput on updates, and then I sprung the question: is this still correct? It was a trick question: the problem specification wasn't actually clear either way on whether the partially applied update that we'd now made observable was problematic or not.

@masak
Copy link
Owner Author

masak commented Jan 25, 2024

how do we define an outcome as "not affected"? That's very much domain specific.

Paraphrasing you in order to understand better myself, here's what I got:

Yesterday I happened to read about trace monoids (aka dependency graphs, aka history monoids), which are like regular strings except that some pairs of adjacent characters are "loose", commuting with each other freely in zero or more substrings of the string. (Other parts are "fixed", and work like regular substrings.) We define equivalence on traces so that two traces with the same fixed sections, and loose sections which can be permuted into each other, are considered equal. Seen differently, we "quotient away" all the commuting that can go on, and consider each big chunk of inter-commutable substrings as a single individual thing (and the fixed parts a boring special case) — comparison then becomes a matter of directly comparing such things.

The choice is in which things get to commute. ("Things"? They're characters when the traces are presented as strings, but more realistically we're talking about observable events. Hm, event structures seem to be a slightly different thing from trace monoids.) There's a spectrum of sorts between "nothing commutes" (and our trace degenerates to a string) and "everything commutes" (and our trace degenerates to a Bag, I guess). But the spectrum is not a line; it has some interesting lattice-like structure.
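
To make "which things get to commute" concrete, here is a small Haskell sketch that decides trace equivalence via the classical projection characterization from trace theory (two words denote the same trace iff their projections onto every pair of dependent letters agree); the particular independence relation, where only a and b commute, is invented for the example:

```haskell
import Data.List (nub, sort)

-- Only 'a' and 'b' are "loose" (they commute); every other pair,
-- including each letter with itself, is "fixed" (dependent).
independent :: Char -> Char -> Bool
independent x y = sort [x, y] == "ab"

dependent :: Char -> Char -> Bool
dependent x y = not (independent x y)

-- Keep only the occurrences of the two given letters.
projectOnto :: Char -> Char -> String -> String
projectOnto x y = filter (`elem` [x, y])

-- Two words are the same trace iff they agree on every projection
-- onto a dependent pair of letters (this also forces equal letter counts).
traceEquiv :: String -> String -> Bool
traceEquiv u v =
  and [ projectOnto x y u == projectOnto x y v
      | x <- alphabet, y <- alphabet, x <= y, dependent x y ]
  where alphabet = nub (u ++ v)

-- traceEquiv "abc" "bac"  ==  True   ('a' and 'b' commute)
-- traceEquiv "abc" "acb"  ==  False  ('b' and 'c' do not)
```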

and then switched to fine-grained locking to get better throughput on updates, and then I sprung the question: is this still correct? It was a trick question: the problem specification wasn't actually clear either way on whether the partially applied update that we'd now made observable was problematic or not.

By switching to fine-grained locking, some new pairs of events get to commute in the trace monoid. Is this still correct? It depends on whether we were fine with those events commuting. Were we fine with that? That's a question of specification, and most projects do not have a formal specification process (at least not to that level of detail), and do not ask that question before it arises in practice. "We don't quite know what we want" is the norm; or rather, formal specifications are considered to be for NASA and Boeing, but not for most of the rest of us. Certainly not for a random project choosing between a ConcurrentHashMap and a synchronized HashMap.

@masak
Owner Author

masak commented Apr 22, 2024

Now we're set up to implement partial functions and multivalued functions:

  • Partial function: by making A bigger than D — or perhaps better said, by making the A ← D arrow an embedding — we're saying that the span represents a function that only maps some elements of A (namely, those mapped to from D, the domain) to B. In other words, a partial function. A ← D is no longer surjective.

  • Multi-valued function: by making the A ← D arrow no-longer-injective — in other words, we allow distinct D elements to map to the same A element — we're saying that the span represents a function that maps certain elements of A to more than one element of B. In other words, a multi-valued function.

This may or may not have been obvious when I wrote it, but now I think the above is about trying to invert a function.

  • If the function f is bijective, its inverse f^{-1} is a (bijective) function
  • If the function f fails to be surjective, its inverse is a partial function
  • If the function f fails to be injective, its inverse is a multi-valued function

Because the red herring principle applies in the latter two cases, what we're saying is that inverting f didn't quite give us a function; the inverse is partial to the extent f fails to be surjective, and multi-valued to the extent f fails to be injective.
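
Not the span construction itself, just its shadow on a finite domain: a small Haskell sketch (function names invented) that tabulates preimages, so that both the partiality and the multivaluedness of the "inverse" become directly visible:

```haskell
import qualified Data.Map as Map

-- Group the elements of a finite domain by their image under f.
preimages :: Ord b => (a -> b) -> [a] -> Map.Map b [a]
preimages f domain = Map.fromListWith (++) [ (f a, [a]) | a <- domain ]

-- "Invert" f at a point b of the codomain:
--   Nothing           where f is not surjective  (partiality),
--   Just [a]          where f hits b exactly once,
--   Just [a1, a2, ..] where f is not injective   (multivaluedness).
invert :: Ord b => (a -> b) -> [a] -> b -> Maybe [a]
invert f domain b = Map.lookup b (preimages f domain)

-- invert (`mod` 3) [0..9] 1  ==  Just [7,4,1]
-- invert (`mod` 3) [0..9] 5  ==  Nothing
```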

@masak
Owner Author

masak commented Apr 25, 2024

Coming back to Pony and ORCA:

Skimming the linked page, it sounds amazing, but it's also extremely short on specifics. I would be a thousand times less doubtful of the claims (sound + complete + concurrent) if there were a link to a paper containing algorithms and a number of proofs. Several questions arise, such as:

  • How can a dedicated actor collect other actors using just message-passing?
  • How, specifically, are the very different form factors of reference counting (which is usually centralized) and actor distribution (which has no built-in synchronization mechanism) reconciled?
  • When the word "transitively" is used, that implies a non-local property that needs checking in non-constant time. How is that property checked in a world where a number of synchronization mechanisms are proudly enumerated as not being used as the solution? (I.e. once you have traversed the object graph in order to confirm the "transitively blocked" property, how do you know that the nodes you checked at the start are still unchanged? "Blocked" implies "has no pending messages in its queue" — what prevents new messages from arriving anywhere in the graph while we are traversing it to assert all actors have empty queues?)

I just found this grimoire of memory management techniques, and Pony+ORCA are number 11 on the list:

11: Tracing garbage collection is familiar to all of us, but there's a surprising twist: there's a secret way to make a garbage collector without the stop-the-world pauses! Pony does this: by separating each actor into its own little world, each actor can do its own garbage collection without stopping the other ones. Its ORCA mechanism then enables sharing data between the worlds via an interesting reference counting message passing mechanism.

This is reassuring (especially as it's written by someone who seems to know what he's talking about). I will try to find out more about how, specifically, Pony achieves this.

@masak
Owner Author

masak commented May 8, 2024

I found another quote by Dijkstra, which talks about parallelism, nondeterminism, and concurrency. (It's in an EWD which criticizes Ada, then called "GREEN".)

When they write (RAT p.1-25) "It was also felt that whereas _nondeterminism_ _is_ _the_ _essence_ _of_ _parallelism_, its introduction for sequential programming would not appear natural to most users." (my underlining) they reflect several misunderstandings in a single sentence. First of all they ignore that it has been recognized already several years ago that nondeterminism and concurrency are separate issues in the sense that both nondeterministic programming languages whose obvious implementation does not introduce concurrency and deterministic programming languages whose implementation obviously admits concurrency are not only quite conceivable, but even worth our attention. Secondly they confuse in their appeal to what seems "natural to most users" the notion "convenient" with the notion "conventional".

Two takeaways:

  • Dijkstra's main point is that the deterministic/nondeterministic axis is orthogonal to the non-concurrent/concurrent axis. That point is well taken, and I agree.
  • Dijkstra says "parallelism" when he literally quotes the GREEN rationale, and then consistently uses the word "concurrency" himself. I would write that off as confusion or coincidence, but he did exactly the same in that other quote I found. To me, the most likely hypothesis is that Dijkstra thinks of "parallelism" and "concurrency" as identical concepts, but he prefers to use the term "concurrency" himself.
