ludd - User contributions [en]

Elixir/Ports and external process wiring

2025-10-24T11:52:38Z

Adamw: /* Asynchronous call and communication */ link to doc heading

A deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress using rsync.

I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

{{Aside|text=<p>[[w:rsync|Rsync]] is the standard utility for file transfers, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage rsync can be trusted to handle any edge case.</p>
<p>BEAM<ref>The virtual machine shared by Erlang, Elixir, Gleam, Ash, and so on: [https://blog.stenmans.org/theBeamBook/ the BEAM Book]</ref> is a fairly unique ecosystem in which it's not considered deviant to reinvent a rounder wheel: an external dependency like "cron" will often be ported into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.</p>}}

[[File:Monkey eating.jpg|alt=A bonnet macaque (Macaca radiata) eating peanuts, pictured in Bangalore, India|right|300x300px]]

=== Naïve shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with how one would pass it dynamic paths. It's unsafe to use string interpolation (<code>"#{source}"</code>): consider what could happen if the filenames include unescaped whitespace or special shell characters such as ";".
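To make the danger concrete, here is a small C sketch; <code>naive_shell</code> is a hypothetical helper, not part of the library. It interpolates a path into a command string and hands it to <code>system(3)</code>, which, like <code>System.shell</code>, runs the string through <code>/bin/sh</code>:

<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: build a command line by naive string interpolation
 * and run it through /bin/sh with system(3).
 * Returns the command's exit status, or -1 if it did not exit normally. */
int naive_shell(const char *source) {
    char cmd[512];
    /* DANGER: `source` is pasted into shell syntax without any escaping. */
    snprintf(cmd, sizeof cmd, "test -e %s", source);
    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
</syntaxhighlight>

For example, <code>naive_shell("/")</code> returns 0, but <code>naive_shell("/no-such-file; exit 42")</code> returns 42: the semicolon ends the intended <code>test</code> command and the shell happily runs whatever follows it.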

=== Safe path handling ===
We turn next to <code>System.cmd</code>, which takes a raw argv and can't be fooled by special characters in the path arguments:<syntaxhighlight lang="elixir">
System.find_executable(rsync_path)
|> System.cmd(~w(-a) ++ [source, target])
</syntaxhighlight>For a short job this is perfect, but for longer transfers our program loses control and observability, waiting indefinitely for a monolithic command to return.
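The same safety property can be sketched in C with <code>posix_spawnp</code>, which takes an argument vector and never involves a shell. This is only an analogy for what <code>System.cmd</code> provides, not its implementation, and <code>run_argv</code> is a hypothetical name:

<syntaxhighlight lang="c">
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Run argv[0] (searched on PATH) with an explicit argument vector and no
 * shell in between, then wait for it. Returns its exit status, or -1. */
int run_argv(char *const argv[]) {
    pid_t pid;
    int status;
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
</syntaxhighlight>

Here <code>run_argv((char *[]){"test", "-e", "/no-such-file; exit 42", NULL})</code> returns 1 ("no such file"), because the whole string, semicolon and all, reaches <code>test</code> as a single argument.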

=== Asynchronous call and communication ===
To run an external process asynchronously we reach for Elixir's low-level <code>Port.open</code>, nothing but a one-line wrapper<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> that passes its parameters directly to ERTS <code>open_port</code><ref>[https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2 Erlang <code>open_port</code> docs]</ref>. This function is tremendously flexible; here we turn a few knobs:<syntaxhighlight lang="elixir">
Port.open(
  {:spawn_executable, rsync_path},
  [
    :binary,
    :exit_status,
    :hide,
    :use_stdio,
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)
</syntaxhighlight>
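Under the hood, options like <code>:use_stdio</code> and <code>:stderr_to_stdout</code> boil down to pipe plumbing. The C sketch below approximates that plumbing for a single child; it is not how ERTS actually spawns programs, and <code>spawn_captured</code> and <code>collect</code> are hypothetical helpers. Both of the child's output streams are duplicated onto the write end of one pipe, and the parent reads until end of file before reaping the child, whose exit status is roughly what <code>:exit_status</code> reports:

<syntaxhighlight lang="c">
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Spawn argv with stdout and stderr both wired to one pipe, loosely
 * mirroring :use_stdio plus :stderr_to_stdout. On success, stores the
 * pipe's read end in *out_fd and the child PID in *pid, and returns 0. */
int spawn_captured(char *const argv[], int *out_fd, pid_t *pid) {
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    posix_spawn_file_actions_t actions;
    posix_spawn_file_actions_init(&actions);
    posix_spawn_file_actions_adddup2(&actions, fds[1], STDOUT_FILENO);
    posix_spawn_file_actions_adddup2(&actions, fds[1], STDERR_FILENO);
    posix_spawn_file_actions_addclose(&actions, fds[0]);
    posix_spawn_file_actions_addclose(&actions, fds[1]);

    int rc = posix_spawnp(pid, argv[0], &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    close(fds[1]); /* the parent only reads */
    if (rc != 0) {
        close(fds[0]);
        return -1;
    }
    *out_fd = fds[0];
    return 0;
}

/* Read until EOF (read returns 0 once every writer has closed the pipe),
 * then reap the child. Returns the number of bytes stored in buf. */
size_t collect(int fd, pid_t pid, char *buf, size_t cap) {
    size_t len = 0;
    ssize_t n;
    while (len < cap && (n = read(fd, buf + len, cap - len)) > 0)
        len += (size_t)n;
    close(fd);
    waitpid(pid, NULL, 0);
    return len;
}
</syntaxhighlight>

Spawning <code>sh -c 'echo out; echo err >&2'</code> this way yields the 8 bytes <code>"out\nerr\n"</code> on a single descriptor.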

{{Aside|text=
'''Rsync progress reporting options'''

There are a variety of ways to report progress:

; <code>-v</code> : list each filename as it's transferred

; <code>--progress</code> : report statistics per file

; <code>--info=progress2</code> : report overall progress

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any custom format string following rsyncd.conf's <code>log format</code><ref>[https://man.archlinux.org/man/rsyncd.conf.5#log~2 rsyncd.conf log format] docs</ref>
}}

Rsync outputs <code>--info=progress2</code> lines like so:<syntaxhighlight lang="text">
overall percent complete time remaining
bytes transferred | transfer speed |
| | | |
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>

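The controlling Port captures this output and sends it to the library's <code>handle_info</code> callback as <code>{:data, line}</code> messages (strictly speaking, without the <code>{:line, L}</code> option each message holds whatever chunk was read from the pipe, so it is not guaranteed to be exactly one line). After the transfer is finished we receive a conclusive <code>{:exit_status, status_code}</code> message.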

As a first step, we extract the overall_percent_done column and flag any unrecognized output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when percent_done_text != nil <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>The <code>trim</code> option does a lot of the heavy lifting here: it lets us ignore spacing and newline trickery entirely, including the carriage return that begins each line, as seen in the rsync source code:<ref>[https://github.com/RsyncProject/rsync/blob/797e17fc4a6f15e3b1756538a9f812b63942686f/progress.c#L129 rsync/progress.c] source code</ref>
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>The carriage return <code>\r</code> deserves special mention: it is the first "control" character we come across, yet in the binary data coming over the pipe from rsync it is an ordinary byte, just like newline <code>\n</code>. Its normal role is to control the terminal emulator, rewinding the cursor so that the current line can be overwritten! For our parser, carriage return, like newline, can simply be ignored. Control signaling is exactly what goes haywire in this project, and the leaky distinction between data and control seems to be a recurring theme in inter-process communication. The reality seems to be less a split between data and control than a stack of layers, as in [[w:OSI model|networking]].
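For comparison, here is the same column extraction as a C sketch (hypothetical helpers <code>parse_percent</code> and <code>last_percent</code>, not library code). A raw chunk from the pipe is split on both <code>\r</code> and <code>\n</code>, so the carriage return is treated as just another separator, and <code>sscanf</code> reads the integer in front of the <code>%</code> in the second column. Unlike the Elixir version, it returns an integer percentage and reports unrecognized lines as -1:

<syntaxhighlight lang="c">
#include <stdio.h>
#include <string.h>

/* Parse one --info=progress2 line, e.g.
 * "      3,342,336  33%    3.14MB/s    0:00:02", and return the overall
 * percentage, or -1 if the second column is not an integer followed by '%'. */
int parse_percent(const char *line) {
    int percent, consumed = 0;
    /* skip blanks, skip the byte count, read the number, require '%' */
    if (sscanf(line, " %*s %d%%%n", &percent, &consumed) != 1 || consumed == 0)
        return -1;
    return percent;
}

/* Split a raw chunk from the pipe on both \r and \n, treating carriage
 * return as just another separator, and return the last recognized
 * percentage, or -1 if no line was recognized. */
int last_percent(const char *chunk) {
    int last = -1;
    char line[256];
    while (*chunk != '\0') {
        size_t len = strcspn(chunk, "\r\n");   /* bytes up to the next separator */
        if (len > 0 && len < sizeof line) {
            memcpy(line, chunk, len);
            line[len] = '\0';
            int p = parse_percent(line);
            if (p >= 0)
                last = p;
        }
        chunk += len;
        chunk += strspn(chunk, "\r\n");        /* skip the separators themselves */
    }
    return last;
}
</syntaxhighlight>

A chunk holding two successive progress updates, each introduced by <code>\r</code>, yields the later of the two percentages, while a line such as <code>sending incremental file list</code> is reported as -1.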

{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character (<code>\r</code>, byte <code>0x0d</code>, sometimes rendered as <code>^M</code>). Try this command in a terminal:<syntaxhighlight lang="shell">
# echo "three^Mtwo"
twoee
</syntaxhighlight>
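You'll have to type <control>-v <control>-m to enter a literal carriage return; copy-and-paste won't work. Alternatively, <code>printf 'three\rtwo\n'</code> lets the shell's <code>printf</code> produce the carriage return from an escape sequence, with the same result.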

The character is named after the motion of pushing a typewriter's carriage back to the beginning of the current line without feeding the roller to a new line.

[[File:Baboons Playing in Chobe National Park-crlf.jpg|left|300x300px|Three young baboons playing on a rock ledge. Two are on the ridge and one below, grabbing the tail of another. A meme font shows "\r", "\n", and "\r\n" personified as each baboon.]]
[[w:Newline#Issues_with_different_newline_formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing.
}}

== OTP generic server ==
The Port API is convenient enough so far, but Erlang/OTP really starts to shine once we wrap each Port connection in a <code>gen_server</code><ref>[https://www.erlang.org/doc/apps/stdlib/gen_server.html Erlang gen_server docs]</ref> module, which gives us several properties for free. A dedicated BEAM process coordinates with its rsync process, independent of anything else. Input and output are asynchronous and buffered, but messages are handled sequentially, one at a time, so no locking is needed. The gen_server holds internal state, including the up-to-date completion percentage. And the caller can request updates as needed, or listen for push messages with the parsed statistics.

This gen_server is also expected to run safely under an OTP supervision tree<ref>[https://adoptingerlang.org/docs/development/supervision_trees/ "Supervision Trees"] chapter from [https://adoptingerlang.org/ Adopting Erlang]</ref>, but this is where our dream falls apart for the moment. The Port already watches for rsync completion or failure and reports upwards to its caller, but it lacks the critical ability to propagate termination downwards: if the calling code or our library module crashes, nothing shuts down rsync.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
The unpleasant real-world consequence is that rsync transfers will continue to run in the background even after Elixir kills our gen_server or shuts down, because the BEAM has no way of stopping the external process.

It's possible to find the operating system PID of the child process with <code>Port.info(port, :os_pid)</code> and send it a signal by shelling out to unix <code>kill PID</code>, but BEAM doesn't include built-in functions to send a signal to an OS process, and there is an ugly race condition between closing the port and sending this signal: once the child has exited and been reaped, its PID can be reused by an unrelated process. We'll keep looking for another way to "link" the processes.
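From the OS parent's point of view, signaling a child is straightforward, as this C sketch with hypothetical helpers <code>spawn_argv</code> and <code>terminate_and_reap</code> shows. The catch for us is that our gen_server is not rsync's OS parent (on Unix the BEAM forks children through a helper process), so it can neither reap the child itself nor rule out that the PID has already been recycled:

<syntaxhighlight lang="c">
#include <signal.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Start argv without a shell; returns the child's PID or -1. */
pid_t spawn_argv(char *const argv[]) {
    pid_t pid;
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    return pid;
}

/* Send SIGTERM to our own child, then reap it. Returns the signal that
 * ended it, 0 if it exited normally, or -1 on error. This is only safe
 * because we are the parent and have not reaped it yet: once a PID has
 * been reaped, the kernel may hand it to an unrelated process. */
int terminate_and_reap(pid_t pid) {
    int status;
    if (pid <= 0)   /* kill(0, ...) or kill(-1, ...) would hit far more than one child */
        return -1;
    if (kill(pid, SIGTERM) == -1)
        return -1;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
</syntaxhighlight>

Spawning <code>sleep 60</code> with <code>spawn_argv</code> and passing the PID to <code>terminate_and_reap</code> returns <code>SIGTERM</code>, since <code>sleep</code> does not handle that signal.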

To debug what happens during <code>port_close</code> and to eliminate variables, I tried spawning <code>sleep 60</code> instead of rsync and found that it behaves in exactly the same way: it keeps running until <code>sleep</code> ends naturally, regardless of what happens in Elixir or whether its pipes are still open. This turned out to be a lucky choice, as I learned later: "sleep" is daemon-like, just like rsync, but its behavior is much simpler to reason about.

== Bad assumption: pipe-like processes ==
A pipeline program like <code>gzip</code> or <code>cat</code> is built to read from its input and write to its output. We can roughly group the different styles of command-line application into "pipeline" programs which read and write, "interactive" programs which require user input, and "daemon" programs which are designed to run in the background. Some programs support multiple modes depending on the arguments given at launch, or by detecting the terminal using <code>isatty</code><ref>[https://man.archlinux.org/man/isatty.3.en docs for <code>isatty</code>]</ref>. The BEAM is currently optimized to interface with pipeline programs, and it assumes that the external process will stop when its "standard input" is closed.

A typical pipeline program will stop once it detects that input has ended, for example by calling <code>read</code><ref>[https://man.archlinux.org/man/read.2 libc <code>read</code> docs]</ref> in a loop:<syntaxhighlight lang="c">
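ssize_t size_read;
while ((size_read = read(input_desc, buf, bufsize)) > 0) {
    /* process size_read bytes from buf... */
}
if (size_read < 0) { /* error, see errno... */ }
else { /* size_read == 0: end of file... */ }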
</syntaxhighlight>

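If the program does blocking I/O, then a zero-byte <code>read</code> indicates the end of file condition. A program which does asynchronous I/O with <code>O_NONBLOCK</code><ref>[https://man.archlinux.org/man/open.2.en#O_NONBLOCK O_NONBLOCK docs]</ref> sees the same zero-byte <code>read</code> at end of file; it simply relies on <code>poll</code> or <code>select</code> to learn when a <code>read</code> will not block (on Linux, a pipe whose writers have all closed also reports <code>POLLHUP</code>). Separately, a process can arrange to receive the <code>HUP</code> hang-up signal when its parent dies (TODO: document how this can be done with <code>prctl</code>, and on which platforms).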
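The non-blocking case can be sketched in C as follows (<code>drain_until_eof</code> is a hypothetical helper). <code>poll(2)</code> reports when the descriptor is ready, <code>EAGAIN</code> just means "nothing yet", and a zero-byte <code>read</code> still marks the end of file:

<syntaxhighlight lang="c">
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Switch fd to non-blocking mode and read until end of file.
 * Returns the total number of bytes read, or -1 on error or timeout. */
long drain_until_eof(int fd, int timeout_ms) {
    char buf[4096];
    long total = 0;
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0)
            return -1;                  /* timed out: a writer is still open */
        if (ready == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0)
            total += n;
        else if (n == 0)
            return total;               /* end of file: every writer has closed */
        else if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;
    }
}
</syntaxhighlight>

With a pipe whose write end has been closed after writing <code>"hello"</code>, it returns 5; while the write end is still open and silent, it gives up with -1 after the timeout.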

Here, though, we'll focus more generally on how processes can affect each other through pipes. The surprising answer: not much! You can experiment with the <code>/dev/null</code> device, which behaves like a pipe whose writer has already closed. For example, compare these two commands:

<syntaxhighlight lang="shell">
cat < /dev/null

sleep 10 < /dev/null
</syntaxhighlight><code>cat</code> exits immediately, but <code>sleep</code> does its thing as usual.
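The same comparison can be reproduced from C by giving each program a pipe as standard input and closing our end right away, which has the same effect on the child's stdin as its controller going away. The helper <code>exits_on_stdin_close</code> is a hypothetical sketch, not how ERTS closes ports:

<syntaxhighlight lang="c">
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

extern char **environ;

/* Spawn argv with its stdin connected to a pipe, close our end of that
 * pipe straight away (the child then sees EOF), and check whether the child
 * exits within wait_ms. Returns 1 if it exited, 0 if it kept running
 * (it is then killed and reaped), or -1 on error. */
int exits_on_stdin_close(char *const argv[], long wait_ms) {
    int fds[2];
    pid_t pid;
    if (pipe(fds) == -1)
        return -1;

    posix_spawn_file_actions_t actions;
    posix_spawn_file_actions_init(&actions);
    posix_spawn_file_actions_adddup2(&actions, fds[0], STDIN_FILENO);
    posix_spawn_file_actions_addclose(&actions, fds[0]);
    posix_spawn_file_actions_addclose(&actions, fds[1]);
    int rc = posix_spawnp(&pid, argv[0], &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    close(fds[0]);
    close(fds[1]);                      /* no writer is left: the child's stdin hits EOF */
    if (rc != 0)
        return -1;

    struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    for (long waited = 0; waited < wait_ms; waited += 10) {
        if (waitpid(pid, NULL, WNOHANG) == pid)
            return 1;
        nanosleep(&tick, NULL);
    }
    kill(pid, SIGKILL);                 /* clean up the daemon-like survivor */
    waitpid(pid, NULL, 0);
    return 0;
}
</syntaxhighlight>

With a two-second grace period <code>cat</code> reports 1, while <code>sleep 10</code> is still running after 300 ms and reports 0.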

You could do the same experiment by running <code>cat</code> in a terminal and typing <control>-d to "send" an end-of-file. Interestingly, no byte is sent and no pipe is closed: on an empty line, <control>-d is interpreted by the terminal driver, which completes <code>cat</code>'s pending <code>read</code> with zero bytes, and <code>cat</code> treats that as end of file. Similarly, <control>-c is not sent as a character: the terminal driver turns it into an interrupt signal (<code>SIGINT</code>) for the foreground process group, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>[https://wizardzines.com/comics/stty/ ★ wizard zines ★: stty]</ref> by Julia Evans. Dump the settings of your own terminal with <code>stty -a</code>.

Any special behavior at the other end of a pipe is the result of deliberate programming decisions, and "end of file" (EOF) is more a convention than a hard reality. A program with a chaotic disposition could even reopen stdin after it was closed and connect it to something else, to the great surprise of friends and neighbors.

Back to the problem at hand: rsync falls into the category of "daemon-like" programs, which carry on even after standard input is closed. This makes sense, since rsync isn't interactive and its output is just a side effect of its main purpose.

== Shimming can kill ==
A small shim can adapt a daemon-like program to behave more like a pipeline. The shim watches for stdin closing or SIGHUP and, when either happens, converts it into a stronger signal such as SIGTERM, which it forwards to its own child. This is the idea behind a suggested shell script<ref>[https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes Elixir Port docs showing a shim script]</ref> for Elixir, and the <code>erlexec</code><ref name=":0">[https://hexdocs.pm/erlexec/readme.html <code>erlexec</code> library]</ref> library. The opposite adapter can be found in the [[w:nohup|nohup]] shell command, which makes its child immune to the hang-up signal, and the grimsby<ref>[https://github.com/shortishly/grimsby <code>grimsby</code> library]</ref> library, which keeps standard input and/or standard output open for the child process even after the parent exits: both let a pipe-like program behave more like a daemon.

I used the shim approach in my rsync library, which includes a small C program<ref>[https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads rsync_ex C shim program]</ref> that wraps rsync and makes it sensitive to BEAM <code>port_close</code>. It's featherweight and leaves the pipes untouched as it passes control to rsync. Here are the business parts:<syntaxhighlight lang="c">
#include <signal.h>
#include <sys/prctl.h> // Linux-specific
// Set up a fail-safe to self-signal with HUP if the controlling process dies.
prctl(PR_SET_PDEATHSIG, SIGHUP);</syntaxhighlight><syntaxhighlight lang="c">
// child_pid holds the PID of the forked rsync; it is set elsewhere in the shim.
void handle_signal(int signum) {
    if (signum == SIGHUP && child_pid > 0) {
        // Send the child TERM so that rsync can perform clean-up such as shutting down a remote server.
        kill(child_pid, SIGTERM);
    }
}
</syntaxhighlight>
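The stdin-watching flavor of the shim can also be sketched in C. <code>shim_run</code> is a hypothetical function, neither the rsync_ex shim nor erlexec; a real shim would pass <code>STDIN_FILENO</code> as the control descriptor and exit with the returned status. Because the shim is the child's real parent and reaps it itself, the PID it signals cannot have been recycled, which sidesteps the race described earlier:

<syntaxhighlight lang="c">
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Encode a wait status the way shells do: exit code, or 128 + signal. */
static int shell_status(int status) {
    return WIFSIGNALED(status) ? 128 + WTERMSIG(status) : WEXITSTATUS(status);
}

/* Run argv as a child while watching control_fd. If control_fd reaches EOF
 * or fails before the child finishes, forward SIGTERM so the child can clean
 * up. Returns the child's shell-style status, or -1 on error. */
int shim_run(char *const argv[], int control_fd) {
    pid_t child;
    int status;
    if (posix_spawnp(&child, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    for (;;) {
        pid_t done = waitpid(child, &status, WNOHANG);
        if (done == child)
            return shell_status(status);      /* child finished on its own */
        if (done == -1 && errno != EINTR)
            return -1;

        struct pollfd pfd = { .fd = control_fd, .events = POLLIN };
        int ready = poll(&pfd, 1, 100);       /* re-check the child every 100 ms */
        if (ready == -1 && errno != EINTR)
            break;
        if (ready > 0) {
            char buf[256];
            ssize_t n = read(control_fd, buf, sizeof buf);
            if (n == 0 || (n < 0 && errno != EINTR && errno != EAGAIN))
                break;                        /* the controller went away */
            /* n > 0: ordinary input is simply discarded in this sketch */
        }
    }
    kill(child, SIGTERM);                     /* still our unreaped child, so the PID is safe */
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return shell_status(status);
}
</syntaxhighlight>

If the control pipe's write end is already closed, running <code>sleep 30</code> through <code>shim_run</code> returns 143 (128 + <code>SIGTERM</code>) almost immediately; if the pipe stays open, <code>true</code> simply finishes with status 0.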

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities; they deserve their reputation for being friendly and open. The first big tip was to look at the third-party library <code>erlexec</code><ref name=":0" />, which demonstrates emerging best practices that could be backported into the language itself. Everyone who weighed in on the problem generally agreed that the fragile clean up of external processes is a bug, and supported the idea that some flavor of "terminate" signal should be sent to spawned programs when the port is closed.
[[File:Itinerant glassworker exhibition with spinning wheel and steam engine.jpg|thumb]]
I won't hide my disappointment that the required core changes are mostly in an auxiliary C program rather than in Erlang or even in the BEAM itself, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely onto a few scraps of wizardry with ordinary pipes<ref>[https://man.archlinux.org/man/pipe.7.en Overview of unix pipes]</ref>, using libc's pipe<ref>[https://man.archlinux.org/man/pipe.2.en Docs for the <code>pipe</code> syscall]</ref>, read, write, and select<ref>[https://man.archlinux.org/man/select.2.en libc <code>select</code> docs]</ref>.

Port drivers<ref>[https://www.erlang.org/doc/system/ports.html Erlang ports docs]</ref> are fundamental to ERTS, and several levels of port wiring are involved in launching external processes: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children. This architecture reflects the Supervisor paradigm and we can leverage it to produce some of the same properties: the subprocess can buffer reads and writes asynchronously and handle them sequentially; and if the BEAM crashes then erl_child_setup can detect the condition and do its own cleanup.

Letting a child process outlive its controlling process leaves the child in a state called "orphaned" in POSIX, and the standard recommends that when this happens the process should be adopted by the top-level system process "init" if it exists. This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands. But init will know nothing about adopted, orphan processes or how to monitor and restart them.

The patch [https://github.com/erlang/otp/pull/9453 PR#9453], which adapts port_close to send SIGTERM, is waiting for review, and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : sent to the controlling process when its controlling terminal is disconnected, the classic sign that its input has gone away<ref>[https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap11.html#tag_11_01_10 POSIX standard "General Terminal Interface: Modem Disconnect"]</ref>

; <code>TERM</code> : has a clear intention of "kill this thing", but can still be trapped by the target and handled in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught, blocked, or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}
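The practical difference between these candidates shows up when a program tries to install a handler. In this C sketch, the hypothetical helper <code>can_trap</code> asks <code>sigaction</code> for a handler and restores the previous disposition afterwards:

<syntaxhighlight lang="c">
#include <signal.h>
#include <string.h>

/* A handler that does nothing; we only care whether it can be installed. */
static void ignore_signal(int signum) { (void)signum; }

/* Return 1 if a handler for signum can be installed, 0 if the kernel
 * refuses. The previous disposition is restored before returning. */
int can_trap(int signum) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = ignore_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signum, &sa, &old) == -1)
        return 0;                          /* EINVAL for SIGKILL and SIGSTOP */
    sigaction(signum, &old, NULL);
    return 1;
}
</syntaxhighlight>

On POSIX systems <code>can_trap(SIGHUP)</code> and <code>can_trap(SIGTERM)</code> return 1, while <code>can_trap(SIGKILL)</code> returns 0 because the kernel refuses to let any process catch it.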

== TODO: consistency with unix process groups ==

... there is something fun here about how unix already has process tree behaviors which are close analogues to a BEAM supervisor tree.

== Future directions ==
Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign.

There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely, for example.

Another idea to borrow from the erlexec library is to have an option to kill the entire process group of a child, which is shared by any descendants that haven't explicitly broken out of its original group. This would be useful for managing deep trees of external processes launched by a forked command.
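The process-group idea can be sketched in C as follows. The helper names are hypothetical and this is not erlexec's implementation: the child is started as the leader of a fresh group, and passing a negative PID to <code>kill</code> signals every member of that group at once:

<syntaxhighlight lang="c">
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Start argv as the leader of a brand-new process group; anything it forks
 * stays in that group unless it deliberately moves itself elsewhere. */
pid_t spawn_group_leader(char *const argv[]) {
    posix_spawnattr_t attr;
    pid_t pid;
    posix_spawnattr_init(&attr);
    posix_spawnattr_setflags(&attr, POSIX_SPAWN_SETPGROUP);
    posix_spawnattr_setpgroup(&attr, 0);   /* 0: new group whose ID is the child's PID */
    int rc = posix_spawnp(&pid, argv[0], NULL, &attr, argv, environ);
    posix_spawnattr_destroy(&attr);
    if (rc != 0)
        return -1;
    /* Also set the group from the parent, the classic guard against racing
     * the child; failure (e.g. EACCES after exec) is harmless here. */
    setpgid(pid, pid);
    return pid;
}

/* Send SIGTERM to every member of the leader's group (negative PID), then
 * reap the leader. Returns the signal that ended the leader, or -1. */
int terminate_group(pid_t leader) {
    int status;
    if (leader <= 0)   /* -leader would then target our own group or init */
        return -1;
    if (kill(-leader, SIGTERM) == -1)
        return -1;
    if (waitpid(leader, &status, 0) == -1 || !WIFSIGNALED(status))
        return -1;
    return WTERMSIG(status);
}
</syntaxhighlight>

For example, with a leader started as <code>sh -c 'sleep 30 & sleep 30; wait'</code>, a single <code>terminate_group</code> call delivers <code>SIGTERM</code> to the shell and to any <code>sleep</code> children it has already forked, and returns <code>SIGTERM</code> for the leader.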

== References ==

Elixir/Ports and external process wiring

2025-10-24T11:50:00Z

Adamw: /* Reliable clean up */ fix link syntax

A deceivingly simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress using rsync.

I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

{{Aside|text=<p>[[w:rsync|Rsync]] is the standard utility for file transfers, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage rsync can be trusted to handle any edge case.</p>
<p>BEAM<ref>The virtual machine shared by Erlang, Elixir, Gleam, Ash, and so on: [https://blog.stenmans.org/theBeamBook/ the BEAM Book]</ref> is a fairly unique ecosystem in which it's not considered deviant to reinvent a rounder wheel: an external dependency like "cron" will often be ported into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.</p>}}

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

=== Naïve shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with how one would pass it dynamic paths. It's unsafe to use string interpolation (<code>"#{source}"</code> ): consider what could happen if the filenames include unescaped whitespace or special shell characters such as ";".

=== Safe path handling ===
We turn next to <code>System.cmd</code>, which takes a raw argv and can't be fooled special characters in the path arguments:<syntaxhighlight lang="elixir">
System.find_executable(rsync_path)
|> System.cmd([~w(-a), source, target])
</syntaxhighlight>For a short job this is perfect, but for longer transfers our program loses control and observability, waiting indefinitely for a monolithic command to return.

=== Asynchronous call and communication ===
To run a external process asynchronously we reach for Elixir's low-level <code>Port.open</code>, nothing but a one-line wrapper<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> passing its parameters directly to ERTS <code>open_port</code><ref>[https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2 Erlang <code>open_port</code> docs]</ref>. This function is tremendously flexible, here we turn a few knobs:<syntaxhighlight lang="elixir">
Port.open(
{:spawn_executable, rsync_path},
[
:binary,
:exit_status,
:hide,
:use_stdio,
:stderr_to_stdout,
args:
~w(-a --info=progress2) ++
rsync_args ++
sources ++
[args[:target]],
env: env
]
)
</syntaxhighlight>

{{Aside|text=
'''Rsync progress reporting options'''

There are a variety of ways to report progress:

; <code>-v</code> : list each filename as it's transferred

; <code>--progress</code> : report statistics per file

; <code>--info=progress2</code> : report overall progress

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any custom format string following rsyncd.conf's <code>log format</code><ref>https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf</ref>
}}

Rsync outputs <code>--info=progress2</code> lines like so:<syntaxhighlight lang="text">
overall percent complete time remaining
bytes transferred | transfer speed |
| | | |
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>

The controlling Port captures these lines is sent to the library's <code>handle_info</code> callback as <code>{:data, line}</code>. After the transfer is finished we receive a conclusive <code>{:exit_status, status_code}</code> message.

As a first step, we extract the overall_percent_done column and flag any unrecognized output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
percent_done_text when percent_done_text != nil <- Enum.at(terms, 1),
{percent_done, "%"} <- Float.parse(percent_done_text) do
percent_done
else
_ ->
{:unknown, line}
end
</syntaxhighlight>The <code>trim</code> is lifting more than its weight here: it lets us completely ignore spacing and newline trickery—and ignores the leading carriage return before each line, seen in the rsync source code:<ref>[https://github.com/RsyncProject/rsync/blob/797e17fc4a6f15e3b1756538a9f812b63942686f/progress.c#L129 rsync/progress.c] source code</ref>
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>Carriage return <code>\r</code> deserves special mention: this is the first "control" character we come across and it looks the same as an ordinary byte in the binary data coming over the pipe from rsync, similar to newline <code>\n</code>. Its normal role is to control the terminal emulator, rewinding the cursor so that the current line can be overwritten! And like newline, carriage return can be ignored. Control signaling is exactly what goes haywire about this project, and the leaky category distinction between data and control seems to be a repeated theme in inter-process communication. The reality is not so much data vs. control, as it seems to be a sequence of layers like with [[w:OSI model|networking]].

{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character, <code>\r</code>, <code>0x0d</code> sometimes rendered as <code>^M</code>. Try this command in a terminal:<syntaxhighlight lang="shell">
# echo "three^Mtwo"
twoee
</syntaxhighlight>
You'll have to use <control>-v <control>-m to type a literal carriage return, copy-and-paste won't work.

The character is named after the pushing of a physical typewriter carriage to return to the beginning of the current line without feeding the roller to a new line.

[[File:Baboons Playing in Chobe National Park-crlf.jpg|left|300x300px|Three young baboons playing on a rock ledge. Two are on the ridge and one below, grabbing the tail of another. A meme font shows "\r", "\n", and "\r\n" personified as each baboon.]]
[[w:https://en.wikipedia.org/wiki/Newline#Issues_with_different_newline_formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing.
}}

== OTP generic server ==
The Port API is convenient enough so far, but Erlang/OTP really starts to shine once we wrap each Port connection under a <code>gen_server</code><ref>[https://www.erlang.org/doc/apps/stdlib/gen_server.html Erlang gen_server docs]</ref> module, giving us several properties for free: A dedicated application thread coordinates with its rsync process independent of anything else. Input and output are asynchronous and buffered, but handled sequentially in a thread-safe way. The gen_server holds internal state including the up-to-date completion percentage. And the caller can request updates as needed, or it can listen for push messages with the parsed statistics.

This gen_server is also expected to run safely under an OTP supervision tree<ref>[https://adoptingerlang.org/docs/development/supervision_trees/ "Supervision Trees"] chapter from [https://adoptingerlang.org/ Adopting Erlang]</ref> but this is where our dream falls apart for the moment. The Port already watches for rsync completion or failure and reports upwards to its caller, but we fail at the critical property of being able to propagate a termination downwards to shut down rsync if the calling code or our library module crashes.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
The unpleasant real-world consequence is that rsync transfers will continue to run in the background even after Elixir kills our gen_server or shuts down, because the BEAM has no way of stopping the external process.

It's possible to find the operating system PID of the child process with <code>Port.info(port, :os_pid)</code> and send it a signal by shelling out to unix <code>kill PID</code>, but BEAM doesn't include built-in functions to send a signal to an OS process, and there is an ugly race condition between closing the port and sending this signal. We'll keep looking for another way to "link" the processes.

To debug what happens during <code>port_close</code> and to eliminate variables, I tried spawning <code>sleep 60</code> instead of rsync and I found that it behaves in exactly the same way: hanging until <code>sleep</code> ends naturally regardless of what happened in Elixir or whether its pipes are still open. This happens to have been a lucky choice as I learned later: "sleep" is daemon-like so similar to rsync, but its behavior is much simpler to reason about.

== Bad assumption: pipe-like processes ==
A pipeline like <code>gzip</code> or <code>cat</code> it built to read from its input and write to its output. We can roughly group the different styles of command-line application into "pipeline" programs which read and write, "interactive" programs which require user input, and "daemon" programs which are designed to run in the background. Some programs support multiple modes depending on the arguments given at launch, or by detecting the terminal using <code>isatty</code><ref>[https://man.archlinux.org/man/isatty.3.en docs for <code>isatty</code>]</ref>. The BEAM is currently optimized to interface with pipeline programs and it assumes that the external process will stop when its "standard input" is closed.

A typical pipeline program will stop once it detects that input has ended, for example by calling <code>read</code><ref>[https://man.archlinux.org/man/read.2 libc <code>read</code> docs]</ref> in a loop:<syntaxhighlight lang="c">
size_read = read (input_desc, buf, bufsize);
if (size_read < 0) { error... }
if (size_read == 0) { end of file... }
</syntaxhighlight>

If the program does blocking I/O, then a zero-byte <code>read</code> indicates the end of file condition. A program which does asynchronous I/O with <code>O_NONBLOCK</code><ref>[https://man.archlinux.org/man/open.2.en#O_NONBLOCK O_NONBLOCK docs]</ref> might instead detect EOF by listening for the <code>HUP</code> hang-up signal which is can be arranged (TODO: document how this can be done with <code>prctl</code>, and on which platforms).

But here we'll focus on how processes can more generally affect each other through pipes. Surprising answer: without much effect! You can experiment with the <code>/dev/null</code> device which behaves like a closed pipe, for example compare these two commands:

<syntaxhighlight lang="shell">
cat < /dev/null

sleep 10 < /dev/null
</syntaxhighlight><code>cat</code> exits immediately, but <code>sleep</code> does its thing as usual.

You could do the same experiment by opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file. Interestingly, what happened here is that <control>-d is interpreted by bash which responds by closing its pipe connected to standard input of the child process. This is similar to how <control>-c is not sending a character but is interpreted by the terminal, trapped by the shell and forwarded as an interrupt signal to the child process, completely independently of the data pipe. My entry point to learning more is this stty webzine<ref>[https://wizardzines.com/comics/stty/ ★ wizard zines ★: stty]</ref> by Julia Evans. Dump information about your own terminal emulator: <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions and "end of file" (EOF) is more a convention than a hard reality. A program with a chaotic disposition could even reopen stdin after it was closed and connect it to something else, to the great surprise of friends and neighbors.

Back to the problem at hand, "rsync" is in the category of "daemon-like" programs which will carry on even after standard input is closed. This makes sense enough, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
A small shim can adapt a daemon-like program to behave more like a pipeline. The shim is sensitive to stdin closing or SIGHUP, and when this is detected it converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>[https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes Elixir Port docs showing a shim script]</ref> for Elixir, and the <code>erlexec</code><ref name=":0">[https://hexdocs.pm/erlexec/readme.html <code>erlexec</code> library]</ref> library. The opposite adapter can be found in the [[w:nohup|nohup]] shell command and the grimsby<ref>[https://github.com/shortishly/grimsby <code>grimsby</code> library]</ref> library: these will keep standard in and/or standard out open for the child process even after the parent exits, so that a pipe-like program can behave more like a daemon.

I used the shim approach in my rsync library, which includes a small C program<ref>[https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads rsync_ex C shim program]</ref> that wraps rsync and makes it sensitive to BEAM <code>port_close</code>. It's featherweight, leaving the pipes untouched as it passes control to rsync. Here are the business parts:<syntaxhighlight lang="c">
#include <signal.h>
#include <sys/prctl.h> // Linux-specific

// Set up a fail-safe to self-signal with HUP if the parent process dies.
prctl(PR_SET_PDEATHSIG, SIGHUP);</syntaxhighlight><syntaxhighlight lang="c">
// child_pid holds the PID of the forked rsync process.
void handle_signal(int signum) {
    if (signum == SIGHUP && child_pid > 0) {
        // Send the child TERM so that rsync can perform clean-up such as shutting down a remote server.
        kill(child_pid, SIGTERM);
    }
}
</syntaxhighlight>
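The excerpt leaves out how the handler gets installed. One possible wiring, assuming Linux (<code>PR_SET_PDEATHSIG</code> is Linux-specific) and using a hypothetical helper name, installs <code>handle_signal</code> with <code>sigaction</code> and then checks for a known race: if the parent already died before <code>prctl</code> ran, the death signal will never be sent, so the helper compares <code>getppid()</code> with the parent it expected. This is a sketch, not the rsync_ex source.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* PID of the wrapped rsync; the real program would set this after fork(). */
static volatile sig_atomic_t child_pid = 0;

static void handle_signal(int signum) {
    if (signum == SIGHUP && child_pid > 0) {
        kill(child_pid, SIGTERM);            /* kill() is async-signal-safe */
    }
}

/* Hypothetical helper (Linux-only): route SIGHUP to handle_signal and ask
   the kernel to send SIGHUP when our parent dies. Returns 0 on success, or
   -1 on error or if the parent already died before the signal was armed. */
int arm_parent_death_signal(pid_t expected_parent) {
    struct sigaction action;
    memset(&action, 0, sizeof action);
    action.sa_handler = handle_signal;
    sigemptyset(&action.sa_mask);
    if (sigaction(SIGHUP, &action, NULL) == -1) return -1;
    if (prctl(PR_SET_PDEATHSIG, (unsigned long)SIGHUP) == -1) return -1;
    /* Race: if the parent exited before prctl(), we were already
       re-parented and no death signal will ever arrive. */
    return getppid() == expected_parent ? 0 : -1;
}
```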

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities; they deserve their reputation for being friendly and open. The first big tip was to look at the third-party library <code>erlexec</code><ref name=":0" />, which demonstrates emerging best practices that could be backported into the language itself. Everyone who spoke on the problem generally agrees that the fragile clean-up of external processes is a bug, and supports the idea that some flavor of "terminate" signal should be sent to spawned programs when the port is closed.
[[File:Itinerant glassworker exhibition with spinning wheel and steam engine.jpg|thumb]]
I would be lying if I hid my disappointment that the required core changes are mostly in an auxiliary C program rather than in Erlang or even in the BEAM itself, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes<ref>[https://man.archlinux.org/man/pipe.7.en Overview of unix pipes]</ref>, using libc's pipe<ref>[https://man.archlinux.org/man/pipe.2.en Docs for the <code>pipe</code> syscall]</ref>, read, write, and select<ref>[https://man.archlinux.org/man/select.2.en libc <code>select</code> docs]</ref>.
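As a small illustration of that wizardry, the hypothetical helper below waits on a pipe with <code>select</code>. Note that select reports a descriptor as readable both when data is waiting and when the other end has been closed; only the subsequent <code>read</code> returning 0 distinguishes EOF from data.

```c
#define _GNU_SOURCE
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
   Returns 1 if readable (data waiting *or* EOF), 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms) {
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    struct timeval timeout = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };
    int ready = select(fd + 1, &readable, NULL, NULL, &timeout);
    if (ready == -1) return -1;
    return FD_ISSET(fd, &readable) ? 1 : 0;
}
```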

Port drivers<ref>[https://www.erlang.org/doc/system/ports.html Erlang ports docs]</ref> are fundamental to ERTS, and several levels of port wiring are involved in launching external processes: the spawn driver starts a forker driver, which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process that watches over all children. This architecture reflects the Supervisor paradigm, and we can leverage it to get some of the same properties: the subprocess can buffer reads and writes asynchronously and handle them sequentially, and if the BEAM crashes, erl_child_setup can detect the condition and do its own cleanup.
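As a loose illustration only (this is not erl_child_setup's actual code), the core of "one process watching over all children" is a fork loop plus <code>waitpid(-1, ...)</code>, which collects exit statuses in whatever order the children finish. The helper name and the stand-in exit codes are hypothetical.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork `count` children that exit with codes
   first_code, first_code + 1, ..., then reap all of them in whatever order
   they finish. Returns the sum of their exit codes, or -1 on error. */
int spawn_and_reap(int count, int first_code) {
    for (int i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid == -1) return -1;
        if (pid == 0) _exit(first_code + i);  /* stand-in for exec'ing a command */
    }
    int sum = 0;
    for (int reaped = 0; reaped < count; reaped++) {
        int status;
        /* -1 means "any child": the watcher doesn't care who finished first. */
        if (waitpid(-1, &status, 0) == -1) return -1;
        if (WIFEXITED(status)) sum += WEXITSTATUS(status);
    }
    return sum;
}
```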

When a child process outlives its parent, POSIX calls it "orphaned", and the standard specifies that it is re-parented to an implementation-defined system process, traditionally the top-level <code>init</code> process. This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could run forever or do naughty things. The system <code>init</code> process starts and tracks its own children and can restart them in response to service commands, but it knows nothing about adopted orphans or how to monitor and restart them.
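Adoption is easy to observe. Normally the orphan goes to init, but Linux (3.4 and later) lets a process volunteer as the adopter of its orphaned descendants with <code>PR_SET_CHILD_SUBREAPER</code>, which makes the experiment below deterministic. It is a Linux-specific sketch with a hypothetical function name, not something this article describes ERTS doing.

```c
#define _GNU_SOURCE
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo (Linux 3.4+): volunteer as "child subreaper", orphan a
   grandchild on purpose, and report which PID adopted it. Returns -1 on error. */
pid_t who_adopts_orphan(void) {
    int fds[2];
    if (prctl(PR_SET_CHILD_SUBREAPER, 1UL) == -1 || pipe(fds) == -1) return -1;

    pid_t child = fork();
    if (child == -1) return -1;
    if (child == 0) {
        pid_t original_parent = getpid();
        if (fork() == 0) {                        /* grandchild */
            struct timespec tick = {.tv_sec = 0, .tv_nsec = 1000000};
            while (getppid() == original_parent) {
                nanosleep(&tick, NULL);           /* wait until orphaned */
            }
            pid_t adopter = getppid();
            _exit(write(fds[1], &adopter, sizeof adopter) == (ssize_t)sizeof adopter ? 0 : 1);
        }
        _exit(0);                                 /* exit, orphaning the grandchild */
    }

    close(fds[1]);
    waitpid(child, NULL, 0);
    pid_t adopter = -1;
    if (read(fds[0], &adopter, sizeof adopter) != (ssize_t)sizeof adopter) adopter = -1;
    close(fds[0]);
    while (waitpid(-1, NULL, 0) > 0) {}           /* reap the adopted grandchild */
    prctl(PR_SET_CHILD_SUBREAPER, 0UL);
    return adopter;
}
```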

The patch [https://github.com/erlang/otp/pull/9453 PR#9453], which makes port_close send SIGTERM, is waiting for review, and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : "hang-up", sent to a terminal's controlling process when the terminal disconnects<ref>[https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap11.html#tag_11_01_10 POSIX standard "General Terminal Interface: Modem Disconnect"]</ref>

; <code>TERM</code> : has a clear intention of "kill this thing", but can still be trapped by the target and handled in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}
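The practical difference between the candidates shows up when a program tries to install a handler. In this hypothetical probe, <code>sigaction</code> accepts handlers for HUP and TERM but refuses KILL with <code>EINVAL</code>, which is why a process receiving KILL gets no chance to clean up.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

/* A handler that does nothing; only its installation matters here. */
static void noop_handler(int signum) { (void)signum; }

/* Hypothetical probe: can a process install its own handler for signum?
   Returns 1 if sigaction accepts it, 0 if the kernel refuses (e.g. EINVAL
   for SIGKILL). The previous disposition is restored afterwards. */
int can_trap(int signum) {
    struct sigaction action, previous;
    memset(&action, 0, sizeof action);
    action.sa_handler = noop_handler;
    sigemptyset(&action.sa_mask);
    if (sigaction(signum, &action, &previous) == -1) return 0;
    sigaction(signum, &previous, NULL);   /* put things back */
    return 1;
}
```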

== TODO: consistency with unix process groups ==

... there is something fun here about how unix already has process tree behaviors which are close analogues to a BEAM supervisor tree.

== Future directions ==
Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign.

There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely, for example.

Another idea to borrow from the erlexec library is an option to kill the entire process group of a child, a group shared by any descendants that haven't explicitly left it. This would be useful for managing deep trees of external processes launched by a forked command.
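A minimal sketch of the underlying mechanism, with a hypothetical function name: two children are placed in one new process group using <code>setpgid</code>, and a single <code>kill</code> with a negative PID signals every member. Both the parent and each child call <code>setpgid</code>, a common idiom so that the signal can't race ahead of the group change. This shows plain POSIX process groups, not erlexec's implementation.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: put two long-running children into one new process
   group, then stop both with a single kill() aimed at the whole group.
   Returns how many children died from SIGTERM, or -1 on error. */
int kill_whole_group(void) {
    pid_t members[2];
    pid_t group = 0;                 /* 0 until the first child leads a group */
    int started = 0;
    for (int i = 0; i < 2; i++) {
        pid_t pid = fork();
        if (pid == -1) break;
        if (pid == 0) {
            setpgid(0, group);       /* group == 0: become leader of a new group */
            pause();                 /* stand-in for a long-running transfer */
            _exit(0);
        }
        if (group == 0) group = pid;
        setpgid(pid, group);         /* parent sets it too, avoiding the race */
        members[started++] = pid;
    }
    if (started == 0) return -1;

    if (kill(-group, SIGTERM) == -1) return -1;  /* negative PID targets the group */
    int terminated = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (waitpid(members[i], &status, 0) == -1) return -1;
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM) terminated++;
    }
    return terminated;
}
```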

== References ==

Elixir/Ports and external process wiring

2025-10-23T15:22:58Z

Adamw: /* Future directions */ teaser about process groups


Elixir/Ports and external process wiring

2025-10-23T15:21:16Z

Adamw: /* Bad assumption: pipe-like processes */ correction about when HUP is sent

A deceivingly simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress using rsync.

I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

{{Aside|text=<p>[[w:rsync|Rsync]] is the standard utility for file transfers, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage rsync can be trusted to handle any edge case.</p>
<p>BEAM<ref>The virtual machine shared by Erlang, Elixir, Gleam, Ash, and so on: [https://blog.stenmans.org/theBeamBook/ the BEAM Book]</ref> is a fairly unique ecosystem in which it's not considered deviant to reinvent a rounder wheel: an external dependency like "cron" will often be ported into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.</p>}}

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

=== Naïve shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with how one would pass it dynamic paths. It's unsafe to use string interpolation (<code>"#{source}"</code> ): consider what could happen if the filenames include unescaped whitespace or special shell characters such as ";".

=== Safe path handling ===
We turn next to <code>System.cmd</code>, which takes a raw argv and can't be fooled special characters in the path arguments:<syntaxhighlight lang="elixir">
System.find_executable(rsync_path)
|> System.cmd([~w(-a), source, target])
</syntaxhighlight>For a short job this is perfect, but for longer transfers our program loses control and observability, waiting indefinitely for a monolithic command to return.

=== Asynchronous call and communication ===
To run a external process asynchronously we reach for Elixir's low-level <code>Port.open</code>, nothing but a one-line wrapper<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> passing its parameters directly to ERTS <code>open_port</code><ref>[https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2 Erlang <code>open_port</code> docs]</ref>. This function is tremendously flexible, here we turn a few knobs:<syntaxhighlight lang="elixir">
Port.open(
{:spawn_executable, rsync_path},
[
:binary,
:exit_status,
:hide,
:use_stdio,
:stderr_to_stdout,
args:
~w(-a --info=progress2) ++
rsync_args ++
sources ++
[args[:target]],
env: env
]
)
</syntaxhighlight>

{{Aside|text=
'''Rsync progress reporting options'''

There are a variety of ways to report progress:

; <code>-v</code> : list each filename as it's transferred

; <code>--progress</code> : report statistics per file

; <code>--info=progress2</code> : report overall progress

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any custom format string following rsyncd.conf's <code>log format</code><ref>https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf</ref>
}}

Rsync outputs <code>--info=progress2</code> lines like so:<syntaxhighlight lang="text">
overall percent complete time remaining
bytes transferred | transfer speed |
| | | |
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>

The controlling Port captures these lines is sent to the library's <code>handle_info</code> callback as <code>{:data, line}</code>. After the transfer is finished we receive a conclusive <code>{:exit_status, status_code}</code> message.

As a first step, we extract the overall_percent_done column and flag any unrecognized output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
percent_done_text when percent_done_text != nil <- Enum.at(terms, 1),
{percent_done, "%"} <- Float.parse(percent_done_text) do
percent_done
else
_ ->
{:unknown, line}
end
</syntaxhighlight>The <code>trim</code> is lifting more than its weight here: it lets us completely ignore spacing and newline trickery—and ignores the leading carriage return before each line, seen in the rsync source code:<ref>[https://github.com/RsyncProject/rsync/blob/797e17fc4a6f15e3b1756538a9f812b63942686f/progress.c#L129 rsync/progress.c] source code</ref>
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>Carriage return <code>\r</code> deserves special mention: this is the first "control" character we come across and it looks the same as an ordinary byte in the binary data coming over the pipe from rsync, similar to newline <code>\n</code>. Its normal role is to control the terminal emulator, rewinding the cursor so that the current line can be overwritten! And like newline, carriage return can be ignored. Control signaling is exactly what goes haywire about this project, and the leaky category distinction between data and control seems to be a repeated theme in inter-process communication. The reality is not so much data vs. control, as it seems to be a sequence of layers like with [[w:OSI model|networking]].

{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character, <code>\r</code>, <code>0x0d</code> sometimes rendered as <code>^M</code>. Try this command in a terminal:<syntaxhighlight lang="shell">
# echo "three^Mtwo"
twoee
</syntaxhighlight>
You'll have to use <control>-v <control>-m to type a literal carriage return, copy-and-paste won't work.

The character is named after the pushing of a physical typewriter carriage to return to the beginning of the current line without feeding the roller to a new line.

[[File:Baboons Playing in Chobe National Park-crlf.jpg|left|300x300px|Three young baboons playing on a rock ledge. Two are on the ridge and one below, grabbing the tail of another. A meme font shows "\r", "\n", and "\r\n" personified as each baboon.]]
[[w:https://en.wikipedia.org/wiki/Newline#Issues_with_different_newline_formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing.
}}

== OTP generic server ==
The Port API is convenient enough so far, but Erlang/OTP really starts to shine once we wrap each Port connection under a <code>gen_server</code><ref>[https://www.erlang.org/doc/apps/stdlib/gen_server.html Erlang gen_server docs]</ref> module, giving us several properties for free: A dedicated application thread coordinates with its rsync process independent of anything else. Input and output are asynchronous and buffered, but handled sequentially in a thread-safe way. The gen_server holds internal state including the up-to-date completion percentage. And the caller can request updates as needed, or it can listen for push messages with the parsed statistics.

This gen_server is also expected to run safely under an OTP supervision tree<ref>[https://adoptingerlang.org/docs/development/supervision_trees/ "Supervision Trees"] chapter from [https://adoptingerlang.org/ Adopting Erlang]</ref> but this is where our dream falls apart for the moment. The Port already watches for rsync completion or failure and reports upwards to its caller, but we fail at the critical property of being able to propagate a termination downwards to shut down rsync if the calling code or our library module crashes.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
The unpleasant real-world consequence is that rsync transfers will continue to run in the background even after Elixir kills our gen_server or shuts down, because the BEAM has no built-in way to stop the external process.

It's possible to find the operating system PID of the child process with <code>Port.info(port, :os_pid)</code> and send it a signal by shelling out to unix <code>kill PID</code>, but BEAM doesn't include built-in functions to send a signal to an OS process, and there is an ugly race condition between closing the port and sending this signal. We'll keep looking for another way to "link" the processes.

To debug what happens during <code>port_close</code> and to eliminate variables, I tried spawning <code>sleep 60</code> instead of rsync, and found that it behaves in exactly the same way: it keeps running until <code>sleep</code> ends naturally, regardless of what happens in Elixir or whether its pipes are still open. As I learned later, this was a lucky choice: like rsync, "sleep" is daemon-like, but its behavior is much simpler to reason about.
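The <code>sleep</code> experiment can be reproduced without the BEAM. The following C sketch (hypothetical function name, POSIX assumed) gives <code>sleep</code> a pipe as standard input, closes the write end much as closing the port would, and checks whether the child noticed. It also shows what manual clean-up by PID looks like, including a guard that only signals a child that hasn't been reaped yet, since a reaped PID may already belong to another process.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start `sleep 60` reading from a pipe, close our end of the pipe, and
 * report whether the child is still running one second later:
 * 1 = alive, 0 = exited, -1 = error. */
int sleep_survives_closed_stdin(void) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                       /* child */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execlp("sleep", "sleep", "60", (char *)NULL);
        _exit(127);                       /* exec failed */
    }
    close(fds[0]);
    close(fds[1]);                        /* child's stdin now reports EOF */
    sleep(1);                             /* give the child time to react */
    int status;
    int alive = waitpid(pid, &status, WNOHANG) == 0;
    if (alive) {
        /* Only signal a PID we have not reaped; a reaped PID can be reused. */
        kill(pid, SIGTERM);
        waitpid(pid, &status, 0);
    }
    return alive;
}
</syntaxhighlight>

On a typical system it returns 1: the child outlives its closed input and only stops when it is signalled.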

== Bad assumption: pipe-like processes ==
A pipeline program like <code>gzip</code> or <code>cat</code> is built to read from its input and write to its output. We can roughly group the different styles of command-line application into "pipeline" programs which read and write, "interactive" programs which require user input, and "daemon" programs which are designed to run in the background. Some programs support multiple modes depending on the arguments given at launch, or by detecting the terminal using <code>isatty</code><ref>[https://man.archlinux.org/man/isatty.3.en docs for <code>isatty</code>]</ref>. The BEAM is currently optimized to interface with pipeline programs, and it assumes that the external process will stop when its "standard input" is closed.
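Mode detection with <code>isatty</code> can be as small as this C sketch; the function name <code>input_mode</code> and the two mode labels are illustrative only.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Choose a behavior the way some command-line tools do: interactive when
 * the given descriptor is a terminal, pipeline-style otherwise. */
const char *input_mode(int fd) {
    return isatty(fd) ? "interactive" : "pipeline";
}
</syntaxhighlight>

A program would typically call <code>input_mode(STDIN_FILENO)</code> once at startup.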

A typical pipeline program will stop once it detects that input has ended, for example by calling <code>read</code><ref>[https://man.archlinux.org/man/read.2 libc <code>read</code> docs]</ref> in a loop:<syntaxhighlight lang="c">
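ssize_t size_read;
while ((size_read = read(input_desc, buf, bufsize)) > 0) {
    /* process size_read bytes from buf */
}
if (size_read < 0) { /* read error: report it */ }
/* size_read == 0: end of file, time to exit */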
</syntaxhighlight>
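A complete version of that loop, written as a C sketch with a hypothetical <code>drain_until_eof</code> helper, shows the convention in full: keep reading while data arrives, retry if a signal interrupts the call, and treat a zero-byte read as end of file.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Read until end of file, the way a pipeline program such as cat does,
 * returning the number of bytes consumed or -1 on a read error. */
long drain_until_eof(int fd) {
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { total += n; continue; }   /* got data: keep going */
        if (n == 0) return total;              /* zero-byte read: EOF */
        if (errno != EINTR) return -1;         /* real error (EINTR: retry) */
    }
}
</syntaxhighlight>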

If the program does blocking I/O, then a zero-byte <code>read</code> indicates the end-of-file condition. A program which does asynchronous I/O with <code>O_NONBLOCK</code><ref>[https://man.archlinux.org/man/open.2.en#O_NONBLOCK O_NONBLOCK docs]</ref> might instead detect EOF by listening for the <code>HUP</code> hang-up signal, which can be arranged (TODO: document how this can be done with <code>prctl</code>, and on which platforms).
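Whatever the answer on <code>HUP</code> turns out to be, a non-blocking reader can still recognize end of file through <code>read</code> itself: on a pipe, an empty-but-open input fails with <code>EAGAIN</code>, while a closed input returns zero bytes. This C sketch (helper names are hypothetical) shows those outcomes; it says nothing about signal delivery.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch a descriptor to O_NONBLOCK, keeping its other status flags. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Classify one non-blocking read: 1 = got data, 0 = end of file,
 * 2 = still open but empty (EAGAIN/EWOULDBLOCK), -1 = other error. */
int nonblocking_read_state(int fd) {
    char c;
    ssize_t n = read(fd, &c, 1);
    if (n > 0) return 1;
    if (n == 0) return 0;
    if (errno == EAGAIN || errno == EWOULDBLOCK) return 2;
    return -1;
}
</syntaxhighlight>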

But here we'll focus on how processes can more generally affect each other through pipes. Surprising answer: without much effect! You can experiment with the <code>/dev/null</code> device which behaves like a closed pipe, for example compare these two commands:

<syntaxhighlight lang="shell">
cat < /dev/null

sleep 10 < /dev/null
</syntaxhighlight><code>cat</code> exits immediately, but <code>sleep</code> does its thing as usual.

You could do the same experiment by running <code>cat</code> in the terminal and then typing <control>-d to "send" an end-of-file. Interestingly, no character is sent and nothing is closed here: <control>-d is interpreted by the terminal's line discipline in the kernel, which makes <code>cat</code>'s pending <code>read</code> return zero bytes, the same end-of-file condition a closed pipe produces. Similarly, <control>-c is not sent as a character either: the terminal driver converts it into an interrupt signal (<code>SIGINT</code>) for the foreground process group, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>[https://wizardzines.com/comics/stty/ ★ wizard zines ★: stty]</ref> by Julia Evans. To dump the settings of your own terminal, run <code>stty -a</code>.
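These control characters are ordinary terminal settings that a program can inspect. This C sketch (hypothetical function names) reads the <code>eof</code> and <code>intr</code> entries that <code>stty -a</code> prints, and converts a control byte to its caret notation, which is also where <code>^M</code> for carriage return comes from.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <termios.h>

/* Caret notation used by stty and many terminals: byte 4 is ^D, 13 is ^M. */
char caret_letter(unsigned char c) {
    return (char)(c ^ 0x40);
}

/* Look up the terminal's end-of-file and interrupt characters, the settings
 * `stty -a` shows as "eof" and "intr". Returns -1 if fd is not a terminal. */
int terminal_control_chars(int fd, unsigned char *eof, unsigned char *intr) {
    struct termios t;
    if (tcgetattr(fd, &t) != 0) return -1;
    *eof = t.c_cc[VEOF];
    *intr = t.c_cc[VINTR];
    return 0;
}
</syntaxhighlight>

Run against a terminal, <code>terminal_control_chars(STDIN_FILENO, ...)</code> typically reports 4 and 3, which stty displays as <code>^D</code> and <code>^C</code>; on a pipe or an invalid descriptor it fails, because there is no line discipline to ask.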

Any special behavior at the other end of a pipe is the result of intentional programming decisions and "end of file" (EOF) is more a convention than a hard reality. A program with a chaotic disposition could even reopen stdin after it was closed and connect it to something else, to the great surprise of friends and neighbors.
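Reconnecting a descriptor after end of file takes only a couple of system calls, as in this C sketch (the helper name is hypothetical). After <code>dup2</code>, the same descriptor number that just reported EOF starts producing data again.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Point an existing descriptor (for example STDIN_FILENO) at a new file,
 * even after it has reached EOF. Returns 0 on success, -1 on error. */
int reconnect_fd(int fd, const char *path) {
    int fresh = open(path, O_RDONLY);
    if (fresh < 0) return -1;
    if (fresh == fd) return 0;          /* open() already reused that number */
    int rc = dup2(fresh, fd) < 0 ? -1 : 0;
    close(fresh);
    return rc;
}
</syntaxhighlight>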

Back to the problem at hand: rsync is in the category of "daemon-like" programs, which carry on even after standard input is closed. This makes sense, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
A small shim can adapt a daemon-like program to behave more like a pipeline. The shim is sensitive to stdin closing or SIGHUP, and when it detects either, it converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>[https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes Elixir Port docs showing a shim script]</ref> for Elixir, and the <code>erlexec</code><ref name=":0">[https://hexdocs.pm/erlexec/readme.html <code>erlexec</code> library]</ref> library. The opposite adapter can be found in the [[w:nohup|nohup]] shell command and the grimsby<ref>[https://github.com/shortishly/grimsby <code>grimsby</code> library]</ref> library: these will keep standard input and/or standard output open for the child process even after the parent exits, so that a pipe-like program can behave more like a daemon.

I used the shim approach in my rsync library, which includes a small C program<ref>[https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads rsync_ex C shim program]</ref> that wraps rsync and makes it sensitive to BEAM <code>port_close</code>. It's featherweight, leaving the pipes unchanged as it passes control to rsync. Here are the business parts:<syntaxhighlight lang="c">#include <signal.h>
#include <sys/prctl.h> // Linux-specific

// Set up a fail-safe to self-signal with HUP if the controlling process dies.
prctl(PR_SET_PDEATHSIG, SIGHUP);</syntaxhighlight><syntaxhighlight lang="c">
#include <signal.h>
#include <sys/types.h>

void handle_signal(int signum) {
    if (signum == SIGHUP && child_pid > 0) {
        // Send the child TERM so that rsync can perform clean-up such as shutting down a remote server.
        kill(child_pid, SIGTERM);
    }
}
</syntaxhighlight>
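For experimenting outside rsync_ex, here is a self-contained C sketch of a shim's core. It was written for this page rather than copied from the library, so <code>run_shim</code> and <code>forward_hup</code> are hypothetical names; <code>PR_SET_PDEATHSIG</code> is Linux-only, and a production shim would also watch stdin for EOF. One detail worth copying is that <code>SIGHUP</code> is blocked around <code>fork</code>, so a hang-up that arrives before the child's PID is recorded is delivered late instead of being lost.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* PID of the wrapped program, read by the signal handler. */
static volatile sig_atomic_t shim_child = 0;

/* SIGHUP handler: ask the wrapped program to terminate cleanly. */
static void forward_hup(int signum) {
    if (signum == SIGHUP && shim_child > 0)
        kill((pid_t)shim_child, SIGTERM);
}

/* Run argv as a child, translating SIGHUP into SIGTERM for it. Returns the
 * child's exit code, 128 + signal number if it was killed, or -1 on error. */
int run_shim(char *const argv[]) {
    struct sigaction sa;
    sigset_t hup, old;
    pid_t pid;
    int status;

#ifdef __linux__
    prctl(PR_SET_PDEATHSIG, SIGHUP);   /* if our parent dies, we get SIGHUP */
#endif
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forward_hup;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGHUP, &sa, NULL);

    /* Hold SIGHUP until the child's PID is recorded, so none is lost. */
    sigemptyset(&hup);
    sigaddset(&hup, SIGHUP);
    sigprocmask(SIG_BLOCK, &hup, &old);
    pid = fork();
    if (pid == 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);   /* signal masks survive exec */
        execvp(argv[0], argv);
        _exit(127);
    }
    if (pid > 0)
        shim_child = pid;
    sigprocmask(SIG_SETMASK, &old, NULL);       /* a held SIGHUP arrives here */
    if (pid < 0)
        return -1;

    while (waitpid(pid, &status, 0) < 0)        /* retry after the handler ran */
        if (errno != EINTR)
            return -1;
    shim_child = 0;
    return WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status);
}
</syntaxhighlight>

A real shim would call it from <code>main</code> as <code>return run_shim(argv + 1);</code>, so that the rsync command line simply follows the shim's own name.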

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities; they deserve their reputation for being friendly and open. The first big tip was to look at the third-party library <code>erlexec</code><ref name=":0" />, which demonstrates emerging best practices that could be backported into the language itself. Everyone who has weighed in on the problem generally agrees that the fragile clean-up of external processes is a bug, and supports the idea that some flavor of "terminate" signal should be sent to spawned programs when the port is closed.
[[File:Itinerant glassworker exhibition with spinning wheel and steam engine.jpg|thumb]]
I would be lying if I hid my disappointment that the required core changes are mostly in an auxiliary C program, not written in Erlang or even in the BEAM itself, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely onto a few scraps of wizardry with ordinary pipes<ref>[https://man.archlinux.org/man/pipe.7.en Overview of unix pipes]</ref>, using libc's pipe<ref>[https://man.archlinux.org/man/pipe.2.en Docs for the <code>pipe</code> syscall]</ref>, read, write, and select<ref>[https://man.archlinux.org/man/select.2.en libc <code>select</code> docs]</ref>.
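The multiplexing half of that toolkit fits in a few lines. This is not code from ERTS, only a C sketch of the general <code>select</code> pattern (hypothetical function name): wait on several pipes at once and react to whichever becomes readable first.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until one of two descriptors is readable. Returns the readable
 * descriptor, or -1 on timeout or error. */
int first_readable(int a, int b, int timeout_ms) {
    fd_set set;
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    FD_ZERO(&set);
    FD_SET(a, &set);
    FD_SET(b, &set);
    int n = select((a > b ? a : b) + 1, &set, NULL, NULL, &tv);
    if (n <= 0) return -1;               /* 0 = timeout, -1 = error */
    return FD_ISSET(a, &set) ? a : b;
}
</syntaxhighlight>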

Port drivers<ref>[https://www.erlang.org/doc/system/ports.html Erlang ports docs]</ref> are fundamental to ERTS, and several levels of port wiring are involved in launching external processes: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children. This architecture reflects the Supervisor paradigm and we can leverage it to produce some of the same properties: the subprocess can buffer reads and writes asynchronously and handle them sequentially; and if the BEAM crashes then erl_child_setup can detect the condition and do its own cleanup.

Letting a child process outlive its controlling process leaves the child in a state called "orphaned" in POSIX, and the standard recommends that when this happens the process should be adopted by the top-level system process "init" if it exists. This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands. But init will know nothing about adopted, orphan processes or how to monitor and restart them.
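The adoption can be observed directly. In this C sketch (hypothetical function name), a child forks a grandchild and exits immediately; the grandchild waits until its parent is gone and reports who adopted it. On Linux the adopter may be a "subreaper" process, such as a per-user service manager, rather than PID 1, so the sketch only reports the PID instead of assuming it is 1.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Create a grandchild, let its parent exit, and return the PID the orphan
 * was re-parented to. The PID of the parent that exited is stored in
 * *dead_parent. Returns -1 on error. */
pid_t orphan_new_parent(pid_t *dead_parent) {
    int fds[2];
    pid_t reported = -1;
    if (pipe(fds) != 0) return -1;
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        pid_t me = getpid();
        if (fork() == 0) {                          /* grandchild */
            struct timespec delay = {0, 10 * 1000 * 1000};
            while (getppid() == me)                 /* wait to be orphaned */
                nanosleep(&delay, NULL);
            pid_t adopted_by = getppid();
            if (write(fds[1], &adopted_by, sizeof adopted_by) < 0) _exit(1);
            _exit(0);
        }
        _exit(0);                                   /* orphan the grandchild */
    }
    close(fds[1]);
    waitpid(child, NULL, 0);
    if (read(fds[0], &reported, sizeof reported) != (ssize_t)sizeof reported)
        reported = -1;
    close(fds[0]);
    *dead_parent = child;
    return reported;
}
</syntaxhighlight>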

The patch [https://github.com/erlang/otp/pull/9453 PR#9453] adapting port_close to SIGTERM is waiting for review and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : sent to a process when its controlling terminal hangs up, the terminal-era equivalent of its input going away<ref>[https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap11.html#tag_11_01_10 POSIX standard "General Terminal Interface: Modem Disconnect"]</ref>

; <code>TERM</code> : clearly means "kill this thing", but the target can still catch it and handle it in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}
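The practical difference between the candidates shows up when a program tries to handle them. This C sketch (hypothetical function name) attempts to install a handler and then restores the previous disposition: <code>HUP</code> and <code>TERM</code> can be caught, while the system refuses a handler for <code>KILL</code>.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static void noop_handler(int sig) { (void)sig; }

/* Return 1 if the process could install its own handler for `sig`, 0 if the
 * system refuses (as it does for SIGKILL and SIGSTOP). The previous
 * disposition is restored before returning. */
int can_catch(int sig) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, &old) != 0) return 0;
    sigaction(sig, &old, NULL);
    return 1;
}
</syntaxhighlight>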

== Future directions ==
Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign.

There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely, for example.

Another idea to borrow from the erlexec library is to have an option to kill the entire process group of a child, which is shared by any descendants that haven't explicitly broken out of its original group. This would be useful for managing deep trees of external processes launched by a forked command.
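As a rough illustration of the process-group idea, and not a description of how erlexec implements it, this C sketch (hypothetical function name) puts a child in its own group with <code>setpgid</code>, lets it start a background grandchild, and then signals the whole group by passing a negative PID to <code>kill</code>.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start a shell that launches a background grandchild, then terminate the
 * whole tree by signalling the child's process group. Returns the signal
 * that killed the direct child, or -1 on error. */
int kill_process_group_demo(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        setpgid(0, 0);                      /* new group, id == our pid */
        execlp("sh", "sh", "-c", "sleep 60 & sleep 60; wait", (char *)NULL);
        _exit(127);
    }
    setpgid(pid, pid);      /* also set from the parent to avoid a race */
    sleep(1);               /* let the shell start its background sleep */
    if (kill(-pid, SIGTERM) != 0) return -1;   /* negative: whole group */
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
</syntaxhighlight>

Signalling only <code>pid</code> instead of <code>-pid</code> would stop the shell but leave the background <code>sleep</code> running, which is the deep-tree problem in miniature.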

== References ==

Elixir/Ports and external process wiring

2025-10-23T15:20:00Z

Adamw: /* Problem: runaway processes */ Correct some bad information about port os_pid. Special thank you to akash-akya's post https://elixirforum.com/t/any-interest-in-a-library-that-wraps-rsync/69297/10

A deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress using rsync.

I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

{{Aside|text=<p>[[w:rsync|Rsync]] is the standard utility for file transfers, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage rsync can be trusted to handle any edge case.</p>
<p>BEAM<ref>The virtual machine shared by Erlang, Elixir, Gleam, Ash, and so on: [https://blog.stenmans.org/theBeamBook/ the BEAM Book]</ref> is a fairly unique ecosystem in which it's not considered deviant to reinvent a rounder wheel: an external dependency like "cron" will often be ported into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.</p>}}

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

=== Naïve shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with how one would pass it dynamic paths. It's unsafe to use string interpolation (<code>"#{source}"</code>): consider what could happen if the filenames include unescaped whitespace or special shell characters such as ";".

=== Safe path handling ===
We turn next to <code>System.cmd</code>, which takes a raw argv and can't be fooled by special characters in the path arguments:<syntaxhighlight lang="elixir">
System.find_executable("rsync")
|> System.cmd(["-a", source, target])
</syntaxhighlight>For a short job this is perfect, but for longer transfers our program loses control and observability, waiting indefinitely for a monolithic command to return.
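The difference from shelling out is that the arguments travel as a vector and no shell ever parses them. A POSIX-level C sketch of the same blocking pattern (not how Elixir implements <code>System.cmd</code>; the function name is hypothetical) is a fork, an <code>execvp</code> and a <code>waitpid</code>:

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a program with an explicit argument vector and wait for it, with no
 * shell in between, so ';', spaces and quotes arrive as plain bytes.
 * Returns the exit code, or -1 if the program did not exit normally. */
int run_argv(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
</syntaxhighlight>

Called with <code>{"test", "a; echo pwned", "=", "a; echo pwned", NULL}</code> it returns 0, because <code>test</code> receives each string intact, semicolon and all. The call also blocks until the program exits, which is the loss of control described for long transfers.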

=== Asynchronous call and communication ===
To run an external process asynchronously we reach for Elixir's low-level <code>Port.open</code>, nothing but a one-line wrapper<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> passing its parameters directly to ERTS <code>open_port</code><ref>[https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2 Erlang <code>open_port</code> docs]</ref>. This function is tremendously flexible; here we turn a few knobs:<syntaxhighlight lang="elixir">
Port.open(
  {:spawn_executable, rsync_path},
  [
    :binary,
    :exit_status,
    :hide,
    :use_stdio,
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)
</syntaxhighlight>
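For intuition about what these options ask for, here is a simplified C sketch of the pipe wiring behind <code>:spawn_executable</code>, <code>:use_stdio</code> and <code>:stderr_to_stdout</code>: one pipe, with both stdout and stderr of the child attached to its write end, and no shell involved. ERTS itself takes a more elaborate route, so treat this as an approximation, and the function names as hypothetical.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), used by the caller afterwards */
#include <unistd.h>

/* Start argv with both stdout and stderr writing into one pipe. Returns the
 * pipe's read end and stores the child's PID in *pid, or returns -1. */
int spawn_with_output_pipe(char *const argv[], pid_t *pid) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    *pid = fork();
    if (*pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (*pid == 0) {                    /* child */
        dup2(fds[1], STDOUT_FILENO);    /* like :use_stdio (output side only) */
        dup2(fds[1], STDERR_FILENO);    /* like :stderr_to_stdout */
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);          /* like :spawn_executable: no shell */
        _exit(127);
    }
    close(fds[1]);                      /* parent keeps only the read end */
    return fds[0];
}

/* Read until EOF or until buf is full (cap must be at least 1),
 * NUL-terminate, and return the byte count. A port-style reader would
 * instead handle each chunk as it arrives. */
size_t read_all(int fd, char *buf, size_t cap) {
    size_t len = 0;
    ssize_t n;
    while (len + 1 < cap && (n = read(fd, buf + len, cap - 1 - len)) > 0)
        len += (size_t)n;
    buf[len] = '\0';
    return len;
}
</syntaxhighlight>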

{{Aside|text=
'''Rsync progress reporting options'''

There are a variety of ways to report progress:

; <code>-v</code> : list each filename as it's transferred

; <code>--progress</code> : report statistics per file

; <code>--info=progress2</code> : report overall progress

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any custom format string following rsyncd.conf's <code>log format</code><ref>[https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf rsyncd.conf manual page]</ref>
}}

Rsync outputs <code>--info=progress2</code> lines like so:<syntaxhighlight lang="text">
         overall percent complete    time remaining
  bytes transferred  |  transfer speed      |
          |          |         |            |
      3,342,336     33%     3.14MB/s     0:00:02
</syntaxhighlight>

The Port captures this output and delivers it to the library's <code>handle_info</code> callback as <code>{port, {:data, line}}</code> messages. After the transfer is finished we receive a conclusive <code>{port, {:exit_status, status_code}}</code> message.

As a first step, we extract the overall_percent_done column and flag any unrecognized output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when percent_done_text != nil <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>The <code>trim</code> option pulls more than its weight here: combined with splitting on <code>\s</code>, it lets us ignore spacing and newline trickery entirely, including the carriage return that rsync prints before each progress line, as seen in the rsync source code:<ref>[https://github.com/RsyncProject/rsync/blob/797e17fc4a6f15e3b1756538a9f812b63942686f/progress.c#L129 rsync/progress.c] source code</ref>
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>Carriage return <code>\r</code> deserves special mention: it is the first "control" character we come across, yet in the binary data coming over the pipe from rsync it looks like any ordinary byte, just as newline <code>\n</code> does. Its normal role is to control the terminal emulator, rewinding the cursor so that the current line can be overwritten! And like newline, carriage return can simply be ignored here. Control signaling is exactly what goes haywire in this project, and the leaky distinction between data and control seems to be a recurring theme in inter-process communication. The reality is less a clean split between data and control than a stack of layers, as in [[w:OSI model|networking]].
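One caveat is worth sketching: with <code>:binary</code> and no <code>:line</code> option, each <code>{:data, ...}</code> message carries whatever bytes the pipe delivered, so a message may hold a partial progress record or several of them (for instance, if the BEAM reads more slowly than rsync writes). The parsing above assumes one whole record per message. A small reassembly buffer, shown here in C with the hypothetical names <code>record_buffer</code> and <code>feed_chunk</code>, treats both <code>\r</code> and <code>\n</code> as record terminators and carries leftovers over to the next chunk.

<syntaxhighlight lang="c">
#include <stddef.h>
#include <string.h>

/* Reassembles progress records from arbitrary chunks of port output.
 * Both '\r' (rsync's in-place updates) and '\n' end a record. */
typedef struct {
    char buf[256];
    size_t len;
} record_buffer;

/* Feed one chunk. Returns how many complete, non-empty records it finished
 * and copies the most recent one into `last`. Bytes after the final
 * terminator stay buffered for the next call; overlong records are cut. */
int feed_chunk(record_buffer *rb, const char *chunk, size_t n,
               char *last, size_t last_cap) {
    int completed = 0;
    for (size_t i = 0; i < n; i++) {
        char c = chunk[i];
        if (c == '\r' || c == '\n') {
            if (rb->len > 0) {
                rb->buf[rb->len] = '\0';
                if (last_cap > 0) {
                    strncpy(last, rb->buf, last_cap - 1);
                    last[last_cap - 1] = '\0';
                }
                rb->len = 0;
                completed++;
            }
        } else if (rb->len < sizeof rb->buf - 1) {
            rb->buf[rb->len++] = c;
        }
    }
    return completed;
}
</syntaxhighlight>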

{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character (<code>\r</code>, <code>0x0d</code>, sometimes rendered as <code>^M</code>). Try this command in a terminal:<syntaxhighlight lang="shell">
# echo "three^Mtwo"
twoee
</syntaxhighlight>
You'll have to type <control>-v <control>-m to enter a literal carriage return; copy-and-paste won't work.

The character is named after pushing a physical typewriter carriage back to the beginning of the current line without advancing the roller to a new line.

[[File:Baboons Playing in Chobe National Park-crlf.jpg|left|300x300px|Three young baboons playing on a rock ledge. Two are on the ridge and one below, grabbing the tail of another. A meme font shows "\r", "\n", and "\r\n" personified as each baboon.]]
[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing.
}}

== OTP generic server ==
The Port API is convenient enough so far, but Erlang/OTP really starts to shine once we wrap each Port connection in a <code>gen_server</code><ref>[https://www.erlang.org/doc/apps/stdlib/gen_server.html Erlang gen_server docs]</ref> module, which gives us several properties for free. A dedicated BEAM process coordinates with its rsync process independently of anything else. Input and output are asynchronous and buffered, but messages are handled one at a time, so there are no races on internal state. The gen_server holds that state, including the up-to-date completion percentage. And the caller can request updates as needed, or listen for push messages carrying the parsed statistics.

This gen_server is also expected to run safely under an OTP supervision tree,<ref>[https://adoptingerlang.org/docs/development/supervision_trees/ "Supervision Trees"] chapter from [https://adoptingerlang.org/ Adopting Erlang]</ref> but this is where our dream falls apart for the moment. The Port already watches for rsync completion or failure and reports upwards to its caller, but we lack a critical property: propagating a termination downwards to shut down rsync if the calling code or our library module crashes.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
The unpleasant real-world consequence is that rsync transfers will continue to run in the background even after Elixir kills our gen_server or shuts down, because the BEAM has no built-in way to stop the external process.

It's possible to find the operating system PID of the child process with <code>Port.info(port, :os_pid)</code> and send it a signal by shelling out to unix <code>kill PID</code>, but BEAM doesn't include built-in functions to send a signal to an OS process, and there is an ugly race condition between closing the port and sending this signal. We'll keep looking for another way to "link" the processes.

To debug what happens during <code>port_close</code> and to eliminate variables, I tried spawning <code>sleep 60</code> instead of rsync, and found that it behaves in exactly the same way: it keeps running until <code>sleep</code> ends naturally, regardless of what happens in Elixir or whether its pipes are still open. As I learned later, this was a lucky choice: like rsync, "sleep" is daemon-like, but its behavior is much simpler to reason about.

== Bad assumption: pipe-like processes ==
A pipeline program like <code>gzip</code> or <code>cat</code> is built to read from its input and write to its output. We can roughly group the different styles of command-line application into "pipeline" programs which read and write, "interactive" programs which require user input, and "daemon" programs which are designed to run in the background. Some programs support multiple modes depending on the arguments given at launch, or by detecting the terminal using <code>isatty</code><ref>[https://man.archlinux.org/man/isatty.3.en docs for <code>isatty</code>]</ref>. The BEAM is currently optimized to interface with pipeline programs, and it assumes that the external process will stop when its "standard input" is closed.

A typical pipeline program will stop once it detects that input has ended, for example by calling <code>read</code><ref>[https://man.archlinux.org/man/read.2 libc <code>read</code> docs]</ref> in a loop:<syntaxhighlight lang="c">
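ssize_t size_read;
while ((size_read = read(input_desc, buf, bufsize)) > 0) {
    /* process size_read bytes from buf */
}
if (size_read < 0) { /* read error: report it */ }
/* size_read == 0: end of file, time to exit */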
</syntaxhighlight>

If the program does blocking I/O, then a zero-byte <code>read</code> indicates the end-of-file condition. A program which does asynchronous I/O with <code>O_NONBLOCK</code><ref>[https://man.archlinux.org/man/open.2.en#O_NONBLOCK O_NONBLOCK docs]</ref> might instead detect EOF by listening for the <code>HUP</code> hang-up signal, if it has arranged to receive one.

But here we'll focus on how processes can more generally affect each other through pipes. Surprising answer: without much effect! You can experiment with the <code>/dev/null</code> device which behaves like a closed pipe, for example compare these two commands:

<syntaxhighlight lang="shell">
cat < /dev/null

sleep 10 < /dev/null
</syntaxhighlight><code>cat</code> exits immediately, but <code>sleep</code> does its thing as usual.

You could do the same experiment by running <code>cat</code> in the terminal and then typing <control>-d to "send" an end-of-file. Interestingly, no character is sent and nothing is closed here: <control>-d is interpreted by the terminal's line discipline in the kernel, which makes <code>cat</code>'s pending <code>read</code> return zero bytes, the same end-of-file condition a closed pipe produces. Similarly, <control>-c is not sent as a character either: the terminal driver converts it into an interrupt signal (<code>SIGINT</code>) for the foreground process group, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>[https://wizardzines.com/comics/stty/ ★ wizard zines ★: stty]</ref> by Julia Evans. To dump the settings of your own terminal, run <code>stty -a</code>.

Any special behavior at the other end of a pipe is the result of intentional programming decisions and "end of file" (EOF) is more a convention than a hard reality. A program with a chaotic disposition could even reopen stdin after it was closed and connect it to something else, to the great surprise of friends and neighbors.

Back to the problem at hand: rsync is in the category of "daemon-like" programs, which carry on even after standard input is closed. This makes sense, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
A small shim can adapt a daemon-like program to behave more like a pipeline. The shim is sensitive to stdin closing or SIGHUP, and when it detects either, it converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>[https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes Elixir Port docs showing a shim script]</ref> for Elixir, and the <code>erlexec</code><ref name=":0">[https://hexdocs.pm/erlexec/readme.html <code>erlexec</code> library]</ref> library. The opposite adapter can be found in the [[w:nohup|nohup]] shell command and the grimsby<ref>[https://github.com/shortishly/grimsby <code>grimsby</code> library]</ref> library: these will keep standard input and/or standard output open for the child process even after the parent exits, so that a pipe-like program can behave more like a daemon.

I used the shim approach in my rsync library, which includes a small C program<ref>[https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads rsync_ex C shim program]</ref> that wraps rsync and makes it sensitive to BEAM <code>port_close</code>. It's featherweight, leaving the pipes unchanged as it passes control to rsync. Here are the business parts:<syntaxhighlight lang="c">#include <signal.h>
#include <sys/prctl.h> // Linux-specific

// Set up a fail-safe to self-signal with HUP if the controlling process dies.
prctl(PR_SET_PDEATHSIG, SIGHUP);</syntaxhighlight><syntaxhighlight lang="c">
#include <signal.h>
#include <sys/types.h>

void handle_signal(int signum) {
    if (signum == SIGHUP && child_pid > 0) {
        // Send the child TERM so that rsync can perform clean-up such as shutting down a remote server.
        kill(child_pid, SIGTERM);
    }
}
</syntaxhighlight>

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities; they deserve their reputation for being friendly and open. The first big tip was to look at the third-party library <code>erlexec</code><ref name=":0" />, which demonstrates emerging best practices that could be backported into the language itself. Everyone who has weighed in on the problem generally agrees that the fragile clean-up of external processes is a bug, and supports the idea that some flavor of "terminate" signal should be sent to spawned programs when the port is closed.
[[File:Itinerant glassworker exhibition with spinning wheel and steam engine.jpg|thumb]]
I would be lying if I hid my disappointment that the required core changes are mostly in an auxiliary C program, not written in Erlang or even in the BEAM itself, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely onto a few scraps of wizardry with ordinary pipes<ref>[https://man.archlinux.org/man/pipe.7.en Overview of unix pipes]</ref>, using libc's pipe<ref>[https://man.archlinux.org/man/pipe.2.en Docs for the <code>pipe</code> syscall]</ref>, read, write, and select<ref>[https://man.archlinux.org/man/select.2.en libc <code>select</code> docs]</ref>.

Port drivers<ref>[https://www.erlang.org/doc/system/ports.html Erlang ports docs]</ref> are fundamental to ERTS, and several levels of port wiring are involved in launching external processes: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children. This architecture reflects the Supervisor paradigm and we can leverage it to produce some of the same properties: the subprocess can buffer reads and writes asynchronously and handle them sequentially; and if the BEAM crashes then erl_child_setup can detect the condition and do its own cleanup.

Letting a child process outlive its controlling process leaves the child in a state called "orphaned" in POSIX, and the standard recommends that when this happens the process should be adopted by the top-level system process "init" if it exists. This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands. But init will know nothing about adopted, orphan processes or how to monitor and restart them.

The patch [https://github.com/erlang/otp/pull/9453 PR#9453] adapting port_close to SIGTERM is waiting for review and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : sent to a process when its controlling terminal hangs up, the terminal-era equivalent of its input going away<ref>[https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap11.html#tag_11_01_10 POSIX standard "General Terminal Interface: Modem Disconnect"]</ref>

; <code>TERM</code> : clearly means "kill this thing", but the target can still catch it and handle it in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}

== Future directions ==
Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign.

There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely, for example.

Another idea to borrow from the erlexec library is to have an option to kill the entire process group of a child, which is shared by any descendants that haven't explicitly broken out of its original group. This would be useful for managing deep trees of external processes launched by a forked command.

== References ==

Elixir/Ports and external process wiring

2025-10-21T15:49:33Z

Adamw: light edits

A deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress using rsync.

I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

{{Aside|text=<p>[[w:rsync|Rsync]] is the standard utility for file transfers, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage rsync can be trusted to handle any edge case.</p>
<p>BEAM<ref>The virtual machine shared by Erlang, Elixir, Gleam, Ash, and so on: [https://blog.stenmans.org/theBeamBook/ the BEAM Book]</ref> is a fairly unique ecosystem in which it's not considered deviant to reinvent a rounder wheel: an external dependency like "cron" will often be ported into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.</p>}}

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

=== Naïve shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with how one would pass it dynamic paths. It's unsafe to use string interpolation (<code>"#{source}"</code>): consider what could happen if the filenames include unescaped whitespace or special shell characters such as ";".

=== Safe path handling ===
We turn next to <code>System.cmd</code>, which takes a raw argv and can't be fooled by special characters in the path arguments:<syntaxhighlight lang="elixir">
System.find_executable("rsync")
|> System.cmd(["-a", source, target])
</syntaxhighlight>For a short job this is perfect, but for longer transfers our program loses control and observability, waiting indefinitely for a monolithic command to return.

=== Asynchronous call and communication ===
To run an external process asynchronously we reach for Elixir's low-level <code>Port.open</code>, nothing but a one-line wrapper<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> passing its parameters directly to ERTS <code>open_port</code><ref>[https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2 Erlang <code>open_port</code> docs]</ref>. This function is tremendously flexible; here we turn a few knobs:<syntaxhighlight lang="elixir">
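Port.open(
  {:spawn_executable, rsync_path},
  [
    # Deliver output as binaries rather than charlists.
    :binary,
    # Send {port, {:exit_status, code}} when rsync exits.
    :exit_status,
    # Suppress the console window on Windows.
    :hide,
    # Talk to rsync over its standard input and output.
    :use_stdio,
    # Merge rsync's error output into the same stream.
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)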
</syntaxhighlight>

{{Aside|text=
'''Rsync progress reporting options'''

There are a variety of ways to report progress:

; <code>-v</code> : list each filename as it's transferred

; <code>--progress</code> : report statistics per file

; <code>--info=progress2</code> : report overall progress

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any custom format string following rsyncd.conf's <code>log format</code><ref>https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf</ref>
}}

Rsync outputs <code>--info=progress2</code> lines like so:<syntaxhighlight lang="text">
      overall percent complete   time remaining
bytes transferred |  transfer speed    |
          |       |        |           |
      3,342,336  33%    3.14MB/s    0:00:02
</syntaxhighlight>

The controlling Port captures these lines and delivers them to the library's <code>handle_info</code> callback as <code>{port, {:data, line}}</code> messages. After the transfer is finished we receive a conclusive <code>{port, {:exit_status, status_code}}</code> message.
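The same message flow can be observed outside a gen_server with a plain <code>receive</code> loop. This is a sketch under some assumptions: a Unix-like system, the hypothetical module name <code>PortMessagesDemo</code>, and a short-lived <code>echo</code> standing in for rsync. In the library, the two message shapes would be matched in <code>handle_info</code> clauses instead.<syntaxhighlight lang="elixir">
defmodule PortMessagesDemo do
  # Hypothetical demo: run a short-lived program through a Port and collect the
  # messages the Port sends to its owner (here, the calling process).
  def run(executable, args) do
    port =
      Port.open({:spawn_executable, System.find_executable(executable)}, [
        :binary,
        :exit_status,
        :use_stdio,
        :stderr_to_stdout,
        args: args
      ])

    collect(port, [])
  end

  defp collect(port, chunks) do
    receive do
      # Output arrives as {port, {:data, binary}}, in chunks of arbitrary size.
      {^port, {:data, data}} ->
        collect(port, [chunks, data])

      # The final message carries the program's exit code.
      {^port, {:exit_status, status}} ->
        {IO.iodata_to_binary(chunks), status}
    after
      5_000 -> {:error, :timeout}
    end
  end
end
</syntaxhighlight>
For example, <code>PortMessagesDemo.run("echo", ["hello"])</code> returns <code>{"hello\n", 0}</code>.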

As a first step, we extract the overall_percent_done column and flag any unrecognized output:
<syntaxhighlight lang="elixir">
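# Split on whitespace; trim drops the empty strings left by "\r" and repeated spaces.
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     # The second column is the overall percentage, such as "33%".
     percent_done_text when percent_done_text != nil <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  # Anything that doesn't look like a progress line is flagged for inspection.
  _ ->
    {:unknown, line}
end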
</syntaxhighlight>The <code>trim</code> option is pulling more than its weight here. Combined with splitting on <code>\s</code>, it lets us ignore spacing and newline trickery completely, including the leading carriage return before each line, seen in the rsync source code:<ref>[https://github.com/RsyncProject/rsync/blob/797e17fc4a6f15e3b1756538a9f812b63942686f/progress.c#L129 rsync/progress.c] source code</ref>
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>Carriage return <code>\r</code> deserves special mention. It is the first "control" character we come across, and in the binary data coming over the pipe from rsync it looks just like an ordinary byte, as does newline <code>\n</code>. Its normal role is to control the terminal emulator, rewinding the cursor so that the current line can be overwritten! Like newline, carriage return can be ignored here. Control signaling is exactly what goes haywire in this project, and the leaky distinction between data and control seems to be a recurring theme in inter-process communication. The reality is less a split between data and control than a stack of layers, as with [[w:OSI model|networking]].
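Because rsync marks each progress update with a leading <code>\r</code> rather than a trailing <code>\n</code>, a single <code>{:data, ...}</code> chunk from the Port may hold several updates. Below is a minimal sketch that splits a chunk apart, assuming it's acceptable to keep only the newest update. <code>ProgressUpdates</code> is a hypothetical module name, not part of the rsync library.<syntaxhighlight lang="elixir">
defmodule ProgressUpdates do
  # Hypothetical helper: split one chunk of port output into individual progress updates.
  # rsync starts each update with "\r" instead of ending it with "\n", so both characters
  # are treated as separators, and trim: true discards the empty strings in between.
  def split(chunk) do
    String.split(chunk, ["\r", "\n"], trim: true)
  end

  # Keep only the most recent update, which is all a progress display needs.
  def latest(chunk) do
    chunk |> split() |> List.last()
  end
end
</syntaxhighlight>
A real implementation would also need to carry an incomplete trailing update over to the next chunk, since a read from the pipe can stop mid-line; this sketch ignores that case.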

{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character, <code>\r</code> or <code>0x0d</code>, sometimes rendered as <code>^M</code>. Try this command in a terminal:<syntaxhighlight lang="shell">
# echo "three^Mtwo"
twoee
</syntaxhighlight>
You'll have to type <control>-v <control>-m to enter a literal carriage return; copy-and-paste won't work.

The character is named after the pushing of a physical typewriter carriage to return to the beginning of the current line without feeding the roller to a new line.

[[File:Baboons Playing in Chobe National Park-crlf.jpg|left|300x300px|Three young baboons playing on a rock ledge. Two are on the ridge and one below, grabbing the tail of another. A meme font shows "\r", "\n", and "\r\n" personified as each baboon.]]
[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing.
}}

== OTP generic server ==
The Port API is convenient enough so far, but Erlang/OTP really starts to shine once we wrap each Port connection under a <code>gen_server</code><ref>[https://www.erlang.org/doc/apps/stdlib/gen_server.html Erlang gen_server docs]</ref> module, giving us several properties for free. A dedicated BEAM process coordinates with its rsync process independently of anything else. Input and output are asynchronous and buffered, but handled sequentially, one message at a time, so no locking is needed. The gen_server holds internal state including the up-to-date completion percentage. And the caller can request updates as needed, or it can listen for push messages with the parsed statistics.
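A stripped-down sketch of such a wrapper might look like the following. <code>PortServer</code> and its <code>status/1</code> function are hypothetical names, not the rsync library's API. The sketch keeps only the latest raw output chunk rather than parsed statistics, and it offers pull-style updates only. Because <code>start_link/1</code> takes a single argument, the default <code>child_spec/1</code> generated by <code>use GenServer</code> lets the module be listed directly as a child in a supervision tree.<syntaxhighlight lang="elixir">
defmodule PortServer do
  # Hypothetical minimal GenServer that owns one external program through a Port.
  use GenServer

  def start_link({executable, args}), do: GenServer.start_link(__MODULE__, {executable, args})

  # Pull-style query: the caller asks for the latest state whenever it likes.
  def status(server), do: GenServer.call(server, :status)

  @impl true
  def init({executable, args}) do
    port =
      Port.open({:spawn_executable, System.find_executable(executable)}, [
        :binary,
        :exit_status,
        :stderr_to_stdout,
        args: args
      ])

    {:ok, %{port: port, last_output: nil, exit_status: nil}}
  end

  # Port messages land in the mailbox and are handled one at a time, in arrival order.
  @impl true
  def handle_info({port, {:data, data}}, %{port: port} = state) do
    {:noreply, %{state | last_output: data}}
  end

  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    {:noreply, %{state | exit_status: status}}
  end

  @impl true
  def handle_call(:status, _from, state) do
    {:reply, Map.take(state, [:last_output, :exit_status]), state}
  end
end
</syntaxhighlight>
For example, after <code>{:ok, pid} = PortServer.start_link({"echo", ["hi"]})</code> and a moment for echo to finish, <code>PortServer.status(pid)</code> reports <code>"hi\n"</code> as the last output and 0 as the exit status.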

This gen_server is also expected to run safely under an OTP supervision tree<ref>[https://adoptingerlang.org/docs/development/supervision_trees/ "Supervision Trees"] chapter from [https://adoptingerlang.org/ Adopting Erlang]</ref> but this is where our dream falls apart for the moment. The Port already watches for rsync completion or failure and reports upwards to its caller, but we fail at the critical property of being able to propagate a termination downwards to shut down rsync if the calling code or our library module crashes.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
The unpleasant real-world consequence is that rsync transfers will continue to run in the background even after Elixir kills our gen_server or shuts down, because the BEAM has no way of stopping the external process.

It's possible to send a signal by shelling out to unix <code>kill PID</code>, and the OS process ID can be looked up with <code>Port.info(port, :os_pid)</code>, but the BEAM has no built-in function to send a signal to an OS process. Clearly we're expected to do this another way. Another problem with "kill" is that we want the external process to stop no matter how badly the BEAM is damaged, so we shouldn't rely on stored data or on running final clean-up logic before exiting.
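For completeness, here is a minimal sketch of the kill(1) shell-out. It assumes a Unix-like system with a <code>kill</code> executable on the PATH, and <code>PortKill</code> is a hypothetical helper name. <code>Port.info(port, :os_pid)</code> reports the OS process ID of a spawned program. As noted above, this only runs while the BEAM is healthy, so it is no substitute for clean-up that survives a crash.<syntaxhighlight lang="elixir">
defmodule PortKill do
  # Hypothetical helper: look up the OS PID behind a spawned Port and signal it
  # by shelling out to kill(1). It only runs while the BEAM is alive and healthy.
  def signal(port, signal \\ "TERM") do
    case Port.info(port, :os_pid) do
      {:os_pid, os_pid} when is_integer(os_pid) ->
        System.cmd("kill", ["-#{signal}", Integer.to_string(os_pid)], stderr_to_stdout: true)

      # nil means the Port is already closed; :undefined means it has no OS process.
      _ ->
        {:error, :no_os_process}
    end
  end
end

sleep = System.find_executable("sleep")
port = Port.open({:spawn_executable, sleep}, [:exit_status, args: ["60"]])
{"", 0} = PortKill.signal(port)
</syntaxhighlight>
After the signal, the Port should deliver an <code>{port, {:exit_status, status}}</code> message with a non-zero status.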

To debug what happens during <code>port_close</code> and to eliminate variables, I tried spawning <code>sleep 60</code> instead of rsync and found that it behaves in exactly the same way. It keeps running until <code>sleep</code> ends naturally, regardless of what happened in Elixir or whether its pipes are still open. As I learned later, this happened to be a lucky choice: "sleep" is daemon-like, similar to rsync, but its behavior is much simpler to reason about.
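The experiment can be reproduced from Elixir with a sketch like this one. It assumes a Unix-like system with <code>sleep</code> and <code>kill</code> on the PATH, and <code>OrphanDemo</code> is a hypothetical name. <code>kill -0</code> sends no signal at all; it only reports whether the process exists. The result depends on the OTP version. On the releases described here, the second element of the returned tuple is expected to be <code>true</code>, meaning <code>sleep</code> survived the closed port.<syntaxhighlight lang="elixir">
defmodule OrphanDemo do
  # Hypothetical reproduction of the experiment: spawn `sleep 60`, close the Port,
  # then check whether the OS process is still alive.
  def run do
    sleep = System.find_executable("sleep")
    port = Port.open({:spawn_executable, sleep}, args: ["60"])
    {:os_pid, os_pid} = Port.info(port, :os_pid)

    # Closing the Port closes the pipes to `sleep`; whether that stops it is the question.
    true = Port.close(port)
    Process.sleep(100)

    # `kill -0` delivers no signal; its exit code only reports whether the PID exists.
    {_output, code} =
      System.cmd("kill", ["-0", Integer.to_string(os_pid)], stderr_to_stdout: true)

    {os_pid, code == 0}
  end
end
</syntaxhighlight>
Clean up afterwards, for example with <code>System.cmd("kill", [Integer.to_string(os_pid)])</code>.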

== Bad assumption: pipe-like processes ==
A pipeline program like <code>gzip</code> or <code>cat</code> is built to read from its input and write to its output. We can roughly group the different styles of command-line application into "pipeline" programs which read and write, "interactive" programs which require user input, and "daemon" programs which are designed to run in the background. Some programs support multiple modes depending on the arguments given at launch, or by detecting the terminal using <code>isatty</code><ref>[https://man.archlinux.org/man/isatty.3.en docs for <code>isatty</code>]</ref>. The BEAM is currently optimized to interface with pipeline programs, and it assumes that the external process will stop when its "standard input" is closed.

A typical pipeline program will stop once it detects that input has ended, for example by calling <code>read</code><ref>[https://man.archlinux.org/man/read.2 libc <code>read</code> docs]</ref> in a loop:<syntaxhighlight lang="c">
ssize_t size_read = read(input_desc, buf, bufsize);
if (size_read < 0) { /* error: inspect errno */ }
if (size_read == 0) { /* end of file, e.g. the writing end of the pipe was closed */ }
</syntaxhighlight>

If the program does blocking I/O, then a zero-byte <code>read</code> indicates the end-of-file condition. A program which does asynchronous I/O with <code>O_NONBLOCK</code><ref>[https://man.archlinux.org/man/open.2.en#O_NONBLOCK O_NONBLOCK docs]</ref> also sees end of file as a zero-byte <code>read</code>, and <code>poll</code> may additionally report a <code>POLLHUP</code> event once the writing end of a pipe is closed. The similarly named <code>HUP</code> hang-up signal is a separate mechanism, sent when a controlling terminal disconnects.

But here we'll focus on how processes can more generally affect each other through pipes. Surprising answer: without much effect! You can experiment with the <code>/dev/null</code> device which behaves like a closed pipe, for example compare these two commands:

<syntaxhighlight lang="shell">
cat < /dev/null

sleep 10 < /dev/null
</syntaxhighlight><code>cat</code> exits immediately, but <code>sleep</code> does its thing as usual.

You could do the same experiment by running "cat" in a terminal and typing <control>-d to "send" an end-of-file. Interestingly, no character is sent at all. <control>-d is interpreted by the terminal driver's line discipline, which makes cat's pending <code>read</code> return zero bytes, and cat treats that as end of file. Similarly, <control>-c is not sent as a character either: the terminal driver turns it into an interrupt signal (<code>SIGINT</code>) delivered to the foreground process, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>[https://wizardzines.com/comics/stty/ ★ wizard zines ★: stty]</ref> by Julia Evans. Dump the settings of your own terminal: <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions and "end of file" (EOF) is more a convention than a hard reality. A program with a chaotic disposition could even reopen stdin after it was closed and connect it to something else, to the great surprise of friends and neighbors.

Back to the problem at hand, "rsync" is in the category of "daemon-like" programs which will carry on even after standard input is closed. This makes sense enough, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
A small shim can adapt a daemon-like program to behave more like a pipeline. The shim is sensitive to stdin closing or SIGHUP, and when this is detected it converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>[https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes Elixir Port docs showing a shim script]</ref> for Elixir, and the <code>erlexec</code><ref name=":0">[https://hexdocs.pm/erlexec/readme.html <code>erlexec</code> library]</ref> library. The opposite adapter can be found in the [[w:nohup|nohup]] shell command and the grimsby<ref>[https://github.com/shortishly/grimsby <code>grimsby</code> library]</ref> library: these will keep standard in and/or standard out open for the child process even after the parent exits, so that a pipe-like program can behave more like a daemon.

I used the shim approach in my rsync library, and it includes a small C program<ref>[https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads rsync_ex C shim program]</ref> which wraps rsync and makes it sensitive to BEAM <code>port_close</code>. It's featherweight, leaving pipes unchanged as it passes control to rsync. Here are the business parts:<syntaxhighlight lang="c">
#include <signal.h>
#include <sys/prctl.h> // Linux-specific
// Set up a fail-safe to self-signal with HUP if the controlling process dies.
prctl(PR_SET_PDEATHSIG, SIGHUP);</syntaxhighlight><syntaxhighlight lang="c">
#include <signal.h>
#include <sys/types.h>
// Assumed declaration: PID of the forked rsync child, shared with the signal handler.
static volatile pid_t child_pid = 0;
void handle_signal(int signum) {
  if (signum == SIGHUP && child_pid > 0) {
    // Send the child TERM so that rsync can perform clean-up such as shutting down a remote server.
    kill(child_pid, SIGTERM);
  }
}
</syntaxhighlight>

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities, they deserve their reputation for being friendly and open. The first big tip was to look at the third-party library <code>erlexec</code><ref name=":0" />, which demonstrates emerging best practices which could be backported into the language itself. Everyone speaking on the problem generally agrees that the fragile clean up of external processes is a bug, and supports the idea that some flavor of "terminate" signal should be sent to spawned programs when the port is closed.
[[File:Itinerant glassworker exhibition with spinning wheel and steam engine.jpg|thumb]]
I would be lying if I hid my disappointment that the required core changes are mostly in an auxiliary C program and not written in Erlang or even in the BEAM itself, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes<ref>[https://man.archlinux.org/man/pipe.7.en Overview of unix pipes]</ref>, using libc's pipe<ref>[https://man.archlinux.org/man/pipe.2.en Docs for the <code>pipe</code> syscall]</ref>, read, write, and select<ref>[https://man.archlinux.org/man/select.2.en libc <code>select</code> docs]</ref>.

Port drivers<ref>[https://www.erlang.org/doc/system/ports.html Erlang ports docs]</ref> are fundamental to ERTS, and several levels of port wiring are involved in launching external processes: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children. This architecture reflects the Supervisor paradigm and we can leverage it to produce some of the same properties: the subprocess can buffer reads and writes asynchronously and handle them sequentially; and if the BEAM crashes then erl_child_setup can detect the condition and do its own cleanup.

Letting a child process outlive its controlling process leaves the child in a state called "orphaned" in POSIX, and the standard recommends that when this happens the process should be adopted by the top-level system process "init" if it exists. This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands. But init will know nothing about adopted, orphan processes or how to monitor and restart them.

The patch [https://github.com/erlang/otp/pull/9453 PR#9453] adapting port_close to SIGTERM is waiting for review and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : sent to the controlling process when its terminal hangs up, and conventionally read as "the input has gone away"<ref>[https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap11.html#tag_11_01_10 POSIX standard "General Terminal Interface: Modem Disconnect"]</ref>

; <code>TERM</code> : has a clear intention of "kill this thing" but still possible to trap at the target and handle in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}

== Future directions ==
Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign.

There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely, for example.

Another idea to borrow from the erlexec library is to have an option to kill the entire process group of a child, which is shared by any descendants that haven't explicitly broken out of its original group. This would be useful for managing deep trees of external processes launched by a forked command.

== References ==

Elixir/Ports and external process wiring

2025-10-21T13:51:49Z

Adamw: memed.

A deceivingly simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress using rsync.

I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

{{Aside|text=<p>[[w:rsync|Rsync]] is the standard utility for file transfers, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage rsync can be trusted to handle any edge case.</p>
<p>BEAM<ref>The virtual machine shared by Erlang, Elixir, Gleam, Ash, and so on: [https://blog.stenmans.org/theBeamBook/ the BEAM Book]</ref> is a fairly unique ecosystem in which it's not considered deviant to reinvent a rounder wheel: an external dependency like "cron" will often be ported into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.</p>}}

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

=== Naïve shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with how one would pass it dynamic paths. It's unsafe to use string interpolation (<code>"#{source}"</code>): consider what could happen if the filenames include unescaped whitespace or special shell characters such as ";".

=== Safe path handling ===
We turn next to <code>System.cmd</code>, which takes a raw argv and can't be fooled by special characters in the path arguments:<syntaxhighlight lang="elixir">
# Each argument reaches rsync as-is, with no shell in between.
System.find_executable(rsync_path)
|> System.cmd(["-a", source, target])
</syntaxhighlight>For a short job this is perfect, but for longer transfers our program loses control and observability, waiting indefinitely for a monolithic command to return.

=== Asynchronous call and communication ===
To run an external process asynchronously we reach for Elixir's low-level <code>Port.open</code>, nothing but a one-line wrapper<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> passing its parameters directly to ERTS <code>open_port</code><ref>[https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2 Erlang <code>open_port</code> docs]</ref>. This function is tremendously flexible; here we turn a few knobs:<syntaxhighlight lang="elixir">
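Port.open(
  {:spawn_executable, rsync_path},
  [
    # Deliver output as binaries rather than charlists.
    :binary,
    # Send {port, {:exit_status, code}} when rsync exits.
    :exit_status,
    # Suppress the console window on Windows.
    :hide,
    # Talk to rsync over its standard input and output.
    :use_stdio,
    # Merge rsync's error output into the same stream.
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)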
</syntaxhighlight>

{{Aside|text=
'''Rsync progress reporting options'''

There are a variety of ways to report progress:

; <code>-v</code> : list each filename as it's transferred

; <code>--progress</code> : report statistics per file

; <code>--info=progress2</code> : report overall progress

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any custom format string following rsyncd.conf's <code>log format</code><ref>https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf</ref>
}}

Rsync outputs <code>--info=progress2</code> lines like so:<syntaxhighlight lang="text">
      overall percent complete   time remaining
bytes transferred |  transfer speed    |
          |       |        |           |
      3,342,336  33%    3.14MB/s    0:00:02
</syntaxhighlight>

The controlling Port captures these lines and delivers them to the library's <code>handle_info</code> callback as <code>{port, {:data, line}}</code> messages. After the transfer is finished we receive a conclusive <code>{port, {:exit_status, status_code}}</code> message.

As a first step, we extract the overall_percent_done column and flag any unrecognized output:
<syntaxhighlight lang="elixir">
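# Split on whitespace; trim drops the empty strings left by "\r" and repeated spaces.
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     # The second column is the overall percentage, such as "33%".
     percent_done_text when percent_done_text != nil <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  # Anything that doesn't look like a progress line is flagged for inspection.
  _ ->
    {:unknown, line}
end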
</syntaxhighlight>The <code>trim</code> option is pulling more than its weight here. Combined with splitting on <code>\s</code>, it lets us ignore spacing and newline trickery completely, including the leading carriage return before each line, seen in the rsync source code:<ref>[https://github.com/RsyncProject/rsync/blob/797e17fc4a6f15e3b1756538a9f812b63942686f/progress.c#L129 rsync/progress.c] source code</ref>
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>Carriage return <code>\r</code> deserves special mention. It is the first "control" character we come across, and in the binary data coming over the pipe from rsync it looks just like an ordinary byte, as does newline <code>\n</code>. Its normal role is to control the terminal emulator, rewinding the cursor so that the current line can be overwritten! Like newline, carriage return can be ignored here. Control signaling is exactly what goes haywire in this project, and the leaky distinction between data and control seems to be a recurring theme in inter-process communication. The reality is less a split between data and control than a stack of layers, as with [[w:OSI model|networking]].

{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character, <code>\r</code> or <code>0x0d</code>, sometimes rendered as <code>^M</code>. Try this command in a terminal:<syntaxhighlight lang="shell">
# echo "three^Mtwo"
twoee
</syntaxhighlight>
You'll have to type <control>-v <control>-m to enter a literal carriage return; copy-and-paste won't work.

The character is named after the pushing of a physical typewriter carriage to return to the beginning of the current line without feeding the roller to a new line.

[[File:Baboons Playing in Chobe National Park-crlf.jpg|left|300x300px|Three young baboons playing on a rock ledge. Two are on the ridge and one below, grabbing the tail of another. A meme font shows "\r", "\n", and "\r\n" personified as each baboon.]]
[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing.
}}

== OTP generic server ==
The Port API is convenient enough so far, but Erlang/OTP really starts to shine once we wrap each Port connection under a <code>gen_server</code><ref>[https://www.erlang.org/doc/apps/stdlib/gen_server.html Erlang gen_server docs]</ref> module, giving us several properties for free. A dedicated BEAM process coordinates with its rsync process independently of anything else. Input and output are asynchronous and buffered, but handled sequentially, one message at a time, so no locking is needed. The gen_server holds internal state including the up-to-date completion percentage. And the caller can request updates as needed, or it can listen for push messages with the parsed statistics.

This gen_server is also expected to run safely under an OTP supervision tree<ref>[https://adoptingerlang.org/docs/development/supervision_trees/ "Supervision Trees"] chapter from [https://adoptingerlang.org/ Adopting Erlang]</ref> but this is where our dream falls apart for the moment. The Port already watches for rsync completion or failure and reports upwards to its caller, but we fail at the critical property of being able to propagate a termination downwards to shut down rsync if the calling code or our library module crashes.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
The unpleasant real-world consequence is that rsync transfers will continue to run in the background even after Elixir kills our gen_server or shuts down, because the BEAM has no way of stopping the external process.

It's possible to send a signal by shelling out to unix <code>kill PID</code>, and the OS process ID can be looked up with <code>Port.info(port, :os_pid)</code>, but the BEAM has no built-in function to send a signal to an OS process. Clearly we're expected to do this another way. Another problem with "kill" is that we want the external process to stop no matter how badly the BEAM is damaged, so we shouldn't rely on stored data or on running final clean-up logic before exiting.

To debug what happens during <code>port_close</code> and to eliminate variables, I tried to spawn <code>sleep 60</code> using the same Port command, and found that it behaves exactly the same way, running until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open. As I learned later, this happened to be a lucky choice: "sleep" is unusual in the same way as rsync, but its behavior is much simpler to reason about.

== Bad assumption: pipe-like processes ==
A pipeline program like <code>gzip</code> or <code>cat</code> is built to read from its input and write to its output. We can roughly group the different styles of command-line application into "pipeline" programs which read and write, "interactive" programs which require user input, and "daemon" programs which are designed to run in the background. Some programs support multiple modes depending on the arguments given at launch, or by detecting the terminal using <code>isatty</code><ref>[https://man.archlinux.org/man/isatty.3.en docs for <code>isatty</code>]</ref>. The BEAM is currently optimized to interface with pipeline programs, and it assumes that the external process will stop when its "standard input" is closed.

A typical pipeline program will stop once it detects that input has ended, by making regular C system calls to <code>read</code><ref>[https://man.archlinux.org/man/read.2 libc <code>read</code> docs]</ref>:<syntaxhighlight lang="c">
ssize_t n_read = read(input_desc, buf, bufsize);
if (n_read < 0) { /* error: inspect errno */ }
if (n_read == 0) { /* end of file, e.g. the writing end of the pipe was closed */ }
</syntaxhighlight>When the program uses blocking I/O, reading zero bytes indicates the end of file. Programs which do asynchronous I/O using <code>O_NONBLOCK</code><ref>[https://man.archlinux.org/man/open.2.en#O_NONBLOCK O_NONBLOCK docs]</ref> also see end of file as a zero-byte read, and <code>poll</code> may additionally report a <code>POLLHUP</code> event once the writing end of a pipe is closed. The similarly named <code>HUP</code> hang-up signal is a separate mechanism, sent when a controlling terminal disconnects.

But here we'll focus on how processes can more generally affect each other through pipes. Surprising answer: without much effect! You can experiment with the <code>/dev/null</code> device which behaves like a closed pipe, for example compare these two commands:<syntaxhighlight lang="shell">
cat < /dev/null

sleep 10 < /dev/null
</syntaxhighlight><code>cat</code> exits immediately, but <code>sleep</code> does its thing as usual.

You could do the same experiment by running "cat" in a terminal and typing <control>-d to "send" an end-of-file. Interestingly, no character is sent at all. <control>-d is interpreted by the terminal driver's line discipline, which makes cat's pending <code>read</code> return zero bytes, and cat treats that as end of file. Similarly, <control>-c is not sent as a character either: the terminal driver turns it into an interrupt signal (<code>SIGINT</code>) delivered to the foreground process, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>[https://wizardzines.com/comics/stty/ ★ wizard zines ★: stty]</ref> by Julia Evans. Dump the settings of your own terminal: <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions and "end of file" (EOF) is more a convention than a hard reality. A program with a chaotic disposition could even reopen stdin after it was closed and connect it to something else, to the great surprise of friends and neighbors.

Back to the problem at hand, "rsync" is in the category of "daemon-like" programs which will carry on even after standard input is closed. This makes sense enough, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
A small shim can adapt a daemon-like program to behave more like a pipeline. The shim is sensitive to stdin closing or SIGHUP, and when this is detected it converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>[https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes Elixir Port docs showing a shim script]</ref> for Elixir, and the <code>erlexec</code><ref name=":0">[https://hexdocs.pm/erlexec/readme.html <code>erlexec</code> library]</ref> library. The opposite adapter can be found in the [[w:nohup|nohup]] shell command and the grimsby<ref>[https://github.com/shortishly/grimsby <code>grimsby</code> library]</ref> library: these will keep standard in and/or standard out open for the child process even after the parent exits, so that a pipe-like program can behave more like a daemon.

I used the shim approach in my rsync library and it includes a small C program<ref>[https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads rsync_ex C shim program]</ref> which wraps rsync and makes it sensitive to BEAM <code>port_close</code>. It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (but should have been SIGTERM, see the sidebar discussion of different signals below).

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities, they deserve their reputation for being friendly and open. The first big tip was to look at the third-party library <code>erlexec</code><ref name=":0" />, which demonstrates emerging best practices which could be backported into the language itself. Everyone speaking on the problem generally agrees that the fragile clean up of external processes is a bug, and supports the idea that some flavor of "terminate" signal should be sent to spawned programs when the port is closed.
[[File:Itinerant glassworker exhibition with spinning wheel and steam engine.jpg|thumb]]
I would be lying if I hid my disappointment that the required core changes are mostly in an auxiliary C program and not written in Erlang or even in the BEAM itself, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes<ref>[https://man.archlinux.org/man/pipe.7.en Overview of unix pipes]</ref>, using libc's pipe<ref>[https://man.archlinux.org/man/pipe.2.en Docs for the <code>pipe</code> syscall]</ref>, read, write, and select<ref>[https://man.archlinux.org/man/select.2.en libc <code>select</code> docs]</ref>.

Port drivers<ref>[https://www.erlang.org/doc/system/ports.html Erlang ports docs]</ref> are fundamental to ERTS, and several levels of port wiring are involved in launching external processes: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children. This architecture reflects the Supervisor paradigm and we can leverage it to produce some of the same properties: the subprocess can buffer reads and writes asynchronously and handle them sequentially; and if the BEAM crashes then erl_child_setup can detect the condition and do its own cleanup.

Letting a child process outlive its controlling process leaves the child in a state called "orphaned" in POSIX, and the standard recommends that when this happens the process should be adopted by the top-level system process "init" if it exists. This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands. But init will know nothing about adopted, orphan processes or how to monitor and restart them.

The patch [https://github.com/erlang/otp/pull/9453 PR#9453] adapting port_close to SIGTERM is waiting for review and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : sent to a process when its controlling terminal disconnects ("hangs up"), taking its standard input with it<ref>[https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap11.html#tag_11_01_10 POSIX standard "General Terminal Interface: Modem Disconnect"]</ref>

; <code>TERM</code> : has a clear intention of "kill this thing", but can still be trapped by the target and handled in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}

== Future directions ==
Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign.

There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely, for example.

Another idea to borrow from the erlexec library is to have an option to kill the entire process group of a child, which is shared by any descendants that haven't explicitly broken out of its original group. This would be useful for managing deep trees of external processes launched by a forked command.

== References ==

File:Baboons Playing in Chobe National Park-crlf.jpg

2025-10-21T13:51:06Z

Adamw: Uploaded a work by Faypearse from https://commons.wikimedia.org/wiki/File:Baboons_Playing_in_Chobe_National_Park.jpg with UploadWizard

=={{int:filedesc}}==
{{Information
|description={{en|1=Three young baboons playing on a rock ledge. Two are on the ridge and one below, grabbing the tail of another. A meme font shows "\r", "\n", and "\r\n" personified as each baboon.}}
|date=2025-10-21
|source=https://commons.wikimedia.org/wiki/File:Baboons_Playing_in_Chobe_National_Park.jpg
|author= Faypearse
|permission=
|other versions=
}}

=={{int:license-header}}==
{{cc-by-sa-4.0}}

Elixir/Ports and external process wiring

2025-10-20T14:41:31Z

Adamw:

A deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress using rsync.

I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

{{Aside|text=<p>[[w:rsync|Rsync]] is the standard utility for file transfers, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage it can be trusted to handle any edge case.</p>
<p>BEAM is a fairly unique ecosystem in which it's not considered deviant to reinvent a rounder wheel: an external dependency like "cron" would often be ported into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.</p>}}

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

=== Naive shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with how we pass the filenames. It would be possible to pass a dynamic path using string interpolation like <code>#{source}</code>, but this is risky: consider what happens if the filenames include whitespace or even special shell characters such as ";".

=== Safe path handling ===
We turn next to <code>System.cmd</code>, which takes a raw argv and can't be fooled by special characters in the path arguments:<syntaxhighlight lang="elixir">
# Each list element reaches rsync as exactly one argument; no shell is involved.
System.find_executable(rsync_path)
|> System.cmd(["-a", source, target])
</syntaxhighlight>For a short job this is perfect, but for longer transfers our program loses control and observability, waiting indefinitely for a monolithic command to return.
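At the operating-system level, "taking a raw argv" means the arguments are handed straight to the <code>exec</code> family of system calls with no shell in between, so nothing gets split on spaces or semicolons. Here is a minimal C sketch of that idea, assuming a POSIX system; the helper name <code>run_argv</code> is hypothetical, and this is not how <code>System.cmd</code> itself is implemented.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given arguments, without a shell.
 * Each argv element reaches the program verbatim, so a path such as
 * "my file; rm -rf ~" is just one odd filename, never a second command.
 * Returns the exit code, or -1 if the program did not exit normally. */
int run_argv(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv); /* searches PATH, similar to System.find_executable */
        _exit(127);            /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
</syntaxhighlight>

The contrast is with <code>system()</code>, which hands its whole string to <code>/bin/sh -c</code> and so reintroduces exactly the interpolation hazard described above.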

=== Asynchronous call and communication ===
To run an external process asynchronously we reach for Elixir's low-level <code>Port.open</code>, nothing but a one-line wrapper<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> which passes its parameters directly to ERTS <code>open_port</code><ref>[https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2 Erlang <code>open_port</code> docs]</ref>. This function is tremendously flexible; here we turn a few knobs:<syntaxhighlight lang="elixir">
Port.open(
  {:spawn_executable, rsync_path},
  [
    :binary,
    :exit_status,
    :hide,
    :use_stdio,
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)
</syntaxhighlight>

{{Aside|text=
'''Rsync progress reporting options'''

There are a variety of ways to report progress:

; <code>-v</code> : list each filename as it's transferred

; <code>--info=progress2</code> : report overall progress

; <code>--progress</code> : report statistics per file

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any format using parameters from rsyncd.conf's <code>log format</code><ref>[https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf rsyncd.conf manual page]</ref>
}}

We've chosen <code>--info=progress2</code>, so the meaning of the reported percentage is "overall percent complete". Rsync outputs these progress lines in a fairly self-explanatory columnar format:<syntaxhighlight lang="text">
      3,342,336  33%    3.14MB/s    0:00:02
      |          |      |           |
      |          |      |           time remaining
      |          |      transfer speed
      |          percent complete
      bytes transferred
</syntaxhighlight>

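Our Port captures output and delivers it to the library's <code>handle_info</code> callback as <code>{:data, line}</code> messages. Because we didn't ask for the <code>{:line, max_length}</code> option, each message carries whatever chunk was read from the pipe, which in practice is usually one progress update. After the transfer is finished we receive a final <code>{:exit_status, status_code}</code> message.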

As a first step, we extract the percent_done column and flag any unrecognized output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when percent_done_text != nil <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>The <code>trim</code> option is pulling more than its weight here: since <code>\s</code> matches any whitespace character, dropping the empty strings left between separators lets us ignore spacing and newline trickery entirely, even the leading carriage return that can be seen in the rsync source code:<ref>[https://github.com/RsyncProject/rsync/blob/797e17fc4a6f15e3b1756538a9f812b63942686f/progress.c#L129 rsync/progress.c] source code</ref>
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>Carriage return <code>\r</code> deserves special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but its normal role is to control the terminal emulator, rewinding the cursor so that the current line can be overwritten!
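For comparison, here is the same column extraction sketched in C, assuming a POSIX system for <code>strtok_r</code>. Listing <code>\r</code> among the delimiters is what quietly discards the leading carriage return. The function name <code>parse_percent_done</code> and the 512-byte line limit are illustrative assumptions, not part of the library.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the Elixir `with` expression: return the
 * second whitespace-separated column as a number if it looks like "33%",
 * or -1.0 for any line that isn't a progress line.
 * Lines longer than 511 bytes are truncated. */
double parse_percent_done(const char *line) {
    char buf[512];
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    /* "\r" is a delimiter like any other whitespace, so the carriage return
     * that rsync prints at the start of each update is skipped for free. */
    const char *delims = " \t\r\n";
    char *save = NULL;
    char *bytes = strtok_r(buf, delims, &save);              /* e.g. "3,342,336" */
    char *percent = bytes ? strtok_r(NULL, delims, &save) : NULL; /* e.g. "33%" */
    if (percent == NULL)
        return -1.0;

    char *rest = NULL;
    double value = strtod(percent, &rest);
    if (rest == percent || strcmp(rest, "%") != 0)
        return -1.0;                                         /* not "<number>%" */
    return value;
}
</syntaxhighlight>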

A repeated theme in inter-process communication is that data and control are leaky categories. We'll come to the more formal control side channels later.

{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character, <code>\r</code> (<code>0x0d</code>, sometimes rendered as <code>^M</code>). Try this command in a terminal:<syntaxhighlight lang="shell">
# echo "three^Mtwo"
twoee
</syntaxhighlight>
You'll have to use <control>-v <control>-m to type a literal carriage return; copy-and-paste won't work.

The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.

[[File:Nilgais fighting, Lakeshwari, Gwalior district, India.jpg|left|200x200px]]
[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing.
}}

== OTP generic server ==
The Port API is convenient enough so far, but Erlang/OTP really starts to shine once we wrap each Port connection under a <code>gen_server</code><ref>[https://www.erlang.org/doc/apps/stdlib/gen_server.html Erlang gen_server docs]</ref> module, giving us several properties for free: a dedicated BEAM process coordinates with its rsync process independently of anything else. Input and output are asynchronous and buffered, but handled sequentially in a thread-safe way. The gen_server holds internal state including the up-to-date completion percentage. And the caller can request updates as needed, or it can listen for push messages with the parsed statistics.

This gen_server is also expected to run safely under an OTP supervision tree<ref>[https://adoptingerlang.org/docs/development/supervision_trees/ "Supervision Trees"] chapter from [https://adoptingerlang.org/ Adopting Erlang]</ref> but this is where our dream falls apart for the moment. The Port already watches for rsync completion or failure and reports upwards to its caller, but we fail at the critical property of being able to propagate a termination downwards to shut down rsync if the calling code or our library module crashes.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
The unpleasant real-world consequence is that rsync transfers will continue to run in the background even after Elixir kills our gen_server or shuts down, because the BEAM has no way of stopping the external process.

It's possible to send a signal by shelling out to unix <code>kill PID</code>, and the child's OS process ID can be read with <code>Port.info(port, :os_pid)</code>, but BEAM doesn't include any built-in function to send a signal to an OS process. Clearly we're expected to do this another way. Another problem with "kill" is that we want the external process to stop no matter how badly the BEAM is damaged, so we shouldn't rely on stored data or on running final clean-up logic before exiting.
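For reference, this is all that <code>kill PID</code> amounts to at the C level: one system call taking a process ID and a signal number. This hypothetical sketch (POSIX assumed, helper name invented for illustration) starts a throwaway <code>sleep 60</code> child and stops it with <code>SIGTERM</code>; nothing like it is exposed as a BEAM built-in.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: start a child that would otherwise run for a minute,
 * then ask it to stop. Returns the signal that ended the child, or -1 if
 * it ended some other way. */
int start_and_terminate(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execlp("sleep", "sleep", "60", (char *)NULL);
        _exit(127);
    }
    kill(pid, SIGTERM);          /* the C equivalent of `kill PID` */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
</syntaxhighlight>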

To debug what happens during <code>port_close</code> and to eliminate variables, I tried to spawn <code>sleep 60</code> using the same Port command, and found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open. As I learned later, this was a lucky choice: "sleep" is unusual in the same way as rsync, but its behavior is much simpler to reason about.

== Bad assumption: pipe-like processes ==
A pipeline program like <code>gzip</code> or <code>cat</code> is built to read from its input and write to its output. We can roughly group the different styles of command-line application into "pipeline" programs which read and write, "interactive" programs which require user input, and "daemon" programs which are designed to run in the background. Some programs support multiple modes depending on the arguments given at launch, or by detecting the terminal using <code>isatty</code><ref>[https://man.archlinux.org/man/isatty.3.en docs for <code>isatty</code>]</ref>. The BEAM is currently optimized to interface with pipeline programs and it assumes that the external process will stop when its "standard input" is closed.

A typical pipeline program will stop once it detects that input has ended, by making regular C system calls to <code>read</code><ref>[https://man.archlinux.org/man/read.2 libc <code>read</code> docs]</ref>:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { /* error... */ }
if (n_read == 0) { /* end of file... */ }
</syntaxhighlight>Reading zero bytes indicates the end of file. There are also programs which do asynchronous I/O using <code>O_NONBLOCK</code><ref>[https://man.archlinux.org/man/open.2.en#O_NONBLOCK O_NONBLOCK docs]</ref>: for them <code>read</code> fails with <code>EAGAIN</code> while no data is available and still returns zero at end of file. Some programs also react to the <code>HUP</code> hang-up signal, which is sent when a controlling terminal disconnects.
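To make the convention concrete, here is a minimal pipeline-style copy loop in C, roughly the shape of what <code>cat</code> does with blocking I/O: it keeps reading until <code>read</code> returns zero, then simply returns. The function name <code>copy_until_eof</code> and the 4096-byte buffer are illustrative assumptions; the real coreutils <code>cat</code> is considerably more elaborate.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd until end of file.
 * Returns the number of bytes copied, or -1 on error (errno is set). */
long copy_until_eof(int in_fd, int out_fd) {
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n_read = read(in_fd, buf, sizeof buf);
        if (n_read < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n_read == 0)
            return total;       /* end of file, e.g. every writer closed the pipe */
        for (ssize_t off = 0; off < n_read;) {
            ssize_t n_written = write(out_fd, buf + off, (size_t)(n_read - off));
            if (n_written < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += n_written;   /* handle short writes */
        }
        total += n_read;
    }
}
</syntaxhighlight>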

But here we'll focus on how processes can more generally affect each other through pipes. Surprising answer: not very much! You can experiment with the <code>/dev/null</code> device, which reads like a pipe whose writer has already closed it; for example, compare these two commands:<syntaxhighlight lang="shell">
cat < /dev/null

sleep 10 < /dev/null
</syntaxhighlight><code>cat</code> exits immediately, but <code>sleep</code> does its thing as usual.

You could do the same experiment by opening a "cat" in the terminal and then typing <control>-d to "send" an end-of-file. Interestingly, no data is sent and nothing is closed: <control>-d is interpreted by the terminal's line discipline, which makes the pending <code>read</code> in cat return zero bytes, just like a closed pipe. Similarly, <control>-c isn't sent as a character either: the terminal converts it into an interrupt signal (<code>SIGINT</code>) for the foreground process, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>[https://wizardzines.com/comics/stty/ ★ wizard zines ★: stty]</ref> by Julia Evans. Dump your own terminal's settings with <code>stty -a</code>.

Any special behavior at the other end of a pipe is the result of intentional programming decisions and "end of file" (EOF) is more a convention than a hard reality. A program with a chaotic disposition could even reopen stdin after it was closed and connect it to something else, to the great surprise of friends and neighbors.

Back to the problem at hand, "rsync" is in the category of "daemon-like" programs which will carry on even after standard input is closed. This makes sense enough, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
A small shim can adapt a daemon-like program to behave more like a pipeline. The shim is sensitive to stdin closing or to SIGHUP, and when it detects either one it sends a stronger signal like SIGTERM to its own child. This is the idea behind a suggested shell script<ref>[https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes Elixir Port docs showing a shim script]</ref> for Elixir and the <code>erlexec</code><ref name=":0">[https://hexdocs.pm/erlexec/readme.html <code>erlexec</code> library]</ref> library. The opposite adapter can be found in the [[w:nohup|nohup]] shell command and the grimsby<ref>[https://github.com/shortishly/grimsby <code>grimsby</code> library]</ref> library: these keep standard input and/or standard output open for the child process even after the parent exits, so that a pipe-like program can behave more like a daemon.
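A rough C sketch of such a shim follows, assuming a POSIX system. It is a hypothetical illustration, not the Elixir docs' script, erlexec, or the rsync_ex shim: the child gets <code>/dev/null</code> as standard input while the shim itself polls the input descriptor, and once that reaches end of file the shim sends <code>SIGTERM</code>. The name <code>shim_run</code>, the 100 ms polling interval, and taking the descriptor as a parameter (a real shim would watch its own standard input) are assumptions made to keep it short and easy to try.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shim: run argv as a child process and send it SIGTERM once
 * input_fd reaches end of file. Returns the child's raw wait status, or -1. */
int shim_run(int input_fd, char *const argv[]) {
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* The shim, not the child, consumes the input stream. */
        int devnull = open("/dev/null", O_RDONLY);
        if (devnull >= 0 && devnull != STDIN_FILENO) {
            dup2(devnull, STDIN_FILENO);
            close(devnull);
        }
        execvp(argv[0], argv);
        _exit(127);
    }

    int status;
    struct pollfd pfd = {.fd = input_fd, .events = POLLIN};
    for (;;) {
        /* If the child finished on its own, pass its status along. */
        pid_t done = waitpid(child, &status, WNOHANG);
        if (done == child)
            return status;
        if (done < 0 && errno != EINTR)
            return -1;

        /* Wait up to 100 ms for data or a hang-up on input_fd. */
        if (poll(&pfd, 1, 100) > 0) {
            char buf[512];
            ssize_t n = read(input_fd, buf, sizeof buf); /* contents ignored */
            if (n == 0 || (n < 0 && errno != EINTR && errno != EAGAIN))
                break;  /* end of file (or broken input): time to stop */
        }
    }
    kill(child, SIGTERM); /* convert end of file into a stronger signal */
    while (waitpid(child, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return status;
}
</syntaxhighlight>

Polling with a timeout is simply the shortest way to notice both events, input closing and the child exiting on its own; a production shim might react to <code>SIGCHLD</code> instead.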

I used the shim approach in my rsync library and it includes a small C program<ref>[https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads rsync_ex C shim program]</ref> which wraps rsync and makes it sensitive to BEAM <code>port_close</code>. It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (but should have been SIGTERM, see the sidebar discussion of different signals below).
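The signal-conversion half can be sketched on its own. In this hypothetical C version (POSIX assumed; not the rsync_ex source, and the helper names are invented), a <code>SIGHUP</code> handler forwards <code>SIGTERM</code> to the wrapped child, the signal the paragraph above says should have been used. The <code>wait_for_child</code> helper retries because a handled hang-up interrupts a blocking <code>waitpid</code> with <code>EINTR</code>.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* PID of the wrapped program, set before the handler is installed. */
static volatile pid_t forward_to = 0;

static void on_hangup(int sig) {
    (void)sig;
    if (forward_to > 0)
        kill(forward_to, SIGTERM); /* kill() is async-signal-safe */
}

/* Hypothetical helper: from now on, a SIGHUP delivered to this process is
 * converted into a SIGTERM for `child`. Returns 0 on success, -1 on error. */
int forward_hangup_as_term(pid_t child) {
    forward_to = child;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_hangup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/* Reap the child, retrying whenever a handled signal interrupts the wait.
 * Returns the raw wait status, or -1 on error. */
int wait_for_child(pid_t child) {
    int status;
    while (waitpid(child, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return status;
}
</syntaxhighlight>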

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities, they deserve their reputation for being friendly and open. The first big tip was to look at the third-party library <code>erlexec</code><ref name=":0" />, which demonstrates emerging best practices which could be backported into the language itself. Everyone speaking on the problem generally agrees that the fragile clean up of external processes is a bug, and supports the idea that some flavor of "terminate" signal should be sent to spawned programs when the port is closed.

I'd be lying if I said I wasn't disappointed that the required core changes are mostly in an auxiliary C program and not written in Erlang or even in the BEAM itself, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes<ref>[https://man.archlinux.org/man/pipe.7.en Overview of unix pipes]</ref>, using libc's pipe<ref>[https://man.archlinux.org/man/pipe.2.en Docs for the <code>pipe</code> syscall]</ref>, read, write, and select<ref>[https://man.archlinux.org/man/select.2.en libc <code>select</code> docs]</ref>.
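Those scraps look roughly like this in C, assuming a POSIX system: <code>select</code> reports a pipe as readable both when data is waiting and when every writer has closed it, and <code>read</code> then tells the two cases apart by returning a byte count or zero. The helper name <code>wait_readable</code> is hypothetical.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable (data or end of file), 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms) {
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };
    int ready = select(fd + 1, &readable, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    return ready > 0 && FD_ISSET(fd, &readable);
}
</syntaxhighlight>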

Port drivers<ref>[https://www.erlang.org/doc/system/ports.html Erlang ports docs]</ref> are fundamental to ERTS, and several levels of port wiring are involved in launching external processes: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children. This architecture reflects the Supervisor paradigm and we can leverage it to produce some of the same properties: the subprocess can buffer reads and writes asynchronously and handle them sequentially; and if the BEAM crashes then erl_child_setup can detect the condition and do its own cleanup.

Letting a child process outlive its controlling process leaves the child in a state called "orphaned" in POSIX, and the standard recommends that when this happens the process should be adopted by the top-level system process "init" if it exists. This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands. But init will know nothing about adopted, orphan processes or how to monitor and restart them.

The patch [https://github.com/erlang/otp/pull/9453 PR#9453] adapting port_close to SIGTERM is waiting for review and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : sent to a process when its controlling terminal disconnects ("hangs up"), taking its standard input with it<ref>[https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap11.html#tag_11_01_10 POSIX standard "General Terminal Interface: Modem Disconnect"]</ref>

; <code>TERM</code> : has a clear intention of "kill this thing", but can still be trapped by the target and handled in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}
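The difference between <code>TERM</code> and <code>KILL</code> shows up as soon as a program tries to install a handler. In this hypothetical C sketch (POSIX assumed, helper names invented for illustration), a handler for <code>SIGTERM</code> installs fine and would give the program its chance to clean up, while <code>sigaction</code> refuses a handler for <code>SIGKILL</code> with <code>EINVAL</code>.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal = 0;

/* A real program would only set a flag here and clean up from its main loop. */
static void remember_signal(int sig) {
    got_signal = sig;
}

/* Try to install remember_signal for sig. Returns 0 on success, or the
 * errno value explaining why the signal can't be caught. */
int try_trap(int sig) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = remember_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL) == 0 ? 0 : errno;
}
</syntaxhighlight>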

== Future directions ==
Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign.

There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely, for example.

Another idea to borrow from the erlexec library is to have an option to kill the entire process group of a child, which is shared by any descendants that haven't explicitly broken out of its original group. This would be useful for managing deep trees of external processes launched by a forked command.
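At the system-call level the idea could look like this C sketch, assuming a POSIX system; the helper names are hypothetical and this is not erlexec's code. The child becomes the leader of a new process group with <code>setpgid</code>, and passing a negative ID to <code>kill</code> signals every process still in that group, grandchildren included.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start argv in a new process group whose ID equals the child's PID.
 * Returns the child's PID (also the group ID), or -1. */
pid_t spawn_in_new_group(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);          /* become leader of a fresh group */
        execvp(argv[0], argv);
        _exit(127);
    }
    /* Also set it from the parent so the group exists before we return;
     * this fails harmlessly if the child has already called exec. */
    setpgid(pid, pid);
    return pid;
}

/* Send SIGTERM to the whole group (every descendant that stayed in it),
 * then reap the leader. Returns the leader's raw wait status, or -1. */
int terminate_group(pid_t pgid) {
    if (kill(-pgid, SIGTERM) < 0)
        return -1;
    int status;
    if (waitpid(pgid, &status, 0) < 0)
        return -1;
    return status;
}
</syntaxhighlight>

For example, spawning <code>sh -c "sleep 30 & sleep 30"</code> this way and calling <code>terminate_group</code> on it stops the shell and both sleeps, where a plain <code>kill</code> of the shell's PID would leave the sleeps running.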

== References ==

Elixir/Ports and external process wiring

2025-10-20T12:26:14Z

Adamw: c/e

A deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress using rsync.

I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

{{Aside|text=<p>[[w:rsync|Rsync]] is the standard utility for file transfers, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage it can be trusted to handle any edge case.</p>
<p>BEAM is a fairly unique ecosystem in which it's not considered deviant to reinvent a rounder wheel: an external dependency like "cron" would often be ported into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.</p>}}

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

=== Naive shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with how we pass the filenames. It would be possible to pass a dynamic path using string interpolation like <code>#{source}</code>, but this is risky: consider what happens if the filenames include whitespace or even special shell characters such as ";".

=== Safe path handling ===
We turn next to <code>System.cmd</code>, which takes a raw argv and can't be fooled by special characters in the path arguments:<syntaxhighlight lang="elixir">
# Each list element reaches rsync as exactly one argument; no shell is involved.
System.find_executable(rsync_path)
|> System.cmd(["-a", source, target])
</syntaxhighlight>For a short job this is perfect, but for longer transfers our program loses control and observability, waiting indefinitely for a monolithic command to return.

=== Asynchronous call and communication ===
To run an external process asynchronously we reach for Elixir's low-level <code>Port.open</code>, nothing but a one-line wrapper<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> which passes its parameters directly to ERTS <code>open_port</code><ref>[https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2 Erlang <code>open_port</code> docs]</ref>. This function is tremendously flexible; here we turn a few knobs:<syntaxhighlight lang="elixir">
Port.open(
  {:spawn_executable, rsync_path},
  [
    :binary,
    :exit_status,
    :hide,
    :use_stdio,
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)
</syntaxhighlight>

{{Aside|text=
'''Rsync progress reporting options'''

There are a variety of ways to report progress:

; <code>-v</code> : list each filename as it's transferred

; <code>--info=progress2</code> : report overall progress

; <code>--progress</code> : report statistics per file

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any format using parameters from rsyncd.conf's <code>log format</code><ref>[https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf rsyncd.conf manual page]</ref>
}}

We've chosen <code>--info=progress2</code>, so the meaning of the reported percentage is "overall percent complete". Rsync outputs these progress lines in a fairly self-explanatory columnar format:<syntaxhighlight lang="text">
      3,342,336  33%    3.14MB/s    0:00:02
      |          |      |           |
      |          |      |           time remaining
      |          |      transfer speed
      |          percent complete
      bytes transferred
</syntaxhighlight>

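Our Port captures output and delivers it to the library's <code>handle_info</code> callback as <code>{:data, line}</code> messages. Because we didn't ask for the <code>{:line, max_length}</code> option, each message carries whatever chunk was read from the pipe, which in practice is usually one progress update. After the transfer is finished we receive a final <code>{:exit_status, status_code}</code> message.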

As a first step, we extract the percent_done column and flag any unrecognized output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when percent_done_text != nil <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>The <code>trim</code> option is pulling more than its weight here: since <code>\s</code> matches any whitespace character, dropping the empty strings left between separators lets us ignore spacing and newline trickery entirely, even the leading carriage return that can be seen in the rsync source code:<ref>[https://github.com/RsyncProject/rsync/blob/797e17fc4a6f15e3b1756538a9f812b63942686f/progress.c#L129 rsync/progress.c] source code</ref>
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>Carriage return <code>\r</code> deserves special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but its normal role is to control the terminal emulator, rewinding the cursor so that the current line can be overwritten!

A repeated theme in inter-process communication is that data and control are leaky categories. We'll come to the more formal control side channels later.

{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character, <code>\r</code> (<code>0x0d</code>, sometimes rendered as <code>^M</code>). Try this command in a terminal:<syntaxhighlight lang="shell">
echo "three^Mtwo"
</syntaxhighlight>
You'll have to use <control>-v <control>-m to type a literal carriage return; copy-and-paste won't work. Spoiler: the output should read "twoee".

The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.

[[File:Nilgais fighting, Lakeshwari, Gwalior district, India.jpg|left|200x200px]]
[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing.
}}

== OTP generic server ==
The Port API is convenient enough so far, but Erlang/OTP really starts to shine once we wrap each Port connection under a <code>gen_server</code><ref>[https://www.erlang.org/doc/apps/stdlib/gen_server.html Erlang gen_server docs]</ref> module, giving us several properties for free: a dedicated BEAM process coordinates with its rsync process independently of anything else. Input and output are asynchronous and buffered, but handled sequentially in a thread-safe way. The gen_server holds internal state including the up-to-date completion percentage. And the caller can request updates as needed, or it can listen for push messages with the parsed statistics.

This gen_server is also expected to run safely under an OTP supervision tree<ref>[https://adoptingerlang.org/docs/development/supervision_trees/ "Supervision Trees"] chapter from [https://adoptingerlang.org/ Adopting Erlang]</ref> but this is where our dream falls apart for the moment. The Port already watches for rsync completion or failure and reports upwards to its caller, but we fail at the critical property of being able to propagate a termination downwards to shut down rsync if the calling code or our library module crashes.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
The unpleasant real-world consequence is that rsync transfers will continue to run in the background even after Elixir kills our gen_server or shuts down, because the BEAM has no way of stopping the external process.

It's possible to send a signal by shelling out to unix <code>kill PID</code>, and the child's OS process ID can be read with <code>Port.info(port, :os_pid)</code>, but BEAM doesn't include any built-in function to send a signal to an OS process. Clearly we're expected to do this another way. Another problem with "kill" is that we want the external process to stop no matter how badly the BEAM is damaged, so we shouldn't rely on stored data or on running final clean-up logic before exiting.

To debug what happens during <code>port_close</code> and to eliminate variables, I tried to spawn <code>sleep 60</code> using the same Port command, and found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open. As I learned later, this was a lucky choice: "sleep" is unusual in the same way as rsync, but its behavior is much simpler to reason about.

== Bad assumption: pipe-like processes ==
A pipeline program like <code>gzip</code> or <code>cat</code> is built to read from its input and write to its output. These will stop once they detect that input has ended, because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { /* error... */ }
if (n_read == 0) { /* end of file... */ }
</syntaxhighlight>The manual for read<ref>[https://man.archlinux.org/man/read.2 libc <code>read</code> docs]</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>[https://man.archlinux.org/man/open.2.en#O_NONBLOCK O_NONBLOCK docs]</ref> and unix pipes<ref>[https://man.archlinux.org/man/pipe.7.en overview of unix pipes]</ref>.

But here we'll focus on how processes affect each other through pipes. Surprising answer: not very much! Try opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file. Oh no, you killed it! You didn't actually send anything, though: the <control>-d is interpreted by the terminal's line discipline, which makes cat's pending read of "[[w:Standard streams|standard input]]" return zero bytes, just as if a pipe had been closed. This is similar to how <control>-c is not sending a character but is interpreted by the terminal and converted into an interrupt signal for the foreground process, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>[https://wizardzines.com/comics/stty/ ★ wizard zines ★: stty]</ref> by Julia Evans. Go ahead and try this command (what could go wrong?): <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions and "end of file" (EOF) is more a convention than a hard reality. You could even reopen stdin from the application, to the great surprise of your friends and neighbors. For example, try opening "watch ls" or "sleep 60" and try <control>-d all you want: no effect. The end-of-file is there for the taking, but nobody cares; the program wasn't reading its input anyway.

Back to the problem at hand, "rsync" is in this latter category of "daemon-like" programs which will carry on even after standard input is closed. This makes sense enough, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
It's possible to write a small adapter which is sensitive to stdin closing, then converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>[https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes Elixir Port docs showing a shim script]</ref> for Elixir and the erlexec<ref>[https://hexdocs.pm/erlexec/readme.html <code>erlexec</code> library]</ref> library. The opposite adapter is also found in the [[w:nohup|nohup]] shell command and the grimsby<ref>[https://github.com/shortishly/grimsby <code>grimsby</code> library]</ref> library: these will keep standard in and/or standard out open for the child process even after the parent exits.

I took the shim approach with my rsync library and included a small C program<ref>[https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads rsync_ex C shim program]</ref> which wraps rsync and makes it sensitive to the BEAM port_close. It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (but should have been SIGTERM, see the sidebar discussion of different signals below).

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities; they have earned their reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates emerging best practices that could be backported into the language itself. Everyone speaking on the problem has generally agreed that the fragile clean up of external processes is a bug, and supported the idea that some flavor of "terminate" signal should be sent to spawned programs.

I would be lying if I hid my disappointment that the required core changes are mostly in a C program and not actually in Erlang, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes, using the read, write, and select system calls<ref>[https://man.archlinux.org/man/select.2.en libc <code>select</code> docs]</ref>.
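For a flavor of that wizardry, here is a small POSIX C sketch of watching a pipe with <code>select</code> (not ERTS code; the function names are hypothetical, and it assumes the descriptor is below <code>FD_SETSIZE</code>). A descriptor is reported readable in two situations: when bytes are waiting, and when every writer has closed its end, in which case <code>read</code> returns 0. That second case is how a watcher can notice that the process on the other side has gone away.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    return ready > 0 && FD_ISSET(fd, &readfds);
}

/* After select() says "readable", find out which kind of readable it was.
 * Returns 1 if a byte was read, 0 if the writers are gone (EOF), -1 on error. */
int read_one_byte(int fd, char *byte) {
    ssize_t n = read(fd, byte, 1);
    if (n > 0)
        return 1;
    return n == 0 ? 0 : -1;
}
</syntaxhighlight>

With an empty pipe the wait times out; after a write it reports readable with data; after the writer closes its end it reports readable again, and the read comes back empty.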

Port drivers<ref>[https://www.erlang.org/doc/system/ports.html Erlang ports docs]</ref> are fundamental to ERTS and external processes are launched through several levels of wiring: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children.

Letting a child process outlive the one that spawned it leaves it in a state called an "orphaned process", and POSIX has it adopted by an implementation-defined system process, traditionally the top-level "init". This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands, but it knows nothing about adopted orphan processes or how to monitor and restart them.
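Adoption is easy to observe. This POSIX C sketch (the function name is hypothetical) forks a child, has the child fork a grandchild and exit immediately, and then asks the orphaned grandchild who its parent is now. Which process adopts it depends on the system: classically <code>init</code> (PID 1), but on Linux a "subreaper" such as a service manager or container init may step in instead.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Orphan a grandchild and report who adopted it. Stores the original
 * parent's pid in *original_parent and returns the pid the orphan sees
 * as its parent afterwards, or -1 on error. */
pid_t orphan_adopted_by(pid_t *original_parent) {
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        close(fds[0]);
        pid_t me = getpid();
        if (fork() == 0) {
            /* Grandchild: poll (up to ~5 s) until our parent has exited. */
            struct timespec tick = {0, 10 * 1000 * 1000};
            for (int i = 0; i < 500 && getppid() == me; i++)
                nanosleep(&tick, NULL);
            pid_t adopter = getppid();
            if (write(fds[1], &adopter, sizeof adopter) != (ssize_t)sizeof adopter)
                _exit(1);
            _exit(0);
        }
        _exit(0);                      /* exit at once, orphaning the grandchild */
    }

    close(fds[1]);
    waitpid(child, NULL, 0);           /* reap the middle process */
    *original_parent = child;

    pid_t adopter = -1;
    if (read(fds[0], &adopter, sizeof adopter) != (ssize_t)sizeof adopter)
        adopter = -1;
    close(fds[0]);
    return adopter;
}
</syntaxhighlight>

Whatever PID comes back, it is not the original parent, and the adopter has no idea what the orphan was supposed to be doing or when it should stop.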

The patch [https://github.com/erlang/otp/pull/9453 PR#9453] adapting port_close to SIGTERM is waiting for review and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : the softest "Goodbye!" that a program is free to interpret as it wishes

; <code>TERM</code> : has a clear intention of "kill this thing" but still possible to trap at the target and handle in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught, blocked or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}
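The practical difference between TERM and KILL can be checked from C. This POSIX sketch (helper names are hypothetical) tries to install a handler for a given signal and then restores the previous disposition; the kernel refuses for SIGKILL (and SIGSTOP) with <code>EINVAL</code>. It also shows a trapped SIGTERM reaching a handler instead of terminating the process, which is where a program would start its clean-up.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t last_signal = 0;

static void remember_signal(int sig) {
    last_signal = sig;   /* a real program would begin its clean-up here */
}

/* Returns 1 if a handler can be installed for sig, 0 if the kernel refuses.
 * The previous disposition is restored either way. */
int can_trap(int sig) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = remember_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, &old) < 0)
        return 0;        /* EINVAL for SIGKILL and SIGSTOP */
    sigaction(sig, &old, NULL);
    return 1;
}

/* Trap SIGTERM, send it to ourselves, and check that the handler ran
 * instead of the default action (terminating the process). */
int term_reaches_handler(void) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = remember_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, &old) < 0)
        return 0;
    last_signal = 0;
    raise(SIGTERM);      /* the handler runs before raise() returns */
    sigaction(SIGTERM, &old, NULL);
    return last_signal == SIGTERM;
}
</syntaxhighlight>

HUP and TERM both leave room for a program to decide what "goodbye" means, while KILL leaves the decision entirely to the kernel.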

== Future directions ==
Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign.

There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely, for example.

Another idea to borrow from the erlexec library is to have an option to kill the entire process group of a child, which is shared by any descendants that haven't explicitly broken out of its original group. This would be useful for managing deep trees of external processes launched by a forked command.
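A sketch of that idea in POSIX C, not taken from erlexec (the function name is hypothetical): two stand-in children are placed in a fresh process group led by the first one, and a single <code>kill</code> with a negative PID signals every member of the group at once. Calling <code>setpgid</code> from both parent and child is the usual way to avoid a race over which of them runs first.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start two children in a new process group, terminate the whole group
 * with one kill(), and return how many children died of SIGTERM
 * (or -1 if nothing could be started). */
int kill_group_demo(void) {
    pid_t kids[2];
    int started = 0;
    pid_t pgid = 0;                /* 0 means "start a new group" */

    for (int i = 0; i < 2; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            setpgid(0, pgid);      /* join (or create) the group ourselves... */
            pause();               /* stand-in for a long-running command */
            _exit(0);
        }
        setpgid(pid, pgid);        /* ...and from the parent, to avoid a race */
        if (pgid == 0)
            pgid = pid;            /* the first child leads the group */
        kids[started++] = pid;
    }
    if (started == 0)
        return -1;

    kill(-pgid, SIGTERM);          /* negative pid: every member of the group */

    int killed = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (waitpid(kids[i], &status, 0) == kids[i] &&
            WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
            killed++;
    }
    return killed;
}
</syntaxhighlight>

The function should return 2. A descendant that called <code>setpgid</code> or <code>setsid</code> on its own would escape this net, which matches the caveat about descendants breaking out of the original group.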

== References ==

Elixir/Ports and external process wiring

2025-10-20T10:08:04Z

Adamw: light c/e

A deceivingly simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress using rsync.

I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

{{Aside|text=<p>[[w:rsync|Rsync]] is the standard utility for file transfers, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage it can be trusted to handle any edge case.</p>
<p>BEAM is a fairly unique ecosystem in which it's not considered deviant to reinvent a rounder wheel: it's common to port external dependencies into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.</p>}}

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

=== Naive shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with how we pass the filenames. It would be possible to pass a dynamic path using string interpolation like <code>#{source}</code> but this is risky: consider what happens if the filenames include whitespace or even special shell characters such as ";".

=== Safe path handling ===
We turn next to <code>System.cmd</code>, which takes a raw argv and can't be fooled by special characters in the path arguments:<syntaxhighlight lang="elixir">
System.find_executable(rsync_path)
|> System.cmd(~w(-a) ++ [source, target])
</syntaxhighlight>For a short job this is perfect, but for longer transfers our program loses control and observability, waiting indefinitely for a monolithic command to return.

=== Asynchronous call and communication ===
To run an external process asynchronously we reach for Elixir's low-level <code>Port.open</code>, nothing but a one-line wrapper<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> which passes its parameters directly to ERTS <code>open_port</code><ref>[https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2 Erlang <code>open_port</code> docs]</ref>. This function is tremendously flexible; here we turn a few knobs:<syntaxhighlight lang="elixir">
Port.open(
  {:spawn_executable, rsync_path},
  [
    :binary,
    :exit_status,
    :hide,
    :use_stdio,
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)
</syntaxhighlight>

{{Aside|text=
'''Rsync progress reporting options'''

There are a variety of ways to report progress:

; <code>-v</code> : list each filename as it's transferred

; <code>--info=progress2</code> : report overall progress

; <code>--progress</code> : report statistics per file

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any format using parameters from rsyncd.conf's <code>log format</code><ref>https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf</ref>
}}

We've chosen <code>--info=progress2</code>, so the meaning of the reported percentage is "overall percent complete". Rsync outputs these progress lines in a fairly self-explanatory columnar format:<syntaxhighlight lang="text">
             percent complete      time remaining
   bytes transferred |  transfer speed    |
           |         |         |          |
       3,342,336    33%    3.14MB/s    0:00:02
</syntaxhighlight>

Our Port captures output and each line is sent to the library's <code>handle_info</code> callback as <code>{:data, line}</code>. After the transfer is finished we receive a conclusive <code>{:exit_status, status_code}</code> message.

As a first step, we extract the percent_done column and log any unrecognized output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>The <code>trim</code> is lifting more than its weight here: it lets us completely ignore spacing and newline trickery—and even a leading carriage return that we can see in the rsync source code,<ref>[https://github.com/RsyncProject/rsync/blob/797e17fc4a6f15e3b1756538a9f812b63942686f/progress.c#L129 rsync/progress.c] source code</ref>
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>Carriage return <code>\r</code> deserves a special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but its normal role is playing a control function because of how the terminal emulator responds to it. On a terminal the effect is to rewind the cursor so that the current line can be overwritten!
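For comparison, the same tolerant parsing can be sketched in C (the function name <code>parse_percent</code> is hypothetical, and this is not how the library does it). Because <code>isspace</code> counts <code>\r</code>, <code>\n</code> and runs of spaces alike, skipping whitespace between columns has the same effect as <code>trim: true</code>. Since rsync prints the percentage with <code>%3d</code>, an integer parse is enough here.

<syntaxhighlight lang="c">
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Extract the percent column from an rsync --info=progress2 line.
 * Leading "\r", repeated spaces and a trailing newline are all skipped as
 * whitespace. Returns 0 and stores the value on success, or -1 when the
 * second column isn't of the form "N%". */
int parse_percent(const char *line, int *percent) {
    const char *p = line;
    while (isspace((unsigned char)*p))
        p++;                                   /* leading \r and spaces */
    if (*p == '\0')
        return -1;
    while (*p != '\0' && !isspace((unsigned char)*p))
        p++;                                   /* skip bytes transferred */
    while (isspace((unsigned char)*p))
        p++;

    char *end;
    long value = strtol(p, &end, 10);
    if (end == p || *end != '%')
        return -1;                             /* unrecognized output */
    *percent = (int)value;
    return 0;
}
</syntaxhighlight>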

A repeated theme in inter-process communication is that data and control are leaky categories. We come to the more formal control side channels later.

{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character, <code>\r</code>, <code>0x0d</code> sometimes rendered as <code>^M</code>. Try this command in a terminal:<syntaxhighlight lang="shell">
echo "three^Mtwo"
</syntaxhighlight>
You'll have to use <control>-v <control>-m to type a literal carriage return; copy-and-paste won't work. Spoiler: the output should read "twoee".

The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.

[[File:Nilgais fighting, Lakeshwari, Gwalior district, India.jpg|left|200x200px]]
[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing.
}}

== OTP generic server ==
The Port API is convenient enough so far, but where Erlang/OTP really starts to shine is when we wrap each Port connection under a gen_server<ref>[https://www.erlang.org/doc/apps/stdlib/gen_server.html Erlang gen_server docs]</ref> module, giving us some properties for free: a dedicated lightweight BEAM process coordinates with its rsync independently of anything else. Input and output are asynchronous and buffered, but handled sequentially, one message at a time. It holds internal state, including the up-to-date completion percentage. And the caller can either request updates manually or listen for pushed statistics.

This gen_server should also be able to run under an [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree] but this is where the dream falls apart, for the moment. The Port can watch for rsync completion or failure and report this to its caller, but we fail at the second critical property of being able to shut down rsync if the calling code or our library module crashes.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
The unpleasant real-world consequence of this limitation is that rsync transfers would continue to run in the background even after Elixir had completely shut down, because the BEAM has no way of stopping the process.

It might be possible to send a signal using unix "kill", but BEAM only reveals the child's process ID on request (through <code>Port.info(port, :os_pid)</code>) and doesn't include any built-in function to send it a signal. Clearly we're expected to do this another way. Another problem with "kill" is that we want the external process to stop no matter how badly the BEAM is damaged, so we can't rely on stored data or on making a few last calls before crashing.

To eliminate variables and to understand whether the failure to stop was specific to rsync, I tried the same Port command spawning <code>sleep 60</code> instead, and I found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open. This happens to have been a lucky choice, as I learned later that "sleep" is also unusual but its behavior is much simpler to reason about.

== Bad assumption: pipe-like processes ==
A pipeline program like <code>gzip</code> or <code>cat</code> is built to read from its input and write to its output. These will stop once they detect that input has ended, because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { /* error... */ }
if (n_read == 0) { /* end of file... */ }
</syntaxhighlight>The manual for read<ref>[https://man.archlinux.org/man/read.2 libc <code>read</code> docs]</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>[https://man.archlinux.org/man/open.2.en#O_NONBLOCK O_NONBLOCK docs]</ref> and unix pipes<ref>[https://man.archlinux.org/man/pipe.7.en overview of unix pipes]</ref>.
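Here is that loop made runnable, as a POSIX C sketch in the spirit of cat (not copied from coreutils; the function name is hypothetical). It retries on <code>EINTR</code>, handles partial writes, and ends only when <code>read</code> returns 0, which for a pipe means every writer has closed its end. A stalled stream, by contrast, simply blocks inside <code>read</code> on a descriptor opened without <code>O_NONBLOCK</code>.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Copy everything from in to out and stop when read() returns 0.
 * Returns the number of bytes copied, or -1 on error. */
ssize_t copy_until_eof(int in, int out) {
    char buf[4096];
    ssize_t total = 0;
    for (;;) {
        ssize_t n_read = read(in, buf, sizeof buf);
        if (n_read == 0)
            return total;                  /* end of file: the writers are gone */
        if (n_read < 0) {
            if (errno == EINTR)
                continue;                  /* interrupted by a signal: retry */
            return -1;                     /* real error, e.g. EBADF */
        }
        for (ssize_t off = 0; off < n_read; ) {   /* write() may be partial */
            ssize_t n_written = write(out, buf + off, (size_t)(n_read - off));
            if (n_written < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += n_written;
        }
        total += n_read;
    }
}
</syntaxhighlight>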

But here we'll focus on how processes affect each other through pipes. Surprising answer: not very much! Try opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file. Oh no, you killed it! You didn't actually send anything, though: the <control>-d is interpreted by the terminal driver, which makes cat's pending read from "[[w:Standard streams|standard input]]" return zero bytes (exactly what a closed pipe would produce), and cat takes that as the end of its input and exits. Similarly, <control>-c is not sending a character: it is interpreted by the terminal driver and delivered as an interrupt signal (SIGINT) to the foreground process group, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>[https://wizardzines.com/comics/stty/ ★ wizard zines ★: stty]</ref> by Julia Evans. Go ahead and try this command, what could go wrong: <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions, and "end of file" (EOF) is more a convention than a hard reality. You could even reopen stdin from the application, to the great surprise of your friends and neighbors. For example, try running "watch ls" or "sleep 60" and press <control>-d all you want: no effect. Nobody cares about the end of input, because these programs weren't listening to stdin anyway.

Back to the problem at hand, "rsync" is in this latter category of "daemon-like" programs which will carry on even after standard input is closed. This makes sense enough, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
It's possible to write a small adapter which is sensitive to stdin closing, then converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>[https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes Elixir Port docs showing a shim script]</ref> for Elixir and the erlexec<ref>[https://hexdocs.pm/erlexec/readme.html <code>erlexec</code> library]</ref> library. The opposite adapter is also found in the [[w:nohup|nohup]] shell command and the grimsby<ref>[https://github.com/shortishly/grimsby <code>grimsby</code> library]</ref> library: these will keep standard in and/or standard out open for the child process even after the parent exits.

I took the shim approach with my rsync library and included a small C program<ref>[https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads rsync_ex C shim program]</ref> which wraps rsync and makes it sensitive to the BEAM port_close. It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (but should have been SIGTERM, see the sidebar discussion of different signals below).
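As a hypothetical sketch of the signal-translating flavor of shim (this is not the rsync_ex source; names are invented, and <code>pause()</code> stands in for the wrapped command), the POSIX C wrapper below traps SIGHUP and forwards SIGTERM to its child. SIGHUP stays blocked until <code>sigsuspend</code> atomically unblocks it, so a hangup can't slip in between checking the flag and going to sleep.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t hangup_seen = 0;

static void on_hangup(int sig) {
    (void)sig;
    hangup_seen = 1;
}

/* Wrapper body: start a worker, wait for SIGHUP, forward it as SIGTERM.
 * One byte goes to ready_fd once the handler is in place. */
static void forwarder_main(int ready_fd) {
    sigset_t hup, waitmask;
    sigemptyset(&hup);
    sigaddset(&hup, SIGHUP);
    sigprocmask(SIG_BLOCK, &hup, &waitmask);   /* keep HUP pending for now */
    sigdelset(&waitmask, SIGHUP);              /* the mask to use while waiting */

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_hangup;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGHUP, &sa, NULL);

    pid_t worker = fork();
    if (worker < 0)
        _exit(1);
    if (worker == 0) {
        close(ready_fd);
        signal(SIGHUP, SIG_DFL);
        sigprocmask(SIG_SETMASK, &waitmask, NULL);
        pause();                               /* stand-in for the wrapped command */
        _exit(0);
    }

    if (write(ready_fd, "r", 1) != 1)
        _exit(1);
    close(ready_fd);

    while (!hangup_seen)
        sigsuspend(&waitmask);                 /* unblock HUP and sleep, atomically */

    kill(worker, SIGTERM);
    int status;
    while (waitpid(worker, &status, 0) < 0)
        if (errno != EINTR)
            _exit(1);
    _exit(WIFSIGNALED(status) ? 128 + WTERMSIG(status) : 0);
}

/* Start the wrapper and wait until it is ready to receive SIGHUP.
 * Returns the wrapper's pid, or -1 on error. */
pid_t spawn_forwarder(void) {
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        forwarder_main(fds[1]);                /* never returns */
    }
    close(fds[1]);
    char c;
    ssize_t n = read(fds[0], &c, 1);
    close(fds[0]);
    return n == 1 ? pid : -1;
}
</syntaxhighlight>

Sending SIGHUP to the returned PID should make the wrapper exit with status 143 (128 + SIGTERM) once its child is gone.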

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities; they have earned their reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates emerging best practices that could be backported into the language itself. Everyone speaking on the problem has generally agreed that the fragile clean up of external processes is a bug, and supported the idea that some flavor of "terminate" signal should be sent to spawned programs.

I would be lying if I hid my disappointment that the required core changes are mostly in a C program and not actually in Erlang, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes, using the read, write, and select system calls<ref>[https://man.archlinux.org/man/select.2.en libc <code>select</code> docs]</ref>.

Port drivers<ref>[https://www.erlang.org/doc/system/ports.html Erlang ports docs]</ref> are fundamental to ERTS and external processes are launched through several levels of wiring: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children.

Letting a child process outlive the one that spawned it leaves it in a state called an "orphaned process", and POSIX has it adopted by an implementation-defined system process, traditionally the top-level "init". This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands, but it knows nothing about adopted orphan processes or how to monitor and restart them.

The patch [https://github.com/erlang/otp/pull/9453 PR#9453] adapting port_close to SIGTERM is waiting for review and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : the softest "Goodbye!" that a program is free to interpret as it wishes

; <code>TERM</code> : has a clear intention of "kill this thing" but still possible to trap at the target and handle in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught, blocked or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}

== Future directions ==
Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign.

There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely, for example.

Another idea to borrow from the erlexec library is to have an option to kill the entire process group of a child, which is shared by any descendants that haven't explicitly broken out of its original group. This would be useful for managing deep trees of external processes launched by a forked command.

== References ==

Elixir/Ports and external process wiring

2025-10-19T11:11:04Z

Adamw:

A deceivingly simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress, using rsync.

I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

{{Aside|text=[[w:rsync|Rsync]] is the best tool for file transfers, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage it can be trusted to handle all the edge cases.

BEAM is a fairly unique ecosystem in which the philosophy is to constantly reinvent a rounder wheel: it's common to port external dependencies into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.}}

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

=== Naive shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with how we pass the filenames. It's possible to have a dynamic path coming from string interpolation like <code>#{source}</code> but this gets risky: consider what happens if the filenames include whitespace or even special shell characters such as ";".

=== Safe path handling ===
We skip ahead to <code>System.cmd</code>, which takes a raw argv and can't be fooled by special characters in the path arguments:<syntaxhighlight lang="elixir">
System.find_executable(rsync_path)
|> System.cmd(~w(-a) ++ [source, target])
</syntaxhighlight>For a short job this would be fine, but during longer transfers our program loses control and we have to wait indefinitely for the monolithic command to finish.

=== Asynchronous call and communication ===
To run an external process asynchronously we will reach for Elixir's low-level <code>Port.open</code>, which passes all of its parameters directly<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref>. These functions are tremendously flexible; here we turn a few knobs:<syntaxhighlight lang="elixir">
Port.open(
  {:spawn_executable, rsync_path},
  [
    :binary,
    :exit_status,
    :hide,
    :use_stdio,
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)
</syntaxhighlight>

{{Aside|text=
rsync has a variety of progress options; we chose overall progress above, so the meaning of the percentage is "overall percent complete".

Here is the menu of alternatives:

; <code>--info=progress2</code> : report overall progress

; <code>--progress</code> : report statistics per file

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any format using parameters from rsyncd.conf's <code>log format</code><ref>https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf</ref>
}}

Rsync outputs progress lines in a fairly self-explanatory format:<syntaxhighlight lang="text">
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>

Our Port captures output and each line is sent to the library's <code>handle_info</code> callback as <code>{:data, line}</code>. After the transfer is finished we receive a conclusive <code>{:exit_status, status_code}</code> message.

As a first step, we extract the percent_done column and log any unrecognized output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>The <code>trim</code> is lifting more than its weight here: it lets us completely ignore spacing and newline trickery—and even a leading carriage return that we can see in the rsync source code,
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>Carriage return <code>\r</code> deserves a special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but its normal role is playing a control function because of how the terminal emulator responds to it. On a terminal the effect is to rewind the cursor and overwrite the current line!

A repeated theme in inter-process communication is that data and control are leaky categories. We come to the more formal control side channels later.

{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character, <code>\r</code>, <code>0x0d</code> sometimes rendered as <code>^M</code>. Try this command in a terminal:<syntaxhighlight lang="shell">
echo "one^Mtwo"
</syntaxhighlight>
You'll have to use <control>-v <control>-m to type a literal carriage return. Spoiler: the output should read "two" and nothing else.

The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.

[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing.

[[File:Nilgais fighting, Lakeshwari, Gwalior district, India.jpg|left|200x200px]]
}}
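To make the spoiler concrete, here is a toy model in C (the function name is hypothetical) of how a terminal paints a single line: ordinary bytes overwrite whatever sits under the cursor, while <code>\r</code> only moves the cursor back to the first column without erasing anything. Real terminal emulators handle far more, such as line feeds, tabs and escape sequences.

<syntaxhighlight lang="c">
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Render one line of terminal output into out (a C string of at most
 * out_size - 1 visible characters). '\r' rewinds the cursor to column 0;
 * every other byte is drawn at the cursor, which then advances. */
void render_line(const char *in, char *out, size_t out_size) {
    size_t col = 0, len = 0;
    if (out_size == 0)
        return;
    for (; *in != '\0'; in++) {
        if (*in == '\r') {
            col = 0;                    /* carriage return: rewind, keep text */
        } else if (col + 1 < out_size) {
            out[col++] = *in;           /* overwrite whatever is under the cursor */
            if (col > len)
                len = col;
        }
    }
    out[len] = '\0';
}
</syntaxhighlight>

Feeding it <code>"three\rtwo"</code> leaves "twoee" on screen, while <code>"one\rtwo"</code> leaves just "two", because the second word is long enough to cover the first.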

== OTP generic server ==
The Port API is convenient enough so far, but where Erlang/OTP really starts to shine is when we wrap each Port connection under a gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> module, giving us some properties for free: a dedicated lightweight BEAM process coordinates with its rsync independently of anything else. Input and output are asynchronous and buffered, but handled sequentially, one message at a time. It holds internal state, including the up-to-date completion percentage. And the caller can either request updates manually or listen for pushed statistics.

This gen_server should also be able to run under an [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree] but this is where the dream falls apart, for the moment. The Port can watch for rsync completion or failure and report this to its caller, but we fail at the second critical property of being able to shut down rsync if the calling code or our library module crashes.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
The unpleasant real-world consequence of this limitation is that rsync transfers would continue to run in the background even after Elixir had completely shut down, because the BEAM has no way of stopping the process.

It might be possible to send a signal using unix "kill", but BEAM only reveals the child's process ID on request (through <code>Port.info(port, :os_pid)</code>) and doesn't include any built-in function to send it a signal. Clearly we're expected to do this another way. Another problem with "kill" is that we want the external process to stop no matter how badly the BEAM is damaged, so we can't rely on stored data or on making a few last calls before crashing.

To eliminate variables and to understand whether the failure to stop was specific to rsync, I tried the same Port command spawning <code>sleep 60</code> instead, and I found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open. This happens to have been a lucky choice, as I learned later that "sleep" is also unusual but its behavior is much simpler to reason about.

== Bad assumption: pipe-like processes ==
A pipeline program like <code>gzip</code> or <code>cat</code> is built to read from its input and write to its output. These will stop once they detect that input has ended, because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { /* error... */ }
if (n_read == 0) { /* end of file... */ }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>https://man.archlinux.org/man/open.2.en#O_NONBLOCK</ref> and unix pipes<ref>https://man.archlinux.org/man/pipe.7.en</ref>.

But here we'll focus on how processes affect each other through pipes. Surprising answer: not very much! Try opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file. Oh no, you killed it! You didn't actually send anything, though: the <control>-d is interpreted by the terminal driver, which makes cat's pending read from "[[w:Standard streams|standard input]]" return zero bytes (exactly what a closed pipe would produce), and cat takes that as the end of its input and exits. Similarly, <control>-c is not sending a character: it is interpreted by the terminal driver and delivered as an interrupt signal (SIGINT) to the foreground process group, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>https://wizardzines.com/comics/stty/</ref> by Julia Evans. Go ahead and try this command, what could go wrong: <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions, and "end of file" (EOF) is more a convention than a hard reality. You could even reopen stdin from the application, to the great surprise of your friends and neighbors. For example, try running "watch ls" or "sleep 60" and press <control>-d all you want: no effect. Nobody cares about the end of input, because these programs weren't listening to stdin anyway.

Back to the problem at hand, "rsync" is in this latter category of "daemon-like" programs which will carry on even after standard input is closed. This makes sense enough, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
It's possible to write a small adapter which is sensitive to stdin closing, then converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes</ref> for Elixir and the erlexec<ref>[https://hexdocs.pm/erlexec/readme.html https://hexdocs.pm/erlexec/]</ref> library. The opposite adapter is also found in the [[w:nohup|nohup]] shell command and the grimsby<ref>https://github.com/shortishly/grimsby</ref> library: these will keep standard in and/or standard out open for the child process even after the parent exits.

I took the shim approach with my rsync library and included a small C program<ref>https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads</ref> which wraps rsync and makes it sensitive to the BEAM port_close. It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (but should have been SIGTERM, see the sidebar discussion of different signals below).

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities; they have earned their reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates emerging best practices which could be backported into the language itself. Everyone speaking on the problem has generally agreed that the fragile clean up of external processes is a bug, and supported the idea that some flavor of "terminate" signal should be sent to spawned programs.

I would be lying if I hid my disappointment that the required core changes are mostly in a C program and not actually in Erlang, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes, using the plain system calls read, write, and select<ref>https://man.archlinux.org/man/select.2.en</ref>.

Port drivers<ref>https://www.erlang.org/doc/system/ports.html</ref> are fundamental to ERTS and external processes are launched through several levels of wiring: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children.

Letting a child process outlive the one that spawned it leaves it in a state called an "orphaned process". POSIX specifies that orphans are adopted by an implementation-defined system process, traditionally the top-level "init". This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands, but it knows nothing about adopted orphan processes or how to monitor and restart them.
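On Linux, the adopter is init or the nearest ancestor that has marked itself as a "child subreaper" with <code>prctl(PR_SET_CHILD_SUBREAPER)</code>. The C experiment below lets a middle process fork a grandchild and exit at once, then asks the grandchild who its parent is now; the function <code>adopted_parent</code> is made up for illustration.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical experiment: a "middle" process forks a grandchild and exits
   immediately, orphaning it. After the middle process has been reaped, the
   grandchild reports getppid(). Returns that new parent pid (or -1 on
   failure) and stores the original parent's pid in *middle_out. */
pid_t adopted_parent(pid_t *middle_out)
{
    int go[2], report[2];
    if (pipe(go) != 0)
        return -1;
    if (pipe(report) != 0) {
        close(go[0]);
        close(go[1]);
        return -1;
    }

    pid_t middle = fork();
    if (middle == 0) {
        if (fork() == 0) {
            /* Grandchild: wait until told that the middle process is gone. */
            char c;
            close(go[1]);
            close(report[0]);
            if (read(go[0], &c, 1) == 1) {
                pid_t ppid = getppid();
                write(report[1], &ppid, sizeof ppid);
            }
            _exit(0);
        }
        _exit(0);   /* the middle process exits at once, orphaning the grandchild */
    }

    close(go[0]);
    close(report[1]);
    pid_t new_parent = -1;
    if (middle > 0) {
        waitpid(middle, NULL, 0);   /* once reaped, the orphan has been re-parented */
        write(go[1], "x", 1);       /* let the grandchild look up its parent */
        if (read(report[0], &new_parent, sizeof new_parent) != (ssize_t)sizeof new_parent)
            new_parent = -1;
    }
    close(go[1]);
    close(report[0]);
    *middle_out = middle;
    return new_parent;
}
</syntaxhighlight>

The returned pid should never be the middle process, which no longer exists; which process it is depends on the machine, for example 1 on a plain system, or a service manager acting as subreaper.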

The patch [https://github.com/erlang/otp/pull/9453 PR#9453] adapting port_close to SIGTERM is waiting for review and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : the softest "Goodbye!" that a program is free to interpret as it wishes

; <code>TERM</code> : has a clear intention of "kill this thing" but still possible to trap at the target and handle in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be stopped and you may not clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}
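The practical difference between these signals is whether the target can intercept them. Here is a minimal C sketch, assuming a single-threaded program: the made-up helper <code>try_trap</code> installs a handler with <code>sigaction</code>, which works for HUP and TERM but is refused for KILL, since POSIX does not allow KILL to be caught or ignored.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is what may safely be written there. */
static volatile sig_atomic_t last_signal = 0;

static void remember_signal(int sig)
{
    last_signal = sig;   /* the main loop would notice this and clean up */
}

/* Hypothetical helper: try to install remember_signal() for sig.
   HUP and TERM can be trapped; sigaction() rejects KILL with EINVAL. */
bool try_trap(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = remember_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL) == 0;
}
</syntaxhighlight>

The handler only records the signal: very little is safe to call inside a signal handler, so a program that wants to shut down gracefully on TERM would do its actual clean up back in its main loop.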

== Future directions ==
Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign.

There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely, for example.

Another idea to borrow from the erlexec library is to have an option to kill the entire process group of a child, which is shared by any descendants that haven't explicitly broken out of its original group. This would be useful for managing deep trees of external processes launched by a forked command.
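A sketch of the process group idea in C, not erlexec's actual implementation: the hypothetical <code>spawn_group_leader</code> starts a command as the leader of a new process group with <code>setpgid</code>, and <code>terminate_group</code> passes a negative pid to <code>kill</code>, which POSIX defines as signalling every member of that group. <code>setpgid</code> is called in both parent and child so the group exists no matter which side runs first.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv as the leader of a brand-new process
   group, with its stdout sent to out_fd. Descendants inherit the group
   unless they move themselves out with setpgid() or setsid(). */
pid_t spawn_group_leader(char *const argv[], int out_fd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);                 /* group id = our own pid */
        if (out_fd != STDOUT_FILENO) {
            dup2(out_fd, STDOUT_FILENO);
            close(out_fd);
        }
        execvp(argv[0], argv);
        _exit(127);
    }
    setpgid(pid, pid);                 /* repeated here to avoid a race */
    return pid;
}

/* Send SIGTERM to every process in the leader's group, then reap the
   leader. Returns its raw wait status, or -1 on failure. */
int terminate_group(pid_t leader)
{
    int status;
    if (kill(-leader, SIGTERM) != 0)   /* negative pid: the whole group */
        return -1;
    while (waitpid(leader, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return status;
}
</syntaxhighlight>

For example, if the command is <code>sh -c 'sleep 60 & wait'</code>, signalling only the shell's pid would leave the background sleep running, while signalling the group stops both.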

== References ==

Elixir/Ports and external process wiring

2025-10-19T10:52:42Z

Adamw:

A deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress, using rsync.

{{Aside|text=[[w:rsync|Rsync]] is the best tool for file transfers, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage it can be trusted to handle all the edge cases.
<br>
BEAM is a fairly unique ecosystem in which the philosophy is to constantly reinvent a rounder wheel: it's common to port external dependencies into native Erlang, but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.}}

I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

=== Naive shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with how we pass the filenames. It's possible to have a dynamic path coming from string interpolation like <code>#{source}</code> but this gets risky: consider what happens if the filenames include whitespace or even special shell characters such as ";".

=== Safe path handling ===
Skipping ahead to <code>System.cmd</code>, which takes a raw argv and can't be fooled by special characters in the path arguments:<syntaxhighlight lang="elixir">
System.find_executable("rsync")
|> System.cmd(["-a", source, target])
</syntaxhighlight>For a short job this would be fine, but during longer transfers our program loses control and we have to wait indefinitely for the monolithic command to finish.
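The same distinction exists one level down. This C sketch hands an argv array straight to <code>posix_spawnp</code>, so no shell ever parses the arguments and a filename containing ";" stays one literal argument; the helper <code>run_capture</code> is hypothetical.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up in PATH) with argv passed
   through untouched, no shell involved, and capture up to cap bytes of
   its stdout. Returns the number of bytes captured, or -1 on failure. */
ssize_t run_capture(char *const argv[], char *buf, size_t cap)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    posix_spawn_file_actions_t actions;
    posix_spawn_file_actions_init(&actions);
    posix_spawn_file_actions_adddup2(&actions, fds[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&actions, fds[0]);
    posix_spawn_file_actions_addclose(&actions, fds[1]);

    pid_t pid;
    int rc = posix_spawnp(&pid, argv[0], &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    close(fds[1]);                 /* only the child keeps a write end */
    if (rc != 0) {
        close(fds[0]);
        return -1;
    }

    size_t used = 0;
    ssize_t n;
    while (used < cap && (n = read(fds[0], buf + used, cap - used)) > 0)
        used += (size_t)n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)used;
}
</syntaxhighlight>

For example, running <code>printf %s</code> on the argument <code>x; echo injected</code> this way prints that text verbatim, whereas pasting it into a <code>system()</code> command line would run a second command.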

=== Asynchronous call and communication ===
To run an external process asynchronously we will reach for Elixir's low-level <code>Port.open</code> which passes all of its parameters directly<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref>. These functions are tremendously flexible; here we turn a few knobs:<syntaxhighlight lang="elixir">
Port.open(
  {:spawn_executable, rsync_path},
  [
    :binary,
    :exit_status,
    :hide,
    :use_stdio,
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)
</syntaxhighlight>
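To picture what some of these knobs ask for at the operating system level, here is a rough C analogue, not how ERTS actually does it. <code>:use_stdio</code> means the port talks to the child over its standard input and output (this sketch only wires up the output side), <code>:stderr_to_stdout</code> points stderr at that same pipe, and <code>:exit_status</code> corresponds to reporting the wait status at the end. The function <code>run_merged</code> is made up for this example.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv with stdout AND stderr on one pipe,
   collect up to cap bytes of output into buf (length in *len), then
   return the child's raw wait status, or -1 on failure. */
int run_merged(char *const argv[], char *buf, size_t cap, size_t *len)
{
    int fds[2];
    *len = 0;
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        dup2(fds[1], STDOUT_FILENO);   /* like the output half of :use_stdio */
        dup2(fds[1], STDERR_FILENO);   /* like :stderr_to_stdout */
        if (fds[0] > STDERR_FILENO)
            close(fds[0]);
        if (fds[1] > STDERR_FILENO)
            close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);
    }

    close(fds[1]);
    ssize_t n;
    while (*len < cap && (n = read(fds[0], buf + *len, cap - *len)) > 0)
        *len += (size_t)n;
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;                     /* like the {:exit_status, code} message */
}
</syntaxhighlight>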

{{Aside|text=
rsync has a variety of progress options; we chose overall progress above, so the percentage means "overall percent complete".

Here is the menu of alternatives:

; <code>--info=progress2</code> : report overall progress

; <code>--progress</code> : report statistics per file

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any format using parameters from rsyncd.conf's <code>log format</code><ref>https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf</ref>
}}

Rsync outputs progress lines in a fairly self-explanatory format:<syntaxhighlight lang="text">
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>

Our Port captures output, and each chunk of it, normally a single progress line, is sent to the library's <code>handle_info</code> callback as <code>{:data, line}</code>. After the transfer is finished we receive a conclusive <code>{:exit_status, status_code}</code> message.

As a first step, we extract the percent_done column and log any unrecognized output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>The <code>trim</code> is lifting more than its weight here: it lets us completely ignore spacing and newline trickery—and even a leading carriage return that we can see in the rsync source code,
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>Carriage return <code>\r</code> deserves a special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but its normal role is playing a control function because of how the terminal emulator responds to it. On a terminal the effect is to rewind the cursor and overwrite the current line!
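Because the port delivers raw chunks of bytes rather than lines (there is no <code>:line</code> option in the call above), one message may in principle carry several <code>\r</code>-separated updates. Here is a C sketch of a parser that copes with that, assuming the <code>%15s %3d%%</code> layout from the <code>rprintf</code> call: the hypothetical <code>last_percent</code> splits on <code>\r</code> and <code>\n</code> and keeps the percentage from the last update it recognizes.

<syntaxhighlight lang="c">
#include <ctype.h>
#include <stddef.h>

/* Parse one update laid out like "%15s %3d%% ...", e.g.
   "      3,342,336  33%    3.14MB/s    0:00:02". Returns -1 if the
   second column is not a percentage. */
static int percent_of(const char *line, size_t len)
{
    size_t i = 0;
    while (i < len && isspace((unsigned char)line[i])) i++;   /* leading blanks */
    while (i < len && !isspace((unsigned char)line[i])) i++;  /* byte count */
    while (i < len && isspace((unsigned char)line[i])) i++;   /* separator */

    int pct = 0, digits = 0;
    while (i < len && isdigit((unsigned char)line[i]) && digits < 3) {
        pct = pct * 10 + (line[i] - '0');
        i++;
        digits++;
    }
    if (digits == 0 || i >= len || line[i] != '%')
        return -1;
    return pct;
}

/* A chunk may hold several updates separated by '\r' (or '\n').
   Returns the percentage of the last recognizable update, or -1. */
int last_percent(const char *chunk)
{
    int last = -1;
    const char *start = chunk;
    for (const char *p = chunk;; p++) {
        if (*p == '\r' || *p == '\n' || *p == '\0') {
            int pct = percent_of(start, (size_t)(p - start));
            if (pct >= 0)
                last = pct;
            if (*p == '\0')
                break;
            start = p + 1;
        }
    }
    return last;
}
</syntaxhighlight>

Anything that isn't a progress update, such as a file name, yields -1, playing the role of the <code>{:unknown, line}</code> branch in the Elixir version.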

A repeated theme in inter-process communication is that data and control are leaky categories. We come to the more formal control side channels later.

{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by emitting a [[w:Carriage return|carriage return]] control character, <code>\r</code>, <code>0x0d</code> sometimes rendered as <code>^M</code>. The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.

[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing.

[[File:Nilgais fighting, Lakeshwari, Gwalior district, India.jpg|left|200x200px]]
}}

== OTP generic server ==
This is where Erlang/OTP really starts to shine: our rsync library wraps the Port calls under a gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> module and this gives us some special properties for free: a dedicated BEAM process (lightweight, not an OS thread) which coordinates with rsync independently from anything else, receiving and sending asynchronous messages. It has an internal state including the latest percent done, and this can be probed by calling code, or it can be set up to push updates to a listener.

A gen_server should be able to run under an [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree] as well, but our module has a major flaw: although it can correctly detect and report when rsync crashes or completes, when our gen_server is stopped by its supervisor it cannot stop its external child process in turn.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
What this means is that rsync transfers would continue to run in the background even after Elixir had completely shut down, because the BEAM has no way of stopping the process.

To check whether this was something specific to rsync, I tried to open a Port spawning the command <code>sleep 60</code> and I found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open.

== Bad assumption: pipe-like processes ==
A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { /* error: report it and give up */ }
if (n_read == 0) { /* end of file: stop copying */ }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>https://man.archlinux.org/man/open.2.en#O_NONBLOCK</ref> and unix pipes<ref>https://man.archlinux.org/man/pipe.7.en</ref>.
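Here is that loop fleshed out into a complete, if minimal, C function, written for illustration rather than copied from any real <code>cat</code>. It also answers the stalled-versus-ended question for the default blocking case: while a pipe is empty but still has a writer, <code>read</code> simply waits, and it only returns 0 once every write end has been closed.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* A cat-style main loop: copy in_fd to out_fd until read() returns 0.
   Returns the number of bytes copied, or -1 on error. */
long copy_until_eof(int in_fd, int out_fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n_read = read(in_fd, buf, sizeof buf);
        if (n_read < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: just retry */
            return -1;          /* e.g. EBADF for a descriptor that isn't open */
        }
        if (n_read == 0)
            return total;       /* end of file: for a pipe, no writers remain */

        /* write() may be partial, so loop until the chunk is flushed */
        for (ssize_t off = 0; off < n_read; ) {
            ssize_t n_written = write(out_fd, buf + off, (size_t)(n_read - off));
            if (n_written < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += n_written;
        }
        total += n_read;
    }
}
</syntaxhighlight>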

But here we'll focus on how processes affect each other through pipes. Surprising answer: not very much! Try opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file. Oh no, you killed it! You didn't actually send anything, though. The <control>-d is interpreted by the terminal driver (not by bash), which makes cat's pending <code>read</code> from "[[w:Standard streams|standard input]]" return zero bytes; cat takes that as end-of-file and exits on its own. Similarly, <control>-c is not a character that reaches the program: the terminal driver turns it into an interrupt signal (SIGINT) for the foreground process group, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>https://wizardzines.com/comics/stty/</ref> by Julia Evans. Go ahead and try this command, what could go wrong: <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions, and "end of file" (EOF) is more a convention than a hard reality. You could even reopen stdin from the application, to the great surprise of your friends and neighbors. For example, try opening "watch ls" or "sleep 60" and press <control>-d all you want: no effect. Whatever happens to their standard input, nobody cares; they weren't listening to you anyway.

Back to the problem at hand, "rsync" is in this latter category of "daemon-like" programs which will carry on even after standard input is closed. This makes sense enough, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
It's possible to write a small adapter which is sensitive to stdin closing, then converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes</ref> for Elixir and the erlexec<ref>[https://hexdocs.pm/erlexec/readme.html https://hexdocs.pm/erlexec/]</ref> library. The opposite adapter is also found in the [[w:nohup|nohup]] shell command and the grimsby<ref>https://github.com/shortishly/grimsby</ref> library: these will keep standard in and/or standard out open for the child process even after the parent exits.

I took the shim approach with my rsync library and included a small C program<ref>https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads</ref> which wraps rsync and makes it sensitive to the BEAM port_close. It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (but should have been SIGTERM, see the sidebar discussion of different signals below).

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities; they have earned their reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates emerging best practices which could be backported into the language itself. Everyone speaking on the problem has generally agreed that the fragile clean up of external processes is a bug, and supported the idea that some flavor of "terminate" signal should be sent to spawned programs.

I would be lying if I hid my disappointment that the required core changes are mostly in a C program and not actually in Erlang, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes, using the plain system calls read, write, and select<ref>https://man.archlinux.org/man/select.2.en</ref>.

Port drivers<ref>https://www.erlang.org/doc/system/ports.html</ref> are fundamental to ERTS and external processes are launched through several levels of wiring: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children.

Letting a child process outlive the one that spawned it leaves it in a state called an "orphaned process". POSIX specifies that orphans are adopted by an implementation-defined system process, traditionally the top-level "init". This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands, but it knows nothing about adopted orphan processes or how to monitor and restart them.

The patch [https://github.com/erlang/otp/pull/9453 PR#9453] adapting port_close to SIGTERM is waiting for review and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : the softest "Goodbye!" that a program is free to interpret as it wishes

; <code>TERM</code> : has a clear intention of "kill this thing" but still possible to trap at the target and handle in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be stopped and you may not clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}

== Future directions ==
Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign.

There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely, for example.

Another idea to borrow from the erlexec library is to have an option to kill the entire process group of a child, which is shared by any descendants that haven't explicitly broken out of its original group. This would be useful for managing deep trees of external processes launched by a forked command.

== References ==

Elixir/Ports and external process wiring

2025-10-19T10:44:34Z

Adamw: more aside

A deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress, using rsync.

{{Aside|text=[[w:rsync|Rsync]] is the best tool for file transfers, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage it can be trusted to handle all the edge cases.
<br>
BEAM is a fairly unique ecosystem in which the philosophy is to constantly reinvent a rounder wheel: it's common to port external dependencies into native Erlang, but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.}}

I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

=== Naive shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with how we pass the filenames. It's possible to have a dynamic path coming from string interpolation like <code>#{source}</code> but this gets risky: consider what happens if the filenames include whitespace or even special shell characters such as ";".

=== Safe path handling ===
Skipping ahead to <code>System.cmd</code>, which takes a raw argv and can't be fooled by special characters in the path arguments:<syntaxhighlight lang="elixir">
System.find_executable("rsync")
|> System.cmd(["-a", source, target])
</syntaxhighlight>For a short job this would be fine, but during longer transfers our program loses control and we have to wait indefinitely for the monolithic command to finish.

=== Asynchronous call and communication ===
To run an external process asynchronously we will reach for Elixir's low-level <code>Port.open</code> which passes all of its parameters directly<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref>. These functions are tremendously flexible; here we turn a few knobs:<syntaxhighlight lang="elixir">
Port.open(
  {:spawn_executable, rsync_path},
  [
    :binary,
    :exit_status,
    :hide,
    :use_stdio,
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)
</syntaxhighlight>

Progress lines come in with a fairly self-explanatory format:
<syntaxhighlight lang="text">
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>

{{Aside|text=
rsync has a variety of progress options; we chose overall progress above, so the percentage means "overall percent complete".

Here is the menu of alternatives:

; <code>--info=progress2</code> : report overall progress

; <code>--progress</code> : report statistics per file

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any format using parameters from rsyncd.conf's <code>log format</code><ref>https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf</ref>
}}

Each chunk of rsync output, normally a single progress line, is sent to the library's <code>handle_info</code> callback as <code>{:data, line}</code>, and after the transfer is finished we receive a conclusive <code>{:exit_status, status_code}</code>.

We extract the percent_done column and strictly reject any other output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>The <code>trim</code> lets us ignore spacing and newline trickery—or even a leading carriage return as you can see in the rsync source code,
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>The carriage return <code>\r</code> deserves a special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but it plays a control function because of how the tty interprets it. On the terminal the effect is to overwrite the current line!

A repeated theme is that data and control are leaky categories. We come to the more formal control side channels later.
{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by emitting a [[w:Carriage return|carriage return]] control character, <code>\r</code>, <code>0x0d</code> sometimes rendered as <code>^M</code>. The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.

[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. newline has caused eye-rolling since the dawn of personal computing.

[[File:Nilgais fighting, Lakeshwari, Gwalior district, India.jpg|left|200x200px]]
}}

== OTP generic server ==
This is where Erlang/OTP really starts to shine: our rsync library wraps the Port calls under a gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> module and this gives us some special properties for free: a dedicated BEAM process (lightweight, not an OS thread) which coordinates with rsync independently from anything else, receiving and sending asynchronous messages. It has an internal state including the latest percent done, and this can be probed by calling code, or it can be set up to push updates to a listener.

A gen_server should be able to run under an [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree] as well, but our module has a major flaw: although it can correctly detect and report when rsync crashes or completes, when our gen_server is stopped by its supervisor it cannot stop its external child process in turn.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
What this means is that rsync transfers would continue to run in the background even after Elixir had completely shut down, because the BEAM has no way of stopping the process.

To check whether this was something specific to rsync, I tried to open a Port spawning the command <code>sleep 60</code> and I found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open.

== Bad assumption: pipe-like processes ==
A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { /* error: report it and give up */ }
if (n_read == 0) { /* end of file: stop copying */ }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>https://man.archlinux.org/man/open.2.en#O_NONBLOCK</ref> and unix pipes<ref>https://man.archlinux.org/man/pipe.7.en</ref>.

But here we'll focus on how processes affect each other through pipes. Surprising answer: not very much! Try opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file. Oh no, you killed it! You didn't actually send anything, though. The <control>-d is interpreted by the terminal driver (not by bash), which makes cat's pending <code>read</code> from "[[w:Standard streams|standard input]]" return zero bytes; cat takes that as end-of-file and exits on its own. Similarly, <control>-c is not a character that reaches the program: the terminal driver turns it into an interrupt signal (SIGINT) for the foreground process group, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>https://wizardzines.com/comics/stty/</ref> by Julia Evans. Go ahead and try this command, what could go wrong: <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions, and "end of file" (EOF) is more a convention than a hard reality. You could even reopen stdin from the application, to the great surprise of your friends and neighbors. For example, try opening "watch ls" or "sleep 60" and press <control>-d all you want: no effect. Whatever happens to their standard input, nobody cares; they weren't listening to you anyway.

Back to the problem at hand, "rsync" is in this latter category of "daemon-like" programs which will carry on even after standard input is closed. This makes sense enough, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
It's possible to write a small adapter which is sensitive to stdin closing, then converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes</ref> for Elixir and the erlexec<ref>[https://hexdocs.pm/erlexec/readme.html https://hexdocs.pm/erlexec/]</ref> library. The opposite adapter is also found in the [[w:nohup|nohup]] shell command and the grimsby<ref>https://github.com/shortishly/grimsby</ref> library: these will keep standard in and/or standard out open for the child process even after the parent exits.

I took the shim approach with my rsync library and included a small C program<ref>https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads</ref> which wraps rsync and makes it sensitive to the BEAM port_close. It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (but should have been SIGTERM, see the sidebar discussion of different signals below).

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities; they have earned their reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates emerging best practices which could be backported into the language itself. Everyone speaking on the problem has generally agreed that the fragile clean up of external processes is a bug, and supported the idea that some flavor of "terminate" signal should be sent to spawned programs.

I would be lying if I hid my disappointment that the required core changes are mostly in a C program and not actually in Erlang, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes, using the plain system calls read, write, and select<ref>https://man.archlinux.org/man/select.2.en</ref>.

Port drivers<ref>https://www.erlang.org/doc/system/ports.html</ref> are fundamental to ERTS and external processes are launched through several levels of wiring: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children.

Letting a child process outlive the one that spawned it leaves it in a state called an "orphaned process". POSIX specifies that orphans are adopted by an implementation-defined system process, traditionally the top-level "init". This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands, but it knows nothing about adopted orphan processes or how to monitor and restart them.

The patch [https://github.com/erlang/otp/pull/9453 PR#9453] adapting port_close to SIGTERM is waiting for review and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : the softest "Goodbye!" that a program is free to interpret as it wishes

; <code>TERM</code> : has a clear intention of "kill this thing", but the target can still trap it and handle it in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught, blocked, or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}
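To make the difference between the signals concrete, here is a small C sketch, not taken from any of the projects discussed here, in which a process installs a handler for HUP and TERM and then tries, and fails, to do the same for KILL. The helper names <code>on_term</code> and <code>install_handler</code> are hypothetical.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Set from the handler; sig_atomic_t is the type that is safe to write there. */
static volatile sig_atomic_t got_term = 0;

/* A trappable signal lets the target run code of its own, e.g. to clean up. */
static void on_term(int signo) {
    (void)signo;
    got_term = 1;
}

/* Hypothetical helper: install on_term for signo.
 * Returns 0 on success, otherwise the errno reported by sigaction. */
static int install_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) != 0) return errno;
    return 0;
}
</syntaxhighlight>

The same <code>sigaction</code> call succeeds for HUP and TERM but fails with <code>EINVAL</code> for KILL, which is why a program receiving KILL never gets to run any clean up code.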

== Future directions ==
Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign.

There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely, for example.

Another idea to borrow from the erlexec library is an option to kill the entire process group of a child. The group is shared by all of the child's descendants unless they explicitly move to a new group, so this would be useful for managing deep trees of external processes launched by a forked command.
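A minimal C sketch of the process-group idea, assuming a POSIX system with <code>/bin/sh</code>: the child becomes the leader of a new group, and passing a negative pid to <code>kill</code> signals every process still in that group. This only illustrates the underlying system calls; it is not how erlexec or ERTS implement the option, and both helper names are hypothetical.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `sh -c script` as the leader of a new process group.
 * Commands the script starts inherit the group unless they leave it on purpose
 * (setsid or setpgid). */
static pid_t spawn_group_leader(const char *script) {
    pid_t pid = fork();
    if (pid == 0) {
        setpgid(0, 0);                           /* child: new group, id == own pid */
        execl("/bin/sh", "sh", "-c", script, (char *)NULL);
        _exit(127);                              /* exec failed */
    }
    if (pid > 0) setpgid(pid, pid);              /* parent repeats it to avoid a race */
    return pid;
}

/* A negative pid makes kill() target every process in that group. */
static int kill_group(pid_t leader, int signo) {
    return kill(-leader, signo);
}
</syntaxhighlight>

Calling <code>setpgid</code> in both parent and child is a common idiom: whichever runs first creates the group, so the group exists before the parent tries to signal it.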

== References ==

Elixir/Ports and external process wiring

2025-10-19T10:39:13Z

Adamw: c/e first page

A deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress, using rsync.

{{Aside|text=[[w:rsync|Rsync]] is usually the best tool for file transfer, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and it's complex enough that nobody is reimplementing it in pure Erlang any time soon.}}

I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

=== Naive shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with how we pass the filenames. String interpolation such as <code>"#{source}"</code> can make the path dynamic, but it is risky: consider what happens if a filename includes whitespace or special shell characters such as ";".

=== Safe path handling ===
Skipping ahead to <code>System.cmd</code>, which takes a raw argv and can't be fooled by special characters in the path arguments:<syntaxhighlight lang="elixir">
# Each list element reaches rsync as exactly one argument; no shell is involved.
System.find_executable(rsync_path)
|> System.cmd(["-a", source, target])
</syntaxhighlight>For a short job this would be fine, but during longer transfers our program loses control and we have to wait indefinitely for the monolithic command to finish.
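The difference between a shell command line and a raw argv can be sketched in C, one layer underneath. The hypothetical helper <code>run_argv</code> below uses <code>posix_spawnp</code> to start a program directly, with no shell in between, and captures what it prints: a semicolon inside an argument stays an ordinary byte. This is only an analogy for what <code>System.cmd</code> offers, not a description of how Elixir implements it.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up in PATH) with exactly the given
 * arguments, without a shell, and capture its stdout into out (NUL-terminated).
 * Returns the exit status, or -1 on failure. */
static int run_argv(char *const argv[], char *out, size_t cap) {
    int fds[2];
    if (cap == 0 || pipe(fds) != 0) return -1;

    posix_spawn_file_actions_t actions;
    posix_spawn_file_actions_init(&actions);
    posix_spawn_file_actions_adddup2(&actions, fds[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&actions, fds[0]);
    posix_spawn_file_actions_addclose(&actions, fds[1]);

    pid_t pid;
    int rc = posix_spawnp(&pid, argv[0], &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    close(fds[1]);                      /* keep only the read end in the parent */
    if (rc != 0) { close(fds[0]); return -1; }

    size_t len = 0;
    char scratch[256];
    for (;;) {
        /* Fill out[] first; once it is full, keep draining so the child never blocks. */
        int full = len + 1 >= cap;
        char *dst = full ? scratch : out + len;
        size_t room = full ? sizeof scratch : cap - 1 - len;
        ssize_t n = read(fds[0], dst, room);
        if (n <= 0) break;
        if (!full) len += (size_t)n;
    }
    out[len] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
</syntaxhighlight>

With an argument such as <code>"notes; rm -rf ~"</code>, <code>echo</code> prints the semicolon and tilde literally; handing the same text to <code>sh -c</code> would instead run a second command.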

=== Asynchronous call and communication ===
To run an external process asynchronously we will reach for Elixir's low-level <code>Port.open</code>, which passes all of its parameters directly<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref>. These functions are tremendously flexible; here we turn a few knobs:<syntaxhighlight lang="elixir">
Port.open(
  {:spawn_executable, rsync_path},
  [
    :binary,
    :exit_status,
    :hide,
    :use_stdio,
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)
</syntaxhighlight>

Progress lines come in with a fairly self-explanatory format:
<syntaxhighlight lang="text">
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>
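The parsing rule the library applies in Elixir (take the second whitespace-separated column and require a trailing percent sign) is small enough to sketch in C, the language rsync itself is written in. <code>parse_percent</code> is a hypothetical helper, and the 256-byte line limit is an arbitrary assumption.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the percentage from one --info=progress2 line.
 * Returns 1 and stores the value on success, 0 for any other output. */
static int parse_percent(const char *line, double *percent) {
    char buf[256];
    size_t len = strlen(line);
    if (len >= sizeof buf) return 0;      /* not a progress line we recognize */
    memcpy(buf, line, len + 1);           /* strtok_r modifies its input */

    /* Carriage return, newline, tab and space all count as separators,
     * and runs of separators never produce empty columns. */
    const char *seps = " \t\r\n";
    char *save = NULL;
    char *bytes_col = strtok_r(buf, seps, &save);                   /* "3,342,336" */
    char *pct_col = bytes_col ? strtok_r(NULL, seps, &save) : NULL; /* "33%" */
    if (pct_col == NULL) return 0;

    char *end = NULL;
    double value = strtod(pct_col, &end);
    if (end == pct_col || strcmp(end, "%") != 0) return 0;  /* require "<number>%" */
    *percent = value;
    return 1;
}
</syntaxhighlight>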

{{Aside|text=
rsync has a variety of progress options; we chose overall progress above, so the percentage means "overall percent complete".

Here is the menu of alternatives:

; <code>--info=progress2</code> : report overall progress

; <code>--progress</code> : report statistics per file

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any format using parameters from rsyncd.conf's <code>log format</code><ref>https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf</ref>
}}

Each rsync output line is sent to the library's <code>handle_info</code> callback as a <code>{port, {:data, line}}</code> message, and after the transfer is finished we receive a conclusive <code>{port, {:exit_status, status_code}}</code>.

We extract the percent_done column and strictly reject any other output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>Splitting on <code>\s</code> with <code>trim: true</code> lets us ignore spacing and newline trickery, including the leading carriage return you can see in the rsync source code:
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>The carriage return <code>\r</code> deserves a special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but it plays a control function because of how the tty interprets it. On the terminal the effect is to overwrite the current line!
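To see why a stream of progress updates collapses into a single line on screen, here is a C sketch that formats two updates with the same format string as the rsync snippet above, using made-up argument values (the "MB/s" unit and the empty trailing field are assumptions), and replays the bytes through a toy model of a terminal line in which <code>\r</code> only moves the cursor back to column 0. Both helpers are hypothetical, and <code>render_line</code> ignores every other control character.

<syntaxhighlight lang="c">
#include <stdio.h>
#include <string.h>

/* Toy terminal: '\r' returns the cursor to column 0 and later bytes overwrite
 * what is already there; nothing is ever erased. */
static void render_line(const char *bytes, char *screen, size_t cap) {
    size_t cursor = 0, width = 0;
    if (cap == 0) return;
    for (const char *p = bytes; *p != '\0'; p++) {
        if (*p == '\r') {
            cursor = 0;
            continue;
        }
        if (cursor + 1 >= cap) break;   /* keep room for the terminator */
        screen[cursor++] = *p;
        if (cursor > width) width = cursor;
    }
    screen[width] = '\0';
}

/* Same format string as rsync's rprintf call; the argument values are invented. */
static int format_progress(char *buf, size_t cap, const char *bytes, int pct,
                           double rate, const char *eta) {
    return snprintf(buf, cap, "\r%15s %3d%% %7.2f%s %s%s",
                    bytes, pct, rate, "MB/s", eta, "");
}
</syntaxhighlight>

Because rsync pads every column to a fixed width, each new update exactly covers the previous one; the pipe carries both updates, but the terminal only ever shows the last.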

A repeated theme is that data and control are leaky categories. We come to the more formal control side channels later.
{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by emitting a [[w:Carriage return|carriage return]] control character, <code>\r</code>, <code>0x0d</code> sometimes rendered as <code>^M</code>. The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.

[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. newline has caused eye-rolling since the dawn of personal computing.

[[File:Nilgais fighting, Lakeshwari, Gwalior district, India.jpg|left|200x200px]]
}}

== OTP generic server ==
This is where Erlang/OTP really starts to shine: our rsync library wraps the Port calls in a gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> module, and this gives us some special properties for free: a dedicated BEAM process (not an OS thread) which coordinates with rsync independently from anything else, receiving and sending asynchronous messages. It holds internal state including the latest percent done, which calling code can probe, or it can be set up to push updates to a listener.

A gen_server should be able to run under an [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree] as well, but our module has a major flaw: although it correctly detects and reports when rsync crashes or completes, when our gen_server is stopped by its supervisor it cannot stop its external child process in turn.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
What this means is that rsync transfers would continue to run in the background even after Elixir had completely shut down: the BEAM closes the Port's pipes but has no way of stopping the process itself.

To check whether this was something specific to rsync, I tried to open a Port spawning the command <code>sleep 60</code> and I found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open.

== Bad assumption: pipe-like processes ==
A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { error... }
if (n_read == 0) { end of file... }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>https://man.archlinux.org/man/open.2.en#O_NONBLOCK</ref> and unix pipes<ref>https://man.archlinux.org/man/pipe.7.en</ref>.
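Those questions have concrete answers once a descriptor is put into non-blocking mode. The C sketch below, with hypothetical helper names, reads from a pipe with <code>O_NONBLOCK</code> set: an empty pipe whose writer is still open fails with <code>EAGAIN</code> (stalled), while a pipe whose writers have all closed returns 0 bytes (ended). Without <code>O_NONBLOCK</code>, the stalled case would simply block inside <code>read</code> until data or end-of-file arrives.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

enum read_state { READ_DATA, READ_STALLED, READ_ENDED, READ_ERROR };

/* Hypothetical helper: put a descriptor into non-blocking mode. */
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: one non-blocking read, classified by its result. */
static enum read_state read_once(int fd, char *buf, size_t bufsize, ssize_t *n_read) {
    ssize_t n = read(fd, buf, bufsize);
    *n_read = n;
    if (n > 0) return READ_DATA;          /* up to bufsize bytes; call again for more */
    if (n == 0) return READ_ENDED;        /* every writer has closed its end: EOF */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_STALLED;              /* writer still open, nothing to read yet */
    return READ_ERROR;                    /* e.g. EBADF for a descriptor that is not open */
}
</syntaxhighlight>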

But here we'll focus on how processes affect each other through pipes. Surprising answer: not very much! Try running "cat" in the terminal and then type <control>-d to "send" an end-of-file. Oh no, you killed it! You didn't actually send anything, though: the <control>-d is interpreted by the terminal driver, which makes the pending read on cat's "[[w:Standard streams|standard input]]" return 0 bytes, the same result a closed pipe would give. Similarly, <control>-c is not sent as a character at all; the terminal driver turns it into an interrupt signal (SIGINT) for the foreground processes, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>https://wizardzines.com/comics/stty/</ref> by Julia Evans. Go ahead and try this command, what could go wrong: <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions, and "end of file" (EOF) is more a convention than a hard reality. You could even reopen stdin from the application, to the great surprise of your friends and neighbors. For example, try running "watch ls" or "sleep 60" and press <control>-d all you want: no effect. The end-of-file was there for the taking, but nobody cared; the program wasn't reading its input anyway.

Back to the problem at hand, "rsync" is in this latter category of "daemon-like" programs which will carry on even after standard input is closed. This makes sense enough, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
It's possible to write a small adapter which is sensitive to stdin closing, then converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes</ref> for Elixir and the erlexec<ref>[https://hexdocs.pm/erlexec/readme.html https://hexdocs.pm/erlexec/]</ref> library. Adapters with the opposite goal exist too: the [[w:nohup|nohup]] shell command makes the command it runs ignore the hangup signal (SIGHUP), so it can outlive its terminal, and the grimsby<ref>https://github.com/shortishly/grimsby</ref> library keeps standard input and/or standard output open for the child process even after the parent exits.

I took the shim approach with my rsync library and included a small C program<ref>https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads</ref> which wraps rsync and makes it sensitive to the BEAM port_close. It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (but should have been SIGTERM, see the sidebar discussion of different signals below).
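Here is a minimal C sketch of the stdin-watching adapter described above. It is not the shim shipped with rsync_ex (which, as mentioned, converts SIGHUP to SIGKILL). It assumes the wrapped program never needs its standard input, watches a descriptor (in a real shim, <code>STDIN_FILENO</code>) for end-of-file, and forwards SIGTERM to the child when that happens. <code>run_until_eof</code> is a hypothetical name.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shim core: run argv as a child process, watch watch_fd (the
 * shim's own stdin in a real deployment) and forward SIGTERM to the child as
 * soon as watch_fd reports end-of-file. Returns the child's raw wait status,
 * or -1 on error. Anything read from watch_fd is discarded. */
static int run_until_eof(int watch_fd, char *const argv[]) {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        execvp(argv[0], argv);
        _exit(127);                              /* exec failed */
    }

    int status = 0;
    for (;;) {
        /* Did the child finish on its own? */
        pid_t done = waitpid(child, &status, WNOHANG);
        if (done == child) return status;
        if (done < 0 && errno != EINTR) return -1;

        /* Wait up to 100 ms for input or hang-up on the watched descriptor. */
        struct pollfd pfd = { .fd = watch_fd, .events = POLLIN };
        if (poll(&pfd, 1, 100) <= 0) continue;   /* timeout or EINTR: re-check child */

        char buf[256];
        ssize_t n = read(watch_fd, buf, sizeof buf);
        if (n > 0 || (n < 0 && (errno == EINTR || errno == EAGAIN))) continue;

        /* n == 0 (EOF) or a hard error: the other side is gone, stop the child. */
        kill(child, SIGTERM);
        while (waitpid(child, &status, 0) < 0) {
            if (errno != EINTR) return -1;
        }
        return status;
    }
}
</syntaxhighlight>

Polling with a timeout is the simplest way to also notice a child that finishes on its own; a production shim would more likely react to SIGCHLD instead of waking up every 100 ms. Note that the EOF only arrives once every copy of the pipe's write end is closed, so the process that holds the write end must not leak it into the child.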

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities; they have earned their reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates emerging best practices that could be upstreamed into OTP itself. Everyone who weighed in agreed that the fragile clean up of external processes is a bug, and supported the idea that some flavor of "terminate" signal should be sent to spawned programs.

I would be lying if I hid my disappointment that the required core changes are mostly in a C program rather than in Erlang, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know map closely onto a few scraps of wizardry with ordinary pipes, using the read, write, and select system calls<ref>https://man.archlinux.org/man/select.2.en</ref>.

Port drivers<ref>https://www.erlang.org/doc/system/ports.html</ref> are fundamental to ERTS and external processes are launched through several levels of wiring: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children.

Letting a child process outlive the one that spawned it leaves it in a state called an "orphaned process", and POSIX specifies that such a process is adopted by an implementation-defined system process, traditionally the top-level "init". This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands, but it knows nothing about adopted orphans or how to monitor and restart them.

The patch [https://github.com/erlang/otp/pull/9453 PR#9453], adapting port_close to send SIGTERM, is waiting for review, and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : the softest "Goodbye!" that a program is free to interpret as it wishes

; <code>TERM</code> : has a clear intention of "kill this thing", but the target can still trap it and handle it in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught, blocked, or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}

== Future directions ==
Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign.

There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely, for example.

Another idea to borrow from the erlexec library is an option to kill the entire process group of a child. The group is shared by all of the child's descendants unless they explicitly move to a new group, so this would be useful for managing deep trees of external processes launched by a forked command.

== References ==

Elixir/Ports and external process wiring

2025-10-17T12:12:42Z

Adamw: future directions section

A deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality rsync library for Elixir which transfers files in the background while monitoring progress. Rsync is the best tool for this since it can resume incomplete transfers and synchronize directories efficiently and it's complex enough that nobody will reimplement it in pure Erlang. I had hoped that this project would teach me how to interface with long-lived external processes—and I learned more than I wished for.

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, such as the static filenames. It feels unsafe to even demonstrate how string interpolation like <code>#{source}</code> could be misused to make this dynamic, so let's skip ahead to <code>System.cmd</code>, which is safer because its arguments never pass through a shell:<syntaxhighlight lang="elixir">
# Each list element reaches rsync as exactly one argument.
System.find_executable(rsync_path)
|> System.cmd(["-a", source, target])
</syntaxhighlight>Better but the calling thread loses control and gets no feedback until the transfer is complete.

To run an external process asynchronously we will reach for Elixir's low-level <code>Port.open</code>, which maps directly to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref>. These functions are tremendously flexible, and here we demonstrate how to turn a few knobs:<syntaxhighlight lang="elixir">
Port.open(
  {:spawn_executable, rsync_path},
  [
    :binary,
    :exit_status,
    :hide,
    :use_stdio,
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)
</syntaxhighlight>

Progress lines come in with a fairly self-explanatory format:
<syntaxhighlight lang="text">
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>

{{Aside|text=
rsync has a variety of progress options; we chose overall progress above, so the percentage means "overall percent complete".

Here is the menu of alternatives:

; <code>--info=progress2</code> : report overall progress

; <code>--progress</code> : report statistics per file

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any format using parameters from rsyncd.conf's <code>log format</code><ref>https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf</ref>
}}

Each rsync output line is sent to the library's <code>handle_info</code> callback as a <code>{port, {:data, line}}</code> message, and after the transfer is finished we receive a conclusive <code>{port, {:exit_status, status_code}}</code>.

We extract the percent_done column and strictly reject any other output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>Splitting on <code>\s</code> with <code>trim: true</code> lets us ignore spacing and newline trickery, including the leading carriage return you can see in the rsync source code:
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>The carriage return <code>\r</code> deserves a special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but it plays a control function because of how the tty interprets it. On the terminal the effect is to overwrite the current line!

A repeated theme is that data and control are leaky categories. We come to the more formal control side channels later.
{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by emitting a [[w:Carriage return|carriage return]] control character, <code>\r</code>, <code>0x0d</code> sometimes rendered as <code>^M</code>. The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.

[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. newline has caused eye-rolling since the dawn of personal computing.

[[File:Nilgais fighting, Lakeshwari, Gwalior district, India.jpg|left|200x200px]]
}}

== OTP generic server ==
This is where Erlang/OTP really starts to shine: our rsync library wraps the Port calls in a gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> module, and this gives us some special properties for free: a dedicated BEAM process (not an OS thread) which coordinates with rsync independently from anything else, receiving and sending asynchronous messages. It holds internal state including the latest percent done, which calling code can probe, or it can be set up to push updates to a listener.

A gen_server should be able to run under an [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree] as well, but our module has a major flaw: although it correctly detects and reports when rsync crashes or completes, when our gen_server is stopped by its supervisor it cannot stop its external child process in turn.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
What this means is that rsync transfers would continue to run in the background even after Elixir had completely shut down: the BEAM closes the Port's pipes but has no way of stopping the process itself.

To check whether this was something specific to rsync, I tried to open a Port spawning the command <code>sleep 60</code> and I found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open.

== Bad assumption: pipe-like processes ==
A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { error... }
if (n_read == 0) { end of file... }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>https://man.archlinux.org/man/open.2.en#O_NONBLOCK</ref> and unix pipes<ref>https://man.archlinux.org/man/pipe.7.en</ref>.

But here we'll focus on how processes affect each other through pipes. Surprising answer: not very much! Try running "cat" in the terminal and then type <control>-d to "send" an end-of-file. Oh no, you killed it! You didn't actually send anything, though: the <control>-d is interpreted by the terminal driver, which makes the pending read on cat's "[[w:Standard streams|standard input]]" return 0 bytes, the same result a closed pipe would give. Similarly, <control>-c is not sent as a character at all; the terminal driver turns it into an interrupt signal (SIGINT) for the foreground processes, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>https://wizardzines.com/comics/stty/</ref> by Julia Evans. Go ahead and try this command, what could go wrong: <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions, and "end of file" (EOF) is more a convention than a hard reality. You could even reopen stdin from the application, to the great surprise of your friends and neighbors. For example, try running "watch ls" or "sleep 60" and press <control>-d all you want: no effect. The end-of-file was there for the taking, but nobody cared; the program wasn't reading its input anyway.

Back to the problem at hand, "rsync" is in this latter category of "daemon-like" programs which will carry on even after standard input is closed. This makes sense enough, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
It's possible to write a small adapter which is sensitive to stdin closing, then converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes</ref> for Elixir and the erlexec<ref>[https://hexdocs.pm/erlexec/readme.html https://hexdocs.pm/erlexec/]</ref> library. Adapters with the opposite goal exist too: the [[w:nohup|nohup]] shell command makes the command it runs ignore the hangup signal (SIGHUP), so it can outlive its terminal, and the grimsby<ref>https://github.com/shortishly/grimsby</ref> library keeps standard input and/or standard output open for the child process even after the parent exits.

I took the shim approach with my rsync library and included a small C program<ref>https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads</ref> which wraps rsync and makes it sensitive to the BEAM port_close. It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (but should have been SIGTERM, see the sidebar discussion of different signals below).

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities; they have earned their reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates emerging best practices that could be upstreamed into OTP itself. Everyone who weighed in agreed that the fragile clean up of external processes is a bug, and supported the idea that some flavor of "terminate" signal should be sent to spawned programs.

I would be lying if I hid my disappointment that the required core changes are mostly in a C program rather than in Erlang, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know map closely onto a few scraps of wizardry with ordinary pipes, using the read, write, and select system calls<ref>https://man.archlinux.org/man/select.2.en</ref>.

Port drivers<ref>https://www.erlang.org/doc/system/ports.html</ref> are fundamental to ERTS and external processes are launched through several levels of wiring: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children.

Letting a child process outlive the one that spawned it leaves it in a state called an "orphaned process", and POSIX specifies that such a process is adopted by an implementation-defined system process, traditionally the top-level "init". This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands, but it knows nothing about adopted orphans or how to monitor and restart them.

The patch [https://github.com/erlang/otp/pull/9453 PR#9453], adapting port_close to send SIGTERM, is waiting for review, and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : the softest "Goodbye!" that a program is free to interpret as it wishes

; <code>TERM</code> : has a clear intention of "kill this thing", but the target can still trap it and handle it in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught, blocked, or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}

== Future directions ==
Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign.

There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely, for example.

Another idea to borrow from the erlexec library is an option to kill the entire process group of a child. The group is shared by all of the child's descendants unless they explicitly move to a new group, so this would be useful for managing deep trees of external processes launched by a forked command.

== References ==

Draft:Elixir/Ports and external process wiring

2025-10-17T11:57:57Z

Adamw: Adamw moved page Draft:Elixir/Ports and external process wiring to Elixir/Ports and external process wiring

#REDIRECT [[Elixir/Ports and external process wiring]]

Elixir/Ports and external process wiring

2025-10-17T11:57:57Z

Adamw: Adamw moved page Draft:Elixir/Ports and external process wiring to Elixir/Ports and external process wiring

Elixir/Ports and external process wiring

2025-10-17T11:57:43Z

Adamw: /* Reliable clean up */

Elixir/Ports and external process wiring

2025-10-17T11:56:06Z

Adamw: /* Shimming can kill */

Elixir/Ports and external process wiring

2025-10-17T11:54:44Z

Adamw: /* Bad assumption: pipe-like processes */

Elixir/Ports and external process wiring

2025-10-17T11:53:13Z

Adamw: /* Problem: runaway processes */

A deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality rsync library for Elixir which transfers files in the background while monitoring progress. Rsync is the best tool for this since it can resume incomplete transfers and synchronize directories efficiently and it's complex enough that nobody will reimplement it in pure Erlang. I had hoped that this project would teach me how to interface with long-lived external processes—and I learned more than I wished for.

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, such as the static filenames. It feels unsafe to even demonstrate how string interpolation like <code>#{source}</code> could be misused to make this dynamic, so let's skip ahead to <code>System.cmd</code>, which is safer because its arguments never pass through a shell:<syntaxhighlight lang="elixir">
# Each list element reaches rsync as exactly one argument.
System.find_executable(rsync_path)
|> System.cmd(["-a", source, target])
</syntaxhighlight>Better but the calling thread loses control and gets no feedback until the transfer is complete.

To run an external process asynchronously we will reach for Elixir's low-level <code>Port.open</code>, which maps directly to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref>. These functions are tremendously flexible, and here we demonstrate how to turn a few knobs:<syntaxhighlight lang="elixir">
Port.open(
  {:spawn_executable, rsync_path},
  [
    :binary,
    :exit_status,
    :hide,
    :use_stdio,
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)
</syntaxhighlight>

Progress lines come in with a fairly self-explanatory format:
<syntaxhighlight lang="text">
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>

{{Aside|text=
rsync has a variety of progress options; we chose overall progress above, so the percentage means "overall percent complete".

Here is the menu of alternatives:

; <code>--info=progress2</code> : report overall progress

; <code>--progress</code> : report statistics per file

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any format using parameters from rsyncd.conf's <code>log format</code><ref>https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf</ref>
}}

Each chunk of rsync output is sent to the library's <code>handle_info</code> callback as a <code>{port, {:data, data}}</code> message, and after the transfer is finished we receive a conclusive <code>{port, {:exit_status, status_code}}</code>.
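The message shapes can be seen without any gen_server at all. In this sketch the <code>PortCollect</code> module is hypothetical and <code>echo</code> stands in for rsync by printing one fake progress line; a plain <code>receive</code> loop accumulates the data until the exit status arrives.

<syntaxhighlight lang="elixir">
defmodule PortCollect do
  # Hypothetical helper: accumulate everything a port emits until the
  # external program exits, and return {output, exit_status}.
  def collect(port, acc \\ "") do
    receive do
      {^port, {:data, data}} -> collect(port, acc <> data)
      {^port, {:exit_status, status}} -> {acc, status}
    after
      5_000 -> {:timeout, acc}
    end
  end
end

# echo stands in for rsync and prints one fake progress line.
port =
  Port.open({:spawn_executable, System.find_executable("echo")}, [
    :binary,
    :exit_status,
    :use_stdio,
    :stderr_to_stdout,
    args: ["3,342,336", "33%", "3.14MB/s", "0:00:02"]
  ])

{output, status} = PortCollect.collect(port)
</syntaxhighlight>

Because the port is opened without the <code>{:line, max}</code> option, each <code>{:data, data}</code> message carries whatever bytes were available on the pipe, so a single message may hold part of a progress update or several of them.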

We extract the percent_done column and strictly reject any other output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>The <code>trim</code> option, together with splitting on any whitespace, lets us ignore spacing and newline trickery, or even a leading carriage return, as you can see in the rsync source code:
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>The carriage return <code>\r</code> deserves a special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but it plays a control function because of how the tty interprets it. On the terminal the effect is to overwrite the current line!
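To exercise the parser on raw input, the same <code>with</code> pipeline can be wrapped in a function. This sketch uses a hypothetical module <code>ProgressParse</code> and adds a <code>latest/1</code> helper, an illustrative addition rather than part of the library, for chunks that bundle several carriage-return-separated updates.

<syntaxhighlight lang="elixir">
defmodule ProgressParse do
  # The with-expression from above, wrapped in a function: returns the
  # percent-done column as a float, or {:unknown, line} for anything else.
  def percent_done(line) do
    with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
         percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
         {percent_done, "%"} <- Float.parse(percent_done_text) do
      percent_done
    else
      _ -> {:unknown, line}
    end
  end

  # A single chunk may hold several "\r"-separated updates: split on both
  # "\r" and "\n" and keep the most recent percentage, or nil if none.
  def latest(chunk) do
    chunk
    |> String.split(~r/[\r\n]/, trim: true)
    |> Enum.map(&percent_done/1)
    |> Enum.filter(&is_float/1)
    |> List.last()
  end
end
</syntaxhighlight>

Taking the last match rather than the first means that a chunk carrying several updates reports the newest one, which is what the terminal would have left on screen after its carriage returns.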

A repeated theme is that data and control are leaky categories. We will come to the more formal control side channels later.
{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by emitting a [[w:Carriage return|carriage return]] control character (<code>\r</code>, byte <code>0x0d</code>, sometimes rendered as <code>^M</code>). The name seems to come from pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.

[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. newline has caused eye-rolling since the dawn of personal computing.

[[File:Nilgais fighting, Lakeshwari, Gwalior district, India.jpg|left|200x200px]]
}}

== OTP generic server ==
This is where Erlang/OTP really starts to shine: our rsync library wraps the Port calls in a gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> module, which gives us some useful properties for free: a dedicated BEAM process which coordinates with rsync independently from anything else, receiving and sending asynchronous messages. Its internal state includes the latest percent done, which calling code can query, or the server can be set up to push updates to a listener.
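A minimal sketch of such a server, assuming nothing about the real library's internals: the module name <code>ProgressServer</code> and its <code>start_link/2</code> and <code>status/1</code> functions are hypothetical, <code>echo</code> stands in for rsync, and the parsing is a simplified variant that keeps the last <code>NN%</code> token in each chunk.

<syntaxhighlight lang="elixir">
defmodule ProgressServer do
  use GenServer

  # Hypothetical API: start `executable` with `args` and track its progress.
  def start_link(executable, args) do
    GenServer.start_link(__MODULE__, {executable, args})
  end

  # Ask the server for its latest known state.
  def status(pid), do: GenServer.call(pid, :status)

  @impl true
  def init({executable, args}) do
    port =
      Port.open({:spawn_executable, executable}, [
        :binary,
        :exit_status,
        :use_stdio,
        :stderr_to_stdout,
        args: args
      ])

    {:ok, %{port: port, percent_done: nil, exit_status: nil}}
  end

  @impl true
  def handle_call(:status, _from, state) do
    {:reply, Map.take(state, [:percent_done, :exit_status]), state}
  end

  @impl true
  def handle_info({port, {:data, data}}, %{port: port} = state) do
    # Keep the previous value when a chunk carries no percentage.
    {:noreply, %{state | percent_done: last_percent(data) || state.percent_done}}
  end

  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    {:noreply, %{state | exit_status: status}}
  end

  # Return the last "NN%" token in a chunk, or nil if there is none.
  defp last_percent(data) do
    data
    |> String.split(~r"\s", trim: true)
    |> Enum.flat_map(fn term ->
      case Float.parse(term) do
        {value, "%"} -> [value]
        _ -> []
      end
    end)
    |> List.last()
  end
end
</syntaxhighlight>

Calling code polls with <code>ProgressServer.status(pid)</code>; pushing updates to a listener would instead mean storing a listener pid in the state and calling <code>send/2</code> from the <code>{:data, data}</code> clause.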

A gen_server should also be able to run under an [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree], but our module has a major flaw: although it can correctly detect and report when rsync crashes or completes, it cannot stop its external child process in turn when our gen_server is stopped by its supervisor.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
What this means is that rsync transfers would continue to run in the background even after Elixir had completely shut down, because the BEAM has no way of stopping the process.

To check whether this was something specific to rsync, I tried to open a Port spawning the command <code>sleep 60</code> and I found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open.
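A sketch of the <code>sleep 60</code> experiment, assuming a Unix system with <code>sh</code> and <code>sleep</code> on the PATH and the default port behavior of the OTP releases discussed here. It reads the OS pid with <code>Port.info/2</code>, closes the port, and then asks the OS whether the process still exists. The <code>alive?</code> helper is hypothetical and relies on <code>kill -0</code>, which sends no signal and only reports whether the pid exists.

<syntaxhighlight lang="elixir">
# Hypothetical helper. The pid is passed to sh as "$1", not interpolated.
alive? = fn os_pid ->
  {_, status} =
    System.cmd("sh", ["-c", ~S(kill -0 "$1" 2>/dev/null), "sh", Integer.to_string(os_pid)])

  status == 0
end

# Spawn "sleep 60" through a port, as the library spawns rsync.
port =
  Port.open({:spawn_executable, System.find_executable("sleep")}, [:binary, :exit_status, args: ["60"]])

{:os_pid, sleep_pid} = Port.info(port, :os_pid)

# Closing the port closes the pipes to sleep, which never reads them.
true = Port.close(port)
Process.sleep(200)
still_running = alive?.(sleep_pid)
IO.puts("sleep still running after Port.close: #{still_running}")

# Clean up by hand: exactly the job the BEAM is not doing for us here.
System.cmd("sh", ["-c", ~S(kill "$1"), "sh", Integer.to_string(sleep_pid)])
</syntaxhighlight>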

== Bad assumption: pipe-like processes ==
A straightforward use case for external processes would be to run a standard transformation such as compression or decompression. A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended, because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { /* error: the cause is in errno */ }
if (n_read == 0) { /* end of file: stop reading */ }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that a return value of 0 indicates the end of file, and -1 indicates an error (with the cause in <code>errno</code>), such as the file descriptor not being open for reading. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>https://man.archlinux.org/man/open.2.en#O_NONBLOCK</ref> and unix pipes<ref>https://man.archlinux.org/man/pipe.7.en</ref>.
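The pipe-like case is easy to check from Elixir. In this sketch, assuming a Unix system with <code>cat</code> and <code>sh</code> on the PATH, <code>cat</code> is spawned through a port, echoes one line back, and is then checked after <code>Port.close</code>: closing the port closes cat's stdin, its next <code>read</code> returns 0, and it exits on its own. The <code>alive?</code> helper is hypothetical and uses <code>kill -0</code>, which only reports whether the pid exists.

<syntaxhighlight lang="elixir">
# Hypothetical helper. The pid is passed to sh as "$1", not interpolated.
alive? = fn os_pid ->
  {_, status} =
    System.cmd("sh", ["-c", ~S(kill -0 "$1" 2>/dev/null), "sh", Integer.to_string(os_pid)])

  status == 0
end

# cat copies stdin to stdout until read() returns 0.
port = Port.open({:spawn_executable, System.find_executable("cat")}, [:binary, :use_stdio])
{:os_pid, cat_pid} = Port.info(port, :os_pid)

true = Port.command(port, "hello\n")

echoed =
  receive do
    {^port, {:data, data}} -> data
  after
    2_000 -> :timeout
  end

# Closing the port closes cat's stdin: its next read() returns 0 and it exits
# by itself. Poll for up to two seconds.
true = Port.close(port)
cat_exited? = Enum.any?(1..40, fn _ -> Process.sleep(50); not alive?.(cat_pid) end)
</syntaxhighlight>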

But here we'll focus on how processes affect each other through pipes. Surprising answer: they don't affect each other very much! Try opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file. Oh no, you killed it! You didn't actually send anything, though: the <control>-d is interpreted by the terminal driver, which makes cat's pending <code>read</code> on its [[w:Standard streams|standard input]] return 0 bytes, and cat treats that as end-of-file and exits by itself. Similarly, <control>-c is not sent as a character either: the terminal driver turns it into an interrupt signal (SIGINT) for the foreground process, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>https://wizardzines.com/comics/stty/</ref> by Julia Evans. Go ahead and try this command (what could go wrong?): <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions, and "end of file" (EOF) is more a convention than a hard reality. You could even reopen stdin from the application, to the great surprise of your friends and neighbors. For example, try opening "watch ls" or "sleep 60" and press <control>-d all you want: no effect. The end-of-file was there to be read, but nobody cared; the program wasn't listening to you anyway.

Back to the problem at hand: "rsync" is in this latter category of "daemon-like" programs which carry on even after standard input is closed. This makes sense, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
It's possible to write a small adapter which is sensitive to stdin closing, then converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes</ref> for Elixir and the erlexec<ref>[https://hexdocs.pm/erlexec/readme.html https://hexdocs.pm/erlexec/]</ref> library. The opposite adapter is also found in the [[w:nohup|nohup]] shell command and the grimsby<ref>https://github.com/shortishly/grimsby</ref> library: these will keep standard in or out open for the child process even after the parent exits.

I took this approach with my rsync library and included a small C program<ref>https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads</ref> which wraps rsync and makes it sensitive to the BEAM port_close. It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (see the sidebar discussion of different signals below).
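Here is a sketch of such an adapter, written as a POSIX <code>sh</code> script that the BEAM runs through a port. It is neither the shell script from the Elixir docs nor the C program from the rsync library, just an illustration of the same idea under stated assumptions: the module name <code>StdinShim</code> is hypothetical, and for demonstration the shim prints the wrapped program's pid as its first line of output. The script saves its stdin as file descriptor 3 because a non-interactive shell gives background jobs <code>/dev/null</code> as their stdin.

<syntaxhighlight lang="elixir">
defmodule StdinShim do
  # Hypothetical POSIX sh adapter, run via "sh -c", for illustration only.
  @script ~S"""
  # Keep a handle on our stdin (the pipe from the BEAM) as fd 3.
  exec 3<&0
  # Start the real program in the background and report its pid.
  "$@" 3<&- &
  child=$!
  echo "$child"
  # Watcher: block until stdin reaches EOF, then terminate the program.
  ( cat <&3 >/dev/null; kill -TERM "$child" 2>/dev/null ) >/dev/null 2>&1 &
  watcher=$!
  # Exit with the program's status, however it ended.
  wait "$child"
  rc=$?
  kill "$watcher" 2>/dev/null
  exit "$rc"
  """

  # Open a port running `executable args...` behind the shim.
  def open(executable, args) do
    Port.open({:spawn_executable, System.find_executable("sh")}, [
      :binary,
      :exit_status,
      :use_stdio,
      args: ["-c", @script, "shim", executable | args]
    ])
  end
end
</syntaxhighlight>

Sending SIGTERM rather than SIGKILL gives the wrapped program a chance to clean up; which signal is appropriate is the same open question raised in the "Which signal?" aside.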

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities; they have earned their reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates best practices that can be backported into the language itself. Everyone speaking on the problem has generally agreed that the fragile clean up of external processes is a bug, and supported the idea that some flavor of "terminate" signal should be sent to spawned programs.

I would be lying if I hid my disappointment that the required core changes are mostly to a C program and not actually in Erlang, but it was fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely onto a few scraps of wizardry with ordinary pipes, using the plain read, write, and select<ref>https://man.archlinux.org/man/select.2.en</ref> system calls.

Port drivers<ref>https://www.erlang.org/doc/system/ports.html</ref> are fundamental to ERTS and external processes are launched through several levels of wiring: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children.

Letting a child process outlive the one that spawned it leaves it in a state called an "orphaned process". POSIX specifies that an orphan is inherited by an implementation-defined system process, traditionally the top-level "init". This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands, but it knows nothing about adopted orphan processes or how to monitor and restart them.

The patch [https://github.com/erlang/otp/pull/9453 PR#9453], which makes port_close send SIGTERM, is waiting for review and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : the softest "Goodbye!" that a program is free to interpret as it wishes

; <code>TERM</code> : has a clear intention of "kill this thing" but still possible to trap at the target and handle in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}
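The difference between these signals can be observed with a small shell script that traps HUP and TERM. In this sketch the <code>SignalDemo</code> module is hypothetical and <code>sh</code> stands in for a real program: HUP is caught and the script keeps running, TERM is caught and the script exits cleanly with status 0, and KILL ends the process with no output at all because it cannot be trapped.

<syntaxhighlight lang="elixir">
defmodule SignalDemo do
  # A shell script that traps HUP and TERM. Nothing can trap KILL.
  @script ~S"""
  trap 'echo caught HUP' HUP
  trap 'echo caught TERM; exit 0' TERM
  echo ready
  while :; do sleep 1; done
  """

  # Start the script and wait until its traps are installed.
  def start do
    port =
      Port.open({:spawn_executable, System.find_executable("sh")}, [
        :binary,
        :exit_status,
        :use_stdio,
        args: ["-c", @script]
      ])

    receive do
      {^port, {:data, "ready\n"}} -> port
    after
      2_000 -> raise "script did not start"
    end
  end

  # Deliver a named signal ("HUP", "TERM", "KILL") to the port's OS process.
  def signal(port, name) do
    {:os_pid, os_pid} = Port.info(port, :os_pid)
    {_, 0} = System.cmd("sh", ["-c", ~S(kill -s "$1" "$2"), "sh", name, Integer.to_string(os_pid)])
    :ok
  end

  # Next chunk of output. The shell runs a trap only after the current
  # "sleep 1" returns, so this can take about a second.
  def next_output(port) do
    receive do
      {^port, {:data, data}} -> data
    after
      3_000 -> :timeout
    end
  end

  # Collect output until the process exits.
  def collect(port, acc \\ "") do
    receive do
      {^port, {:data, data}} -> collect(port, acc <> data)
      {^port, {:exit_status, status}} -> {acc, status}
    after
      5_000 -> {:timeout, acc}
    end
  end
end

port = SignalDemo.start()
:ok = SignalDemo.signal(port, "HUP")
hup_output = SignalDemo.next_output(port)
:ok = SignalDemo.signal(port, "TERM")
{term_output, term_status} = SignalDemo.collect(port)

port = SignalDemo.start()
:ok = SignalDemo.signal(port, "KILL")
{kill_output, kill_status} = SignalDemo.collect(port)
</syntaxhighlight>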

Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign. There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely.

== References ==

Elixir/Ports and external process wiring

2025-10-17T11:52:26Z

Adamw: /* OTP generic server */

A deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality rsync library for Elixir which transfers files in the background while monitoring progress. Rsync is the best tool for this: it can resume incomplete transfers and synchronize directories efficiently, and it's complex enough that nobody is likely to reimplement it in pure Erlang. I had hoped that this project would teach me how to interface with long-lived external processes, and I learned more than I wished for.

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, such as the static filenames. It feels unsafe to even demonstrate how string interpolation like <code>#{source}</code> could be misused to make this dynamic, so let's skip ahead to <code>System.cmd</code>, which is safer because it passes its arguments to the program without a shell to expand them:<syntaxhighlight lang="elixir">
# Each list element becomes exactly one argv entry; no shell is involved.
System.find_executable("rsync")
|> System.cmd(["-a", source, target])
</syntaxhighlight>This is better, but the calling process blocks and gets no feedback until the transfer is complete.

To run an external process asynchronously we will reach for Elixir's low-level <code>Port.open</code>, which maps directly to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref>. These functions are tremendously flexible; here we demonstrate how to turn a few knobs:<syntaxhighlight lang="elixir">
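Port.open(
  {:spawn_executable, rsync_path},
  [
    # Deliver output as binaries rather than charlists.
    :binary,
    # Send {port, {:exit_status, code}} when rsync exits.
    :exit_status,
    # Windows only: don't open a console window.
    :hide,
    # Communicate over the program's stdin/stdout (the default).
    :use_stdio,
    # Merge stderr into the same stream as stdout.
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)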
</syntaxhighlight>

Progress lines come in with a fairly self-explanatory format:
<syntaxhighlight lang="text">
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>

{{Aside|text=
rsync has a variety of progress options. We chose overall progress above, so the percentage means "overall percent complete".

Here is the menu of alternatives:

; <code>--info=progress2</code> : report overall progress

; <code>--progress</code> : report statistics per file

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any format using parameters from rsyncd.conf's <code>log format</code><ref>https://man.freebsd.org/cgi/man.cgi?query=rsyncd.conf</ref>
}}

Each chunk of rsync output is sent to the library's <code>handle_info</code> callback as a <code>{port, {:data, data}}</code> message, and after the transfer is finished we receive a conclusive <code>{port, {:exit_status, status_code}}</code>.

We extract the percent_done column and strictly reject any other output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>The <code>trim</code> option, together with splitting on any whitespace, lets us ignore spacing and newline trickery, or even a leading carriage return, as you can see in the rsync source code:
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>The carriage return <code>\r</code> deserves a special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but it plays a control function because of how the tty interprets it. On the terminal the effect is to overwrite the current line!

A repeated theme is that data and control are leaky categories. We will come to the more formal control side channels later.
{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by emitting a [[w:Carriage return|carriage return]] control character (<code>\r</code>, byte <code>0x0d</code>, sometimes rendered as <code>^M</code>). The name seems to come from pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.

[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. newline has caused eye-rolling since the dawn of personal computing.

[[File:Nilgais fighting, Lakeshwari, Gwalior district, India.jpg|left|200x200px]]
}}

== OTP generic server ==
This is where Erlang/OTP really starts to shine: our rsync library wraps the Port calls in a gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> module, which gives us some useful properties for free: a dedicated BEAM process which coordinates with rsync independently from anything else, receiving and sending asynchronous messages. Its internal state includes the latest percent done, which calling code can query, or the server can be set up to push updates to a listener.

A gen_server should also be able to run under an [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree], but our module has a major flaw: although it can correctly detect and report when rsync crashes or completes, it cannot stop its external child process in turn when our gen_server is stopped by its supervisor.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
What this means is that rsync transfers would continue to run in the background even after Elixir had completely shut down, because the BEAM had no way of stopping the process.

To check whether this was something specific to rsync, I tried the same thing with <code>sleep 60</code> and I found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open.

== Bad assumption: pipe-like processes ==
A straightforward use case for external processes would be to run a standard transformation such as compression or decompression. A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended, because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { /* error: the cause is in errno */ }
if (n_read == 0) { /* end of file: stop reading */ }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that a return value of 0 indicates the end of file, and -1 indicates an error (with the cause in <code>errno</code>), such as the file descriptor not being open for reading. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>https://man.archlinux.org/man/open.2.en#O_NONBLOCK</ref> and unix pipes<ref>https://man.archlinux.org/man/pipe.7.en</ref>.

But here we'll focus on how processes affect each other through pipes. Surprising answer: they don't affect each other very much! Try opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file. Oh no, you killed it! You didn't actually send anything, though: the <control>-d is interpreted by the terminal driver, which makes cat's pending <code>read</code> on its [[w:Standard streams|standard input]] return 0 bytes, and cat treats that as end-of-file and exits by itself. Similarly, <control>-c is not sent as a character either: the terminal driver turns it into an interrupt signal (SIGINT) for the foreground process, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>https://wizardzines.com/comics/stty/</ref> by Julia Evans. Go ahead and try this command (what could go wrong?): <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions, and "end of file" (EOF) is more a convention than a hard reality. You could even reopen stdin from the application, to the great surprise of your friends and neighbors. For example, try opening "watch ls" or "sleep 60" and press <control>-d all you want: no effect. The end-of-file was there to be read, but nobody cared; the program wasn't listening to you anyway.

Back to the problem at hand: "rsync" is in this latter category of "daemon-like" programs which carry on even after standard input is closed. This makes sense, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
It's possible to write a small adapter which is sensitive to stdin closing, then converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes</ref> for Elixir and the erlexec<ref>[https://hexdocs.pm/erlexec/readme.html https://hexdocs.pm/erlexec/]</ref> library. The opposite adapter is also found in the [[w:nohup|nohup]] shell command and the grimsby<ref>https://github.com/shortishly/grimsby</ref> library: these will keep standard in or out open for the child process even after the parent exits.

I took this approach with my rsync library and included a small C program<ref>https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads</ref> which wraps rsync and makes it sensitive to the BEAM port_close. It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (see the sidebar discussion of different signals below).

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities; they have earned their reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates best practices that can be backported into the language itself. Everyone speaking on the problem has generally agreed that the fragile clean up of external processes is a bug, and supported the idea that some flavor of "terminate" signal should be sent to spawned programs.

I would be lying if I hid my disappointment that the required core changes are mostly to a C program and not actually in Erlang, but it was fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely onto a few scraps of wizardry with ordinary pipes, using the plain read, write, and select<ref>https://man.archlinux.org/man/select.2.en</ref> system calls.

Port drivers<ref>https://www.erlang.org/doc/system/ports.html</ref> are fundamental to ERTS and external processes are launched through several levels of wiring: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children.

Letting a child process outlive the one that spawned it leaves it in a state called an "orphaned process". POSIX specifies that an orphan is inherited by an implementation-defined system process, traditionally the top-level "init". This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands, but it knows nothing about adopted orphan processes or how to monitor and restart them.

The patch [https://github.com/erlang/otp/pull/9453 PR#9453], which makes port_close send SIGTERM, is waiting for review and responses look generally positive so far.

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : the softest "Goodbye!" that a program is free to interpret as it wishes

; <code>TERM</code> : has a clear intention of "kill this thing" but still possible to trap at the target and handle in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}

Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign. There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely.

== References ==

Elixir/Ports and external process wiring

2025-10-17T11:50:21Z

Adamw: light edits

Elixir/Ports and external process wiring

2025-10-17T11:38:29Z

Adamw: c/e to the end

This deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality rsync library for Elixir which transfers files in the background and can monitor progress. I hoped to learn how to better interface with long-lived external processes, and I got more than I wished for.

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, such as the static filenames. It feels unsafe to even demonstrate how string interpolation like <code>#{source}</code> could be misused, so let's skip straight to the next tool, <code>System.cmd</code>, which passes its arguments to the program without a shell to expand them:<syntaxhighlight lang="elixir">
# Each list element becomes exactly one argv entry; no shell is involved.
System.find_executable("rsync")
|> System.cmd(["-a", source, target])
</syntaxhighlight>This is safer, but the calling process blocks and gets no feedback until the transfer is complete.

To run an external process asynchronously we reach for Elixir's lowest-level <code>Port.open</code>, which maps directly to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref>. These are tremendously flexible; here we demonstrate turning a few knobs:<syntaxhighlight lang="elixir">
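Port.open(
  {:spawn_executable, rsync_path},
  [
    # Deliver output as binaries rather than charlists.
    :binary,
    # Send {port, {:exit_status, code}} when rsync exits.
    :exit_status,
    # Windows only: don't open a console window.
    :hide,
    # Communicate over the program's stdin/stdout (the default).
    :use_stdio,
    # Merge stderr into the same stream as stdout.
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)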
</syntaxhighlight>

Progress lines come in with a fairly self-explanatory format:
<syntaxhighlight lang="text">
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>

{{Aside|text=
rsync has a variety of progress options. We chose overall progress above, so the percentage means "overall percent complete".

Here is the menu of alternatives:

; <code>--info=progress2</code> : report overall progress

; <code>--progress</code> : report statistics per file

; <code>--itemize-changes</code> : list the operations taken on each file
}}

Each chunk of rsync output is sent to the library's <code>handle_info</code> callback as a <code>{port, {:data, data}}</code> message, and after the transfer is finished we receive a conclusive <code>{port, {:exit_status, status_code}}</code>.

We extract the percent_done column and strictly reject any other output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>The <code>trim</code> option, together with splitting on any whitespace, lets us ignore spacing and newline trickery, or even a leading carriage return, as you can see in the rsync source code:
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>The carriage return <code>\r</code> deserves a special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but it plays a control function because of how the tty interprets it. On the terminal the effect is to overwrite the current line!

A repeated theme is that data and control are leaky categories. We will come to the more formal control side channels later.
{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by emitting a [[w:Carriage return|carriage return]] control character (<code>\r</code>, byte <code>0x0d</code>, sometimes rendered as <code>^M</code>). The name seems to come from pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.

[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. newline has caused eye-rolling since the dawn of personal computing.

[[File:Nilgais fighting, Lakeshwari, Gwalior district, India.jpg|left|200x200px]]
}}

== OTP generic server ==
This is where Erlang/OTP really starts to shine: our rsync library wraps the Port calls in a gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> module, which gives us some useful properties for free: a dedicated BEAM process which coordinates with rsync independently from anything else, receiving and sending asynchronous messages. Its internal state includes the latest percent done, which calling code can query, or the server can be set up to push updates to a listener.

A gen_server should also be able to run under an [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree], but our module has a major flaw: it can correctly detect and report when rsync crashes or completes, but if our module is stopped by its supervisor it cannot stop its external child process in turn.

== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
What this means is that rsync transfers would continue to run in the background even after Elixir had completely shut down, because the BEAM had no way of stopping the process.

To check whether this was something specific to rsync, I tried the same thing with <code>sleep 60</code> and I found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open.

== Bad assumption: pipe-like processes ==
A straightforward use case for external processes would be to run a standard transformation such as compression or decompression. A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended, because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { /* error: the cause is in errno */ }
if (n_read == 0) { /* end of file: stop reading */ }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that a return value of 0 indicates the end of file, and -1 indicates an error (with the cause in <code>errno</code>), such as the file descriptor not being open for reading. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>https://man.archlinux.org/man/open.2.en#O_NONBLOCK</ref> and unix pipes<ref>https://man.archlinux.org/man/pipe.7.en</ref>.

But here we'll focus on how processes affect each other through pipes. Surprising answer: they don't affect each other very much! Try opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file. Oh no, you killed it! You didn't actually send anything, though: the <control>-d is interpreted by the terminal driver, which makes cat's pending <code>read</code> on its [[w:Standard streams|standard input]] return 0 bytes, and cat treats that as end-of-file and exits by itself. Similarly, <control>-c is not sent as a character either: the terminal driver turns it into an interrupt signal (SIGINT) for the foreground process, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>https://wizardzines.com/comics/stty/</ref> by Julia Evans. Go ahead and try this command (what could go wrong?): <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions, and "end of file" (EOF) is more a convention than a hard reality. You could even reopen stdin from the application, to the great surprise of your friends and neighbors. For example, try opening "watch ls" or "sleep 60" and press <control>-d all you want: no effect. The end-of-file was there to be read, but nobody cared; the program wasn't listening to you anyway.

Back to the problem at hand: "rsync" is in this latter category of "daemon-like" programs, which carry on even after standard input is closed. This makes sense, since rsync isn't interactive and any output is just a side effect of its main purpose.

== Shimming can kill ==
It's possible to write a small adapter which notices its stdin closing and converts this into a stronger signal, like SIGTERM, which it forwards to its own child. This is the idea behind a suggested shell script<ref>https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes</ref> for Elixir and the erlexec<ref>[https://hexdocs.pm/erlexec/readme.html https://hexdocs.pm/erlexec/]</ref> library. The opposite adapter is found in the [[w:nohup|nohup]] shell command and the grimsby<ref>https://github.com/shortishly/grimsby</ref> library: these keep standard input or output open for the child process even after the parent exits.

I took this approach with my rsync library and included a small C program<ref>https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads</ref> which wraps rsync and makes it sensitive to the BEAM port_close. It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (see the sidebar discussion of different signals below).
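
The adapter idea fits in a few lines of C. This is not the author's <code>main.c</code>, only a minimal sketch under our own assumptions: the hypothetical <code>kill_child_on_eof</code> drains a file descriptor (in a real shim, its own stdin, connected to the BEAM) and, once <code>read</code> reports end of file, sends a chosen signal to the already-started child and reaps it.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Core of a hypothetical shim: read watch_fd until end of file, then send
   sig to the wrapped child and reap it.
   Returns the child's wait status, or -1 on error. */
int kill_child_on_eof(int watch_fd, pid_t child, int sig)
{
    char buf[512];
    int status;

    for (;;) {
        ssize_t n = read(watch_fd, buf, sizeof buf);
        if (n > 0)
            continue;                    /* discard data: only end of file matters */
        if (n < 0 && errno == EINTR)
            continue;
        break;                           /* 0 means EOF; treat read errors the same */
    }

    /* An already-exited child stays a zombie until reaped, so its pid cannot
       have been reused and this signal is harmless. */
    kill(child, sig);
    while (waitpid(child, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return status;
}
</syntaxhighlight>

A complete wrapper would also need to notice the child finishing first, for example by watching for SIGCHLD alongside stdin, and pass the child's exit status back to the BEAM; both are left out here.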

== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities: they have earned their reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates best practices that can be backported into the language itself. Everyone speaking on the problem has generally agreed that the fragile clean up of external processes is a bug, and supported the idea that some flavor of "terminate" signal should be sent to spawned programs.

I would be lying if I hid my disappointment that the required core changes are mostly to a C program and not actually in Erlang, but it was fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes, using the read, write, and select<ref>https://man.archlinux.org/man/select.2.en</ref> system calls.

Port drivers<ref>https://www.erlang.org/doc/system/ports.html</ref> are fundamental to ERTS and external processes are launched through several levels of wiring: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children.
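
Underneath those layers, the mechanics are ordinary system calls. As a rough illustration only (this is not ERTS code, and <code>run_collect</code> is a hypothetical helper), the sketch below spawns a command with its stdout and stderr joined into one pipe, much like the <code>:stderr_to_stdout</code> option, waits for output with <code>select</code>, reads it until end of file, and finally collects the exit status: roughly the information that reaches Elixir as <code>:data</code> and <code>:exit_status</code> messages.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Toy port-style wiring (not ERTS code): run cmd with stdout and stderr
   joined into one pipe, collect output into out (NUL-terminated, truncated
   to fit) until end of file, then return the exit status, or -1 on error. */
int run_collect(char *const cmd[], char *out, size_t out_size)
{
    int fds[2], status;
    size_t used = 0;

    if (out_size == 0 || pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                            /* child */
        dup2(fds[1], STDOUT_FILENO);
        dup2(fds[1], STDERR_FILENO);           /* like :stderr_to_stdout */
        close(fds[0]);
        close(fds[1]);
        execvp(cmd[0], cmd);
        _exit(127);
    }
    close(fds[1]);                             /* the parent only reads */

    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(fds[0], &readable);
        struct timeval timeout = {1, 0};       /* wake up at least once a second */

        int ready = select(fds[0] + 1, &readable, NULL, NULL, &timeout);
        if (ready < 0 && errno == EINTR)
            continue;
        if (ready < 0)
            break;
        if (ready == 0)
            continue;                          /* timeout: a real loop could do other work */

        char buf[256];
        ssize_t n = read(fds[0], buf, sizeof buf);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;                             /* end of file: the child closed its output */
        size_t room = out_size - 1 - used;     /* each chunk is like a {:data, ...} message */
        size_t take = (size_t)n < room ? (size_t)n : room;
        memcpy(out + used, buf, take);
        used += take;
    }
    out[used] = '\0';
    close(fds[0]);

    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    /* Like {:exit_status, code}; a death by signal is reported here as
       128 + signal, following the shell convention. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status);
}
</syntaxhighlight>

A real event loop would watch many descriptors in the same <code>select</code> call, which is where this style pays off.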

Letting a child process outlive the one that spawned it leaves it in a state called an "orphaned process" in POSIX, and the standard recommends that when this happens the process should be adopted by the top-level system process "init" if it exists. This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands. But init will know nothing about adopted, orphan processes or how to monitor and restart them.
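
Adoption is easy to observe. The hypothetical <code>observe_adoption</code> below creates an orphan on purpose: a middle process forks a grandchild and exits immediately, and the grandchild reports which process became its new parent. On a classic system that is <code>init</code> (PID 1); on Linux it may instead be the nearest ancestor that registered itself as a "subreaper" with <code>prctl(PR_SET_CHILD_SUBREAPER)</code>. Either way, it is not the process that spawned it.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Create an orphan on purpose and report who adopted it.
   A "middle" process forks a grandchild and exits at once; the grandchild
   waits until its parent pid changes, then sends the new parent pid back
   through a pipe. Returns the adopter's pid, or -1 on error. */
pid_t observe_adoption(pid_t *middle_pid)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t middle = fork();
    if (middle < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (middle == 0) {
        pid_t me = getpid();
        if (fork() == 0) {                          /* grandchild */
            struct timespec tick = {0, 5 * 1000 * 1000};
            /* Wait (up to about 2 s) for the middle process to exit. */
            for (int i = 0; i < 400 && getppid() == me; i++)
                nanosleep(&tick, NULL);
            pid_t adopter = getppid();
            if (write(fds[1], &adopter, sizeof adopter) != (ssize_t)sizeof adopter)
                _exit(1);
            _exit(0);
        }
        _exit(0);                                   /* orphan the grandchild */
    }

    close(fds[1]);
    waitpid(middle, NULL, 0);                       /* reap the middle process */
    *middle_pid = middle;

    pid_t adopter = -1;
    if (read(fds[0], &adopter, sizeof adopter) != (ssize_t)sizeof adopter)
        adopter = -1;
    close(fds[0]);
    return adopter;
}
</syntaxhighlight>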

The patch [https://github.com/erlang/otp/pull/9453 PR#9453] adapting port_close to SIGTERM is waiting for review and responses look generally positive so far.
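
Whichever signal a port ends up sending, a cautious caller can combine them. As a generic pattern, not a description of what PR#9453 implements, the hypothetical <code>terminate_with_grace</code> sends TERM, gives the child a grace period to exit, and only then escalates to KILL, which cannot be trapped.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Ask a child to stop with SIGTERM; if it is still alive after grace_ms,
   escalate to SIGKILL. Returns the child's wait status, or -1 on error. */
int terminate_with_grace(pid_t pid, int grace_ms)
{
    struct timespec tick = {0, 10 * 1000 * 1000};   /* 10 ms */
    int status;

    kill(pid, SIGTERM);                  /* polite request: may be trapped or ignored */
    for (int waited = 0; waited < grace_ms; waited += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return status;               /* it stopped within the grace period */
        if (r < 0 && errno != EINTR)
            return -1;
        nanosleep(&tick, NULL);
    }
    kill(pid, SIGKILL);                  /* cannot be caught: no cleanup happens */
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return status;
}
</syntaxhighlight>

The grace period gives a program that traps TERM time to clean up, for example by removing partial files, before the hard stop.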

{{Aside|text='''Which signal?'''

Which signal to use is still an open question:

; <code>HUP</code> : the softest "Goodbye!" that a program is free to interpret as it wishes

; <code>TERM</code> : has a clear intention of "kill this thing" but still possible to trap at the target and handle in a customized way

; <code>KILL</code> : bursting with destructive potential, this signal cannot be caught, blocked, or ignored, so the target gets no chance to clean up

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}
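
The practical difference between these comes down to whether the target program can install a handler. This small probe (our own hypothetical helper, not from any library) asks the kernel through <code>sigaction</code>: installing a handler succeeds for HUP and TERM but fails with <code>EINVAL</code> for KILL, which is why a killed program gets no chance to clean up.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* A do-nothing handler: we only care whether it can be installed. */
static void noop_handler(int sig) { (void)sig; }

/* Return 1 if a handler can be installed for sig, 0 if the kernel refuses.
   The previous disposition is restored, so this is only a probe. */
int signal_is_trappable(int sig)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);

    if (sigaction(sig, &sa, &old) != 0)
        return 0;                      /* EINVAL for SIGKILL (and SIGSTOP) */
    sigaction(sig, &old, NULL);        /* put the old disposition back */
    return 1;
}
</syntaxhighlight>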

Discussion threads also included some notable grumbling about the Port API in general; it seems this part of ERTS is overdue for a larger redesign. There's a good opportunity to unify the different platform implementations: Windows lacks the erl_child_setup layer entirely.

== References ==

Elixir/Ports and external process wiring

2025-10-17T09:46:21Z

Adamw: clarify

This deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality rsync library for Elixir which transfers files in the background and can monitor progress. I hoped to learn how to interface better with long-lived external processes, and I got more than I wished for.

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|400x400px]]

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with the hard-coded filenames. It feels unsafe even to demonstrate how string interpolation like <code>#{source}</code> could be misused, so let's skip straight to the next tool, <code>System.cmd</code>, which passes its arguments directly to the program without any shell expansion:<syntaxhighlight lang="elixir">
# Blocks until rsync exits, then returns {output, exit_status}
System.find_executable(rsync_path)
|> System.cmd(~w(-a) ++ [source, target])
</syntaxhighlight>This is safer, but the calling process blocks and gets no feedback until the transfer is complete.

To run an external process asynchronously we reach for Elixir's lowest-level <code>Port.open</code>, which maps directly to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref>. It is tremendously flexible; here we demonstrate turning a few knobs:<syntaxhighlight lang="elixir">
Port.open(
  {:spawn_executable, rsync_path},
  [
    :binary,
    :exit_status,
    :hide,
    :use_stdio,
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)
</syntaxhighlight>

Progress lines come in with a fairly self-explanatory format:
<syntaxhighlight lang="text">
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>

{{Aside|text=
rsync has a variety of progress options; we chose overall progress above, so the percentage means "overall percent complete".

Here is the menu of alternatives:

; <code>--info=progress2</code> : report overall progress

; <code>--progress</code> : report statistics per file

; <code>--itemize-changes</code> : list the operations taken on each file
}}

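Output from rsync arrives at the library's <code>handle_info</code> callback as <code>{port, {:data, data}}</code> messages, and after the transfer is finished we receive a conclusive <code>{port, {:exit_status, status_code}}</code>. Without the <code>{:line, n}</code> option, each data payload is whatever chunk was read from the pipe, which need not correspond to exactly one progress line.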

We extract the percent_done column and strictly reject any other output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>The <code>trim</code> option lets us ignore spacing and newline trickery, or even a leading carriage return, as you can see in the rsync source code:
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>

{{Aside|text=
On the terminal, rsync progress lines are updated in place by emitting a [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code> as you see above. The character seems to be named after pushing the physical paper carriage of a typewriter backwards without feeding a new line. On the terminal this overwrites the current line!

[[w:Newline#Issues with different newline formats|Disagreements about carriage return]] vs. newline have caused eye-rolling since the dawn of personal computing.
}}

One more comment about this carriage return: the "control" character is just a byte in the binary data coming over the pipe from rsync, and it only plays a control function because of how the tty interprets it. This is a recurring theme: data and control are leaky categories. We come to the more formal control side channels later.
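
Because the carriage return is just data, a single chunk read from the pipe can hold several in-place updates back to back. As a C sketch (the helper names are hypothetical), this splits a chunk on <code>\r</code> and <code>\n</code> and keeps only the most recent percentage, rejecting anything that doesn't look like a progress line, in the same strict spirit as the Elixir parser above.

<syntaxhighlight lang="c">
#include <stdio.h>
#include <string.h>

/* Parse one progress segment such as "      3,342,336  33%    3.14MB/s 0:00:02".
   Returns the percentage, or -1 if the segment isn't a progress line. */
static int parse_progress_segment(const char *seg)
{
    int pct = -1, consumed = 0;
    /* %*s skips leading blanks and the byte count; "%d%%" must match e.g. "33%",
       and %n only gets set if the '%' sign really matched. */
    if (sscanf(seg, "%*s %d%%%n", &pct, &consumed) == 1 && consumed > 0
        && pct >= 0 && pct <= 100)
        return pct;
    return -1;
}

/* Return the last percentage found in a chunk that may hold several updates
   separated by '\r' (or '\n'), or -1 if none was found. */
int last_progress_percent(const char *chunk)
{
    char buf[4096];
    size_t len = strlen(chunk);
    int last = -1;

    if (len >= sizeof buf)
        len = sizeof buf - 1;             /* a sketch: very long chunks are truncated */
    memcpy(buf, chunk, len);
    buf[len] = '\0';

    /* Every '\r' or '\n' ends one update; strtok yields the non-empty pieces. */
    for (char *seg = strtok(buf, "\r\n"); seg != NULL; seg = strtok(NULL, "\r\n")) {
        int pct = parse_progress_segment(seg);
        if (pct >= 0)
            last = pct;
    }
    return last;
}
</syntaxhighlight>

As a quick check, a line produced with rsync's own format string, <code>snprintf(line, sizeof line, "\r%15s %3d%% %7.2f%s %s%s", "3,342,336", 33, 3.14, "MB/s", "0:00:02", "")</code>, parses to 33.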

This is where Erlang/OTP really starts to shine: by opening the port inside of a dedicated gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> we have a separate BEAM process communicating with rsync, which receives an asynchronous message like <code>{:data, text_line}</code> for each progress line. It's easy to parse the line, update some internal state and optionally send a progress summary to the code calling the library.

== Problem: runaway processes ==
This would have been the end of the story, but I'm a very flat-footed, iterative developer, and while calling my rsync library from my application under development I would often kill the program abruptly, by crashing or by typing <control>-C in the terminal. Dozens of times. What I found is that the rsync transfers would continue to run in the background even after Elixir had completely shut down.

That would have to change: leaving overlapping file transfers running unmonitored is exactly what I wanted to avoid by having Elixir control the process in the first place. Once the BEAM had stopped, there was no way to reliably identify and kill the sketchy rsyncing.

In fact, killing the lower-level processes when a higher-level supervising process dies is central to the BEAM concept of supervisors<ref>https://www.erlang.org/doc/system/sup_princ.html</ref>, which has earned the virtual machine its reputation for being legendarily robust. Why would some external processes stop and others not? And there seemed to be no way to stop the process, either by sending it a signal or by closing the port.

== Bad assumption: pipe-like processes ==
A straightforward use case for external processes would be to run a standard transformation such as compression or decompression. A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended, because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { /* error: inspect errno */ }
if (n_read == 0) { /* end of file */ }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>https://man.archlinux.org/man/open.2.en#O_NONBLOCK</ref> and unix pipes<ref>https://man.archlinux.org/man/pipe.7.en</ref>.

But here we'll focus on how processes affect each other through pipes. Surprising answer: not very much! Try opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file. Oh no, you killed it! You didn't actually send anything: the <control>-d is interpreted by the terminal driver, which makes the child's pending <code>read</code> return zero bytes, exactly what it would see if its input pipe had been closed. Similarly, <control>-c is not sent as a character but is turned by the terminal into an interrupt signal (SIGINT) for the foreground process, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>https://wizardzines.com/comics/stty/</ref> by Julia Evans. Go ahead, try it: <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions, and "end of file" (EOF) is more a convention than a real thing. Now try opening "watch ls" or "sleep 60" and press <control>-d all you want: no effect. You did signal end of file, but nobody cares, because it wasn't listening anyway.

Back to the problem at hand: as it turns out, "rsync" is in this latter category of programs which see themselves as daemons and continue even when their input is closed. This makes sense, since rsync expects no user input and its output is just a side effect of its main purpose.

BEAM assumes that the connected process behaves like <code>cat</code>, so nothing needs to be done to clean up a dangling external process: it will end itself as soon as the Port is closed or the BEAM exits. If the external process is known not to behave this way, the recommendation is to wrap it in a shell script which converts a closed stdin into a kill signal.<ref>https://hexdocs.pm/elixir/main/Port.html#module-orphan-operating-system-processes</ref>

== BEAM internal and external processes ==
[[W:BEAM (Erlang virtual machine)|BEAM]] applications are built out of supervision trees and excel at managing huge numbers of parallel actor processes, all scheduled internally. The communities mostly share a philosophy of running as much as possible inside the VM, because this builds on its strengths and avoids much interface glue and context switching, but on many occasions an application will still start an external OS process. There are some straightforward ways to simply run a command line, which might be familiar to programmers coming from another language: <code>[https://www.erlang.org/doc/apps/kernel/os.html#cmd/2 os:cmd]</code> takes a string and runs the thing. At a lower level, external programs are managed through a [https://www.erlang.org/doc/system/ports.html Port], a flexible abstraction which lets a backend driver communicate data in and out, and send some control signals such as reporting an external process's exit and exit status.

When it comes to internal processes, BEAM is among the most mature and robust platforms, thanks to good isolation and to its hierarchical [https://www.erlang.org/doc/system/sup_princ supervisors], which liberally prune entire subprocess trees at the first sign of going out of specification. But for external processes, results are mixed. Some programs, for example <code>cat</code>, are twitchy and exit at the first end of file, but others, like the BEAM itself or a long-running server, are built to survive any ordinary I/O glitch or accidental mashing of the keyboard. Furthermore, this is usually a fundamental assumption of the program, and there will be no configuration option to make it behave differently.

== Reliable clean up ==
What I discovered is that the BEAM external process library assumes that its spawned processes will exit once their standard input and output are shut down, the so-called end of file, which is what happens for example when <control>-d is typed into the shell. This works very well for a subprocess like <code>bash</code> but has no effect on a program like <code>sleep</code> or <code>rsync</code>.

The hole created by this mismatch is interestingly solved by something shaped like the BEAM's supervisor itself. I would expect the VM to spawn as many processes as necessary, but I wouldn't expect a child process to outlive the VM just because it happens to be insensitive to end of file. Instead, I was hoping that the VM would try harder to kill these processes when the Port is closed or the VM halts.

In fact, letting a child process outlive the one that spawned it is unusual enough that the condition has a name: an "orphan process". The POSIX standard recommends that when this happens the process should be adopted by the top-level system process "init" if it exists, but this is a "should have" and not a must. It can be undesirable to allow this at all because the orphan process becomes entirely responsible for itself, potentially running forever with no further oversight. Even the system init process tracks its children, and can restart them in response to service commands. Init will know nothing about its adopted, orphan processes.

When I ran into this issue, I found the suggested workaround of writing a [https://hexdocs.pm/elixir/1.18.3/Port.html#module-zombie-operating-system-processes wrapper script] to track its child (the program originally intended to run), listen for the end of file from BEAM, and kill the external program. How much simpler it would be if this workaround were already built into the Erlang Port module!

It's always a pleasure to ask questions in the BEAM communities: they have earned a reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates some best practices that might be backported into the language itself. Everyone speaking on the problem has generally agreed that the fragile clean up of external processes is a bug, and supported the idea that one of the "terminate" signals should be sent to spawned programs.

Which signal to use is still an open issue. There's the softer <code>HUP</code>, which says "Goodbye!" and leaves the program free to interpret it as it will; the mid-level <code>TERM</code>, which I prefer because it makes the intention explicit but can still be blocked or handled gracefully if needed; and <code>KILL</code>, which is bursting with destructive potential. The world of unix signals is a wild and scary place, and there's a refreshing diversity of opinion about it around the Internet.

== Inside the BEAM ==
Despite the BEAM's retro-futuristic appearance as one of the most time-tested yet forward-facing programming environments, digging around inside the VM brought me back to Earth: it's just a C program like any other. There's nothing holy about the BEAM emulator. There are some good and some great ideas about functional languages, and they're buried in a mass of ancient procedural ifdefs, with unnerving memory management and typedefs wrapping the size of an integer on various platforms, just like you might find in other relics from the dark ages of computing, next to the Firefox or Linux kernel source code.

Tantalizingly, message-passing is at the core of the VM, but it is not a first-class concept when reaching out to external processes. There's some fancy footwork with [[W:Anonymous pipe|pipes]] and [[W:Dup (system call)|dup]], but communication is done with enums, unions, and bit-rattling stdlib. I love it, but... it might be something to look at on another rainy day.

Elixir/Ports and external process wiring

2025-10-17T09:33:26Z

Adamw: c/e, image, formatting and arrangement

This deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.

== Context: controlling "rsync" ==
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

My exploration begins while writing a beta-quality rsync library for Elixir which transfers files in the background and can monitor progress. I hoped to learn how to interface better with long-lived external processes, and I got more than I wished for.

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|400x400px]]

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with filename escaping, so at a minimum we should use <code>System.cmd</code>, which passes its arguments directly to the program without a shell:<syntaxhighlight lang="elixir">
# Blocks until rsync exits, then returns {output, exit_status}
System.find_executable(rsync_path)
|> System.cmd(~w(-a) ++ [source, target])
</syntaxhighlight>However, this call blocks until the transfer is finished, and we get no feedback until completion.

Elixir's low-level <code>Port.open</code> maps directly to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref>, which provides a lot of flexibility. Here we turn some of its knobs:<syntaxhighlight lang="elixir">
Port.open(
  {:spawn_executable, rsync_path},
  [
    :binary,
    :exit_status,
    :hide,
    :use_stdio,
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)
</syntaxhighlight>

Progress lines have a fairly self-explanatory format:
<syntaxhighlight lang="text">
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>

{{Aside|text=
rsync has a variety of progress options; we chose overall progress above, so the percentage means "overall percent complete".

Here is the menu:

; <code>--info=progress2</code> : report overall progress

; <code>--progress</code> : report statistics per file

; <code>--itemize-changes</code> : list the operations taken on each file
}}

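Output from rsync arrives at the library callback <code>handle_info</code> as <code>{port, {:data, data}}</code> messages, and after the transfer is finished it receives a conclusive <code>{port, {:exit_status, status_code}}</code>. Without the <code>{:line, n}</code> option, each data payload is whatever chunk was read from the pipe, which need not correspond to exactly one progress line.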

Here we extract the percent_done column and strictly reject any other output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>The <code>trim</code> option lets us ignore spacing and newline trickery, or the leading carriage return you can see in this line from rsync's source:
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>

{{Aside|text=
On the terminal, rsync progress lines are updated in-place by emitting the fun [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code> as you see above. The character seems to be named after pushing the physical paper carriage of a typewriter backwards without feeding a new line. On the terminal this overwrites the current line!

[[w:Newline#Issues with different newline formats|Disagreements about carriage return]] vs. newline have caused eye-rolling since the dawn of personal computing.
}}

One more comment about this carriage return: it's a byte in the binary data coming over the pipe from rsync, but it plays a "control" function because of how it will be interpreted by the tty. A repeated theme is that data and control are leaky categories.

This is where Erlang/OTP really starts to shine: by opening the port inside of a dedicated gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> we have a separate BEAM process communicating with rsync, which receives an asynchronous message like <code>{:data, text_line}</code> for each progress line. It's easy to parse the line, update some internal state and optionally send a progress summary to the code calling the library.

== Problem: runaway processes ==
This would have been the end of the story, but I'm a very flat-footed, iterative developer, and while calling my rsync library from my application under development I would often kill the program abruptly, by crashing or by typing <control>-C in the terminal. Dozens of times. What I found is that the rsync transfers would continue to run in the background even after Elixir had completely shut down.

That would have to change: leaving overlapping file transfers running unmonitored is exactly what I wanted to avoid by having Elixir control the process in the first place. Once the BEAM had stopped, there was no way to reliably identify and kill the sketchy rsyncing.

In fact, killing the lower-level processes when a higher-level supervising process dies is central to the BEAM concept of supervisors<ref>https://www.erlang.org/doc/system/sup_princ.html</ref>, which has earned the virtual machine its reputation for being legendarily robust. Why would some external processes stop and others not? And there seemed to be no way to stop the process, either by sending it a signal or by closing the port.

== Bad assumption: pipe-like processes ==
A straightforward use case for external processes would be to run a standard transformation such as compression or decompression. A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended, because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { /* error: inspect errno */ }
if (n_read == 0) { /* end of file */ }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>https://man.archlinux.org/man/open.2.en#O_NONBLOCK</ref> and unix pipes<ref>https://man.archlinux.org/man/pipe.7.en</ref>.

But here we'll focus on how processes affect each other through pipes. Surprising answer: not very much! Try opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file. Oh no, you killed it! You didn't actually send anything: the <control>-d is interpreted by the terminal driver, which makes the child's pending <code>read</code> return zero bytes, exactly what it would see if its input pipe had been closed. Similarly, <control>-c is not sent as a character but is turned by the terminal into an interrupt signal (SIGINT) for the foreground process, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>https://wizardzines.com/comics/stty/</ref> by Julia Evans. Go ahead, try it: <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions, and "end of file" (EOF) is more a convention than a real thing. Now try opening "watch ls" or "sleep 60" and press <control>-d all you want: no effect. You did signal end of file, but nobody cares, because it wasn't listening anyway.

Back to the problem at hand: as it turns out, "rsync" is in this latter category of programs which see themselves as daemons and continue even when their input is closed. This makes sense, since rsync expects no user input and its output is just a side effect of its main purpose.

BEAM assumes that the connected process behaves like <code>cat</code>, so nothing needs to be done to clean up a dangling external process: it will end itself as soon as the Port is closed or the BEAM exits. If the external process is known not to behave this way, the recommendation is to wrap it in a shell script which converts a closed stdin into a kill signal.<ref>https://hexdocs.pm/elixir/main/Port.html#module-orphan-operating-system-processes</ref>

== BEAM internal and external processes ==
[[W:BEAM (Erlang virtual machine)|BEAM]] applications are built out of supervision trees and excel at managing huge numbers of parallel actor processes, all scheduled internally. The communities mostly share a philosophy of running as much as possible inside the VM, because this builds on its strengths and avoids much interface glue and context switching, but on many occasions an application will still start an external OS process. There are some straightforward ways to simply run a command line, which might be familiar to programmers coming from another language: <code>[https://www.erlang.org/doc/apps/kernel/os.html#cmd/2 os:cmd]</code> takes a string and runs the thing. At a lower level, external programs are managed through a [https://www.erlang.org/doc/system/ports.html Port], a flexible abstraction which lets a backend driver communicate data in and out, and send some control signals such as reporting an external process's exit and exit status.

When it comes to internal processes, BEAM is among the most mature and robust platforms, thanks to good isolation and to its hierarchical [https://www.erlang.org/doc/system/sup_princ supervisors], which liberally prune entire subprocess trees at the first sign of going out of specification. But for external processes, results are mixed. Some programs, for example <code>cat</code>, are twitchy and exit at the first end of file, but others, like the BEAM itself or a long-running server, are built to survive any ordinary I/O glitch or accidental mashing of the keyboard. Furthermore, this is usually a fundamental assumption of the program, and there will be no configuration option to make it behave differently.

== Reliable clean up ==
What I discovered is that the BEAM external process library assumes that its spawned processes will exit once their standard input and output are shut down, the so-called end of file, which is what happens for example when <control>-d is typed into the shell. This works very well for a subprocess like <code>bash</code> but has no effect on a program like <code>sleep</code> or <code>rsync</code>.

The hole created by this mismatch is interestingly solved by something shaped like the BEAM's supervisor itself. I would expect the VM to spawn as many processes as necessary, but I wouldn't expect a child process to outlive the VM just because it happens to be insensitive to end of file. Instead, I was hoping that the VM would try harder to kill these processes when the Port is closed or the VM halts.

In fact, letting a child process outlive the one that spawned it is unusual enough that the condition has a name: an "orphan process". The POSIX standard recommends that when this happens the process should be adopted by the top-level system process "init" if it exists, but this is a "should have" and not a must. It can be undesirable to allow this at all because the orphan process becomes entirely responsible for itself, potentially running forever with no further oversight. Even the system init process tracks its children, and can restart them in response to service commands. Init will know nothing about its adopted, orphan processes.

When I ran into this issue, I found the suggested workaround of writing a [https://hexdocs.pm/elixir/1.18.3/Port.html#module-zombie-operating-system-processes wrapper script] to track its child (the program originally intended to run), listen for the end of file from BEAM, and kill the external program. How much simpler it would be if this workaround were already built into the Erlang Port module!

It's always a pleasure to ask questions in the BEAM communities: they have earned a reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates some best practices that might be backported into the language itself. Everyone speaking on the problem has generally agreed that the fragile clean up of external processes is a bug, and supported the idea that one of the "terminate" signals should be sent to spawned programs.

Which signal to use is still an open issue. There's the softer <code>HUP</code>, which says "Goodbye!" and leaves the program free to interpret it as it will; the mid-level <code>TERM</code>, which I prefer because it makes the intention explicit but can still be blocked or handled gracefully if needed; and <code>KILL</code>, which is bursting with destructive potential. The world of unix signals is a wild and scary place, and there's a refreshing diversity of opinion about it around the Internet.

== Inside the BEAM ==
The BEAM has a retro-futuristic aura as one of the most time-tested yet forward-facing programming environments, but digging around inside the VM brought me back to Earth: it's just a C program like any other. There's nothing holy about the BEAM emulator. Some good and some great ideas about functional languages are buried in a mass of ancient procedural ifdefs, unnerving memory management, and typedefs wrapping the size of an integer on various platforms, just like you might find in other relics from the dark ages of computing, alongside the Firefox or Linux kernel source code.

Tantalizingly, message passing is at the core of the VM, but it is not a first-class concept when reaching out to external processes. There's some fancy footwork with [[W:Anonymous pipe|pipes]] and [[W:Dup (system call)|dup]], but communication is done with enums, unions, and bit-rattling stdlib calls. I love it, but... it might be something to look at on another rainy day.
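One piece of that pipe and dup footwork has a visible consequence: a reader sees end of file only after every copy of the write end has been closed, including copies made with <code>dup</code> or inherited across <code>fork</code>. The hypothetical helper <code>pipe_at_eof</code> below checks this with a non-blocking read; it illustrates the general POSIX rule and is not code from the BEAM.

<syntaxhighlight lang="c">
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a non-blocking read on fd reports
   end of file, 0 if the pipe is merely empty (or had data, in which
   case one byte is consumed), and -1 on any other error. */
int pipe_at_eof(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) return -1;
    char c;
    ssize_t n = read(fd, &c, 1);
    int read_errno = errno;
    fcntl(fd, F_SETFL, flags);  /* restore the original blocking mode */
    if (n == 0) return 1;       /* every write end is closed */
    if (n > 0) return 0;
    return (read_errno == EAGAIN || read_errno == EWOULDBLOCK) ? 0 : -1;
}
</syntaxhighlight>

For example, after <code>dup(fds[1])</code> and <code>close(fds[1])</code>, the read end still reports an empty pipe rather than end of file, because the duplicate keeps the pipe open for writing.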

Template:Project

2025-10-17T08:47:47Z

Adamw: url is optional

<noinclude>
Examples:
* <code><nowiki>{{Project|status=production|url=https://git.invalid/scrape-wiki-html-dump}}</nowiki></code>

{{Project|status=production|url=https://git.invalid/scrape-wiki-html-dump}}

<div style="clear: both;"></div>

* <code><nowiki>{{Project|url=https://git.invalid/scrape-wiki-html-dump}}</nowiki></code>

{{Project|url=https://git.invalid/scrape-wiki-html-dump}}

* <code><nowiki>{{Project|url=https://demo.invalid/|source=https://git.invalid/scrape-wiki-html-dump}}</nowiki></code>

{{Project|url=https://demo.invalid/|source=https://git.invalid/scrape-wiki-html-dump}}

<div style="clear: both;"></div>

<templatedata>
{
"description": "Infobox about the project documented on this page.",
"params": {
"status": {
"type": "string"
},
"url": {
"type": "url"
},
"source": {
"type": "url"
}
}
}
</templatedata>
</noinclude><includeonly>
<div style="float: right; clear: right; display:flex; flex-direction: row">
<div style="margin-top: auto; margin-bottom: auto">[[File:Git format.png|64x64px]]</div>
<div style="margin-top: auto; margin-bottom: auto">
Project link{{#if: {{{status|}}} |  ({{{status|}}}) | }}:
<br>
{{{url|}}}
{{#if: {{{source|}}} | <br>Source code:<br>{{{source}}} }}
</div>
</div>
</includeonly>

Elixir/Ports and external process wiring

2025-10-16T22:27:39Z

Adamw: c/e and move out some asides

This is a short programming adventure which goes into piping and signaling between processes.

== Context: controlling "rsync" ==
This exploration began with writing a library<ref>https://hexdocs.pm/rsync/Rsync.html</ref> to run rsync in order to transfer files in a background thread and monitor progress. I hoped to learn how to interface with long-lived external processes, and I got more than I wished for.

Starting rsync would be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a src target")
</syntaxhighlight>
This has a few shortcomings. Filenames are hard to escape safely, so <code>System.cmd</code>, which passes arguments as a list without going through a shell, should be used instead. The call also blocks until the transfer is done, so we get no feedback until completion. Ending the shell command with an ampersand <code>&</code> is not enough to get around this, so the caller would have to start a new thread manually.
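The escaping problem disappears when arguments travel as a list instead of one string, because no shell ever parses them. In C terms that means <code>execvp</code> with an argument vector instead of <code>system</code>. A minimal sketch, with the hypothetical helper name <code>run_argv</code>:

<syntaxhighlight lang="c">
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with exactly the given arguments,
   without a shell, and returns its exit code, or -1 if it could not be
   waited for or was killed by a signal. */
int run_argv(char *const argv[]) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        execvp(argv[0], argv);  /* searches PATH; nothing is re-parsed */
        _exit(127);             /* conventional "command not found" code */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
</syntaxhighlight>

For example, <code>run_argv((char *[]){"test", "a b; echo pwned", "=", "a b; echo pwned", NULL})</code> compares the two strings literally and returns 0; nothing after the semicolon is executed.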

Elixir's low-level <code>Port</code> call maps directly to Erlang's base <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> and gives much more flexibility:<syntaxhighlight lang="elixir">
Port.open(
{:spawn_executable, rsync_path},
[
:binary,
:exit_status,
:hide,
:use_stdio,
:stderr_to_stdout,
args:
~w(-a --info=progress2) ++
rsync_args ++
sources ++
[args[:target]],
env: env
]
)
</syntaxhighlight>
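Roughly, <code>:use_stdio</code> wires the child's standard streams to pipes owned by the VM, and <code>:stderr_to_stdout</code> sends standard error into the same pipe as standard output. The C sketch below shows only the output half of that plumbing with <code>pipe</code> and <code>dup2</code>; the name <code>spawn_with_output_pipe</code> is hypothetical, and the BEAM's real implementation differs in many details.

<syntaxhighlight lang="c">
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: starts argv[0] with stdout and stderr both
   redirected into a new pipe. Returns the child's PID and stores the
   pipe's read end in *out_fd, or returns -1 on error. */
pid_t spawn_with_output_pipe(char *const argv[], int *out_fd) {
    int fds[2];
    if (pipe(fds) == -1) return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);                /* the child only writes */
        dup2(fds[1], STDOUT_FILENO);  /* roughly what :use_stdio does */
        dup2(fds[1], STDERR_FILENO);  /* roughly what :stderr_to_stdout does */
        if (fds[1] > STDERR_FILENO) close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);
    }
    /* The parent only reads; EOF arrives once the child, and anything
       it spawned, has closed its copies of the write end. */
    close(fds[1]);
    *out_fd = fds[0];
    return pid;
}
</syntaxhighlight>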

{{Aside|text=
If you're here for rsync, it includes a few alternatives for progress reporting:

; <code>--info=progress2</code> : reports overall progress
; <code>--progress</code> : reports statistics per file
; <code>--itemize-changes</code> : lists the operations taken on each file

Progress reporting uses a columnar format:
<syntaxhighlight lang="text">
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>
}}
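The library does its parsing in Elixir; purely as an illustration of the format, here is a C sketch that splits one such line into its four columns. The struct and function names are hypothetical, and the transfer rate and remaining time are kept as text rather than converted.

<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record for one --info=progress2 line, such as
   "3,342,336 33% 3.14MB/s 0:00:02". */
struct progress {
    long long bytes;  /* bytes transferred so far */
    int percent;      /* overall percentage */
    char rate[16];    /* transfer rate, kept as text, e.g. "3.14MB/s" */
    char eta[16];     /* remaining time, kept as text, e.g. "0:00:02" */
};

/* Parses one line; returns 0 on success or -1 if it doesn't match. */
int parse_progress(const char *line, struct progress *p) {
    char with_commas[32];
    if (sscanf(line, " %31[0123456789,] %d%% %15s %15s",
               with_commas, &p->percent, p->rate, p->eta) != 4)
        return -1;

    /* Drop the thousands separators before converting to a number. */
    char digits[32];
    size_t j = 0;
    for (size_t i = 0; with_commas[i] != '\0'; i++)
        if (with_commas[i] != ',') digits[j++] = with_commas[i];
    digits[j] = '\0';
    p->bytes = strtoll(digits, NULL, 10);
    return 0;
}
</syntaxhighlight>

Lines that don't start with the byte count, such as rsync's "sending incremental file list", are rejected with -1.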

{{Aside|text=
On the terminal the progress line is updated in place by restarting the line with the fun [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code>. This is apparently named after the physical paper carriage of a typewriter, and on a terminal it moves the cursor back to the start of the line so the line can be written again! But over a pipe we see it as a regular byte in the stream, like "<code>-old line-^M-new line-</code>". [[w:Newline|Disagreements]] about carriage return vs. newline have caused eye-rolling since the dawn of personal computing, but a look at the rsync source code confirms that it formats this output with a carriage return on every platform: <syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>
}}
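Since a progress "line" may end in either <code>\r</code> or <code>\n</code>, and a single read from a pipe may stop in the middle of one, a reader needs to buffer bytes and split on both. A minimal C sketch of such a splitter, with hypothetical names:

<syntaxhighlight lang="c">
#include <stddef.h>

/* Hypothetical splitter state: bytes of the current line seen so far. */
struct splitter {
    char buf[256];
    size_t len;
};

/* Feeds one byte. Returns 1 when a complete line, terminated by '\r' or
   '\n', is available as a C string in s->buf; it stays valid until the
   next call. Empty records, such as the gap inside "\r\n", are skipped,
   and lines longer than the buffer are truncated. */
int splitter_push(struct splitter *s, char c) {
    if (c == '\r' || c == '\n') {
        if (s->len == 0) return 0;
        s->buf[s->len] = '\0';
        s->len = 0;  /* start the next line from scratch */
        return 1;
    }
    if (s->len < sizeof s->buf - 1) s->buf[s->len++] = c;
    return 0;
}
</syntaxhighlight>

Fed the two chunks <code>"  1,024  10%  1.00MB/s  0:00:09\r  2,0"</code> and <code>"48  20%  1.00MB/s  0:00:08\r"</code> byte by byte, it yields two complete lines, the second one reassembled across the chunk boundary.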

This is where Erlang/OTP really starts to shine: by opening the port inside a dedicated gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> we get a separate BEAM process communicating with rsync. That process receives an asynchronous message like <code>{:data, text_line}</code> for each progress line, and it's easy to parse the line, update some internal state, and optionally send a progress summary to the code calling the library.

== Problem: runaway processes ==
This would have been the end of the story, but I'm a flat-footed, iterative developer, and while calling my rsync library from the application I was developing I would often kill the program abruptly, by crashing it or by typing <control>-C in the terminal. Dozens of times. What I found is that the rsync transfers would continue to run in the background even after Elixir had completely shut down.

That would have to change: leaving overlapping file transfers running unmonitored is exactly what I wanted to avoid by having Elixir control the process in the first place. Once the BEAM had stopped, there was no clean way to identify and kill the sketchy rsyncing.

In fact, killing the lower-level processes when a higher-level supervising process dies is central to the BEAM concept of supervisors,<ref>https://www.erlang.org/doc/system/sup_princ.html</ref> which has earned the virtual machine its reputation for being legendarily robust. Why would some external processes stop and others not? There also seemed to be no way to send the process a signal, and closing the port didn't stop it either.

== Bad assumption: pipe-like processes ==
A straightforward use case for external processes would be to run a standard transformation such as compression or decompression. A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended, because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { /* error... */ }
if (n_read == 0) { /* end of file... */ }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>https://man.archlinux.org/man/open.2.en#O_NONBLOCK</ref> and unix pipes<ref>https://man.archlinux.org/man/pipe.7.en</ref>.
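For reference, a complete version of that loop might look like the following C sketch (the name <code>copy_until_eof</code> is hypothetical, and this is not the coreutils source). By default <code>read</code> on an empty pipe blocks, which is the "stalled" case; it returns 0 only once every writer has closed the pipe; and it returns -1 with <code>errno</code> set to <code>EINTR</code> if a signal interrupted the wait, which is not a real error.

<syntaxhighlight lang="c">
#include <errno.h>
#include <unistd.h>

/* Hypothetical cat-like loop: copies in_fd to out_fd until read()
   reports end of file. Returns the number of bytes copied, or -1. */
long long copy_until_eof(int in_fd, int out_fd) {
    char buf[4096];
    long long total = 0;
    for (;;) {
        ssize_t n_read = read(in_fd, buf, sizeof buf);
        if (n_read == 0) return total;     /* end of file */
        if (n_read < 0) {
            if (errno == EINTR) continue;  /* interrupted, retry */
            return -1;                     /* real error */
        }
        /* write() may accept fewer bytes than asked, so loop. */
        for (ssize_t off = 0; off < n_read; ) {
            ssize_t n = write(out_fd, buf + off, (size_t)(n_read - off));
            if (n < 0) {
                if (errno == EINTR) continue;
                return -1;
            }
            off += n;
        }
        total += n_read;
    }
}
</syntaxhighlight>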

But here we'll focus on how processes affect each other through pipes. Surprising answer: not very much! Try running <code>cat</code> in the terminal and then type <control>-d to "send" an end of file. Oh no, you killed it! You didn't actually send anything: the <control>-d is interpreted by the terminal driver, which makes cat's pending <code>read</code> return 0 bytes, and cat takes that as the end of its input. Similarly, <control>-c is not sent as a character; the terminal turns it into an interrupt signal (<code>SIGINT</code>) for the foreground process, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>https://wizardzines.com/comics/stty/</ref> by Julia Evans. Go ahead, try it: <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions, and "end of file" (EOF) is more a convention than a real thing. Now try running <code>watch ls</code> or <code>sleep 60</code> and press <control>-d all you want: no effect. The end of file is there to be read, but nobody cares because the program wasn't listening anyway.
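The same experiment can be run without a terminal. In this hypothetical C sketch (the name <code>survives_stdin_eof</code> is made up), the child gets a pipe as its standard input and the parent immediately closes the only write end. A program that reads its input, like <code>cat</code>, exits; <code>sleep</code> never reads, so it keeps running until it is killed.

<syntaxhighlight lang="c">
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical experiment: starts argv with stdin connected to a pipe,
   closes the pipe's only write end, waits wait_ms milliseconds, and
   returns 1 if the child is still running or 0 if it has exited
   (-1 if pipe or fork fails). A survivor is killed and reaped. */
int survives_stdin_eof(char *const argv[], int wait_ms) {
    int fds[2];
    if (pipe(fds) == -1) return -1;
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        dup2(fds[0], STDIN_FILENO);  /* stdin is the pipe's read end */
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);
    }
    close(fds[0]);
    close(fds[1]);  /* the child's stdin is now at end of file */

    struct timespec ts = {wait_ms / 1000, (wait_ms % 1000) * 1000000L};
    nanosleep(&ts, NULL);
    int status;
    int alive = waitpid(pid, &status, WNOHANG) == 0;
    if (alive) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
    }
    return alive;
}
</syntaxhighlight>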

Back to the problem at hand: as it turns out, rsync is in this latter category of programs, which behave like daemons and continue even when their input is closed. This makes sense, since rsync expects no user input and its output is just a side effect of its main purpose.

The BEAM assumes that the connected process behaves like a pipe-like program such as <code>cat</code>, so nothing needs to be done to clean up a dangling external process: it will end itself as soon as the Port is closed or the BEAM exits. If the external process is known not to behave this way, the recommendation is to wrap it in a shell script which converts a closed stdin into a kill signal.<ref>https://hexdocs.pm/elixir/main/Port.html#module-orphan-operating-system-processes</ref>

== BEAM internal and external processes ==
[[W:BEAM (Erlang virtual machine)|BEAM]] applications are built out of supervision trees and excel at managing huge numbers of parallel actor processes, all scheduled internally. The community mostly shares a philosophy of running as much as possible inside the VM, because this builds on its strengths and eliminates a lot of interface glue and context switching, but applications still need to start external OS processes on many occasions. There are some straightforward ways to simply run a command line, which might be familiar to programmers coming from another language: <code>[https://www.erlang.org/doc/apps/kernel/os.html#cmd/2 os:cmd]</code> takes a string and runs the thing. At a lower level, external programs are managed through a [https://www.erlang.org/doc/system/ports.html Port], a flexible abstraction that lets a backend driver pass data in and out and report control events such as an external process's exit and exit status.

When it comes to internal processes, the BEAM is among the most mature and robust runtimes, thanks to good isolation and to its hierarchical [https://www.erlang.org/doc/system/sup_princ supervisors] liberally pruning entire subprocess trees at the first sign of going out of specification. But for external processes, results are mixed. Some programs are twitchy and exit easily, for example <code>cat</code>, while others, like the BEAM itself or a long-running server, are built to survive any ordinary I/O glitch or accidental mashing of the keyboard. Furthermore, this is usually a fundamental assumption of the program, and there is no configuration to make it behave differently depending on the stimulus.

== Reliable clean up ==
What I discovered is that the BEAM external process library assumes that its spawned processes will respond to standard input and output shutting down, the so-called end of file, which is what happens when <control>-d is typed into the shell. This works well for a subprocess like <code>bash</code> but has no effect on a program like <code>sleep</code> or <code>rsync</code>.

The hole created by this mismatch could be filled by something shaped like the BEAM's own supervisors. I expect the VM to spawn as many processes as necessary, but I don't expect a child process to outlive the VM just because it happens to be insensitive to end of file. I was hoping that the VM would try harder to kill these processes when the Port is closed or the VM halts.
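On Linux specifically, a child can request this from the kernel with <code>prctl(PR_SET_PDEATHSIG, ...)</code>, which delivers a chosen signal when its parent goes away. This is a Linux-only sketch and is not claimed to be what the BEAM does; the helper name <code>die_with_parent</code> is hypothetical. It must run in the child between <code>fork</code> and <code>exec</code>, and it re-checks the parent PID in case the parent exited first. Per the <code>prctl</code> manual, "parent" here means the thread that called <code>fork</code>, which matters in a multithreaded VM.

<syntaxhighlight lang="c">
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-only, hypothetical helper. Call in the child between fork() and
   exec(): the kernel will send SIGTERM to this process when the thread
   that forked it exits. expected_parent is the parent's PID as seen
   before fork(). Returns 0 on success, or -1 if prctl() failed or the
   parent is already gone, in which case the caller should exit. */
int die_with_parent(pid_t expected_parent) {
    if (prctl(PR_SET_PDEATHSIG, SIGTERM) == -1) return -1;
    /* If the parent exited before prctl() ran, we have already been
       reparented and the signal will never come. */
    if (getppid() != expected_parent) return -1;
    return 0;
}
</syntaxhighlight>

The setting is preserved across <code>execve</code> for ordinary programs, so an exec'd rsync would keep it; the manual notes it is cleared when executing set-user-ID, set-group-ID, or capability-bearing binaries.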

In fact, letting a child process outlive the one that spawned it is unusual enough that the condition has a name: an "orphan process". POSIX requires that an orphan be adopted by an implementation-defined system process, traditionally "init" (on Linux it can also be a designated "subreaper"). It can be undesirable to let this happen at all, because the orphan becomes entirely responsible for itself and may run forever without any further intervention. Even the system init process tracks the children it starts, and can restart them in response to service commands, but it knows nothing about the purpose of the orphans it adopts.

When I ran into this issue, the suggested workaround was a [https://hexdocs.pm/elixir/1.18.3/Port.html#module-zombie-operating-system-processes wrapper script] which starts the program originally intended to run as its own child, listens for end of file from the BEAM, and then kills the external program. How much simpler it would be if this workaround were already built into the Erlang Port module!

It's always a pleasure to ask questions in the BEAM communities, which have earned a reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates some best practices that might be backported into the language itself. Everyone who weighed in on the problem generally agreed that the fragile cleanup of external processes is a bug, and supported the idea of sending one of the "terminate" signals to spawned programs.

Which signal to use is still an open question. There's the softer <code>HUP</code>, which says "Goodbye!" and leaves the program free to interpret it as it will; the mid-level <code>TERM</code>, which I prefer because it makes the intention explicit but can still be blocked or handled gracefully if needed; and <code>KILL</code>, which cannot be caught and is bursting with destructive potential. The world of unix signals is a wild and scary place, and there's a refreshing diversity of opinion about it around the Internet.

== Inside the BEAM ==
The BEAM has a retro-futuristic aura as one of the most time-tested yet forward-facing programming environments, but digging around inside the VM brought me back to Earth: it's just a C program like any other. There's nothing holy about the BEAM emulator. Some good and some great ideas about functional languages are buried in a mass of ancient procedural ifdefs, unnerving memory management, and typedefs wrapping the size of an integer on various platforms, just like you might find in other relics from the dark ages of computing, alongside the Firefox or Linux kernel source code.

Tantalizingly, message passing is at the core of the VM, but it is not a first-class concept when reaching out to external processes. There's some fancy footwork with [[W:Anonymous pipe|pipes]] and [[W:Dup (system call)|dup]], but communication is done with enums, unions, and bit-rattling stdlib calls. I love it, but... it might be something to look at on another rainy day.

Module:Message box/ambox.css

2025-10-16T22:11:53Z

Adamw:

.ambox {
border: 1px solid #a2a9b1;
/* @noflip */
border-left: 10px solid #36c; /* Default "notice" blue */
/*background-color: #fbfbfb;*/
box-sizing: border-box;
}

/* Single border between stacked boxes. Take into account base templatestyles,
* user styles, and Template:Dated maintenance category.
* remove link selector when T200206 is fixed
*/
.ambox + link + .ambox,
.ambox + link + style + .ambox,
.ambox + link + link + .ambox,
/* TODO: raise these as "is this really that necessary???". the change was Dec 2021 */
.ambox + .mw-empty-elt + link + .ambox,
.ambox + .mw-empty-elt + link + style + .ambox,
.ambox + .mw-empty-elt + link + link + .ambox {
margin-top: -1px;
}

/* For the "small=left" option. */
/* must override .ambox + .ambox styles above */
html body.mediawiki .ambox.mbox-small-left {
/* @noflip */
margin: 4px 1em 4px 0;
overflow: hidden;
width: 238px;
border-collapse: collapse;
font-size: 88%;
line-height: 1.25em;
}

.ambox-speedy {
/* @noflip */
border-left: 10px solid #b32424; /* Red */
background-color: #fee7e6; /* Pink */
}

.ambox-delete {
/* @noflip */
border-left: 10px solid #b32424; /* Red */
}

.ambox-content {
/* @noflip */
border-left: 10px solid #f28500; /* Orange */
}

.ambox-style {
/* @noflip */
border-left: 10px solid #fc3; /* Yellow */
}

.ambox-move {
/* @noflip */
border-left: 10px solid #9932cc; /* Purple */
}

.ambox-protection {
/* @noflip */
border-left: 10px solid #a2a9b1; /* Gray-gold */
}

.ambox .mbox-text {
border: none;
/* @noflip */
padding: 0.25em 0.5em;
width: 100%;
}

.ambox .mbox-image {
border: none;
/* @noflip */
padding: 2px 0 2px 0.5em;
text-align: center;
}

.ambox .mbox-imageright {
border: none;
/* @noflip */
padding: 2px 0.5em 2px 0;
text-align: center;
}

/* An empty narrow cell */
.ambox .mbox-empty-cell {
border: none;
padding: 0;
width: 1px;
}

.ambox .mbox-image-div {
width: 52px;
}

@media (min-width: 720px) {
.ambox {
margin: 0 10%; /* 10% = Will not overlap with other elements */
}
}
/*
@media print {
body.ns-0 .ambox {
display: none !important;
}
}*/

Module:Message box/ambox.css

2025-10-16T22:10:56Z

Adamw:

/* {{pp|small=y}} */
.ambox {
border: 1px solid #a2a9b1;
/* @noflip */
border-left: 10px solid #36c; /* Default "notice" blue */
/*background-color: #fbfbfb;*/
box-sizing: border-box;
}

/* Single border between stacked boxes. Take into account base templatestyles,
* user styles, and Template:Dated maintenance category.
* remove link selector when T200206 is fixed
*/
.ambox + link + .ambox,
.ambox + link + style + .ambox,
.ambox + link + link + .ambox,
/* TODO: raise these as "is this really that necessary???". the change was Dec 2021 */
.ambox + .mw-empty-elt + link + .ambox,
.ambox + .mw-empty-elt + link + style + .ambox,
.ambox + .mw-empty-elt + link + link + .ambox {
margin-top: -1px;
}

/* For the "small=left" option. */
/* must override .ambox + .ambox styles above */
html body.mediawiki .ambox.mbox-small-left {
/* @noflip */
margin: 4px 1em 4px 0;
overflow: hidden;
width: 238px;
border-collapse: collapse;
font-size: 88%;
line-height: 1.25em;
}

.ambox-speedy {
/* @noflip */
border-left: 10px solid #b32424; /* Red */
background-color: #fee7e6; /* Pink */
}

.ambox-delete {
/* @noflip */
border-left: 10px solid #b32424; /* Red */
}

.ambox-content {
/* @noflip */
border-left: 10px solid #f28500; /* Orange */
}

.ambox-style {
/* @noflip */
border-left: 10px solid #fc3; /* Yellow */
}

.ambox-move {
/* @noflip */
border-left: 10px solid #9932cc; /* Purple */
}

.ambox-protection {
/* @noflip */
border-left: 10px solid #a2a9b1; /* Gray-gold */
}

.ambox .mbox-text {
border: none;
/* @noflip */
padding: 0.25em 0.5em;
width: 100%;
}

.ambox .mbox-image {
border: none;
/* @noflip */
padding: 2px 0 2px 0.5em;
text-align: center;
}

.ambox .mbox-imageright {
border: none;
/* @noflip */
padding: 2px 0.5em 2px 0;
text-align: center;
}

/* An empty narrow cell */
.ambox .mbox-empty-cell {
border: none;
padding: 0;
width: 1px;
}

.ambox .mbox-image-div {
width: 52px;
}

@media (min-width: 720px) {
.ambox {
margin: 0 10%; /* 10% = Will not overlap with other elements */
}
}
/*
@media print {
body.ns-0 .ambox {
display: none !important;
}
}*/

Template:Aside

2025-10-16T22:06:20Z

Adamw:

<includeonly>
<templatestyles src="Module:Message box/ambox.css"></templatestyles>
<table class="box-reltime plainlinks metadata ambox ambox-notice" role="presentation">
<tr>
<td class="mbox-image">
<div class="mbox-image-div">[[File:Information icon4.svg|40x40px|link=|alt=]]</div>
</td>
<td class="mbox-text">
<div class="mbox-text-span">{{{text|}}}</div>
</td>
</tr>
</table>
</includeonly><noinclude>
Example: <pre>{{Aside|text=Content}}</pre> {{Aside|text=Content}}
</noinclude>

Template:Aside

2025-10-16T22:03:14Z

Adamw: include templatestyles

<includeonly><templatestyles src="Module:Message box/ambox.css"></templatestyles><table class="box-reltime plainlinks metadata ambox ambox-notice" role="presentation"><tr><td class="mbox-image"><div class="mbox-image-div">[[File:Information icon4.svg|40x40px|link=|alt=]]</div></td><td class="mbox-text"><div class="mbox-text-span">{{{text|}}}<span class="hide-when-compact"></span></div></td></tr></table>
</includeonly><noinclude>
Example: <pre>{{Aside|text=Content}}</pre> {{Aside|text=Content}}
</noinclude>

Module:Message box/ambox.css

2025-10-16T22:02:07Z

Adamw: Created page with "/* {{pp|small=y}} */ .ambox { border: 1px solid #a2a9b1; /* @noflip */ border-left: 10px solid #36c; /* Default "notice" blue */ background-color: #fbfbfb; box-sizing: border-box; } /* Single border between stacked boxes. Take into account base templatestyles, * user styles, and Template:Dated maintenance category. * remove link selector when T200206 is fixed */ .ambox + link + .ambox, .ambox + link + style + .ambox, .ambox + link + link + .ambox, /* TODO: rais..."

/* {{pp|small=y}} */
.ambox {
border: 1px solid #a2a9b1;
/* @noflip */
border-left: 10px solid #36c; /* Default "notice" blue */
background-color: #fbfbfb;
box-sizing: border-box;
}

/* Single border between stacked boxes. Take into account base templatestyles,
* user styles, and Template:Dated maintenance category.
* remove link selector when T200206 is fixed
*/
.ambox + link + .ambox,
.ambox + link + style + .ambox,
.ambox + link + link + .ambox,
/* TODO: raise these as "is this really that necessary???". the change was Dec 2021 */
.ambox + .mw-empty-elt + link + .ambox,
.ambox + .mw-empty-elt + link + style + .ambox,
.ambox + .mw-empty-elt + link + link + .ambox {
margin-top: -1px;
}

/* For the "small=left" option. */
/* must override .ambox + .ambox styles above */
html body.mediawiki .ambox.mbox-small-left {
/* @noflip */
margin: 4px 1em 4px 0;
overflow: hidden;
width: 238px;
border-collapse: collapse;
font-size: 88%;
line-height: 1.25em;
}

.ambox-speedy {
/* @noflip */
border-left: 10px solid #b32424; /* Red */
background-color: #fee7e6; /* Pink */
}

.ambox-delete {
/* @noflip */
border-left: 10px solid #b32424; /* Red */
}

.ambox-content {
/* @noflip */
border-left: 10px solid #f28500; /* Orange */
}

.ambox-style {
/* @noflip */
border-left: 10px solid #fc3; /* Yellow */
}

.ambox-move {
/* @noflip */
border-left: 10px solid #9932cc; /* Purple */
}

.ambox-protection {
/* @noflip */
border-left: 10px solid #a2a9b1; /* Gray-gold */
}

.ambox .mbox-text {
border: none;
/* @noflip */
padding: 0.25em 0.5em;
width: 100%;
}

.ambox .mbox-image {
border: none;
/* @noflip */
padding: 2px 0 2px 0.5em;
text-align: center;
}

.ambox .mbox-imageright {
border: none;
/* @noflip */
padding: 2px 0.5em 2px 0;
text-align: center;
}

/* An empty narrow cell */
.ambox .mbox-empty-cell {
border: none;
padding: 0;
width: 1px;
}

.ambox .mbox-image-div {
width: 52px;
}

@media (min-width: 720px) {
.ambox {
margin: 0 10%; /* 10% = Will not overlap with other elements */
}
}
/*
@media print {
body.ns-0 .ambox {
display: none !important;
}
}*/

Template:Aside

2025-10-16T20:45:42Z

Adamw: copy rendered ambox

<includeonly><table class="box-reltime plainlinks metadata ambox ambox-notice" role="presentation"><tr><td class="mbox-image"><div class="mbox-image-div">[[File:Information icon4.svg|40x40px|link=|alt=]]</div></td><td class="mbox-text"><div class="mbox-text-span">{{{text|}}}<span class="hide-when-compact"></span></div></td></tr></table>
</includeonly><noinclude>
Example: <pre>{{Aside|text=Content}}</pre> {{Aside|text=Content}}
</noinclude>

Template:Aside

2025-10-16T20:43:15Z

Adamw: copied from expanded template

<table class="box-reltime plainlinks metadata ambox ambox-notice" role="presentation"><tr><td class="mbox-image"><div class="mbox-image-div">[[File:Information icon4.svg|40x40px|link=|alt=]]</div></td><td class="mbox-text"><div class="mbox-text-span">{{{text|}}}<span class="hide-when-compact"></span></div></td></tr></table>

Draft:Elixir/OS processes

2025-10-16T20:31:31Z

Adamw: Adamw moved page Draft:Elixir/OS processes to Draft:Elixir/Ports and external process wiring

#REDIRECT [[Draft:Elixir/Ports and external process wiring]]

Elixir/Ports and external process wiring

2025-10-16T20:31:31Z

Adamw: Adamw moved page Draft:Elixir/OS processes to Draft:Elixir/Ports and external process wiring

This is a short programming adventure which goes into piping and signaling between processes.

== Context: controlling "rsync" ==
This exploration began when I wrote a simple library to run rsync from an Elixir program<ref>https://hexdocs.pm/rsync/Rsync.html</ref>, to transfer files in a background thread while monitoring progress. I was hoping to learn how to interface with long-lived external processes, and I ended up learning more than I wished for.

Starting rsync and reading from it went very well, mostly thanks to the <code>--info=progress2</code> option which reports progress with a simple columnar format that can be easily parsed:<syntaxhighlight lang="text">
3,342,336 33% 3.14MB/s 0:00:02
</syntaxhighlight>In case you're here to integrate with rsync, there's also a slightly different <code>--progress</code> option which reports statistics per file, and an option <code>--itemize-changes</code> which can be included to get information about the operations taken on each file, but in my case I care more about the overall transfer progress.

On the terminal the progress line is updated in place by restarting the line with the fun [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code>. This is apparently named after the physical paper carriage of a typewriter, and on a terminal it moves the cursor back to the start of the line so the line can be written again! But over a pipe we see it as a regular byte in the stream, like "<code>-old line-^M-new line-</code>". [[w:Newline|Disagreements]] about carriage return vs. newline have caused eye-rolling since the dawn of personal computing, but a look at the rsync source code confirms that it formats this output with a carriage return on every platform: <syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>

My library starts rsync using Elixir's low-level <code>Port</code> call, which maps directly to the base Erlang open_port<ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> implementation:<syntaxhighlight lang="elixir">
Port.open(
{:spawn_executable, rsync_path},
[
:binary,
:exit_status,
:hide,
:use_stdio,
:stderr_to_stdout,
args:
~w(-a --info=progress2) ++
rsync_args ++
sources ++
[args[:target]],
env: env
]
)
</syntaxhighlight>This is where Erlang/OTP really starts to shine: by opening the port inside a dedicated gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> we get a separate BEAM process communicating with rsync. That process receives an asynchronous message like <code>{:data, text_line}</code> for each progress line, and it's easy to parse the line, update some internal state, and optionally send a progress summary to the code calling the library.

== Problem: runaway processes ==
This would have been the end of the story, but I'm a flat-footed, iterative developer, and while calling my rsync library from the application I was developing I would often kill the program abruptly, by crashing it or by typing <control>-C in the terminal. Dozens of times. What I found is that the rsync transfers would continue to run in the background even after Elixir had completely shut down.

That would have to change: leaving overlapping file transfers running unmonitored is exactly what I wanted to avoid by having Elixir control the process in the first place. Once the BEAM had stopped, there was no clean way to identify and kill the sketchy rsyncing.

In fact, killing the lower-level processes when a higher-level supervising process dies is central to the BEAM concept of supervisors,<ref>https://www.erlang.org/doc/system/sup_princ.html</ref> which has earned the virtual machine its reputation for being legendarily robust. Why would some external processes stop and others not? There also seemed to be no way to send the process a signal, and closing the port didn't stop it either.

== Bad assumption: pipe-like processes ==
A straightforward use case for external processes would be to run a standard transformation such as compression or decompression. A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended, because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { /* error... */ }
if (n_read == 0) { /* end of file... */ }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>https://man.archlinux.org/man/open.2.en#O_NONBLOCK</ref> and unix pipes<ref>https://man.archlinux.org/man/pipe.7.en</ref>.
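One practical answer to the "stalled or ended?" question is <code>poll</code> with a timeout: a return of 0 means nothing arrived in time, while a readable or hung-up descriptor means <code>read</code> won't block, and a zero-byte read then confirms the end. A minimal C sketch, with hypothetical names:

<syntaxhighlight lang="c">
#include <errno.h>
#include <poll.h>
#include <unistd.h>

enum stream_state { STREAM_DATA, STREAM_STALLED, STREAM_ENDED, STREAM_ERROR };

/* Hypothetical helper: waits up to timeout_ms for fd to become readable.
   On STREAM_DATA, up to bufsize bytes are stored in buf and the count
   in *n_read. */
enum stream_state wait_for_input(int fd, char *buf, size_t bufsize,
                                 int timeout_ms, ssize_t *n_read) {
    struct pollfd pfd = {.fd = fd, .events = POLLIN};
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready == 0) return STREAM_STALLED;  /* quiet, but still open */
    if (ready < 0) return errno == EINTR ? STREAM_STALLED : STREAM_ERROR;

    /* poll reported an event (POLLIN or POLLHUP), so read won't block. */
    ssize_t n = read(fd, buf, bufsize);
    if (n > 0) {
        *n_read = n;
        return STREAM_DATA;
    }
    if (n == 0) return STREAM_ENDED;  /* every writer has closed the pipe */
    return STREAM_ERROR;
}
</syntaxhighlight>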

But here we'll focus on how processes affect each other through pipes. Surprising answer: not very much! Try running <code>cat</code> in the terminal and then type <control>-d to "send" an end of file. Oh no, you killed it! You didn't actually send anything: the <control>-d is interpreted by the terminal driver, which makes cat's pending <code>read</code> return 0 bytes, and cat takes that as the end of its input. Similarly, <control>-c is not sent as a character; the terminal turns it into an interrupt signal (<code>SIGINT</code>) for the foreground process, completely independently of the data stream. My entry point to learning more is this stty webzine<ref>https://wizardzines.com/comics/stty/</ref> by Julia Evans. Go ahead, try it: <code>stty -a</code>

Any special behavior at the other end of a pipe is the result of intentional programming decisions, and "end of file" (EOF) is more a convention than a real thing. Now try running <code>watch ls</code> or <code>sleep 60</code> and press <control>-d all you want: no effect. The end of file is there to be read, but nobody cares because the program wasn't listening anyway.

Back to the problem at hand: as it turns out, rsync is in this latter category of programs, which behave like daemons and continue even when their input is closed. This makes sense, since rsync expects no user input and its output is just a side effect of its main purpose.

The BEAM assumes that the connected process behaves like a pipe-like program such as <code>cat</code>, so nothing needs to be done to clean up a dangling external process: it will end itself as soon as the Port is closed or the BEAM exits. If the external process is known not to behave this way, the recommendation is to wrap it in a shell script which converts a closed stdin into a kill signal.<ref>https://hexdocs.pm/elixir/main/Port.html#module-orphan-operating-system-processes</ref>

== BEAM internal and external processes ==
[[W:BEAM (Erlang virtual machine)|BEAM]] applications are built out of supervision trees and excel at managing huge numbers of parallel actor processes, all scheduled internally. The community mostly shares a philosophy of running as much as possible inside the VM, because this builds on its strengths and eliminates a lot of interface glue and context switching, but applications still need to start external OS processes on many occasions. There are some straightforward ways to simply run a command line, which might be familiar to programmers coming from another language: <code>[https://www.erlang.org/doc/apps/kernel/os.html#cmd/2 os:cmd]</code> takes a string and runs the thing. At a lower level, external programs are managed through a [https://www.erlang.org/doc/system/ports.html Port], a flexible abstraction that lets a backend driver pass data in and out and report control events such as an external process's exit and exit status.

When it comes to internal processes, BEAM is among the most mature and robust runtimes, thanks to good isolation and to its hierarchical [https://www.erlang.org/doc/system/sup_princ supervisors], which liberally prune entire subprocess trees at the first sign of going out of specification. But for external processes, results are mixed. Some programs are twitchy and quit easily, for example <code>cat</code>, but others, like the BEAM itself or a long-running server, are built to survive any ordinary I/O glitch or accidental mashing of the keyboard. Furthermore, this is usually a fundamental assumption of the program, and there will be no configuration to make it behave differently.

== Reliable clean up ==
What I discovered is that BEAM's external process handling assumes its spawned processes will exit when standard input and output shut down, the so-called end of file, which is what happens when <control>-d is typed into a shell. This works very well for a subprocess like <code>bash</code> but has no effect on a program like <code>sleep</code> or <code>rsync</code>.

The hole created by this mismatch would be neatly filled by something shaped like BEAM's own supervisors. I would expect the VM to spawn as many processes as necessary, but I wouldn't expect a child process to outlive the VM just because it happens to be insensitive to end of file. Instead, I was hoping that the VM would try harder to kill these processes when the Port is closed or the VM halts.

In fact, a child process outliving the one that spawned it is unusual enough that the condition has a name: an "orphan process". POSIX specifies that the orphan is inherited by an implementation-defined system process, traditionally the top-level "init" process. Allowing this to happen at all can be undesirable, because the orphan becomes entirely responsible for itself and may keep running indefinitely with no further supervision. The system init process does track the services it starts, and can restart them in response to service commands, but it knows nothing about the orphans it adopts beyond reaping them when they exit.
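Orphaning is easy to observe. This Linux-only sketch (it reads <code>/proc</code>) has a shell start <code>sleep</code> in the background and exit immediately, then looks up the orphan's current parent. Which process adopts it depends on the system, so the sketch only checks that the parent is no longer the shell.<syntaxhighlight lang="elixir">
# Linux-only sketch: make an orphan on purpose and see who adopts it.
# The shell backgrounds `sleep`, prints its pid and its own pid, then exits.
# sleep's output is redirected so System.cmd does not wait for it.
{out, 0} = System.cmd("sh", ["-c", ~S(sleep 60 >/dev/null 2>&1 & echo "$! $$")])
[orphan_pid, shell_pid] = out |> String.split() |> Enum.map(&String.to_integer/1)

# /proc/<pid>/status has a "PPid:" line naming the current parent.
status = File.read!("/proc/#{orphan_pid}/status")
[_, ppid] = Regex.run(~r/^PPid:\s+(\d+)/m, status)
new_parent = String.to_integer(ppid)

# The shell is gone, so sleep now belongs to init (pid 1) or, on some systems,
# to a "subreaper" such as a per-user service manager.
true = new_parent != shell_pid
IO.puts("orphan #{orphan_pid}: parent was #{shell_pid}, now #{new_parent}")

# Nothing else will stop it, so do it by hand.
System.cmd("sh", ["-c", "kill -TERM #{orphan_pid}"])
</syntaxhighlight>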

When I ran into this issue, I found the suggested workaround: a [https://hexdocs.pm/elixir/1.18.3/Port.html#module-zombie-operating-system-processes wrapper script] which tracks its child (the program originally intended to run), listens for end of file from BEAM, and then kills the external program. How much simpler it would be if this workaround were already built into the Erlang Port module!
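A sketch of that kind of wrapper, under stated assumptions: it is inspired by the idea in the Port documentation rather than copied from it, it assumes a POSIX <code>sh</code>, and it sends <code>TERM</code> rather than <code>KILL</code>. The script keeps a copy of stdin on file descriptor 3, because background jobs in a non-interactive shell get <code>/dev/null</code> as their stdin, then waits for the real program while a background loop watches for end of file. <code>Wrapped</code> is a hypothetical module name.<syntaxhighlight lang="elixir">
# Sketch of a stdin-watching wrapper; assumes a POSIX `sh`.
defmodule Wrapped do
  @wrapper ~S"""
  exec 3<&0                            # keep BEAM's pipe; background jobs get /dev/null
  "$@" &                               # the real program, e.g. rsync
  child=$!
  (
    while read -r line; do :; done <&3 # the loop ends at end of file
    kill -TERM "$child" 2>/dev/null
  ) &
  watcher=$!
  wait "$child"
  status=$?
  kill "$watcher" 2>/dev/null          # the program finished on its own
  exit "$status"
  """

  def open(executable, args) do
    sh = System.find_executable("sh")
    path = System.find_executable(executable) || raise "#{executable} not found"
    # With `sh -c script name args...`, $0 is "wrapper" and "$@" is the command.
    Port.open(
      {:spawn_executable, sh},
      [:binary, :exit_status, args: ["-c", @wrapper, "wrapper", path | args]]
    )
  end

  # Collect output until the wrapper reports the program's exit status.
  def await(port, acc \\ []) do
    receive do
      {^port, {:data, chunk}} -> await(port, [acc, chunk])
      {^port, {:exit_status, status}} -> {IO.iodata_to_binary(acc), status}
    end
  end
end

# Normal completion: output and exit status pass straight through.
{"hi\n", 0} = Wrapped.open("echo", ["hi"]) |> Wrapped.await()

# Closing the Port now stops the program too, instead of orphaning it.
port = Wrapped.open("sleep", ["60"])
{:os_pid, wrapper_pid} = Port.info(port, :os_pid)
Port.close(port)
Process.sleep(500)
{_, alive_status} = System.cmd("sh", ["-c", "kill -0 #{wrapper_pid}"], stderr_to_stdout: true)
true = alive_status != 0
</syntaxhighlight>
The design choice here is that the wrapper, not BEAM, owns the program: BEAM only has to close a pipe, which happens on <code>Port.close/1</code> and also whenever the VM exits, since the operating system closes a dead process's file descriptors.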

It's always a pleasure to ask questions in the BEAM communities, which have earned a reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates some best practices that might be backported into the language itself. Everyone who weighed in generally agreed that the fragile clean up of external processes is a bug, and supported the idea of sending one of the "terminate" signals to spawned programs.

Which signal to use is still an open question. There's the softer <code>HUP</code>, which says "Goodbye!" and leaves the program free to interpret it as it will; the mid-level <code>TERM</code>, which I prefer because it makes the intention explicit but can still be caught and handled gracefully if needed; and <code>KILL</code>, which is bursting with destructive potential. The world of Unix signals is a wild and scary place, about which there's a refreshing diversity of opinion around the Internet.
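One common compromise between these options is to escalate: send <code>TERM</code>, give the program a grace period, and only then send <code>KILL</code>. The sketch below does that by shelling out to <code>kill</code>; the <code>Terminate</code> module, its polling interval, and the default one-second grace period are hypothetical choices, and a <code>trap '' TERM</code> stands in for a program that ignores the polite request.<syntaxhighlight lang="elixir">
# Sketch of a TERM-then-KILL escalation; not part of Elixir or Erlang.
defmodule Terminate do
  @poll_ms 50

  # Ask politely with TERM, wait up to grace_ms, then insist with KILL.
  def terminate(os_pid, grace_ms \\ 1_000) do
    signal(os_pid, "TERM")

    if gone_within?(os_pid, grace_ms) do
      :terminated
    else
      signal(os_pid, "KILL")
      :killed
    end
  end

  defp gone_within?(os_pid, remaining) do
    cond do
      not alive?(os_pid) -> true
      remaining <= 0 -> false
      true ->
        Process.sleep(@poll_ms)
        gone_within?(os_pid, remaining - @poll_ms)
    end
  end

  # `kill -0` checks that the pid exists without sending a signal.
  defp alive?(os_pid), do: sh("kill -0 #{os_pid}") == 0
  defp signal(os_pid, name), do: sh("kill -#{name} #{os_pid}")

  defp sh(command) do
    {_output, status} = System.cmd("sh", ["-c", command], stderr_to_stdout: true)
    status
  end
end

# Start `sleep` under a shell that first runs `setup`, and wait until it is ready.
start = fn setup ->
  sh = System.find_executable("sh")
  script = "#{setup}; echo ready; exec sleep 60"
  port = Port.open({:spawn_executable, sh}, [:binary, args: ["-c", script]])

  receive do
    {^port, {:data, "ready\n"}} -> :ok
  after
    5_000 -> raise "child did not start"
  end

  {:os_pid, os_pid} = Port.info(port, :os_pid)
  os_pid
end

# A plain sleep honors TERM.
:terminated = Terminate.terminate(start.(":"))
# An ignored TERM survives exec, so this sleep only stops for KILL.
:killed = Terminate.terminate(start.("trap '' TERM"), 300)
</syntaxhighlight>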

== Inside the BEAM ==
Despite the BEAM's retro-futuristic appearance as one of the most time-tested yet forward-facing programming environments, digging around inside the VM brought me back to Earth: it's just a C program like any other. There's nothing holy about the BEAM emulator. Its good and sometimes great ideas about functional languages are buried in a mass of ancient procedural ifdefs, with unnerving memory management and typedefs wrapping the size of an integer on various platforms, just like you might find in other relics from the dark ages of computing such as the Firefox or Linux kernel source code.

Tantalizingly, message-passing is at the core of the VM, but it is not a first-class concept when reaching out to external processes. There's some fancy footwork with [[W:Anonymous pipe|pipes]] and [[W:Dup (system call)|dup]], but communication is done with enums, unions, and bit-rattling stdlib calls. I love it, but... it might be something to look at on another rainy day.

Elixir/Ports and external process wiring

2025-10-16T17:12:13Z

Adamw: lots of background detail

Elixir/Ports and external process wiring

2025-10-16T07:00:03Z

Adamw: increase all heading levels

== Challenge: controlling "rsync" ==
This exploration began as I wrote a simple library to run rsync from Elixir.<ref>https://hexdocs.pm/rsync/Rsync.html</ref> I was hoping to learn how to interface with long-lived external processes, in this case to transfer files and monitor progress. Starting and reading from rsync went very well, thanks to the <code>--info=progress2</code> option which reports progress in a fairly machine-readable format. I was able to start the file transfer, capture status, and report it back to the Elixir caller in various ways.
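As a sketch of what "fairly machine-readable" means in practice, the parser below pulls the byte count, percentage, rate, and time out of one status line. The assumed layout and the sample line are illustrative rather than captured from a real transfer, and <code>Progress</code> is a hypothetical module, not the library's API.<syntaxhighlight lang="elixir">
# Sketch of parsing `--info=progress2` output. Assumed layout: bytes so far,
# percent, rate, time, then an optional "(xfr#N, to-chk=M/T)".
defmodule Progress do
  # rsync redraws its status line with carriage returns, so split on \r too.
  # A real reader must also buffer partial lines split across Port messages.
  def lines(chunk), do: String.split(chunk, ["\r", "\n"], trim: true)

  def parse(line) do
    case Regex.run(~r/^\s*([\d,.]+)\s+(\d+)%\s+(\S+)\s+(\d+:\d{2}:\d{2})/, line) do
      [_, bytes, percent, rate, time] ->
        {:ok,
         %{
           # Strip digit-grouping separators (assumed to be "," or ".").
           bytes: bytes |> String.replace([",", "."], "") |> String.to_integer(),
           percent: String.to_integer(percent),
           rate: rate,
           time: time
         }}

      nil ->
        :error
    end
  end
end

sample = "      1,234,567  45%   12.34MB/s    0:00:05 (xfr#3, to-chk=10/20)\r"
[line] = Progress.lines(sample)
{:ok, %{bytes: 1_234_567, percent: 45}} = Progress.parse(line)
</syntaxhighlight>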

My library starts rsync using a low-level <code>Port</code> call, which maps directly to the base Erlang open_port<ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> implementation:<syntaxhighlight lang="elixir">
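Port.open(
  # Run the rsync binary directly (no shell), passing arguments as a list
  {:spawn_executable, rsync_path},
  [
    # Receive output as binaries rather than charlists
    :binary,
    # Get a {port, {:exit_status, status}} message when rsync exits
    :exit_status,
    # Windows only: don't open a console window for the child
    :hide,
    # Talk to rsync over its stdin and stdout (this is the default)
    :use_stdio,
    # Merge rsync's stderr into the same stream as stdout
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)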
</syntaxhighlight>

== Problem: runaway processes ==
Since I was calling my rsync library from an application under development, I would often kill the program abruptly by crashing or by typing <control>-C in the terminal. What I found is that the rsync transfer would continue to run in the background even after Elixir had completely shut down.
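The crash case can be reproduced without rsync. A Port is linked to the process that opened it and closes when that process dies, but closing only affects the pipe, not the OS process behind it. In this sketch a throwaway process opens a Port to <code>sleep</code> and then exits abnormally; it assumes <code>sh</code> and <code>sleep</code> are on the PATH.<syntaxhighlight lang="elixir">
# Sketch: the Port goes down with its owner, but the OS process does not.
sleep = System.find_executable("sleep")
parent = self()

owner =
  spawn(fn ->
    port = Port.open({:spawn_executable, sleep}, [:binary, args: ["60"]])
    send(parent, {:opened, port, Port.info(port, :os_pid)})
    # Stand-in for an application that crashes mid-transfer.
    receive do
      :crash -> exit(:boom)
    end
  end)

{port, os_pid} =
  receive do
    {:opened, port, {:os_pid, os_pid}} -> {port, os_pid}
  end

ref = Process.monitor(owner)
send(owner, :crash)

receive do
  {:DOWN, ^ref, :process, ^owner, :boom} -> :ok
end

Process.sleep(100)

# The Port died with its owner...
nil = Port.info(port)
# ...but `kill -0` still finds the OS process.
{_, 0} = System.cmd("sh", ["-c", "kill -0 #{os_pid}"])
System.cmd("sh", ["-c", "kill -TERM #{os_pid}"])
</syntaxhighlight>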

That would have to change—leaving overlapping file transfers running unmonitored is exactly what I wanted to avoid by having Elixir control the process in the first place.

== Bad assumption: pipe-like processes ==
A common use for external processes is something like compression and decompression. A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended, using a C system call like this:<syntaxhighlight lang="c">
ssize_t n_read = read(input_desc, buf, bufsize);
if (n_read < 0) { /* error, e.g. EBADF if input_desc is not open for reading */ }
if (n_read == 0) { /* end of file: every writer has closed its end */ }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that a return value of 0 indicates end of file, while -1 indicates an error (with <code>errno</code> set), such as <code>EBADF</code> when the input file descriptor has already been closed.

BEAM assumes the connected process behaves like this, so nothing needs to be done to clean up a dangling external process because it will end itself as soon as the Port is closed or the BEAM exits. If the external process is known to not behave this way, the recommendation is to wrap it in a shell script which converts a closed stdin into a kill signal.<ref>https://hexdocs.pm/elixir/main/Port.html#module-orphan-operating-system-processes</ref>

== BEAM internal and external processes ==
[[W:BEAM (Erlang virtual machine)|BEAM]] applications are built out of supervision trees and excel at managing huge numbers of parallel actor processes, all scheduled internally. The communities mostly share a philosophy of running as much as possible inside the VM, since this builds on that strength and simplifies away much interface glue and context switching, but on many occasions an application will still need to start an external OS process. There are some straightforward ways to simply run a command line, which might be familiar to programmers coming from another language: <code>[https://www.erlang.org/doc/apps/kernel/os.html#cmd/2 os:cmd]</code> takes a string and runs it through the system shell. At a lower level, external programs are managed through a [https://www.erlang.org/doc/system/ports.html Port], a flexible abstraction that lets a backend driver pass data in and out and relay some control signals, such as an external process's exit and exit status.

When it comes to internal processes, BEAM is among the most mature and robust runtimes, thanks to good isolation and to its hierarchical [https://www.erlang.org/doc/system/sup_princ supervisors], which liberally prune entire subprocess trees at the first sign of going out of specification. But for external processes, results are mixed. Some programs are twitchy and quit easily, for example <code>cat</code>, but others, like the BEAM itself or a long-running server, are built to survive any ordinary I/O glitch or accidental mashing of the keyboard. Furthermore, this is usually a fundamental assumption of the program, and there will be no configuration to make it behave differently.

== Reliable clean up ==
What I discovered is that BEAM's external process handling assumes its spawned processes will exit when standard input and output shut down, the so-called end of file, which is what happens when <control>-d is typed into a shell. This works very well for a subprocess like <code>bash</code> but has no effect on a program like <code>sleep</code> or <code>rsync</code>.

The hole created by this mismatch would be neatly filled by something shaped like BEAM's own supervisors. I would expect the VM to spawn as many processes as necessary, but I wouldn't expect a child process to outlive the VM just because it happens to be insensitive to end of file. Instead, I was hoping that the VM would try harder to kill these processes when the Port is closed or the VM halts.

In fact, a child process outliving the one that spawned it is unusual enough that the condition has a name: an "orphan process". POSIX specifies that the orphan is inherited by an implementation-defined system process, traditionally the top-level "init" process. Allowing this to happen at all can be undesirable, because the orphan becomes entirely responsible for itself and may keep running indefinitely with no further supervision. The system init process does track the services it starts, and can restart them in response to service commands, but it knows nothing about the orphans it adopts beyond reaping them when they exit.

When I ran into this issue, I found the suggested workaround: a [https://hexdocs.pm/elixir/1.18.3/Port.html#module-zombie-operating-system-processes wrapper script] which tracks its child (the program originally intended to run), listens for end of file from BEAM, and then kills the external program. How much simpler it would be if this workaround were already built into the Erlang Port module!

It's always a pleasure to ask questions in the BEAM communities, which have earned a reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates some best practices that might be backported into the language itself. Everyone who weighed in generally agreed that the fragile clean up of external processes is a bug, and supported the idea of sending one of the "terminate" signals to spawned programs.

Which signal to use is still an open question. There's the softer <code>HUP</code>, which says "Goodbye!" and leaves the program free to interpret it as it will; the mid-level <code>TERM</code>, which I prefer because it makes the intention explicit but can still be caught and handled gracefully if needed; and <code>KILL</code>, which is bursting with destructive potential. The world of Unix signals is a wild and scary place, about which there's a refreshing diversity of opinion around the Internet.

== Inside the BEAM ==
Despite the BEAM's retro-futuristic appearance as one of the most time-tested yet forward-facing programming environments, digging around inside the VM brought me back to Earth: it's just a C program like any other. There's nothing holy about the BEAM emulator. Its good and sometimes great ideas about functional languages are buried in a mass of ancient procedural ifdefs, with unnerving memory management and typedefs wrapping the size of an integer on various platforms, just like you might find in other relics from the dark ages of computing such as the Firefox or Linux kernel source code.

Tantalizingly, message-passing is at the core of the VM, but it is not a first-class concept when reaching out to external processes. There's some fancy footwork with [[W:Anonymous pipe|pipes]] and [[W:Dup (system call)|dup]], but communication is done with enums, unions, and bit-rattling stdlib calls. I love it, but... it might be something to look at on another rainy day.

Elixir/Ports and external process wiring

2025-10-16T06:27:40Z

Adamw: Add some introduction

==== Challenge: controlling "rsync" ====
This exploration began as I wrote a simple library to run rsync from Elixir.<ref>https://hexdocs.pm/rsync/Rsync.html</ref> I was hoping to learn how to interface with long-lived external processes, in this case to transfer files and monitor progress. Starting and reading from rsync went very well, thanks to the <code>--info=progress2</code> option which reports progress in a fairly machine-readable format. I was able to start the file transfer, capture status, and report it back to the Elixir caller in various ways.

My library starts rsync using a low-level <code>Port</code> call, which maps directly to the base Erlang open_port<ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> implementation:<syntaxhighlight lang="elixir">
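Port.open(
  # Run the rsync binary directly (no shell), passing arguments as a list
  {:spawn_executable, rsync_path},
  [
    # Receive output as binaries rather than charlists
    :binary,
    # Get a {port, {:exit_status, status}} message when rsync exits
    :exit_status,
    # Windows only: don't open a console window for the child
    :hide,
    # Talk to rsync over its stdin and stdout (this is the default)
    :use_stdio,
    # Merge rsync's stderr into the same stream as stdout
    :stderr_to_stdout,
    args:
      ~w(-a --info=progress2) ++
        rsync_args ++
        sources ++
        [args[:target]],
    env: env
  ]
)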
</syntaxhighlight>

==== Problem: runaway processes ====
Since I was calling my rsync library from an application under development, I would often kill the program abruptly by crashing or by typing <control>-C in the terminal. What I found is that the rsync transfer would continue to run in the background even after Elixir had completely shut down.

That would have to change—leaving overlapping file transfers running unmonitored is exactly what I wanted to avoid by having Elixir control the process in the first place.

==== Bad assumption: pipe-like processes ====
A common use for external processes is something like compression and decompression. A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended, using a C system call like this:<syntaxhighlight lang="c">
ssize_t n_read = read(input_desc, buf, bufsize);
if (n_read < 0) { /* error, e.g. EBADF if input_desc is not open for reading */ }
if (n_read == 0) { /* end of file: every writer has closed its end */ }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that a return value of 0 indicates end of file, while -1 indicates an error (with <code>errno</code> set), such as <code>EBADF</code> when the input file descriptor has already been closed.

BEAM assumes the connected process behaves like this, so nothing needs to be done to clean up a dangling external process because it will end itself as soon as the Port is closed or the BEAM exits. If the external process is known to not behave this way, the recommendation is to wrap it in a shell script which converts a closed stdin into a kill signal.<ref>https://hexdocs.pm/elixir/main/Port.html#module-orphan-operating-system-processes</ref>

==== BEAM internal and external processes ====
[[W:BEAM (Erlang virtual machine)|BEAM]] applications are built out of supervision trees and excel at managing huge numbers of parallel actor processes, all scheduled internally. The communities mostly share a philosophy of running as much as possible inside the VM, since this builds on that strength and simplifies away much interface glue and context switching, but on many occasions an application will still need to start an external OS process. There are some straightforward ways to simply run a command line, which might be familiar to programmers coming from another language: <code>[https://www.erlang.org/doc/apps/kernel/os.html#cmd/2 os:cmd]</code> takes a string and runs it through the system shell. At a lower level, external programs are managed through a [https://www.erlang.org/doc/system/ports.html Port], a flexible abstraction that lets a backend driver pass data in and out and relay some control signals, such as an external process's exit and exit status.

When it comes to internal processes, BEAM is among the most mature and robust runtimes, thanks to good isolation and to its hierarchical [https://www.erlang.org/doc/system/sup_princ supervisors], which liberally prune entire subprocess trees at the first sign of going out of specification. But for external processes, results are mixed. Some programs are twitchy and quit easily, for example <code>cat</code>, but others, like the BEAM itself or a long-running server, are built to survive any ordinary I/O glitch or accidental mashing of the keyboard. Furthermore, this is usually a fundamental assumption of the program, and there will be no configuration to make it behave differently.

==== Reliable clean up ====
What I discovered is that BEAM's external process handling assumes its spawned processes will exit when standard input and output shut down, the so-called end of file, which is what happens when <control>-d is typed into a shell. This works very well for a subprocess like <code>bash</code> but has no effect on a program like <code>sleep</code> or <code>rsync</code>.

The hole created by this mismatch would be neatly filled by something shaped like BEAM's own supervisors. I would expect the VM to spawn as many processes as necessary, but I wouldn't expect a child process to outlive the VM just because it happens to be insensitive to end of file. Instead, I was hoping that the VM would try harder to kill these processes when the Port is closed or the VM halts.

In fact, a child process outliving the one that spawned it is unusual enough that the condition has a name: an "orphan process". POSIX specifies that the orphan is inherited by an implementation-defined system process, traditionally the top-level "init" process. Allowing this to happen at all can be undesirable, because the orphan becomes entirely responsible for itself and may keep running indefinitely with no further supervision. The system init process does track the services it starts, and can restart them in response to service commands, but it knows nothing about the orphans it adopts beyond reaping them when they exit.

When I ran into this issue, I found the suggested workaround: a [https://hexdocs.pm/elixir/1.18.3/Port.html#module-zombie-operating-system-processes wrapper script] which tracks its child (the program originally intended to run), listens for end of file from BEAM, and then kills the external program. How much simpler it would be if this workaround were already built into the Erlang Port module!

It's always a pleasure to ask questions in the BEAM communities, which have earned a reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates some best practices that might be backported into the language itself. Everyone who weighed in generally agreed that the fragile clean up of external processes is a bug, and supported the idea of sending one of the "terminate" signals to spawned programs.

Which signal to use is still an open question. There's the softer <code>HUP</code>, which says "Goodbye!" and leaves the program free to interpret it as it will; the mid-level <code>TERM</code>, which I prefer because it makes the intention explicit but can still be caught and handled gracefully if needed; and <code>KILL</code>, which is bursting with destructive potential. The world of Unix signals is a wild and scary place, about which there's a refreshing diversity of opinion around the Internet.

==== Inside the BEAM ====
Despite the BEAM's retro-futuristic appearance as one of the most time-tested yet forward-facing programming environments, digging around inside the VM brought me back to Earth: it's just a C program like any other. There's nothing holy about the BEAM emulator. Its good and sometimes great ideas about functional languages are buried in a mass of ancient procedural ifdefs, with unnerving memory management and typedefs wrapping the size of an integer on various platforms, just like you might find in other relics from the dark ages of computing such as the Firefox or Linux kernel source code.

Tantalizingly, message-passing is at the core of the VM, but it is not a first-class concept when reaching out to external processes. There's some fancy footwork with [[W:Anonymous pipe|pipes]] and [[W:Dup (system call)|dup]], but communication is done with enums, unions, and bit-rattling stdlib calls. I love it, but... it might be something to look at on another rainy day.

Elixir/Ports and external process wiring

2025-03-11T08:00:29Z

Adamw: link

==== BEAM internal and external processes ====
[[W:BEAM (Erlang virtual machine)|BEAM]] applications are built out of supervision trees and excel at managing huge numbers of parallel actor processes, all scheduled internally. The communities mostly share a philosophy of running as much as possible inside the VM, since this builds on that strength and simplifies away much interface glue and context switching, but on many occasions an application will still need to start an external OS process. There are some straightforward ways to simply run a command line, which might be familiar to programmers coming from another language: <code>[https://www.erlang.org/doc/apps/kernel/os.html#cmd/2 os:cmd]</code> takes a string and runs it through the system shell. At a lower level, external programs are managed through a [https://www.erlang.org/doc/system/ports.html Port], a flexible abstraction that lets a backend driver pass data in and out and relay some control signals, such as an external process's exit and exit status.

When it comes to internal processes, BEAM is among the most mature and robust runtimes, thanks to good isolation and to its hierarchical [https://www.erlang.org/doc/system/sup_princ supervisors], which liberally prune entire subprocess trees at the first sign of going out of specification. But for external processes, results are mixed. Some programs are twitchy and quit easily, for example <code>cat</code>, but others, like the BEAM itself or a long-running server, are built to survive any ordinary I/O glitch or accidental mashing of the keyboard. Furthermore, this is usually a fundamental assumption of the program, and there will be no configuration to make it behave differently.

==== Reliable clean up ====
What I discovered is that BEAM's external process handling assumes its spawned processes will exit when standard input and output shut down, the so-called end of file, which is what happens when <control>-d is typed into a shell. This works very well for a subprocess like <code>bash</code> but has no effect on a program like <code>sleep</code> or <code>rsync</code>.

The hole created by this mismatch would be neatly filled by something shaped like BEAM's own supervisors. I would expect the VM to spawn as many processes as necessary, but I wouldn't expect a child process to outlive the VM just because it happens to be insensitive to end of file. Instead, I was hoping that the VM would try harder to kill these processes when the Port is closed or the VM halts.

In fact, a child process outliving the one that spawned it is unusual enough that the condition has a name: an "orphan process". POSIX specifies that the orphan is inherited by an implementation-defined system process, traditionally the top-level "init" process. Allowing this to happen at all can be undesirable, because the orphan becomes entirely responsible for itself and may keep running indefinitely with no further supervision. The system init process does track the services it starts, and can restart them in response to service commands, but it knows nothing about the orphans it adopts beyond reaping them when they exit.

When I ran into this issue, I found the suggested workaround: a [https://hexdocs.pm/elixir/1.18.3/Port.html#module-zombie-operating-system-processes wrapper script] which tracks its child (the program originally intended to run), listens for end of file from BEAM, and then kills the external program. How much simpler it would be if this workaround were already built into the Erlang Port module!

It's always a pleasure to ask questions in the BEAM communities, which have earned a reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates some best practices that might be backported into the language itself. Everyone who weighed in generally agreed that the fragile clean up of external processes is a bug, and supported the idea of sending one of the "terminate" signals to spawned programs.

Which signal to use is still an open question. There's the softer <code>HUP</code>, which says "Goodbye!" and leaves the program free to interpret it as it will; the mid-level <code>TERM</code>, which I prefer because it makes the intention explicit but can still be caught and handled gracefully if needed; and <code>KILL</code>, which is bursting with destructive potential. The world of Unix signals is a wild and scary place, about which there's a refreshing diversity of opinion around the Internet.

==== Inside the BEAM ====
Despite the BEAM's retro-futuristic appearance as one of the most time-tested yet forward-facing programming environments, digging around inside the VM brought me back to Earth: it's just a C program like any other. There's nothing holy about the BEAM emulator. Its good and sometimes great ideas about functional languages are buried in a mass of ancient procedural ifdefs, with unnerving memory management and typedefs wrapping the size of an integer on various platforms, just like you might find in other relics from the dark ages of computing such as the Firefox or Linux kernel source code.

Tantalizingly, message-passing is at the core of the VM, but it is not a first-class concept when reaching out to external processes. There's some fancy footwork with [[W:Anonymous pipe|pipes]] and [[W:Dup (system call)|dup]], but communication is done with enums, unions, and bit-rattling stdlib calls. I love it, but... it might be something to look at on another rainy day.

Elixir/Ports and external process wiring

2025-03-10T16:25:03Z

Adamw: early draft

==== BEAM internal and external processes ====
BEAM applications are built out of supervision trees and excel at managing huge numbers of parallel actor processes, all scheduled internally. The communities mostly share a philosophy of running as much as possible inside the VM, since this builds on that strength and simplifies away much interface glue and context switching, but on many occasions an application will still need to start an external OS process. There are some straightforward ways to simply run a command line, which might be familiar to programmers coming from another language: <code>[https://www.erlang.org/doc/apps/kernel/os.html#cmd/2 os:cmd]</code> takes a string and runs it through the system shell. At a lower level, external programs are managed through a [https://www.erlang.org/doc/system/ports.html Port], a flexible abstraction that lets a backend driver pass data in and out and relay some control signals, such as an external process's exit and exit status.

When it comes to internal processes, BEAM is among the most mature and robust runtimes, thanks to good isolation and to its hierarchical [https://www.erlang.org/doc/system/sup_princ supervisors], which liberally prune entire subprocess trees at the first sign of going out of specification. But for external processes, results are mixed. Some programs are twitchy and quit easily, for example <code>cat</code>, but others, like the BEAM itself or a long-running server, are built to survive any ordinary I/O glitch or accidental mashing of the keyboard. Furthermore, this is usually a fundamental assumption of the program, and there will be no configuration to make it behave differently.

==== Reliable clean up ====
What I discovered is that BEAM's external process handling assumes its spawned processes will exit when standard input and output shut down, the so-called end of file, which is what happens when <control>-d is typed into a shell. This works very well for a subprocess like <code>bash</code> but has no effect on a program like <code>sleep</code> or <code>rsync</code>.

The hole created by this mismatch would be neatly filled by something shaped like BEAM's own supervisors. I would expect the VM to spawn as many processes as necessary, but I wouldn't expect a child process to outlive the VM just because it happens to be insensitive to end of file. Instead, I was hoping that the VM would try harder to kill these processes when the Port is closed or the VM halts.

In fact, a child process outliving the one that spawned it is unusual enough that the condition has a name: an "orphan process". POSIX specifies that the orphan is inherited by an implementation-defined system process, traditionally the top-level "init" process. Allowing this to happen at all can be undesirable, because the orphan becomes entirely responsible for itself and may keep running indefinitely with no further supervision. The system init process does track the services it starts, and can restart them in response to service commands, but it knows nothing about the orphans it adopts beyond reaping them when they exit.

When I ran into this issue, I found the suggested workaround: a [https://hexdocs.pm/elixir/1.18.3/Port.html#module-zombie-operating-system-processes wrapper script] which tracks its child (the program originally intended to run), listens for end of file from BEAM, and then kills the external program. How much simpler it would be if this workaround were already built into the Erlang Port module!

It's always a pleasure to ask questions in the BEAM communities, which have earned a reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates some best practices that might be backported into the language itself. Everyone who weighed in generally agreed that the fragile clean up of external processes is a bug, and supported the idea of sending one of the "terminate" signals to spawned programs.

Which signal to use is still an open question. There's the softer <code>HUP</code>, which says "Goodbye!" and leaves the program free to interpret it as it will; the mid-level <code>TERM</code>, which I prefer because it makes the intention explicit but can still be caught and handled gracefully if needed; and <code>KILL</code>, which is bursting with destructive potential. The world of Unix signals is a wild and scary place, about which there's a refreshing diversity of opinion around the Internet.

==== Inside the BEAM ====
Despite the BEAM's retro-futuristic aura as one of the most time-tested yet forward-facing programming environments, digging around inside the VM brought me back to Earth: it's just a C program like any other. There's nothing holy about the BEAM emulator. It has some good and some great ideas about functional languages, buried in a mass of ancient procedural ifdefs, unnerving memory management, and typedefs wrapping the size of an integer on various platforms, just like you might find in other relics from the dark ages of computing such as the Firefox or Linux kernel source code.

Tantalizingly, message-passing is at the core of the VM, but it is not a first-class concept when reaching out to external processes. There's some fancy footwork with [[W:Anonymous pipe|pipes]] and [[W:Dup (system call)|dup]], but communication is done with enums, unions, and bit-rattling stdlib. I love it, but... it might be something to look at on another rainy day.

Draft:Elixir/bzip2-ex

2025-02-14T12:48:06Z

Adamw: Found another library, bzip2_decomp

An adventure story of my first Erlang/Elixir library binding (NIF).

''Adam Wight, Sept 2022''

{{Project|url=https://gitlab.com/adamwight/bzip2-ex}}

== Problem statement ==
[[File:Phap Nang Ngam Nai Wannakhadi (1964, p 60).jpg|thumb|Phap Nang Ngam Nai Wannakhadi (1964, p 60). [This painting is not titled, "Picking the low-hanging fruit". -AW]]I wanted to process some large, compressed files containing Wikipedia content<ref>https://dumps.wikimedia.org/backup-index.html</ref>, which couldn't be expanded in-place. The typical approach to this problem is to stream the decompressed data through the desired analysis in memory and then throw it away.

Decompression can be accomplished by piping through an external command-line tool or by reading the file using a native Elixir codec. In my case, I chose to mix these approaches: untarring with tar through a Port, but writing a native bzip2 library to perform the decompression, since none existed at the time.

In hindsight, it would have been much simpler to use command-line bunzip2. The native library should make it possible to use backpressure and concurrency. But mostly I just got excited about a small gap in the BEAM ecosystem and wanted to teach myself how to write an Erlang native implemented function, or NIF<ref>https://www.erlang.org/doc/apps/erts/erl_nif</ref>.
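For comparison, a minimal sketch of the command-line route, assuming <code>bunzip2</code> is on the <code>PATH</code> and using the hypothetical module name <code>Bunzip2Stream</code>: <code>bunzip2 -c</code> runs behind a Port, and its output is exposed as a lazy <code>Stream</code> so that each chunk can be analyzed and thrown away.

<syntaxhighlight lang="elixir">
defmodule Bunzip2Stream do
  @moduledoc """
  Hypothetical sketch: lazily stream the decompressed contents of a .bz2
  file through a command-line `bunzip2` running behind a Port.
  """

  def stream!(path) do
    Stream.resource(
      fn -> open(path) end,
      fn port ->
        receive do
          {^port, {:data, chunk}} -> {[chunk], port}
          {^port, {:exit_status, 0}} -> {:halt, port}
          {^port, {:exit_status, status}} -> raise "bunzip2 exited with status #{status}"
        end
      end,
      # If the consumer stops early the port is still open, so close it;
      # bunzip2 then fails on its next write to the closed pipe.
      fn port -> if Port.info(port), do: Port.close(port) end
    )
  end

  defp open(path) do
    bunzip2 = System.find_executable("bunzip2") || raise "bunzip2 not found"
    # Arguments are passed as a list, so no shell is involved and no escaping is needed.
    Port.open({:spawn_executable, bunzip2}, [:binary, :exit_status, args: ["-c", path]])
  end
end
</syntaxhighlight>

As written, the port forwards output to the mailbox as quickly as <code>bunzip2</code> produces it, so a slow consumer gets no real backpressure from this sketch.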

How hard could it be to write a little binding...

==Mysteries of libbzip2==

The first interesting obstacle was that development of the official bzip2 had stopped at its last stable release, v1.0.x, in 2019.<ref>The project page for bzip2 v1.0 is https://sourceware.org/bzip2/.</ref> A new group of people has been working towards a fork<ref>https://gitlab.com/bzip2/bzip2/</ref> that they're calling version "1.1", which will hopefully avoid breaking changes to the programming interface. It is still unreleased as of 2025.

The second point worth mentioning is that the bzip2 file format has no formal specification. This situation is pretty common and I can't complain, because there's a brilliant reverse-engineering<ref>https://github.com/dsnet/compress/blob/master/doc/bzip2-format.pdf</ref> effort which included the details I needed.

==High- or low-level integration?==
As I mentioned, my own project was already in hard mode due to the decision to write a NIF at all. But there was a second choice, between the libbzip2 high-level interface<ref>https://sourceware.org/bzip2/manual/manual.html#hl-interface</ref>, which does everything for you (open the file and return its contents decompressed), and the low-level interface<ref>https://sourceware.org/bzip2/manual/manual.html#low-level</ref>, which works with block or even sub-block chunks of data.

Here I learned the most important requirement of a NIF binding: it runs inside the BEAM's own memory and operating-system process, on a scheduler thread, so it must return control to the scheduler within a very short time. The erl_nif documentation suggests that a well-behaved native function usually returns within about a millisecond. Low-level it is, then!
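OTP has no bzip2 module to demonstrate with, but its <code>:zlib</code> module shows the same call-repeatedly shape from the Elixir side: <code>:zlib.safeInflate/2</code> returns once it has produced a bounded amount of output, as <code>{:continue, output}</code>, and is called again with <code>[]</code> until it returns <code>{:finished, output}</code>. This sketch uses zlib (deflate), not libbzip2, and the module name <code>ChunkedInflate</code> is hypothetical.

<syntaxhighlight lang="elixir">
defmodule ChunkedInflate do
  @moduledoc """
  Hypothetical sketch: decompress zlib data in bounded steps with
  :zlib.safeInflate/2, the same call-repeatedly shape that a low-level,
  streaming decompressor binding needs.
  """

  def inflate(compressed) do
    z = :zlib.open()

    try do
      :ok = :zlib.inflateInit(z)
      chunks = step(z, :zlib.safeInflate(z, compressed), [])
      :ok = :zlib.inflateEnd(z)
      IO.iodata_to_binary(chunks)
    after
      :zlib.close(z)
    end
  end

  # Each call returns a bounded amount of output; keep calling with []
  # until the end of the stream. Preset dictionaries are not handled here.
  defp step(z, {:continue, output}, acc), do: step(z, :zlib.safeInflate(z, []), [acc, output])
  defp step(_z, {:finished, output}, acc), do: [acc, output]
end
</syntaxhighlight>

Between calls, the caller is free to hand each chunk to the next stage of a pipeline, which is the kind of control a low-level interface offers and a single do-everything call does not.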

If you want to look into yet another approach, Moosieus<ref>https://github.com/Moosieus/bzip2_decomp</ref> has written an Elixir binding for the pure-Rust bzip2-rs<ref>https://github.com/paolobarbolini/bzip2-rs</ref>. This looks good for decompression, but it executes in a single run rather than streaming.

==Native implemented function (NIF)==
[[File:Potato_sprout,_January_23,_2006.jpg|right|267x267px]]TODO...

== Parallel processing ==
TODO: Stream vs block, what can write multiple streams, what are the challenges of detecting blocks...

==References==
<references />

Draft:Elixir/bzip2-ex

2025-02-14T08:26:22Z

Adamw: rewrite

A chronicle of my first Erlang/Elixir library binding (NIF).

''Adam Wight, Sept 2022''

{{Project|url=https://gitlab.com/adamwight/bzip2-ex}}

== Problem statement ==
[[File:Phap Nang Ngam Nai Wannakhadi (1964, p 60).jpg|thumb|Phap Nang Ngam Nai Wannakhadi (1964, p 60). [This painting is not titled, "Picking the low-hanging fruit". -AW]]I wanted to process some large, compressed files containing Wikipedia content<ref>https://dumps.wikimedia.org/backup-index.html</ref>, which couldn't be expanded in place. The typical approach to this problem is to stream the decompressed data through the desired analysis in memory and then throw it away.

Decompression can be accomplished by piping through an external command-line tool or by reading the file using a native Elixir codec. In my case, I chose to mix these approaches: untarring with tar through a Port, but using a native bzip2 library to perform the decompression.

In hindsight, it would have been much simpler to use command-line bunzip2. The native library should make it possible to use backpressure and concurrency. But mostly I just got excited about a small gap in the BEAM ecosystem and wanted to teach myself how to write an Erlang native implemented function, or NIF<ref>https://www.erlang.org/doc/apps/erts/erl_nif</ref>.

How hard could it be to write a little binding...

==Mysteries of libbzip2==

The first interesting obstacle was that development of the official bzip2 had stopped at its last stable release, v1.0.x, in 2019.<ref>The project page for bzip2 v1.0 is https://sourceware.org/bzip2/.</ref> A new group of people has been working towards a fork<ref>https://gitlab.com/bzip2/bzip2/</ref> that they're calling version "1.1", which will hopefully avoid breaking changes to the programming interface. It is still unreleased as of 2025.

The second point worth mentioning is that the bzip2 file format has no formal specification. This situation is pretty common and I can't complain, because there's a brilliant reverse-engineering<ref>https://github.com/dsnet/compress/blob/master/doc/bzip2-format.pdf</ref> effort which included the details I needed.

==High- or low-level integration?==
As I mentioned, my own project was already in hard mode due to the decision to write a NIF at all. But there was a second choice, between the libbzip2 high-level interface<ref>https://sourceware.org/bzip2/manual/manual.html#hl-interface</ref>, which does everything for you (open the file and return its contents decompressed), and the low-level interface<ref>https://sourceware.org/bzip2/manual/manual.html#low-level</ref>, which works with block or even sub-block chunks of data.

Here I learned the most important requirement of a NIF binding: it runs inside the BEAM's own memory and operating-system process, on a scheduler thread, so it must return control to the scheduler within a very short time. The erl_nif documentation suggests that a well-behaved native function usually returns within about a millisecond. Low-level it is, then!

==Native implemented function (NIF)==
[[File:Potato_sprout,_January_23,_2006.jpg|right|267x267px]]TODO...

== Parallel processing ==
TODO: Stream vs block, what can write multiple streams, what are the challenges of detecting blocks...

==References==
<references />

Main Page

2025-01-16T22:09:44Z

Adamw:

{{DISPLAYTITLE:Luddnet}}
[[File:Dice MET sf48-101-211a.jpg|alt=Thousand-year-old bone game die from the Islamic city of Nishapur.|left|96x96px|]]
<div style="font-size: 300%;">[[Special:Random|Try a random page?]]</div>

Main Page

2025-01-16T22:08:43Z

Adamw: image, random page, try to suppress title

{{DISPLAYTITLE:}}
[[File:Dice MET sf48-101-211a.jpg|alt=Thousand-year-old bone game die from the Islamic city of Nishapur.|left|96x96px|]]
<div style="font-size: 300%;">[[Special:Random|Try a random page?]]</div>

Ludd

2025-01-16T21:47:20Z

Adamw: Link to Blood in the Machine

'''Ludd''' can refer to:
* [[w:Ned Ludd|Ned Ludd]], fictitious general of the [[w:Luddite|Luddite]] movement
* [[w:Lludd Llaw Eraint|Lludd Llaw Eraint]], hero of Welsh mythology who rids Britain of three "plagues"
* [[w:Nuada|Nuada]], figure in Irish mythology

==See also==
* <cite>Merchant, Brian (2023). ''Blood in the Machine''. [https://www.hachettebookgroup.com/titles/brian-merchant/blood-in-the-machine/9780316487740/ Little, Brown].</cite>
* [[w:Lud (disambiguation)|Lud]], something else entirely
* [[w:Lod|Lod]], a city in Israel (formerly Lydda)

[[Category:Disambiguation pages]]

Ludd

2025-01-16T19:06:02Z

Adamw: Strip prefix from interwiki links

Ludd

2025-01-16T19:04:40Z

Adamw: Disambiguation target for domain homepage

'''Ludd''' can refer to:
* [[w:Ned Ludd]], fictitious general of the [[w:Luddite]] movement
* [[w:Lludd Llaw Eraint]], hero of Welsh mythology who rids Britain of three "plagues"
* [[w:Nuada]], figure in Irish mythology

==See also==
* [[Lud (disambiguation)]]
* [[Lod]], a city in Israel (formerly Lydda)

[[Category:Disambiguation pages]]