Elixir/Ports and external process wiring

{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}


My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress using rsync.


I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

{{Aside|text=<p>[[w:rsync|Rsync]] is the standard utility for file transfers, locally or over a network.  It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage rsync can be trusted to handle any edge case.</p>
<p>BEAM<ref>The virtual machine shared by Erlang, Elixir, Gleam, Ash, and so on: [https://blog.stenmans.org/theBeamBook/ the BEAM Book]</ref> is a fairly unique ecosystem in which it's not considered deviant to reinvent a rounder wheel: an external dependency like "cron" will often be ported into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon make it unlikely to be rewritten any time soon, which is why I've decided to wrap external command execution in a library.</p>}}


[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]


=== Naïve shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
This has a few shortcomings, starting with how one would pass it dynamic paths.  It's unsafe to use string interpolation (<code>"#{source}"</code>): consider what could happen if the filenames include unescaped whitespace or special shell characters such as ";".
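
As a sketch of the danger, using a deliberately hostile, made-up filename, interpolation lets the data become part of the command:<syntaxhighlight lang="elixir">
# A hostile "filename" smuggles a second command past the shell.
source = "src; rm -rf ~"
System.shell("rsync -a #{source} target")
# The shell now runs two commands: `rsync -a src` and then `rm -rf ~ target`.
</syntaxhighlight>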


=== Safe path handling ===
We turn next to <code>System.cmd</code>, which takes a raw argv and can't be fooled by special characters in the path arguments:<syntaxhighlight lang="elixir">
System.find_executable(rsync_path)
|> System.cmd(["-a", source, target])
</syntaxhighlight>For a short job this is perfect, but for longer transfers our program loses control and observability, waiting indefinitely for a monolithic command to return.


=== Asynchronous call and communication ===
To run an external process asynchronously we reach for Elixir's low-level <code>Port.open</code>, nothing but a one-line wrapper<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> passing its parameters directly to ERTS <code>open_port</code><ref>[https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2 Erlang <code>open_port</code> docs]</ref>.  This function is tremendously flexible; here we turn a few knobs:<syntaxhighlight lang="elixir">
Port.open(
   {:spawn_executable, rsync_path},
   [
     # Representative options (a sketch, not the library's exact list): deliver
     # rsync's output and its final exit status to the owner as messages, and
     # pass the arguments directly, with no shell in between.
     :binary,
     :exit_status,
     {:args, ["-a", "--info=progress2", source, target]}
   ]
)
</syntaxhighlight>


{{Aside|text=
'''Rsync progress reporting options'''

There are a variety of ways to report progress:

; <code>-v</code> : list each filename as it's transferred

; <code>--progress</code> : report statistics per file

; <code>--info=progress2</code> : report overall progress

; <code>--itemize-changes</code> : list the operations taken on each file

; <code>--out-format=FORMAT</code> : any custom format string following rsyncd.conf's <code>log format</code><ref>[https://man.archlinux.org/man/rsyncd.conf.5#log~2 rsyncd.conf log format] docs</ref>
}}


Rsync outputs <code>--info=progress2</code> lines like so:<syntaxhighlight lang="text">
      overall percent complete  time remaining
bytes transferred |  transfer speed    |
        |        |        |          |
      3,342,336  33%    3.14MB/s    0:00:02
</syntaxhighlight>

The controlling Port captures these lines and sends each one to the library's <code>handle_info</code> callback as <code>{:data, line}</code>.  After the transfer is finished we receive a conclusive <code>{:exit_status, status_code}</code> message.
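
As a sketch of those message shapes, you can open the port from an iex session and flush the mailbox (the port identifier and the <code>args</code> variable holding the rsync arguments are illustrative):<syntaxhighlight lang="elixir">
iex> port = Port.open({:spawn_executable, rsync_path}, [:binary, :exit_status, {:args, args}])
iex> flush()
{#Port<0.6>, {:data, "\r      3,342,336  33%    3.14MB/s    0:00:02"}}
{#Port<0.6>, {:exit_status, 0}}
</syntaxhighlight>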


As a first step, we extract the overall_percent_done column and flag any unrecognized output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
         percent_done_text when percent_done_text != nil <- Enum.at(terms, 1),
         {percent_done, "%"} <- Float.parse(percent_done_text) do
       percent_done
     else
       # Any other shape of line is reported as unrecognized.
       _ ->
         {:unknown, line}
     end
</syntaxhighlight>The <code>trim</code> option does more than its share of the work here: it lets us ignore spacing and newline trickery entirely, including the leading carriage return before each line, which you can see in the rsync source code:<ref>[https://github.com/RsyncProject/rsync/blob/797e17fc4a6f15e3b1756538a9f812b63942686f/progress.c#L129 rsync/progress.c] source code</ref>
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>Carriage return <code>\r</code> deserves special mention: it is the first "control" character we come across, and in the binary data coming over the pipe from rsync it looks just like an ordinary byte, similar to newline <code>\n</code>.  Its normal role is to control the terminal emulator, rewinding the cursor so that the current line can be overwritten! And like newline, carriage return can simply be ignored here.  Control signaling is exactly what goes haywire in this project, and the leaky category distinction between data and control seems to be a repeated theme in inter-process communication.  The reality is not so much data vs. control as a sequence of layers, like with [[w:OSI model|networking]].
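
To see this in practice, here is the split from above applied to a raw progress line, carriage return and all:<syntaxhighlight lang="elixir">
iex> String.split("\r      3,342,336  33%    3.14MB/s    0:00:02", ~r"\s", trim: true)
["3,342,336", "33%", "3.14MB/s", "0:00:02"]
</syntaxhighlight>
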
{{Aside|text=
[[File:Chinese typewriter 03.jpg|right|200x200px]]


On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character, <code>\r</code> (<code>0x0d</code>, sometimes rendered as <code>^M</code>).  Try this command in a terminal:<syntaxhighlight lang="shell">
$ echo "three^Mtwo"
twoee
</syntaxhighlight>
You'll have to type <control>-v <control>-m to enter a literal carriage return; copy-and-paste won't work.


The character is named after pushing the physical carriage of a typewriter back to the beginning of the current line without feeding the roller to a new line.

[[File:Baboons Playing in Chobe National Park-crlf.jpg|left|300x300px|Three young baboons playing on a rock ledge.  Two are on the ridge and one below, grabbing the tail of another.  A meme font shows "\r", "\n", and "\r\n" personified as each baboon.]]
[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing.
}}


== OTP generic server ==
The Port API is convenient enough so far, but Erlang/OTP really starts to shine once we wrap each Port connection under a <code>gen_server</code><ref>[https://www.erlang.org/doc/apps/stdlib/gen_server.html Erlang gen_server docs]</ref> module, giving us several properties for free: a dedicated application thread coordinates with its rsync process independently of everything else.  Input and output are asynchronous and buffered, but handled sequentially in a thread-safe way.  The gen_server holds internal state, including the up-to-date completion percentage.  And the caller can request updates as needed, or it can listen for push messages with the parsed statistics.
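
A minimal sketch of the idea follows; the module name, function names, and exact options are illustrative rather than the library's actual API:<syntaxhighlight lang="elixir">
defmodule RsyncWorker do
  # A simplified sketch of a gen_server wrapping one rsync Port.
  use GenServer

  ## Client API
  def start_link(args), do: GenServer.start_link(__MODULE__, args)
  def percent_done(pid), do: GenServer.call(pid, :percent_done)

  ## Server callbacks
  @impl true
  def init({rsync_path, source, target}) do
    port =
      Port.open({:spawn_executable, rsync_path}, [
        :binary,
        :exit_status,
        {:args, ["-a", "--info=progress2", source, target]}
      ])

    {:ok, %{port: port, percent_done: 0.0}}
  end

  @impl true
  def handle_call(:percent_done, _from, state), do: {:reply, state.percent_done, state}

  @impl true
  def handle_info({port, {:data, chunk}}, %{port: port} = state) do
    # A chunk may hold one or more progress lines; keep the last percentage we can parse.
    percent =
      chunk
      |> String.split(~r"\s", trim: true)
      |> Enum.reduce(state.percent_done, fn term, acc ->
        case Float.parse(term) do
          {value, "%"} -> value
          _ -> acc
        end
      end)

    {:noreply, %{state | percent_done: percent}}
  end

  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    reason = if status == 0, do: :normal, else: {:rsync_failed, status}
    {:stop, reason, state}
  end
end
</syntaxhighlight>
Calling code can start a transfer with <code>RsyncWorker.start_link({rsync_path, source, target})</code> and poll <code>percent_done/1</code>; pushing updates instead would only mean sending a message to a listener from <code>handle_info</code>.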


This gen_server is also expected to run safely under an OTP supervision tree<ref>[https://adoptingerlang.org/docs/development/supervision_trees/ "Supervision Trees"] chapter from [https://adoptingerlang.org/ Adopting Erlang]</ref>, but this is where our dream falls apart for the moment.  The Port already watches for rsync completion or failure and reports upwards to its caller, but we fail at the critical property of propagating termination downwards to shut down rsync if the calling code or our library module crashes.


== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
The unpleasant real-world consequence is that rsync transfers will continue to run in the background even after Elixir kills our gen_server or shuts down, because the BEAM has no way of stopping the external process.


It's possible to find the operating system PID of the child process with <code>Port.info(port, :os_pid)</code> and send it a signal by shelling out to unix <code>kill PID</code>, but BEAM doesn't include built-in functions to send a signal to an OS process, and there is an ugly race condition between closing the port and sending this signal.  We'll keep looking for another way to "link" the processes.
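
A rough sketch of that manual workaround, not part of the library and still racy:<syntaxhighlight lang="elixir">
# Look up the OS pid of the spawned rsync, then shell out to kill(1).
# Between these calls the pid may already be gone, or even reused by an
# unrelated process, which is the race mentioned above.
{:os_pid, os_pid} = Port.info(port, :os_pid)
Port.close(port)
System.cmd("kill", [Integer.to_string(os_pid)])
</syntaxhighlight>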
 
To debug what happens during <code>port_close</code> and to eliminate variables, I tried spawning <code>sleep 60</code> instead of rsync, and found that it behaves in exactly the same way: hanging until <code>sleep</code> ends naturally, regardless of what happened in Elixir or whether its pipes are still open.  This happened to be a lucky choice, as I learned later: "sleep" is daemon-like and therefore similar to rsync, but its behavior is much simpler to reason about.


== Bad assumption: pipe-like processes ==
A pipeline program like <code>gzip</code> or <code>cat</code> is built to read from its input and write to its output.  We can roughly group the different styles of command-line application into "pipeline" programs which read and write, "interactive" programs which require user input, and "daemon" programs which are designed to run in the background.  Some programs support multiple modes depending on the arguments given at launch, or by detecting the terminal using <code>isatty</code><ref>[https://man.archlinux.org/man/isatty.3.en docs for <code>isatty</code>]</ref>.  The BEAM is currently optimized to interface with pipeline programs, and it assumes that the external process will stop when its "standard input" is closed.

A typical pipeline program will stop once it detects that input has ended, for example by calling <code>read</code><ref>[https://man.archlinux.org/man/read.2 libc <code>read</code> docs]</ref> in a loop:<syntaxhighlight lang="c">
ssize_t size_read = read (input_desc, buf, bufsize);
if (size_read < 0) { /* error... */ }
if (size_read == 0) { /* end of file... */ }
</syntaxhighlight>

If the program does blocking I/O, then a zero-byte <code>read</code> indicates the end-of-file condition.  A program which does asynchronous I/O with <code>O_NONBLOCK</code><ref>[https://man.archlinux.org/man/open.2.en#O_NONBLOCK O_NONBLOCK docs]</ref> might instead detect EOF by listening for the <code>HUP</code> hang-up signal, which can be arranged (TODO: document how this can be done with <code>prctl</code>, and on which platforms).
 
But here we'll focus on how processes can more generally affect each other through pipes.  The surprising answer: not very much!  You can experiment with the <code>/dev/null</code> device, which behaves like a closed pipe; for example, compare these two commands:
 
<syntaxhighlight lang="shell">
cat < /dev/null

sleep 10 < /dev/null
</syntaxhighlight><code>cat</code> exits immediately, but <code>sleep</code> does its thing as usual.


You could do the same experiment by opening a <code>cat</code> in the terminal and then typing <control>-d to "send" an end-of-file.  Interestingly, nothing is actually sent to the child: the <control>-d is interpreted by the terminal's line discipline, which flushes the (empty) input buffer so that the pending read on cat's "[[w:Standard streams|standard input]]" returns zero bytes.  This is similar to how <control>-c does not send a character either; the terminal driver interprets it and delivers an interrupt signal to the foreground process, completely independently of the data pipe.  My entry point to learning more is this stty webzine<ref>[https://wizardzines.com/comics/stty/ ★ wizard zines ★: stty]</ref> by Julia Evans. Dump information about your own terminal emulator: <code>stty -a</code>


Any special behavior at the other end of a pipe is the result of intentional programming decisions and "end of file" (EOF) is more a convention than a hard reality.  A program with a chaotic disposition could even reopen stdin after it was closed and connect it to something else, to the great surprise of friends and neighbors.

Back to the problem at hand, "rsync" is in the category of "daemon-like" programs which will carry on even after standard input is closed.  This makes sense enough, since rsync isn't interactive and any output is just a side effect of its main purpose.


== Shimming can kill ==
A small shim can adapt a daemon-like program to behave more like a pipeline.  The shim watches for stdin closing or for SIGHUP, and when either is detected it converts it into a stronger signal such as SIGTERM, which it forwards to its own child.  This is the idea behind a suggested shell script<ref>[https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes Elixir Port docs showing a shim script]</ref> for Elixir, and the <code>erlexec</code><ref name=":0">[https://hexdocs.pm/erlexec/readme.html <code>erlexec</code> library]</ref> library.  The opposite adapter can be found in the [[w:nohup|nohup]] shell command and the grimsby<ref>[https://github.com/shortishly/grimsby <code>grimsby</code> library]</ref> library: these keep standard in and/or standard out open for the child process even after the parent exits, so that a pipe-like program can behave more like a daemon.


I used the shim approach in my rsync library: it includes a small C program<ref>[https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads rsync_ex C shim program]</ref> which wraps rsync and makes it sensitive to BEAM <code>port_close</code>.  It's featherweight, leaving pipes unchanged as it passes control to rsync; here are the business parts:<syntaxhighlight lang="c">// Set up a fail-safe to self-signal with HUP if the controlling process dies.
prctl(PR_SET_PDEATHSIG, SIGHUP);</syntaxhighlight><syntaxhighlight lang="c">
void handle_signal(int signum) {
  if (signum == SIGHUP && child_pid > 0) {
    // Send the child TERM so that rsync can perform clean-up such as shutting down a remote server.
    kill(child_pid, SIGTERM);
  }
}
</syntaxhighlight>


== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities; they deserve their reputation for being friendly and open.  The first big tip was to look at the third-party library <code>erlexec</code><ref name=":0" />, which demonstrates emerging best practices that could be backported into the language itself.  Everyone speaking on the problem generally agrees that the fragile clean-up of external processes is a bug, and supports the idea that some flavor of "terminate" signal should be sent to spawned programs when the port is closed.

[[File:Itinerant glassworker exhibition with spinning wheel and steam engine.jpg|thumb]]
I would be lying to hide my disappointment that the required core changes are mostly in an auxiliary C program rather than in Erlang or the BEAM itself, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside.  All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes<ref>[https://man.archlinux.org/man/pipe.7.en Overview of unix pipes]</ref>, using libc's pipe<ref>[https://man.archlinux.org/man/pipe.2.en Docs for the <code>pipe</code> syscall]</ref>, read, write, and select<ref>[https://man.archlinux.org/man/select.2.en libc <code>select</code> docs]</ref>.


Port drivers<ref>[https://www.erlang.org/doc/system/ports.html Erlang ports docs]</ref> are fundamental to ERTS, and several levels of port wiring are involved in launching external processes: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command.  Each BEAM has a single erl_child_setup process to watch over all children.  This architecture reflects the Supervisor paradigm and we can leverage it to produce some of the same properties: the subprocess can buffer reads and writes asynchronously and handle them sequentially; and if the BEAM crashes then erl_child_setup can detect the condition and do its own cleanup.


Letting a child process outlive its controlling process leaves the child in a state called "orphaned" in POSIX, and the standard recommends that when this happens the process should be adopted by the top-level system process "init" if it exists.  This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children.  Without supervision, a process could potentially run forever or do naughty things.  The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands.  But init will know nothing about adopted, orphan processes or how to monitor and restart them.


The patch [https://github.com/erlang/otp/pull/9453 PR#9453] adapting port_close to send SIGTERM is waiting for review, and responses look generally positive so far.
{{Aside|text=
Which signal to use is still an open question:


; <code>HUP</code> : the "hang-up" signal, conventionally delivered when a process loses its controlling terminal<ref>[https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap11.html#tag_11_01_10 POSIX standard "General Terminal Interface: Modem Disconnect"]</ref>; the softest goodbye, which a program is free to interpret as it wishes


; <code>TERM</code> : has a clear intention of "kill this thing" but still possible to trap at the target and handle in a customized way

There is a refreshing diversity of opinion, so it could be worthwhile to make the signal configurable for each port.
}}
== TODO: consistency with unix process groups ==
... there is something fun here about how unix already has process tree behaviors which are close analogues to a BEAM supervisor tree.


== Future directions ==