Elixir/Ports and external process wiring: Difference between revisions

Adamw (talk | contribs)
c/e
Adamw (talk | contribs)
No edit summary
Line 92: Line 92:


On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character, <code>\r</code>, <code>0x0d</code> sometimes rendered as <code>^M</code>.  Try this command in a terminal:<syntaxhighlight lang="shell">
On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character, written <code>\r</code> or <code>0x0d</code> and sometimes rendered as <code>^M</code>.  Try this command in a terminal:<syntaxhighlight lang="shell">
echo "three^Mtwo"
# echo "three^Mtwo"
twoee
</syntaxhighlight>
</syntaxhighlight>
You'll have to use <control>-v <control>-m to type a literal carriage return, copy-and-paste won't work.  Spoiler: the output should read "twoee".
You'll have to use <control>-v <control>-m to type a literal carriage return; copy-and-paste won't work.
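
If typing the control sequence is awkward, the same byte can be produced from an escape sequence.  This is a small sketch, assuming a POSIX shell with <code>printf</code> and <code>od</code> available:<syntaxhighlight lang="shell">
# printf expands \r into a literal carriage return (0x0d).
printf 'three\rtwo\n'

# Pipe the same bytes through od to see the control character
# instead of letting the terminal act on it.
printf 'three\rtwo\n' | od -c
</syntaxhighlight>The first command lets the terminal interpret the carriage return, and the second lists the raw bytes with the carriage return shown as <code>\r</code>.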


The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.
The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.
Line 116: Line 117:


== Bad assumption: pipe-like processes ==
== Bad assumption: pipe-like processes ==
A pipeline program like <code>gzip</code> or <code>cat</code> is built to read from its input and write to its output.  These will stop once they detect that input has ended because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
A pipeline program like <code>gzip</code> or <code>cat</code> is built to read from its input and write to its output.  We can roughly group the different styles of command-line application into "pipeline" programs which read and write, "interactive" programs which require user input, and "daemon" programs which are designed to run in the background.  Some programs support multiple modes depending on the arguments given at launch, or by detecting the terminal using <code>isatty</code><ref>[https://man.archlinux.org/man/isatty.3.en docs for <code>isatty</code>]</ref>.  The BEAM is currently optimized to interface with pipeline programs and it assumes that the external process will stop when its "standard input" is closed.
 
A typical pipeline program will stop once it detects that input has ended, by making regular C system calls to <code>read</code><ref>[https://man.archlinux.org/man/read.2 libc <code>read</code> docs]</ref>:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { error... }
if (n_read < 0) { error... }
if (n_read == 0) { end of file... }
if (n_read == 0) { end of file... }
</syntaxhighlight>The manual for read<ref>[https://man.archlinux.org/man/read.2 libc <code>read</code> docs]</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed.  If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended?  Does the calling process yield control until input arrives?  How do we know if more than bufsize bytes are available?  If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>[https://man.archlinux.org/man/open.2.en#O_NONBLOCK O_NONBLOCK docs]</ref> and unix pipes<ref>[https://man.archlinux.org/man/pipe.7.en overview of unix pipes]</ref>.
</syntaxhighlight>When the program uses blocking I/O, reading zero bytes indicates the end of file.  There are also programs which do asynchronous I/O using <code>O_NONBLOCK</code><ref>[https://man.archlinux.org/man/open.2.en#O_NONBLOCK O_NONBLOCK docs]</ref>, and these might rely on the <code>HUP</code> hang-up signal which is normally sent when input is closed.
 
But here we'll focus on how processes can more generally affect each other through pipes.  Surprising answer: without much effect!  You can experiment with the <code>/dev/null</code> device which behaves like a closed pipe, for example compare these two commands:<syntaxhighlight lang="shell">
cat < /dev/null
 
sleep 10 < /dev/null
</syntaxhighlight><code>cat</code> exits immediately, but <code>sleep</code> does its thing as usual.
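
Programs that switch between modes, as mentioned earlier, often decide by checking whether standard input is a terminal.  The shell counterpart of <code>isatty</code> is <code>test -t</code>; the sketch below uses a hypothetical helper named <code>describe_stdin</code> to show the idea, and is not taken from any particular program:<syntaxhighlight lang="shell">
# describe_stdin is a hypothetical helper name used only for this example.
describe_stdin() {
    # [ -t 0 ] is true when file descriptor 0 (stdin) is a terminal.
    if [ -t 0 ]; then
        echo "stdin is a terminal: interactive mode"
    else
        echo "stdin is not a terminal: pipeline mode"
    fi
}

describe_stdin                 # result depends on how the shell was started
describe_stdin < /dev/null     # stdin redirected from a device
echo hello | describe_stdin    # stdin is a pipe
</syntaxhighlight>The last two calls report pipeline mode, which is roughly the situation of a program launched through a BEAM port.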


But here we'll focus on how processes affect each other through pipes.  Surprising answer: it doesn't affect very much!  Try opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file.  Oh no, you killed it!  You didn't actually send anything, though—the <control>-d is interpreted by bash and it responds by closing its pipe connected to "[[w:Standard streams|standard input]]" of the child process.  This is similar to how <control>-c is not sending a character but is interpreted by the terminal, trapped by the shell and forwarded as an interrupt signal to the child process, completely independently of the data pipe.  My entry point to learning more is this stty webzine<ref>[https://wizardzines.com/comics/stty/ ★ wizard zines ★: stty]</ref> by Julia Evans.  Go ahead and try this command, what could go wrong: <code>stty -a</code>
You could do the same experiment by opening a "cat" in the terminal and then typing <control>-d to "send" an end-of-file.  Interestingly, no character is delivered: <control>-d is interpreted by the terminal's line discipline, which, when typed at the start of a line, makes the pending <code>read</code> on the child's standard input return zero bytes.  This is similar to how <control>-c is not sent as a character but is interpreted by the terminal and delivered as an interrupt signal to the foreground process, completely independently of the data stream.  My entry point to learning more is this stty webzine<ref>[https://wizardzines.com/comics/stty/ ★ wizard zines ★: stty]</ref> by Julia Evans.  Dump information about your own terminal settings: <code>stty -a</code>


Any special behavior at the other end of a pipe is the result of intentional programming decisions and "end of file" (EOF) is more a convention than a hard reality.  You could even reopen stdin from the application, to the great surprise of your friends and neighbors.  For example, try opening "watch ls" or "sleep 60" and try <control>-d all you want—no effect.  You did close its stdin but nobody cared, it wasn't listening to you anyway.
Any special behavior at the other end of a pipe is the result of intentional programming decisions and "end of file" (EOF) is more a convention than a hard reality.  A program with a chaotic disposition could even reopen stdin after it was closed and connect it to something else, to the great surprise of friends and neighbors.


Back to the problem at hand, "rsync" is in this latter category of "daemon-like" programs which will carry on even after standard input is closed.  This makes sense enough, since rsync isn't interactive and any output is just a side effect of its main purpose.
Back to the problem at hand, "rsync" is in the category of "daemon-like" programs which will carry on even after standard input is closed.  This makes sense enough, since rsync isn't interactive and any output is just a side effect of its main purpose.


== Shimming can kill ==
== Shimming can kill ==
It's possible to write a small adapter which is sensitive to stdin closing, then converts this into a stronger signal like SIGTERM which it forwards to its own child.  This is the idea behind a suggested shell script<ref>[https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes Elixir Port docs showing a shim script]</ref> for Elixir and the erlexec<ref>[https://hexdocs.pm/erlexec/readme.html <code>erlexec</code> library]</ref> library.  The opposite adapter is also found in the [[w:nohup|nohup]] shell command and the grimsby<ref>[https://github.com/shortishly/grimsby <code>grimsby</code> library]</ref> library: these will keep standard in and/or standard out open for the child process even after the parent exits.
A small shim can adapt a daemon-like program to behave more like a pipeline.  The shim watches for stdin closing or SIGHUP, and when it detects either one it forwards a stronger signal like SIGTERM to its own child.  This is the idea behind a suggested shell script<ref>[https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes Elixir Port docs showing a shim script]</ref> for Elixir, and the <code>erlexec</code><ref name=":0">[https://hexdocs.pm/erlexec/readme.html <code>erlexec</code> library]</ref> library.  The opposite adapter can be found in the [[w:nohup|nohup]] shell command and the grimsby<ref>[https://github.com/shortishly/grimsby <code>grimsby</code> library]</ref> library: these will keep standard in and/or standard out open for the child process even after the parent exits, so that a pipe-like program can behave more like a daemon.
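
The shim idea can be sketched in a few lines of POSIX shell.  This is an illustration only: it is neither the script from the Elixir docs nor how <code>erlexec</code> works internally, and the function name <code>shim</code> is made up for the example.  It handles only the stdin-closing case, not SIGHUP:<syntaxhighlight lang="shell">
# shim (hypothetical name): run a command and send it SIGTERM
# once our own standard input reaches end-of-file.
shim() (
    exec 3<&0                        # keep a copy of the real stdin on fd 3
    "$@" </dev/null 3<&- &           # start the wrapped command without our stdin
    child=$!

    (
        # Discard input until EOF, then ask the child to terminate.
        while IFS= read -r line; do :; done <&3
        kill -TERM "$child" 2>/dev/null
    ) &
    watcher=$!

    wait "$child"                    # returns when the child exits or is killed
    status=$?
    kill "$watcher" 2>/dev/null      # stop watching if the child ended on its own
    exit "$status"
)

# stdin is already at EOF here, so the sleep is terminated right away.
printf '' | shim sleep 30
echo "exit status: $?"
</syntaxhighlight>A child ended by SIGTERM is reported by the shell as status 143 (128 plus signal number 15), so the caller can tell a forced stop from a normal exit.  The wrapped command gets <code>/dev/null</code> as its input in this sketch, which suits a program like rsync that does not read from stdin.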


I took the shim approach with my rsync library and included a small C program<ref>[https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads rsync_ex C shim program]</ref> which wraps rsync and makes it sensitive to the BEAM port_close.  It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (but should have been SIGTERM, see the sidebar discussion of different signals below).
I used the shim approach in my rsync library, which includes a small C program<ref>[https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads rsync_ex C shim program]</ref> that wraps rsync and makes it sensitive to BEAM <code>port_close</code>.  It's featherweight, leaving pipes unchanged as it passes control to rsync: its only real effect is to convert SIGHUP to SIGKILL (this should have been SIGTERM; see the sidebar discussion of different signals below).


== Reliable clean up ==
== Reliable clean up ==
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
{{Project|status=in review|url=https://erlangforums.com/t/open-port-and-zombie-processes|source=https://github.com/erlang/otp/pull/9453}}
It's always a pleasure to ask questions in the BEAM communities, they have earned their reputation for being friendly and open.  The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates emerging best practices which could be backported into the language itself.  Everyone speaking on the problem has generally agreed that the fragile clean up of external processes is a bug, and supported the idea that some flavor of "terminate" signal should be sent to spawned programs.
It's always a pleasure to ask questions in the BEAM communities: they deserve their reputation for being friendly and open.  The first big tip was to look at the third-party library <code>erlexec</code><ref name=":0" />, which demonstrates emerging best practices that could be backported into the language itself.  Everyone speaking on the problem generally agrees that the fragile clean up of external processes is a bug, and supports the idea that some flavor of "terminate" signal should be sent to spawned programs when the port is closed.


I would be lying to hide my disappointment that the required core changes are mostly in a C program and not actually in Erlang, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside.  All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes, using stdlib read, write, and select<ref>[https://man.archlinux.org/man/select.2.en libc <code>select</code> docs]</ref>.
I would be lying to hide my disappointment that the required core changes are mostly in an auxiliary C program and not written in Erlang or even in the BEAM itself, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside.  All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes<ref>[https://man.archlinux.org/man/pipe.7.en Overview of unix pipes]</ref>, using libc's pipe<ref>[https://man.archlinux.org/man/pipe.2.en Docs for the <code>pipe</code> syscall]</ref>, read, write, and select<ref>[https://man.archlinux.org/man/select.2.en libc <code>select</code> docs]</ref>.


Port drivers<ref>[https://www.erlang.org/doc/system/ports.html Erlang ports docs]</ref> are fundamental to ERTS and external processes are launched through several levels of wiring: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command.  Each BEAM has a single erl_child_setup process to watch over all children.
Port drivers<ref>[https://www.erlang.org/doc/system/ports.html Erlang ports docs]</ref> are fundamental to ERTS, and several levels of port wiring are involved in launching external processes: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command.  Each BEAM has a single erl_child_setup process to watch over all children.  This architecture reflects the Supervisor paradigm and we can leverage it to produce some of the same properties: the subprocess can buffer reads and writes asynchronously and handle them sequentially; and if the BEAM crashes then erl_child_setup can detect the condition and do its own cleanup.


Letting a child process outlive the one that spawned leaves it in a state called an "orphaned process" in POSIX, and the standard recommends that when this happens the process should be adopted by the top-level system process "init" if it exists.  This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children.  Without supervision, a process could potentially run forever or do naughty things.  The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands.  But init will know nothing about adopted, orphan processes or how to monitor and restart them.
Letting a child process outlive its controlling process leaves the child in a state called "orphaned" in POSIX, and the standard recommends that when this happens the process should be adopted by the top-level system process "init" if it exists.  This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children.  Without supervision, a process could potentially run forever or do naughty things.  The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands.  But init will know nothing about adopted, orphan processes or how to monitor and restart them.
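
Orphaning is easy to observe from a shell.  The sketch below assumes a POSIX <code>ps</code> that supports <code>-o</code> and <code>-p</code>; the variable name <code>orphan</code> is just for illustration.  On many Linux systems the reported parent is PID 1, but a "subreaper" such as a per-user service manager or a container's init may adopt the process instead:<syntaxhighlight lang="shell">
# Start a short-lived parent shell that leaves a background child behind.
# The child's output is redirected so the command substitution
# does not wait for it to finish.
orphan=$(sh -c 'sleep 5 >/dev/null 2>&1 & echo $!')

# The parent shell has already exited; show who the child's parent is now.
ps -o pid= -o ppid= -o comm= -p "$orphan"

# Clean up rather than leaving the orphan to run to completion.
kill "$orphan"
</syntaxhighlight>Nothing in this sequence tells the adopting process what the orphan was for, which is the supervision gap described above.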


The patch [https://github.com/erlang/otp/pull/9453 PR#9453] adapting port_close to SIGTERM is waiting for review and responses look generally positive so far.
The patch [https://github.com/erlang/otp/pull/9453 PR#9453] adapting port_close to SIGTERM is waiting for review and responses look generally positive so far.
Line 149: Line 158:
Which signal to use is still an open question:
Which signal to use is still an open question:


; <code>HUP</code> : the softest "Goodbye!" that a program is free to interpret as it wishes
; <code>HUP</code> : sent to a process when its standard input stream is closed<ref>[https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap11.html#tag_11_01_10 POSIX standard "General Terminal Interface: Modem Disconnect"]</ref>


; <code>TERM</code> : has a clear intention of "kill this thing" but still possible to trap at the target and handle in a customized way
; <code>TERM</code> : has a clear intention of "kill this thing" but still possible to trap at the target and handle in a customized way