Elixir/Ports and external process wiring: Difference between revisions
No edit summary |
light c/e |
||
| Line 4: | Line 4: | ||
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}} | {{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}} | ||
My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress | My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress using rsync. | ||
I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for. | I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for. | ||
{{Aside|text=[[w:rsync|Rsync]] is the | {{Aside|text=<p>[[w:rsync|Rsync]] is the standard utility for file transfers, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage it can be trusted to handle any edge case.</p> | ||
<p>BEAM is a fairly unique ecosystem in which it's not considered deviant to reinvent a rounder wheel: it's common to port external dependencies into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.</p>}} | |||
BEAM is a fairly unique ecosystem in which | |||
[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]] | [[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]] | ||
| Line 19: | Line 18: | ||
System.shell("rsync -a source target") | System.shell("rsync -a source target") | ||
</syntaxhighlight> | </syntaxhighlight> | ||
This has a few shortcomings, starting with how we pass the filenames. It | This has a few shortcomings, starting with how we pass the filenames. It would be possible to pass a dynamic path using string interpolation like <code>#{source}</code> but this is risky: consider what happens if the filenames include whitespace or even special shell characters such as ";". | ||
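A minimal sketch of the risk, using a hypothetical hostile filename. It only builds the command string and prints it; nothing is passed to a shell:<syntaxhighlight lang="elixir">
# Hypothetical filename chosen to illustrate the problem; nothing is executed here.
source = "photos; rm -rf ~"

# Interpolation splices the filename straight into the command line.
command = "rsync -a #{source} target"

IO.puts(command)
# A shell would read the printed line as two commands separated by ";".
</syntaxhighlight>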
=== Safe path handling === | === Safe path handling === | ||
We turn next to <code>System.cmd</code>, which takes a raw argv and can't be fooled by special characters in the path arguments:<syntaxhighlight lang="elixir"> | |||
System.find_executable(rsync_path) | System.find_executable(rsync_path) | ||
|> System.cmd(["-a", source, target]) | |> System.cmd(["-a", source, target]) | ||
</syntaxhighlight>For a short job this | </syntaxhighlight>For a short job this is perfect, but for longer transfers our program loses control and observability, waiting indefinitely for a monolithic command to return. | ||
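The argv behavior can be checked without involving rsync at all. The following sketch assumes a Unix-like system with an <code>echo</code> executable on the path; the argument strings are made up for illustration:<syntaxhighlight lang="elixir">
# Each list element reaches the program as exactly one argument.
# No shell is involved, so ";" and runs of spaces have no special meaning.
{output, exit_status} = System.cmd("echo", ["one; two", "three  four"])

IO.inspect({output, exit_status})
</syntaxhighlight>The semicolon and the double space should come back unchanged in the output, which is the property we want for filenames.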
=== Asynchronous call and communication === | === Asynchronous call and communication === | ||
To run an external process asynchronously we | To run an external process asynchronously we reach for Elixir's low-level <code>Port.open</code>, nothing but a one-line wrapper<ref>See the [https://github.com/elixir-lang/elixir/blob/809b035dccf046b7b7b4422f42cfb6d075df71d2/lib/elixir/lib/port.ex#L232 port.ex source code]</ref> which passes its parameters directly to ERTS <code>open_port</code><ref>[https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2 Erlang <code>open_port</code> docs]</ref>. This function is tremendously flexible; here we turn a few knobs:<syntaxhighlight lang="elixir"> | ||
Port.open( | Port.open( | ||
{:spawn_executable, rsync_path}, | {:spawn_executable, rsync_path}, | ||
| Line 48: | Line 47: | ||
{{Aside|text= | {{Aside|text= | ||
'''Rsync progress reporting options''' | |||
There are a variety of ways to report progress: | |||
; <code>-v</code> : list each filename as it's transferred | |||
; <code>--info=progress2</code> : report overall progress | ; <code>--info=progress2</code> : report overall progress | ||
| Line 61: | Line 62: | ||
}} | }} | ||
Rsync outputs progress lines in a fairly self-explanatory format:<syntaxhighlight lang="text"> | We've chosen <code>--info=progress2</code> , so the meaning of the reported percentage is "overall percent complete". Rsync outputs these progress lines in a fairly self-explanatory columnar format:<syntaxhighlight lang="text"> | ||
percent complete time remaining | |||
bytes transferred | transfer speed | | |||
| | | | | |||
3,342,336 33% 3.14MB/s 0:00:02 | 3,342,336 33% 3.14MB/s 0:00:02 | ||
</syntaxhighlight> | </syntaxhighlight> | ||
| Line 77: | Line 81: | ||
{:unknown, line} | {:unknown, line} | ||
end | end | ||
</syntaxhighlight>The <code>trim</code> is lifting more than its weight here: it lets us completely ignore spacing and newline trickery—and even a leading carriage return that we can see in the rsync source code, | </syntaxhighlight>The <code>trim</code> is lifting more than its weight here: it lets us completely ignore spacing and newline trickery—and even a leading carriage return that we can see in the rsync source code,<ref>[https://github.com/RsyncProject/rsync/blob/797e17fc4a6f15e3b1756538a9f812b63942686f/progress.c#L129 rsync/progress.c] source code</ref> | ||
<syntaxhighlight lang="c"> | <syntaxhighlight lang="c"> | ||
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...); | rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...); | ||
</syntaxhighlight>Carriage return <code>\r</code> deserves a special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but its normal role is playing a control function because of how the terminal emulator responds to it. On a terminal the effect is to rewind the cursor | </syntaxhighlight>Carriage return <code>\r</code> deserves a special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but its normal role is playing a control function because of how the terminal emulator responds to it. On a terminal the effect is to rewind the cursor so that the current line can be overwritten! | ||
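To make the role of trimming concrete, here is a small self-contained sketch of a progress-line parser. The module and function names are hypothetical and this is not the rsync_ex implementation; it only assumes the four-column format shown above:<syntaxhighlight lang="elixir">
defmodule ProgressSketch do
  # Hypothetical helper: turns one rsync progress line into a tagged tuple.
  def parse(line) do
    # String.trim/1 drops the leading "\r" along with any padding and newlines.
    with [bytes_text, percent_text, speed, remaining | _] <-
           line |> String.trim() |> String.split(),
         {bytes, ""} <- bytes_text |> String.replace(",", "") |> Integer.parse(),
         {percent, "%"} <- Integer.parse(percent_text) do
      {:progress, %{bytes: bytes, percent: percent, speed: speed, remaining: remaining}}
    else
      # Anything that does not look like a progress line is passed through.
      _ -> {:unknown, line}
    end
  end
end

IO.inspect(ProgressSketch.parse("\r      3,342,336  33%    3.14MB/s    0:00:02"))
</syntaxhighlight>The carriage return needs no special case because it counts as whitespace, so it disappears in the same step as the column padding.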
A repeated theme in inter-process communication is that data and control are leaky categories. We come to the more formal control side channels later. | A repeated theme in inter-process communication is that data and control are leaky categories. We come to the more formal control side channels later. | ||
| Line 88: | Line 92: | ||
On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character, <code>\r</code>, <code>0x0d</code> sometimes rendered as <code>^M</code>. Try this command in a terminal:<syntaxhighlight lang="shell"> | On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character, <code>\r</code>, <code>0x0d</code> sometimes rendered as <code>^M</code>. Try this command in a terminal:<syntaxhighlight lang="shell"> | ||
echo " | echo "three^Mtwo" | ||
</syntaxhighlight> | </syntaxhighlight> | ||
You'll have to use <control>-v <control>-m to type a literal carriage return. Spoiler: the output should read " | You'll have to use <control>-v <control>-m to type a literal carriage return; copy-and-paste won't work. Spoiler: the output should read "twoee". | ||
The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller. | The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller. | ||
[[File:Nilgais fighting, Lakeshwari, Gwalior district, India.jpg|left|200x200px]] | |||
[[w:Newline#Issues_with_different_newline_formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing. | [[w:Newline#Issues_with_different_newline_formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing. | ||
}} | }} | ||
== OTP generic server == | == OTP generic server == | ||
The Port API is convenient enough so far, but where Erlang/OTP really starts to shine is when we wrap each Port connection under a gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> module, giving us some properties for free: A dedicated thread coordinates with its rsync independent of anything else. Input and output are asynchronous and buffered, but handled sequentially in a thread-safe way. It holds internal state including the up-to-date completion percentage. And the caller can either request updates manually, or it can listen for pushed statistics. | The Port API is convenient enough so far, but where Erlang/OTP really starts to shine is when we wrap each Port connection under a gen_server<ref>[https://www.erlang.org/doc/apps/stdlib/gen_server.html Erlang gen_server docs]</ref> module, giving us some properties for free: A dedicated thread coordinates with its rsync independent of anything else. Input and output are asynchronous and buffered, but handled sequentially in a thread-safe way. It holds internal state including the up-to-date completion percentage. And the caller can either request updates manually, or it can listen for pushed statistics. | ||
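As a rough illustration of these properties, the sketch below wraps a Port in a GenServer that remembers the latest percentage. The module name, state shape, and the simple percent regex are all assumptions made for this example, not the rsync_ex code:<syntaxhighlight lang="elixir">
defmodule PortWatcherSketch do
  use GenServer

  # Client API (hypothetical names).
  def start_link(executable, args) do
    GenServer.start_link(__MODULE__, {executable, args})
  end

  # Manual polling: ask the server for the latest known percentage.
  def percent(server), do: GenServer.call(server, :percent)

  @impl true
  def init({executable, args}) do
    # The process that opens the port becomes its owner and receives its messages.
    port = Port.open({:spawn_executable, executable}, [:binary, :exit_status, args: args])
    {:ok, %{port: port, percent: 0, exit_status: nil}}
  end

  @impl true
  def handle_call(:percent, _from, state), do: {:reply, state.percent, state}

  @impl true
  def handle_info({port, {:data, data}}, %{port: port} = state) do
    # One chunk may carry several progress updates; keep only the last one.
    case Regex.scan(~r/(\d+)%/, data) do
      [] ->
        {:noreply, state}

      matches ->
        [_whole, digits] = List.last(matches)
        {:noreply, %{state | percent: String.to_integer(digits)}}
    end
  end

  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    {:noreply, %{state | exit_status: status}}
  end
end
</syntaxhighlight>Messages from the port queue up in the server's mailbox and are handled one at a time, which is what makes the state updates safe without any locking.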
This gen_server should also be able to run under an [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree] but this is where the dream falls apart, for the moment. The Port can watch for rsync completion or failure and report this to its caller, but we fail at the second critical property of being able to shut down rsync if the calling code or our library module crashes. | This gen_server should also be able to run under an [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree] but this is where the dream falls apart, for the moment. The Port can watch for rsync completion or failure and report this to its caller, but we fail at the second critical property of being able to shut down rsync if the calling code or our library module crashes. | ||
| Line 117: | Line 120: | ||
if (n_read < 0) { error... } | if (n_read < 0) { error... } | ||
if (n_read == 0) { end of file... } | if (n_read == 0) { end of file... } | ||
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>https://man.archlinux.org/man/open.2.en#O_NONBLOCK</ref> and unix pipes<ref>https://man.archlinux.org/man/pipe.7.en</ref>. | </syntaxhighlight>The manual for read<ref>[https://man.archlinux.org/man/read.2 libc <code>read</code> docs]</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed. If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended? Does the calling process yield control until input arrives? How do we know if more than bufsize bytes are available? If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>[https://man.archlinux.org/man/open.2.en#O_NONBLOCK O_NONBLOCK docs]</ref> and unix pipes<ref>[https://man.archlinux.org/man/pipe.7.en overview of unix pipes]</ref>. | ||
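For comparison, Elixir's <code>IO.binread/2</code> surfaces the same three outcomes as distinct values rather than as magic numbers. The following is a sketch with a hypothetical module name, counting bytes from any IO device until end of file:<syntaxhighlight lang="elixir">
defmodule ReadLoopSketch do
  # Hypothetical helper: reads in chunks of up to bufsize bytes and totals them.
  def count_bytes(device, bufsize, total \\ 0) do
    case IO.binread(device, bufsize) do
      # End of file plays the role of read() returning 0.
      :eof -> {:ok, total}
      # An error tuple plays the role of a negative return value.
      {:error, reason} -> {:error, reason}
      # Otherwise we got between 1 and bufsize bytes; keep going.
      data when is_binary(data) -> count_bytes(device, bufsize, total + byte_size(data))
    end
  end
end

# StringIO stands in for a real pipe so the example runs anywhere.
{:ok, device} = StringIO.open("hello")
IO.inspect(ReadLoopSketch.count_bytes(device, 2))
</syntaxhighlight>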
But here we'll focus on how processes affect each other through pipes. Surprising answer: it doesn't affect very much! Try opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file. Oh no, you killed it! You didn't actually send anything, though—the <control>-d is interpreted by bash and it responds by closing its pipe connected to "[[w:Standard streams|standard input]]" of the child process. This is similar to how <control>-c is not sending a character but is interpreted by the terminal, trapped by the shell and forwarded as an interrupt signal to the child process, completely independently of the data pipe. My entry point to learning more is this stty webzine<ref>https://wizardzines.com/comics/stty/</ref> by Julia Evans. Go ahead and try this command, what could go wrong: <code>stty -a</code> | But here we'll focus on how processes affect each other through pipes. Surprising answer: it doesn't affect very much! Try opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file. Oh no, you killed it! You didn't actually send anything, though—the <control>-d is interpreted by bash and it responds by closing its pipe connected to "[[w:Standard streams|standard input]]" of the child process. This is similar to how <control>-c is not sending a character but is interpreted by the terminal, trapped by the shell and forwarded as an interrupt signal to the child process, completely independently of the data pipe. My entry point to learning more is this stty webzine<ref>[https://wizardzines.com/comics/stty/ ★ wizard zines ★: stty]</ref> by Julia Evans. Go ahead and try this command, what could go wrong: <code>stty -a</code> | ||
Any special behavior at the other end of a pipe is the result of intentional programming decisions and "end of file" (EOF) is more a convention than a hard reality. You could even reopen stdin from the application, to the great surprise of your friends and neighbors. For example, try opening "watch ls" or "sleep 60" and try <control>-d all you want—no effect. You did close its stdin but nobody cared, it wasn't listening to you anyway. | Any special behavior at the other end of a pipe is the result of intentional programming decisions and "end of file" (EOF) is more a convention than a hard reality. You could even reopen stdin from the application, to the great surprise of your friends and neighbors. For example, try opening "watch ls" or "sleep 60" and try <control>-d all you want—no effect. You did close its stdin but nobody cared, it wasn't listening to you anyway. | ||
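The same experiment can be run from Elixir. This sketch assumes a Unix-like system with <code>sleep</code> and <code>kill</code> executables on the path; it closes the port and then checks whether the child is still alive:<syntaxhighlight lang="elixir">
sleep_path = System.find_executable("sleep")
port = Port.open({:spawn_executable, sleep_path}, [:binary, args: ["60"]])
{:os_pid, os_pid} = Port.info(port, :os_pid)

# Closing the port closes the child's stdin, but sends it no signal.
Port.close(port)

# "kill -0" sends no signal either; it only reports whether the process exists.
{_output, status} =
  System.cmd("kill", ["-0", Integer.to_string(os_pid)], stderr_to_stdout: true)

message = if status == 0, do: "sleep is still running", else: "sleep is gone"
IO.puts(message)

# Clean up the orphan ourselves.
System.cmd("kill", [Integer.to_string(os_pid)], stderr_to_stdout: true)
</syntaxhighlight>Since <code>sleep</code> never reads its stdin, it should normally survive the port being closed.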
| Line 126: | Line 129: | ||
== Shimming can kill == | == Shimming can kill == | ||
It's possible to write a small adapter which is sensitive to stdin closing, then converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes</ref> for Elixir and the erlexec<ref>[https://hexdocs.pm/erlexec/readme.html | It's possible to write a small adapter which is sensitive to stdin closing, then converts this into a stronger signal like SIGTERM which it forwards to its own child. This is the idea behind a suggested shell script<ref>[https://hexdocs.pm/elixir/1.19.0/Port.html#module-orphan-operating-system-processes Elixir Port docs showing a shim script]</ref> for Elixir and the erlexec<ref>[https://hexdocs.pm/erlexec/readme.html <code>erlexec</code> library]</ref> library. The opposite adapter is also found in the [[w:nohup|nohup]] shell command and the grimsby<ref>[https://github.com/shortishly/grimsby <code>grimsby</code> library]</ref> library: these will keep standard in and/or standard out open for the child process even after the parent exits. | ||
I took the shim approach with my rsync library and included a small C program<ref>https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads</ref> which wraps rsync and makes it sensitive to the BEAM port_close. It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (but should have been SIGTERM, see the sidebar discussion of different signals below). | I took the shim approach with my rsync library and included a small C program<ref>[https://gitlab.com/adamwight/rsync_ex/-/blob/main/src/main.c?ref_type=heads rsync_ex C shim program]</ref> which wraps rsync and makes it sensitive to the BEAM port_close. It's featherweight, leaving pipes unchanged as it passes control to rsync—its only real effect is to convert SIGHUP to SIGKILL (but should have been SIGTERM, see the sidebar discussion of different signals below). | ||
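From the Elixir side, using a shim mostly means launching the shim as the port's executable and handing it the real command. The sketch below assumes the calling convention of the wrapper script in the Elixir Port docs, where the real program is the shim's first argument followed by that program's own arguments; the module name and parameters are hypothetical and the rsync_ex shim may be invoked differently:<syntaxhighlight lang="elixir">
defmodule ShimSketch do
  # Hypothetical helper: opens a port on the shim, which in turn runs the command.
  # Assumed convention: shim_path command_path arg1 arg2 ...
  def open(shim_path, command_path, args) do
    Port.open(
      {:spawn_executable, shim_path},
      [:binary, :exit_status, args: [command_path | args]]
    )
  end
end
</syntaxhighlight>The calling code keeps talking to an ordinary port; only the executable at the other end of the pipe changes.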
== Reliable clean up == | == Reliable clean up == | ||
| Line 134: | Line 137: | ||
It's always a pleasure to ask questions in the BEAM communities, they have earned their reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates emerging best practices which could be backported into the language itself. Everyone speaking on the problem has generally agreed that the fragile clean up of external processes is a bug, and supported the idea that some flavor of "terminate" signal should be sent to spawned programs. | It's always a pleasure to ask questions in the BEAM communities, they have earned their reputation for being friendly and open. The first big tip was to look at the third-party library [https://hexdocs.pm/erlexec/ erlexec], which demonstrates emerging best practices which could be backported into the language itself. Everyone speaking on the problem has generally agreed that the fragile clean up of external processes is a bug, and supported the idea that some flavor of "terminate" signal should be sent to spawned programs. | ||
I would be lying to hide my disappointment that the required core changes are mostly in a C program and not actually in Erlang, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes, using stdlib read, write, and select<ref>https://man.archlinux.org/man/select.2.en</ref>. | I would be lying to hide my disappointment that the required core changes are mostly in a C program and not actually in Erlang, but it was still fascinating to open such an elegant black box and find the technological equivalent of a steam engine inside. All of the futuristic, high-level features we've come to know actually map closely to a few scraps of wizardry with ordinary pipes, using stdlib read, write, and select<ref>[https://man.archlinux.org/man/select.2.en libc <code>select</code> docs]</ref>. | ||
Port drivers<ref>https://www.erlang.org/doc/system/ports.html</ref> are fundamental to ERTS and external processes are launched through several levels of wiring: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children. | Port drivers<ref>[https://www.erlang.org/doc/system/ports.html Erlang ports docs]</ref> are fundamental to ERTS and external processes are launched through several levels of wiring: the spawn driver starts a forker driver which sends a control message to <code>erl_child_setup</code> to execute your external command. Each BEAM has a single erl_child_setup process to watch over all children. | ||
Letting a child process outlive the one that spawned it leaves it in a state called an "orphaned process" in POSIX, and the standard recommends that when this happens the process should be adopted by the top-level system process "init" if it exists. This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands. But init will know nothing about adopted orphan processes or how to monitor and restart them. | Letting a child process outlive the one that spawned it leaves it in a state called an "orphaned process" in POSIX, and the standard recommends that when this happens the process should be adopted by the top-level system process "init" if it exists. This can be seen as undesirable because unix itself has a paradigm similar to OTP's Supervisors, in which each parent is responsible for its children. Without supervision, a process could potentially run forever or do naughty things. The system <code>init</code> process starts and tracks its own children, and can restart them in response to service commands. But init will know nothing about adopted orphan processes or how to monitor and restart them. | ||