Elixir/Ports and external process wiring: Difference between revisions

Line 5:

My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress, using rsync.

I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.

{{Aside|text=[[w:rsync|Rsync]] is the best tool for file transfers, locally or over a network. It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage it can be trusted to handle all the edge cases.

~~<br>~~

BEAM is a fairly unique ecosystem in which the philosophy is to constantly reinvent a rounder wheel: it's common to port external dependencies into native Erlang, but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.}}

~~I was excited~~ to ~~learn how~~ to ~~interface with long-lived~~ external ~~processes—and this project offered more than~~ I ~~hoped for~~.

BEAM is a fairly unique ecosystem in which the philosophy is to constantly reinvent a rounder wheel: it's common to port external dependencies into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.}}

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]

=== Naive shelling ===

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">

Line 88:

Line 87:

[[File:Chinese typewriter 03.jpg|right|200x200px]]

On the terminal, rsync progress lines are updated in place by ~~emitting~~ a [[w:Carriage return|carriage return]] control character, <code>\r</code>, <code>0x0d</code> sometimes rendered as <code>^M</code>. The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.

On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character: <code>\r</code> or <code>0x0d</code>, sometimes rendered as <code>^M</code>. Try this command in a terminal:<syntaxhighlight lang="shell">

echo "one^Mtwo"

</syntaxhighlight>

You'll have to use <control>-v <control>-m to type a literal carriage return. Spoiler: the output should read "two" and nothing else.

The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.

[[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing.
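
The same control character shows up when rsync output is read from Elixir rather than a terminal. As a minimal sketch, assuming a made-up chunk of progress text (the sample string and variable names are illustrative, not captured from a real rsync run), splitting on the carriage return and keeping the last piece gives the most recent update:<syntaxhighlight lang="elixir">
# Made-up sample: three progress updates, each beginning with a carriage return.
chunk = "\r      1,024  10%\r      5,120  50%\r     10,240 100%"

# The last piece after splitting is the update a terminal would leave visible.
latest =
  chunk
  |> String.split("\r", trim: true)
  |> List.last()
  |> String.trim()

# Pull the percentage out of the latest update, if there is one.
percent =
  case Regex.run(~r/(\d+)%/, latest) do
    [_, digits] -> String.to_integer(digits)
    nil -> nil
  end

IO.inspect({latest, percent})
</syntaxhighlight>

A real reader would also have to cope with an update that is split across two chunks, which this sketch ignores.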

Line 96:

Line 100:

== OTP generic server ==

~~This~~ is where Erlang/OTP really starts to shine~~: our rsync library wraps the~~ Port ~~calls~~ under a gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> module ~~and this gives~~ us some ~~special~~ properties for free: a dedicated thread ~~which~~ coordinates with rsync ~~independently from~~ anything else~~, receiving~~ and ~~sending~~ asynchronous ~~messages~~. It ~~has an~~ internal state including the ~~latest percent done and this~~ can ~~be probed by calling code~~, or it can ~~be set up to push updates to a listener~~.

The Port API is convenient enough so far, but where Erlang/OTP really starts to shine is when we wrap each Port connection under a gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> module, giving us some properties for free: A dedicated thread coordinates with its rsync independent of anything else. Input and output are asynchronous and buffered, but handled sequentially in a thread-safe way. It holds internal state including the up-to-date completion percentage. And the caller can either request updates manually, or it can listen for pushed statistics.

A gen_server should be able to run under a [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree] ~~as well~~ but ~~our module has a major flaw: although it~~ can ~~correctly detect~~ and report ~~when~~ rsync ~~crashes~~ or ~~completes, when~~ our ~~gen_server is stopped by its supervisor it cannot stop its external child process in turn~~.

This gen_server should also be able to run under an [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree] but this is where the dream falls apart, for the moment. The Port can watch for rsync completion or failure and report this to its caller, but we fail at the second critical property of being able to shut down rsync if the calling code or our library module crashes.
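
To make the shape of such a wrapper concrete, here is a minimal sketch of a gen_server that owns a Port. The module name <code>PortWorker</code> and its <code>snapshot/1</code> function are hypothetical and are not the library's actual code; the sketch only remembers the last raw chunk of output and the exit status, and makes no attempt to parse rsync progress:<syntaxhighlight lang="elixir">
defmodule PortWorker do
  # Hypothetical example module: one GenServer process per external command.
  use GenServer

  def start_link(command, args \\ []) do
    GenServer.start_link(__MODULE__, {command, args})
  end

  # The caller asks for the current state on demand.
  def snapshot(pid), do: GenServer.call(pid, :snapshot)

  @impl true
  def init({command, args}) do
    case System.find_executable(command) do
      nil ->
        {:stop, {:executable_not_found, command}}

      path ->
        port = Port.open({:spawn_executable, path}, [:binary, :exit_status, args: args])
        {:ok, %{port: port, latest: nil, exit_status: nil}}
    end
  end

  @impl true
  def handle_call(:snapshot, _from, state) do
    {:reply, Map.take(state, [:latest, :exit_status]), state}
  end

  # Output from the external command arrives as ordinary messages,
  # handled one at a time by this process.
  @impl true
  def handle_info({port, {:data, data}}, %{port: port} = state) do
    {:noreply, %{state | latest: data}}
  end

  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    {:noreply, %{state | exit_status: status}}
  end

  def handle_info(_other, state), do: {:noreply, state}
end
</syntaxhighlight>

For example, <code>{:ok, pid} = PortWorker.start_link("echo", ["hello"])</code> followed a moment later by <code>PortWorker.snapshot(pid)</code> should report the echoed text and an exit status. Pushing updates to a listener instead would mean sending a message from the <code>handle_info</code> clauses.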

== Problem: runaway processes ==

[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]

~~What~~ this ~~means~~ is that rsync transfers would continue to run in the background even after Elixir had completely shut down, because the BEAM has no way of stopping the process.

The unpleasant real-world consequence of this limitation is that rsync transfers would continue to run in the background even after Elixir had completely shut down, because the BEAM has no way of stopping the process.

It might be possible to send a signal using unix "kill": the child's OS process ID can be read with <code>Port.info(port, :os_pid)</code>, but BEAM doesn't include any built-in function to send a signal, so we would have to shell out to <code>kill</code>. Clearly we're expected to do this another way. Another problem with "kill" is that we want the external process to stop no matter how badly the BEAM is damaged, so we can't rely on stored data or on making a few last calls before crashing.

To ~~check~~ whether ~~this~~ was ~~something~~ specific to rsync, I tried ~~to open a~~ Port spawning ~~the command~~ <code>sleep 60</code> and I found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open.

To eliminate variables and to understand whether the failure to stop was specific to rsync, I tried the same Port command but spawning a <code>sleep 60</code>, and I found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open. This happens to have been a lucky choice, as I learned later that "sleep" is also unusual but its behavior is much simpler to reason about.
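
The experiment can be reproduced with a short script. This is a sketch under assumptions, not the exact commands I ran: the module name <code>OrphanDemo</code> is hypothetical, the sleep is shortened to two seconds, and a marker file in the temporary directory stands in for watching the process list. If the child were stopped when its Port closes, the marker file would never appear:<syntaxhighlight lang="elixir">
defmodule OrphanDemo do
  # Hypothetical demo: does the external command keep running after Port.close/1?
  def survives_close? do
    marker =
      Path.join(System.tmp_dir!(), "port_orphan_#{System.unique_integer([:positive])}")

    sh = System.find_executable("sh")

    # The child sleeps, then creates the marker file passed to it as "$1".
    port =
      Port.open({:spawn_executable, sh}, [
        :binary,
        args: ["-c", "sleep 2; touch \"$1\"", "sh", marker]
      ])

    # Closing the Port closes the pipes but sends no signal to the child.
    Port.close(port)

    # Wait past the end of the sleep, then look for the marker.
    Process.sleep(3_000)
    survived = File.exists?(marker)
    File.rm(marker)
    survived
  end
end

IO.puts("child kept running after Port.close: #{OrphanDemo.survives_close?()}")
</syntaxhighlight>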

== Bad assumption: pipe-like processes ==

A ~~program~~ like <code>gzip</code> or <code>cat</code> will stop once it detects that ~~its~~ input has ended because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">

A pipeline program like <code>gzip</code> or <code>cat</code> is built to read from its input and write to its output. These will stop once they detect that input has ended because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">

ssize_t n_read = read (input_desc, buf, bufsize);

if (n_read < 0) { error... }

@@ Line 5: / Line 5: @@
 My exploration begins while writing a beta-quality library for Elixir to transfer files in the background and monitor progress, using rsync.
+I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.
 {{Aside|text=[[w:rsync|Rsync]] is the best tool for file transfers, locally or over a network.  It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage it can be trusted to handle all the edge cases.
-<br>
-BEAM is a fairly unique ecosystem in which the philosophy is to constantly reinvent a rounder wheel: it's common to port external dependencies into native Erlang, but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.}}
-I was excited to learn how to interface with long-lived external processes—and this project offered more than I hoped for.
+BEAM is a fairly unique ecosystem in which the philosophy is to constantly reinvent a rounder wheel: it's common to port external dependencies into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.}}
 [[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]
 === Naive shelling ===
 Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
@@ Line 88: / Line 87: @@
 [[File:Chinese typewriter 03.jpg|right|200x200px]]
-On the terminal, rsync progress lines are updated in place by emitting a [[w:Carriage return|carriage return]] control character, <code>\r</code>, <code>0x0d</code> sometimes rendered as <code>^M</code>.  The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.
+On the terminal, rsync progress lines are updated in place by beginning each line with a [[w:Carriage return|carriage return]] control character: <code>\r</code> or <code>0x0d</code>, sometimes rendered as <code>^M</code>.  Try this command in a terminal:<syntaxhighlight lang="shell">
+echo "one^Mtwo"
+</syntaxhighlight>
+You'll have to use <control>-v <control>-m to type a literal carriage return.  Spoiler: the output should read "two" and nothing else.
+The character seems to be named after pushing the physical paper carriage of a typewriter back to the beginning of the line without feeding the roller.
 [[w:Newline#Issues with different newline formats|Disagreement about carriage return]] vs. line feed has caused eye-rolling since the dawn of personal computing.
@@ Line 96: / Line 100: @@
 == OTP generic server ==
-This is where Erlang/OTP really starts to shine: our rsync library wraps the Port calls under a gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> module and this gives us some special properties for free: a dedicated thread which coordinates with rsync independently from anything else, receiving and sending asynchronous messages.  It has an internal state including the latest percent done and this can be probed by calling code, or it can be set up to push updates to a listener.
+The Port API is convenient enough so far, but where Erlang/OTP really starts to shine is when we wrap each Port connection under a gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> module, giving us some properties for free: A dedicated thread coordinates with its rsync independent of anything else.  Input and output are asynchronous and buffered, but handled sequentially in a thread-safe way.  It holds internal state including the up-to-date completion percentage.  And the caller can either request updates manually, or it can listen for pushed statistics.
-A gen_server should be able to run under a [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree] as well but our module has a major flaw: although it can correctly detect and report when rsync crashes or completes, when our gen_server is stopped by its supervisor it cannot stop its external child process in turn.
+This gen_server should also be able to run under an [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree] but this is where the dream falls apart, for the moment.  The Port can watch for rsync completion or failure and report this to its caller, but we fail at the second critical property of being able to shut down rsync if the calling code or our library module crashes.
 == Problem: runaway processes ==
 [[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
-What this means is that rsync transfers would continue to run in the background even after Elixir had completely shut down, because the BEAM has no way of stopping the process.
+The unpleasant real-world consequence of this limitation is that rsync transfers would continue to run in the background even after Elixir had completely shut down, because the BEAM has no way of stopping the process.
+It might be possible to send a signal using unix "kill": the child's OS process ID can be read with <code>Port.info(port, :os_pid)</code>, but BEAM doesn't include any built-in function to send a signal, so we would have to shell out to <code>kill</code>.  Clearly we're expected to do this another way.  Another problem with "kill" is that we want the external process to stop no matter how badly the BEAM is damaged, so we can't rely on stored data or on making a few last calls before crashing.
-To check whether this was something specific to rsync, I tried to open a Port spawning the command <code>sleep 60</code> and I found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open.
+To eliminate variables and to understand whether the failure to stop was specific to rsync, I tried the same Port command but spawning a <code>sleep 60</code>, and I found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open.  This happens to have been a lucky choice, as I learned later that "sleep" is also unusual but its behavior is much simpler to reason about.
 == Bad assumption: pipe-like processes ==
-A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
+A pipeline program like <code>gzip</code> or <code>cat</code> is built to read from its input and write to its output.  These will stop once they detect that input has ended because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
 ssize_t n_read = read (input_desc, buf, bufsize);
 if (n_read < 0) { error... }