Elixir/Ports and external process wiring


== Context: controlling "rsync" ==

This exploration began with writing a library<ref>https://hexdocs.pm/rsync/Rsync.html</ref> to run rsync in order to transfer files in a background thread and monitor progress. I hoped to learn how to interface with long-lived external processes, and I got more than I wished for.

Starting rsync would be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a src target")
</syntaxhighlight>

This has a few shortcomings. Filename escaping is hard to do safely, so <code>System.cmd</code> should be used instead. The call also blocks until the transfer is done, so we get no feedback until completion. Ending the shell command with an ampersand <code>&</code> is not enough either, so the caller would have to start a new thread manually.
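Here is a minimal sketch of the escaping and blocking fixes, using only the standard library. <code>System.cmd/3</code> takes the arguments as a list, so nothing passes through a shell. Wrapping the call in a <code>Task</code> moves the waiting off the caller. The module name <code>ShellFreeRsync</code> and the <code>exe</code> parameter are made up for illustration. The <code>exe</code> parameter is handy for trying the code out with <code>echo</code>.
<syntaxhighlight lang="elixir">
defmodule ShellFreeRsync do
  # Hypothetical helper module for illustration; not the rsync library's API.

  # Each list element reaches the program as one argv entry, so filenames
  # containing spaces or quotes need no shell escaping.
  def args(src, target), do: ["-a", src, target]

  # Blocks until the program exits, then returns {output, exit_status}.
  def run(src, target, exe \\ "rsync") do
    System.cmd(exe, args(src, target), stderr_to_stdout: true)
  end

  # Runs the same blocking call in a separate Elixir process so the caller
  # is free in the meantime; the result still only arrives at the end.
  def run_async(src, target, exe \\ "rsync") do
    Task.async(fn -> run(src, target, exe) end)
  end
end
</syntaxhighlight>
The caller gets its process back, but <code>Task.await/2</code> still only delivers the output after rsync exits, so there is no progress feedback along the way. Its default timeout is also 5 seconds, so a real transfer would need a longer one.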

Elixir's low-level <code>Port</code> call maps directly to the base Erlang open_port<ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> and gives much more flexibility:<syntaxhighlight lang="elixir">
Port.open(
   {:spawn_executable, rsync_path},
   # ...
   ]
)
</syntaxhighlight>
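Here is a minimal sketch of driving a port by hand. It uses an option list of my own choosing (<code>:binary</code>, <code>:exit_status</code>, <code>:stderr_to_stdout</code> and <code>args:</code>), not necessarily the one the rsync library passes. <code>PortSketch</code> is a hypothetical name. Output arrives as <code>{port, {:data, chunk}}</code> messages, and a <code>{port, {:exit_status, code}}</code> message follows when the program exits.
<syntaxhighlight lang="elixir">
defmodule PortSketch do
  # Hypothetical helper; the option list is an assumption for illustration.

  # Spawns `exe` directly (no shell involved) and gathers every chunk of
  # output until the program exits.
  def run(exe, args) do
    path = System.find_executable(exe) || raise "#{exe} not found in PATH"

    port =
      Port.open({:spawn_executable, path}, [
        # deliver output as binaries rather than charlists
        :binary,
        # send {port, {:exit_status, code}} when the program exits
        :exit_status,
        :stderr_to_stdout,
        args: args
      ])

    collect(port, [])
  end

  # Each chunk arrives as an ordinary message in the calling process' mailbox.
  defp collect(port, chunks) do
    receive do
      {^port, {:data, data}} ->
        collect(port, [data | chunks])

      {^port, {:exit_status, status}} ->
        {chunks |> Enum.reverse() |> IO.iodata_to_binary(), status}
    end
  end
end
</syntaxhighlight>
Nothing forces the loop to wait for the exit status, though. Each chunk could be handled the moment it arrives, and that is what makes progress reporting possible.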

{{Aside|text=

If you're here for rsync itself, it offers a few options for progress reporting:
; <code>--info=progress2</code> : reports overall progress
; <code>--progress</code> : reports statistics per file
; <code>--itemize-changes</code> : lists the operations taken on each file

The progress output uses a simple columnar format that is easy to parse:
<syntaxhighlight lang="text">
3,342,336  33%    3.14MB/s    0:00:02
</syntaxhighlight>

}}
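Parsing that line takes one regular expression. Here is a minimal sketch that assumes the four-column layout above: bytes so far, percentage, rate, and an h:mm:ss time. It ignores anything after the time column. <code>ProgressLine</code> and its map keys are hypothetical names.
<syntaxhighlight lang="elixir">
defmodule ProgressLine do
  # Hypothetical parser for the columnar progress format; returns a map for
  # a progress line and nil for any other output.
  def parse(line) do
    case Regex.run(~r/^\s*([\d,]+)\s+(\d+)%\s+(\S+)\s+(\d+):(\d\d):(\d\d)/, line) do
      [_, bytes, percent, rate, h, m, s] ->
        %{
          # "3,342,336" -> 3342336
          bytes: bytes |> String.replace(",", "") |> String.to_integer(),
          percent: String.to_integer(percent),
          # kept as text, e.g. "3.14MB/s"
          rate: rate,
          # the h:mm:ss column converted to seconds
          seconds:
            String.to_integer(h) * 3600 + String.to_integer(m) * 60 +
              String.to_integer(s)
        }

      nil ->
        nil
    end
  end
end
</syntaxhighlight>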

{{Aside|text=

On the terminal the progress line is updated in place by restarting the line with the fun [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code>. The name apparently comes from pushing back the physical paper carriage of a typewriter. On a terminal it moves the cursor back to the start of the line so the line can be written again! But over a pipe we see it as a regular byte in the stream, like "<code>-old line-^M-new line-</code>". Disagreements about carriage return vs. newline have caused eye-rolling since the dawn of personal computing. Still, a look at the rsync source code confirms that it formats progress output with a carriage return on any platform: <syntaxhighlight lang="c">

rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);

</syntaxhighlight>

}}
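A port delivers output in arbitrary chunks, so a progress update may also be cut in half between two <code>:data</code> messages. Here is a minimal sketch of a buffer that treats either <code>\r</code> or <code>\n</code> as a line terminator and carries the unfinished tail over to the next chunk. It assumes the port was opened in plain <code>:binary</code> mode. <code>CrSplitter</code> is a hypothetical name.
<syntaxhighlight lang="elixir">
defmodule CrSplitter do
  # Hypothetical helper: progress updates end in "\r" rather than "\n",
  # so either byte counts as a line terminator.

  # Appends a new chunk to the leftover buffer and returns
  # {complete_lines, unfinished_tail}.
  def feed(buffer, chunk) do
    parts = String.split(buffer <> chunk, ["\r", "\n"])
    # The last element is whatever follows the final terminator (possibly "");
    # keep it for the next chunk.
    {complete, [rest]} = Enum.split(parts, -1)
    # Drop empty strings produced by "\r\n" pairs or repeated terminators.
    {Enum.reject(complete, &(&1 == "")), rest}
  end
end
</syntaxhighlight>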

This is where Erlang/OTP really starts to shine. By opening the port inside a dedicated gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> we get a separate Erlang process communicating with rsync. That process receives an asynchronous message like <code>{:data, text_line}</code> for each progress line. It's easy to parse the line, update some internal state, and optionally send a progress summary to the code calling the library.
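Here is a minimal sketch of such a gen_server. It assumes a port in <code>:binary</code> mode, only extracts the percentage column, and reports to a caller-supplied pid. <code>ProgressServer</code>, <code>start_link/3</code> and the <code>{:progress, percent}</code> and <code>{:done, status}</code> messages are hypothetical, not the rsync library's API.
<syntaxhighlight lang="elixir">
defmodule ProgressServer do
  # Hypothetical sketch of the gen_server pattern; not the rsync library's
  # actual module. Assumes progress updates terminated by "\r".
  use GenServer

  # `notify` is the pid that receives {:progress, percent} and {:done, status}.
  def start_link(exe, args, notify) do
    GenServer.start_link(__MODULE__, {exe, args, notify})
  end

  @impl true
  def init({exe, args, notify}) do
    path = System.find_executable(exe) || raise "#{exe} not found in PATH"

    port =
      Port.open({:spawn_executable, path}, [
        :binary,
        :exit_status,
        :stderr_to_stdout,
        args: args
      ])

    {:ok, %{port: port, notify: notify, buffer: "", percent: nil}}
  end

  # One message per chunk of output: split on "\r" or "\n" and keep the
  # unfinished tail in the buffer for the next chunk.
  @impl true
  def handle_info({port, {:data, data}}, %{port: port} = state) do
    parts = String.split(state.buffer <> data, ["\r", "\n"])
    {lines, [rest]} = Enum.split(parts, -1)
    percent = Enum.reduce(lines, state.percent, &latest_percent/2)

    # Only forward a summary when the percentage actually changed.
    if percent != state.percent, do: send(state.notify, {:progress, percent})
    {:noreply, %{state | buffer: rest, percent: percent}}
  end

  # The port reports the exit code once the external program is done.
  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    send(state.notify, {:done, status})
    {:stop, :normal, state}
  end

  # Keeps the previous value for lines without an "NN%" column.
  defp latest_percent(line, previous) do
    case Regex.run(~r/(\d+)%/, line) do
      [_, percent] -> String.to_integer(percent)
      nil -> previous
    end
  end
end
</syntaxhighlight>
The parsing and bookkeeping happen in the server process. The caller's mailbox only sees the occasional summary message, and the caller is free to do other work in the meantime.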

== Problem: runaway processes ==
