Elixir/Ports and external process wiring: Difference between revisions

Adamw (talk | contribs)
Adamw (talk | contribs)
c/e and move out some asides
Line 2: Line 2:


== Context: controlling "rsync" ==
== Context: controlling "rsync" ==
This exploration began when I wrote a simple library to run rsync from an Elixir program<ref>https://hexdocs.pm/rsync/Rsync.html</ref>, to transfer files in a background thread while monitoring progress.  I was hoping to learn how to interface with long-lived external processes, and I ended up learning more than I wished for.
This exploration began with writing a library<ref>https://hexdocs.pm/rsync/Rsync.html</ref> to run rsync in order to transfer files in a background thread and monitor progress.  I hoped to learn how to interface with long-lived external processes, and I got more than I wished for.


Starting rsync and reading from it went very well, mostly thanks to the <code>--info=progress2</code> option which reports progress with a simple columnar format that can be easily parsed:<syntaxhighlight lang="text">
Starting rsync would be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
      3,342,336  33%    3.14MB/s    0:00:02
System.shell("rsync -a src target")
</syntaxhighlight>In case you're here to integrate with rsync, there's also a slightly different <code>--progress</code> option which reports statistics per file, and an option <code>--itemize-changes</code> which can be included to get information about the operations taken on each file, but in my case I care more about the overall transfer progress.
 
On the terminal the progress line is updated in-place by restarting the line with the fun [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code>.  This is apparently named after pushing the physical paper carriage of a typewriter and on a terminal it will erases the current line so it can be written again!  But over a pipe we see this as a regular byte in the stream, like "<code>-old line-^M-new line-</code>".  [[W:|Disagreements]] about carriage return vs. newline have caused eye-rolling since the dawn of personal computing but we can double-check the rsync source code and we see that it will format output using carriage return on any platform: <syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>
</syntaxhighlight>
This has a few shortcomings: filename escaping is hard to do safely so <code>System.cmd</code> should be used instead, and the job would block until the transfer is done so we get no feedback until completion.  Ending the shell command in an ampersand <code>&</code> is not enough, so the caller would have to manually start a new thread.


My library starts rsync using Elixir's low-level <code>Port</code> call, which maps directly to the base Erlang open_port<ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> implementation:<syntaxhighlight lang="elixir">
Elixir's low-level <code>Port</code> call maps directly to the base Erlang open_port<ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> and it gives much more flexibility:<syntaxhighlight lang="elixir">
Port.open(
Port.open(
   {:spawn_executable, rsync_path},
   {:spawn_executable, rsync_path},
Line 29: Line 26:
   ]
   ]
)
)
</syntaxhighlight>This is where Erlang/OTP really starts to shine: by opening the port inside of a dedicated gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> we have a separate thread communicating with rsync, which receives an asynchronous message like <code>{:data, text_line}</code> for each progress line.  It's easy to parse the line, update some internal state and optionally send a progress summary to the code calling the library.
</syntaxhighlight>
 
{{Aside|text=
If you're here for rsync, it includes a few alternatives for progress reporting:
 
; <code>--info=progress2</code> : reports overall progress
; <code>--progress</code> : reports statistics per file
; <code>--itemize-changes</code> ; lists the operations taken on each file
 
Progress reporting uses a columnar format:
<syntaxhighlight lang="text">
      3,342,336  33%    3.14MB/s    0:00:02
</syntaxhighlight>
}}
 
{{Aside|text=
On the terminal the progress line is updated in-place by restarting the line with the fun [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code>.  This is apparently named after pushing the physical paper carriage of a typewriter and on a terminal it will erases the current line so it can be written again!  But over a pipe we see this as a regular byte in the stream, like "<code>-old line-^M-new line-</code>".  [[W:|Disagreements]] about carriage return vs. newline have caused eye-rolling since the dawn of personal computing but we can double-check the rsync source code and we see that it will format output using carriage return on any platform: <syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>
}}
 
This is where Erlang/OTP really starts to shine: by opening the port inside of a dedicated gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> we have a separate thread communicating with rsync, which receives an asynchronous message like <code>{:data, text_line}</code> for each progress line.  It's easy to parse the line, update some internal state and optionally send a progress summary to the code calling the library.


== Problem: runaway processes ==
== Problem: runaway processes ==