Draft:Elixir/bzip2-ex: Difference between revisions
rewrite |
Found another library, bzip2_decomp |
||
| Line 1: | Line 1: | ||
An adventure story of my first Erlang/Elixir library binding (NIF). | |||
''Adam Wight, Sept 2022'' | ''Adam Wight, Sept 2022'' | ||
| Line 6: | Line 6: | ||
== Problem statement == | == Problem statement == | ||
[[File:Phap Nang Ngam Nai Wannakhadi (1964, p 60).jpg|thumb|Phap Nang Ngam Nai Wannakhadi (1964, p 60). [This painting is not titled, "Picking the low-hanging fruit". -AW]]I wanted to process some large, compressed files containing Wikipedia content<ref>https://dumps.wikimedia.org/backup-index.html</ref>, which couldn't be expanded in place. The typical approach to this problem is to stream the decompressed data through the desired analysis in memory and then throw it away. | [[File:Phap Nang Ngam Nai Wannakhadi (1964, p 60).jpg|thumb|Phap Nang Ngam Nai Wannakhadi (1964, p 60). [This painting is not titled, "Picking the low-hanging fruit". -AW]]I wanted to process some large, compressed files containing Wikipedia content<ref>https://dumps.wikimedia.org/backup-index.html</ref>, which couldn't be expanded in-place. The typical approach to this problem is to stream the decompressed data through the desired analysis in memory and then throw it away. | ||
Decompression can be accomplished by piping through an external command-line tool or by reading the file using a native Elixir codec. In my case, I chose to mix these approaches by untarring with the external tar command through a Port, but | Decompression can be accomplished by piping through an external command-line tool or by reading the file using a native Elixir codec. In my case, I chose to mix these approaches by untarring with the external tar command through a Port, but writing a native bzip2 library to perform the decompression, since none existed at the time. | ||
In hindsight, it would have been much simpler to use command-line bunzip2. The native library should make it possible to use backpressure and concurrency. But mostly I just got excited about a small gap in the BEAM ecosystem and wanted to teach myself how to write an Erlang native implemented function, or NIF<ref>https://www.erlang.org/doc/apps/erts/erl_nif</ref>. | In hindsight, it would have been much simpler to use command-line bunzip2. The native library should make it possible to use backpressure and concurrency. But mostly I just got excited about a small gap in the BEAM ecosystem and wanted to teach myself how to write an Erlang native implemented function, or NIF<ref>https://www.erlang.org/doc/apps/erts/erl_nif</ref>. | ||
| Line 24: | Line 24: | ||
Here I learned the most important requirement of a NIF binding: it does its work within the BEAM's memory and process space, but it must return control to the BEAM scheduler within a very short time, ideally about 1 millisecond according to the erl_nif documentation. Low-level it is, then! | Here I learned the most important requirement of a NIF binding: it does its work within the BEAM's memory and process space, but it must return control to the BEAM scheduler within a very short time, ideally about 1 millisecond according to the erl_nif documentation. Low-level it is, then! | ||
If you want to look into yet another approach, Moosieus<ref>https://github.com/Moosieus/bzip2_decomp</ref> has written an Elixir binding for the pure-Rust bzip2-rs<ref>https://github.com/paolobarbolini/bzip2-rs</ref>. This looks good for decompression, but it decompresses the whole input in a single call rather than streaming. | |||
==Native implemented function (NIF)== | ==Native implemented function (NIF)== | ||