Sunday, December 27, 2015

nginx module to enable haskell binding to nginx configuration files

Do you like haskell and nginx? I love them both, and this inspired me to write an nginx module for inlining haskell source code straight into nginx configuration files. The module is published on github as nginx-haskell-module and ships with an nginx configuration example that shows its basic usage. Let’s look at it.
user                    nobody;
worker_processes        2;

events {
    worker_connections  1024;
}

http {
    default_type        application/octet-stream;
    sendfile            on;

    haskell ghc_extra_flags '-hide-package regex-pcre';

    haskell compile /tmp/ngx_haskell.hs '

import qualified Data.Char as C
import           Text.Regex.PCRE
import           Safe

toUpper = map C.toUpper
NGX_EXPORT_S_S (toUpper)

takeN x y = take (readDef 0 y) x
NGX_EXPORT_S_SS (takeN)

NGX_EXPORT_S_S (reverse)

matches :: String -> String -> Bool
matches a b = not $ null (getAllTextMatches $ a =~ b :: [String])
NGX_EXPORT_B_SS (matches)

    ';

    server {
        listen       8010;
        server_name  main;
        error_log    /tmp/nginx-test-haskell-error.log;
        access_log   /tmp/nginx-test-haskell-access.log;

        location / {
            haskell_run toUpper $hs_a $arg_a;
            echo "toUpper ($arg_a) = $hs_a";
            if ($arg_b) {
                haskell_run takeN $hs_a $arg_a $arg_b;
                echo "takeN ($arg_a, $arg_b) = $hs_a";
            }
            if ($arg_c) {
                haskell_run reverse $hs_a $arg_c;
                echo "reverse ($arg_c) = $hs_a";
            }
            if ($arg_d) {
                haskell_run matches $hs_a $arg_d $arg_a;
                echo "matches ($arg_d, $arg_a) = $hs_a";
            }
        }
    }
}
Haskell source code is placed inside the second argument of the directive haskell compile. In this example it contains some imports, three function definitions and four special export directives that introduce the functions on the nginx configuration level (the fourth exported function, reverse, comes from Prelude and needs no definition). There are four types of export directives: NGX_EXPORT_S_S, NGX_EXPORT_S_SS, NGX_EXPORT_B_S and NGX_EXPORT_B_SS for functions of types String -> String, String -> String -> String, String -> Bool and String -> String -> Bool respectively. The code is written to the path specified in the first argument of the directive haskell compile (it must be an absolute path) and then compiled to a shared library at the very start of nginx.

Sometimes ghc requires extra options besides the defaults, and this is such a case. Because the import of Text.Regex.PCRE can be ambiguous (two haskell packages, regex-pcre and regex-pcre-builtin, provide it), ghc must know which package to use. The ghc flag -hide-package hides unwanted packages, and this example passes it via the directive haskell ghc_extra_flags.

There is another directive, haskell load, which is similar to haskell compile except that it does not require the second argument (i.e. the haskell source code). The directive tries to load the compiled shared library that corresponds to the path specified in its first argument (/tmp/ngx_haskell.so in this example). If the code argument is present but there is no compiled shared library, the latter will be first compiled and then loaded.

If the haskell code fails to compile then nginx won’t start (or won’t reload workers if the code was wrongly changed in the configuration file and nginx was sent the SIGHUP signal). Any errors will be logged.

To run the compiled haskell code in the nginx context there is another directive, haskell_run. It is allowed in server, location and location-if configuration clauses and accepts three or four arguments depending on the arity of the exported function to run, which is specified in the first argument of the directive. The second argument introduces an nginx variable where the return value of the haskell function will be saved. For example, the directive
                haskell_run takeN $hs_a $arg_a $arg_b;
introduces a new nginx variable $hs_a which will be calculated on demand as the result of running the exported haskell function takeN with arguments $arg_a and $arg_b.

Let’s do some curl tests. First of all, nginx must be built and run. Besides the haskell module, the build requires the nginx echo module. Both must be specified in --add-module options of the nginx configure script.
./configure --add-module=<path-to-nginx-echo-module> --add-module=<path-to-nginx-haskell-module>
The placeholders <path-to-nginx-... are to be replaced with real paths to the modules. After running make we start the nginx daemon.
./objs/nginx -c /home/lyokha/devel/nginx-haskell-module/nginx.conf
[1 of 1] Compiling NgxHaskellUserRuntime ( /tmp/ngx_haskell.hs, /tmp/ngx_haskell.o )
Linking /tmp/ngx_haskell.so ...
The nginx option -c specifies the location of the configuration file. The shared library /tmp/ngx_haskell.so was built upon start, as the terminal output shows. And now we are going to ask the nginx server to do some haskell calculations!
curl 'http://localhost:8010/?a=hello_world'
toUpper (hello_world) = HELLO_WORLD
curl 'http://localhost:8010/?a=hello_world&b=4'
takeN (hello_world, 4) = hell
curl 'http://localhost:8010/?a=hello_world&b=oops'
takeN (hello_world, oops) = 
curl 'http://localhost:8010/?c=intelligence'
reverse (intelligence) = ecnegilletni
curl 'http://localhost:8010/?d=intelligence&a=^i'
matches (intelligence, ^i) = 1
curl 'http://localhost:8010/?d=intelligence&a=^I'
matches (intelligence, ^I) = 0
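The string-returning handlers can also be tried outside nginx. The following is only a minimal sketch, not part of the module: it assumes nothing beyond base, so readMaybe from Text.Read stands in for readDef from the Safe package (the local readDef is a hypothetical reimplementation), and the NGX_EXPORT_* lines and the regex-based matches are left out because they need the module and a regex package.

```haskell
module Main where

import qualified Data.Char as C
import           Data.Maybe (fromMaybe)
import           Text.Read (readMaybe)

-- Stand-in for Safe.readDef (an assumption made to stay within base):
-- parse a value, falling back to a default instead of throwing.
readDef :: Read a => a -> String -> a
readDef def = fromMaybe def . readMaybe

-- String -> String: the shape expected by NGX_EXPORT_S_S
toUpper :: String -> String
toUpper = map C.toUpper

-- String -> String -> String: the shape expected by NGX_EXPORT_S_SS
takeN :: String -> String -> String
takeN x y = take (readDef 0 y) x

main :: IO ()
main = do
    putStrLn $ toUpper "hello_world"
    putStrLn $ takeN "hello_world" "4"
    putStrLn $ takeN "hello_world" "oops"   -- unparsable count falls back to 0
    putStrLn $ reverse "intelligence"
```

Run with runghc, this should print the same values that nginx returned for the toUpper, takeN and reverse requests above, including the empty line for the unparsable count.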
Hmm, I did not escape the caret inside the URL in the matches examples: it seems that curl allowed it, and anyway the calculations were correct. Let’s make some changes in the haskell code inside the configuration file, say takeN will be
takeN x y = ("Incredible! " ++) $ take (readDef 0 y) x
, reload nginx configuration
pkill -HUP nginx
and do curl again.
curl 'http://localhost:8010/?a=hello_world&b=4'
takeN (hello_world, 4) = Incredible! hell
Nice, a kind of runtime haskell reload!

Now I want to explain some details of the implementation and concerns about efficiency and exceptions. Haskell code gets compiled while nginx reads its configuration file. This means that ghc must be accessible at nginx runtime (except when using haskell load with an already compiled haskell library). Only one haskell compile (or haskell load) directive is allowed in the whole configuration. The directive haskell compile (or haskell load) must be placed before directives haskell_run and after the directive haskell ghc_extra_flags (if any). The compiled haskell library gets loaded by a dlopen() call when a worker process starts (i.e. the haskell runtime is loaded for each nginx worker process separately).

The haskell code itself is wrapped inside special haskell FFI code (it can be found in /tmp/ngx_haskell.hs in the case of the above example). The FFI code tries its best to minimize unnecessary allocations when reading arguments passed from nginx; however, for the S_S and S_SS function types it does allocate a new string for the return value, which gets promptly copied to an nginx memory pool and freed. It would have been possible to avoid allocations for return values, but then ghc would have had to know about the internal nginx string implementation, so I decided to sacrifice efficiency to avoid a dependency on nginx source code at runtime (again, this dependency would not be necessary if the special FFI interface were compiled in a separate module, but… probably I’ll do that in the future).

Another efficiency concern is that the exported haskell handlers use haskell Strings, which are simple lists of characters (and are generally considered inefficient). On the other hand they are lazy (and this often means efficient). Ok, this is a traditional haskell trade-off…

What about haskell exceptions? C code is unable to handle them, and this is bad news. However, writing pure haskell code gives a strong guarantee of a high level of exception safety (at least all IO exceptions must go). In the above example I used the function readDef from the module Safe instead of its partial counterpart read, which increased the safety level as well.
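The return-value handling described earlier for S_S functions (allocate a new string on the haskell side, let the caller copy it, then free it) can be illustrated in plain haskell. This is only a sketch under assumptions: wrapSS, hsToUpper and callLikeC are hypothetical names, the calling convention is invented for illustration, and none of it is the code that the module actually generates in /tmp/ngx_haskell.hs. A real wrapper would additionally be exposed to C with a foreign export ccall declaration, which is omitted here so that the sketch runs as an ordinary program.

```haskell
module Main where

import qualified Data.Char as C
import           Foreign.C.String (CString, newCStringLen, peekCStringLen,
                                   withCStringLen)
import           Foreign.C.Types (CInt)
import           Foreign.Marshal.Alloc (alloca, free)
import           Foreign.Ptr (Ptr)
import           Foreign.Storable (peek, poke)

-- Hypothetical wrapper for String -> String handlers: read the argument
-- from a (pointer, length) pair, run the pure function, and hand back a
-- freshly malloc'ed buffer through an out-parameter plus its length.
wrapSS :: (String -> String) -> CString -> CInt -> Ptr CString -> IO CInt
wrapSS f src len out = do
    arg      <- peekCStringLen (src, fromIntegral len)
    (buf, n) <- newCStringLen (f arg)   -- allocation for the return value
    poke out buf
    return (fromIntegral n)

-- Hypothetical C-facing entry point for a toUpper handler.
hsToUpper :: CString -> CInt -> Ptr CString -> IO CInt
hsToUpper = wrapSS (map C.toUpper)

-- Plays the role of the C caller: pass the argument, copy the result out
-- (here into a haskell String rather than a memory pool), free the buffer.
callLikeC :: String -> IO String
callLikeC s =
    withCStringLen s $ \(src, len) ->
    alloca $ \out -> do
        n   <- hsToUpper src (fromIntegral len) out
        buf <- peek out
        res <- peekCStringLen (buf, fromIntegral n)
        free buf
        return res

main :: IO ()
main = callLikeC "hello_world" >>= putStrLn
```

The caller only ever sees a pointer and a length, so the haskell side needs no knowledge of how the C side stores its strings; the price is the one extra allocation and copy per call mentioned above.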
