FFI Overview

Foreign functions are declared with the %foreign directive, which takes the following general form:

%foreign [specifiers]
name : t

The specifier is an Idris String which says in which language the foreign function is written, what it’s called, and where to find it. There may be more than one specifier, and a code generator is free to choose any specifier it understands - or even ignore the specifiers completely and use their own approach. In general, a specifier has the form “Language:name,library”. For example, in C:

%foreign "C:puts,libc"
puts : String -> PrimIO Int

It is up to specific code generators to decide how to locate the function and the library. In this document, we will assume the default Chez Scheme code generator (the examples also work with the Racket or Gambit code generator) and that the foreign language is C.

Scheme Sidenote

Scheme foreign specifiers can be written to target particular flavors.

The following example shows a foreign declaration that allocates memory in a way specific to the choice of code generator. In this example there is no general scheme specifier present that matches every flavor, e.g. scheme:foo, so it will only match the specific flavors listed:

%foreign "scheme,chez:foreign-alloc"
         "scheme,racket:malloc"
         "C:malloc,libc"
allocMem : (bytes : Int) -> PrimIO AnyPtr

Note

If your backend (code generator) is not specified but defines a C FFI it will be able to make use of the C:malloc,libc specifier.

C Sidenote

The C language specifier is used for common functions that may be used by any backend which can, in turn, FFI out to C. For example, Scheme.

The common C functions do no automatic memory management, deferring that to the individual backends.

The standard C backend is known as “RefC”, and uses the RefC language specifier.

FFI Example

As a running example, we are going to work with a small C file. Save the following content to a file smallc.c

#include <stdio.h>

int add(int x, int y) {
    return x+y;
}

int addWithMessage(char* msg, int x, int y) {
    printf("%s: %d + %d = %d\n", msg, x, y, x+y);
    return x+y;
}

Then, compile it to a shared library with:

cc -shared smallc.c -o libsmall.so

We can now write an Idris program which calls each of these. First, we’ll write a small program which uses add to add two integers:

%foreign "C:add,libsmall"
add : Int -> Int -> Int

main : IO ()
main = printLn (add 70 24)

The %foreign declaration states that add is written in C, with the name add in the library libsmall. As long as the run time is able to locate libsmall.so (in practice it looks in the current directory and the system library paths) we can run this at the REPL:

Main> :exec main
94

Note that it is the programmer’s responsibility to make sure that the Idris function and C function have corresponding types. There is no way for the machine to check this! If you get it wrong, you will get unpredictable behaviour.

Since add has no side effects, we’ve given it a return type of Int. But what if the function has some effect on the outside world, like addWithMessage? In this case, we use PrimIO Int to say that it returns a primitive IO action:

%foreign "C:addWithMessage,libsmall"
prim__addWithMessage : String -> Int -> Int -> PrimIO Int

Internally, PrimIO Int is a function which takes the current (linear) state of the world, and returns an Int with an updated state of the world. In general, IO operations in an Idris program are defined as instances of the HasIO interface. We can convert a primitive operation to one usable in HasIO using primIO:

primIO : HasIO io => PrimIO a -> io a

So, we can extend our program as follows:

addWithMessage : HasIO io => String -> Int -> Int -> io Int
addWithMessage s x y = primIO $ prim__addWithMessage s x y

main : IO ()
main
    = do printLn (add 70 24)
         addWithMessage "Sum" 70 24
         pure ()

It is up to the programmer to declare which functions are pure, and which have side effects, via PrimIO. Executing this gives:

Main> :exec main
94
Sum: 70 + 24 = 94

We have seen two specifiers for foreign functions:

%foreign "C:add,libsmall"
%foreign "C:addWithMessage,libsmall"

These both have the same form: "C:[name],libsmall" so instead of writing the concrete String, we write a function to compute the specifier, and use that instead:

libsmall : String -> String
libsmall fn = "C:" ++ fn ++ ",libsmall"

%foreign (libsmall "add")
add : Int -> Int -> Int

%foreign (libsmall "addWithMessage")
prim__addWithMessage : String -> Int -> Int -> PrimIO Int

Primitive FFI Types

The types which can be passed to and returned from foreign functions are restricted to those which it is reasonable to assume any back end can handle. In practice, this means most primitive types, and a limited selection of others. Argument types can be any of the following primitives:

  • Int
  • Char
  • Double (as double in C)
  • Bits8
  • Bits16
  • Bits32
  • Bits64
  • String (as char* in C)
  • Ptr t and AnyPtr (both as void* in C)

Return types can be any of the above, plus:

  • ()
  • PrimIO t, where t is a valid return type other than a PrimIO.

Handling String leads to some complications, for a number of reasons:

  • Strings can have multiple encodings. In the Idris run time, Strings are encoded as UTF-8, but C makes no assumptions.
  • It is not always clear who is responsible for freeing a String allocated by a C function.
  • In C, strings can be NULL, but Idris strings always have a value.

So, when passing String to and from C, remember the following:

  • A char* returned by a C function will be copied to the Idris heap, and the Idris run time immediately calls free with the returned char*.
  • If a char* might be NULL in C, use Ptr String rather than String.

When using Ptr String, the value will be passed as a void*, and therefore not accessible directly by Idris code. This is to protect against accidentally trying to use NULL as a String. You can nevertheless work with them and convert to String via foreign functions of the following form:

char* getString(void *p) {
    return (char*)p;
}

void* mkString(char* str) {
    return (void*)str;
}

int isNullString(void* str) {
    return str == NULL;
}

For an example, see the sample Example: Minimal Readline Bindings bindings.

Additionally, foreign functions can take callbacks, and take and return C struct pointers.

Callbacks

It is often useful in C for a function to take a callback, that is a function which is called after doing some work. For example, we can write a function which takes a callback that takes a char* and an int and returns a char*, in C, as follows (added to smallc.c above):

typedef char*(*StringFn)(char*, int);

char* applyFn(char* x, int y, StringFn f) {
    printf("Applying callback to %s %d\n", x, y);
    return f(x, y);
}

Then, we can access this from Idris by declaring it as a %foreign function and wrapping it in the HasIO interface, with the C function calling the Idris function as the callback:

%foreign (libsmall "applyFn")
prim__applyFn : String -> Int -> (String -> Int -> String) -> PrimIO String

applyFn : HasIO io =>
          String -> Int -> (String -> Int -> String) -> io String
applyFn c i f = primIO $ prim__applyFn c i f

For example, we can try this as follows:

pluralise : String -> Int -> String
pluralise str x
    = show x ++ " " ++
             if x == 1
                then str
                else str ++ "s"

main : IO ()
main
    = do str1 <- applyFn "Biscuit" 10 pluralise
         putStrLn str1
         str2 <- applyFn "Tree" 1 pluralise
         putStrLn str2

As a variant, the callback could have a side effect:

%foreign (libsmall "applyFn")
prim__applyFnIO : String -> Int -> (String -> Int -> PrimIO String) ->
                 PrimIO String

This is a little more fiddly to lift to a HasIO function, due to the callback, but we can do so using toPrim : IO a -> PrimIO a:

applyFnIO : HasIO io =>
            String -> Int -> (String -> Int -> IO String) -> io String
applyFnIO c i f = primIO $ prim__applyFnIO c i (\s, i => toPrim $ f s i)

Note that the callback is explicitly in IO here, since HasIO doesn’t have a general method for extracting the primitive IO operation.

For example, we can extend the above pluralise example to print a message in the callback:

pluralise : String -> Int -> IO String
pluralise str x
    = do putStrLn "Pluralising"
         pure $ show x ++ " " ++
                if x == 1
                   then str
                   else str ++ "s"

main : IO ()
main
    = do str1 <- applyFnIO "Biscuit" 10 pluralise
         putStrLn str1
         str2 <- applyFnIO "Tree" 1 pluralise
         putStrLn str2

Structs

Many C APIs pass around more complex data structures, as a struct. We do not aim to be completely general in the C types we support, because this will make it harder to write code which is portable across multiple back ends. However, it is still often useful to be able to access a struct directly. For example, add the following to the top of smallc.c, and rebuild libsmall.so:

#include <stdlib.h>

typedef struct {
    int x;
    int y;
} point;

point* mkPoint(int x, int y) {
    point* pt = malloc(sizeof(point));
    pt->x = x;
    pt->y = y;
    return pt;
}

void freePoint(point* pt) {
    free(pt);
}

We can define a type for accessing point in Idris by importing System.FFI and using the Struct type, as follows:

Point : Type
Point = Struct "point" [("x", Int), ("y", Int)]

%foreign (libsmall "mkPoint")
mkPoint : Int -> Int -> Point

%foreign (libsmall "freePoint")
prim__freePoint : Point -> PrimIO ()

freePoint : Point -> IO ()
freePoint p = primIO $ prim__freePoint p

The Point type in Idris now corresponds to point* in C. Fields can be read and written using the following, also from System.FFI:

getField : Struct s fs -> (n : String) ->
           FieldType n ty fs => ty
setField : Struct s fs -> (n : String) ->
           FieldType n ty fs => ty -> IO ()

Notice that fields are accessed by name, and must be available in the struct, given the constraint FieldType n ty fs, which states that the field named n has type ty in the structure fields fs. So, we can display a Point as follows by accessing the fields directly:

showPoint : Point -> String
showPoint pt
    = let x : Int = getField pt "x"
          y : Int = getField pt "y" in
          show (x, y)

And, as a complete example, we can initialise, update, display and delete a Point as follows:

main : IO ()
main = do let pt = mkPoint 20 30
          setField pt "x" (the Int 40)
          putStrLn $ showPoint pt
          freePoint pt

The field types of a Struct can be any of the following:

  • Int
  • Char
  • Double (double in C)
  • Bits8
  • Bits16
  • Bits32
  • Bits64
  • Ptr a or AnyPtr (void* in C)
  • Another Struct, which is a pointer to a struct in C

Note that this doesn’t include String or function types! This is primarily because these aren’t directly supported by the Chez back end. However, you can use another pointer type and convert. For example, assuming you have, in C:

typedef struct {
    char* name;
    point* pt;
} namedpoint;

You can represent this in Idris as:

NamedPoint : Type
NamedPoint
    = Struct "namedpoint"
               [("name", Ptr String),
               ("pt", Point)]

That is, using a Ptr String instead of a String directly. Then you can convert between a void* and a char* in C:

char* getString(void *p) {
    return (char*)p;
}

…and use this to convert to a String in Idris:

%foreign (pfn "getString")
getString : Ptr String -> String

Finalisers

In some libraries, a foreign function creates a pointer and the caller is responsible for freeing it. In this case, you can make an explicit foreign call to free. However, this is not always convenient, or even possible. Instead, you can ask the Idris run-time to be responsible for freeing the pointer when it is no longer accessible, using onCollect (or its typeless variant onCollectAny) defined in the Prelude:

onCollect : Ptr t -> (Ptr t -> IO ()) -> IO (GCPtr t)
onCollectAny : AnyPtr -> (AnyPtr -> IO ()) -> IO GCAnyPtr

A GCPtr t behaves exactly like Ptr t when passed to a foreign function (and, similarly, GCAnyPtr behaves like AnyPtr). A foreign function cannot return a GCPtr however, because then we can no longer assume the pointer is completely managed by the Idris run-time.

The finaliser is called either when the garbage collector determines that the pointer is no longer accessible, or at the end of execution.

Note that finalisers might not be supported by all back ends, since they depend on the facilities offered by a specific back end’s run time system. They are certainly supported in the Chez Scheme and Racket back ends.