Thursday 18 February 2010

A small Haskell / Objective-C Interface


In this post I will present a small Haskell typeclass OBJC for interfacing with Objective-C. Instead of implementing an interface via proxy types, OBJC will provide two functions toId and fromId. These can be used to transfer native Objective-C values to corresponding native Haskell value. It means for example converting a NSString to a Haskell String value or a NSArray to a Haskell list. The conversion via OBJC also works for more complex data structures, like e.g. [(Int, String)].

The use case for this typeclass would be to define a Haskell model implementation in a Cocoa Model-View-Controller application. This way the communication between the Haskell model and the Objective-C controller can be made easier and the programmer is able to define the model directly in terms of Haskell types.

For testing I also provide a simple application that shows how to use the OBJC typeclass in a Cocoa application.

Introduction

In the last post I described a technique to build a simple Cocoa application that interacts with Haskell. In this Cocoa application (Apple’s famous Currency Converter tutorial) all that was passed were plain scalar values, namely the double values for exchange rate, dollar amount, and the result. Passing C scalar values is made easy, because they are handled by Haskell’s Foreign Function Interface out of the box. The FFI also helps us to pass stable opaque references of Haskell values back to C by providing the StablePtr type.

In this post we will extend this simple interface to more data types. We will get a corresponding type to StablePtr for storing Objective-C objects in Haskell values in a stable way. Additionally we will create instances of OBJC for basic data types, like numbers, strings, and arrays. The template for this typeclass was the JSON typeclass that is described in chapter 6 of Real World Haskell.

This approach focusses on providing help for defining Haskell models for Cocoa applications. The rest of the application will still be implemented in Objective-C. For a complete implementation of a Cocoa application in Haskell HOC would be a good choice.

For the development of the OBJC typeclass we will switch back to reference counting on the Objective-C side. Dealing with two garbage collectors in two runtimes is hard. One of these runtimes might be heavily threaded because of Grand Central Dispatch; the other one uses lazy evaluation as default. Not exactly the kind of environment you want to play with if you are just starting. It’s not easy to be sure: is this object still reachable? When will it be evaluated and will it still be there? The ideal recipe for some headaches! So therefore it’s best to eliminate at least one cause for potential problems and switch back to reference-counting (retain/release) on the Objective-C side. Maybe I am too cautious, maybe it all works fine using the Objective-C GC, but for a start we will just take the simple approach.

The OBJC typeclass

The basic Objective-C class is NSObject and a pointer to a NSObject instance is of type id. We would like to pass NSObjects to Haskell functions as arguments or to return them as a result of a function. On the Haskell side it would be nice if basic types, like e.g. arrays or strings, could be automatically converted to the corresponding Objective-C types and vice versa.

For tasks like this it is best to define a typeclass in Haskell:

class OBJC a where
    fromId :: Id -> IOOBJC a
    toId   :: a  -> IOOBJC Id

This means that an instance a of the typeclass OBJC will provide a function fromId that converts Objective-C ids to a and a function toId that does the backward conversion, providing an Objective-C object for an a value.

The Objective-C pointers are represented in Haskell by the type Id:

data ObjcObject
type Id = Ptr ObjcObject    -- typed pointer for all NSObjects 

This is a typed pointer and can never be dereferenced.1 The type ObjcObject is an empty data declaration: there is no way in Haskell to construct a value of the type ObjcObject, which is ok as Objective-C objects are constructed by the Objective-C runtime. In order to use empty data declarations we have to switch on the GHC extension EmptyDataDecls.

Now a closer look at the result types of the typeclass functions: toId will provide a new Id for a given a value. This new Id will be an Objective-C value and means, that we will have to call the Objective-C runtime and maybe construct a new object. This clearly implies, that we have to do some IO and therefore this function will run in the IO monad, meaning that calling this functions will have side effects, namely the construction of a new Objective-C object. Just what we wanted.

The other function fromId will also call the Objective-C runtime and as Objective-C objects are not immutable per se calling the function fromId might give different results even if we provide the same id as an argument (for example, think of NSMutableArray). So this function has to be run in the IO monad too.

But the conversion from an Id value to an a value might result in an error. This is because we are doing a conversion from a weakly typed language to a strongly typed one.2 We have to deal with this. Therefore the function fromId runs in the IOOBJC monad, which is the IO monad plus error handling. It is defined as:

type OBJCError = String
type IOOBJC = ErrorT OBJCError IO

Using the ErrorT monad transformer. Inside this monad we can throw an error by the throwError action in case something goes wrong.3 Although it might strictly be not necessary4, we will use the IOOBJC monad for the toId function also, because this way the resulting code involving both fromId and toId will look nicer.

Opaque Values — The basic instance StableId

The first instance of the OBJC typeclass will be StableId. This is the counterpart of the StablePtr from the FFI. Where the StablePtr provides a way to store an opaque reference to a Haskell value in Objective-C and making sure that the referenced value will not be garbage collected, the StableId will provide a way to store an opaque Objective-C object in Haskell. It will make sure, that this object will not be dealloced by Objective-C as long as Haskell holds a reference to this object.

In order to deal with foreign-managed objects we have to use the ForeignPtr data type as the documentation explains:

The type ForeignPtr represents references to objects that are maintained in a foreign language, i.e., that are not part of the data structures usually managed by the Haskell storage manager. The essential difference between ForeignPtrs and vanilla memory references of type Ptr a is that the former may be associated with finalizers.

Therefore we define:

newtype StableId = StableId {
      foreignPtr :: ForeignPtr ObjcObject
    }

This type shall be an instance of the OBJC typeclass:

instance OBJC StableId where
    toId x = liftIO $ 
             withForeignPtr (foreignPtr x) $ 
                \ptr -> c_retainId ptr >>= c_autoreleaseId

    fromId ptr = liftIO $ 
                 do x <- c_retainId ptr >>= newForeignPtr c_FunPtr_releaseId 
                    return $ StableId x

In fromId we create a new ForeignPtr from a given Id. This is done by x <- c_retainId ptr >>= newForeignPtr c_FunPtr_releaseId: first the Id value is retained by c_retainId and then a new ForeignPtr is created with an associated finalizer c_FunPtr_releaseId. This means, first Haskell increases the retain count of the transferred object such that the Objective-C runtime wont dealloc the object as long as Haskell holds a reference to it via the ForeignPtr. This object is then released —once the Haskell runtime has no longer a reference to it— using the mechanism of the ForeignPtr and the provided finalizer c_FunPtr_releaseId. These functions are defined in Objective-C as:

void releaseId(id object)
{
    // release can trigger dealloc, which might need an autorelease pool
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    [object release];
    [pool release];
}

id retainId(id object)
{
    return [object retain];
}

id autoreleaseId(id object)
{
    return [object autorelease];
}

The toId function is similar: we just retrieve the stored pointer to the Objective-C object and return it to the Objective-C runtime. Before that, we make sure that the returned object will not be dealloced by sending a retain and autorelease message to it via the Objective-C runtime.

The StableId instance of OBJC can deal with every sub-class of NSObject. It provides an opaque storage for objects of these Objective-C classes just like the StablePtr on the C side.

Numbers

For dealing with C scalar number types like double or int the FFI provides the necessary mechanisms. But the C number types are not first-class values in Objective-C. So, for example, you cannot store them in a NSArray.5 The first-class wrapper around C numbers is NSNumber. In the following we will construct instances for Int and Double of OBJC that can convert to and from NSNumber values.

The definition of the numeric instances for OBJC is quite straightforward: first we define a helper function that checks if a given Id represents a NSNumber object and if so calls a function f :: (OBJC a) => (Id -> IO a) to do the real conversion of this NSNumber to a numeric value in Haskell:

checkIfNSNumber :: (OBJC a) => (Id -> IO a) -> Id -> IOOBJC a
checkIfNSNumber f ptr = do isNSNumber <- liftIO $ c_isNSNumber ptr
                           case (fromIntegral isNSNumber) of
                             0         -> throwError "not a NSNumber value"
                             otherwise -> liftIO $ f ptr

The C helper function is given as:

int isNSNumber(id object)
{
    if ([object isKindOfClass:[NSNumber class]]) 
        return 1;
    else 
        return 0;
}

We are passing back int values instead of BOOL, because the FFI does not provide a wrapper for boolean values.

With this we can now define the OBJC instances for Double and Int:

instance OBJC Double where
    toId = checkNullPtr "Could not create NSNumber" . 
           c_numberWithDouble . realToFrac

    fromId = checkIfNSNumber $ liftM realToFrac . c_doubleValue

instance OBJC Int where
    toId = checkNullPtr "Could not create NSNumber" . 
           c_numberWithLong . fromIntegral

    fromId = checkIfNSNumber $ liftM fromIntegral . c_longValue

The function checkNullPtr is a simple function to lift an IO action to the IOOBJC monad. It will throw an error if a nullPtr is passed. This function is given as:

checkNullPtr :: String -> IO Id -> IOOBJC Id
checkNullPtr msg act = do ptrId <- liftIO act
                          if ptrId == nullPtr
                             then throwError msg
                             else return ptrId

Additionally we have the C conversion functions, here are the functions for Double:

double doubleValue(NSNumber *aNumber)
{
    return [aNumber doubleValue];
}

NSNumber *numberWithDouble(double aDouble)
{
    return [NSNumber numberWithDouble:aDouble];
}

This way basically all numeric values can be made into instances of OBJC.6

Strings

In Haskell we have the luxury of being provided with unicode strings. Same goes for Objective-C’s NSString class. The only problem is that we have to interface via C’s old ASCII strings. But this can be dealt with by using the nice Data.Text7 package in combination with Data.ByteString.

So in order to define an OBJC instance for String8 we proceed by “wishful thinking” and assume, that we already have an instance for Data.Text:

instance OBJC String where
    -- via Text
    toId = toId . T.pack
    fromId x = return . T.unpack =<< fromId x

T.pack and T.unpack are functions from Data.Text (imported qualified as T) that construct T.Text values from Strings and vice versa.

In toId = toId . T.pack the function on the left-hand side has the type String -> IOOBJC Id and toId on the right-hand side has the type T.Text -> IOOBJC Id. These a two different functions and the Haskell compiler knows what function to take just by inferring the used types.

We now define the instance for T.Text once again by wishful thinking:

instance OBJC T.Text where
    -- via ByteString
    toId = toId . encodeUtf8
    fromId x = return . decodeUtf8 =<< fromId x

The functions encodeUtf8 and decodeUtf8 are from Data.Text.Encoding and provide the translation from and to ByteStrings.

Finally there comes the point where we really have to deal with the conversion to Objective-C types. For this the ByteString module provides the necessary functions:

instance OBJC BS.ByteString where
    toId x = checkNullPtr "Could not create NSString" $ 
                            BS.useAsCString x c_utf8ToNSString

    fromId x = do ptr <- liftIO $ c_nsStringToUtf8 x
                  if ptr == nullPtr
                    then throwError "not a NSString value"
                    else liftIO $ BS.packCString ptr

For this we also need the corresponding C helper functions:

const char *nsStringToUtf8(NSString *str)
// returns a CString of UTF8 chars for the NSString str.
// if str is not a NSString, it will pass back NULL
{
    // it is quite time consuming to test at runtime if `str` really is a NSString.
    // so let's be optimistic and deal with an exception by passing back the null pointer.
    const char *p;
    @try {
        // according to the documentation of NSString's UTF8String the resulting
        // char array will be freed by the autorelease pool.
        p = [str UTF8String];
    }
    @catch (NSException * e) {
        p = NULL;
    }

    return p;
}

NSString *utf8ToNSString(const char* cstr)
{
    NSString *res;
    @try {
        res = [NSString stringWithUTF8String:cstr];
    }
    @catch (NSException * e) {
        res = NULL;
    }

    return res;
}

This way we can now construct three different types from a NSString: ByteString, Text, and String.

Arrays Lists

A more interesting data type is the array or list. (I know, arrays are not lists, but for the sake of this article I will use them both as representation of some abstract container type. For all that matters now, we can convert NSArrays to Haskell’s lists.)

An array contains other values, maybe even other arrays. So we cannot say in advance how deep the conversion has to be. But in principle we have to trigger the conversion of the complete nested structure contained in the array.

So the conversion of a container consists of two steps:

  1. Convert the original container value.

  2. (Recursively) Convert all contained values.

This can be done in Haskell quite nicely:

instance (OBJC a) => OBJC [a] where
    toId  xs = mapM toId xs >>= toNSArray'
    fromId x = fromNSArray' x >>= mapM fromId 

In fromId the function fromNSArray' has the type Id -> IOOBJC [Id]. It converts the original container NSArray to the Haskell list [Id] that contains the untreated Objective-C values of type Id. These values are then converted to Haskell values by mapM fromId. In the case of toId we have to execute the steps the other way around: first we have to convert all the values of a list (OBJC a) => [a] to [Id] which is done inside the IOOBJC monad by mapM toId and then convert the Haskell list to an NSArray by the function toNSArray' :: [Id] -> IOBJC Id.

These helper function toNSArray' and fromNSArray' are defined as local functions and we get the final implementation:

instance (OBJC a) => OBJC [a] where
    toId xs = mapM toId xs >>= toNSArray'
        where
          toNSArray' :: [Id] -> IOOBJC Id
          toNSArray' x = checkNullPtr "Could not create NSArray" $ 
                              withArrayLen x $ \len ptr -> 
                                  c_arrayWithCArray ptr (fromIntegral len)

    fromId x = fromNSArray' x >>= mapM fromId 
        where
          fromNSArray' :: Id -> IOOBJC [Id]
          fromNSArray' x = do ptr <- liftIO $ c_getObjects x
                              if ptr == nullPtr
                                then throwError "not a NSArray"
                                else liftIO $ do 
                                  len <- c_len x
                                  res <- peekArray (fromIntegral len) ptr
                                  free ptr
                                  return res

The restriction (OBJC a) => OBJC [a] says, that we can convert only lists that contain values of another OBJC instance. But this really is not that much of a restriction as we have the OBJC instance StableId that can be used for every NSObject value.

For the conversion of lists to C arrays we use the Foreign.Marshal.Array functions withArrayLen and peekArray. Finally here are the missing C helper functions:

NSArray *arrayWithCArray(id *objects, NSUInteger count)
{
    return [NSArray arrayWithObjects:objects count:count];
}

NSUInteger lengthOfArray(NSArray *anArray)
{
    NSUInteger len;
    @try {
        len = [anArray count];
    }
    @catch (NSException * e) {
        len = 0;
    }
    return len;
}

id *getObjects(NSArray *anArray)
{
    id (*objects);
    @try {
        NSRange range = NSMakeRange(0, [anArray count]);
        objects = malloc(sizeof(id) * range.length);
        [anArray getObjects:objects range:range];
    }
    @catch (NSException * e) {
        objects = NULL;
    }
    return objects;
}

The nice thing now is that we can trigger the depth of the conversion just by the type of the resulting OBJC instance! E.g. if we have a large NSArray of NSArrays of NSStrings we might not be interested in the individual strings on the Haskell side (we might just want to pass them back to Objective-C anyway). In this case we would use a conversion like fromId :: IOOBJC [StableId] and take, say, only the first element. But if we on the other hand really need all those strings nested in the structure we might use a conversion like fromId :: IOOBJC [[String]]. All just by specifying the result type.

Tuples

We now have the conversions for the basic data structures. The conversions for other data structures like e.g. NSDictionary can easily be defined along these lines. But as a last example we will look at the conversion for a Haskell tuple type, because this is kind of interesting.

Objective-C does not know tuples, in some sense NSArrays are tuples also. In Haskell one difference between tuples and lists is that tuples can contain values of different type. So if we want to define an instance of OBJC for a two-tuple we have to allow for that. This is expressed by the definition:

instance (OBJC a, OBJC b) => OBJC (a, b) where
    undefined

Here the type variables a and b might refer to different instances of OBJC. This means that types like (String, Double) would be an instance of OBJC, but also (String, String).

In order to make the definition of the instance for the two-tuple easy we would like to reuse old code. It would be nice if we could use the Haskell list to NSArray conversion as we finally want to convert the tuple to a NSArray anyway. But in Haskell a list can only contain values of the same type. So to make that work we will use StableId as a middleman and get:

instance (OBJC a, OBJC b) => OBJC (a, b) where
    -- via list and StableId
    toId (a, b) = do -- wrap arguments into opaque StableId, so that we can use them in a list
                     aStId <- fromId =<< toId a :: IOOBJC StableId
                     bStId <- fromId =<< toId b :: IOOBJC StableId
                     toId [aStId, bStId]

    fromId x = do ys <- fromId x :: IOOBJC [StableId]
                  case ys of
                      (aStId:bStId:[]) -> do a <- fromId =<< toId aStId
                                             b <- fromId =<< toId bStId
                                             return (a, b)
                      otherwise        -> throwError "Wrong number of arguments for (,)"

Implementations for longer tuples can be done the same way.

An example Cocoa application

For testing the OBJC typeclass I created a short Cocoa application. Its source is available as download.9 The mechanism how to use Haskell in an Objective-C application is described in my last blog post.

In this application mainly pure Haskell functions are tested. These are wrapped with the helper functions

toCocoa :: (OBJC a, OBJC b) => (a -> b) -> Id -> IO Id
toCocoa f anId = catchOBJC $ toId . f =<< fromId anId

catchOBJC :: IOOBJC Id -> IO Id
catchOBJC act = do eth <- runErrorT act
                   case eth of
                      Left err -> do nsLog $ "(Haskell) OBJC error: " ++ err
                                     return nullPtr
                      Right  y -> return y

With toCocoa the example “Array Test” is just defined as:

lengthOfStrings :: [String] -> [(Int, String)]
lengthOfStrings = map $ \x -> (length x, x)

foreign export ccall c_countAllStrings :: Id -> IO Id
c_countAllStrings = toCocoa lengthOfStrings

The toCocoa function automatically applies the necessary conversions for wrapping every function of type (OBJC a, OBJC b) => a -> b and makes it available to Objective-C!

Included are also two tests in which the execution of the Id -> IO Id function is paused by passing back a StablePtr to Objective-C. I did this, because I do not feel that comfortable with lazy evaluation. As a Haskell beginner, I find it generally quite difficult to tell which parts of an expression will be evaluated immediately and which parts will become thunks. This especially worries me, because in passing Objective-C values to Haskell we must make sure that we have handled these values on the Haskell side before they are released by the Objective-C runtime. So I wanted to test a typical scenario where one would store an OBJC instance value inside a Haskell data structure and pass a StablePtr to this value back to Objective-C.

Summary

We now have quite a nice way to convert Haskell values to Objective-C or the other way around:

Data StructureHaskell RepresentationObjective-C RepresentationRemarks
Opaque valueStableId by OBJCStablePtr from the FFI
Scalar numberInt, Double, …C types int, double, …Conversion by FFI
1st class numberInt, Double, …NSNumberConversion by OBJC
StringString, Data.TextNSStringConversion by OBJC
ContainerList, tupleNSArrayConversion by OBJC

It should be quite easy to expand this list for more data types, like e.g. NSDictionary.

But one does not necessary has to define own wrappers for every NSObject type, as one can just use the opaque type StableId and use Objective-C methods to deal with this object. In the provided source is an example (“String Test”) that shows how to send a simple message to an Objective-C object.

Conclusion

The combination of typeclasses with a strong type system offers a nice way to do quasi-automatic transformation of Objective-C values to Haskell values and vice versa. For example, pure Haskell functions of type (OBJC a, OBJC b) => (a -> b) can be made available for Objective-C just by wrapping them with toCocoa. Other functions types can be exposed easily as well.

Working on the typeclass was quite straightforward, even for a Haskell beginner like me. But I am convinced that the provided code is not written in the way that a Haskell expert would have written it.

Although I really enjoyed working on this typeclass, I feel a bit uneasy concerning lazy evaluation, as mentioned before in the chapter “An example Cocoa application”. Right now, all my test cases show no problems with lazy evaluation and object retaining/releasing. But this surely does not mean that all of this is safe in every possible situation. Reasoning about this non-strict behavior is for me so far the biggest problem in programming Haskell.

One last thing: Looking at the type of the tuple instance for OBJC, (OBJC a, OBJC b) => OBJC (a, b), and the types of functions as arguments to toCocoa, (OBJC a, OBJC b) => (a -> b), one notices that they look very similar.10 Maybe there is a sensible way to define an OBJC instance for these functions. On the Objective-C side we might have to define a new class, that will hold the original Haskell function, and that can invoke the stored function thereby automatically converting the argument to a Haskell value and the result back to an Objective-C value. Providing such an instance could help us to get rid of all those “dummy” C function definitions for the Haskell model.




  1. For details about typed pointers please consult chapter 17 of Real World Haskell.

  2. Think of converting a NSArray to a type [String]: it might be possible at compile time to deduce the array type, but in general it is impossible in Objective-C to tell the type of the contained elements in an array at compile time. It is even possible, that an Objective-C array contains objects of different types. So converting such an array to a Haskell list might result in a runtime error. This class of runtime errors are handled by the IOBJC monad.

  3. The IOOBJC just uses strings for the error. In a future version it might be better to use a dedicated instance of the Error typeclass as explained in chapter 19 of Real World Haskell.

  4. The kind of errors, that IOOBJC is used for, will not be thrown by toId, because in this case we are converting from a strongly typed language to a language with a weaker type system. So we are loosing information about the types. IOBJC is there to deal with errors, where we don’t have enough information about the types of the values at compile time.

  5. In Haskell numeric types are not only first-class citizens, but you can define your own numeric types and do quite interesting stuff as explained in chapter 13 of Real World Haskell.

  6. Maybe it would be nice to provide an instance for complete numeric typeclasses, like Fractional a.

  7. Data.Text is not yet part of the Haskell Platform, but can easily be installed from hackage.

  8. In order to define an instance for String, which is itself a type synonym for [Char], we have to switch on the TypeSynonymInstances extension of GHC.

  9. All source is licensed under the Apache License, Version 2.0.

  10. (Warning: Sloppy mathematics!) In a sense the mapping of every argument to its result for a pure function can be expressed in tuples. Therefore this similarity of tuples and pure functions is not that much of a surprise.

5 comments:

  1. I did something very similar to your OBJC class for Hubris, converting Ruby values to Haskell and back (although I split them into two classes). It's a useful pattern.

    ReplyDelete
  2. Yes, I've noticed Hubris a couple of days ago on "Haskell :: Reddit". Really nice work! (And having inline Haskell in Ruby is much more sophisticated than my simple interface ;-)

    As you said, you've got two typeclasses and following your example I could split the OBJC typeclass into Haskellable and ObjCable. This is very useful if one has types that only can be defined as an instance of one typeclass. A one-way conversion.

    For the primitive instances I've listed in this post I always was able to create both conversion functions, toId and fromId. But right now I am working on a simple instance for function types and it might make sense to only define the ObjCable instance…

    ReplyDelete
  3. Hubris isn't that much more sophisticated, really. You could probably steal a couple of the tricks - HINT was really helpful in automatically working out what was wrappable: I tend to think that the less boilerplate you have to write, the more likely you are to actually cross the border and use the libraries on the other side.

    I think Manuel Chakravarty was working on (or at least interested in) a low level ObjC bridge - something at the FFI level.

    ReplyDelete
  4. oh, and I'm not completely sure how maintained it is, but Andre Pang and Wolfgang Thaller worked on this problem a while ago: http://code.google.com/p/hoc/

    ReplyDelete
  5. Oh yes, I know about HOC and the ObjC FFI project. Actually I started with HOC but had some problems building it. After trying some time, I decided it would be fun to try to create some lightweight version of a Haskell / Objective-C interface myself. The HOC codebase is a great place to learn, although I don't feel like delving into Template Haskell at the moment.

    The Objective-C FFI project seems to be dormant right now. At least the homepage http://hackage.haskell.org/trac/ghc/wiki/ObjectiveC has not been updated for quite some time.

    ReplyDelete