status
failed

Practical Machines in 60 Seconds

At last I have gotten round to writing a blog post about the criminally underused machines library written by the terrifyingly productive Edward Kmett.

This is a very simple demonstration of usage, with a focus on machines using the IO monad. I will not cover how the library works (because I don’t know).

Let us begin with imports:

module Main where

import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.Machine

What’s a machine?

The docs speak:

Machines are demand driven input sources like pipes or conduits, but can support multiple inputs. You design a Machine by writing a Plan. You then construct the machine.

Here is a plan, which is constructed into a machine:

-- | Plan to build a helloMachine
helloPlan :: Plan k String ()
helloPlan = do
  yield "Hello"
  yield "World"
  stop

-- | Construct the plan
helloMachine :: Monad m => SourceT m String
helloMachine = construct helloPlan

It output “hello”, then “world”, and then it stops. SourceT means it is a special case of ‘Machine’ with no inputs, m says it has effects in the Monad m, and String says it outputs values of type String.

If it were a unix utility, running it would look like this:

fred@forte~> helloPlan
hello
world
fred@forte~> 

We will make it into a unix utility shortly, using…

The Printer

A machine that takes String inputs, and prints each one it receives to the console. It never outputs anything.

printer :: (MonadIO m) => ProcessT m String ()
printer = repeatedly $ do
  value <- await
  liftIO $ putStrLn value

Note its type:

  • ProcessT means it can both input and output. Any SourceT is also a ProcessT, accepting any kind of input.
  • MonadIO m means that side effects can be in IO
  • String is the input type.
  • () is the output type. This could also be any variable, but we use () to indicate it has no meaning.

Note also the use of repeatedly. This is an alternative behaviour to construct, which builds a machine constantly repeating its plan: once stopped, it will always begin the plan again from the start.

The “Utility”

We can compose this program into a machine that prints “hello world” to the console and outputs nothing. Imagine the ~> operator representing data flow.

composed :: MonadIO m => SourceT m ()
composed = helloMachine ~> printer

Now we will enter the real world, and run our machine using runT_, whose type is:

runT_ :: Monad m => MachineT m k b -> m ()

This runs a machine. It causes side effects to happen in the Monad m. It throws away the output of the Machine. If you want the outputs, you can get them as a list using runT

main :: IO ()
main = runT_ composed

Now, running our program, we see:

liu@tiangong~> runhaskell Main.hs
hello
world
liu@tiangong~> 

Progress.

Progress

Let’s build a program. We have a faulty CSV file, and we want to find lines which have the wrong number of commas. Our file represents a list of people, ages, and jobs. Chopin has unfortunately too many jobs, and so is an invalid record. So too is Liu Yang, as her country is listed erroneously.

name,age,job
Frederic Chopin,39,musician, composer
Jane Smith,30,creative accountant
Johann Sebastian Bach,65,composer
Liu Yang,35,Astronaut,China

Our program will do the following:

  • Read a CSV file from stdin
  • Skip the header
  • Find lines with the wrong number of commas (i.e. not 2)
  • Print the count of bad lines

Starting in a new file, let’s import some things:

{-# LANGUAGE RankNTypes #-}
module Machines where

import Control.Applicative
import Control.Monad (unless)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.Machine

import Data.Char
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import System.IO (isEOF)

Read lines

Our first machine reads lines from stdin

lineSource' :: MonadIO m => SourceT m ByteString
lineSource' = construct $ go where
  go = do
    cond <- liftIO isEOF
    unless cond $ do
      line <- liftIO BS.getLine
      yield line
      go

Let’s generalise this Machine to ioSource: Produce values of type a using a program f until a monadic condition k is fulfilled.

ioSource :: MonadIO m => IO Bool -> IO a -> SourceT m a
ioSource k f = construct go where
  go = do
    cond <- liftIO k
    unless cond $ do
      liftIO f >>= yield >> go

Now we can write lineSource as simply

lineSource :: MonadIO m => SourceT m ByteString
lineSource = ioSource isEOF BS.getLine

Skip the Header

Machine #2 must skip the first line. We don’t need to write it, it’s already in Machines

So our skipHeader machine is simply:

skipHeader :: Monad m => ProcessT m a a
skipHeader = dropping 1

Count Commas

Next up: count the number of commas in each line. For this we can utilise BS.count, which counts the number of characters in a ByteString.

countCommas :: Monad m => ProcessT m BS.ByteString Int
countCommas = repeatedly $ do
  line <- await
  yield (BS.count comma line)
  where comma = fromIntegral (ord ',')

Note that instead of writing out this whole machine, we could just fmap the BS.count function over the previous machine.

Filter & Count bad lines

Finally, we filter out all the counts not equal to 2, and then count them. Filtering we can do with library functions:

filterBad :: Monad m => ProcessT m Int Int
filterBad = filtered (/=2)

Counting is easy too: the only new thing to note is the use of (<|>). That just means “if there’s nothing left to await, we’ll yield the current count and then stop”.

count :: Monad m => ProcessT m a Int
count = construct (go 0) where
  go n = do
    x <- await <|> (yield n *> stop)
    go (n + 1)

Putting it together

We already know how to compose machines, so it’s a simple task of extracting the count:

main :: IO ()
main = do
  n <- runT pipeline
  -- Remember that runT collects outputs in a list
  print (head n)
  where pipeline = lineSource ~> skipHeader ~> countCommas ~>
                   filterBad ~> count

Running it:

bach@clavier~> cat test.csv | runhaskell CsvMachines.hs
2
bach@clavier~>

Huzzah! Now, we did things a little verbosely here and there but hopefully it has demonstrated how you can get started using Machines. Hopefully a more real-world use case will be the subject of a future post.