Practical Machines in 60 Seconds
At last I have gotten round to writing a blog post about the criminally underused machines library written by the terrifyingly productive Edward Kmett.
This is a very simple demonstration of usage, with a focus on machines using the IO monad. I will not cover how the library works (because I don’t know).
Let us begin with imports:
module Main where
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.Machine
What’s a machine?
The docs speak:
Machines are demand driven input sources like pipes or conduits, but can support multiple inputs. You design a
Machine
by writing aPlan
. You thenconstruct
the machine.
Here is a plan, which is construct
ed into a machine:
-- | Plan to build a helloMachine
helloPlan :: Plan k String ()
= do
helloPlan "Hello"
yield "World"
yield
stop
-- | Construct the plan
helloMachine :: Monad m => SourceT m String
= construct helloPlan helloMachine
It output “hello”, then “world”, and then it stops. SourceT
means
it is a special case of ‘Machine’ with no inputs, m
says it has
effects in the Monad m
, and String
says it outputs values of type
String.
If it were a unix utility, running it would look like this:
fred@forte~> helloPlan
hello
world
fred@forte~>
We will make it into a unix utility shortly, using…
The Printer
A machine that takes String inputs, and prints each one it receives to the console. It never outputs anything.
printer :: (MonadIO m) => ProcessT m String ()
= repeatedly $ do
printer <- await
value $ putStrLn value liftIO
Note its type:
ProcessT
means it can both input and output. AnySourceT
is also aProcessT
, accepting any kind of input.MonadIO m
means that side effects can be in IOString
is the input type.()
is the output type. This could also be any variable, but we use()
to indicate it has no meaning.
Note also the use of repeatedly
. This is an alternative behaviour
to construct
, which builds a machine constantly repeating its plan:
once stopped, it will always begin the plan again from the start.
The “Utility”
We can compose this program into a machine that prints “hello world” to
the console and outputs nothing. Imagine the ~>
operator representing
data flow.
composed :: MonadIO m => SourceT m ()
= helloMachine ~> printer composed
Now we will enter the real world, and run our machine using runT_
,
whose type is:
runT_ :: Monad m => MachineT m k b -> m ()
This runs a machine. It causes side effects to happen in the Monad
m
. It throws away the output of the Machine
. If you want the
outputs, you can get them as a list using runT
main :: IO ()
= runT_ composed main
Now, running our program, we see:
@tiangong~> runhaskell Main.hs
liu
hello
world@tiangong~> liu
Progress.
Progress
Let’s build a program. We have a faulty CSV file, and we want to find lines which have the wrong number of commas. Our file represents a list of people, ages, and jobs. Chopin has unfortunately too many jobs, and so is an invalid record. So too is Liu Yang, as her country is listed erroneously.
name,age,job
Frederic Chopin,39,musician, composer
Jane Smith,30,creative accountant
Johann Sebastian Bach,65,composer
Liu Yang,35,Astronaut,China
Our program will do the following:
- Read a CSV file from stdin
- Skip the header
- Find lines with the wrong number of commas (i.e. not 2)
- Print the count of bad lines
Starting in a new file, let’s import some things:
{-# LANGUAGE RankNTypes #-}
module Machines where
import Control.Applicative
import Control.Monad (unless)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.Machine
import Data.Char
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import System.IO (isEOF)
Read lines
Our first machine reads lines from stdin
lineSource' :: MonadIO m => SourceT m ByteString
= construct $ go where
lineSource' = do
go <- liftIO isEOF
cond $ do
unless cond <- liftIO BS.getLine
line
yield line go
Let’s generalise this Machine
to ioSource
: Produce values of type
a
using a program f
until a monadic condition k
is fulfilled.
ioSource :: MonadIO m => IO Bool -> IO a -> SourceT m a
= construct go where
ioSource k f = do
go <- liftIO k
cond $ do
unless cond >>= yield >> go liftIO f
Now we can write lineSource
as simply
lineSource :: MonadIO m => SourceT m ByteString
= ioSource isEOF BS.getLine lineSource
Skip the Header
Machine #2 must skip the first line. We don’t need to write it, it’s
already in Machines
So our skipHeader machine is simply:
skipHeader :: Monad m => ProcessT m a a
= dropping 1 skipHeader
Count Commas
Next up: count the number of commas in each line. For this we can
utilise BS.count
, which counts the number of characters in a
ByteString.
countCommas :: Monad m => ProcessT m BS.ByteString Int
= repeatedly $ do
countCommas <- await
line
yield (BS.count comma line)where comma = fromIntegral (ord ',')
Note that instead of writing out this whole machine, we could just
fmap
the BS.count
function over the previous machine.
Filter & Count bad lines
Finally, we filter out all the counts not equal to 2, and then count them. Filtering we can do with library functions:
filterBad :: Monad m => ProcessT m Int Int
= filtered (/=2) filterBad
Counting is easy too: the only new thing to note is the use of
(<|>)
. That just means “if there’s nothing left to await, we’ll
yield the current count and then stop”.
count :: Monad m => ProcessT m a Int
= construct (go 0) where
count = do
go n <- await <|> (yield n *> stop)
x + 1) go (n
Putting it together
We already know how to compose machines, so it’s a simple task of extracting the count:
main :: IO ()
= do
main <- runT pipeline
n -- Remember that runT collects outputs in a list
print (head n)
where pipeline = lineSource ~> skipHeader ~> countCommas ~>
~> count filterBad
Running it:
@clavier~> cat test.csv | runhaskell CsvMachines.hs
bach2
@clavier~> bach
Huzzah! Now, we did things a little verbosely here and there but
hopefully it has demonstrated how you can get started using
Machines
. Hopefully a more real-world use case will be the subject
of a future post.