Practical Machines in 60 Seconds

At last I have gotten round to writing a blog post about the criminally underused machines library written by the terrifyingly productive Edward Kmett.

This is a very simple demonstration of usage, with a focus on machines using the IO monad. I will not cover how the library works (because I don’t know).

Let us begin with imports:
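A minimal sketch of the imports, assuming the machines package is installed. The small main is only a sanity check that the library is wired up; it runs a pure source to a list.

```haskell
module Main (main) where

import Control.Monad.IO.Class (MonadIO (..)) -- liftIO, for machines that do IO
import Data.Machine                          -- plans, machines, sources, processes

-- Sanity check: feed a list through a pure source and collect the outputs.
main :: IO ()
main = print (run (source [1, 2, 3 :: Int]))
-- prints [1,2,3]
```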

What’s a machine?

The docs speak:

Machines are demand driven input sources like pipes or conduits, but can support multiple inputs. You design a Machine by writing a Plan. You then construct the machine.

Here is a plan, which is constructed into a machine:
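Something along these lines, as a sketch. The name helloPlan comes from the shell session further down; helloMachine is an assumed name for the constructed machine, and the RankNTypes pragma is a defensive assumption so that the SourceT synonym can appear in a signature.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Machine

-- The plan: yield two strings, then finish.
helloPlan :: Monad m => PlanT k String m ()
helloPlan = do
  yield "hello"
  yield "world"

-- construct turns the plan into a machine that runs it once.
helloMachine :: Monad m => SourceT m String
helloMachine = construct helloPlan

-- Collect the outputs to check the behaviour.
main :: IO ()
main = runT helloMachine >>= print
-- prints ["hello","world"]
```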

It outputs “hello”, then “world”, and then it stops. SourceT means it is a special case of a machine with no inputs, m says it has effects in the monad m, and String says it outputs values of type String.

If it were a unix utility, running it would look like this:

fred@forte~> helloPlan
hello
world
fred@forte~> 

We will make it into a unix utility shortly, using…

The Printer

A machine that takes String inputs, and prints each one it receives to the console. It never outputs anything.
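A sketch of such a printer, assuming the name printer and using a small pure source as a stand-in input for the demo:

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.IO.Class (MonadIO (..))
import Data.Machine

-- Await a String, print it, and never yield anything downstream.
printer :: MonadIO m => ProcessT m String ()
printer = repeatedly $ do
  x <- await
  liftIO (putStrLn x)

-- Demo: feed two strings into the printer.
main :: IO ()
main = runT_ (source ["one", "two"] ~> printer)
-- prints "one" and "two" on separate lines
```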

Note its type:

  • ProcessT means it can both input and output. Any SourceT is also a ProcessT, accepting any kind of input.
  • MonadIO m means its side effects can be IO actions.
  • String is the input type.
  • () is the output type. This could also be any variable, but we use () to indicate it has no meaning.

Note also the use of repeatedly. This is an alternative to construct: it builds a machine that runs its plan over and over, starting again from the top each time the plan finishes. The machine only halts when the plan itself stops, for example when await finds no more input.
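To make the difference concrete, here is a small sketch that runs the same one-step plan both ways. The plan name step is made up for illustration.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Machine

-- One step: take an input, double it, pass it on.
step :: Plan (Is Int) Int ()
step = await >>= \x -> yield (x * 2)

main :: IO ()
main = do
  -- construct runs the plan once, so only the first input is handled.
  print (run (source [1, 2, 3] ~> construct step))
  -- prints [2]
  -- repeatedly restarts the plan after each pass, until await runs dry.
  print (run (source [1, 2, 3] ~> repeatedly step))
  -- prints [2,4,6]
```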

The “Utility”

We can compose these two machines into one that prints “hello” and “world” to the console and outputs nothing. Imagine the ~> operator as representing data flow.
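As a sketch, with helloMachine, printer and helloPrinter as assumed names, and the two building blocks written compactly so the example stands on its own:

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.IO.Class (MonadIO (..))
import Data.Machine

helloMachine :: Monad m => SourceT m String
helloMachine = construct (yield "hello" >> yield "world")

printer :: MonadIO m => ProcessT m String ()
printer = repeatedly (await >>= liftIO . putStrLn)

-- Data flows left to right: the source feeds the printer.
helloPrinter :: MonadIO m => SourceT m ()
helloPrinter = helloMachine ~> printer

-- Run the composed machine for its effects.
main :: IO ()
main = runT_ helloPrinter
-- prints "hello" and "world" on separate lines
```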

Now we will enter the real world, and run our machine using runT_, whose type is:

runT_ :: Monad m => MachineT m k b -> m ()

This runs a machine. It causes side effects to happen in the monad m, and it throws away the outputs of the machine. If you want the outputs, you can get them as a list using runT.
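A tiny sketch of the two runners side by side, using a pure source as a stand-in machine:

```haskell
import Data.Machine

main :: IO ()
main = do
  -- runT_ runs the machine and discards whatever it outputs.
  runT_ (source ["hello", "world"])
  -- runT runs the same machine and collects the outputs in a list.
  outputs <- runT (source ["hello", "world"])
  print outputs
  -- prints ["hello","world"]
```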

Now, running our program, we see “hello” and “world” printed on separate lines, exactly as in the shell session above.

Progress.

Progress

Let’s build a program. We have a faulty CSV file, and we want to find lines which have the wrong number of commas. Our file represents a list of people, ages, and jobs. Chopin has unfortunately too many jobs, and so is an invalid record. So too is Liu Yang, as her country is listed erroneously.

name,age,job
Frederic Chopin,39,musician, composer
Jane Smith,30,creative accountant
Johann Sebastian Bach,65,composer
Liu Yang,35,Astronaut,China

Our program will do the following:

  • Read a CSV file from stdin
  • Skip the header
  • Find lines with the wrong number of commas (i.e. not 2)
  • Print the count of bad lines

Starting in a new file, let’s import some things:
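A plausible import list for this second file, assuming the machines and bytestring packages. The module name CsvCheck is hypothetical.

```haskell
{-# LANGUAGE RankNTypes #-}
module CsvCheck where -- hypothetical module name

import Control.Applicative ((<|>))           -- fallback when await fails
import Control.Monad (unless)
import Control.Monad.IO.Class (MonadIO (..)) -- liftIO
import qualified Data.ByteString.Char8 as BS -- getLine, count
import Data.Machine
import System.IO (isEOF)
```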

Read lines

Our first machine reads lines from stdin
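One way to sketch it, assuming the name lineSource and using isEOF to decide when to finish. The main here simply echoes stdin back, to show the source working.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad (unless)
import Control.Monad.IO.Class (MonadIO (..))
import qualified Data.ByteString.Char8 as BS
import Data.Machine
import System.IO (isEOF)

-- Yield one line of stdin at a time until end of file.
lineSource :: MonadIO m => SourceT m BS.ByteString
lineSource = construct go
  where
    go = do
      eof <- liftIO isEOF
      unless eof $ do
        line <- liftIO BS.getLine
        yield line
        go

-- Demo: collect every line, then print them back out.
main :: IO ()
main = runT lineSource >>= mapM_ BS.putStrLn
```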

Let’s generalise this machine to ioSource, which produces values of type a by running a program f until a monadic condition k is fulfilled.

Now we can write lineSource simply as:
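Both pieces sketched together: a guess at ioSource that follows the description above (k is the stop condition, f the producer), and lineSource written on top of it. The argument order is an assumption.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad (unless)
import Control.Monad.IO.Class (MonadIO (..))
import qualified Data.ByteString.Char8 as BS
import Data.Machine
import System.IO (isEOF)

-- Keep running f and yielding its result until k reports True.
ioSource :: MonadIO m => IO Bool -> IO a -> SourceT m a
ioSource k f = construct go
  where
    go = do
      done <- liftIO k
      unless done $ do
        liftIO f >>= yield
        go

-- Lines from stdin, stopping at end of file.
lineSource :: MonadIO m => SourceT m BS.ByteString
lineSource = ioSource isEOF BS.getLine

-- Demo: report how many lines arrived on stdin.
main :: IO ()
main = runT lineSource >>= print . length
```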

Skip the Header

Machine #2 must skip the first line. We don’t need to write it ourselves: machines already provides dropping, which discards a given number of inputs and passes the rest through.

So our skipHeader machine is simply:
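A sketch using dropping from Data.Machine, checked against a pure stand-in for the CSV lines:

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Machine

-- dropping 1 discards the first input and passes the rest through.
skipHeader :: Process a a
skipHeader = dropping 1

main :: IO ()
main = print (run (source ["name,age,job", "row 1", "row 2"] ~> skipHeader))
-- prints ["row 1","row 2"]
```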

Count Commas

Next up: count the number of commas in each line. For this we can use BS.count, which counts the occurrences of a given character in a ByteString.
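Written out in full, the machine might look like this. The name countCommas is an assumption, and BS is taken to be Data.ByteString.Char8.

```haskell
{-# LANGUAGE RankNTypes #-}
import qualified Data.ByteString.Char8 as BS
import Data.Machine

-- For every line received, emit the number of commas it contains.
countCommas :: Monad m => ProcessT m BS.ByteString Int
countCommas = repeatedly $ do
  line <- await
  yield (BS.count ',' line)

main :: IO ()
main = print (run (source (map BS.pack ["a,b,c", "a,b,c,d"]) ~> countCommas))
-- prints [2,3]
```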

Note that instead of writing out this whole machine, we could just fmap the BS.count function over the previous machine.
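The fmap variant can be sketched like so, with a pure source standing in for the previous machine:

```haskell
import qualified Data.ByteString.Char8 as BS
import Data.Machine

main :: IO ()
main = do
  let rows = map BS.pack ["a,b,c", "a,b,c,d"]
  -- fmap transforms every output of the upstream machine.
  print (run (fmap (BS.count ',') (source rows)))
  -- prints [2,3]
```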

Filter & Count bad lines

Finally, we keep only the counts that are not equal to 2, and then count them. Filtering we can do with library functions:
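A sketch using filtered from Data.Machine; the name badCounts is made up for illustration.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Machine

-- Let through only the comma counts that differ from the expected 2.
badCounts :: Process Int Int
badCounts = filtered (/= 2)

main :: IO ()
main = print (run (source [3, 2, 2, 3] ~> badCounts))
-- prints [3,3]
```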

Counting is easy too: the only new thing to note is the use of (<|>). That just means “if there’s nothing left to await, we’ll yield the current count and then stop”.
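A sketch of such a counter, assuming the name counter. The fallback on the right of (<|>) only runs when await has nothing left to receive.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Applicative ((<|>))
import Data.Machine

-- Count the inputs; emit the total once upstream is exhausted.
counter :: Monad m => ProcessT m a Int
counter = construct (go 0)
  where
    go n = do
      -- Either receive an input, or yield the running total and stop.
      _ <- await <|> (yield n *> stop)
      go $! n + 1

main :: IO ()
main = print (run (source "abc" ~> counter))
-- prints [3]
```

This mirrors the await <|> ... pattern the library itself uses for folds, so a fold-based counter would be a reasonable alternative.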

Putting it together

We already know how to compose machines, so it’s a simple task of extracting the count:
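Here is a sketch of the whole program in one file. All the names (lineSource, countCommas, counter, badLineCount) are assumptions, and the pieces are written compactly to keep the listing short.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Applicative ((<|>))
import Control.Monad (unless)
import Control.Monad.IO.Class (MonadIO (..))
import qualified Data.ByteString.Char8 as BS
import Data.Machine
import System.IO (isEOF)

-- Lines from stdin until end of file.
lineSource :: MonadIO m => SourceT m BS.ByteString
lineSource = construct go
  where
    go = do
      eof <- liftIO isEOF
      unless eof (liftIO BS.getLine >>= yield >> go)

-- Number of commas in each line.
countCommas :: Monad m => ProcessT m BS.ByteString Int
countCommas = repeatedly (await >>= yield . BS.count ',')

-- Count inputs, emitting the total when upstream runs dry.
counter :: Monad m => ProcessT m a Int
counter = construct (go 0)
  where
    go n = (await <|> (yield n *> stop)) >> (go $! n + 1)

-- read lines ~> skip header ~> count commas ~> keep bad counts ~> count them
badLineCount :: MonadIO m => SourceT m Int
badLineCount =
  lineSource ~> dropping 1 ~> countCommas ~> filtered (/= 2) ~> counter

main :: IO ()
main = do
  result <- runT badLineCount
  case result of
    [n] -> print n
    _   -> putStrLn "unexpected output from the machine"
```

Assuming the file is saved as BadLines.hs and the data as people.csv (both hypothetical names), it can be tried with runghc BadLines.hs < people.csv.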

Running it on the sample file above should print 2: one bad line for Chopin and one for Liu Yang.

Huzzah! Now, we did things a little verbosely here and there but hopefully it has demonstrated how you can get started using Machines. Hopefully a more real-world use case will be the subject of a future post.