Asynchronous array iteration in Node.js with Each

Control flow in Node.js is the sort of problem for which almost every developer has created and published a library. These libraries usually aim at reducing the spaghetti code that deeply nested callbacks produce. I’m no exception to the rule. After a year and a half of intensive usage, I feel like it’s about time to present Each, my own control flow library.

Well, to be exact, it isn’t a control flow library in the traditional sense. It provides no mechanism to chain and control functions. It came from my recurring need to traverse arrays and call asynchronous code on each of their elements. Think of Array.prototype.forEach on steroids.

A simple example

Let’s say that we need to create 3 directories. The directories may be processed in parallel, and each one may involve up to 3 sub-operations: check whether the directory exists, create it and set its permissions. The code below covers the first two.

Here’s the code:

fs = require 'fs'
each = require 'each'

each([
  '/data/1/my_dir'
  '/data/2/my_dir'
  '/data/3/my_dir'
])
.on 'item', (dir, next) ->
  fs.stat dir, (err, stat) ->
    # Nothing to do if the directory already exists
    return next() if stat
    fs.mkdir dir, (err) ->
      next err
.on 'error', (err) ->
  console.error err.message
.on 'end', ->
  console.log 'Success'

As you can see, Each borrows its API from the Event Emitter and Stream modules in Node.js.
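To make the Event Emitter flavour of this API concrete, here is a minimal sketch of a sequential iterator built on the Node.js `events` module. It is not the Each implementation: the `iterate` helper is a hypothetical name introduced for illustration, and it only mimics the 'item', 'error' and 'end' events used in the example above.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper, not the Each implementation: walks `items` one at a
// time and reports progress through 'item', 'error' and 'end' events.
function iterate(items) {
  const emitter = new EventEmitter();
  let index = 0;
  const next = (err) => {
    if (err) return emitter.emit('error', err); // stop at the first error
    if (index === items.length) return emitter.emit('end');
    emitter.emit('item', items[index++], next); // hand control to the listener
  };
  // Defer the start so that listeners can be attached after the call returns.
  process.nextTick(next);
  return emitter;
}

// Usage: the 'item' listener receives the element and a `next` callback.
const seen = [];
iterate(['a', 'b', 'c'])
  .on('item', (item, next) => {
    seen.push(item);
    setImmediate(next); // simulate asynchronous work
  })
  .on('error', (err) => console.error(err.message))
  .on('end', () => console.log('Success', seen));
```

Returning the emitter before the first element is processed is what allows the chained `.on` calls, the same shape as the Each example.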

Why I don’t need control flow libraries

It seems awkward to start this way, but since Each is partially a Node.js control flow library, it feels important to explain why it doesn’t answer every need and why I don’t use any existing library to complement it.

Asynchronous programming is great, but in Node.js and JavaScript it leads to unaesthetic code in which callbacks call more callbacks, often referred to as spaghetti code.

Let’s get back to our example above. One way to limit the depth of the code is by isolating the directory creation process into a single function:

create = (dir, callback) ->
  fs.stat dir, (err, stat) ->
    return callback() if stat
    fs.mkdir dir, (err) ->
      callback err

However, control flow libraries are not just useful for reducing code depth. They solve tricky problems as well. Let’s presume we need to create a file, whether its parent directory exists or not:

create = (file, content, callback) ->
  dir = path.dirname file
  fs.stat dir, (err, stat) ->
    if stat
      # The directory exists, write the file right away
      return fs.writeFile file, content, (err) ->
        callback err
    fs.mkdir dir, (err) ->
      return callback err if err
      fs.writeFile file, content, (err) ->
        callback err

Here, the code that writes the file isn’t just redundant and ugly, it can become really hard to maintain as your code increases in complexity. After using different libraries, I finally came to the conclusion that the best approach to this problem is to decompose the code into small functions. Here’s how:

create = (file, content, callback) ->
  dir = path.dirname file
  checkDir = ->
    fs.stat dir, (err, stat) ->
      unless stat then makeDir() else writeFile()
  makeDir = ->
    fs.mkdir dir, (err) ->
      return callback err if err
      writeFile()
  writeFile = ->
    fs.writeFile file, content, (err) ->
      callback err
  checkDir()

# `files` and `content` are assumed to be defined elsewhere
each(files)
.on 'item', (file, next) ->
  create file, content, next
.on 'error', (err) ->
  console.error err.message
.on 'end', ->
  console.log 'Success'

The result is a native JavaScript solution that is easy to read and efficient to run. The mecano source code is a good resource illustrating this pattern.

And why I needed Each

Control flow libraries have another use: they let you iterate asynchronously. There is no pretty and dead simple way to achieve this in pure JavaScript. Things get excessively complicated when you need to deal with correct error handling or a limited number of concurrent tasks.
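To give an idea of the bookkeeping involved, here is a rough sketch in plain JavaScript of an asynchronous iteration with a concurrency limit and a single final callback. The `eachLimit` name and its signature are assumptions made for this illustration. It is not how Each is implemented and it covers only the simplest cases.

```javascript
// Hypothetical helper: run `worker(item, done)` on every element of `items`,
// with at most `limit` workers in flight, and call `callback` exactly once.
function eachLimit(items, limit, worker, callback) {
  let started = 0;    // number of items handed to a worker
  let finished = 0;   // number of workers that reported success
  let failed = false; // set on the first error so that scheduling stops

  if (items.length === 0) return callback(null);

  const launch = () => {
    while (!failed && started < items.length && started - finished < limit) {
      const item = items[started++];
      let called = false;
      worker(item, (err) => {
        if (called || failed) return; // ignore duplicate or late callbacks
        called = true;
        if (err) { failed = true; return callback(err); }
        finished++;
        if (finished === items.length) return callback(null);
        launch(); // a slot is free, schedule the next item
      });
    }
  };
  launch();
}

// Usage: double five numbers, two at a time, and record the peak concurrency.
let active = 0;
let peak = 0;
const results = [];
eachLimit([1, 2, 3, 4, 5], 2, (n, done) => {
  active++;
  peak = Math.max(peak, active);
  setTimeout(() => {
    active--;
    results.push(n * 2);
    done();
  }, 10);
}, (err) => {
  if (err) return console.error(err.message);
  console.log('Success', results, 'peak concurrency:', peak);
});
```

Even this short version needs three counters and a guard against callbacks fired twice or after a failure, which is the kind of detail that is easy to get wrong when rewritten by hand in every project.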

This is how I came up with Each. At the time, I was installing and running a Hadoop cluster and my tasks had to be distributed across the whole cluster. Things like starting processes, running distributed commands or collecting statistics were (and still are) run by Each and of course Node.js.

Over time, the library has become very flexible and thoroughly tested. The API is an Event Emitter API, typical of a Node.js library. It also partially borrows from the Stream API.

The more I used Each, the more I realised that my problems were not about calling functions asynchronously. Every time I was tempted to use a control flow library, I in fact needed to traverse arrays. Asynchronous array iteration is a complex process and Each solves it elegantly. I invite you all to try Each and make it even better.

Again, the mecano source code (now Nikita) is an excellent source of inspiration if you wish to see Each in action.

Thanks for reading. Please visit the source code on GitHub.
