EclairJS - Putting a Spark in Web Apps
By David WORMS
Jul 17, 2016
- Categories
- Data Engineering
- Front End
- Tags
- Jupyter
- Spark
- JavaScript
Presentation by David Fallside from IBM, images extracted from the presentation.
Introduction
Web application development has been moving from Java to NodeJS and JavaScript, which offer a simple yet rich environment built around NPM.
EclairJS is a NodeJS library that provides bindings to a Spark application:
- An RDD is bound to a JS object that is made immutable
- Spark operators are transparently mapped to JS functions (e.g. flatMap, filter, …)
- Every mapped Spark operator returns a promise
Using promises makes it possible to emulate the way Spark uses its DAG (directed acyclic graph) of operations:
- Transformations return a new object and are added to the DAG
- Actions execute the whole DAG to produce a result
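The split between lazy transformations and promise-returning actions can be illustrated with a minimal, self-contained sketch. The `LazyDataset` class and its methods are hypothetical names invented for this illustration; this is not the EclairJS API, and the "DAG" here is simplified to a linear list of steps applied to a local array. In this simplified version only the actions expose a promise.

```javascript
// Hypothetical sketch: LazyDataset is NOT part of the EclairJS API.
class LazyDataset {
  constructor(source, steps = []) {
    this.source = source; // local array standing in for distributed data
    this.steps = steps;   // recorded transformations (a linear "DAG")
    Object.freeze(this.steps);
    Object.freeze(this);  // the handle is immutable, like an RDD
  }

  // Transformations: record the step and return a NEW handle; nothing runs yet.
  flatMap(fn) {
    return new LazyDataset(this.source, [...this.steps, (rows) => rows.flatMap(fn)]);
  }

  filter(fn) {
    return new LazyDataset(this.source, [...this.steps, (rows) => rows.filter(fn)]);
  }

  // Actions: run every recorded step and resolve a promise with the result.
  collect() {
    return Promise.resolve().then(() =>
      this.steps.reduce((rows, step) => step(rows), this.source));
  }

  count() {
    return this.collect().then((rows) => rows.length);
  }
}

const lines = new LazyDataset(['to be or', 'not to be']);
const words = lines.flatMap((line) => line.split(' '));
const longWords = words.filter((word) => word.length > 2);

// Only this action triggers the recorded steps.
longWords.count().then((n) => console.log('long words:', n));
```

Because each transformation returns a fresh frozen handle, `lines` and `words` stay usable after `longWords` is derived from them, which mirrors how one RDD can feed several downstream computations.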
Architecture
EclairJS has two main components:
- Client: JS API, installed with NPM
- Server: JS code that provides the Java mapping and runs in the JVM using Oracle Nashorn; it has to be started for the client to work
The server also uses Jupyter Notebook to expose a WebSocket endpoint between the client and the server.
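The client/server split can be pictured as a client that serializes each call, sends it over a connection, and exposes the reply as a promise. The sketch below is only an illustration under stated assumptions: `FakeServer`, `Client`, and the `{ id, op, args }` message shape are hypothetical and are neither the real EclairJS classes nor the Jupyter wire protocol, and an in-memory call replaces the WebSocket.

```javascript
// Hypothetical sketch: the message shape and class names are assumptions,
// not the real EclairJS or Jupyter wire protocol.

// Stand-in for the server side: parses a JSON request, runs it, replies with JSON.
class FakeServer {
  constructor() {
    this.ops = new Map([
      ['count', (values) => values.length],
      ['sum', (values) => values.reduce((a, b) => a + b, 0)],
    ]);
  }

  handle(rawRequest) {
    const { id, op, args } = JSON.parse(rawRequest);
    const reply = this.ops.has(op)
      ? { id, result: this.ops.get(op)(...args) }
      : { id, error: `unknown operation: ${op}` };
    return JSON.stringify(reply);
  }
}

// Stand-in for the client API: each call is serialized and exposed as a promise.
class Client {
  constructor(server) {
    this.server = server;
    this.nextId = 1;
  }

  call(op, ...args) {
    const request = JSON.stringify({ id: this.nextId++, op, args });
    // A real client would write to a WebSocket; setTimeout mimics the async round trip.
    return new Promise((resolve, reject) => {
      setTimeout(() => {
        const reply = JSON.parse(this.server.handle(request));
        if (reply.error) reject(new Error(reply.error));
        else resolve(reply.result);
      }, 0);
    });
  }
}

const client = new Client(new FakeServer());
client.call('sum', [1, 2, 3]).then((total) => console.log('sum:', total));
```

Keeping the transport behind a single `call` method means the in-memory stand-in could be swapped for a real socket without changing the code that uses the client.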
Performance
In terms of performance, Spark’s native Java API is much faster; however, EclairJS is reported to be twice as fast as the PySpark API.
Conclusion
EclairJS seems to be a great project if you need to integrate Spark jobs into a web application.