Clustering

A cluster is a group of servers that operates as a single system in order to provide more computing power and higher availability.

Several architectures exist. The most common is the so-called "active/active" architecture, in which every server is permanently ready to handle work. It requires load distribution, which can be static or dynamic: requests are distributed either according to fixed rules (static) or by a scheduling algorithm (dynamic).
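The difference between static and dynamic distribution can be illustrated with a minimal Python sketch. The server names, the hash-based rule, and the least-connections policy are assumptions chosen for illustration; real load balancers offer many other rules and algorithms.

```python
import zlib

SERVERS = ["node-a", "node-b", "node-c"]  # hypothetical server names


def static_route(client_id: str, servers: list[str]) -> str:
    """Static distribution: a fixed rule always maps a client to the same server."""
    index = zlib.crc32(client_id.encode("utf-8")) % len(servers)
    return servers[index]


class LeastConnectionsScheduler:
    """Dynamic distribution: choose the server with the fewest active requests."""

    def __init__(self, servers: list[str]) -> None:
        # Number of requests currently being handled by each server.
        self.active = {name: 0 for name in servers}

    def acquire(self) -> str:
        # On a tie, min() returns the first least-loaded server in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Call when the request handled by `server` has completed.
        self.active[server] -= 1
```

With the static rule, the same client always reaches the same server regardless of load. With the scheduler, the choice depends on the state of the cluster at the time of the request.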

Deploying a cluster involves fault-tolerance mechanisms, such as transferring a server's processes to another server when it fails (failover), or the ability to add servers to the cluster without restarting it completely.
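As a rough illustration of these two notions, the following Python sketch models a cluster as a mapping from servers to the processes they run. The `Cluster` class and its method names are hypothetical and do not come from any particular clustering product; failure detection, state transfer, and networking are deliberately left out.

```python
class Cluster:
    """Toy model of a cluster: each live server holds a list of processes."""

    def __init__(self, servers: list[str]) -> None:
        self.assignments: dict[str, list[str]] = {name: [] for name in servers}

    def _least_loaded(self) -> str:
        # The server currently running the fewest processes.
        return min(self.assignments, key=lambda s: len(self.assignments[s]))

    def start(self, process: str) -> str:
        """Place a process on the least-loaded server and return that server."""
        server = self._least_loaded()
        self.assignments[server].append(process)
        return server

    def join(self, server: str) -> None:
        """Add a server while the cluster keeps running (no restart needed)."""
        self.assignments.setdefault(server, [])

    def fail(self, server: str) -> None:
        """Failover: move the failed server's processes to the surviving servers."""
        orphaned = self.assignments.pop(server)
        if orphaned and not self.assignments:
            raise RuntimeError("no surviving server to take over")
        for process in orphaned:
            self.start(process)
```

For example, after `join("node-c")` the new server immediately becomes a candidate for `start`, and after `fail("node-a")` every process that ran on `node-a` is reassigned to one of the remaining servers.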

Related articles

Optimization of Spark applications in Hadoop YARN

Categories: Data Engineering, Learning | Tags: Tuning, Hadoop, Spark, Python

Apache Spark is an in-memory data processing tool widely used in companies to deal with Big Data issues. Running a Spark application in production requires user-defined resources. This article…

By Ferdinand DE BAECQUE

Mar 30, 2020

Spark Streaming part 4: clustering with Spark MLlib

Categories: Data Engineering, Data Science, Learning | Tags: Spark, Apache Spark Streaming, Big Data, Clustering, Machine Learning, Scala, Streaming

Spark MLlib is an Apache Spark library offering scalable implementations of various supervised and unsupervised Machine Learning algorithms. Thus, the Spark framework can serve as a platform for…

By Oskar RYNKIEWICZ

Jun 27, 2019

A CoreOS development cluster with Vagrant and VirtualBox

Categories: Hack, Infrastructure | Tags: Arch Linux, CoreOS, Linux, VirtualBox, etcd, Vagrant

Following CoreOS's instructions on how to set up a development environment in VirtualBox did not work out well for me. Here are the steps I followed to get Container Linux up and running with Vagrant…

By Arthur BUSSER

Jun 20, 2018

Advanced multi-tenant Hadoop and Zookeeper protection

Categories: Big Data, Infrastructure | Tags: DoS, iptables, Operation, Scalability, Zookeeper, Clustering, Consensus

Zookeeper is a critical component of Hadoop's high availability operation. The latter protects itself by limiting the maximum number of connections (maxConns = 400). However, Zookeeper does not protect…

By Pierre SAUVAGE

Jul 5, 2017

Definitions of machine learning algorithms present in Apache Mahout

Categories: Data Science | Tags: Algorithm, Classification, Hadoop, Mahout, Clustering, Machine Learning

Apache Mahout is a machine learning library built for scalability. Its core algorithms for clustering, classification and batch-based collaborative filtering are implemented on top of Apache Hadoop…

By David WORMS

Mar 8, 2013
