Elasticsearch

Elasticsearch is an open source search, analytics and storage engine developed by Elasticsearch B.V. and first released in 2010. It is distributed software written in Java and built on top of Apache Lucene. Lucene handles the indexing and searching of data, which Elasticsearch exposes through a REST API.
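As a rough illustration of what indexing and searching over the REST API looks like, the sketch below builds (but does not send) the two HTTP requests involved. The base URL, the index name `articles` and the helper function names are assumptions made for this example; the `_doc` and `_search` endpoints and the `match` query are part of Elasticsearch's REST API.

```python
import json
from urllib.request import Request

# Assumption: a local, unsecured node on the default port.
BASE_URL = "http://localhost:9200"


def index_request(index, doc_id, document):
    """Build a PUT /<index>/_doc/<id> request that stores one JSON document."""
    return Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index, field, text):
    """Build a POST /<index>/_search request running a full-text match query."""
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical document and query, printed instead of sent.
    for req in (
        index_request("articles", "1", {"title": "Yahoo's Vespa Engine"}),
        search_request("articles", "title", "vespa"),
    ):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` would send it to a running node. Authentication and TLS, which most clusters require, are left out of this sketch.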

It is often used with Kibana, a data visualization platform, and Logstash, a data processing pipeline, which are tools developed and maintained by the same company. Together they form what's referred to as the ELK Stack.

Grafana, while not part of the ELK Stack, is another open source tool often used with Elasticsearch for visualizing metrics such as memory, CPU usage and system I/O.

Elasticsearch provides advanced search functionality, such as auto-completion, synonym handling and typo correction. It is also used as an analytics platform that queries structured data, for instance to:

Analyze application logs and system metrics
Send events to Elasticsearch
Forecast future values with machine learning and anomaly detection
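The search and analytics use cases above can be sketched as request bodies. This is a minimal example under stated assumptions: the field names `level` and `@timestamp`, the index name passed to `bulk_payload` and the helper function names are hypothetical, and `level` is assumed to be mapped as a keyword field. The `fuzziness` option, the `_bulk` newline-delimited format and the `date_histogram` aggregation are Elasticsearch features.

```python
import json


def fuzzy_match_query(field, text):
    """Body for _search: a match query that tolerates small typos."""
    return {"query": {"match": {field: {"query": text, "fuzziness": "AUTO"}}}}


def bulk_payload(index, events):
    """Serialize events as newline-delimited JSON for POST /_bulk."""
    lines = []
    for event in events:
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def errors_per_hour(timestamp_field="@timestamp"):
    """Body for _search: count ERROR log entries in one-hour buckets."""
    return {
        "size": 0,  # return only the aggregation, not the matching documents
        "query": {"term": {"level": "ERROR"}},
        "aggs": {
            "per_hour": {
                "date_histogram": {
                    "field": timestamp_field,
                    "fixed_interval": "1h",
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical log events, printed instead of sent.
    events = [
        {"@timestamp": "2021-11-24T10:00:00Z", "level": "ERROR", "message": "disk full"},
        {"@timestamp": "2021-11-24T10:05:00Z", "level": "INFO", "message": "recovered"},
    ]
    print(bulk_payload("app-logs", events), end="")
    print(json.dumps(errors_per_hour(), indent=2))
```

The bulk payload would be sent with the `application/x-ndjson` content type. Forecasting and anomaly detection rely on Elastic's machine learning features and are not covered by this sketch.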

Because Elasticsearch is distributed by nature, it scales well as data volumes and query throughput grow.

Learn more: Official website
Related tags: Grafana; Kibana; Logstash

Yahoo's Vespa Engine

Categories: Tech Radar | Tags: Database, Tools, Elasticsearch, Search Engine

Vespa is Yahoo’s fully autonomous and self-sufficient big data processing and serving engine. It aims at serving results of queries on huge amounts of data in real time. An example of this would be…

By Arthur BUSSER

Oct 16, 2017

Execute Python in an Oozie workflow

Categories: Data Engineering | Tags: Oozie, Elasticsearch, Python, REST

Oozie workflows allow you to use multiple actions to execute code. However, doing so with Python can be a bit tricky. Let’s see how to do that. I’ve recently designed a workflow that would interact…

By César BEREZOWSKI

Mar 6, 2018

Essential questions about Time Series

Categories: Big Data | Tags: Druid, HBase, Hive, ORC, Data Science, Elasticsearch, Grafana, IOT

Today, the bulk of Big Data is temporal. We see it in the media and among our customers: smart meters, banking transactions, smart factories, connected vehicles … IoT and Big Data go hand in hand. We…

By David WORMS

Mar 18, 2018

Apache Metron in the Real World

Categories: Cyber Security, DataWorks Summit 2018 | Tags: Algorithm, Solr, Storm, pcap, RDBMS, HDFS, Kafka, Metron, NiFi, Spark, Data Science, Elasticsearch, SQL

Apache Metron is a storage and analytic platform specialized in cyber security. This talk was about demonstrating the usages and capabilities of Apache Metron in the real world. The presentation was…

By Michael HATOUM

May 29, 2018

Deploying a secured Flink cluster on Kubernetes

Categories: Big Data | Tags: Encryption, Flink, HDFS, Kafka, Elasticsearch, Kerberos, SSL/TLS

When deploying secured Flink applications inside Kubernetes, you are faced with two choices. Assuming your Kubernetes is secure, you may rely on the underlying platform or rely on Flink native…

By David WORMS

Oct 8, 2018

Monitoring a production Hadoop cluster with Kubernetes

Categories: DevOps & SRE | Tags: Thrift, Shinken, Hadoop, Knox, Cluster, Docker, Elasticsearch, Grafana, Kubernetes, Node, Node.js, Prometheus, Python

Monitoring a production-grade Hadoop cluster is a real challenge, and the approach needs to evolve constantly. The software we use today is based on Nagios. Very efficient when it comes to the simplest…

By Paul-Adrien CORDONNIER

Dec 21, 2018

Internship Data Science & Data Engineer - ML in production and streaming data ingestion

Categories: Data Engineering, Data Science | Tags: DevOps, Flink, Hadoop, HBase, Kafka, Spark, Internship, Kubernetes, Python

Context: The exponential evolution of data has turned the industry upside down by redefining data storage, processing and data ingestion pipelines. Mastering these methods considerably facilitates…

By David WORMS

Nov 26, 2019

Logstash pipelines remote configuration and self-indexing

Categories: Data Engineering, Infrastructure | Tags: Docker, Elasticsearch, Kibana, Logstash, Log4j

Logstash is a powerful data collection engine that integrates in the Elastic Stack (Elasticsearch - Logstash - Kibana). The goal of this article is to show you how to deploy a fully managed Logstash…

By Paul-Adrien CORDONNIER

Dec 13, 2019

Internship in Data Engineering

Categories: Front End, Learning | Tags: Metrics, Monitoring, Hive, Kafka, Delta Lake, Elasticsearch, IaC, Internship, Kubernetes, Streaming

Job Description: Data is a valuable business asset. Some call it the new oil. The data engineer collects, transforms and refines raw data into information that can be used by business analysts and data…

By David WORMS

Oct 25, 2021

Spring 2022 internship - building a Data Lab

Categories: Data Science, Learning | Tags: MongoDB, Spark, Argo CD, Elasticsearch, Internship, Keycloak, Kubernetes, OpenID Connect, PostgreSQL

Job Description: Over the last few years, we developed the ability to use computers to process large amounts of data. The ecosystem evolved over a large offering of tools and libraries and the creation…

By David WORMS

Nov 24, 2021
