Apache Hive

Apache Hive is a fault-tolerant, distributed data warehouse system built on top of Hadoop. It uses a SQL-like language called HiveQL for reading, writing, and analyzing large datasets. Hive supports Online Analytical Processing (OLAP) and was not designed for Online Transaction Processing (OLTP).

Hive lets developers and users apply SQL-like syntax and features to extract, transform, load (ETL) workloads, reporting, and data analytics. Data can be stored in various formats across the storage systems of the Hadoop ecosystem. HiveQL queries are translated into the form required by the underlying system. Hive provides standard operations such as filters, joins, and aggregations.
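As a rough illustration of the filter, join, and aggregation operations mentioned above, the sketch below runs a single query that combines all three. It uses Python's built-in `sqlite3` module as a stand-in engine, not Hive itself. The `employees` and `departments` tables and their columns are hypothetical, and the query sticks to basic syntax that is assumed to carry over to HiveQL largely unchanged.

```python
import sqlite3

# SQLite stands in for Hive here; this is an assumption made for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER, name TEXT);
CREATE TABLE employees (id INTEGER, dept_id INTEGER, salary REAL);
INSERT INTO departments VALUES (1, 'data'), (2, 'ops');
INSERT INTO employees VALUES (1, 1, 3000), (2, 1, 4000), (3, 2, 2500), (4, 2, 900);
""")

# Hypothetical query: a filter (WHERE), a join (JOIN ... ON),
# and aggregations (COUNT, SUM with GROUP BY).
QUERY = """
SELECT d.name AS department,
       COUNT(*) AS headcount,
       SUM(e.salary) AS total_salary
FROM employees e
JOIN departments d ON e.dept_id = d.id
WHERE e.salary > 1000
GROUP BY d.name
ORDER BY department
"""

result = conn.execute(QUERY).fetchall()
for department, headcount, total_salary in result:
    print(department, headcount, total_salary)
```

On a real cluster the same statement would be submitted through a Hive client rather than SQLite, and the engine would plan and distribute the work.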

Unlike relational databases, Hive does not use the schema-on-write (SoW) approach, but uses the schema-on-read (SoR) approach.

Data is always stored as-is in Hadoop and is only checked against a specific schema when it is read. Because nothing is validated at write time, data can be loaded significantly faster. In addition, different schemas can be applied to the same data.
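The schema-on-read idea can be sketched in a few lines of plain Python. This is a simplified model, not Hive's implementation: the `read_with_schema` helper, the sample rows, and both schemas are hypothetical. Raw text is kept untouched, and a schema is applied only when the data is read, so two different schemas can interpret the same stored bytes. Values that do not fit the declared type become `None`, loosely mirroring how Hive returns NULL for text fields it cannot parse.

```python
import csv
import io

# Raw data is stored as-is; nothing is validated at write time.
RAW = "1,alice,42.5\n2,bob,not_a_number\n3,carol,17\n"


def read_with_schema(raw, schema):
    """Apply a list of (column_name, caster) pairs to each raw row at read time."""
    rows = []
    for fields in csv.reader(io.StringIO(raw)):
        row = {}
        # zip stops at the shorter sequence, so a schema may cover only
        # the leading columns of the stored data.
        for (name, cast), value in zip(schema, fields):
            try:
                row[name] = cast(value)
            except ValueError:
                row[name] = None  # value does not match the declared type
        rows.append(row)
    return rows


# Two hypothetical schemas applied to the same stored data.
schema_a = [("id", int), ("name", str), ("score", float)]
schema_b = [("id", str), ("label", str)]

typed = read_with_schema(RAW, schema_a)
loose = read_with_schema(RAW, schema_b)

print(typed)
print(loose)
```

The malformed `score` in the second row does not block the load. It only surfaces as a missing value when a schema that expects a number is applied.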

Learn more: Official website
Related tags: Apache HBase; Delta Lake

CDP part 6: end-to-end data lakehouse ingestion pipeline with CDP

Categories: Big Data, Data Engineering, Learning | Tags: NiFi, Business intelligence, Data Engineering, Iceberg, Spark, Big Data, Cloudera, CDP, Data Analytics, Data Lake, Data Warehouse

In this hands-on lab session we demonstrate how to build an end-to-end big data solution with Cloudera Data Platform (CDP) Public Cloud, using the infrastructure we have deployed and configured over…

By Tobias CHAVARRIA

Jul 24, 2023

Comparison of database architectures: data warehouse, data lake and data lakehouse

Categories: Big Data, Data Engineering | Tags: Data Governance, Infrastructure, Iceberg, Parquet, Spark, Data Lake, Data lakehouse, Data Warehouse, File Format

Database architectures have experienced constant innovation, evolving with the appearance of new use cases, technical constraints, and requirements. From the three database structures we are comparing…

By Gonzalo ETSE

May 17, 2022

Internship in Big Data infrastructure with TDP

Categories: Infrastructure, Learning | Tags: Cyber Security, DevOps, Java, Hadoop, IaC, Internship, TDP

Job Description: Big Data and distributed computing are at Adaltas’ core. We support our partners in the deployment, maintenance and optimization of some of France’s largest clusters. Adaltas is also an…

By Daniel HARTY

Oct 25, 2021

H2O in practice: a protocol combining AutoML with traditional modeling approaches

Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python, XGBoost

H2O comes with a lot of functionalities. The second part of the series H2O in practice proposes a protocol to combine AutoML modeling with traditional modeling and optimization approaches. The objective…

By Petra KAFERLE DEVISSCHERE

Nov 12, 2021

Internship in Data Engineering

Categories: Front End, Learning | Tags: Metrics, Monitoring, Hive, Kafka, Delta Lake, Elasticsearch, IaC, Internship, Kubernetes, Streaming

Job Description: Data is a valuable business asset. Some call it the new oil. The data engineer collects, transforms, and refines raw data into information that can be used by business analysts and data…
