Apache Hive

Related articles

Running Apache Hive 3, new features and tips and tricks

Running Apache Hive 3, new features and tips and tricks

Categories: Big Data, Business Intelligence, DataWorks Summit 2019 | Tags: Druid, Hive, Kafka, Cloudera, Data Warehouse, JDBC, LLAP, Active Directory, Release and features, Hadoop

Apache Hive 3 brings a bunch of new and nice features to the data warehouse. Unfortunately, like many major FOSS releases, it comes with a few bugs and not much documentation. It is available since…

By Gauthier LEONARD

Jul 25, 2019

Druid and Hive integration

Druid and Hive integration

Categories: Big Data, Business Intelligence, Tech Radar | Tags: Druid, Hive, Data Analytics, Learning and tutorial, LLAP, OLAP, SQL

This article covers the integration between Hive Interactive (LDAP) and Druid. One can see it as a complement of the Ultra-fast OLAP Analytics with Apache Hive and Druid article. Tools description…

By Pierre SAUVAGE

Jun 17, 2019

Publish Spark SQL DataFrame and RDD with Spark Thrift Server

Publish Spark SQL DataFrame and RDD with Spark Thrift Server

Categories: Data Engineering | Tags: Hive, Spark, Thrift, JDBC, Hadoop, SQL

The distributed and in-memory nature of the Spark engine makes it an excellent candidate to expose data to clients which expect low latencies. Dashboards, notebooks, BI studios, KPIs-based reports…

By Oskar RYNKIEWICZ

Mar 25, 2019

Apache Knox made easy!

Apache Knox made easy!

Categories: Big Data, Cyber Security, Adaltas Summit 2018 | Tags: Ambari, Hive, Knox, Ranger, Shiro, Solr, JDBC, Kerberos, LDAP, Active Directory, REST, SSL/TLS, Hadoop, SSO

Apache Knox is the secure entry point of a Hadoop cluster, but can it also be the entry point for my REST applications? Apache Knox overview Apache Knox is an application gateway for interacting in a…

By Michael HATOUM

Feb 4, 2019

Data Lake ingestion best practices

Data Lake ingestion best practices

Categories: Big Data, Data Engineering | Tags: Avro, Hive, NiFi, ORC, Spark, Data Lake, File Format, Data Governance, HDF, Operation, Protocol Buffers, Registry, Schema

Creating a Data Lake requires rigor and experience. Here are some good practices around data ingestion both for batch and stream architectures that we recommend and implement with our customers…

By David WORMS

Jun 18, 2018

Accelerating query processing with materialized views in Apache Hive

Accelerating query processing with materialized views in Apache Hive

Categories: Business Intelligence, DataWorks Summit 2018 | Tags: Calcite, Druid, Hive, OLAP, Release and features, SQL

The new materialized view feature is coming in Apache Hive 3.0. Jesus Camacho Rodriguez from Hortonworks held a talk ”Accelerating query processing with materialized views in Apache Hive” about it…

By Paul-Adrien CORDONNIER

May 31, 2018

Present and future of Hadoop workflow scheduling: Oozie 5.x

Present and future of Hadoop workflow scheduling: Oozie 5.x

Categories: Big Data, DataWorks Summit 2018 | Tags: Hive, Oozie, Sqoop, CDH, HDP, REST, Hadoop

During the DataWorks Summit Europe 2018 in Berlin, I had the opportunity to attend a breakout session on Apache Oozie. It covers the new features released in Oozie 5.0, including future features of…

By Schoukroun LEO

May 23, 2018

Essential questions about Time Series

Essential questions about Time Series

Categories: Big Data | Tags: Druid, HBase, Hive, ORC, Elasticsearch, Graphana, IOT

Today, the bulk of Big Data is temporal. We see it in the media and among our customers: smart meters, banking transactions, smart factories, connected vehicles … IoT and Big Data go hand in hand. We…

By David WORMS

Mar 19, 2018

MariaDB integration with Hadoop

MariaDB integration with Hadoop

Categories: Infrastructure | Tags: Hive, Database, HA, MariaDB, Hadoop

During a workshop with one of our customers, Adaltas has identified a potential risk to use MariaDB’s High Availability (HA) strategy. Since the customer selected Cloudera’s CDH 5 distribution, the…

By David WORMS

Jul 31, 2017

Oracle DB synchrnozation to Hadoop with CDC

Oracle DB synchrnozation to Hadoop with CDC

Categories: Data Engineering | Tags: Hive, Sqoop, CDC, Data Warehouse, GoldenGate, Oracle

This note is the result of a discussion about the synchronization of data written in a database to a warehouse stored in Hadoop. Thanks to Claude Daub from GFI who wrote it and who authorizes us to…

By David WORMS

Jul 31, 2017

Hive Metastore HA with DBTokenStore: Failed to initialize master key

Hive Metastore HA with DBTokenStore: Failed to initialize master key

Categories: Big Data, DevOps & SRE | Tags: Hive, Bug, Infrastructure

This article describes my little adventure around a startup error with the Hive Metastore. It shall be reproducable with any secure installation, meaning with Kerberos, with high availability enabled…

By David WORMS

Jul 21, 2016

Hive, Calcite and Druid

Hive, Calcite and Druid

Categories: Big Data | Tags: Analytics, Druid, Hive, Database, Hadoop

BI/OLAP requires interactive visualization of complex data streams: Real time bidding events User activity streams Voice call logs Network trafic flows Firewall events Application KPIs Traditionnal…

By David WORMS

Jul 14, 2016

Composants for CDH and HDP

Composants for CDH and HDP

Categories: Big Data | Tags: Flume, Hive, Oozie, Sqoop, Zookeeper, Cloudera, CDH, Hortonworks, HDP, Hadoop

I was interested to compare the different components distributed by Cloudera and HortonWorks. This also gives us an idea of the versions packaged by the two distributions. At the time of this writting…

By David WORMS

Sep 22, 2013

Splitting HDFS files into multiple hive tables

Splitting HDFS files into multiple hive tables

Categories: Data Engineering | Tags: Flume, HDFS, Hive, Oozie, Pig, SQL

I am going to show how to split a CSV file stored inside HDFS as multiple Hive tables based on the content of each record. The context is simple. We are using Flume to collect logs from all over our…

By David WORMS

Sep 15, 2013

Oracle and Hive, how data are published?

Oracle and Hive, how data are published?

Categories: Big Data | Tags: Hive, Sqoop, Data Lake, Oracle

In the past few days, I’ve published 3 related articles: a first one covering the option to integrate Oracle and Hadoop, a second one explaining how to install and use the Oracle SQL Connector with…

By David WORMS

Jul 6, 2013

Oracle to Apache Hive with the Oracle SQL Connector

Oracle to Apache Hive with the Oracle SQL Connector

Categories: Business Intelligence | Tags: HDFS, Hive, Network, Oracle

In a previous article published last week, I introduced the choices available to connect Oracle and Hadoop. In a follow up article, I covered the Oracle SQL Connector, its installation and integration…

By David WORMS

May 27, 2013

Options to connect and integrate Hadoop with Oracle

Options to connect and integrate Hadoop with Oracle

Categories: Data Engineering | Tags: Avro, HDFS, Hive, MapReduce, Sqoop, Database, Java, NoSQL, Oracle, R, RDBMS, SQL

I will list the different tools and libraries available to us developers in order to integrate Oracle and Hadoop. The Oracle SQL Connector for HDFS described below is covered in a follow up article…

By David WORMS

May 15, 2013

Apache Hive Essentials How-to by Darren Lee

Apache Hive Essentials How-to by Darren Lee

Categories: Business Intelligence, Learning | Tags: Hive, File Format, UDF, Hadoop, SQL

Recently, I’ve been ask to review a new book on Apache Hive called “Apache Hive Essentials How-to” written by Darren Lee and published by Packt Publishing. To say it short, I sincerely recommend it. I…

By David WORMS

Apr 23, 2013

HDFS and Hive storage - comparing file formats and compression methods

HDFS and Hive storage - comparing file formats and compression methods

Categories: Big Data | Tags: Analytics, HBase, HDFS, Hive, ORC, Parquet, File Format

A few days ago, we have conducted a test in order to compare various Hive file formats and compression methods. Among those file formats, some are native to HDFS and apply to all Hadoop users. The…

By David WORMS

Mar 13, 2012

Two Hive UDAF to convert an aggregation to a map

Two Hive UDAF to convert an aggregation to a map

Categories: Data Engineering | Tags: Analytics, HDFS, Hive, ORC, Parquet

I am publishing two new Hive UDAF to help with maps in Apache Hive. The source code is available on GitHub in two Java classes: “UDAFToMap” and “UDAFToOrderedMap” or you can download the jar file. The…

By David WORMS

Mar 6, 2012

Timeseries storage in Hadoop and Hive

Timeseries storage in Hadoop and Hive

Categories: Data Engineering | Tags: HDFS, Hive, CRM, File Format, timeseries, Tuning, Hadoop

In the next few weeks, we will be exploring the storage and analytic of a large generated dataset. This dataset is composed of CRM tables associated to one timeserie table of about 7,000 billiard rows…

By David WORMS

Jan 10, 2012

Canada - Morocco - France

International locations

10 rue de la Kasbah
2393 Rabbat
Canada

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and increase the time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.