Apache HBase

HBase is a column-oriented NoSQL database that is part of the Hadoop ecosystem. It is an open source distributed database specialized in Big Data storage that provides low-latency access under high concurrency. Storage is optimized for accessing values through a key. Keys are kept sorted, which makes it possible to read all rows between one key and another (a range query). The data is written to HDFS, which ensures replication.

An HBase cluster consists of a master and workers, following the same principle as Hadoop. Each HBase worker runs a single HRegionServer through which the data travels, and which manages the storage of data on that machine. The data stored in HBase is divided into HRegions, each of which corresponds to a set of files (the HFiles) from the same table. These HRegions are managed by the HRegionServer, and each machine can host one or more HRegions.
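To make the idea of sorted keys and range queries concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the HBase client API: the class `SortedKeyStore`, its methods, and the row keys are hypothetical names chosen for illustration. It assumes a scan that includes the start key and excludes the stop key.

```python
import bisect


class SortedKeyStore:
    """Toy in-memory model of a key-sorted table (not the HBase API)."""

    def __init__(self):
        self._keys = []    # row keys, always kept in sorted order
        self._values = {}  # row key -> stored value

    def put(self, key, value):
        # Insert the key at its sorted position the first time it is seen.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        # Point lookup by key; returns None when the key is absent.
        return self._values.get(key)

    def scan(self, start_key, stop_key):
        """Yield (key, value) pairs with start_key <= key < stop_key."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


store = SortedKeyStore()
for row_key in ["user#003", "user#001", "order#17", "user#002"]:
    store.put(row_key, {"payload": row_key.upper()})

# Range query: every row whose key starts with "user#".
# "$" is the character right after "#" in ASCII, so "user$" bounds the prefix.
user_rows = [key for key, _ in store.scan("user#", "user$")]
print(user_rows)  # ['user#001', 'user#002', 'user#003']
```

Because the keys are sorted, the scan only needs to locate the two boundaries and read the rows in between, instead of inspecting every row in the table.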

Related articles

Internship Data Science & Data Engineer - ML in production and streaming data ingestion

Categories: Data Engineering, Data Science | Tags: Flink, Kafka, Spark, DevOps, Kubernetes, Hadoop, HBase, Python

Context: The exponential evolution of data has turned the industry upside down by redefining data storage, processing and data ingestion pipelines. Mastering these methods considerably facilitates…

By David WORMS

Nov 26, 2019

Omid: Scalable and highly available transaction processing for Apache Phoenix

Categories: Big Data, DataWorks Summit 2018 | Tags: ACID, Omid, Phoenix, Transaction, HBase, SQL

Apache Omid provides a transactional layer on top of key/value NoSQL databases. In practice, it is usually used on top of Apache HBase. Credits to Ohad Shacham for his talk and his work for Apache…

By Xavier HERMAND

May 24, 2018

Essential questions about Time Series

Categories: Big Data | Tags: Druid, Hive, ORC, Elasticsearch, Grafana, IOT, HBase

Today, the bulk of Big Data is temporal. We see it in the media and among our customers: smart meters, banking transactions, smart factories, connected vehicles … IoT and Big Data go hand in hand. We…

By David WORMS

Mar 19, 2018

Hadoop and R with RHadoop

Categories: Business Intelligence, Data Science | Tags: HDFS, MapReduce, Thrift, Data Analytics, Learning and tutorial, R, Hadoop, HBase

RHadoop is a bridge between R, a language and environment to statistically explore data sets, and Hadoop, a framework that allows for the distributed processing of large data sets across clusters of…

By David WORMS

Jul 19, 2012

Two Hive UDAF to convert an aggregation to a map

Categories: Data Engineering | Tags: Hive, File Format, Java, HBase

I am publishing two new Hive UDAF to help with maps in Apache Hive. The source code is available on GitHub in two Java classes: “UDAFToMap” and “UDAFToOrderedMap” or you can download the jar file. The…

By David WORMS

Mar 6, 2012

Hadoop and HBase installation on OSX in pseudo-distributed mode

Categories: Big Data, Learning | Tags: Big Data, Hue, Deployment, Infrastructure, Hadoop, HBase

The operating system chosen is OSX but the procedure is not so different for any Unix environment because most of the software is downloaded from the Internet, uncompressed and set manually. Only a…

By David WORMS

Dec 1, 2010

Node HBase, a NodeJs client for Apache HBase

Categories: Big Data, Node.js | Tags: Big Data, Node.js, REST, HBase

HBase is a “column family” database from the Hadoop ecosystem built on the model of Google BigTable. HBase can accommodate very large volumes of data (tera or peta) while maintaining high…

By David WORMS

Nov 1, 2010

Canada - Morocco - France

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to turn their use cases into production projects, reduce their costs, and shorten their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.