Apache HBase
HBase is a column-oriented NoSQL database that is part of the Hadoop ecosystem. It is an open source, distributed database designed for Big Data storage, providing low-latency access under high concurrency. Storage is optimized for looking up values by key. Keys are kept sorted, which makes it possible to query all rows between one key and another (a range query). Data is written to HDFS, which handles replication. An HBase cluster consists of a master and workers, following the same principle as Hadoop. Each HBase worker runs a single HRegionServer, through which the data travels and which manages data storage on that machine. The data stored in HBase is organized into HRegions, each corresponding to a set of files (the HFiles) from the same table. Each machine can host one or more HRegions, all managed by its HRegionServer.
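To illustrate why sorted keys make range queries cheap, here is a minimal sketch in Python. It is a toy in-memory model, not the HBase client API: the `SortedKeyStore` class, its `put`, `get` and `scan` methods, and the row keys are all hypothetical names chosen for the example. The scan is written as start-inclusive and stop-exclusive, which is an assumption made here for simplicity.

```python
import bisect


class SortedKeyStore:
    """Toy in-memory model of a sorted key/value store (not the HBase API)."""

    def __init__(self):
        self._keys = []    # row keys, always kept in sorted order
        self._values = {}  # row key -> stored value

    def put(self, key, value):
        # Insert the key at its sorted position the first time it is seen.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        # Point lookup by key.
        return self._values.get(key)

    def scan(self, start_key, stop_key):
        """Yield (key, value) pairs with start_key <= key < stop_key."""
        # Binary search finds both ends of the range; the rows in between
        # are contiguous because the keys are sorted.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


store = SortedKeyStore()
# Keys are inserted out of order on purpose.
store.put("meter#0003", 30)
store.put("meter#0001", 10)
store.put("sensor#0001", 99)
store.put("meter#0002", 20)

rows = list(store.scan("meter#0001", "meter#0003"))
print(rows)
```

Because the keys are sorted, a range query only needs to locate the first matching key and then read forward, instead of inspecting every row. This is also why the choice of row key layout (here a hypothetical `meter#` prefix) determines which rows can be read together.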
Related articles
Internship Data Science & Data Engineer - ML in production and streaming data ingestion
Categories: Data Engineering, Data Science | Tags: Flink, Kafka, Spark, DevOps, Kubernetes, Hadoop, HBase, Python
Context The exponential evolution of data has turned the industry upside down by redefining data storage, processing and data ingestion pipelines. Mastering these methods considerably facilitates…
By David WORMS
Nov 26, 2019
Omid: Scalable and highly available transaction processing for Apache Phoenix
Categories: Big Data, DataWorks Summit 2018 | Tags: ACID, Omid, Phoenix, Transaction, HBase, SQL
Apache Omid provides a transactional layer on top of key/value NoSQL databases. In practice, it is usually used on top of Apache HBase. Credits to Ohad Shacham for his talk and his work for Apache…
May 24, 2018
Essential questions about Time Series
Categories: Big Data | Tags: Druid, Hive, ORC, Elasticsearch, Grafana, IOT, HBase
Today, the bulk of Big Data is temporal. We see it in the media and among our customers: smart meters, banking transactions, smart factories, connected vehicles … IoT and Big Data go hand in hand. We…
By David WORMS
Mar 19, 2018
Hadoop and R with RHadoop
Categories: Business Intelligence, Data Science | Tags: HDFS, MapReduce, Thrift, Data Analytics, Learning and tutorial, R, Hadoop, HBase
RHadoop is a bridge between R, a language and environment to statistically explore data sets, and Hadoop, a framework that allows for the distributed processing of large data sets across clusters of…
By David WORMS
Jul 19, 2012
Two Hive UDAF to convert an aggregation to a map
Categories: Data Engineering | Tags: Hive, File Format, Java, HBase
I am publishing two new Hive UDAFs to help with maps in Apache Hive. The source code is available on GitHub in two Java classes, “UDAFToMap” and “UDAFToOrderedMap”, or you can download the jar file. The…
By David WORMS
Mar 6, 2012
Hadoop and HBase installation on OSX in pseudo-distributed mode
Categories: Big Data, Learning | Tags: Big Data, Hue, Deployment, Infrastructure, Hadoop, HBase
The operating system chosen is OSX, but the procedure is not very different for any Unix environment because most of the software is downloaded from the Internet, uncompressed and configured manually. Only a…
By David WORMS
Dec 1, 2010
Node HBase, a NodeJs client for Apache HBase
Categories: Big Data, Node.js | Tags: Big Data, Node.js, REST, HBase
HBase is a “column family” database from the Hadoop ecosystem built on the model of Google BigTable. HBase can accommodate very large volumes of data (terabytes or petabytes) while maintaining high…
By David WORMS
Nov 1, 2010