Cloudera Distribution Platform (CDH)
CDP is the plateform born from the merger between Cloudera and Hortonworks which provide a complete solution from the Edge to AI all with enterprise-grade security and governance. It is available both on premises and on the major Cloud providers.
Related articles
CDP part 6: end-to-end data lakehouse ingestion pipeline with CDP
Categories: Big Data, Data Engineering, Learning | Tags: NiFi, Business intelligence, Data Engineering, Iceberg, Spark, Big Data, Cloudera, CDP, Data Analytics, Data Lake, Data Warehouse
In this hands-on lab session we demonstrate how to build an end-to-end big data solution with Cloudera Data Platform (CDP) Public Cloud, using the infrastructure we have deployed and configured overā¦
Jul 24, 2023
CDP part 5: user permissions management on CDP Public Cloud
Categories: Big Data, Cloud Computing, Data Governance | Tags: Ranger, Cloudera, CDP, Data Warehouse
When you create a user or a group in CDP, it requires permissions to access resources and use the Data Services. This article is the fifth in a series of six: CDP part 1: introduction to end-to-endā¦
Jul 18, 2023
CDP part 4: user management on CDP Public Cloud with Keycloak
Categories: Big Data, Cloud Computing, Data Governance | Tags: EC2, Big Data, CDP, Docker Compose, Keycloak, SSO
Previous articles of the serie cover the deployment of a CDP Public Cloud environment. All the components are ready for use and it is time to make the environment available to other users to exploreā¦
Jul 4, 2023
CDP part 3: Data Services activation on CDP Public Cloud environment
Categories: Big Data, Cloud Computing, Infrastructure | Tags: Infrastructure, AWS, Big Data, Cloudera, CDP
One of the big selling points of Cloudera Data Platform (CDP) is their mature managed service offering. These are easy to deploy on-premises, in the public cloud or as part of a hybrid solution. Theā¦
Jun 27, 2023
CDP part 2: CDP Public Cloud deployment on AWS
Categories: Big Data, Cloud Computing, Infrastructure | Tags: Infrastructure, AWS, Big Data, Cloud, Cloudera, CDP, Cloudera Manager
The Cloudera Data Platform (CDP) Public Cloud provides the foundation upon which full featured data lakes are created. In a previous article, we introduced the CDP platform. This article is the secondā¦
Jun 19, 2023
CDP part 1: introduction to end-to-end data lakehouse architecture with CDP
Categories: Cloud Computing, Data Engineering, Infrastructure | Tags: Data Engineering, Hortonworks, Iceberg, AWS, Azure, Big Data, Cloud, Cloudera, CDP, Cloudera Manager, Data Warehouse
Cloudera Data Platform (CDP) is a hybrid data platform for big data transformation, machine learning and data analytics. In this series we describe how to build and use an end-to-end big dataā¦
By Stephan BAUM
Jun 8, 2023
Keycloak deployment in EC2
Categories: Cloud Computing, Data Engineering, Infrastructure | Tags: Security, EC2, Authentication, AWS, Docker, Keycloak, SSL/TLS, SSO
Why use Keycloak Keycloak is an open-source identity provider (IdP) using single sign-on (SSO). An IdP is a tool to create, maintain, and manage identity information for principals and to provideā¦
By Stephan BAUM
Mar 14, 2023
Spark on Hadoop integration with Jupyter
Categories: Adaltas Summit 2021, Infrastructure, Tech Radar | Tags: Infrastructure, Jupyter, Spark, YARN, CDP, HDP, Notebook, TDP
For several years, Jupyter notebook has established itself as the notebook solution in the Python universe. Historically, Jupyter is the tool of choice for data scientists who mainly develop in Pythonā¦
Sep 1, 2022
Introducing Trunk Data Platform: the Open-Source Big Data Distribution Curated by TOSIT
Categories: Big Data, DevOps & SRE, Infrastructure | Tags: DevOps, Hortonworks, Ansible, Hadoop, HBase, Knox, Ranger, Spark, Cloudera, CDP, CDH, Open source, TDP
Ever since Cloudera and Hortonworks merged, the choice of commercial Hadoop distributions for on-prem workloads essentially boils down to CDP Private Cloud. CDP can be seen as the ābest of both worldsā¦
Apr 14, 2022
Using Cloudera Deploy to install Cloudera Data Platform (CDP) Private Cloud
Categories: Big Data, Cloud Computing | Tags: Ansible, Cloudera, CDP, Cluster, Data Warehouse, Vagrant, IaC
Following our recent Cloudera Data Platform (CDP) overview, we cover how to deploy CDP private Cloud on you local infrastructure. It is entirely automated with the Ansible cookbooks published byā¦
Jul 23, 2021
An overview of Cloudera Data Platform (CDP)
Categories: Big Data, Cloud Computing, Data Engineering | Tags: SDX, Big Data, Cloud, Cloudera, CDP, CDH, Data Analytics, Data Hub, Data Lake, Data lakehouse, Data Warehouse
Cloudera Data Platform (CDP) is a cloud computing platform for businesses. It provides integrated and multifunctional self-service tools in order to analyze and centralize data. It brings security andā¦
Jul 19, 2021
Build your open source Big Data distribution with Hadoop, HBase, Spark, Hive & Zeppelin
Categories: Big Data, Infrastructure | Tags: Maven, Hadoop, HBase, Hive, Spark, Git, Release and features, TDP, Unit tests
The Hadoop ecosystem gave birth to many popular projects including HBase, Spark and Hive. While technologies like Kubernetes and S3 compatible object storages are growing in popularity, HDFS and YARNā¦
Dec 18, 2020
Rebuilding HDP Hive: patch, test and build
Categories: Big Data, Infrastructure | Tags: Maven, Java, Hive, Git, GitHub, Release and features, TDP, Unit tests
The Hortonworks HDP distribution will soon be deprecated in favor of Clouderaās CDP. One of our clients wanted a new Apache Hive feature backported into HDP 2.6.0. We thought it was a good opportunityā¦
Oct 6, 2020
Installing Hadoop from source: build, patch and run
Categories: Big Data, Infrastructure | Tags: Maven, Java, LXD, Hadoop, HDFS, Docker, TDP, Unit tests
Commercial Apache Hadoop distributions have come and gone. The two leaders, Cloudera and Hortonworks, have merged: HDP is no more and CDH is now CDP. MapR has been acquired by HP and IBM BigInsightsā¦
Aug 4, 2020
Should you move your Big Data and Data Lake to the Cloud
Categories: Big Data, Cloud Computing | Tags: DevOps, AWS, Azure, Cloud, CDP, Databricks, GCP
Should you follow the trend and migrate your data, workflows and infrastructure to GCP, AWS and Azure? During the Strata Data Conference in New-York, a general focus was put on moving customerās Bigā¦
Dec 9, 2019