Amazon Web Services (AWS)

Amazon Web Services is a division of Amazon that offers a platform for on-demand cloud computing with more than 175 affiliated services. Among them, AWS S3 and AWS EC2, which allows you to respectively host your data and rent virtual machines to execute various types of actions.

Related articles

CDP part 3: Data Services activation on CDP Public Cloud environment

CDP part 3: Data Services activation on CDP Public Cloud environment

Categories: Big Data, Cloud Computing, Infrastructure | Tags: Infrastructure, AWS, Big Data, Cloudera, CDP

One of the big selling points of Cloudera Data Platform (CDP) is their mature managed service offering. These are easy to deploy on-premises, in the public cloud or as part of a hybrid solution. Theā€¦

Albert KONRAD

By Albert KONRAD

Jun 27, 2023

CDP part 2: CDP Public Cloud deployment on AWS

CDP part 2: CDP Public Cloud deployment on AWS

Categories: Big Data, Cloud Computing, Infrastructure | Tags: Infrastructure, AWS, Big Data, Cloud, Cloudera, CDP, Cloudera Manager

The Cloudera Data Platform (CDP) Public Cloud provides the foundation upon which full featured data lakes are created. In a previous article, we introduced the CDP platform. This article is the secondā€¦

Albert KONRAD

By Albert KONRAD

Jun 19, 2023

CDP part 1: introduction to end-to-end data lakehouse architecture with CDP

CDP part 1: introduction to end-to-end data lakehouse architecture with CDP

Categories: Cloud Computing, Data Engineering, Infrastructure | Tags: Data Engineering, Hortonworks, Iceberg, AWS, Azure, Big Data, Cloud, Cloudera, CDP, Cloudera Manager, Data Warehouse

Cloudera Data Platform (CDP) is a hybrid data platform for big data transformation, machine learning and data analytics. In this series we describe how to build and use an end-to-end big dataā€¦

Stephan BAUM

By Stephan BAUM

Jun 8, 2023

Keycloak deployment in EC2

Keycloak deployment in EC2

Categories: Cloud Computing, Data Engineering, Infrastructure | Tags: Security, EC2, Authentication, AWS, Docker, Keycloak, SSL/TLS, SSO

Why use Keycloak Keycloak is an open-source identity provider (IdP) using single sign-on (SSO). An IdP is a tool to create, maintain, and manage identity information for principals and to provideā€¦

Stephan BAUM

By Stephan BAUM

Mar 14, 2023

Data platform requirements and expectations

Data platform requirements and expectations

Categories: Big Data, Infrastructure | Tags: Data Engineering, Data Governance, Data Analytics, Data Hub, Data Lake, Data lakehouse, Data Science

A big data platform is a complex and sophisticated system that enables organizations to store, process, and analyze large volumes of data from a variety of sources. It is composed of severalā€¦

David WORMS

By David WORMS

Mar 23, 2023

Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI

Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI

Categories: Data Engineering, Learning | Tags: Cloud, Data Lake, Databricks, Delta Lake, MLflow

Self-paced trainings are proposed by Databricks inside their Academy program. The price is $ 2000 USD for unlimited access to the training courses for a period of 1 year, but also free for customersā€¦

Anna KNYAZEVA

By Anna KNYAZEVA

May 26, 2021

Find your way into data related Microsoft Azure certifications

Find your way into data related Microsoft Azure certifications

Categories: Cloud Computing, Data Engineering | Tags: Data Governance, Azure, Data Science

Microsoft Azure has certification paths for many technical job roles such as developer, Data Engineer, Data Scientist and solution architect among others. Each of these certifications consists ofā€¦

Barthelemy NGOM

By Barthelemy NGOM

Apr 14, 2021

Importing data to Databricks: external tables and Delta Lake

Importing data to Databricks: external tables and Delta Lake

Categories: Data Engineering, Data Science, Learning | Tags: Parquet, AWS, Amazon S3, Azure Data Lake Storage (ADLS), Databricks, Delta Lake, Python

During a Machine Learning project we need to keep track of the training data we are using. This is important for audit purposes and for assessing the performance of the models, developed at a laterā€¦

Introducing Apache Airflow on AWS

Introducing Apache Airflow on AWS

Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: PySpark, Learning and tutorial, Airflow, Oozie, Spark, AWS, Docker, Python

Apache Airflow offers a potential solution to the growing challenge of managing an increasingly complex landscape of data management tools, scripts and analytics processes. It is an open-sourceā€¦

Aargan COINTEPAS

By Aargan COINTEPAS

May 5, 2020

Snowflake, the Data Warehouse for the Cloud, introduction and tutorial

Snowflake, the Data Warehouse for the Cloud, introduction and tutorial

Categories: Business Intelligence, Cloud Computing | Tags: Cloud, Data Lake, Data Science, Data Warehouse, Snowflake

Snowflake is a SaaS-based data-warehousing platform that centralizes, in the cloud, the storage and processing of structured and semi-structured data. The increasing generation of data produced overā€¦

Jules HAMELIN-BOYER

By Jules HAMELIN-BOYER

Apr 7, 2020

MLflow tutorial: an open source Machine Learning (ML) platform

MLflow tutorial: an open source Machine Learning (ML) platform

Categories: Data Engineering, Data Science, Learning | Tags: AWS, Azure, Databricks, Deep Learning, Deployment, Machine Learning, MLflow, MLOps, Python, Scikit-learn

Introduction and principles of MLflow With increasingly cheaper computing power and storage and at the same time increasing data collection in all walks of life, many companies integrated Data Scienceā€¦

Cloudera CDP and Cloud migration of your Data Warehouse

Cloudera CDP and Cloud migration of your Data Warehouse

Categories: Big Data, Cloud Computing | Tags: Azure, Cloudera, Data Hub, Data Lake, Data Warehouse

While one of our customer is anticipating a move to the Cloud and with the recent announcement of Cloudera CDP availability mi-september during the Strata conference, it seems like the appropriateā€¦

David WORMS

By David WORMS

Dec 16, 2019

Should you move your Big Data and Data Lake to the Cloud

Should you move your Big Data and Data Lake to the Cloud

Categories: Big Data, Cloud Computing | Tags: DevOps, AWS, Azure, Cloud, CDP, Databricks, GCP

Should you follow the trend and migrate your data, workflows and infrastructure to GCP, AWS and Azure? During the Strata Data Conference in New-York, a general focus was put on moving customerā€™s Bigā€¦

Joris RUMMENS

By Joris RUMMENS

Dec 9, 2019

Google Cloud Summit Paris Notes

Google Cloud Summit Paris Notes

Categories: Events | Tags: AWS, Azure, Cloud, GCP, Kubernetes, On-premises

Google organized its yearly Summit edition 2019 in Paris on the 18th of June. This yearā€™s event was the biggest yet in Paris, which reflect Googleā€™s commitment to position itself in the French marketā€¦

Tariq SAHNOUNI

By Tariq SAHNOUNI

Jun 26, 2019

Running Enterprise Workloads in the Cloud with Cloudbreak

Running Enterprise Workloads in the Cloud with Cloudbreak

Categories: Big Data, Cloud Computing, DataWorks Summit 2018 | Tags: Cloudbreak, Operation, Hadoop, AWS, Azure, GCP, HDP, OpenStack

This article is based on Peter Darvasi and Richard Doktoricsā€™ talk Running Enterprise Workloads in the Cloud at the DataWorks Summit 2018 in Berlin. It presents Hortonworksā€™ automated deployment toolā€¦

Joris RUMMENS

By Joris RUMMENS

May 28, 2018

Canada - Morocco - France

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Scienceā€¦

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and increase the time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.

Support Ukrain