DataWorks Summit 2018: A few days to talk about Hadoop
The entire Adaltas team attended the DataWorks Summit 2018 conference, held in Berlin on April 18 and 19. For the occasion, we compiled a series of articles on the talks that stood out the most to us.
Among the topics covered over these two days, the new version 3 of Hadoop and its sub-projects, HDFS and YARN, was probably the most popular. In addition, the various discussions on platform and Data Lake governance illustrate the maturity the ecosystem has reached.
Articles related to the conference
Apache Hadoop YARN 3.0 – State of the union
Categories: Big Data, DataWorks Summit 2018 | Tags: GPU, Hortonworks, Hadoop, HDFS, MapReduce, YARN, Cloudera, Data Science, Docker, Release and features
This article covers the “Apache Hadoop YARN: state of the union” talk given by Wangda Tan from Hortonworks during the DataWorks Summit 2018. What is Apache YARN? As a reminder, YARN is one of the two…
By Lucas BAKALIAN
May 31, 2018
Accelerating query processing with materialized views in Apache Hive
Categories: Business Intelligence, DataWorks Summit 2018 | Tags: Calcite, OLAP, Druid, Hive, Release and features, SQL
The new materialized view feature is coming in Apache Hive 3.0. Jesus Camacho Rodriguez from Hortonworks gave a talk, “Accelerating query processing with materialized views in Apache Hive”, about it…
May 31, 2018
YARN and GPU Distribution for Machine Learning
Categories: Data Science, DataWorks Summit 2018 | Tags: arXiv, GPU, Grafana, MXNet, YARN, Docker, Machine Learning, Neural Network, Storage, TensorFlow
This article goes over the fundamental principles of Machine Learning and what tools are currently used to run machine learning algorithms. We will then see how a resource manager such as YARN can be…
By Grégor JOUET
May 30, 2018
TensorFlow on Spark 2.3: The Best of Both Worlds
Categories: Data Science, DataWorks Summit 2018 | Tags: Mesos, C++, CPU, GPU, Tuning, Spark, YARN, JavaScript, Keras, Kubernetes, Machine Learning, Python, TensorFlow
The integration of TensorFlow with Spark has a lot of potential and creates new opportunities. This article is based on a talk given at the DataWorks Summit 2018 in Berlin. It was about the new…
By Yliess HATI
May 29, 2018
Apache Metron in the Real World
Categories: Cyber Security, DataWorks Summit 2018 | Tags: Algorithm, NiFi, Solr, Storm, pcap, RDBMS, HDFS, Kafka, Metron, Spark, Data Science, Elasticsearch, SQL
Apache Metron is a storage and analytics platform specialized in cyber security. This talk demonstrated the uses and capabilities of Apache Metron in the real world. The presentation was…
By Michael HATOUM
May 29, 2018
Running Enterprise Workloads in the Cloud with Cloudbreak
Categories: Big Data, Cloud Computing, DataWorks Summit 2018 | Tags: Cloudbreak, Operation, Hadoop, AWS, Azure, GCP, HDP, OpenStack
This article is based on Peter Darvasi and Richard Doktorics’ talk Running Enterprise Workloads in the Cloud at the DataWorks Summit 2018 in Berlin. It presents Hortonworks’ automated deployment tool…
By Joris RUMMENS
May 28, 2018
Omid: Scalable and highly available transaction processing for Apache Phoenix
Categories: Big Data, DataWorks Summit 2018 | Tags: Omid, Phoenix, Transaction, ACID, HBase, SQL
Apache Omid provides a transactional layer on top of key/value NoSQL databases. In practice, it is usually used on top of Apache HBase. Credit to Ohad Shacham for his talk and his work for Apache…
By Xavier HERMAND
May 24, 2018
Apache Beam: a unified programming model for data processing pipelines
Categories: Data Engineering, DataWorks Summit 2018 | Tags: Apex, Beam, Java, Pipeline, Flink, Spark, Batch processing, Python, Streaming, TCO
In this article, we will review the concepts, the history and the future of Apache Beam, which may well become the new standard for defining data processing pipelines. At DataWorks Summit 2018 in…
By Gauthier LEONARD
May 24, 2018
Present and future of Hadoop workflow scheduling: Oozie 5.x
Categories: Big Data, DataWorks Summit 2018 | Tags: Hadoop, Hive, Oozie, Sqoop, CDH, HDP, Python, REST
During the DataWorks Summit Europe 2018 in Berlin, I had the opportunity to attend a breakout session on Apache Oozie. It covers the new features released in Oozie 5.0, including future features of…
By Leo SCHOUKROUN
May 23, 2018
What's new in Apache Spark 2.3?
Categories: Data Engineering, DataWorks Summit 2018 | Tags: Arrow, PySpark, Tuning, ORC, Spark, Spark MLlib, Data Science, Docker,