Data Science

Data Science, and more generally Artificial Intelligence (AI), differs from traditional programming and analysis in its ability to extract knowledge from data and modify its behavior and learn without specific programming. While traditional software predefines the logic that governs their processes, Data Science's algorithms build and discover models and are able to continually improve them.

Data Science brings together a set of skills including Machine Learning, Natural Language Processing (NLP), speech, images and faces recognition (among other applications). In some applications, the algorithms go so far as to simulate human intelligence.

Convergence of business, data and statitics

Articles related to Data Science

Avoid Bottlenecks in distributed Deep Learning pipelines with Horovod

Avoid Bottlenecks in distributed Deep Learning pipelines with Horovod

Categories: Data Science | Tags: Deep Learning, GPU, Keras, TensorFlow, Horovod

The Deep Learning training process can be greatly speed up using a cluster of GPUs. When dealing with huge amounts of data, distributed computing quickly becomes a challenge. A common obstacle which…

By Grégor JOUET

Nov 15, 2019

Innovation, project vs product culture in Data Science

Innovation, project vs product culture in Data Science

Categories: Data Science, Data Governance | Tags: DevOps, Agile, Scrum

Data Science carries the jobs of tomorrow. It is closely linked to the understanding of the business usecases, the behaviors and the insights that will be extracted from existing data. The stakes are…

By David WORMS

Oct 8, 2019

Machine Learning model deployment

Machine Learning model deployment

Categories: Big Data, Data Engineering, Data Science, DevOps & SRE | Tags: AI, Cloud, DevOps, Machine Learning, On-premise, Operation, Schema

“Enterprise Machine Learning requires looking at the big picture … from a data engineering and a data platform perspective,” lectured Justin Norman during the talk on the deployment of Machine…

By Oskar RYNKIEWICZ

Sep 30, 2019

TensorFlow installation on Docker

TensorFlow installation on Docker

Categories: Containers Orchestration, Data Science, Learning | Tags: AI, CPU, Deep Learning, Docker, Jupyter, Linux, TensorFlow

TensorFlow is an Open Source software from Google for numerical computation using a graph representation: Vertex (nodes) represent mathematical operations Edges represent N-dimensional data array…

By Pierre SAUVAGE

Aug 5, 2019

Spark Streaming part 4: clustering with Spark MLlib

Spark Streaming part 4: clustering with Spark MLlib

Categories: Data Engineering, Data Science, Learning | Tags: Spark, Apache Spark Streaming, Big Data, Clustering, Machine Learning, Scala, Streaming

Spark MLlib is an Apache’s Spark library offering scalable implementations of various supervised and unsupervised Machine Learning algorithms. Thus, Spark framework can serve as a platform for…

By Oskar RYNKIEWICZ

Jul 11, 2019

Introduction to Cloudera Data Science Workbench

Introduction to Cloudera Data Science Workbench

Categories: Data Science | Tags: Cloud, Cloudera, Docker, Git, Kubernetes, Machine Learning, Azure, Notebook, Tuning

Cloudera Data Science Workbench is a platform that allows Data Scientists to create, manage, run and schedule data science workflows from their browser. Thus it enables them to focus on their main…

By Mehdi ELALAMI

Feb 28, 2019

Applying Deep Reinforcement Learning to Poker

Applying Deep Reinforcement Learning to Poker

Categories: Data Science | Tags: Algorithms, Deep Learning, Gaming, Machine Learning, Python, Q-learning, Neural Network

We will cover the subject of Deep Reinforcement Learning, more specifically the Deep Q Learning algorithm introduced by DeepMind, and then we’ll apply a version of this algorithm to the game of Poker…

By Oscar BLAZEJEWSKI

Jan 9, 2019

CodaLab – Data Science competitions

CodaLab – Data Science competitions

Categories: Data Science, Adaltas Summit 2018, Learning | Tags: Database, Infrastructure, Machine Learning, MySQL, Node.js, Python

CodaLab Competition is a platform for code execution in the field of Data Science. It is a web interface on which a user can submit code or results and compare themselves to others. Let’s see how it…

By Robert Walid SOARES

Dec 17, 2018

Nvidia and AI on the edge

Nvidia and AI on the edge

Categories: Data Science | Tags: AI, Caffe, Deep Learning, Edge computing, GPU, Keras, NVIDIA, PyTorch, TensorFlow

In the last four years, corporations have been investing a lot in AI and particularly in Deep Learning and Edge Computing. While the theory has taken huge steps forward and new algorithms are invented…

By Yliess HATI

Oct 10, 2018

Lando: Deep Learning used to summarize conversations

Lando: Deep Learning used to summarize conversations

Categories: Data Science, Learning | Tags: Deep Learning, Kubernetes, Micro Services, Node.js, Open API, Neural Network

Lando is an application to summarize conversations using Speech To Text to translate the written record of a meeting into text and Deep Learning technics to summarize contents. It allows users to…

By Yliess HATI

Sep 18, 2018

Deep learning on YARN: running Tensorflow and friends on Hadoop cluster

Deep learning on YARN: running Tensorflow and friends on Hadoop cluster

Categories: Data Science | Tags: Spark, Spark MLlib, YARN, Deep Learning, GPU, PyTorch, TensorFlow, XGBoost, Hadoop

With the arrival of Hadoop 3, YARN offer more flexibility in resource management. It is now possible to perform Deep Learning analysis on GPUs with specific development environments, leveraging…

By Louis BIANCHERIN

Jul 24, 2018

YARN and GPU Distribution for Machine Learning

YARN and GPU Distribution for Machine Learning

Categories: Data Science, DataWorks Summit 2018 | Tags: YARN, GPU, Machine Learning, Storage, Neural Network

This article goes over the fundamental principles of Machine Learning and what tools are currently used to run machine learning algorithms. We will then see how a resource manager such as YARN can be…

By Grégor JOUET

May 30, 2018

TensorFlow on Spark 2.3: The Best of Both Worlds

TensorFlow on Spark 2.3: The Best of Both Worlds

Categories: Data Science, DataWorks Summit 2018 | Tags: Mesos, Spark, YARN, C++, CPU, GPU, JavaScript, Keras, Kubernetes, Machine Learning, Python, TensorFlow, Tuning

The integration of TensorFlow With Spark has a lot of potential and creates new opportunities. This article is based on a conference seen at the DataWorks Summit 2018 in Berlin. It was about the new…

By Yliess HATI

May 29, 2018

Apache Apex with Apache SAMOA

Apache Apex with Apache SAMOA

Categories: Data Science, Events, Tech Radar | Tags: Apex, Flink, Samoa, Storm, Machine Learning, Tools, Hadoop

Traditional Machine Learning Batch Oriented Supervised - most common Training and Scoring One time model building Data set Training: Model building Holdout: Paremeter tuning Test: Accuracy Online…

By Pierre SAUVAGE

Jul 17, 2016

Apache Apex: next gen Big Data analytics

Apache Apex: next gen Big Data analytics

Categories: Data Science, Events, Tech Radar | Tags: Apex, Flink, Kafka, Storm, Data Science, Machine Learning, Tools, Hadoop

Below is a compilation of my notes taken during the presentation of Apache Apex by Thomas Weise from DataTorrent, the company behind Apex. Introduction Apache Apex is an in-memory distributed parallel…

By César BEREZOWSKI

Jul 17, 2016

Definitions of machine learning algorithms present in Apache Mahout

Definitions of machine learning algorithms present in Apache Mahout

Categories: Data Science | Tags: Algorithms, Mahout, Сlassification, Clustering, Machine Learning, Hadoop

Apache Mahout is a machine learning library built for scalability. Its core algorithms for clustering, classfication and batch based collaborative filtering are implemented on top of Apache Hadoop…

By David WORMS

Mar 8, 2013

Hadoop and R with RHadoop

Hadoop and R with RHadoop

Categories: Business Intelligence, Data Science | Tags: HBase, HDFS, MapReduce, Thrift, Data Analytics, Learning and tutorial, R, Hadoop

RHadoop is a bridge between R, a language and environment to statistically explore data sets, and Hadoop, a framework that allows for the distributed processing of large data sets across clusters of…

By David WORMS

Jul 19, 2012

Installing and using MADlib with PostgreSQL on OSX

Installing and using MADlib with PostgreSQL on OSX

Categories: Data Science | Tags: Database, Greenplum, PostgreSQL, Statistics, SQL

We cover basic installation and usage of PostgreSQL and MADlib on OSX and Ubuntu. Instructions for other environments should be similar. PostgreSQL is an Open Source database with enterprise…

By David WORMS

Jul 7, 2012

Canada - Morocco - France

International locations

10 rue de la Kasbah
2393 Rabbat
Canada

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and increase the time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.