Spring 2022 internship - building a Data Lab
By David WORMS
Nov 24, 2021
- Categories
- Data Science
- Learning
- Tags
- MongoDB
- Spark
- Argo CD
- Elasticsearch
- Internship
- Keycloak
- Kubernetes
- OpenID Connect
- PostgreSQL
Job Description
Over the last few years, we have developed the ability to use computers to process large amounts of data. The ecosystem has grown into a large offering of tools and libraries and has given rise to the field of data science. Connecting all those components into a coherent and secure platform is a daunting task. Newcomers, as well as more experienced users, benefit from platforms that offer a first-class developer experience.
Data Labs provide developers with a comprehensive suite of software to help them explore, visualize, process, and expose data. Using their favorite languages such as Python, JavaScript, or SQL, they build pipelines to collect and store data, build visualization dashboards and deploy machine learning models.
As part of your internship, you will assemble multiple open source technologies to provide data scientists with a modern environment that suits their needs. Data scientists expect a user-friendly web interface to provision their favorite development editors, the ability to use their favorite libraries without restriction in an isolated and self-contained environment, the scaling of resources according to their requirements, and the ability to push their code into production.
The Datalab platform relies on a flexible Kubernetes backend coupled with document storage compatible with any standard S3 interface. On-demand containers should be provisioned to cover a wide range of databases (Elasticsearch, MongoDB, PostgreSQL, …), environments (TensorFlow, VSCode, Jupyter, RStudio, …), and complementary tools such as secrets management with Vault, automated provisioning with Argo CD, OpenID Connect authentication with Keycloak, workflow scheduling, API publishing, …
During this internship, you will become familiar with Kubernetes and the CNCF ecosystem, gain a deep understanding of the roles and responsibilities expected from data scientists, and become comfortable addressing their needs. You will join an agile team led by a data science expert.
In addition, by the end of the internship, you will obtain a certification from a cloud provider and a Databricks certification.
Company presentation
Adaltas is a consulting agency led by a team of open source experts focusing on data management. We deploy and operate the storage and computing infrastructures in collaboration with our customers.
Partners of Cloudera and Databricks, we are also open source contributors. We invite you to browse our site and our many technical publications to learn more about the company.
Responsibilities
- Understand and address the need for data science
- Learn the various moving pieces of a Datalab
- Deploy the Datalab inside a Kubernetes cluster
- Deploy machine learning workflows
Expected qualifications
- Engineering school, end of studies internship
- Analytical and structured
- Autonomous and curious
- You are an open-minded person who enjoys sharing, communicating, and learning from others
- Good knowledge of Python, Spark, and Linux systems
You will be in charge of understanding the architecture and integrating it with an existing infrastructure. You will work with InfraOps and data scientists. We are looking for a person who will develop skills on the following tools and solutions:
All complementary experiences are valuable.
Additional information
- Location: Boulogne Billancourt, France
- Languages: French or English
- Start: February 2022
- Duration: 6 months
- Teleworking: possibility of working 2 days a week remotely
Available hardware
A laptop with the following characteristics:
- 32GB RAM
- 1TB SSD
- 8c/16t CPU
A cluster made up of:
- 3x 28c/56t Intel Xeon Scalable Gold 6132
- 3x 192GB RAM DDR4 ECC 2666MHz
- 3x 14 SSD 480GB SATA Intel S4500 6Gbps
Platforms, components, tools
A Kubernetes cluster.
Remuneration
- Salary 1200 € / month
- Restaurant tickets
- Transportation pass
- Participation in one international conference
In the past, the conferences we attended included KubeCon organized by the CNCF, the Open Source Summit from the Linux Foundation, and FOSDEM.
Contact
For any request for additional information and to submit your application, please contact David Worms:
- david@adaltas.com
- +33 6 76 88 72 13
- https://www.linkedin.com/in/david-worms/