Cloud agnostic datalab

The Cloud Agnostic Datalab is a web service for deploying and using containerized data science services such as JupyterLab, RStudio, Spark, Apache Superset and PostgreSQL. It is the main service provided for the hackathon.

Typical use case

A typical workflow: read data from S3 buckets (with programmatic access), process it with R, Python or Spark, create a specific data view, push it to a PostgreSQL database and finally build a custom dashboard on it with Superset.
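
Both ends of this workflow, the S3 object and the PostgreSQL database, are usually addressed by URI. The minimal sketch below assumes the data sits at an `s3://bucket/key` location and that you have the PostgreSQL host, port, user and database name (for example via the Copy link under My Services) plus the password (from My Data). The helper names and all example values are hypothetical. The password is percent-encoded so that special characters in a generated secret cannot break the URL.

```python
from __future__ import annotations

from urllib.parse import quote, urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def postgres_url(user: str, password: str, host: str,
                 port: int = 5432, dbname: str = "postgres") -> str:
    """Build a postgresql:// connection URL (hypothetical helper).

    safe="" makes quote() also encode '@', ':' and '/', so a generated
    password containing them is not mistaken for URL delimiters.
    """
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{dbname}")


if __name__ == "__main__":
    # Hypothetical bucket, host and credentials for illustration only.
    bucket, key = split_s3_uri("s3://my-bucket/raw/data.csv")
    print(bucket, key)
    print(postgres_url("analyst", "p@ss/w:rd", "db.example", 5432, "lab"))
```

The bucket and key could then be passed to boto3's `get_object(Bucket=bucket, Key=key)`, and the URL to SQLAlchemy's `create_engine()`, for instance to write a pandas DataFrame with `DataFrame.to_sql`; whether these libraries are preinstalled in a given service is not stated here.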

Login

To access the service, click EC DataPlatform Azure AD.

Sign in with the Azure AD credentials you received earlier and activated with multi-factor authentication (MFA).

At your first login, you must enter the private email address that you use in MS Teams.

Home

After a successful login, you arrive at the home screen of the Cloud Agnostic Datalab.

Here you can find a link to this documentation and the terms of use of these services. Please read the terms of use at least once: by using these services, you agree to them.

My Account

Under My Account you find your Azure AD identifier, your name and your email address, which is the address used in the MS Teams group and the one at which you received the information about your Azure AD account activation. It also shows the resource usage of the Data Science Lab (DSL) available to your group. Each team has 128 vCPUs and 512 GB of RAM.
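
Because the 128 vCPU / 512 GB RAM budget is shared by the whole team, it can help to add up what is already running before launching another service. This is a minimal sketch that assumes the resources of each service are simply summed against the team limit; the function names, service names and sizes are hypothetical.

```python
from __future__ import annotations

# Per-team limits stated under My Account.
TEAM_VCPU = 128
TEAM_RAM_GB = 512


def remaining(services: list[tuple[str, int, int]]) -> tuple[int, int]:
    """Return (vCPUs, RAM in GB) still free, given (name, vcpu, ram_gb) tuples."""
    used_cpu = sum(cpu for _, cpu, _ in services)
    used_ram = sum(ram for _, _, ram in services)
    return TEAM_VCPU - used_cpu, TEAM_RAM_GB - used_ram


def fits(services: list[tuple[str, int, int]], vcpu: int, ram_gb: int) -> bool:
    """True if a new service of the given size still fits in the team quota."""
    cpu_left, ram_left = remaining(services)
    return vcpu <= cpu_left and ram_gb <= ram_left


if __name__ == "__main__":
    # Hypothetical sizes; use the values actually chosen at launch.
    running = [("jupyterlab-spark", 32, 128), ("rstudio", 8, 32)]
    print(remaining(running))        # (88, 352)
    print(fits(running, 64, 256))    # True
    print(fits(running, 96, 64))     # False: only 88 vCPUs left
```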

Service Catalog

Under Service Catalog you find the services you can launch.

You can choose from the following services:

JupyterLab (Spark) v3.4.3

For the JupyterLab (Spark) configuration you have to provide:

JupyterLab v3.2.8

For the JupyterLab configuration you have to provide:

My Services

Under My Services you find the list of services you have launched.

Right after launch, a service has the status PENDING. After a few minutes, refresh the page: the status should change to ACTIVE, and an Open link (Copy for PostgreSQL) should appear in the last column. Clicking Open opens the service's website in a new window.

Here you can also Terminate services. PLEASE terminate services you are not using, to save energy and to let your teammates use the resources if necessary.

My Data

Under My Data you find the list of passwords/secrets created by the services you have launched.

Click a service's row to view or edit its password, or to copy it to the clipboard.

You can also add additional secrets to use in the launched services.

Limitations

The Cloud Agnostic Datalab has the following restrictions:

The deployment configuration cannot be changed after launch
An automatically generated secret cannot be updated manually during instance deployment
Changing the secret under My Data afterwards does not change the password of the launched service
Each service is provided in a specific version
Do not update a service's version manually or from the UI
No root access in JupyterLab and RStudio
Apache Superset is not able to connect to a database if the browser is in incognito mode
The values of the Apache Superset First name and Last name fields must not contain a space character (use an underscore or a single word)
Not all libraries are preinstalled; users can install additional libraries themselves as long as this does not require root access
Simultaneous access to a shared RStudio service is limited: only one person can be connected at a time
The storage quota of the DSL is not validated against its limits
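
The Superset name restriction above can be checked before filling in the form. This small sketch rejects any whitespace (slightly stricter than the documented "space character" rule) and joins multi-part names with underscores, as the limitation suggests. The function names are hypothetical and not part of Superset.

```python
def is_valid_superset_name(value: str) -> bool:
    """True if the value is non-empty and contains no whitespace."""
    return bool(value) and not any(ch.isspace() for ch in value)


def to_superset_name(value: str) -> str:
    """Join whitespace-separated parts with underscores: 'Anna Maria' -> 'Anna_Maria'."""
    return "_".join(value.split())


if __name__ == "__main__":
    # Hypothetical example names.
    for raw in ["Anna Maria", "  van der Berg ", "Zoe"]:
        name = to_superset_name(raw)
        print(repr(raw), "->", name, is_valid_superset_name(name))
```

Using `str.split()` without arguments collapses runs of spaces and trims leading and trailing whitespace, so stray spaces do not turn into extra underscores.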