All-in-one web-based IDE for machine learning and data science.
The ML workspace is an all-in-one web-based IDE specialized for machine learning and data science. It is simple to deploy and gets you started within minutes to productively build ML solutions on your own machines. This workspace is the ultimate tool for developers, preloaded with a variety of popular data science libraries (e.g., TensorFlow, PyTorch, Keras, Sklearn) and dev tools (e.g., Jupyter, VS Code, Tensorboard) perfectly configured, optimized, and integrated.
The workspace requires Docker to be installed on your machine (📖 Installation Guide).
Deploying a single workspace instance is as simple as:
docker run -p 8080:8080 mltooling/ml-workspace:0.13.2
Voilà, that was easy! Now, Docker will pull the latest workspace image to your machine. This may take a few minutes, depending on your internet speed. Once the workspace is started, you can access it via http://localhost:8080.
If started on another machine or with a different port, make sure to use the machine's IP/DNS and/or the exposed port.
To deploy a single instance for production use, we recommend applying at least the following options:
docker run -d \
-p 8080:8080 \
--name "ml-workspace" \
-v "${PWD}:/workspace" \
--env AUTHENTICATE_VIA_JUPYTER="mytoken" \
--shm-size 512m \
--restart always \
mltooling/ml-workspace:0.13.2
This command runs the container in background (`-d`), mounts your current working directory into the `/workspace` folder (`-v`), secures the workspace via a provided token (`--env AUTHENTICATE_VIA_JUPYTER`), provides 512MB of shared memory (`--shm-size`) to prevent unexpected crashes (see known issues section), and keeps the container running even on system restarts (`--restart always`). You can find additional options for docker run here and workspace configuration options in the section below.
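The options above can also be assembled programmatically, for example when launching workspaces from a script. The following is a minimal sketch and not part of the workspace tooling: the helper name `build_run_command`, its parameters, and the `/data/project` path are hypothetical, and the helper only composes the `docker run` flags already shown above.

```python
import shlex


def build_run_command(image, port=8080, name="ml-workspace",
                      mount_dir=None, token=None, shm_size="512m"):
    """Compose a `docker run` argument list from the options described above.

    Hypothetical helper for illustration; it only builds the command and
    does not start a container.
    """
    cmd = ["docker", "run", "-d", "-p", f"{port}:8080", "--name", name]
    if mount_dir is not None:
        # Persist data by mounting a host directory into /workspace
        cmd += ["-v", f"{mount_dir}:/workspace"]
    if token is not None:
        # Secure all tools with a token
        cmd += ["--env", f"AUTHENTICATE_VIA_JUPYTER={token}"]
    cmd += ["--shm-size", shm_size, "--restart", "always", image]
    return cmd


command = build_run_command(
    "mltooling/ml-workspace:0.13.2", mount_dir="/data/project", token="mytoken"
)
# Print a shell-safe version of the command for inspection
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where Docker is installed.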
The workspace provides a variety of configuration options that can be used by setting environment variables (via docker run option: `--env`).
Variable | Description | Default |
---|---|---|
WORKSPACE_BASE_URL | The base URL under which Jupyter and all other tools will be reachable. | / |
WORKSPACE_SSL_ENABLED | Enable or disable SSL. When set to `true`, either a certificate (`cert.crt`) and key (`cert.key`) must be mounted to `/resources/ssl` or, if not, the container generates a self-signed certificate. | false |
WORKSPACE_AUTH_USER | Basic auth user name. To enable basic auth, both the user and password need to be set. We recommend using `AUTHENTICATE_VIA_JUPYTER` for securing the workspace. | |
WORKSPACE_AUTH_PASSWORD | Basic auth user password. To enable basic auth, both the user and password need to be set. We recommend using `AUTHENTICATE_VIA_JUPYTER` for securing the workspace. | |
WORKSPACE_PORT | Configures the main container-internal port of the workspace proxy. For most scenarios, this configuration should not be changed, and the port configuration via Docker should be used instead if the workspace should be accessible from a different port. | 8080 |
CONFIG_BACKUP_ENABLED | Automatically back up and restore user configuration to the persisted `/workspace` folder, such as the `.ssh`, `.jupyter`, or `.gitconfig` from the user's home directory. | true |
SHARED_LINKS_ENABLED | Enable or disable the capability to share resources via external links. This is used to enable file sharing, access to workspace-internal ports, and easy command-based SSH setup. All shared links are protected via a token. However, there are certain risks since the token cannot be easily invalidated after sharing and does not expire. | true |
INCLUDE_TUTORIALS | If `true`, a selection of tutorial and introduction notebooks is added to the `/workspace` folder at container startup, but only if the folder is empty. | true |
MAX_NUM_THREADS | The number of threads used for computations when using various common libraries (MKL, OPENBLAS, OMP, NUMBA, ...). You can also use `auto` to let the workspace dynamically determine the number of threads based on available CPU resources. This configuration can be overwritten by the user from within the workspace. Generally, it is good to set it at or below the number of CPUs available to the workspace. | auto |
Jupyter Configuration: | | |
SHUTDOWN_INACTIVE_KERNELS | Automatically shut down inactive kernels after a given timeout (to clean up memory or GPU resources). The value can be either a timeout in seconds or `true`, which uses a default timeout of 48h. | false |
AUTHENTICATE_VIA_JUPYTER | If `true`, all HTTP requests will be authenticated against the Jupyter server, meaning that the authentication method configured with Jupyter will be used for all other tools as well. This can be deactivated with `false`. Any other value will activate this authentication and is applied as the token via the `NotebookApp.token` configuration of Jupyter. | false |
NOTEBOOK_ARGS | Add and overwrite Jupyter configuration options via command line args. Refer to this overview for all options. | |
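As noted for `MAX_NUM_THREADS`, the thread limit can be overwritten from within the workspace. The following is a minimal sketch of one way to do this from a notebook or script, assuming the numerical libraries honor their usual environment variables (`OMP_NUM_THREADS`, `MKL_NUM_THREADS`, `OPENBLAS_NUM_THREADS`, `NUMBA_NUM_THREADS`). The helper `limit_threads` is hypothetical and does not describe how the workspace applies the setting internally.

```python
import os

# Environment variables commonly read by the numerical libraries at import time
THREAD_VARIABLES = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "NUMBA_NUM_THREADS",
)


def limit_threads(requested):
    """Cap the thread count at the CPU count reported by os.cpu_count()."""
    available = os.cpu_count() or 1
    threads = max(1, min(int(requested), available))
    for variable in THREAD_VARIABLES:
        os.environ[variable] = str(threads)
    return threads


# Call this before importing numpy, numba, etc. so the limit can take effect
applied = limit_threads(4)
print(f"Using {applied} thread(s)")
```

Setting the variables before the first import matters because most of these libraries read them only once at startup.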
To persist the data, you need to mount a volume into `/workspace` (via docker run option: `-v`).
We strongly recommend enabling authentication via one of the following two options. For both options, the user will be required to authenticate for accessing any of the pre-installed tools.
The authentication only works for all tools accessed through the main workspace port (default: `8080`). This works for all preinstalled tools and the Access Ports feature. If you expose another port of the container, please make sure to secure it with authentication as well!
docker run -p 8080:8080 --env AUTHENTICATE_VIA_JUPYTER="mytoken" mltooling/ml-workspace:0.13.2
You can also use basic authentication via the `WORKSPACE_AUTH_USER` and `WORKSPACE_AUTH_PASSWORD` variables:
docker run -p 8080:8080 --env WORKSPACE_AUTH_USER="user" --env WORKSPACE_AUTH_PASSWORD="pwd" mltooling/ml-workspace:0.13.2
The basic authentication is configured via the nginx proxy and might be more performant compared to the other option since with `AUTHENTICATE_VIA_JUPYTER` every request to any tool in the workspace will check via the Jupyter instance if the user (based on the request cookies) is authenticated.
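When calling the workspace from a script rather than a browser, the credentials have to be sent with each request. The following is a minimal sketch of both options. It assumes that the token is accepted in the `Authorization: token ...` header form used by the Jupyter server and that basic auth follows the standard HTTP Basic scheme; the helper `build_request` and the example URL are illustrative only.

```python
import base64
import urllib.request


def build_request(url, token=None, user=None, password=None):
    """Create a request carrying either a Jupyter token or basic auth credentials."""
    request = urllib.request.Request(url)
    if token is not None:
        # Header form used by the Jupyter server for token authentication
        request.add_header("Authorization", f"token {token}")
    elif user is not None and password is not None:
        credentials = base64.b64encode(f"{user}:{password}".encode("utf-8"))
        request.add_header("Authorization", f"Basic {credentials.decode('ascii')}")
    return request


token_request = build_request("http://localhost:8080/tree", token="mytoken")
basic_request = build_request("http://localhost:8080/tree", user="user", password="pwd")
# Sending is left out here; against a running workspace you could call
# urllib.request.urlopen(token_request)
```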
We recommend enabling SSL so that the workspace is accessible via HTTPS (encrypted communication). SSL encryption can be activated via the `WORKSPACE_SSL_ENABLED` variable.
docker run \
-p 8080:8080 \
--env WORKSPACE_SSL_ENABLED="true" \
-v /path/with/certificate/files:/resources/ssl:ro \
mltooling/ml-workspace:0.13.2
If you want to host the workspace on a public domain, we recommend using [Let's Encrypt](https://letsencrypt.org/getting-started/) to get a trusted certificate for your domain. To use the generated certificate (e.g., via the [certbot](https://certbot.eff.org/) tool) for the workspace, the `privkey.pem` corresponds to the `cert.key` file and the `fullchain.pem` to the `cert.crt` file.
> _When you enable SSL support, you must access the workspace over `https://`, not over plain `http://`._
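The file name mapping described above can be applied with a few lines of Python. This is a minimal sketch under stated assumptions: the helper `prepare_ssl_dir` is hypothetical, and both directory paths are supplied by you (certbot typically places its files in a per-domain folder).

```python
import shutil
from pathlib import Path

# Mapping from certbot output names to the names expected in /resources/ssl
CERTBOT_TO_WORKSPACE = {"fullchain.pem": "cert.crt", "privkey.pem": "cert.key"}


def prepare_ssl_dir(certbot_dir, target_dir):
    """Copy certbot files into target_dir under the names the workspace expects."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for source_name, target_name in CERTBOT_TO_WORKSPACE.items():
        destination = target / target_name
        # copyfile follows symlinks, so linked certificate files are copied by content
        shutil.copyfile(Path(certbot_dir) / source_name, destination)
        copied.append(destination)
    return copied
```

The target directory can then be mounted read-only into `/resources/ssl` as in the `docker run` command above.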
By default, the workspace container has no resource constraints and can use as much of a given resource as the host’s kernel scheduler allows. Docker provides ways to control how much memory or CPU a container can use by setting runtime configuration flags of the docker run command.
The workspace requires at least 2 CPUs and 500MB of memory to run stably and be usable.
docker run -p 8080:8080 --cpus=8 --memory=16g --shm-size=1G mltooling/ml-workspace:0.13.2
> 📖 _For more options and documentation on resource constraints, please refer to the [official docker guide](https://docs.docker.com/config/containers/resource_constraints/)._
If a proxy is required, you can pass the proxy configuration via the `HTTP_PROXY`, `HTTPS_PROXY`, and `NO_PROXY` environment variables.
In addition to the main workspace image (`mltooling/ml-workspace`), we provide other image flavors that extend the features or minimize the image size to support a variety of use cases.
docker run -p 8080:8080 mltooling/ml-workspace-minimal:0.13.2
docker run -p 8080:8080 mltooling/ml-workspace-r:0.12.1
docker run -p 8080:8080 mltooling/ml-workspace-spark:0.12.1
docker run -p 8080:8080 --gpus all mltooling/ml-workspace-gpu:0.13.2
- (Docker < 19.03) Nvidia Docker 2.0 ([📖 Instructions](https://github.com/NVIDIA/nvidia-docker/wiki/Installation-(version-2.0))).
docker run -p 8080:8080 --runtime nvidia --env NVIDIA_VISIBLE_DEVICES="all" mltooling/ml-workspace-gpu:0.13.2
The GPU flavor also comes with a few additional configuration options, as explained below:
Variable | Description | Default |
---|---|---|
NVIDIA_VISIBLE_DEVICES | Controls which GPUs will be accessible inside the workspace. By default, all GPUs from the host are accessible within the workspace. You can either use `all`, `none`, or specify a comma-separated list of device IDs (e.g., `0,1`). You can find out the list of available device IDs by running `nvidia-smi` on the host machine. | all |
CUDA_VISIBLE_DEVICES | Controls which GPUs CUDA applications running inside the workspace will see. By default, all GPUs that the workspace has access to will be visible. To restrict applications, provide a comma-separated list of internal device IDs (e.g., `0,2`) based on the available devices within the workspace (run `nvidia-smi`). In comparison to `NVIDIA_VISIBLE_DEVICES`, the workspace user will be still able to access other GPUs by overwriting this configuration from within the workspace. | |
TF_FORCE_GPU_ALLOW_GROWTH | By default, the majority of GPU memory will be allocated by the first execution of a TensorFlow graph. While this behavior can be desirable for production pipelines, it is less desirable for interactive use. Use `true` to enable dynamic GPU Memory allocation or `false` to instruct TensorFlow to allocate all memory at execution. | true |
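Since `CUDA_VISIBLE_DEVICES` can be overwritten from within the workspace, a notebook can restrict itself to specific GPUs. The following is a minimal sketch, assuming the variables are set before any CUDA-based framework is imported in the same process; the helper name `select_gpus` is hypothetical.

```python
import os


def select_gpus(device_ids, allow_growth=True):
    """Restrict CUDA applications in this process to the given device IDs."""
    # Comma-separated list of workspace-internal device IDs, e.g. "0,2"
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    # Ask TensorFlow to allocate GPU memory on demand instead of all at once
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true" if allow_growth else "false"
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Run this before importing TensorFlow, PyTorch, or other CUDA-based libraries
visible = select_gpus([0, 2])
print(visible)
```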
The workspace is designed as a single-user development environment. For a multi-user setup, we recommend deploying 🧰 ML Hub. ML Hub is based on JupyterHub with the task to spawn, manage, and proxy workspace instances for multiple users.
docker run -p 8080:8080 -v /var/run/docker.sock:/var/run/docker.sock mltooling/ml-hub:latest
For more information and documentation about ML Hub, please take a look at the [GitHub site](https://github.com/ml-tooling/ml-hub).
This project is maintained by Benjamin Räthlein, Lukas Masuch, and Jan Kalkan. Please understand that we won't be able to provide individual support via email. We also believe that help is much more valuable if it's shared publicly so that more people can benefit from it.
Type | Channel |
---|---|
🚨 Bug Reports | |
🎁 Feature Requests | |
👩‍💻 Usage Questions | |
📢 Announcements | |
❓ Other Requests | |
Jupyter • Desktop GUI • VS Code • JupyterLab • Git Integration • File Sharing • Access Ports • Tensorboard • Extensibility • Hardware Monitoring • SSH Access • Remote Development • Job Execution
The workspace is equipped with a selection of best-in-class open-source development tools to help with the machine learning workflow. Many of these tools can be started from the `Open Tool` menu from Jupyter (the main application of the workspace):
Within your workspace you have full root & sudo privileges to install any library or tool you need via terminal (e.g., `pip`, `apt-get`, `conda`, or `npm`). You can find more ways to extend the workspace within the Extensibility section.
Jupyter Notebook is a web-based interactive environment for writing and running code. The main building blocks of Jupyter are the file-browser, the notebook editor, and kernels. The file-browser provides an interactive file manager for all notebooks, files, and folders in the `/workspace` directory.
A new notebook can be created by clicking on the `New` drop-down button at the top of the list and selecting the desired language kernel.
You can spawn interactive terminal instances as well by selecting `New -> Terminal` in the file-browser.
The notebook editor enables users to author documents that include live code, markdown text, shell commands, LaTeX equations, interactive widgets, plots, and images. These notebook documents provide a complete and self-contained record of a computation that can be converted to various formats and shared with others.
This workspace has a variety of third-party Jupyter extensions activated. You can configure these extensions in the nbextensions configurator: `nbextensions` tab on the file browser.
The Notebook allows code to be run in a range of different programming languages. For each notebook document that a user opens, the web application starts a kernel that runs the code for that notebook and returns output. This workspace has a Python 3 kernel pre-installed. Additional Kernels can be installed to get access to other languages (e.g., R, Scala, Go) or additional computing resources (e.g., GPUs, CPUs, Memory).
Python 2 is deprecated and we do not recommend using it. However, you can still install a Python 2.7 kernel via this command:
/bin/bash /resources/tools/python-27.sh
This workspace provides HTTP-based VNC access to the workspace via noVNC. Thereby, you can access and work within the workspace with a fully-featured desktop GUI. To access this desktop GUI, go to `Open Tool`, select `VNC`, and click the `Connect` button. In case you are asked for a password, use `vncpassword`.
Once you are connected, you will see a desktop GUI that allows you to install and use full-fledged web browsers or any other tool that is available for Ubuntu. Within the `Tools` folder on the desktop, you will find a collection of install scripts that make it straightforward to install some of the most commonly used development tools, such as Atom, PyCharm, R-Runtime, R-Studio, or Postman (just double-click on the script).
Clipboard: If you want to share the clipboard between your machine and the workspace, you can use the copy-paste functionality as described below:
💡 Long-running tasks: Use the desktop GUI for long-running Jupyter executions. By running notebooks from the browser of your workspace desktop GUI, all output will be synchronized to the notebook even if you have disconnected your browser from the notebook.
Visual Studio Code (`Open Tool -> VS Code`) is an open-source lightweight but powerful code editor with built-in support for a variety of languages and a rich ecosystem of extensions. It combines the simplicity of a source code editor with powerful developer tooling, like IntelliSense code completion and debugging. The workspace integrates VS Code as a web-based application accessible through the browser, based on the awesome code-server project. It allows you to customize every feature to your liking and install any number of third-party extensions.
The workspace also provides a VS Code integration into Jupyter allowing you to open a VS Code instance for any selected folder, as shown below:
JupyterLab (`Open Tool -> JupyterLab`) is the next-generation user interface for Project Jupyter. It offers all the familiar building blocks of the classic Jupyter Notebook (notebook, terminal, text editor, file browser, rich outputs, etc.) in a flexible and powerful user interface. This JupyterLab instance comes pre-installed with a few helpful extensions such as jupyterlab-toc, jupyterlab-git, and jupyterlab-tensorboard.
Version control is a crucial aspect of productive collaboration. To make this process as smooth as possible, we have integrated a custom-made Jupyter extension specialized in pushing single notebooks, a full-fledged web-based Git client (ungit), a tool to open and edit plain text documents (e.g., `.py`, `.md`) as notebooks (jupytext), as well as a notebook merging tool (nbdime). Additionally, JupyterLab and VS Code also provide GUI-based Git clients.
For cloning repositories via `https`, we recommend navigating to the desired root folder and clicking on the `git` button as shown below:
This might ask for some required settings and, subsequently, opens ungit, a web-based Git client with a clean and intuitive UI that makes it convenient to sync your code artifacts. Within ungit, you can clone any repository. If authentication is required, you will get asked for your credentials.
To commit and push a single notebook to a remote Git repository, we recommend using the Git plugin integrated into Jupyter, as shown below:
For more advanced Git operations, we recommend using ungit. With ungit, you can do most of the common git actions such as push, pull, merge, branch, tag, checkout, and many more.
Jupyter notebooks are great, but they are often huge files with a very specific JSON file format. To enable seamless diffing and merging via Git, this workspace is pre-installed with nbdime. Nbdime understands the structure of notebook documents and, therefore, automatically makes intelligent decisions when diffing and merging notebooks. In case you have merge conflicts, nbdime will make sure that the notebook is still readable by Jupyter, as shown below:
Furthermore, the workspace comes pre-installed with jupytext, a Jupyter plugin that reads and writes notebooks as plain text files. This allows you to open, edit, and run scripts or markdown files (e.g., `.py`, `.md`) as notebooks within Jupyter. In the following screenshot, we have opened a markdown file via Jupyter:
In combination with Git, jupytext enables a clear diff history and easy merging of version conflicts. With both of those tools, collaborating on Jupyter notebooks with Git becomes straightforward.
The workspace has a feature to share any file or folder with anyone via a token-protected link. To share data via a link, select any file or folder from the Jupyter directory tree and click on the share button as shown in the following screenshot:
This will generate a unique link protected via a token that gives anyone with the link access to view and download the selected data via the Filebrowser UI:
To deactivate or manage (e.g., provide edit permissions) shared links, open the Filebrowser via `Open Tool -> Filebrowser` and select `Settings->User Management`.
It is possible to securely access any workspace-internal port by selecting `Open Tool -> Access Port`. With this feature, you can access a REST API or web application running inside the workspace directly with your browser. The feature enables developers to build, run, test, and debug REST APIs or web applications directly from the workspace.
If you want to use an HTTP client or share access to a given port, you can select the `Get shareable link` option. This generates a token-secured link that anyone with access to the link can use to access the specified port.
The HTTP app must either be resolvable from a relative URL path or be configured with a base path (`/tools/PORT/`). Tools made accessible this way are secured by the workspace's authentication system! If you decide to publish any other port of the container yourself instead of using this feature to make a tool accessible, please make sure to secure it via an authentication mechanism!
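To illustrate the base path requirement, here is a minimal sketch of a web app that works when served below `/tools/PORT/`. It is an assumption-laden example, not workspace code: the port `8765`, the helper `strip_base_path`, and the `Handler` class are hypothetical, and because it is not stated here whether the proxy forwards the prefix to the app, the helper accepts paths with and without it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PORT = 8765  # hypothetical port chosen for this example
BASE_PATH = f"/tools/{PORT}/"


def strip_base_path(path, base_path=BASE_PATH):
    """Return the app-internal path, whether or not the base path prefix is present."""
    if path.startswith(base_path):
        return "/" + path[len(base_path):]
    return path


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        route = strip_base_path(self.path)
        # Use a relative link (no leading slash) so it resolves below the base path
        body = f'<p>Route: {route}</p><a href="status">status</a>'.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve():
    """Start the example server; call this from a workspace terminal or notebook."""
    HTTPServer(("0.0.0.0", PORT), Handler).serve_forever()
```

The key design choice is the relative link: an absolute link such as `/status` would point outside the base path and bypass the app when opened through the workspace.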
SSH provides a powerful set of features that enables you to be more productive with your development tasks. You can easily set up a secure and passwordless SSH connection to a workspace by selecting `Open Tool -> SSH`. This will generate a secure setup command that can be run on any Linux or Mac machine to configure a passwordless & secure SSH connection to the workspace. Alternatively, you can also download the setup script and run it (instead of using the command).
The setup script only runs on Mac and Linux. Windows is currently not supported.
Just run the setup command or script on the machine from which you want to set up a connection to the workspace and input a name for the connection (e.g., `my-workspace`). You might also be asked for some additional input during the process, e.g., to install a remote kernel if `remote_ikernel` is installed. Once the passwordless SSH connection is successfully set up and tested, you can securely connect to the workspace by simply executing `ssh my-workspace`.
Besides the ability to execute commands on a remote machine, SSH also provides a variety of other features that can improve your development workflow as described in the following sections.
ssh -nNT -L 5000:localhost:5901 my-workspace
> _To expose an application port from your local machine to a workspace, use the `-R` option (instead of `-L`)._
After the tunnel is established, you can use your favorite VNC viewer on your local machine and connect to `vnc://localhost:5000` (default password: `vncpassword`). To make the tunnel connection more resilient and reliable, we recommend using [autossh](https://www.harding.motd.ca/autossh/) to automatically restart SSH tunnels in case the connection dies:
autossh -M 0 -f -nNT -L 5000:localhost:5901 my-workspace
Port tunneling is quite useful when you have started any server-based tool within the workspace that you would like to make accessible to another machine. In its default setting, the workspace has a variety of tools already running on different ports, such as:
- `8080`: Main workspace port with access to all integrated tools.
- `8090`: Jupyter server.
- `8054`: VS Code server.
- `5901`: VNC server.
- `22`: SSH server.
You can find port information on all the tools in the [supervisor configuration](https://github.com/ml-tooling/ml-workspace/blob/main/resources/supervisor/supervisord.conf).
> 📖 _For more information about port tunneling/forwarding, we recommend [this guide](https://www.everythingcli.org/ssh-tunnelling-for-fun-and-profit-local-vs-remote/)._
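To verify from a script that a tunnel is actually up, you can test whether its local port accepts connections. The following is a minimal sketch; the helper `is_port_open` is hypothetical, and port `5000` is the local end of the VNC tunnel from the example command above.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Check the local end of the VNC tunnel
if is_port_open("localhost", 5000):
    print("Tunnel is up")
else:
    print("Tunnel is not reachable")
```

A successful connection only shows that something is listening on the port; it does not confirm which service answers.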
scp ./local-file.txt my-workspace:/workspace
To copy the `/workspace` directory from `my-workspace` to the working directory of the local machine, execute:
scp -r my-workspace:/workspace .
> 📖 _For more information about scp, we recommend [this guide](https://www.garron.me/en/articles/scp.html)._
rsync -rlptzvP --delete --exclude=".git" "./local-project-folder/" "my-workspace:/workspace/remote-project-folder/"
If you have some changes inside the folder on the workspace, you can sync those changes back to the local folder by changing the source and destination arguments:
rsync -rlptzvP --delete --exclude=".git" "my-workspace:/workspace/remote-project-folder/" "./local-project-folder/"
You can rerun these commands each time you want to synchronize the latest copy of your files. Rsync will make sure that only updates will be transferred.
> 📖 _You can find more information about rsync on [this man page](https://linux.die.net/man/1/rsync)._
sshfs -o reconnect my-workspace:/workspace /local/folder/path
Once the remote directory is mounted, you can interact with the remote file system the same way as with any local directory and file.
> 📖 _For more information about sshfs, we recommend [this guide](https://www.digitalocean.com/community/tutorials/how-to-use-sshfs-to-mount-remote-file-systems-over-ssh)._
The workspace can be integrated and used as a remote runtime (also known as remote kernel/machine/interpreter) for a variety of popular development tools and IDEs, such as Jupyter, VS Code, PyCharm, Colab, or Atom Hydrogen. Thereby, you can connect your favorite development tool running on your local machine to a remote machine for code execution. This enables a local-quality development experience with remote-hosted compute resources.
These integrations usually require a passwordless SSH connection from the local machine to the workspace. To set up an SSH connection, please follow the steps explained in the SSH Access section.
# Change my-workspace with the name of a workspace SSH connection
remote_ikernel manage --add \
--interface=ssh \
--kernel_cmd="ipython kernel -f {connection_file}" \
--name="ml-server (Python)" \
--host="my-workspace"
You can use the remote_ikernel command line functionality to list (`remote_ikernel manage --show`) or delete (`remote_ikernel manage --delete`) remote kernels.
Tensorboard provides a suite of visualization tools to make it easier to understand, debug, and optimize your experiment runs. It includes logging features for scalar, histogram, model structure, embeddings, and text & image visualization. The workspace comes pre-installed with the jupyter_tensorboard extension that integrates Tensorboard into the Jupyter interface with functionalities to start, manage, and stop instances. You can open a new instance for a valid logs directory, as shown below:
If you have opened a Tensorboard instance in a valid log directory, you will see the visualizations of your logged data:
Tensorboard can be used in combination with many other ML frameworks besides TensorFlow. By using the tensorboardX library, you can log from basically any Python-based library. Also, PyTorch has a direct Tensorboard integration as described here.
If you prefer to see Tensorboard directly within your notebook, you can use the following Jupyter magic:
%load_ext tensorboard
%tensorboard --logdir /workspace/path/to/logs
The workspace provides two pre-installed web-based tools to help developers during model training and other experimentation tasks to get insights into everything happening on the system and figure out performance bottlenecks.
Netdata (`Open Tool -> Netdata`) is a real-time hardware and performance monitoring dashboard that visualizes the processes and services on your Linux systems. It monitors metrics about CPU, GPU, memory, disks, networks, processes, and more.
Glances (`Open Tool -> Glances`) is a web-based hardware monitoring dashboard as well and can be used as an alternative to Netdata.
Netdata and Glances will show you the hardware statistics for the entire machine on which the workspace container is running.
A job is defined as any computational task that runs for a certain time to completion, such as a model training or a data pipeline.
The workspace image can also be used to execute arbitrary Python code without starting any of the pre-installed tools. This provides a seamless way to productize your ML projects since the code that has been developed interactively within the workspace will have the same environment and configuration when run as a job via the same workspace image.
docker run --env EXECUTE_CODE="git+https://github.com/ml-tooling/ml-workspace.git#subdirectory=resources/tests/ml-job" mltooling/ml-workspace:0.13.2
> 📖 _For additional information on how to specify branches, commits, or tags please refer to [this guide](https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support)._
#### Run code mounted into the workspace
In the following example, we mount and execute the current working directory (expected to contain our code) into the `/workspace/ml-job/` directory of the workspace:
docker run -v "${PWD}:/workspace/ml-job/" --env EXECUTE_CODE="/workspace/ml-job/" mltooling/ml-workspace:0.13.2
#### Install Dependencies
In the case that the pre-installed workspace libraries are not compatible with your code, you can install or change dependencies by just adding one or multiple of the following files to your code directory:
- `requirements.txt`: [pip requirements format](https://pip.pypa.io/en/stable/user_guide/#requirements-files) for pip-installable dependencies.
- `environment.yml`: [conda environment file](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html?highlight=environment.yml#creating-an-environment-file-manually) to create a separate Python environment.
- `setup.sh`: A shell script executed via `/bin/bash`.
The execution order is 1. `environment.yml` -> 2. `setup.sh` -> 3. `requirements.txt`
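The execution order above can be previewed for a given code directory before running a job. This is a minimal sketch that only mirrors the stated order; the helper `planned_install_steps` is hypothetical and is not the logic of the workspace's own job runner.

```python
from pathlib import Path

# Dependency files in the execution order stated above
INSTALL_ORDER = ("environment.yml", "setup.sh", "requirements.txt")


def planned_install_steps(code_dir):
    """List the dependency files found in code_dir, in the order they would run."""
    directory = Path(code_dir)
    return [name for name in INSTALL_ORDER if (directory / name).is_file()]


# Show which dependency files the current directory would contribute
print(planned_install_steps("."))
```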
#### Test job in interactive mode
You can test your job code within the workspace (started normally with interactive tools) by executing the following python script:
python /resources/scripts/execute_code.py /path/to/your/job
#### Build a custom job image
It is also possible to embed your code directly into a custom job image, as shown below:
FROM mltooling/ml-workspace:0.13.2
# Add job code to image
COPY ml-job /workspace/ml-job
ENV EXECUTE_CODE=/workspace/ml-job
# Install requirements only
RUN python /resources/scripts/execute_code.py --requirements-only
# Execute only the code at container startup
CMD ["python", "/resources/docker-entrypoint.py", "--code-only"]
The workspace is pre-installed with many popular interpreters, data science libraries, and Ubuntu packages: `conda`, `pip`, `apt-get`, `npm`, `yarn`, `sdk`, `poetry`, `gdebi`... The full list of installed tools can be found within the Dockerfile.
For every minor version release, we run vulnerability, virus, and security checks within the workspace using safety, clamav, trivy, and snyk via docker scan to make sure that the workspace environment is as secure as possible. We are committed to fixing and preventing all high- or critical-severity vulnerabilities. You can find some up-to-date reports here.
The workspace provides a high degree of extensibility. Within the workspace, you have full root & sudo privileges to install any library or tool you need via terminal (e.g., `pip`, `apt-get`, `conda`, or `npm`). You can open a terminal in one of the following ways:
New -> Terminal
Applications -> Terminal Emulator
File -> New -> Terminal
Terminal -> New Terminal
Additionally, pre-installed tools such as Jupyter, JupyterLab, and Visual Studio Code each provide their own rich ecosystem of extensions. The workspace also contains a collection of installer scripts for many commonly used development tools or libraries (e.g., `PyCharm`, `Zeppelin`, `RStudio`, `Starspace`). You can find and execute all tool installers via `Open Tool -> Install Tool`. Those scripts can also be executed from the Desktop VNC (double-click on the script within the `Tools` folder on the Desktop VNC).
/resources/tools/zeppelin.sh --port=1234
After installation, refresh the Jupyter website and the Zeppelin tool will be available under `Open Tool -> Zeppelin`. Other tools might only be available within the Desktop VNC (e.g., `atom` or `pycharm`) or do not provide any UI (e.g., `starspace`, `docker-client`).
As an alternative to extending the workspace at runtime, you can also customize the workspace Docker image to create your own flavor as explained in the FAQ section.
# Extend from any of the workspace versions/flavors
FROM mltooling/ml-workspace:0.13.2
# Run your customizations, e.g.
RUN \
# Install r-runtime, r-kernel, and r-studio web server from provided install scripts
/bin/bash $RESOURCES_PATH/tools/r-runtime.sh --install && \
/bin/bash $RESOURCES_PATH/tools/r-studio-server.sh --install && \
# Cleanup Layer - removes unnecessary cache files
clean-layer.sh
Finally, use [docker build](https://docs.docker.com/engine/reference/commandline/build/) to build your customized Docker image.
> 📖 _For a more comprehensive Dockerfile example, take a look at the [Dockerfile of the R-flavor](https://github.com/ml-tooling/ml-workspace/blob/main/r-flavor/Dockerfile)._
docker run -d \
-p 8080:8080 \
--name "ml-workspace" \
-v "/path/on/host:/workspace" \
--env AUTHENTICATE_VIA_JUPYTER="mytoken" \
--restart always \
mltooling/ml-workspace:0.8.7
and needs to be updated to version `0.9.1`, you need to:
1. Stop and remove the running workspace container: `docker stop "ml-workspace" && docker rm "ml-workspace"`
2. Start a new workspace container with the newer image and same configuration: `docker run -d -p 8080:8080 --name "ml-workspace" -v "/path/on/host:/workspace" --env AUTHENTICATE_VIA_JUPYTER="mytoken" --restart always mltooling/ml-workspace:0.9.1`
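The two upgrade steps can be scripted so that the same configuration is reused with the new image tag. The following is a minimal sketch: the helper `upgrade_commands` is hypothetical, it only builds the command lists, and the option values are taken from the example above.

```python
def upgrade_commands(name, image, new_version, run_options):
    """Return the stop/remove/start commands for moving a container to a new image tag."""
    return [
        ["docker", "stop", name],
        ["docker", "rm", name],
        ["docker", "run", "-d", "--name", name, *run_options, f"{image}:{new_version}"],
    ]


steps = upgrade_commands(
    name="ml-workspace",
    image="mltooling/ml-workspace",
    new_version="0.9.1",
    run_options=[
        "-p", "8080:8080",
        "-v", "/path/on/host:/workspace",
        "--env", "AUTHENTICATE_VIA_JUPYTER=mytoken",
        "--restart", "always",
    ],
)
for step in steps:
    print(" ".join(step))
```

Each list can be executed in order with `subprocess.run(step, check=True)`; data in the mounted `/workspace` folder lives on the host and is not affected by removing the container.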
Variable | Description | Default |
---|---|---|
VNC_PW | Password of VNC connection. This password only needs to be secure if the VNC server is directly exposed. If it is used via noVNC, it is already protected based on the configured authentication mechanism. | vncpassword |
VNC_RESOLUTION | Default desktop resolution of VNC connection. When using noVNC, the resolution will be dynamically adapted to the window size. | 1600x900 |
VNC_COL_DEPTH | Default color depth of VNC connection. | 24 |
# Create environment in the working directory
python -m venv my-venv
# Activate environment in shell
source ./my-venv/bin/activate
# Optional: Create Jupyter kernel for this environment
pip install ipykernel
python -m ipykernel install --user --name=my-venv --display-name="my-venv ($(python --version))"
# Optional: Close environment session
deactivate
**pipenv** (recommended):
To create a virtual environment via [pipenv](https://pipenv.pypa.io/en/latest/), execute the following commands:
# Create environment in the working directory
pipenv install
# Activate environment session in shell
pipenv shell
# Optional: Create Jupyter kernel for this environment
pipenv install ipykernel
python -m ipykernel install --user --name=my-pipenv --display-name="my-pipenv ($(python --version))"
# Optional: Close environment session
exit
**virtualenv**:
To create a virtual environment via [virtualenv](https://virtualenv.pypa.io/en/latest/), execute the following commands:
# Create environment in the working directory
virtualenv my-virtualenv
# Activate environment session in shell
source ./my-virtualenv/bin/activate
# Optional: Create Jupyter kernel for this environment
pip install ipykernel
python -m ipykernel install --user --name=my-virtualenv --display-name="my-virtualenv ($(python --version))"
# Optional: Close environment session
deactivate
**conda**:
To create a virtual environment via [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html), execute the following commands:
# Create environment (globally)
conda create -n my-conda-env
# Activate environment session in shell
conda activate my-conda-env
# Optional: Create Jupyter kernel for this environment
python -m ipykernel install --user --name=my-conda-env --display-name="my-conda-env ($(python --version))"
# Optional: Close environment session
conda deactivate
**Tip: Shell Commands in Jupyter Notebooks:**
If you install and use a virtual environment via a dedicated Jupyter Kernel and use shell commands within Jupyter (e.g. `!pip install matplotlib`), the wrong python/pip version will be used. To use the python/pip version of the selected kernel, do the following instead:
import sys
!{sys.executable} -m pip install matplotlib
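The same idea works outside of notebook cells, e.g. in a script run by the kernel's interpreter. A minimal sketch, in which the helper name `pip_command` is illustrative only: the command is built around `sys.executable`, so pip always belongs to the interpreter that is currently running.

```python
import subprocess
import sys
from typing import List


def pip_command(*args: str) -> List[str]:
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    # Shows which interpreter (and therefore which environment) is targeted.
    print(sys.executable)
    # Equivalent to `!{sys.executable} -m pip --version` in a notebook cell.
    subprocess.run(pip_command("--version"), check=False)
```

Printing `sys.executable` first is a quick way to confirm that the path points into the environment of the selected kernel.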
# Install python version
pipenv install --python=3.7.8
# Activate environment session in shell
pipenv shell
# Check python installation
python --version
# Optional: Create Jupyter kernel for this environment
pipenv install ipykernel
python -m ipykernel install --user --name=my-pipenv --display-name="my-pipenv ($(python --version))"
# Optional: Close environment session
exit
**pyenv**:
To install a different python version (e.g. `3.7.8`) within the workspace via [pyenv](https://github.com/pyenv/pyenv), execute the following commands:
# Install python version
pyenv install 3.7.8
# Make globally accessible
pyenv global 3.7.8
# Activate python version in shell
pyenv shell 3.7.8
# Check python installation
python3.7 --version
# Optional: Create Jupyter kernel for this python version
python3.7 -m pip install ipykernel
python3.7 -m ipykernel install --user --name=my-pyenv-3.7.8 --display-name="my-pyenv (Python 3.7.8)"
**conda**:
To install a different python version (e.g. `3.7.8`) within the workspace via [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html), execute the following commands:
# Create environment with python version
conda create -n my-conda-3.7 python=3.7.8
# Activate environment session in shell
conda activate my-conda-3.7
# Check python installation
python --version
# Optional: Create Jupyter kernel for this python version
pip install ipykernel
python -m ipykernel install --user --name=my-conda-3.7 --display-name="my-conda ($(python --version))"
# Optional: Close environment session
conda deactivate
**Tip: Shell Commands in Jupyter Notebooks:**
If you install and use another Python version via a dedicated Jupyter Kernel and use shell commands within Jupyter (e.g. `!pip install matplotlib`), the wrong python/pip version will be used. To use the python/pip version of the selected kernel, do the following instead:
import sys
!{sys.executable} -m pip install matplotlib
docker run --shm-size=2G mltooling/ml-workspace:0.13.2
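Whether the `--shm-size` value took effect can be checked from inside the container. The following is a minimal sketch, assuming shared memory is mounted at `/dev/shm` (the usual location on Linux); the helper name `shared_memory_mib` is illustrative only.

```python
import os
import shutil


def shared_memory_mib(path: str = "/dev/shm") -> float:
    """Return the total size of the filesystem mounted at `path` in MiB."""
    return shutil.disk_usage(path).total / (1024 * 1024)


if __name__ == "__main__":
    if os.path.exists("/dev/shm"):
        print(f"/dev/shm size: {shared_memory_mib():.0f} MiB")
    else:
        print("/dev/shm not found on this system")
```

With the command above, the reported size should be roughly 2048 MiB. Docker's default without `--shm-size` is 64 MB.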
import os
# Number of threads the workspace is allowed to use
MAX_NUM_THREADS = int(os.getenv("MAX_NUM_THREADS"))
# Set in pytorch
import torch
torch.set_num_threads(MAX_NUM_THREADS)
# Set in tensorflow (TensorFlow 1.x API; in TensorFlow 2.x, use
# tf.config.threading.set_inter_op_parallelism_threads and
# tf.config.threading.set_intra_op_parallelism_threads instead)
import tensorflow as tf
config = tf.ConfigProto(
    device_count={"CPU": MAX_NUM_THREADS},
    inter_op_parallelism_threads=MAX_NUM_THREADS,
    intra_op_parallelism_threads=MAX_NUM_THREADS,
)
tf_session = tf.Session(config=config)
# Set session for keras
import keras.backend as K
K.set_session(tf_session)
# Set in sklearn estimator (X, y: your training data)
from sklearn.linear_model import LogisticRegression
LogisticRegression(n_jobs=MAX_NUM_THREADS).fit(X, y)
# Set for multiprocessing pool (func: the function to apply to each item of lst)
from multiprocessing import Pool
with Pool(MAX_NUM_THREADS) as pool:
    results = pool.map(func, lst)
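Note that `int(os.getenv("MAX_NUM_THREADS"))` raises a `TypeError` if the variable is unset and a `ValueError` if it holds a non-numeric value. A more defensive variant might look like the following sketch. The helper name `resolve_max_threads` is hypothetical, and falling back to the CPU count is an assumption of this example, not documented workspace behavior.

```python
import os


def resolve_max_threads(default: int = 1) -> int:
    """Read MAX_NUM_THREADS, falling back to the CPU count if unusable.

    Assumption: any unset, empty, or non-numeric value (e.g. "auto")
    should fall back to os.cpu_count().
    """
    raw = os.getenv("MAX_NUM_THREADS", "").strip()
    if raw.isdigit() and int(raw) > 0:
        return int(raw)
    # os.cpu_count() may return None on some platforms.
    return os.cpu_count() or default


if __name__ == "__main__":
    print(f"Using {resolve_max_threads()} thread(s)")
```

The returned value can be used in place of `MAX_NUM_THREADS` in the library calls above.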
exited: nginx (terminated by SIGILL (core dumped); not expected)
The OpenResty/Nginx binary package used within the workspace must run on a CPU with `SSE4.2` support (see [this issue](https://github.com/openresty/openresty/issues/267#issuecomment-309296900)). Some older CPUs lack `SSE4.2` support and therefore cannot run the workspace container. On Linux, you can check whether your CPU supports `SSE4.2` by looking for the `sse4_2` flag in the flags section of the `cat /proc/cpuinfo` output. If you encounter this problem, feel free to notify us by commenting on the following issue: [#30](https://github.com/ml-tooling/ml-workspace/issues/30).
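The `/proc/cpuinfo` check can be automated before starting the container. This is a minimal sketch for Linux hosts; the function names are illustrative, and the parser assumes the usual `flags : ...` line format, in which `SSE4.2` appears as `sse4_2`.

```python
from typing import Optional


def has_sse42(cpuinfo_text: str) -> bool:
    """Return True if the first `flags` line lists the sse4_2 flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return "sse4_2" in value.split()
    return False


def host_supports_sse42(path: str = "/proc/cpuinfo") -> Optional[bool]:
    """Check the host CPU; returns None if the file cannot be read."""
    try:
        with open(path) as f:
            return has_sse42(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    print(f"SSE4.2 supported: {host_supports_sse42()}")
```

A result of `None` means that the file is not available (e.g. on non-Linux hosts), so the check is inconclusive.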
Requirements: Docker and Act are required to be installed on your machine to execute the build process.
To simplify the process of building this project from scratch, we provide build-scripts - based on universal-build - that run all necessary steps (build, test, and release) within a containerized environment. To build and test your changes, execute the following command in the project root folder:
act -b -j build
Under the hood it uses the build.py files in this repo based on the universal-build library. So, if you want to build it locally, you can also execute this command in the project root folder to build the docker container:
python build.py --make
For additional script options:
python build.py --help
Refer to our contribution guides for more detailed information on our build scripts and development process.
Licensed Apache 2.0. Created and maintained with ❤️ by developers from Berlin.