Python Env for Scientific Experiment

3 minute read

Published:

As we usually need to work on various project, the environment for the experiment some times different from the local environment. Docker is one of the option, to run on the virtual machine. However, Conda or mamba is more like the standard for current academia.

Creating Conda Environments for Scientific Experiments: A Step-by-Step Guide

When working on data science or machine learning experiments, environment isolation is crucial for reproducibility. Conda (via Anaconda/Miniconda) is a powerful tool to manage dependencies and Python versions. Below is a detailed workflow to create and manage Conda environments tailored for experimental projects.


Why Use Conda for Experiments?

  • Isolation: Prevent dependency conflicts between projects
  • Reproducibility: Share exact environment specifications
  • Python Version Control: Test experiments across Python versions
  • Cross-Platform Support: Works on Windows/Linux/macOS

Step 1: Install Miniconda

If you don’t have Conda installed:

# For Linux/macOS
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# For Windows: Download installer from https://docs.conda.io/en/latest/miniconda.html

Step 2: Create a Dedicated Environment

Basic Creation

conda create --name experiment_env python=3.11 -y

Specify Package Versions

conda create --name precision_test python=3.11 numpy=1.24 pandas=2.0 scikit-learn=1.3 -y

Channel Management

For specialized packages:

conda create --name bio_exp --channel bioconda --channel conda-forge python=3.11 samtools=1.17 -y

Step 3: Activate and Verify

conda activate experiment_env

# Check Python version
python --version

# View installed packages
conda list

# Check environment location
conda env list

Step 4: Environment Configuration

Export Environment Specs

# For sharing exact versions
conda env export > environment.yml

# For cross-platform sharing (no build info)
conda env export --from-history > simplified_env.yml

Create from YAML

conda env create -f environment.yml

Step 5: Package Management

Install During Experimentation

conda install -c conda-forge tensorflow=2.13

# For PyPI packages
conda install pip
pip install experimental-package

Snapshot Versions

conda list --export > requirements.txt
conda list --explicit > spec-file.txt

Step 6: Jupyter Integration

conda install ipykernel
python -m ipykernel install --user --name experiment_env --display-name "Python 3.11 (Experiment)"

Best Practices for Experiments

  1. Naming Conventions
    • Use descriptive names: nlp_bert_exp, bio_rnaseq_2023
    • Include date/version: cv_project_v2_oct23
  2. Isolation Strategies
    # Separate environment for each experiment phase
    conda create --name data_preprocessing python=3.11
    conda create --name model_training python=3.11
    conda create --name result_analysis python=3.11
    
  3. GPU/CPU Optimization
    conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
    
  4. Size Management
    # Clean unused packages
    conda clean --all -y
    

Troubleshooting Common Issues

1. Environment Solving Takes Forever

Fix:

conda config --set solver libmamba
conda install -n base conda-libmamba-solver

2. Missing Python Version

Use conda-forge:

conda create --name py311_exp --channel conda-forge python=3.11

3. Broken Environment

Export packages and recreate:

conda env export > emergency_backup.yml
conda remove --name broken_env --all
conda env create -f emergency_backup.yml

Experimental Workflow Example

# Create environment
conda create --name drug_discovery python=3.11 pandas=2.0 scipy=1.11 -y

# Activate and work
conda activate drug_discovery

# Install temporary experiment package
conda install --channel experimental_channel prototype_ml=0.9b

# When experiment concludes
conda env export > successful_trial_20231015.yml
conda env remove --name drug_discovery

Conclusion

Proper Conda environment management enables:

  • ✔️ Clean separation between experiments
  • ✔️ Easy replication of successful trials
  • ✔️ Safe testing of bleeding-edge packages
  • ✔️ Efficient resource utilization

Remember to:
🚨 Regularly prune unused environments (conda env list)
📝 Document environment changes in lab notebooks
💾 Backup critical environment YAML files

By following these practices, you’ll spend less time debugging dependencies and more time doing actual science!


Let me know if you’d like me to expand on any specific section or add additional workflow examples!