Python Env for Scientific Experiment
Published:
As we usually need to work on various project, the environment for the experiment some times different from the local environment. Docker is one of the option, to run on the virtual machine. However, Conda or mamba is more like the standard for current academia.
Creating Conda Environments for Scientific Experiments: A Step-by-Step Guide
When working on data science or machine learning experiments, environment isolation is crucial for reproducibility. Conda (via Anaconda/Miniconda) is a powerful tool to manage dependencies and Python versions. Below is a detailed workflow to create and manage Conda environments tailored for experimental projects.
Why Use Conda for Experiments?
- Isolation: Prevent dependency conflicts between projects
- Reproducibility: Share exact environment specifications
- Python Version Control: Test experiments across Python versions
- Cross-Platform Support: Works on Windows/Linux/macOS
Step 1: Install Miniconda
If you don’t have Conda installed:
# For Linux/macOS
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# For Windows: Download installer from https://docs.conda.io/en/latest/miniconda.html
Step 2: Create a Dedicated Environment
Basic Creation
conda create --name experiment_env python=3.11 -y
Specify Package Versions
conda create --name precision_test python=3.11 numpy=1.24 pandas=2.0 scikit-learn=1.3 -y
Channel Management
For specialized packages:
conda create --name bio_exp --channel bioconda --channel conda-forge python=3.11 samtools=1.17 -y
Step 3: Activate and Verify
conda activate experiment_env
# Check Python version
python --version
# View installed packages
conda list
# Check environment location
conda env list
Step 4: Environment Configuration
Export Environment Specs
# For sharing exact versions
conda env export > environment.yml
# For cross-platform sharing (no build info)
conda env export --from-history > simplified_env.yml
Create from YAML
conda env create -f environment.yml
Step 5: Package Management
Install During Experimentation
conda install -c conda-forge tensorflow=2.13
# For PyPI packages
conda install pip
pip install experimental-package
Snapshot Versions
conda list --export > requirements.txt
conda list --explicit > spec-file.txt
Step 6: Jupyter Integration
conda install ipykernel
python -m ipykernel install --user --name experiment_env --display-name "Python 3.11 (Experiment)"
Best Practices for Experiments
- Naming Conventions
- Use descriptive names:
nlp_bert_exp
,bio_rnaseq_2023
- Include date/version:
cv_project_v2_oct23
- Use descriptive names:
- Isolation Strategies
# Separate environment for each experiment phase conda create --name data_preprocessing python=3.11 conda create --name model_training python=3.11 conda create --name result_analysis python=3.11
- GPU/CPU Optimization
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
- Size Management
# Clean unused packages conda clean --all -y
Troubleshooting Common Issues
1. Environment Solving Takes Forever
Fix:
conda config --set solver libmamba
conda install -n base conda-libmamba-solver
2. Missing Python Version
Use conda-forge:
conda create --name py311_exp --channel conda-forge python=3.11
3. Broken Environment
Export packages and recreate:
conda env export > emergency_backup.yml
conda remove --name broken_env --all
conda env create -f emergency_backup.yml
Experimental Workflow Example
# Create environment
conda create --name drug_discovery python=3.11 pandas=2.0 scipy=1.11 -y
# Activate and work
conda activate drug_discovery
# Install temporary experiment package
conda install --channel experimental_channel prototype_ml=0.9b
# When experiment concludes
conda env export > successful_trial_20231015.yml
conda env remove --name drug_discovery
Conclusion
Proper Conda environment management enables:
- ✔️ Clean separation between experiments
- ✔️ Easy replication of successful trials
- ✔️ Safe testing of bleeding-edge packages
- ✔️ Efficient resource utilization
Remember to:
🚨 Regularly prune unused environments (conda env list
)
📝 Document environment changes in lab notebooks
💾 Backup critical environment YAML files
By following these practices, you’ll spend less time debugging dependencies and more time doing actual science!
Let me know if you’d like me to expand on any specific section or add additional workflow examples!