Appendix: Getting Started with Jupyter Notebooks (JupyterLab and Google Colab)

20. Appendix: Getting Started with Jupyter Notebooks (JupyterLab and Google Colab)#

1. Introduction to Jupyter Notebooks#

Jupyter Notebooks are interactive computing environments that allow you to create and share documents containing live code, equations, visualizations, and narrative text. They have become the standard tool for data science, research, and educational purposes.

1.1 Key Features:#

Interactive Computing: Execute code cells individually and see results immediately
Rich Media Support: Include text, images, graphs, and mathematical equations
Multiple Languages: Support for Python, R, Julia, Scala, and many others
Shareable: Easy to share with colleagues and the broader community
Reproducible: Others can run your notebooks and get the same results

1.2 Common Use Cases:#

Data analysis and visualization
Machine learning experiments
Educational materials and tutorials
Rapid prototyping
Scientific computing and research

1.3 Examples#

# Let's start with a simple example
print("Welcome to Jupyter Notebooks!")
print("This is a Python code cell.")

# You can execute mathematical operations
result = 2 + 2
print(f"2 + 2 = {result}")

Welcome to Jupyter Notebooks!
This is a Python code cell.
2 + 2 = 4

# Import common libraries used in data science
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Create a simple dataset
data = np.random.randn(100)
print(f"Generated {len(data)} random numbers")
print(f"Mean: {np.mean(data):.2f}")
print(f"Standard deviation: {np.std(data):.2f}")

Generated 100 random numbers
Mean: -0.07
Standard deviation: 1.01

2. Getting Started with Google Colab#

Google Colab (Colaboratory) is a free cloud-based Jupyter notebook environment that requires no setup and runs entirely in the browser.

2.1 Advantages of Google Colab:#

No Installation Required: Runs in your web browser
Free GPU/TPU Access: Hardware acceleration for machine learning
Pre-installed Libraries: Most popular data science libraries are already installed
Google Drive Integration: Easy file storage and sharing
Collaboration Features: Real-time collaboration like Google Docs

2.2 Getting Started with Colab:#

Access Google Colab:
- Go to colab.research.google.com
- Sign in with your Google account
Create a New Notebook:
- Click “New notebook” or “File” → “New notebook”
- Your notebook will be automatically saved to Google Drive
Basic Operations: check if we’re running in Colab

# This cell demonstrates basic Colab functionality
import sys
print(f"Python version: {sys.version}")

# Check if we're running in Colab
try:
    import google.colab
    print("Running in Google Colab")
    IN_COLAB = True
except ImportError:
    print("Not running in Google Colab")
    IN_COLAB = False

Python version: 3.12.6 (tags/v3.12.6:a4a2d2b, Sep  6 2024, 20:11:23) [MSC v.1940 64 bit (AMD64)]
Not running in Google Colab

Installing Additional Packages in Colab:

# Install packages not included by default
# Use ! to run shell commands
!pip install seaborn plotly

# Import the newly installed packages
import seaborn as sns
import plotly.express as px
print("Additional packages installed successfully!")

Additional packages installed successfully!

WARNING: Ignoring invalid distribution ~upyterlab (C:\Python312\Lib\site-packages)
WARNING: Ignoring invalid distribution ~upyterlab (C:\Python312\Lib\site-packages)
WARNING: Ignoring invalid distribution ~upyterlab (C:\Python312\Lib\site-packages)

Requirement already satisfied: seaborn in c:\python312\lib\site-packages (0.13.2)
Requirement already satisfied: plotly in c:\python312\lib\site-packages (6.2.0)
Requirement already satisfied: numpy!=1.24.0,>=1.20 in c:\python312\lib\site-packages (from seaborn) (2.3.1)
Requirement already satisfied: pandas>=1.2 in c:\python312\lib\site-packages (from seaborn) (2.3.1)
Requirement already satisfied: matplotlib!=3.6.1,>=3.4 in c:\python312\lib\site-packages (from seaborn) (3.10.3)
Requirement already satisfied: narwhals>=1.15.1 in c:\python312\lib\site-packages (from plotly) (1.47.1)
Requirement already satisfied: packaging in c:\python312\lib\site-packages (from plotly) (24.2)
Requirement already satisfied: contourpy>=1.0.1 in c:\python312\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.3.2)
Requirement already satisfied: cycler>=0.10 in c:\python312\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in c:\python312\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (4.58.5)
Requirement already satisfied: kiwisolver>=1.3.1 in c:\python312\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.4.8)
Requirement already satisfied: pillow>=8 in c:\python312\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (11.3.0)
Requirement already satisfied: pyparsing>=2.3.1 in c:\python312\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (3.2.3)
Requirement already satisfied: python-dateutil>=2.7 in c:\python312\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in c:\python312\lib\site-packages (from pandas>=1.2->seaborn) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in c:\python312\lib\site-packages (from pandas>=1.2->seaborn) (2025.2)
Requirement already satisfied: six>=1.5 in c:\python312\lib\site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.4->seaborn) (1.17.0)

Mounting Google Drive in Colab:

# Mount Google Drive to access your files
if IN_COLAB:
    from google.colab import drive
    drive.mount('/content/drive')
    print("Google Drive mounted successfully!")
    
    # You can now access files in your Google Drive
    # Files will be available at /content/drive/MyDrive/
    # You may open a terminal and use it to display the list of folders and files in your Google Drive. 
    # This can help verify that your files are correctly uploaded and accessible from the environment.

3. Setting Up JupyterLab#

JupyterLab is the next-generation web-based user interface for the Jupyter Project. It provides a more powerful and flexible environment than the classic Jupyter Notebook.

3.1 Installing JupyterLab#

The preferred way to install JupyterLab is by using pip.

# Install JupyterLab using pip
pip install jupyterlab

# Launch JupyterLab
jupyter lab

3.2 Launch JupyterLab#

To launch JupyterLab, open a command prompt or terminal, navigate to your target folder, and enter the following command:

# Launch JupyterLab
jupyter lab

3.3 JupyterLab Features#

JupyterLab provides enhanced features over classic Jupyter:

Flexible user interface
Multiple document support
File browser integrationTerminal integration
Extension system
Drag and drop functionality
Multiple kernels support

3.4 JupyterLab Extensions#

JupyterLab extensions can be installed via pip. Below are some popular JupyterLab extensions:

Variable Inspector: Shows variables in the current namespace
Table of Contents: Generates TOC of contents from notebook headers
Git: Git integration for version control
Plotly: Enhanced support for Plotly visualizations
Spreadsheet: Provides Excel-like spreadsheet functionality

3.5 Customizing JupyterLab#

You can customize JupyterLab through the Settings menu or by editing configuration files directly. Common customizations include:

Theme selection (dark/light)
Keyboard shortcuts
Extension management
Notebook cell execution settings
File browser preferences

These settings allow users to tailor the JupyterLab environment to their workflow and preferences.

4. Google Colab vs JupyterLab#

4.1 Comparing Google Colab vs JupyterLab#

Here’s a detailed comparison between Google Colab and JupyterLab across various dimensions:

Feature	Google Colab	JupyterLab
Cost	Free (with usage limits)	Free (self-hosted) Only costs for hardware/cloud hosting
Setup Required	None - runs in browser Just need Google account	Installation required Python + JupyterLab installation
Hardware Access	Free GPU/TPU access High-end cloud hardware Limited session time	Local hardware only Performance depends on your machine No session time limits
Storage	Google Drive integration 15GB free storage Automatic cloud backup	Local file system Unlimited local storage Manual backup required
Collaboration	Real-time collaboration Easy sharing via links Comment and suggestion features	Limited collaboration Share via file export Version control with Git
Offline Access	No - requires internet connection Cannot work offline	Yes - fully offline capable No internet required for work
Customization	Limited theme options Basic settings only Cannot install extensions	Highly customizable Extensive themes and extensions Full UI customization
Library Management	Pre-installed common libraries Easy pip install with !pip Limited to available packages	Full control over environment Any Python package Virtual environments supported
Performance	High-performance cloud hardware Fast for ML workloads Shared resources	Depends on local hardware Dedicated resources Can be optimized for specific tasks
Data Privacy	Data stored on Google servers Subject to Google’s privacy policy Not ideal for sensitive data	Full control over data Data stays on your machine Complete privacy control
Session Persistence	Sessions timeout (12-24 hours) Runtime disconnections Variables lost on timeout	Persistent sessions No automatic timeouts Variables retained until restart
File Management	Google Drive interface Limited file operations Cloud-based file system	Full file system access Advanced file operations Integration with OS
Debugging Tools	Basic debugging features Limited debugging options Browser-based tools only	Advanced debugging Full IDE features Integration with debugging tools
Version Control	Basic Git support Manual version management Limited Git integration	Full Git integration Advanced version control Branch management
Multi-language Support	Python focus Limited other language support R and Swift available	Python, R, Julia, Scala Multiple kernels supported Language-specific features

4.2 When to Use Google Colab#

Choose Google Colab when:

Getting started with data science or machine learning
You need free GPU or TPU access
You are working on small to medium projects
Collaboration is important
You don’t want to manage a local environment
You use it occasionally or for learning purposes
You need to share your work easily

4.3 When to Use JupyterLab#

Choose JupyterLab when:

You are working with sensitive or proprietary data
You need offline access
You require specific software versions
You are working with large datasets
You need extensive customization
You are in a professional or enterprise environment
You want full control over the environment
You are working with multiple programming languages

5. Installing and Managing Libraries#

One of the most important aspects of working with Jupyter notebooks is managing libraries and packages. The approach differs between Google Colab and JupyterLab.

5.1 Understanding Package Management Systems#

First, let’s understand the different package managers:

import sys
print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")

# Check which package managers are available
import subprocess
import os

def check_package_manager(command):
    try:
        result = subprocess.run([command, '--version'], 
                              capture_output=True, text=True, timeout=5)
        return result.returncode == 0
    except:
        return False

managers = {
    'pip': check_package_manager('pip'),
    'conda': check_package_manager('conda'),
    'mamba': check_package_manager('mamba')
}

print("\nAvailable package managers:")
for manager, available in managers.items():
    status = "✓ Available" if available else "✗ Not available"
    print(f"  {manager}: {status}")

Python version: 3.12.6 (tags/v3.12.6:a4a2d2b, Sep  6 2024, 20:11:23) [MSC v.1940 64 bit (AMD64)]
Python executable: C:\Python312\python.exe

Available package managers:
  pip: ✓ Available
  conda: ✗ Not available
  mamba: ✗ Not available

5.2 Installing Libraries in Google Colab#

This code demonstrates six common ways to install Python libraries in a Google Colab environment using pip:

# Method 1: Using pip (most common in Colab)
# The ! prefix runs shell commands
!pip install seaborn plotly scikit-learn

# Method 2: Installing specific versions
!pip install pandas==1.5.3 numpy>=1.20.0

# Method 3: Installing from requirements file
# First, create a requirements.txt file
requirements_content = """
matplotlib>=3.5.0
seaborn>=0.11.0
plotly>=5.0.0
scikit-learn>=1.0.0
requests>=2.25.0
"""

# Write to file (in Colab)
with open('requirements.txt', 'w') as f:
    f.write(requirements_content)

# Install from requirements file
!pip install -r requirements.txt

print("Libraries installed successfully in Colab!")

# Method 4: Installing from GitHub repositories
!pip install git+https://github.com/user/repository.git

# Method 5: Installing with specific options
!pip install --upgrade tensorflow  # Upgrade existing package
!pip install --no-cache-dir torch   # Install without using cache
!pip install --user package_name    # Install for current user only

# Method 6: Installing development versions
!pip install --pre scikit-learn  # Install pre-release version

5.3 Checking Installed Libraries in Google Colab#

This code demonstrates different ways to check the installation status of Python packages in a Google Colab environment:

Using pip list with grep: Filters and displays installed packages matching pandas, numpy, or matplotlib.
Using pip show: Retrieves detailed metadata (version, location, dependencies) for the pandas package.
Programmatic Check: Defines a function to check if a given package is installed using import. It then loops through a list of common libraries (pandas, numpy, matplotlib, seaborn, plotly) and prints whether each is installed.

# Checking installed packages in Colab
!pip list | grep -E "(pandas|numpy|matplotlib)"

# Get detailed information about a package
!pip show pandas

# Check if a package is installed programmatically
def check_package_installed(package_name):
    try:
        __import__(package_name)
        return True
    except ImportError:
        return False

packages_to_check = ['pandas', 'numpy', 'matplotlib', 'seaborn', 'plotly']
print("Package installation status:")
for package in packages_to_check:
    status = "✓ Installed" if check_package_installed(package) else "✗ Not installed"
    print(f"  {package}: {status}")

5.4 Installing Libraries in JupyterLab#

This code outlines multiple methods for installing Python packages in a JupyterLab environment, with a focus on ensuring the packages are installed into the correct Python environment:

# Method 1: Using pip from within notebook
import sys
!{sys.executable} -m pip install seaborn plotly scikit-learn

# This ensures pip installs to the same Python environment as the notebook

# Method 2: Using conda (if available)
# Note: This works if JupyterLab is running in a conda environment
!conda install -c conda-forge seaborn plotly scikit-learn -y

# Method 3: Using mamba (faster conda alternative)
!mamba install -c conda-forge seaborn plotly scikit-learn -y

# Method 4: Installing in the correct environment
# This is crucial when you have multiple Python environments

# Check current environment
import os
print(f"Current Python executable: {sys.executable}")
print(f"Current working directory: {os.getcwd()}")

# Install to current environment
!{sys.executable} -m pip install --user package_name

5.5 Managing Virtual Environments#

a virtual environment is an isolated Python workspace that allows you to install and manage packages separately from the system-wide Python installation. Using a virtual environment ensures that:

Projects do not interfere with each other’s dependencies.
You can safely test or upgrade packages without breaking other work.
Your Jupyter Notebook or Colab environment remains clean and reproducible.

This code introduces virtual environment best practices and demonstrates how to create and inspect a requirements.txt file in a Python or Jupyter environment:

# Understanding virtual environments
print("Virtual Environment Best Practices:")
print("1. Create separate environments for different projects")
print("2. Use requirements.txt to track dependencies")
print("3. Activate the correct environment before starting Jupyter")

# Example of creating a requirements.txt from current environment
!pip freeze > requirements.txt

# Read and display the requirements
try:
    with open('requirements.txt', 'r') as f:
        requirements = f.read()
    print("\nCurrent environment requirements:")
    print(requirements[:500] + "..." if len(requirements) > 500 else requirements)
except FileNotFoundError:
    print("No requirements.txt file found")

Virtual Environment Best Practices:
1. Create separate environments for different projects
2. Use requirements.txt to track dependencies
3. Activate the correct environment before starting Jupyter

Current environment requirements:
annotated-types==0.7.0
anyio==3.7.1
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==3.0.0
async-lru==2.0.4
attrs==24.3.0
babel==2.16.0
beautifulsoup4==4.12.3
bleach==6.2.0
blinker==1.8.2
certifi==2024.12.14
cffi==1.17.1
charset-normalizer==3.4.0
click==8.1.7
colorama==0.4.6
comm==0.2.2
contourpy==1.3.2
cycler==0.12.1
debugpy==1.8.11
decorator==5.1.1
defusedxml==0.7.1
executing==2.1.0
fastapi==0.104.1
fastjsonschema==2.21.1
Flask==2.3.3
Flask-Cors==4.0.0
fonttools==4.58.5...

WARNING: Ignoring invalid distribution ~upyterlab (C:\Python312\Lib\site-packages)

6. Basic Operations and Examples#

Let’s explore basic operations that work in both environments:

6.1 Working with Data#

This code generates and displays a sample time-series dataset using NumPy and pandas, commonly used for data analysis or visualization practice.

# Create sample data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate sample dataset
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=100, freq='D')
values = np.cumsum(np.random.randn(100)) + 100

df = pd.DataFrame({
    'date': dates,
    'value': values,
    'category': np.random.choice(['A', 'B', 'C'], 100)
})

print("Sample Dataset:")
print(df.head())
print(f"\nDataset shape: {df.shape}")

Sample Dataset:
        date       value category
0 2023-01-01  100.496714        A
1 2023-01-02  100.358450        B
2 2023-01-03  101.006138        A
3 2023-01-04  102.529168        A
4 2023-01-05  102.295015        C

Dataset shape: (100, 3)

6.2 Data Visualization#

This code generates two side-by-side visualizations using matplotlib to explore a sample dataset (df), providing insights into time series trends and categorical distribution.

# Create visualizations
plt.figure(figsize=(12, 4))

# Time series plot
plt.subplot(1, 2, 1)
plt.plot(df['date'], df['value'])
plt.title('Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.xticks(rotation=45)

# Category distribution
plt.subplot(1, 2, 2)
df['category'].value_counts().plot(kind='bar')
plt.title('Category Distribution')
plt.xlabel('Category')
plt.ylabel('Count')

plt.tight_layout()
plt.show()

../../_images/76641caea6882a79d7a94cd77cfc09e9601c8a2fdb593d48e6f9cb4e6c8c7621.png

6.3 Statistical Analysis#

This code performs basic statistical analysis on the df DataFrame, summarizing the numerical column ‘value’ and exploring category-based differences.

# Basic statistical analysis
print("Statistical Summary:")
print(df['value'].describe())

# Convert dates to numeric format for correlation
date_numeric = df['date'].map(pd.Timestamp.toordinal)
correlation = date_numeric.corr(df['value'])
print(f"\nCorrelation between date and value: {correlation:.3f}")

# Group by category
category_stats = df.groupby('category')['value'].agg(['mean', 'std', 'count'])
print("\nStatistics by Category:")
print(category_stats)

Statistical Summary:
count    100.000000
mean      93.594818
std        4.643998
min       87.753344
25%       90.132492
50%       91.727866
75%       95.940029
max      104.480611
Name: value, dtype: float64

Correlation between date and value: -0.800

Statistics by Category:
               mean       std  count
category                            
A         93.933355  5.040641     34
B         94.168522  4.683877     30
C         92.797001  4.221481     36

6.4 Working with External Data#

This code explains common methods to load external data into a Python environment (JupyterLab or Google Colab) and demonstrates loading a sample dataset from a public URL.

# Example of reading data from different sources
# Note: This would work differently in Colab vs JupyterLab

print("Methods to load external data:")
print("1. Local files (JupyterLab): pd.read_csv('local_file.csv')")
print("2. Google Drive (Colab): pd.read_csv('/content/drive/MyDrive/file.csv')")
print("3. URLs (both): pd.read_csv('https://example.com/data.csv')")
print("4. APIs (both): Using requests library")

# Example with a public dataset
try:
    # This URL provides sample data
    url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
    iris_df = pd.read_csv(url)
    print(f"\nLoaded Iris dataset: {iris_df.shape}")
    print(iris_df.head())
except Exception as e:
    print(f"Could not load external data: {e}")

Methods to load external data:
Local files (JupyterLab): pd.read_csv('local_file.csv')
Google Drive (Colab): pd.read_csv('/content/drive/MyDrive/file.csv')
URLs (both): pd.read_csv('https://example.com/data.csv')
APIs (both): Using requests library

Loaded Iris dataset: (150, 5)
   sepal_length  sepal_width  petal_length  petal_width species
         5.1          3.5           1.4          0.2  setosa
         4.9          3.0           1.4          0.2  setosa
         4.7          3.2           1.3          0.2  setosa
         4.6          3.1           1.5          0.2  setosa
         5.0          3.6           1.4          0.2  setosa