1 Introduction & Setup

1.1 What is Machine Learning?

Machine Learning (ML) is a field of artificial intelligence that enables computers to learn from data without being explicitly programmed. Instead of writing rules manually, we feed data to algorithms that discover patterns and make predictions.

Example: Instead of programming rules like “if email contains ‘prize’ and ‘click here’, mark as spam,” ML algorithms learn what spam looks like from thousands of examples.
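
To make the contrast concrete, here is a minimal sketch that places the hand-written rule from the example next to a model that learns from labeled emails. The six emails, their labels, and the names rule_based_is_spam and learned_filter are made up for illustration, and the choice of a bag-of-words Naive Bayes classifier is an assumption, not something this chapter prescribes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def rule_based_is_spam(email: str) -> bool:
    """Hand-written rule: spam only if both trigger phrases appear."""
    text = email.lower()
    return "prize" in text and "click here" in text


# Toy, made-up training data (1 = spam, 0 = not spam)
emails = [
    "You won a prize, click here to claim it",
    "Claim your free prize now",
    "Click here for a free gift card",
    "Meeting moved to 3pm tomorrow",
    "Here are the notes from today's lecture",
    "Can you review my pull request?",
]
labels = [1, 1, 1, 0, 0, 0]

# Learned approach: count words, then fit a Naive Bayes classifier on the counts
learned_filter = make_pipeline(CountVectorizer(), MultinomialNB())
learned_filter.fit(emails, labels)

new_email = "Free prize waiting, claim now"
rule_says_spam = rule_based_is_spam(new_email)
model_says_spam = bool(learned_filter.predict([new_email])[0])
print("Rule says spam: ", rule_says_spam)
print("Model says spam:", model_says_spam)
```

The rule misses this email because the exact phrase "click here" is absent, while the learned model picks up on words it has seen in spam examples. A real spam filter would need far more data than six emails.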

1.2 Supervised vs Unsupervised Learning

Machine learning can be broadly categorized into two main types:

1.2.1 Supervised Learning

In supervised learning, we have labeled data - we know the correct answers.

Use cases:
- Regression: Predicting continuous values (house prices, temperature, sales)
- Classification: Predicting categories (spam/not spam, cat/dog, disease diagnosis)

Example: Training a model to predict house prices using historical data where we know both features (size, location, bedrooms) and the actual prices.
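
As a rough sketch of that house-price scenario, the snippet below fits a linear regression on a handful of made-up sales. The numbers are invented, location is left out to keep the features numeric, and LinearRegression is only one of many models that could be used here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up historical data: [size in square meters, number of bedrooms]
X = np.array([[50, 1], [70, 2], [90, 3], [110, 3], [130, 4]])
# Known sale prices (the labels), in thousands
y = np.array([150, 200, 250, 290, 340])

# Learn the relationship between features and price
model = LinearRegression()
model.fit(X, y)

# Predict the price of a house the model has not seen
new_house = np.array([[100, 3]])
predicted_price = model.predict(new_house)[0]
print(f"Predicted price: {predicted_price:.0f}k")
```

The key point is that both X (features) and y (correct answers) are available during training, which is what makes the problem supervised.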

1.2.2 Unsupervised Learning

In unsupervised learning, we have unlabeled data - we want to discover hidden patterns.

Use cases:
- Clustering: Grouping similar data points (customer segmentation, document organization)
- Dimensionality Reduction: Reducing features while preserving information (data visualization, compression)
- Anomaly Detection: Finding unusual patterns (fraud detection, system monitoring)

Example: Grouping customers into segments based on purchasing behavior without pre-defined categories.
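
A minimal sketch of that customer-segmentation idea, assuming nine made-up customers described by two invented features. K-means is used here as one possible clustering algorithm, and the choice of three clusters is an assumption made for this toy data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up customers: [orders per year, average order value]
customers = np.array([
    [2, 20], [3, 25], [4, 22],      # few orders, small baskets
    [40, 30], [45, 28], [50, 35],   # many orders, small baskets
    [5, 300], [6, 320], [4, 310],   # few orders, large baskets
])

# No labels are passed in: the algorithm only sees the features
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
segments = kmeans.fit_predict(customers)
print(segments)
```

The cluster numbers themselves are arbitrary identifiers. What matters is which customers end up sharing the same one. On real data the features would usually be scaled first so that one column does not dominate the distance calculation.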

1.2.3 Other Types (Brief Mention)

Semi-supervised Learning: Mix of labeled and unlabeled data
Reinforcement Learning: Learning through trial and error with rewards (game AI, robotics)

This book focuses on supervised and unsupervised learning using scikit-learn.

1.3 Environment Setup

1.3.1 System Requirements

Before starting, ensure you have:

Operating System: Linux, macOS, or Windows
Python: Version 3.8 or higher (python.org)

1.3.2 Option 1: Local Setup (Recommended for Long-term Learning)

1.3.2.1 Step 1: Create a Virtual Environment

Using a virtual environment keeps your project dependencies isolated and manageable.

On Linux/macOS:

# Create virtual environment
python3 -m venv mlbook-env

# Activate it
source mlbook-env/bin/activate

On Windows:

# Create virtual environment
python -m venv mlbook-env

# Activate it
mlbook-env\Scripts\activate
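
If you are unsure whether activation worked, one way to check from inside Python is sketched below. The helper name in_virtual_env is made up for this example. It relies on the standard library's sys.prefix and sys.base_prefix, which differ when Python runs inside an environment created by venv.

```python
import sys


def in_virtual_env() -> bool:
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_env())
```

When the environment is active, the interpreter path should point inside the mlbook-env folder.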

1.3.2.2 Step 2: Install Required Libraries

pip install jupyter numpy pandas scikit-learn matplotlib seaborn

1.3.2.3 Step 3: Optional Libraries (for Chapter 12)

pip install xgboost

1.3.2.4 Step 4: Verify Installation

import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns

print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"Scikit-learn: {sklearn.__version__}")
print("All libraries loaded successfully!")
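
The script above confirms that the imports work. A complementary check, sketched here, reads the installed package metadata instead of importing each library, so it can report a missing package without raising an error. The helper installed_version is a made-up name, and the package lists simply mirror the pip commands from Steps 2 and 3.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution names as used in the pip install commands
required = ["jupyter", "numpy", "pandas", "scikit-learn", "matplotlib", "seaborn"]
optional = ["xgboost"]


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in required + optional:
    found = installed_version(name)
    status = found if found is not None else "NOT INSTALLED"
    print(f"{name:<14}{status}")
```

A "NOT INSTALLED" entry for xgboost is fine until you reach the chapter that needs it.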

1.3.3 Option 2: Cloud-Based Platforms (Quick Start)

If you want to start immediately without local setup, these free cloud platforms are great alternatives:

Popular Options:

Google Colab (colab.research.google.com)
- Free GPU access
- Pre-installed ML libraries
- Works in browser
- Saves to Google Drive
Kaggle Notebooks (kaggle.com/code)
- Free TPU/GPU access
- Large dataset library
- Community competitions
- Weekly GPU quota (roughly 30 hours; the exact limit varies)
Jupyter.org (jupyter.org/try)
- Try Jupyter without installation
- Temporary sessions
- Good for quick experiments

Note: While cloud platforms are convenient for getting started, having your own local setup gives you:
- Full control over your environment
- No internet dependency
- A better setting for learning and experimentation
- Privacy for your data and code

1.3.4 Building Your Own ML Machine

For serious machine learning work, especially deep learning, you may eventually want a dedicated machine with a GPU. For comprehensive guides on building or buying ML-capable hardware, visit tensorrigs.com - they provide detailed tutorials and recommendations for ML workstations at various budgets.

1.4 Your First Model (5-Minute Example)

Let’s build a complete ML pipeline in about a dozen lines of code:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluate
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2%}")

Congratulations! You just built a classifier that should score above 90% accuracy on the held-out test set.
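
Once a model is trained, it can classify measurements it has not seen before. The sketch below is self-contained and, for brevity, trains on the full Iris dataset rather than on a train split. The measurements in new_flower are hypothetical values chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
model = DecisionTreeClassifier(random_state=42)
model.fit(iris.data, iris.target)

# Hypothetical new flower:
# sepal length, sepal width, petal length, petal width (all in cm)
new_flower = [[5.0, 3.4, 1.5, 0.2]]

# predict returns a class index; target_names maps it to a species name
predicted_class = model.predict(new_flower)[0]
predicted_species = iris.target_names[predicted_class]
print("Predicted species:", predicted_species)
```

The very short petals put this flower in the setosa group, which is cleanly separated from the other two species in this dataset.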

1.5 Troubleshooting

Common Installation Issues

“No module named ‘sklearn’”
- Solution: Make sure your virtual environment is activated
- Run: pip install scikit-learn

Jupyter kernel not found
- Solution: python -m ipykernel install --user --name=mlbook-env

Import errors in Colab/Kaggle
- Both platforms come with the core libraries used in this book pre-installed
- Just run: import sklearn
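
When an import fails and the cause is unclear, it often helps to confirm which Python interpreter is running and which modules it can actually see. The following is a small diagnostic sketch. The helper is_importable is a made-up name, and the list of modules checked is only an example.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    # find_spec returns None when a top-level module cannot be located
    return importlib.util.find_spec(module_name) is not None


print("Python executable:", sys.executable)
for module_name in ["sklearn", "numpy", "pandas"]:
    state = "found" if is_importable(module_name) else "missing"
    print(f"{module_name}: {state}")
```

If the executable path is not inside your virtual environment, the notebook or terminal is using a different Python than the one you installed the libraries into.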

For more troubleshooting help, visit the TensorRigs Documentation for detailed guides on resolving common ML/DL errors, environment setup issues, and hardware configuration.

1.6 What’s Next?

In the next chapter, we’ll dive deep into data preparation - the foundation of any successful ML project.

1.7 Summary

Machine learning enables computers to learn from data
Supervised learning uses labeled data for prediction
Unsupervised learning discovers patterns in unlabeled data
Scikit-learn is the go-to library for classical ML
Building models involves: load data → split → train → evaluate