```python
import numpy as np
import pandas as pd
import sklearn
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"Scikit-learn: {sklearn.__version__}")
print(f"Matplotlib: {matplotlib.__version__}")
print(f"Seaborn: {sns.__version__}")
print("All libraries loaded successfully!")
```
1 Introduction & Setup
1.1 What is Machine Learning?
Machine Learning (ML) is a field of artificial intelligence that enables computers to learn from data without being explicitly programmed. Instead of writing rules manually, we feed data to algorithms that discover patterns and make predictions.
Example: Instead of programming rules like “if email contains ‘prize’ and ‘click here’, mark as spam,” ML algorithms learn what spam looks like from thousands of examples.
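Here is a minimal sketch of that idea. It uses a tiny, hypothetical set of labeled emails, where a real filter would learn from thousands. scikit-learn's `CountVectorizer` turns each email into word counts. `MultinomialNB`, a Naive Bayes classifier chosen here only for illustration, then learns which words tend to appear in spam. No rule about "prize" or "click here" is written anywhere.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy emails (1 = spam, 0 = not spam)
emails = [
    "Claim your prize now, click here",
    "You won a free prize, click here to claim",
    "Free money, click here today",
    "Meeting moved to 3pm tomorrow",
    "Please review the attached report",
    "Lunch with the team on Friday?",
]
labels = [1, 1, 1, 0, 0, 0]

# Step 1 converts text to word counts; step 2 learns word-to-label associations
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["Click here to claim your free prize"]))  # → [1] (spam)
print(model.predict(["Can we review the report on Friday?"]))  # → [0] (not spam)
```

The "rules" live in the word statistics the model learned from the examples. Giving it more and better examples changes its behavior without any code changes.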
1.2 Supervised vs Unsupervised Learning
Machine learning can be broadly categorized into two main types:
1.2.1 Supervised Learning
In supervised learning, we have labeled data: we know the correct answer (the label) for every training example.
Use cases:
- Regression: Predicting continuous values (house prices, temperature, sales)
- Classification: Predicting categories (spam/not spam, cat/dog, disease diagnosis)
Example: Training a model to predict house prices using historical data where we know both features (size, location, bedrooms) and the actual prices.
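A minimal sketch of that supervised workflow follows. It assumes a handful of hypothetical houses whose prices were generated from a simple linear rule, so the prediction is easy to check by hand. `LinearRegression` is just one of the regression models scikit-learn offers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical features: [size in m², bedrooms]
X = np.array([[50, 1], [70, 2], [90, 3], [110, 3], [130, 4]])
# Hypothetical known prices, built as 2000*size + 20000*bedrooms + 30000
y = np.array([150_000, 210_000, 270_000, 310_000, 370_000])

model = LinearRegression()
model.fit(X, y)  # learn the weights from examples whose answers are known

price = model.predict([[100, 3]])[0]  # a house the model has not seen
print(f"Predicted price: {price:,.0f}")  # → Predicted price: 290,000
```

Because the toy prices follow an exact linear rule, the model recovers it. Real prices are noisy, so predictions will only be approximate.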
1.2.2 Unsupervised Learning
In unsupervised learning, we have unlabeled data: there are no correct answers to learn from, so the goal is to discover hidden patterns.
Use cases:
- Clustering: Grouping similar data points (customer segmentation, document organization)
- Dimensionality Reduction: Reducing features while preserving information (data visualization, compression)
- Anomaly Detection: Finding unusual patterns (fraud detection, system monitoring)
Example: Grouping customers into segments based on purchasing behavior without pre-defined categories.
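A minimal clustering sketch follows. It assumes a tiny, hypothetical set of customers described by annual spend and monthly visits, with no segment labels provided. `KMeans` is one common clustering algorithm. `n_clusters=2` is an assumption here; in real segmentation work, the number of groups is something you have to choose.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [annual spend in $, visits per month]; no labels
customers = np.array([
    [200, 1], [250, 2], [220, 1],      # low spend, rare visits
    [1500, 8], [1600, 9], [1550, 10],  # high spend, frequent visits
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
segments = kmeans.fit_predict(customers)  # one cluster ID per customer

# The first three customers share one ID and the last three share the other;
# which group gets 0 and which gets 1 is arbitrary
print(segments)
```

K-means groups points by distance. On real data you would usually standardize the features first so that a large-valued column such as spend does not dominate.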
1.2.3 Other Types (Brief Mention)
- Semi-supervised Learning: Mix of labeled and unlabeled data
- Reinforcement Learning: Learning through trial and error with rewards (game AI, robotics)
This book focuses on supervised and unsupervised learning using scikit-learn.
1.3 Environment Setup
1.3.1 System Requirements
Before starting, ensure you have:
- Operating System: Linux, macOS, or Windows
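- Python: Version 3.8 or higher (python.org). Recent releases of the libraries used in this book may require a newer Python, so installing a current Python 3 release is the safest choice.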
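To check the Python requirement from inside Python itself, here is a small sketch using only the standard library. The `MIN_VERSION` name is illustrative; the value comes from the requirement above.

```python
import platform
import sys

# Minimum version from the system requirements
MIN_VERSION = (3, 8)

current = platform.python_version()  # e.g. "3.11.4"
if sys.version_info >= MIN_VERSION:
    print(f"Python {current} meets the minimum of {MIN_VERSION[0]}.{MIN_VERSION[1]}")
else:
    # sys.exit with a message prints it to stderr and exits with status 1
    sys.exit(f"Python {current} is too old; install {MIN_VERSION[0]}.{MIN_VERSION[1]} or newer")
```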
1.3.2 Option 1: Local Setup (Recommended for Long-term Learning)
1.3.2.1 Step 1: Create a Virtual Environment
Using a virtual environment keeps your project dependencies isolated and manageable.
On Linux/macOS:
```bash
# Create virtual environment
python3 -m venv mlbook-env
# Activate it
source mlbook-env/bin/activate
```
On Windows:
```powershell
# Create virtual environment
python -m venv mlbook-env
# Activate it
mlbook-env\Scripts\activate
```
1.3.2.2 Step 2: Install Required Libraries
```bash
pip install jupyter numpy pandas scikit-learn matplotlib seaborn
```
1.3.2.3 Step 3: Optional Libraries (for Chapter 12)
```bash
pip install xgboost
```
1.3.2.4 Step 4: Verify Installation
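Here is a minimal standard-library sketch for verifying the setup. It checks three frequent causes of "module not found" errors: which interpreter is running, whether a virtual environment is active, and whether each package can be located without importing it. The package list uses import names, so scikit-learn appears as `sklearn`.

```python
import importlib.util
import sys

# Packages must be installed into this exact interpreter
print(f"Interpreter: {sys.executable}")

# For environments created with `python -m venv`, sys.prefix points inside the
# environment while sys.base_prefix points at the base Python installation
in_venv = sys.prefix != sys.base_prefix
print(f"Virtual environment active: {in_venv}")

# Required libraries from Step 2 (import names, not pip names)
packages = ["numpy", "pandas", "sklearn", "matplotlib", "seaborn"]
missing = [name for name in packages if importlib.util.find_spec(name) is None]
print(f"Missing packages: {', '.join(missing) if missing else 'none'}")

# Optional library from Step 3
has_xgboost = importlib.util.find_spec("xgboost") is not None
print(f"xgboost installed: {has_xgboost}")
```

The `sys.prefix` comparison targets `venv`-style environments. Other environment managers, such as conda, may not be detected this way.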
1.3.3 Option 2: Cloud-Based Platforms (Quick Start)
If you want to start immediately without local setup, these free cloud platforms are great alternatives:
Popular Options:
- Google Colab (colab.research.google.com)
  - Free GPU access
  - Pre-installed ML libraries
  - Runs in the browser
  - Saves to Google Drive
- Kaggle Notebooks (kaggle.com/code)
  - Free TPU/GPU access
  - Large dataset library
  - Community competitions
  - 30+ hours/week of GPU time
- Jupyter.org (jupyter.org/try)
  - Try Jupyter without installation
  - Temporary sessions
  - Good for quick experiments
Note: While cloud platforms are convenient for getting started, having your own local setup gives you:
- Full control over your environment
- No internet dependency
- Better for learning and experimentation
- Privacy for your data and code
1.3.4 Building Your Own ML Machine
For serious machine learning work, especially deep learning, you may eventually want a dedicated machine with a GPU. For comprehensive guides on building or buying ML-capable hardware, visit tensorrigs.com, which provides detailed tutorials and recommendations for ML workstations at various budgets.
1.4 Your First Model (5-Minute Example)
Let’s build a complete ML pipeline in about a dozen lines of code:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Load data
iris = load_iris()
X, y = iris.data, iris.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
# Evaluate
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2%}")
```
Congratulations! You just built a classifier with 90%+ accuracy.
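A single 80/20 split produces one accuracy number, and that number depends on which flowers landed in the test set. The sketch below shows two common next steps. `cross_val_score` evaluates the same model on 5 different splits. `predict` then classifies a new measurement; the flower used here is hypothetical. Refitting on all the data before predicting is a common choice once evaluation is done, not a requirement.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target
model = DecisionTreeClassifier(random_state=42)

# 5-fold cross-validation: each fold serves once as the test set
scores = cross_val_score(model, X, y, cv=5)
print(f"Accuracy per fold: {scores.round(2)}")
print(f"Mean accuracy: {scores.mean():.2%}")

# Refit on all 150 flowers, then classify a hypothetical new one
model.fit(X, y)
new_flower = [[5.0, 3.4, 1.5, 0.2]]  # sepal length, sepal width, petal length, petal width (cm)
predicted = model.predict(new_flower)
print(iris.target_names[predicted[0]])  # → setosa
```

The spread of the per-fold scores gives a rough sense of how much a single split's accuracy can vary.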
1.5 Troubleshooting
1.6 What’s Next?
In the next chapter, we’ll dive deep into data preparation, the foundation of any successful ML project.
1.7 Summary
- Machine learning enables computers to learn from data
- Supervised learning uses labeled data for prediction
- Unsupervised learning discovers patterns in unlabeled data
- Scikit-learn is the go-to library for classical ML
- Building models involves: load data → split → train → evaluate