Artificial Intelligence Tutorial

From Data Science fundamentals to RAG, Agent AI, and beyond, based on structured classroom notes

1 Data Science

Data Science is the field of collecting, cleaning, analyzing, and interpreting data to find useful information and make decisions.

Data Science Umbrella
Artificial Intelligence → Machine Learning → Deep Learning → Generative AI → Agent AI

🔑 Common Tools in Data Science

Python: programming language
Pandas: data manipulation
NumPy: numerical computing

2 What is Artificial Intelligence?

Artificial Intelligence (AI) is a field of computer science that focuses on creating machines that can perform tasks that normally require human intelligence.

💡 Example

ChatGPT: understands language and generates human-like responses using AI.

Python vs Java for AI

Comparison
Feature         Python                 Java
Type system     Dynamic                Static
Code length     Fewer lines            More lines
Learning curve  Easier                 Steeper
Primary use     AI, Data Science, ML   Enterprise, Android

3 Machine Learning (ML)

Machine Learning is a subset of AI where computers learn patterns from data and improve their performance without being explicitly programmed.

🔑 Example Library

Scikit-learn: a widely used Python library for classical machine learning algorithms.

4 Types of Machine Learning

Supervised Machine Learning

The model is trained on labeled data: every training example comes with the correct output, so the algorithm learns to map inputs to known answers.

Unsupervised Machine Learning

The model works with unlabeled data and tries to find patterns or structure in it. No labeled answers are provided; the model discovers hidden patterns on its own.

Reinforcement Learning

Learning through rewards and penalties. An agent interacts with an environment by performing actions, receives rewards or penalties in return, and learns to make decisions that maximize its cumulative reward.

Ensemble Learning

A technique that combines the predictions of multiple machine learning models to produce a result that is often more accurate than any single model alone.
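One simple way to combine models is majority voting. The sketch below is only an illustration: the three "models" are hypothetical hand-written rules standing in for trained classifiers, and the spam/ham labels are made up for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label that the most models predicted."""
    return Counter(predictions).most_common(1)[0][0]


# Three hypothetical "models", each a simple rule on the message text.
def model_a(text):
    return "spam" if "free" in text.lower() else "ham"


def model_b(text):
    return "spam" if "!" in text else "ham"


def model_c(text):
    return "spam" if len(text) > 40 else "ham"


def ensemble_predict(text):
    """Ask every model for a prediction, then take the majority label."""
    votes = [model(text) for model in (model_a, model_b, model_c)]
    return majority_vote(votes)


print(ensemble_predict("Free prize, click now!"))  # two of the three rules vote "spam"
print(ensemble_predict("See you at lunch"))        # all three rules vote "ham"
```

Using an odd number of models avoids ties when there are only two labels.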

Neural Network

A computational model inspired by the human brain, consisting of interconnected nodes (neurons) that process information. Neural networks are the building blocks of deep learning.

Semi-Supervised ML

A type of ML where the model is trained using both labeled and unlabeled data.

ML Types Summary
Supervised: Labeled data
Unsupervised: Unlabeled data
Reinforcement: Rewards & Penalties
Ensemble: Multiple models
Semi-Supervised: Both labeled and unlabeled data

5 Deep Learning

Deep Learning is a subset of Machine Learning that uses artificial neural networks with many layers to learn complex patterns from large amounts of data.

🔑 Deep Learning Frameworks

TensorFlow: Google's open-source ML framework
PyTorch: open-source deep learning framework originally developed by Facebook (now Meta), popular in research

6 Generative AI (Gen AI)

Generative AI is a type of AI that can generate new content such as text, images, audio, video, or code based on the data it has learned from training.

In short: Generative AI creates new content instead of only analyzing data.

💡 Examples

ChatGPT: generates text conversations
DALL-E: generates images from text prompts

7 Agent AI

Agent AI refers to autonomous AI systems that can plan, make decisions, and perform tasks automatically using different tools and data sources.

🔑 Framework

LangChain: a popular framework for building AI agents

8 Hugging Face

Hugging Face is a company and open-source platform that provides tools, libraries, and models for AI and NLP (Natural Language Processing). It helps developers and researchers easily build, train, and use machine learning models, especially for language-related tasks.

🔑 What it offers

Pre-trained language models · Tokenizers · Datasets · Training pipelines · Model Hub (thousands of free models to download and use)

9 Embeddings & Vectors

What is an Embedding?

An embedding is a way of representing data (such as words, sentences, or images) as numerical vectors so that computers can process and compare them.

What is a Vector?

A vector is the numerical output produced by the embedding step. It is a list of numbers that captures the meaning or features of the original data. In practice, the word "embedding" is also used for this vector itself.

Text → Embedding → Vector
"Hello World" (raw text) → Embedding Model (Hugging Face / OpenAI) → [0.12, -0.45, 0.87…] (vector of numbers)

💡 Key Insight

Similar content produces similar vectors. This is how AI can capture semantic meaning rather than just matching keywords.

10 Tokenization

Tokenization is the process of breaking text into smaller units called tokens so a model can process them.

🔑 Token Types

Words: "Hello" is one token
Subwords: "playing" → "play" + "ing"
Characters: single letters
Punctuation: "." "," "!"

In short: Breaking text into smaller units (tokens).
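The token types above can be illustrated with a few lines of standard-library Python. This is a toy sketch, not how production tokenizers work: real subword tokenizers (such as BPE or WordPiece) learn their vocabulary from data, whereas the suffix list here is a hypothetical stand-in.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokens(word):
    """Character-level tokenization: one token per character."""
    return list(word)


def split_suffix(word, suffixes=("ing", "ed", "s")):
    """Toy subword split: peel off one known suffix if present (hypothetical rule)."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return [word[: -len(suffix)], suffix]
    return [word]


print(tokenize("Hello, world!"))   # ['Hello', ',', 'world', '!']
print(split_suffix("playing"))     # ['play', 'ing']
print(char_tokens("Hi"))           # ['H', 'i']
```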

11 Vectorization

Vectorization is the process of converting data (text, images, audio, etc.) into numerical vectors so that machines can process and understand it.

In short: Converting tokens to numerical vectors.

💡 Why is Vectorization Multidimensional?

Vectorization is multidimensional because many numeric dimensions are needed to represent complex patterns, meanings, and relationships in data. A single number cannot capture the full meaning of a word or sentence.

12 Transformers

Positional Encoding

Positional encoding is a technique used in transformer models to inject information about the order of tokens in a sequence. Since transformers process all tokens simultaneously, positional encoding tells the model where each token appears.
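The notes do not name a specific scheme, so the sketch below assumes the sinusoidal encoding, one widely used option: even dimensions use sin(pos / 10000^(2k/d)) and odd dimensions use cos of the same angle. The sizes passed in (4 positions, 6 dimensions) are arbitrary illustration values.

```python
import math


def positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position vectors."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = positional_encoding(seq_len=4, d_model=6)
# Position 0 gives sin(0) = 0 on even dimensions and cos(0) = 1 on odd ones.
print(pe[0])
# Every other position gets a different vector, so token order becomes visible.
print(pe[1] != pe[2])
```

In a transformer these vectors are typically added to the token embeddings before the first attention layer.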

Self-Attention in Transformers

Self-attention allows each word to look at the other words in the sentence and decide which ones are important, giving the model a richer understanding of context.

💡 Example: "The bank by the river"

Self-attention helps the model understand that "bank" means a riverbank, not a financial bank, by looking at the surrounding word "river".
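A minimal sketch of the idea, assuming scaled dot-product attention. To keep it short, the learned query/key/value projections of a real transformer are left out (each token vector is used as its own query, key, and value), and the 2-dimensional vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """x: list of token vectors. Returns (outputs, attention_weights)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Score this token against every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output = weighted average of all token vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, x)) for c in range(d)])
    return outputs, weights


tokens = ["bank", "river", "the"]
vectors = [[1.0, 0.9], [0.9, 1.0], [0.1, -0.2]]  # made-up 2-d vectors
outputs, weights = self_attention(vectors)

# "bank" gives more attention weight to "river" than to "the".
print(dict(zip(tokens, weights[0])))
```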

HNSW: Hierarchical Navigable Small World

HNSW is a graph-based ANN (Approximate Nearest Neighbor) algorithm used to perform fast vector similarity search. It is one of the most popular algorithms for ANN search.

How it works:

  1. Start from the top layer
  2. Move to nodes closer to the query vector
  3. Go down to lower layers
  4. Continue until no closer node can be found in the bottom layer; that node is the (approximate) nearest vector
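The four steps can be sketched as a greedy walk over a stack of graphs. This is a heavily simplified illustration: the layers below are built by hand (real HNSW assigns nodes to layers randomly and keeps a list of candidates rather than a single current node), and all names are hypothetical.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_search(graph, vectors, start, query):
    """Step 2: keep moving to the neighbor closest to the query until none is closer."""
    current = start
    while True:
        closest = min(graph[current], key=lambda n: distance(vectors[n], query),
                      default=current)
        if distance(vectors[closest], query) >= distance(vectors[current], query):
            return current
        current = closest


def layered_search(layers, vectors, entry, query):
    """layers[0] is the sparse top layer; layers[-1] contains every node."""
    current = entry
    for graph in layers:  # steps 1 and 3: start at the top, then go down
        current = greedy_search(graph, vectors, current, query)
    return current        # step 4: best node found in the bottom layer


# Eight points on a line, ids 0..7 (toy data).
vectors = {i: (float(i), 0.0) for i in range(8)}
layers = [
    {0: [4], 4: [0]},                                  # top: long-range links
    {0: [2], 2: [0, 4], 4: [2, 6], 6: [4]},            # middle
    {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)},  # bottom
]

print(layered_search(layers, vectors, entry=0, query=(5.2, 0.0)))  # 5
```

The upper layers let the search jump across the dataset in a few hops; only the last few steps happen in the dense bottom layer.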

Vector databases using HNSW

Weaviate · Qdrant · Milvus

ANN: Approximate Nearest Neighbor

ANN is a technique for quickly finding the vectors most similar to a query vector without checking every vector in the dataset. Vector search systems may hold millions or billions of embeddings, so computing similarity against every vector would be too slow. ANN finds very close neighbors quickly, but the result may be approximate rather than exact.

13 Database Types

Types of Databases
Relational (RDBMS): MySQL, PostgreSQL
NoSQL: MongoDB, Redis
Graph: Neo4j, ArangoDB
Vector: ChromaDB, Weaviate

RDBMS: Relational Databases

A Relational Database Management System stores data in tables with rows and columns. Tables are connected using keys (Primary Keys, Foreign Keys), and data is queried with SQL.

Examples: MySQL · PostgreSQL · Oracle · Microsoft SQL Server · SQLite · MariaDB · Amazon Aurora
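To make tables, keys, and SQL concrete, here is a small sketch using SQLite through Python's built-in `sqlite3` module. The table names, columns, and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each table has a primary key; orders.customer_id is a foreign key to customers.id.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Asha')")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "Laptop"), (1, "Mouse")],
)

# The JOIN follows the key relationship to connect rows from both tables.
rows = conn.execute("""
    SELECT customers.name, orders.item
    FROM orders JOIN customers ON customers.id = orders.customer_id
    ORDER BY orders.id
""").fetchall()
print(rows)  # [('Asha', 'Laptop'), ('Asha', 'Mouse')]
```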

NoSQL Databases

NoSQL means "Not only SQL". A NoSQL database stores unstructured or flexible data and is used for large-scale web applications.

NoSQL Sub-types

Key-Value: Redis, Amazon DynamoDB
Document: MongoDB, CouchDB
Column (wide-column): Apache Cassandra, HBase
Graph: Neo4j, OrientDB

Graph Databases

A Graph Database stores data as nodes and relationships (edges). It is used when the relationships between data items matter as much as the items themselves.

Examples: Neo4j · Amazon Neptune · ArangoDB · TigerGraph · OrientDB

14 Vector Databases

A Vector Database stores vectors (numerical embeddings) instead of traditional text data. These databases are mostly used in AI and ML applications.

Vector Database in RAG
Text Chunks → Embedding Model → Vector DB (ChromaDB): stores vectors + text + metadata

🔑 ChromaDB

ChromaDB is a popular open-source vector database. It stores each vector together with its original text and metadata, so it can act as a semantic search engine.

15 Vector Mathematics for AI Engineers

Cosine Similarity, Euclidean Distance, Dot Product, and L2 Normalization underpin most similarity operations in a vector database.

1. Cosine Similarity (used for embeddings)

This measures how similar two vectors are by the angle between them, ignoring their lengths.

Formula
cos(θ) = (A · B) / (‖A‖ × ‖B‖)

Range: -1 to 1
  1  → same direction (very similar)
  0  → unrelated (perpendicular)
 -1  → opposite

Used in

Semantic Search · Recommendation Systems · RAG Systems
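The formula translates directly into plain Python. The function name and the sample vectors are chosen for illustration; the sketch assumes neither vector is all zeros (otherwise the division fails).

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (|A| * |B|); assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [2, 0]))    # same direction  -> 1.0
print(cosine_similarity([1, 0], [0, 3]))    # perpendicular   -> 0.0
print(cosine_similarity([1, 0], [-1, 0]))   # opposite        -> -1.0
```

Note that `[1, 0]` and `[2, 0]` score 1.0 even though their lengths differ: only direction matters.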

2. Euclidean Distance

This measures the straight-line distance between two vectors in n-dimensional space. Lower means more similar. Unlike cosine similarity, it is sensitive to magnitude (vector length matters).

Formula
d = √( Σ(Aᵢ - Bᵢ)² )

Small distance  → Similar vectors
Large distance  → Different vectors

Used in AI

KNN · Clustering (K-means) · Anomaly Detection
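A short sketch of the distance formula, plus the simplest use of it: a 1-nearest-neighbor lookup that scans every stored point. The helper names and sample points are illustrative.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, points):
    """1-nearest-neighbor: the stored point with the smallest distance."""
    return min(points, key=lambda p: euclidean_distance(query, p))


print(euclidean_distance([0, 0], [3, 4]))            # 5.0 (the 3-4-5 triangle)
print(nearest([1, 1], [[0, 0], [5, 5], [2, 1]]))     # [2, 1]
```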

3. Dot Product

The dot product multiplies matching elements and adds the results. It is the cheapest of the three measures to compute, and it equals Cosine Similarity when both vectors are L2-normalized (scaled to length 1).

Formula
A · B = Σ(Aᵢ × Bᵢ)

Example:
a = [1, 2, 3]  b = [4, 5, 6]
1×4 + 2×5 + 3×6 = 32

Used in AI

Attention in Transformers · Embedding Similarity · Neural Network Calculations
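The sketch below reproduces the worked example and checks the claim about L2 normalization: after scaling both vectors to length 1, their dot product matches the cosine similarity of the originals. Function names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector so that its length (L2 norm) is 1."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


a = [1, 2, 3]
b = [4, 5, 6]
print(dot(a, b))  # 32

# Cosine similarity computed from the formula ...
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
# ... equals the plain dot product of the normalized vectors.
normalized_dot = dot(l2_normalize(a), l2_normalize(b))
print(abs(cosine - normalized_dot) < 1e-12)  # True
```

This is why many vector databases normalize embeddings once at insert time and then use the cheaper dot product for every query.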

16 What is RAG?

RAG = Retrieval-Augmented Generation

RAG is an AI technique in which a system first retrieves relevant information from external data sources and then uses a generative model to produce an answer grounded in that information.

💡 Simple Formula

RAG = Search + Generative AI

RAG improves Large Language Model (LLM) accuracy by fetching data from trusted external sources, such as company documents or databases, before generating a response.

Output: more accurate, context-aware answers

17 RAG Architecture

Pipeline 1: Ingestion (Offline / Batch)

Ingestion Pipeline
Data Source (PDFs, Websites, DBs) → Document Chunking (300–500 tokens) → Embedding Model → Vector Database

Document Chunking (300–500 tokens)

Large documents are split into smaller pieces called chunks. This is needed because embedding models have token limits, and because smaller chunks make retrieval more precise.
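A minimal chunking sketch. Two things here are assumptions rather than part of the notes: tokens are approximated by splitting on whitespace (a real pipeline would use the embedding model's tokenizer), and consecutive chunks share a small overlap so that sentences cut at a boundary still appear whole in one chunk.

```python
def chunk_tokens(tokens, chunk_size=400, overlap=50):
    """Split a token list into chunks of up to chunk_size tokens that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the document
    return chunks


document = "word " * 1000
tokens = document.split()      # whitespace split stands in for a real tokenizer
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # [400, 400, 300]
```

The default of 400 tokens sits inside the 300–500 range mentioned above; the overlap of 50 is an arbitrary illustration value.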

Embedding Model (OpenAI / Hugging Face)

Each chunk is converted into a vector (a list of numbers). These vectors capture semantic meaning: similar content produces similar vectors.

Pipeline 2: Query (Online / Real-time)

Query Pipeline
User Question → Embed Query → Similarity Search (Top-K) → Prompt Construction → LLM (GPT-4o) → Cited Answer

Step-by-step Explained

1. User Question: a question such as "What is our refund policy?" kicks off the real-time pipeline.
2. Embed Query: the question is converted to a vector using the same embedding model that was used during ingestion.
3. Similarity Search (Top-K): the query vector is compared against the vectors stored in ChromaDB, and the K most similar chunks are returned.
4. Prompt Construction: the retrieved chunks and the original question are combined into a prompt.
5. LLM (GPT-4o): the constructed prompt is sent to the LLM, which reads the context and generates an answer.
6. Cited Answer: the final output includes source citations, so the user knows which document the answer came from.
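Steps 2 to 4 can be sketched end to end with the standard library. Everything concrete here is a stand-in: `embed` is a toy word-count vector instead of a real embedding model, the in-memory `index` list plays the role of the vector database, and the document names and texts are made up. The LLM call in step 5 is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Output of the ingestion pipeline: (source, chunk text) pairs, invented for illustration.
store = [
    ("policy.pdf", "A refund is issued within 30 days of purchase."),
    ("shipping.pdf", "Orders ship within two business days."),
    ("faq.pdf", "Our refund policy covers unused items only."),
]
index = [(source, text, embed(text)) for source, text in store]


def retrieve(question, k=2):
    query_vector = embed(question)                       # step 2: embed the query
    ranked = sorted(index, key=lambda row: cosine(query_vector, row[2]), reverse=True)
    return ranked[:k]                                    # step 3: keep the top K


def build_prompt(question, hits):
    context = "\n".join(f"[{source}] {text}" for source, text, _ in hits)  # step 4
    return (
        "Answer using only the context below and cite the sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


question = "What is our refund policy?"
hits = retrieve(question)
prompt = build_prompt(question, hits)  # step 5 would send this prompt to the LLM
print(prompt)
```

Keeping the source name next to each chunk is what makes the cited answer in step 6 possible.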

18 RAG Challenges

⚠️ Shortfalls & Challenges of RAG

1. Chunking Problems
2. Retrieval Failures
3. Hierarchical Indexing
4. Evaluation & Monitoring
5. Contextual Metadata
6. Iterative / Multi-Query RAG
7. Continuous Re-indexing Pipeline
8. Security & Privacy

19 What is an AI Agent?

An AI Agent is a system that can perceive its environment, make decisions, and take actions automatically to achieve a goal.

Think of an AI agent like a smart assistant that:

  • Observes what's happening
  • Thinks about what to do
  • Acts to complete a task

Basic Components of an AI Agent
Perception (gets info: text, images, sensors) → Decision Making (uses logic / AI model) → Action (replies, moves, controls)

💡 Examples

ChatGPT · Voice assistants like Siri or Google Assistant · Self-driving systems in cars · Game AI

20 Types of AI Agents

Agent Types
Reactive Agents: just respond (no memory)
Goal-Based Agents: try to achieve a goal
Learning Agents: improve over time using experience

How an AI Agent Flows

  1. User asks question
  2. LLM reads system prompt
  3. LLM decides: Answer directly OR use tool
  4. Tool runs
  5. Result goes back to LLM
  6. Final answer generated

🔑 Core Rules for Building Agent AI

These rules apply to the functions an agent exposes as tools:

1. Clear Function Design
2. Use Type Hints
3. Write Clean Docstrings
4. Always Return Strings
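A tool written to these rules, together with the agent flow described above, might look like the sketch below. It uses no agent framework: `fake_llm_decide` is a hypothetical stand-in for the LLM's decision step, and the tool and its trigger phrase are invented for the example.

```python
def get_word_count(text: str) -> str:
    """Count the words in a piece of text.

    Args:
        text: The text to analyze.

    Returns:
        A sentence stating the word count.
    """
    return f"The text contains {len(text.split())} words."


# Registry of tools the agent is allowed to call.
TOOLS = {"get_word_count": get_word_count}


def fake_llm_decide(question: str) -> dict:
    """Stand-in for the LLM: decide whether to answer directly or use a tool."""
    if question.lower().startswith("count words:"):
        return {"tool": "get_word_count", "input": question.split(":", 1)[1].strip()}
    return {"tool": None, "answer": "I can answer that directly."}


def run_agent(question: str) -> str:
    decision = fake_llm_decide(question)                  # steps 2-3: read and decide
    if decision["tool"] is None:
        return decision["answer"]
    result = TOOLS[decision["tool"]](decision["input"])   # step 4: tool runs
    return f"Tool result: {result}"                       # steps 5-6: result feeds the answer


print(run_agent("Count words: the quick brown fox"))
print(run_agent("Hello"))
```

The type hints and docstring matter because agent frameworks commonly pass them to the LLM as the tool's description, and a string return value can be placed straight back into the conversation.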

21 Multi-Agent Systems (MAS)

Single Agent System

Only one AI agent is doing everything. Example: a coding assistant that reads your request, writes code, and fixes errors, with one brain doing all tasks.

Multi-Agent System (MAS)

Multiple AI agents working together, each with a specific role. Think of it like a team:

  • Agent 1 → Planner
  • Agent 2 → Researcher
  • Agent 3 → Executor
  • Agent 4 → Reviewer

All cooperate to finish a task.

Single vs Multi-Agent
Feature         Single Agent   Multi-Agent
No. of agents   1              Many
Complexity      Simple         More complex
Performance     Limited        More powerful
Example         One chatbot    Team of AI bots

💡 Why Multi-Agent Systems are Important

Handle complex tasks · Work faster (parallel work) · More accurate (agents check each other)

Building a Multi-Agent System

Example: Take a topic and create a short blog post.

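  1. Planner Agent → decides structure
  2. Writer Agent → writes content
  3. Reviewer Agent → improves it

Python: Setup with CrewAI
# Step 1: Choose a framework (popular choices: LangChain, AutoGPT, CrewAI)
# Step 2: Install requirements (shell): pip install crewai openai
# Assumes an LLM API key such as OPENAI_API_KEY is set in the environment.
from crewai import Agent, Crew, Task

# Step 3: Define agents (the roles, goals, and backstories are illustrative)
planner = Agent(role="Planner", goal="Outline a short blog post", backstory="An editor who structures articles.")
writer = Agent(role="Writer", goal="Write the post from the outline", backstory="A concise technical writer.")
reviewer = Agent(role="Reviewer", goal="Improve clarity and fix errors", backstory="A careful copy editor.")

# Step 4: Define tasks (tasks tell agents what to do)
plan = Task(description="Create an outline for a post about {topic}.", expected_output="A bullet-point outline.", agent=planner)
write = Task(description="Write the post using the outline.", expected_output="A short blog post.", agent=writer)
review = Task(description="Review and improve the post.", expected_output="The final blog post.", agent=reviewer)

# Step 5: Create the crew (connects agents; tasks run in the listed order)
crew = Crew(agents=[planner, writer, reviewer], tasks=[plan, write, review])

# Step 6: Run the flow: Planner creates the outline, Writer uses it, Reviewer improves the final output
print(crew.kickoff(inputs={"topic": "vector databases"}))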

22 Orchestration System

An Orchestration System manages and coordinates multiple agents, tools, and steps to complete a task.

In AI systems, an orchestration system controls:

  1. Which agent runs next
  2. How data flows (state)
  3. When to loop
  4. When to stop
  5. When to ask a human

Orchestration Flow
User → Planner → Writer → Reviewer → Final Output
(each step is controlled by the Orchestrator)

🔑 What Orchestration Manages

1. Task Flow
2. Agent Communication
3. Decision Making
4. Tool Usage
5. Error Handling
6. Human-in-the-loop

💡 In short

Orchestration = Controlling and coordinating all parts of an AI system so they work together smoothly.
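The control decisions listed in this section (which agent runs next, how state flows, when to loop, when to stop, when to ask a human) can be sketched without any framework. The agents below are plain functions with hypothetical logic; in particular, the reviewer's "approve after two revisions" rule is invented so the loop has something to decide on.

```python
def planner(state):
    state["outline"] = ["Intro", "Main point", "Conclusion"]


def writer(state):
    state["draft"] = " ".join(f"{part} about {state['topic']}." for part in state["outline"])
    state["revisions"] += 1


def reviewer(state):
    # Hypothetical rule: approve once the draft has been revised twice.
    state["approved"] = state["revisions"] >= 2


def orchestrate(topic, max_loops=3):
    # Shared state is how data flows between agents.
    state = {"topic": topic, "revisions": 0, "approved": False,
             "needs_human": False, "log": []}

    def run(agent):
        state["log"].append(agent.__name__)  # record which agent ran
        agent(state)

    run(planner)
    for _ in range(max_loops):         # when to loop
        run(writer)
        run(reviewer)
        if state["approved"]:          # when to stop
            break
    else:
        state["needs_human"] = True    # loop budget used up: ask a human
    return state


result = orchestrate("RAG")
print(result["log"])     # ['planner', 'writer', 'reviewer', 'writer', 'reviewer']
print(result["draft"])
```

Capping the loop with `max_loops` is a simple form of error handling: the orchestrator never runs forever and escalates to a human instead.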

23 GCP & Cloud Computing

What is Cloud Computing?

Cloud computing means using remote servers on the internet instead of local computers. Instead of saving data on your laptop, you store it in the cloud.

Examples

Google Cloud · AWS (Amazon Web Services) · Microsoft Azure

What is GCP?

GCP = Google Cloud Platform. GCP is a cloud computing platform provided by Google that offers services like computing power, storage, databases, machine learning, and networking over the internet.

GCP allows developers and companies to run applications, store data and build software on Google's servers instead of their own computers.

24 FastAPI for AI

FastAPI is a modern Python web framework used to build high-performance APIs. In AI projects it is a common choice for exposing ML models as REST APIs.

Example: Sentiment Detection API

Technology Stack:
Python: programming language
FastAPI: for API creation
Uvicorn: to run the API server
Postman: for testing API requests
Pytest: for unit testing

How the API works:
• The user sends a sentence
• The server analyzes it
• The API returns the sentiment result

FastAPI setup (Windows shell commands)
# Step 1: Create project folder
mkdir project && cd project

# Step 2: Create and activate a virtual environment
python -m venv venv
venv\Scripts\activate        # macOS/Linux: source venv/bin/activate

# Step 3: Install required libraries
pip install fastapi uvicorn pytest

# Step 4: Create API file (underscores keep the module importable from tests)
notepad sentiment_api.py

# Step 5: Run the API
uvicorn sentiment_api:app --reload
# Visit: http://127.0.0.1:8000/docs

# Step 6: Test with Postman
# Step 7: Create unit test file (pytest discovers files named test_*.py)
notepad test_api.py
pytest
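The notes do not show the analysis step itself, so here is one possible sketch of it: a small rule-based scorer that the API file could call from its endpoint and return as JSON. The word lists, function name, and response fields are all hypothetical, and a real project would more likely load a trained model.

```python
import re

# Tiny illustrative word lists; not a real sentiment lexicon.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}


def analyze_sentiment(sentence: str) -> dict:
    """Score a sentence by counting positive and negative words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return {"sentence": sentence, "sentiment": label, "score": score}


print(analyze_sentiment("I love this great product"))
print(analyze_sentiment("This is terrible"))
```

Keeping the analysis in a plain function like this also makes it easy to unit test with pytest, independent of the web framework.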

25 Streamlit for UI

Streamlit is a Python framework used to quickly build web-based user interfaces (UI) for data science, machine learning, and AI applications.

💡 Why use Streamlit?

No HTML/CSS needed · Write pure Python · Instantly see the UI in your browser · Perfect for demos and prototypes of AI models

Python: Streamlit example
# Install (shell): pip install streamlit

# app.py
import streamlit as st

st.title("My AI App")
user_input = st.text_input("Ask me anything:")
if user_input:
    st.write("You asked:", user_input)

# Run (shell): streamlit run app.py