Artificial Intelligence Tutorial
From Data Science fundamentals to RAG, Agent AI, and beyond, based on structured classroom notes
1 Data Science
Data Science is the field of collecting, cleaning, analyzing, and interpreting data to find useful information and make decisions.
Common Tools in Data Science
Python: programming language
Pandas: data manipulation
NumPy: numerical computing
2 What is Artificial Intelligence?
Artificial Intelligence (AI) is a field of computer science that focuses on creating machines that can perform tasks that normally require human intelligence.
Example
ChatGPT: understands language and generates human-like responses using AI.
Python vs Java for AI
| Feature | Python | Java |
|---|---|---|
| Type System | Dynamic | Static |
| Code Length | Fewer lines | More lines |
| Learning Curve | Gentler | Steeper |
| Primary Use | AI, Data Science, ML | Enterprise, Android |
3 Machine Learning (ML)
Machine Learning is a subset of AI where computers learn patterns from data and improve their performance without being explicitly programmed.
Example Library
Scikit-learn: a widely used Python library for classical machine learning algorithms.
4 Types of Machine Learning
Supervised Machine Learning
The model is trained on labeled data, meaning each training example comes with its correct output. The model learns to map inputs to those known answers so it can predict the output for new inputs.
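As a rough illustration of learning from labeled data, here is a minimal sketch of a 1-nearest-neighbor classifier. The data points, labels, and the function name predict_1nn are hypothetical and chosen only for this example; a library such as Scikit-learn would normally be used instead.

```python
import math

def predict_1nn(train, query):
    """Return the label of the labeled training point closest to query."""
    _, nearest_label = min(train, key=lambda pair: math.dist(pair[0], query))
    return nearest_label

# Hypothetical labeled data: (height_cm, weight_kg) -> label
train = [
    ((150, 50), "small"),
    ((155, 55), "small"),
    ((185, 90), "large"),
    ((190, 95), "large"),
]

prediction = predict_1nn(train, (152, 53))
print(prediction)  # small
```

The "training" here is just storing the labeled examples; the prediction for a new input copies the answer of the most similar known example.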
Unsupervised Machine Learning
The model works with unlabeled data and tries to find patterns or structures in it. No labels are provided; the model discovers hidden patterns, such as clusters, on its own.
Reinforcement Learning
Learning through rewards and penalties. An agent interacts with an environment, performs actions, receives rewards or penalties, and adjusts its decisions to maximize cumulative reward.
Ensemble Learning
A technique that combines the predictions of multiple machine learning models to produce a result that is often more accurate than any single model alone.
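One simple way to combine models is majority voting. The sketch below assumes three hypothetical classifiers have already produced their predictions; the function name majority_vote is illustrative only.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of models."""
    return Counter(predictions).most_common(1)[0][0]

# Predictions from three hypothetical models for the same email
votes = ["spam", "not spam", "spam"]
print(majority_vote(votes))  # spam
```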
Neural Network
A computational model inspired by the human brain, consisting of interconnected nodes (neurons) that process information. Neural networks are the building blocks of deep learning.
Semi-Supervised ML
A type of ML where the model is trained using both labeled and unlabeled data.
5 Deep Learning
Deep Learning is a subset of Machine Learning that uses artificial neural networks with many layers to learn complex patterns from large amounts of data.
Deep Learning Frameworks
TensorFlow: Google's open-source ML framework
PyTorch: open-source deep learning framework originally developed at Facebook (now Meta), popular in research
6 Generative AI (Gen AI)
Generative AI is a type of AI that can generate new content such as text, images, audio, video, or code based on the data it has learned from training.
In short: Generative AI creates new content instead of only analyzing data.
Examples
ChatGPT: generates text conversations
DALL-E: generates images from text prompts
7 Agent AI
Agent AI refers to autonomous AI systems that can plan, make decisions, and perform tasks automatically using different tools and data sources.
Framework
LangChain: a popular framework for building AI agents
8 Hugging Face
Hugging Face is a company and open-source platform that provides tools, libraries, and models for AI and NLP (Natural Language Processing). It helps developers and researchers easily build, train and use machine learning models, especially for language-related tasks.
What it offers
Pre-trained language models · Tokenizers · Datasets · Training pipelines · Model Hub (thousands of free models to download and use)
9 Embeddings & Vectors
What is an Embedding?
An embedding is a technique or method used to represent data. It is a way of converting data (such as words, sentences, or images) into numerical vectors so that computers can understand and process them.
What is a Vector?
A vector is the numerical output produced by the embedding. It is a list of numbers that captures the meaning or features of the original data.
Key Insight
Similar content produces similar vectors. This is how AI can capture semantic meaning rather than just matching keywords.
10 Tokenization
Tokenization is the process of breaking text into smaller units called tokens so a model can process them.
Token Types
Words: "Hello" is one token
Subwords: "playing" → "play" + "ing"
Characters: single letters
Punctuation: "." "," "!"
In short: Breaking text into smaller units (tokens).
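A minimal sketch of word and punctuation tokenization using a regular expression. This is a simplification: production language models typically use learned subword tokenizers, which this example does not attempt to reproduce.

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("Hello, world!")
print(tokens)  # ['Hello', ',', 'world', '!']
```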
11 Vectorization
Vectorization is the process of converting data (text, images, audio, etc.) into numerical vectors so that machines can process and understand it.
In short: Converting tokens to numerical vectors.
Why is Vectorization Multidimensional?
Vectorization is multidimensional because multiple numeric dimensions are needed to represent complex patterns, meanings, and relationships in data. A single number cannot capture the full meaning of a word or sentence.
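To make the idea of multiple dimensions concrete, here is a sketch of one of the simplest vectorization schemes, a bag-of-words count vector, where each dimension corresponds to one vocabulary word. The vocabulary and the function name vectorize are hypothetical; learned embeddings use dense dimensions that do not map to single words.

```python
def vectorize(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    return [tokens.count(word) for word in vocabulary]

vocabulary = ["cat", "dog", "sat", "mat"]
vector = vectorize(["cat", "sat", "mat", "cat"], vocabulary)
print(vector)  # [2, 0, 1, 1]
```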
12 Transformers
Positional Encoding
Positional encoding is a technique used in transformer models to inject information about the order of tokens in a sequence. Since transformers process all tokens simultaneously, positional encoding tells the model where each token appears.
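The notes do not say which encoding scheme is meant. As one common example, the sketch below implements the sinusoidal scheme from the original Transformer paper, assuming a model dimension d_model chosen only for illustration.

```python
import math

def positional_encoding(position, d_model):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd ones."""
    encoding = []
    for i in range(d_model):
        # Each sine/cosine pair (2k, 2k+1) shares the same frequency.
        angle = position / (10000 ** ((i - i % 2) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```

Each position gets a distinct vector, which is added to the token's embedding so the model can tell the first token from the tenth.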
Self-Attention in Transformers
Self-attention lets each token look at every other token in the sequence and weigh how relevant each one is, giving the model a richer understanding of context.
Example: "The bank by the river"
Self-attention helps the model understand that "bank" means a riverbank, not a financial institution, by attending to the surrounding word "river".
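A simplified sketch of scaled dot-product self-attention in plain Python. Real transformers first apply learned query, key, and value projections; here the input vectors are used directly for all three, which is an assumption made to keep the example short.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Return, for each token, a weighted average of all token vectors."""
    d = len(vectors[0])
    outputs = []
    for query in vectors:
        # Similarity of this token to every token, scaled by sqrt(d)
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in vectors]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, vectors))
                        for j in range(d)])
    return outputs

# Two hypothetical 2-dimensional token vectors
attended = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Each output vector mixes information from every token, with more weight on the tokens most similar to the one being processed.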
HNSW: Hierarchical Navigable Small World
HNSW is a graph-based ANN (Approximate Nearest Neighbor) algorithm used to perform fast vector similarity search. It is one of the most popular algorithms for ANN search.
How it works:
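- Start from an entry point in the top (sparsest) layer
- Move greedily to neighboring nodes that are closer to the query vector
- When no neighbor is closer, go down to the next layer and repeat
- Continue until the bottom layer yields the approximate nearest vector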
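The search steps above can be sketched as a greedy walk over layered neighbor graphs. This is a heavily simplified illustration, not the full HNSW algorithm: real implementations track a list of candidates rather than a single node, and they build the graph automatically. The graph, vectors, and function names below are hypothetical.

```python
import math

def greedy_search(vectors, neighbors, start, query):
    """Walk one layer, always moving to the neighbor closest to the query."""
    current = start
    while True:
        best = min(neighbors.get(current, []),
                   key=lambda n: math.dist(vectors[n], query),
                   default=current)
        if math.dist(vectors[best], query) >= math.dist(vectors[current], query):
            return current  # no neighbor is closer
        current = best

def hnsw_search(vectors, layers, entry_point, query):
    """Descend from the top layer to the bottom, refining the start node."""
    current = entry_point
    for neighbors in layers:  # layers[0] is the sparse top layer
        current = greedy_search(vectors, neighbors, current, query)
    return current

vectors = {i: (float(i), 0.0) for i in range(6)}
top = {0: [3], 3: [0, 5], 5: [3]}
bottom = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

nearest = hnsw_search(vectors, [top, bottom], entry_point=0, query=(3.8, 0.0))
print(nearest)  # 4
```

The top layer lets the search jump from node 0 to node 3 in one hop; the bottom layer then refines the result to node 4.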
Vector databases using HNSW
Weaviate · Qdrant · Milvus
ANN: Approximate Nearest Neighbor
ANN is a technique for quickly finding the vectors most similar to a query vector without checking every vector in the dataset. Vector search systems may hold millions or billions of embeddings, so computing similarity against every vector would be too slow. ANN finds very close neighbors quickly, but the result may be approximate rather than exact.
13 Database Types
RDBMS โ Relational Databases
A Relational Database Management System (RDBMS) stores data in tables with rows and columns. Tables are connected using keys (Primary Keys, Foreign Keys), and queries are written in SQL.
Examples: MySQL · PostgreSQL · Oracle · Microsoft SQL Server · SQLite · MariaDB · Amazon Aurora
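A small sketch of tables linked by keys, using SQLite through Python's built-in sqlite3 module. The table names, columns, and rows are hypothetical and exist only to show a primary key, a foreign key, and a SQL join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "item TEXT)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (10, 1, 'Laptop')")

# Join the two tables through the foreign key
rows = conn.execute(
    "SELECT customers.name, orders.item "
    "FROM orders JOIN customers ON orders.customer_id = customers.id"
).fetchall()
print(rows)  # [('Asha', 'Laptop')]
```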
NoSQL Databases
NoSQL means "Not only SQL". A NoSQL database stores unstructured or flexible data and is used for large-scale web applications.
NoSQL Sub-types
Key-Value: Redis, Amazon DynamoDB
Document: MongoDB, CouchDB
Column: Apache Cassandra, HBase
Graph: Neo4j, OrientDB
Graph Databases
A Graph Database stores data as nodes and relationships (edges). Used when relationships between data are very important.
Examples: Neo4j · Amazon Neptune · ArangoDB · TigerGraph · OrientDB
14 Vector Databases
A Vector Database stores vectors (numerical embeddings) instead of traditional text data. These databases are mostly used in AI and ML applications.
ChromaDB
ChromaDB is a popular open-source vector database. It stores each vector alongside its original text, so it can act as a semantic search engine.
15 Vector Mathematics for AI Engineers
Cosine Similarity, Euclidean Distance, Dot Product, and L2 Normalization underpin most vector database operations.
1. Cosine Similarity (used for embeddings)
This measures how similar two vectors are by measuring the angle between them.
cos(θ) = (A · B) / (‖A‖ × ‖B‖)
Range: -1 to 1
1 → very similar
0 → unrelated
-1 → opposite
Used in
Semantic Search · Recommendation Systems · RAG Systems
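The cosine similarity formula can be written directly in plain Python. This is a minimal sketch with made-up two-dimensional vectors; real embeddings usually have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [0, 1]))   # 0.0 (unrelated)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite)
```

Two vectors pointing in the same direction, such as [1, 2] and [2, 4], score approximately 1 regardless of their lengths.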
2. Euclidean Distance
This measures the straight-line distance between two vectors in n-dimensional space. Lower = more similar. Uses magnitude (size matters).
d = √( Σ (Aᵢ - Bᵢ)² )
Small distance → Similar vectors
Large distance → Different vectors
Used in AI
KNN · Clustering (K-means) · Anomaly Detection
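A minimal sketch of the Euclidean distance formula, using the classic 3-4-5 right triangle as a check.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```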
3. Dot Product
The dot product multiplies matching elements and adds the results. It is the cheapest of the three measures to compute and equals Cosine Similarity when both vectors are L2-normalized.
A · B = Σ(Aᵢ × Bᵢ)
Example:
a = [1, 2, 3], b = [4, 5, 6]
A · B = 1×4 + 2×5 + 3×6 = 32
Used in AI
Attention in Transformers · Embedding Similarity · Neural Network Calculations
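The following sketch reproduces the worked example and checks the claim that the dot product of L2-normalized vectors equals their cosine similarity. The helper names dot and l2_normalize are illustrative.

```python
import math

def dot(a, b):
    """Multiply matching elements and add the results."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale a vector so that its length becomes 1."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

a, b = [1, 2, 3], [4, 5, 6]
print(dot(a, b))  # 32

cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
normalized_dot = dot(l2_normalize(a), l2_normalize(b))
print(math.isclose(cosine, normalized_dot))  # True
```

This is why many vector databases normalize embeddings once at insert time and then use the cheaper dot product at query time.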
16 What is RAG?
RAG = Retrieval-Augmented Generation
RAG is an AI technique where a system retrieves relevant information from external data sources and then uses a generative model to produce an accurate answer.
Simple Formula
RAG = Search + Generative AI
RAG is an AI framework that improves Large Language Model (LLM) accuracy by fetching data from external trusted sources, such as company documents or databases, before generating a response.
Output: Accurate & context-aware answers
17 RAG Architecture
Pipeline 1: Ingestion (Offline / Batch)
Document Chunking (300 to 500 tokens)
Large documents are split into smaller pieces called chunks. Embedding models have token limits, and smaller chunks tend to give more precise retrieval.
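A minimal chunking sketch. It assumes the document has already been tokenized into a list, and it adds a small overlap between neighboring chunks so that sentences cut at a boundary still appear whole in one chunk. The overlap, the default sizes, and the function name chunk_tokens are assumptions, not part of the notes.

```python
def chunk_tokens(tokens, chunk_size=400, overlap=50):
    """Split a token list into chunks of at most chunk_size tokens."""
    if not tokens:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last chunk is never pure overlap.
    last_start = max(len(tokens) - overlap, 1)
    return [tokens[i:i + chunk_size] for i in range(0, last_start, step)]

tokens = list(range(1000))  # stand-in for 1000 real tokens
chunks = chunk_tokens(tokens)
print(len(chunks))  # 3
```

With these defaults the three chunks cover tokens 0-399, 350-749, and 700-999.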
Embedding Model (OpenAI / Hugging Face)
Each chunk is converted into a vector (a list of numbers). These vectors capture semantic meaning, so similar content produces similar vectors.
Pipeline 2: Query (Online / Real-time)
Step-by-step Explained
1. User Question: e.g. "What is our refund policy?" kicks off the real-time pipeline.
2. Embed Query: the question is converted to a vector using the same embedding model used at ingestion.
3. Similarity Search (Top-K): the query vector is compared against the stored vectors in ChromaDB, and the K most similar chunks are returned.
4. Prompt Construction: the retrieved chunks and the original question are combined into a prompt.
5. LLM (GPT-4o): the constructed prompt is sent to the LLM, which reads the context and generates an answer.
6. Cited Answer: the final output includes source citations, so the user knows which document the answer came from.
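Steps 2 to 4 can be sketched without any external service. In this toy version, word counts stand in for a real embedding model and a Python list stands in for ChromaDB; the chunk texts, source names, and function names are all hypothetical. Step 5 (calling the LLM) is left out.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c["text"])),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, retrieved):
    """Combine retrieved chunks (with their sources) and the question."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

chunks = [
    {"source": "policy.txt", "text": "Our refund policy allows refunds within 30 days"},
    {"source": "hours.txt", "text": "The store opens at nine every morning"},
]
question = "What is our refund policy?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Keeping the source name next to each chunk in the prompt is what makes a cited answer possible in step 6.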
18 RAG Challenges
Shortfalls & Challenges of RAG
1. Chunking Problems
2. Retrieval Failures
3. Hierarchical Indexing
4. Evaluation & Monitoring
5. Contextual Metadata
6. Iterative / Multi-Query RAG
7. Continuous Re-indexing Pipeline
8. Security & Privacy
19 What is an AI Agent?
An AI Agent is a system that can perceive its environment, make decisions, and take actions automatically to achieve a goal.
Think of an AI agent like a smart assistant that:
- Observes what's happening
- Thinks about what to do
- Acts to complete a task
Examples
ChatGPT · Voice assistants like Siri or Google Assistant · Self-driving systems in cars · Game AI
20 Types of AI Agents
How an AI Agent Flows
- User asks question
- LLM reads system prompt
- LLM decides: Answer directly OR use tool
- Tool runs
- Result goes back to LLM
- Final answer generated
Core Rules for Building Agent AI
1. Clear Function Design
2. Use Type Hints
3. Write Clean Docstrings
4. Always Return Strings
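The four rules and the agent flow can be illustrated with a tiny tool-calling loop. The tool, its data, and the fake_llm_decide function are hypothetical stand-ins; in a real agent the decision would come from an LLM, and the tool result would be passed back to it to phrase the final answer.

```python
def get_order_status(order_id: str) -> str:
    """Return the shipping status for the given order ID."""
    statuses = {"A100": "shipped", "A200": "processing"}  # hypothetical data
    return f"Order {order_id} is {statuses.get(order_id, 'not found')}."

TOOLS = {"get_order_status": get_order_status}

def fake_llm_decide(question: str) -> dict:
    """Stand-in for the LLM's choice: use a tool or answer directly."""
    for word in question.replace("?", "").split():
        if word.startswith("A") and word[1:].isdigit():
            return {"tool": "get_order_status", "argument": word}
    return {"answer": "I can only help with order status questions."}

def run_agent(question: str) -> str:
    """Follow the flow: decide, optionally run a tool, return an answer."""
    decision = fake_llm_decide(question)
    if "tool" in decision:
        result = TOOLS[decision["tool"]](decision["argument"])
        return f"Tool result: {result}"
    return decision["answer"]

print(run_agent("Where is order A100?"))  # Tool result: Order A100 is shipped.
```

The tool has a single clear purpose, type hints, a docstring the LLM could read, and a string return value, matching rules 1 to 4.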
21 Multi-Agent Systems (MAS)
Single Agent System
Only one AI agent is doing everything. Example: a coding assistant that reads your request, writes code, and fixes errors, with one brain doing all tasks.
Multi-Agent System (MAS)
Multiple AI agents working together, each with a specific role. Think of it like a team:
- Agent 1 → Planner
- Agent 2 → Researcher
- Agent 3 → Executor
- Agent 4 → Reviewer
All cooperate to finish a task.
| Feature | Single Agent | Multi-Agent |
|---|---|---|
| No. of agents | 1 | Many |
| Complexity | Simple | More Complex |
| Performance | Limited | More Powerful |
| Example | One chatbot | Team of AI bots |
Why Multi-Agent Systems are Important
Handle complex tasks · Can work faster (parallel work) · Can be more accurate (agents check each other)
Building a Multi-Agent System
Example: Take a topic and create a short blog post.
- Planner Agent: decides structure
- Writer Agent: writes content
- Reviewer Agent: improves it
# Step 1: Choose Tools
# Popular choices: LangChain · AutoGPT · CrewAI
# Step 2: Install requirements
pip install crewai openai
# Step 3: Define Agents
# Step 4: Define Tasks (tasks tell agents what to do)
# Step 5: Create the Crew (connect agents)
# Step 6: How it works (flow)
# 1. Planner: creates outline
# 2. Writer: uses outline
# 3. Reviewer: improves final output
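The flow above can be tried without any framework or API key. The sketch below replaces each agent with a plain Python function, so it shows only the hand-off between roles, not how CrewAI or an LLM would implement them. All function names and placeholder texts are hypothetical.

```python
def planner_agent(topic: str) -> list[str]:
    """Decide the structure of the post."""
    return [f"Introduction to {topic}", f"Why {topic} matters", "Conclusion"]

def writer_agent(outline: list[str]) -> str:
    """Write one placeholder sentence per outline heading."""
    return " ".join(f"{heading}: draft text here." for heading in outline)

def reviewer_agent(draft: str) -> str:
    """Improve the draft; here it only swaps a placeholder phrase."""
    return draft.replace("draft text here", "reviewed text here")

def run_crew(topic: str) -> str:
    outline = planner_agent(topic)  # 1. Planner creates outline
    draft = writer_agent(outline)   # 2. Writer uses outline
    return reviewer_agent(draft)    # 3. Reviewer improves final output

post = run_crew("vector databases")
print(post)
```

In a real system each function body would be a call to an LLM with a role-specific prompt, while the hand-off order would stay the same.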
22 Orchestration System
An Orchestration System manages and coordinates multiple agents, tools, and steps to complete a task.
In AI systems, an orchestration system controls:
- Which agent runs next
- How data flows (state)
- When to loop
- When to stop
- When to ask a human
All of these are controlled by the Orchestrator.
What Orchestration Manages
1. Task Flow
2. Agent Communication
3. Decision Making
4. Tool Usage
5. Error Handling
6. Human-in-the-loop
In short
Orchestration = Controlling and coordinating all parts of an AI system to work together smoothly.
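A minimal sketch of an orchestrator loop that tracks shared state, decides when to loop, when to stop, and when to hand over to a human. The approval rule (the reviewer accepts the second revision) is an arbitrary assumption used only to make the loop observable.

```python
def orchestrate(task: str, max_rounds: int = 3) -> dict:
    """Run write and review steps until approved or the round limit is hit."""
    state = {"task": task, "draft": "", "approved": False, "rounds": 0}
    while not state["approved"] and state["rounds"] < max_rounds:
        state["rounds"] += 1
        # Writer step: produce a new revision
        state["draft"] = f"{task} (revision {state['rounds']})"
        # Reviewer step: hypothetical rule, approve from the second revision
        state["approved"] = state["rounds"] >= 2
    # Human-in-the-loop: escalate if the agents never reached approval
    state["needs_human"] = not state["approved"]
    return state

result = orchestrate("Write summary")
print(result["rounds"], result["approved"], result["needs_human"])  # 2 True False
```

Calling orchestrate("Write summary", max_rounds=1) stops after one round without approval and sets needs_human to True.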
23 GCP & Cloud Computing
What is Cloud Computing?
Cloud computing means using remote servers on the internet instead of local computers. Instead of saving data on your laptop, you store it in the cloud.
Examples
Google Cloud · AWS (Amazon Web Services) · Microsoft Azure
What is GCP?
GCP = Google Cloud Platform. GCP is a cloud computing platform provided by Google that offers services like computing power, storage, databases, machine learning, and networking over the internet.
GCP allows developers and companies to run applications, store data and build software on Google's servers instead of their own computers.
24 FastAPI for AI
FastAPI is a modern Python web framework used to build high-performance APIs. In AI projects it is a common choice for exposing ML models as REST APIs.
Example: Sentiment Detection API
Technology Stack:
Python: programming language
FastAPI: for API creation
Uvicorn: to run the API server
Postman: for testing API requests
Pytest: for unit testing
The API works as follows:
- The user sends a sentence
- The server analyzes it
- The API returns the sentiment result
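The notes do not say how the server analyzes a sentence. As a placeholder for a real model, the sketch below scores sentiment by counting keywords; the word lists and the function name analyze_sentiment are assumptions. A FastAPI route could call a function like this and return its result as JSON.

```python
POSITIVE = {"good", "great", "love", "excellent", "happy"}  # hypothetical lexicon
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def analyze_sentiment(sentence: str) -> str:
    """Label a sentence by counting positive and negative keywords."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(analyze_sentiment("I love this great product!"))  # positive
```

Keeping the analysis in a plain function like this also makes it easy to unit test with Pytest, independently of the web server.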
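# Step 1: Create project folder
mkdir project && cd project
# Step 2: Create and activate a virtual environment
# (Windows shown; on macOS/Linux use: source venv/bin/activate)
python -m venv venv
venv\Scripts\activate
# Step 3: Install required libraries (httpx is needed by FastAPI's TestClient)
pip install fastapi uvicorn pytest httpx
# Step 4: Create API file (underscores keep the module importable from tests)
notepad sentiment_api.py
# Step 5: Run the API
uvicorn sentiment_api:app --reload
# Visit: http://127.0.0.1:8000/docs
# Step 6: Test with Postman
# Step 7: Create unit test file (pytest only collects test_*.py or *_test.py)
notepad test_api.py
pytest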
25 Streamlit for UI
Streamlit is a Python framework used to quickly build web-based user interfaces (UI) for data science, machine learning, and AI applications.
Why use Streamlit?
No HTML/CSS needed · Write pure Python · Instantly see the UI in your browser · Perfect for demos and prototypes of AI models
pip install streamlit
# app.py
import streamlit as st
st.title("My AI App")
user_input = st.text_input("Ask me anything:")
if user_input:
    st.write("You asked:", user_input)
# Run
streamlit run app.py