AI Tutorial - Georgin Tutorials

1 Data Science

Data Science is the field of collecting, cleaning, analyzing, and interpreting data to find useful information and make decisions.

Data Science Umbrella

Artificial Intelligence

→

Machine Learning

→

Deep Learning

→

Generative AI

→

Agent AI

🔑 Common Tools in Data Science

Python — programming language
Pandas — data manipulation
NumPy — numerical computing

2 What is Artificial Intelligence?

Artificial Intelligence (AI) is a field of computer science that focuses on creating machines that can perform tasks that normally require human intelligence.

💡 Example

ChatGPT — understands language and generates human-like responses using AI.

Python vs Java for AI

Comparison

Feature	Python	Java
Type System	Dynamic	Static
Code Length	Less lines	More lines
Learning	Easier	Complicated
Primary Use	AI, Data Science, ML	Enterprise, Android

3 Machine Learning (ML)

Machine Learning is a subset of AI where computers learn patterns from data and improve their performance without being explicitly programmed.

🔑 Example Library

Scikit-learn — the most popular Python library for classical machine learning algorithms.

4 Types of Machine Learning

Supervised Machine Learning

The model is trained using labeled data, meaning the correct output is already known. The algorithm is trained on labelled data where the answer is provided.

Unsupervised Machine Learning

The model works with unlabeled data and tries to find patterns or structures in the data. No need to provide trained/labeled data — the model discovers hidden patterns on its own.

Reinforcement Learning

Learning through rewards and penalties. An agent learns by interacting with an environment and receiving rewards or penalties. The agent learns to make decisions by performing actions in an environment to maximize cumulative rewards.

Ensemble Learning

A technique that combines multiple machine learning models to produce a better and more accurate prediction. Combining multiple models for better accuracy.

Neural Network

A computational model inspired by the human brain, consisting of interconnected nodes that process information. It is a brain-inspired model used in deep learning.

Semi-Supervised ML

A type of ML where the model is trained using both labeled and unlabeled data.

ML Types Summary

Supervised

Labeled data

Unsupervised

Unlabeled data

Reinforcement

Rewards & Penalties

Ensemble

Multiple models

Semi-Supervised

Both types

5 Deep Learning

Deep Learning is a subset of Machine Learning that uses artificial neural networks with many layers to learn complex patterns from large amounts of data.

🔑 Deep Learning Frameworks

TensorFlow — Google's open-source ML framework
PyTorch — Facebook's deep learning framework, popular in research

6 Generative AI (Gen AI)

Generative AI is a type of AI that can generate new content such as text, images, audio, video, or code based on the data it has learned from training.

In short: Generative AI creates new content instead of only analyzing data.

💡 Examples

ChatGPT — generates text conversations
DALL-E — generates images from text prompts

7 Agent AI

Agent AI refers to autonomous AI systems that can plan, make decisions, and perform tasks automatically using different tools and data sources.

🔑 Framework

LangChain — the most popular framework for building AI Agents

8 Hugging Face

Hugging Face is a company and open-source platform that provides tools, libraries, and models for AI and NLP (Natural Language Processing). It helps developers and researchers easily build, train and use machine learning models, especially for language-related tasks.

🔑 What it offers

Pre-trained language models · Tokenizers · Datasets · Training pipelines · Model Hub (thousands of free models to download and use)

9 Embeddings & Vectors

What is an Embedding?

An embedding is a technique or method used to represent data. It is a way of converting data (such as words, sentences, or images) into numerical vectors so that computers can understand and process them.

What is a Vector?

A vector is the numerical output produced by the embedding. It is a list of numbers that captures the meaning or features of the original data.

Text → Embedding → Vector

"Hello World"

Raw Text

→

Embedding Model

Hugging Face / OpenAI

→

[0.12, -0.45, 0.87…]

Vector (numbers)

💡 Key Insight

Similar content produces similar vectors. This is how AI can understand semantic meaning — not just keyword matching.

10 Tokenization

Tokenization is the process of breaking text into smaller units called tokens so a model can process them.

🔑 Token Types

Words — "Hello" is one token
Subwords — "playing" → "play" + "ing"
Characters — single letters
Punctuation — "." "," "!"

In short: Breaking text into smaller units (tokens).

11 Vectorization

Vectorization is the process of converting data (text, images, audio, etc.) into numerical vectors so that machines can process and understand it.

In short: Converting tokens to numerical vectors.

💡 Why is Vectorization Multidimensional?

Vectorization is multidimensional because multiple numeric dimensions are needed to represent complex patterns, meanings, and relationships in data. A single number cannot capture the full meaning of a word or sentence.

12 Transformers

Positional Encoding

Positional encoding is a technique used in transformer models to inject information about the order of tokens in a sequence. Since transformers process all tokens simultaneously, positional encoding tells the model where each token appears.

Self-Attention in Transformers

Self-attention allows each word to look at other words in the sentence and decide which ones are important. Words look at each other — giving the model a richer understanding of context.

💡 Example: "The bank by the river"

Self-attention helps the model understand "bank" means a riverbank — not a financial bank — by looking at the surrounding words "river".

HNSW — Hierarchical Navigable Small World

HNSW is a graph-based ANN (Approximate Nearest Neighbor) algorithm used to perform fast vector similarity search. It is one of the most popular algorithms for ANN search.

How it works:

Start from the top layer
Move to nodes closer to the query vector
Go down to lower layers
Continue until the nearest vector is found

Vector databases using HNSW

Weaviate · Qdrant · Milvus

ANN — Approximate Nearest Neighbor

ANN is a technique used to quickly find vectors that are most similar to a query vector, without checking every vector in the dataset. In vector search systems you may have millions or billions of embeddings — computing similarity with every vector would be too slow. ANN finds very close neighbors quickly, but the result may be approximate rather than perfectly exact.

13 Database Types

Types of Databases

Relational (RDBMS)

MySQL, PostgreSQL

NoSQL

MongoDB, Redis

Graph

Neo4j, ArangoDB

Vector

ChromaDB, Weaviate

RDBMS — Relational Databases

A Relational Database Management System stores data in tables with rows and columns. Tables are connected using keys (Primary Keys, Foreign Keys). Uses SQL language for queries.

Examples: MySQL · PostgreSQL · Oracle · Microsoft SQL Server · SQLite · MariaDB · Amazon Aurora

NoSQL Databases

NoSQL means "Not only SQL". A NoSQL database stores unstructured or flexible data and is used for large-scale web applications.

NoSQL Sub-types

Key-Value — Redis, Amazon DynamoDB
Document — MongoDB, CouchDB
Column — Apache Cassandra, HBase
Graph — Neo4j, OrientDB

Graph Databases

A Graph Database stores data as nodes and relationships (edges). Used when relationships between data are very important.

Examples: Neo4j · Amazon Neptune · ArangoDB · TigerGraph · OrientDB

14 Vector Databases

A Vector Database stores vectors (numerical embeddings) instead of traditional text data. These databases are mostly used in AI and ML applications.

Vector Database in RAG

Text Chunks

→

Embedding Model

→

Vector DB (ChromaDB)

Stores vectors + text + metadata

🔑 ChromaDB

ChromaDB is a popular open-source vector database. All vectors are stored here with their original text. It acts like a semantic search engine.

15 Vector Mathematics for AI Engineers

Cosine Similarity, Euclidean Distance, Dot Product, and L2 Normalization are used in every vector database operation.

1. Cosine Similarity (used for embeddings)

This measures how similar two vectors are by measuring the angle between them.

Formula

cos(θ) = (A · B) / (‖A‖ × ‖B‖)

Range: -1 to 1
  1  → very similar
  0  → unrelated
 -1  → opposite

Used in

Semantic Search · Recommendation Systems · RAG Systems

2. Euclidean Distance

This measures the straight-line distance between two vectors in n-dimensional space. Lower = more similar. Uses magnitude (size matters).

Formula

d = √Σ(Aᵢ - Bᵢ)²

Small distance  → Similar vectors
Large distance  → Different vectors

Used in AI

KNN · Clustering (K-means) · Anomaly Detection

3. Dot Product

The dot product multiplies matching elements and adds them. It is the fastest to compute and equals Cosine Similarity when vectors are L2-normalized.

Formula

A · B = Σ(Aᵢ × Bᵢ)

Example:
a = [1, 2, 3]  b = [4, 5, 6]
1×4 + 2×5 + 3×6 = 32

Used in AI

Attention in Transformers · Embedding Similarity · Neural Network Calculations

16 What is RAG?

RAG = Retrieval-Augmented Generation

RAG is an AI technique where a system retrieves relevant information from external data sources and then uses a generative model to produce an accurate answer.

💡 Simple Formula

RAG = Search + Generative AI

RAG is an AI framework that improves Large Language Model (LLM) accuracy by fetching data from external trusted sources — such as company documents or databases — before generating a response.

Output: Accurate & context-aware answers

17 RAG Architecture

Pipeline 1 — Ingestion (Offline / Batch)

Ingestion Pipeline

Data Source

PDFs, Websites, DBs

→

Document Chunking

300–500 tokens

→

Embedding Model

→

Vector Database

Document Chunking (300–500 tokens)

Large documents are split into smaller pieces called chunks. Embedding models have token limits, and smaller chunks = more precise retrieval.

Embedding Model (OpenAI / Hugging Face)

Each chunk is converted into a vector (a list of numbers). These vectors capture semantic meaning — similar content = similar vectors.

Pipeline 2 — Query (Online / Real-time)

Query Pipeline

User Question

→

Embed Query

→

Similarity Search

Top-K

→

Prompt Construction

→

LLM (GPT-4o)

→

Cited Answer

Step-by-step Explained

1. User Question — e.g. "What is our refund policy?" — kicks off the real-time pipeline.
2. Embed Query — the question is converted to a vector using the same embedding model.
3. Similarity Search (Top-K) — the query vector is compared against all stored vectors in ChromaDB.
4. Prompt Construction — retrieved chunks + original question are combined into a prompt.
5. LLM (GPT-4o) — the constructed prompt is sent to the LLM. The LLM reads the context and generates an answer.
6. Cited Answer — final output with source citations. User knows exactly which document the answer came from.

18 RAG Challenges

⚠️ Shortfalls & Challenges of RAG

1. Chunking Problems
2. Retrieval Failures
3. Hierarchical Indexing
4. Evaluation & Monitoring
5. Contextual Metadata
6. Iterative / Multi-Query RAG
7. Continuous Re-indexing Pipeline
8. Security & Privacy

19 What is an AI Agent?

An AI Agent is a system that can perceive its environment, make decisions, and take actions automatically to achieve a goal.

Think of an AI agent like a smart assistant that:

Observes — what's happening
Thinks about what to do
Acts to complete a task

Basic Components of an AI Agent

Perception

Gets info (text, images, sensors)

→

Decision Making

Uses logic / AI model

→

Action

Replies, moves, controls

💡 Examples

ChatGPT · Voice assistants like Siri or Google Assistant · Self-driving systems in cars · Game AI

20 Types of AI Agents

Agent Types

Reactive Agents

Just respond (no memory)

Goal-Based Agents

Try to achieve a goal

Learning Agents

Improve over time using experience

How an AI Agent Flows

User asks question
LLM reads system prompt
LLM decides: Answer directly OR use tool
Tool runs
Result goes back to LLM
Final answer generated

🔑 Core Rules for Building Agent AI

1. Clear Function Design
2. Use Type Hints
3. Write Clean Docstrings
4. Always Return Strings

21 Multi-Agent Systems (MAS)

Single Agent System

Only one AI agent is doing everything. Example: A coding assistant that reads your request, writes code, and fixes errors — one brain doing all tasks.

Multi-Agent System (MAS)

Multiple AI agents working together, each with a specific role. Think of it like a team:

Agent 1 → Planner
Agent 2 → Researcher
Agent 3 → Executor
Agent 4 → Reviewer

All cooperate to finish a task.

Single vs Multi-Agent

Feature	Single Agent	Multi-Agent
No. of agents	1	Many
Complexity	Simple	More Complex
Performance	Limited	More Powerful
Example	One chatbot	Team of AI bots

💡 Why Multi-Agent Systems are Important

Handle complex tasks · Work faster (parallel work) · More accurate (agents check each other)

Building a Multi-Agent System

Example: Take a topic and create a short blog post.

Planner Agent → decides structure
Writer Agent → writes content
Reviewer Agent → improves it

Python — Setup with CrewAI

# Step 1: Choose Tools
# Popular choices: LangChain · AutoGPT · CrewAI

# Step 2: Install requirements
pip install crewai openai

# Step 3: Define Agents
# Step 4: Define Tasks (tasks tell agents what to do)
# Step 5: Create the Crew (connect agents)

# Step 6: How it works (flow)
# 1. Planner  → creates outline
# 2. Writer   → uses outline
# 3. Reviewer → improves final output

22 Orchestration System

An Orchestration System manages and coordinates multiple agents, tools, and steps to complete a task.

In AI systems, an orchestration system controls:

Which agent runs next
How data flows (state)
When to loop
When to stop
When to ask human

Orchestration Flow

User

→

Planner

→

Writer

→

Reviewer

→

Final Output

↑ Controlled by the Orchestrator

🔑 What Orchestration Manages

1. Task Flow
2. Agent Communication
3. Decision Making
4. Tool Usage
5. Error Handling
6. Human-in-the-loop

💡 In short

Orchestration = Controlling and coordinating all parts of an AI system to work together smoothly.

23 GCP & Cloud Computing

What is Cloud Computing?

Cloud computing means using remote servers on the internet instead of local computers. Instead of saving data on your laptop, you store it in the cloud.

Examples

Google Cloud · AWS (Amazon Web Services) · Microsoft Azure

What is GCP?

GCP = Google Cloud Platform. GCP is a cloud computing platform provided by Google that offers services like computing power, storage, databases, machine learning, and networking over the internet.

GCP allows developers and companies to run applications, store data and build software on Google's servers instead of their own computers.

24 FastAPI for AI

FastAPI is a modern Python web framework used to build high-performance APIs. In AI projects it is the standard tool for exposing ML models as REST APIs.

Example: Sentiment Detection API

Technology Stack:
Python — programming language
FastAPI — for API creation
Uvicorn — to run the API server
Postman — for testing API requests
Pytest — for unit testing

The API allows a user to:
• Send a sentence
• The server analyzes it
• The API returns the sentiment result

Python — FastAPI setup

# Step 1: Create project folder
mkdir project && cd project

# Step 2: Create virtual environment
python -m venv venv
venv\Scripts\activate

# Step 3: Install required libraries
pip install fastapi uvicorn pytest

# Step 4: Create API file
notepad sentiment-api.py

# Step 5: Run the API
uvicorn sentiment-api:app --reload
# Visit: http://127.0.0.1:8000/docs

# Step 6: Test with Postman
# Step 7: Create unit test file
notepad test-api.py
pytest

25 Streamlit for UI

Streamlit is a Python framework used to quickly build web-based user interfaces (UI) for data science, machine learning, and AI applications.

💡 Why use Streamlit?

No HTML/CSS needed · Write pure Python · Instantly see the UI in your browser · Perfect for demos and prototypes of AI models

Python — Streamlit example

pip install streamlit

# app.py
import streamlit as st

st.title("My AI App")
user_input = st.text_input("Ask me anything:")
if user_input:
    st.write("You asked:", user_input)

# Run
streamlit run app.py

Artificial Intelligence Tutorial

1 Data Science

🔑 Common Tools in Data Science

2 What is Artificial Intelligence?

💡 Example

Python vs Java for AI

3 Machine Learning (ML)

🔑 Example Library

4 Types of Machine Learning

Supervised Machine Learning

Unsupervised Machine Learning

Reinforcement Learning

Ensemble Learning

Neural Network

Semi-Supervised ML

5 Deep Learning

🔑 Deep Learning Frameworks

6 Generative AI (Gen AI)

💡 Examples

7 Agent AI

🔑 Framework

8 Hugging Face

🔑 What it offers

9 Embeddings & Vectors

What is an Embedding?

What is a Vector?

💡 Key Insight

10 Tokenization

🔑 Token Types

11 Vectorization

💡 Why is Vectorization Multidimensional?

12 Transformers

Positional Encoding

Self-Attention in Transformers

💡 Example: "The bank by the river"

HNSW — Hierarchical Navigable Small World

Vector databases using HNSW

ANN — Approximate Nearest Neighbor

13 Database Types

RDBMS — Relational Databases

NoSQL Databases

NoSQL Sub-types

Graph Databases

14 Vector Databases

🔑 ChromaDB

15 Vector Mathematics for AI Engineers

1. Cosine Similarity (used for embeddings)

Used in

2. Euclidean Distance

Used in AI

3. Dot Product

Used in AI

16 What is RAG?

💡 Simple Formula

17 RAG Architecture

Pipeline 1 — Ingestion (Offline / Batch)

Document Chunking (300–500 tokens)

Embedding Model (OpenAI / Hugging Face)

Pipeline 2 — Query (Online / Real-time)

Step-by-step Explained

18 RAG Challenges

⚠️ Shortfalls & Challenges of RAG

19 What is an AI Agent?

💡 Examples

20 Types of AI Agents

How an AI Agent Flows

🔑 Core Rules for Building Agent AI

21 Multi-Agent Systems (MAS)

Single Agent System

Multi-Agent System (MAS)

💡 Why Multi-Agent Systems are Important

Building a Multi-Agent System

22 Orchestration System

🔑 What Orchestration Manages

💡 In short

23 GCP & Cloud Computing

What is Cloud Computing?

Examples

What is GCP?

24 FastAPI for AI