🧠 AI Research Intelligence Laboratory — Multi-Agent Collaboration + RAG Reasoning
🎯 Purpose
Design an end-to-end AI reasoning workflow combining collaborative multi-agent research (CrewAI) and retrieval-augmented summarization (LangChain + vector store).
The laboratory is divided into two independent tasks, each focused on distinct reasoning pipelines:
Task 1: Multi-Agent Research Team (CrewAI + Hugging Face Inference API)
Task 2: Wikipedia-based RAG Summarizer (LangChain + ChromaDB)
Each task can be completed independently.
No paid LLM APIs are required — all models are open-source, either locally executable (e.g., Mistral 7B, Llama 3 8B, Phi-3-mini) or callable via the Hugging Face Inference API free tier.
📘 Repository Names
Task 1: multiagent_research-lab
Task 2: rag_wikipedia-lab
🗂️ Create two separate GitHub repositories — one per task. Each repo should contain a notebooks/, src/, and data/ directory.
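The layout above can be scaffolded in one shot. This sketch assumes a POSIX shell; the repo and directory names are the ones given in this brief:

```shell
# Create both repos with the required notebooks/, src/, and data/ directories
for repo in multiagent_research-lab rag_wikipedia-lab; do
  mkdir -p "$repo/notebooks" "$repo/src" "$repo/data"
  touch "$repo/requirements.txt"   # placeholder for the reproducible environment
done
```

After running it, initialize each directory with `git init` and push to GitHub as usual.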
🧠 EXERCISE 1 — “AI Research Team” (CrewAI + LangChain + Hugging Face Inference API)
🎯 Purpose
Simulate a multi-agent research collaboration where autonomous AI agents gather, analyze, and synthesize information about an AI-related topic using open-source frameworks and the Hugging Face Inference API.
Each agent acts as part of a “virtual research lab” working to produce a coherent research summary.
📘 Repository Name
multiagent_research-lab
🧩 Structure
Objective
Create a three-agent workflow that simulates collaborative research around a chosen AI topic (e.g., “Impact of Synthetic Data in Healthcare” or “Bias in LLMs”).
Agents will communicate using CrewAI (or LangChain Agents) and rely on Hugging Face Inference API models for reasoning and summarization.
🧠 Agents and Roles
Agent | Responsibility | Tools / Functions
-- | -- | --
Researcher Agent | Conducts information search online and retrieves relevant text sources. | Web search tool (e.g., DuckDuckGo Search API or Tavily), text retrieval, document parsing
Writer Agent | Synthesizes retrieved knowledge into a 500-word structured summary (Markdown format). | Hugging Face Inference API for summarization
Reviewer Agent | Evaluates coherence, factuality, and structure of the final summary, suggesting corrections. | Text analysis with Hugging Face sentiment/classification model
⚙️ Environment
Python 3.10+
Frameworks: CrewAI, LangChain, Hugging Face Hub
Editor: VSCode or Google Colab
No local LLMs (inference handled via Hugging Face Inference API)
🧰 Tasks
0️⃣ Setup
Install required libraries:
pip install crewai langchain langchain-community huggingface_hub duckduckgo-search chromadb pandas
Configure Hugging Face token:
from huggingface_hub import login
login("YOUR_HF_TOKEN")
1️⃣ Define the Agents
Create three agents within CrewAI or LangChain, defining:
Role / Goal / (Tools / APIs) / Memory (if applicable)
Example:
from crewai import Agent
researcher = Agent(
    role="Researcher",
    goal="Find reliable web sources about the impact of synthetic data in healthcare.",
    backstory="A meticulous research assistant who verifies every source.",
    tools=[search_tool],  # a search tool instance, e.g. DuckDuckGo (see Tools below)
)
writer = Agent(
    role="Writer",
    goal="Write a coherent 500-word research summary using retrieved sources.",
    backstory="A technical writer who produces structured Markdown reports.",
    llm="huggingface/HuggingFaceH4/zephyr-7b-beta",  # via Hugging Face Inference API
)
reviewer = Agent(
    role="Reviewer",
    goal="Evaluate and correct factual inconsistencies and coherence issues.",
    backstory="A critical editor focused on factuality and structure.",
    llm="huggingface/HuggingFaceH4/zephyr-7b-beta",  # must be a generative model; DeBERTa is a classifier, usable only as an analysis tool
)
2️⃣ Workflow
Define communication cycles:
Researcher → performs search → returns snippets.
Writer → generates first draft using those snippets.
Reviewer → critiques and refines the text.
Writer → finalizes Markdown report.
Each agent should send messages back and forth using CrewAI’s coordination logic or LangChain’s agent loop.
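Independent of the framework, the message flow above can be sketched as a plain Python loop. The stub functions below are hypothetical stand-ins: in the real lab, each body is replaced by a CrewAI task or an LLM call.

```python
# Minimal sketch of the Researcher -> Writer -> Reviewer -> Writer cycle.

def researcher(topic: str) -> list[str]:
    # Real version: run the DuckDuckGo search tool and return text snippets.
    return [f"snippet about {topic} #1", f"snippet about {topic} #2"]

def writer(snippets: list[str], feedback: str = "") -> str:
    # Real version: prompt the Writer agent's LLM with snippets (+ feedback).
    draft = "## Summary\n" + "\n".join(f"- {s}" for s in snippets)
    return draft + (f"\n\n<!-- revised per: {feedback} -->" if feedback else "")

def reviewer(draft: str) -> str:
    # Real version: ask the Reviewer agent for factuality/coherence notes.
    return "Cite sources for each claim."

def run_cycle(topic: str) -> str:
    snippets = researcher(topic)       # 1. search
    draft = writer(snippets)           # 2. first draft
    feedback = reviewer(draft)         # 3. critique
    return writer(snippets, feedback)  # 4. final Markdown report

report = run_cycle("synthetic data in healthcare")
```

The same four calls map one-to-one onto CrewAI `Task` objects executed sequentially by a `Crew`.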
3️⃣ Tools
Use the DuckDuckGo Search Tool (or Tavily if available) for gathering open-access content:
from langchain_community.tools import DuckDuckGoSearchRun
search_tool = DuckDuckGoSearchRun()
results = search_tool.run("Impact of synthetic data in healthcare site:medium.com OR site:researchgate.net")
No BeautifulSoup is needed; extract titles and summaries from search results directly.
4️⃣ Final Output
The Writer Agent generates research_summary.md (500 words) with this structure:
Introduction
Key Findings
Ethical & Technical Challenges
Conclusion
Reviewer edits should be reflected in the final version.
5️⃣ Evaluation (Rubric)
Criterion | Points
-- | --
Correct setup and configuration (CrewAI + Hugging Face) | 4 pts
Functional multi-agent collaboration (communication cycles working) | 6 pts
Researcher retrieves meaningful text data | 3 pts
Writer generates coherent, structured text via Hugging Face API | 3 pts
Reviewer produces factuality & coherence feedback | 2 pts
Markdown summary well-structured and readable | 2 pts
Total: 20 pts
🧮 Deliverables
- /src/agents.py — all agent definitions
- /notebooks/workflow_demo.ipynb — end-to-end execution
- research_summary.md — final report
- requirements.txt — reproducible environment
🛠️ Technical Requirements
Python 3.10+
No local LLMs — only Hugging Face Inference API calls.
⏱️ Duration
8 hours total: 2h setup, 5h implementation, 1h presentation & discussion.
🔁 Recommended Workflow
- Define agents and roles.
- Implement search → writing → reviewing loop.
- Generate and store Markdown report.
- Present outputs and discuss team collaboration performance.
🧮 Task 2 — Wikipedia-based RAG Summarizer (LangChain + ChromaDB)
🎯 Objective
Build a retrieval-augmented summarization system using open-source Wikipedia data.
Use LangChain + ChromaDB + SentenceTransformers to embed, query, and summarize factual content from Wikipedia without multi-agent coordination.
⚙️ Steps
0️⃣ Environment Setup
Install:
pip install wikipedia-api sentence-transformers chromadb langchain transformers torch pandas
1️⃣ Dataset Creation
Fetch Wikipedia content:
import wikipediaapi
# Recent wikipedia-api versions require a user_agent (any descriptive string)
wiki = wikipediaapi.Wikipedia(user_agent="rag-wikipedia-lab", language="en")
page = wiki.page("Federated_learning")
- Extract main text and chunk into ~300-word segments.
- Save to /data/wiki_corpus.csv with columns: id, title, text.
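A minimal chunking sketch for these two steps. The ~300-word window and the id/title/text columns come from this brief; the fixed-window splitting strategy itself is an assumption — a sentence-aware splitter is equally valid.

```python
import csv

def chunk_words(text: str, size: int = 300) -> list[str]:
    """Split text into consecutive ~`size`-word segments."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def save_corpus(title: str, text: str, path: str) -> int:
    """Write chunks to a CSV with id, title, text columns; return chunk count."""
    chunks = chunk_words(text)
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        w.writerow(["id", "title", "text"])
        for i, chunk in enumerate(chunks):
            w.writerow([f"{title}_{i}", title, chunk])
    return len(chunks)

# Example: in the lab, pass page.text and path="data/wiki_corpus.csv"
n_chunks = save_corpus("Federated_learning", "federated " * 650, "wiki_corpus.csv")
```

Chunk ids like `Federated_learning_0` double as ChromaDB document ids in the next step.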
2️⃣ Embedding + Vector Store
Embed using:
from sentence_transformers import SentenceTransformer
import chromadb
model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.Client()
collection = client.create_collection("wiki_ai")
Upsert all chunks into ChromaDB with metadata.
3️⃣ Query Pipeline
Implement LangChain retrieval chain:
from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma
qa = RetrievalQA.from_chain_type(
    llm=Ollama(model="mistral"),
    chain_type="stuff",
    retriever=Chroma(...).as_retriever()
)
Query example:
qa.run("Explain federated learning challenges in healthcare.")
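Under the hood, the retriever simply ranks stored chunk embeddings by similarity to the query embedding. A dependency-free sketch of that ranking step — the 3-dimensional vectors here are toy values for illustration; real embeddings come from the SentenceTransformer model above:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks most similar to the query."""
    ranked = sorted(chunk_vecs, key=lambda cid: cosine(query_vec, chunk_vecs[cid]), reverse=True)
    return ranked[:k]

chunks = {
    "fl_0": [0.9, 0.1, 0.0],    # toy embedding: on-topic chunk
    "fl_1": [0.8, 0.3, 0.1],    # toy embedding: related chunk
    "cooking": [0.0, 0.1, 0.9], # toy embedding: off-topic chunk
}
best = top_k([1.0, 0.2, 0.0], chunks)
```

ChromaDB performs this ranking internally when `as_retriever()` is queried; the top-k chunks are then "stuffed" into the LLM prompt by the `stuff` chain type.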
4️⃣ Generate and Save Summary
Combine top retrieved results into one coherent summary (400–500 words) and save as rag_summary.md.
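One way to save and sanity-check the final file. The 400–500-word target is from this brief; the `# RAG Summary` heading and the helper itself are illustrative assumptions — in practice the LLM produces the summary text.

```python
def save_summary(text: str, path: str = "rag_summary.md",
                 lo: int = 400, hi: int = 500) -> bool:
    """Write the summary as Markdown; return whether it hits the word target."""
    n_words = len(text.split())
    with open(path, "w", encoding="utf-8") as f:
        f.write("# RAG Summary\n\n" + text + "\n")
    return lo <= n_words <= hi
```

Checking the return value in the notebook makes it easy to re-prompt the model when the summary falls outside the target range.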
🧮 Deliverables (Task 2)
- /notebooks/rag_wikipedia.ipynb
- /data/wiki_corpus.csv
- /outputs/rag_summary.md
- /outputs/retrieval_examples.json
📏 Rubric (20 pts)
Category | Description | Points
-- | -- | --
Wikipedia Data | Correct extraction and chunking | 4
Embedding + Storage | Proper embedding using SentenceTransformers and ChromaDB | 6
LangChain Pipeline | Functional retrieval and generation pipeline | 6
Final Summary | Coherence, accuracy, and factual completeness | 4
🧩 Bonus Task — Conceptual Comparison (+3 pts)
Write a short Markdown reflection comparing both approaches:
- How did the multi-agent workflow handle ambiguity and contradictions?
- How did the RAG approach handle factuality and retrieval coverage?
- Which approach is better suited for open-ended vs. factual questions?
Save as /outputs/reflection.md.
🛠️ Technical Requirements
Python 3.10+
Dependencies:
crewai
langchain
sentence-transformers
chromadb
wikipedia-api
transformers
torch
pandas
numpy
markdown
Execution:
All notebooks must run fully in Google Colab or VSCode. Task 1 calls the Hugging Face Inference API; Task 2 runs a local LLM via Ollama or LM Studio.
🔁 Recommended Workflow
Task 1 — Multi-Agent Research
Define topic → Configure CrewAI agents → Run Researcher → Writer → Reviewer cycle → Save Markdown report
Task 2 — RAG Summarization
Load Wikipedia pages → Chunk text → Embed → Store in ChromaDB → Query with LangChain → Generate summary
Bonus
Write reflection on Multi-Agent vs RAG reasoning strengths
📤 Submission
Submit both repositories:
1️⃣ multiagent_research-lab
2️⃣ rag_wikipedia-lab
Provide GitHub URLs in the submission sheet:
👉 [Submission Excel – Repository & Dashboard Links]
Deadline: November 15