Skip to content

chankjen/Python_Data_Analysis

Repository files navigation

#🌟 Ubuntu-Powered Python Data Analysis: “I Am Because We Are” in Code 🌍🏥


🎯 Lesson Goal:

By the end of this lesson, you’ll be able to load, explore, clean, and visualize hospital data using Python — all while embracing the Ubuntu philosophy: “I am because we are.” In data, no row is an island. Every patient, every record, every number — they matter because they are part of a community. Let’s honor that together.


🧑‍⚕️ What You’ll Need:

  • Python (with Jupyter Notebook or Google Colab recommended)
  • Libraries: pandas, matplotlib, seaborn, numpy
  • A fun, beginner-friendly attitude!

💡 Ubuntu Tip: Just like in a village, we help each other learn. If you get stuck, ask a friend, search online, or revisit — learning is a shared journey.


📦 Step 1: Create Our Ubuntu Hospital Dataset 🏥

Since we’re focusing on fun and accessibility, let’s create a fictional hospital dataset inspired by Ubuntu values — where every patient’s story contributes to the health of the whole community.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for beautiful plots — because beauty honors the community!
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

# 🌿 Ubuntu Hospital Dataset: "We care because we are together."
data = {
    'Patient_ID': range(1, 101),
    'Age': np.random.randint(1, 90, size=100),
    'Gender': np.random.choice(['Male', 'Female'], size=100),
    'Condition': np.random.choice(['Flu', 'Diabetes', 'Hypertension', 'Asthma', 'Healthy'], size=100, p=[0.3, 0.2, 0.2, 0.2, 0.1]),
    'Treatment_Duration_Days': np.random.randint(1, 15, size=100),
    'Satisfaction_Score': np.random.randint(1, 11, size=100),  # 1-10 scale
    'Follow_Up_Needed': np.random.choice([True, False], size=100, p=[0.4, 0.6])
}

# Create DataFrame — our digital village square 🌳
df = pd.DataFrame(data)

print("🏥 Welcome to Ubuntu General Hospital!")
print("Where every patient’s data is honored and cared for.\n")
print(df.head(10))  # Show first 10 villagers (patients)

💬 Ubuntu Reflection: Each row is a person. Their age, condition, satisfaction — these aren’t just numbers. They represent lived experiences. Handle them with care.


🔍 Step 2: Explore the Data — Know Your Community

Before we act, we listen. In Ubuntu, understanding comes before action.

# 🧭 Basic exploration — Who are our patients?
print("=== Community Snapshot ===")
print(f"Total Patients: {len(df)}")
print(f"Average Age: {df['Age'].mean():.1f} years")
print(f"Most Common Condition: {df['Condition'].mode()[0]}")
print(f"Average Satisfaction: {df['Satisfaction_Score'].mean():.2f}/10")

# 📊 Let’s see the gender distribution — balance matters in Ubuntu
print("\n=== Gender Harmony ===")
print(df['Gender'].value_counts())

# 📈 Visualize Conditions — see what the community is facing together
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='Condition', palette='viridis')
plt.title("🩺 Community Health Conditions — We Rise By Lifting All", fontsize=16)
plt.xlabel("Condition")
plt.ylabel("Number of Patients")
plt.xticks(rotation=45)
plt.show()

💬 Ubuntu Reflection: When we visualize data, we’re not just making charts — we’re giving voice to the community’s needs. Who is suffering most? Who needs more care?


🧹 Step 3: Clean the Data — Healing the Gaps Together

In Ubuntu, we don’t leave anyone behind — not even missing data!

Let’s imagine some records have missing satisfaction scores. We’ll fill them with the community average — because we support each other.

# 🤕 Introduce some missing data (for teaching purposes)
df.loc[np.random.choice(df.index, size=10), 'Satisfaction_Score'] = np.nan

print("\n=== Before Healing ===")
print(f"Missing Satisfaction Scores: {df['Satisfaction_Score'].isnull().sum()}")

# 🤝 Ubuntu Clean: Fill missing scores with community average — collective care!
avg_satisfaction = df['Satisfaction_Score'].mean()
df['Satisfaction_Score'].fillna(avg_satisfaction, inplace=True)

print("=== After Ubuntu Healing ===")
print(f"Missing Satisfaction Scores: {df['Satisfaction_Score'].isnull().sum()}")
print(f"Filled with community average: {avg_satisfaction:.2f}")

💬 Ubuntu Reflection: Instead of deleting incomplete records (abandoning people), we uplift them using the wisdom of the whole. That’s Ubuntu data science.


📈 Step 4: Analyze & Visualize — Wisdom Through Sharing

Let’s find insights that help our hospital serve better — together.

# 🧠 Question: Do certain conditions lead to longer treatments?
plt.figure(figsize=(12, 6))
sns.boxplot(data=df, x='Condition', y='Treatment_Duration_Days', palette='Set2')
plt.title("⏳ Treatment Duration by Condition — Understanding to Serve Better", fontsize=16)
plt.xlabel("Condition")
plt.ylabel("Days of Treatment")
plt.xticks(rotation=45)
plt.show()

# 💡 Insight: Maybe Asthma patients need longer care? Let's check!
avg_treatment_by_condition = df.groupby('Condition')['Treatment_Duration_Days'].mean()
print("\n=== Average Treatment Duration by Condition ===")
print(avg_treatment_by_condition.round(2))

# ❤️ Question: Is satisfaction linked to treatment duration?
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='Treatment_Duration_Days', y='Satisfaction_Score', hue='Condition', palette='deep', s=100)
plt.title("😊 Satisfaction vs Treatment Duration — Are Longer Stays Less Happy?", fontsize=16)
plt.xlabel("Treatment Duration (Days)")
plt.ylabel("Satisfaction (1-10)")
plt.legend(title='Condition')
plt.show()

# 🔢 Correlation check
corr = df['Treatment_Duration_Days'].corr(df['Satisfaction_Score'])
print(f"\nCorrelation between Treatment Duration and Satisfaction: {corr:.2f}")
print("Slight negative? Maybe longer stays = slightly less happy. Let’s dig deeper together!")

💬 Ubuntu Reflection: Data reveals patterns — but only when we ask compassionate questions. Why might longer treatments lower satisfaction? Can we improve the experience?


🤖 Step 5: Simple Prediction — Serving the Future Together

Let’s predict if a patient will need follow-up, based on their condition and satisfaction — so we can prepare resources before they ask.

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# 🔄 Convert categorical to numeric (so the machine can understand our village)
df_encoded = pd.get_dummies(df, columns=['Condition', 'Gender'], drop_first=True)

# 🎯 Target: Follow_Up_Needed
X = df_encoded.drop(['Follow_Up_Needed', 'Patient_ID'], axis=1)
y = df_encoded['Follow_Up_Needed']

# 🤲 Ubuntu Split: Share data between training (learning) and testing (serving)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 🌱 Grow a Forest of Wisdom (Random Forest Classifier)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# 🎯 Predict
y_pred = model.predict(X_test)

# 📋 Ubuntu Report: How well did we serve?
print("\n=== 🌿 Ubuntu Care Prediction Report ===")
print(classification_report(y_test, y_pred))

# 🖼️ Confusion Matrix — Who did we miss? Let’s learn together.
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['No Follow-Up', 'Follow-Up'], yticklabels=['No Follow-Up', 'Follow-Up'])
plt.title("🔍 Confusion Matrix — Where Can We Improve Our Care?")
plt.ylabel("Actual")
plt.xlabel("Predicted")
plt.show()

💬 Ubuntu Reflection: Even machines must learn with humility. If our model misclassifies, we don’t blame the data — we ask: “What did we miss? How can we listen better next time?”


🌈 Step 6: Celebrate & Reflect — Ubuntu Closing Circle

print("\n🎉 CONGRATULATIONS! You’ve completed the Ubuntu Data Journey.")
print("You didn’t just analyze data — you honored stories, healed gaps, and predicted needs with compassion.")

print("\n🌿 Ubuntu Principles in Your Code:")
print("→ You treated missing data with community averages — no one left behind.")
print("→ You visualized conditions to understand collective burdens.")
print("→ You predicted needs to prepare resources — proactive care.")
print("→ You reflected on meaning, not just metrics.")

print("\n💡 Next Steps:")
print("→ Try this with a real Kaggle health dataset (e.g., ‘Heart Disease UCI’ or ‘Diabetes Health Indicators’).")
print("→ Add more Ubuntu: Ask — ‘Who is not represented in this data?’")
print("→ Share your notebook with a friend. Learning grows when shared.")

print("\n🌍 Because in data, as in life: I am because we are.")

📚 Bonus: Recommended Kaggle Datasets (for your next Ubuntu project!)

💡 Ubuntu Challenge: When you use real data, ask:
“Whose voices are missing? How can my analysis serve the underserved?”


✅ Summary Checklist (Ubuntu Edition):

  • Created or loaded a health dataset with care 🏥
  • Explored the community’s needs 📊
  • Cleaned data with compassion (no one deleted!) 🤝
  • Visualized to understand, not just to impress 🎨
  • Built a model to serve, not just to score 🤖
  • Reflected on the human stories behind the numbers ❤️
  • Shared your learning or notebook with someone 🌍

🙏 Final Ubuntu Blessing:

“May your code be clean, your charts be clear, and your heart remember — behind every row, there is a person.
Analyze with wisdom. Serve with compassion.
I am because we are.


📬 Share your Ubuntu Data Notebook with #UbuntuDataScience — let’s build a global village of compassionate analysts!

Happy coding, data healer. 🌿🐍📊

About

PLP ACADEMY WEEK 7

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors