waltstephen/README.md

Hi there, I'm Yijia Fan (WaltStephen)! 👋

Current Status: Research Intern at MSRA & Undergraduate at SYSU

Welcome to my GitHub profile! I am an undergraduate Computer Science student at Sun Yat-sen University (SYSU) and a Research Intern at Microsoft Research Asia (MSRA).

My long-term goal is to build a unified world model that seamlessly integrates understanding and generation across modalities and agents.

📢 I am seeking PhD opportunities for Fall 2027. If you are interested, please contact me directly!


🎓 About Me

  • Education: B.Eng. in Computer Science, Sun Yat-sen University (Sept 2023 - Present).
  • Research Labs:
    • Microsoft Research Asia (MSRA), Shanghai ML Group.
    • HCP Lab, Sun Yat-sen University.
  • Role: Reviewer for CVPR 2026, AAAI 2026, ICLR 2026.
  • Interests beyond tech: Philosophy (Kant, Nietzsche), Literature (Tolstoy, Kafka), and the intersection of humanity and technology. 📚🤔

🔬 Research Interests

My research centers on advancing multi-agent collaborative intelligence by integrating Visual Language Models (VLMs) and Large Language Models (LLMs) with Reinforcement Learning (RL).

  • Multi-Agent Systems: Knowledge-aware coordination, Bayesian bandits, Game-theoretic uncertainty trading.
  • Policy Optimization: PPO, MAB, GRPO, and MAPPO-like RL frameworks.
  • Generative Models: Video generation (Hierarchical VAE), 3D generation, and Unified World Models.
  • VLM Architectures: Discretized representations (VQ), Post-training with RFT (LLaVA).

📝 Selected Publications

Check out my Google Scholar for the full list.

  • [AAAI 2026] Cost-Effective Communication: An Auction-based Method for Language Agent Interaction
    Yijia Fan, Kaitong Cai, et al.
  • [AAAI 2026] 3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment
    Yijia Fan, Jusheng Zhang, Kaitong Cai, et al.
  • [NeurIPS 2025] GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning
    Jusheng Zhang, Yijia Fan, et al.

💼 Experience Highlights

  • Microsoft Research Asia (Shanghai ML Group) | July 2025 – Present
    • Improving video generation using hierarchical VAE and exploring next-gen discretized VLM projects.
    • Training unified video generation/understanding models on multi-node clusters (DeepSpeed).
  • HCP Lab (SYSU) | July 2024 – Present
    • Exploring long-context LLM processing via diffusion models and curiosity-based game systems for few-label classification.

🛠️ Skills & Tools

  • Languages: Python (PyTorch), C/C++. 🐍💻
  • Frameworks: PyTorch, CUDA, DeepSpeed, PyTorch Lightning. ⚡
  • Tools: Linux, LaTeX, Git. 🐧📄

🌟 Let’s Connect!

Whether you’re into AI research, philosophy, or literature, feel free to reach out!

📫 Email: fanyj28@mail2.sysu.edu.cn
📍 Location: Guangzhou / Shanghai, China

Pinned

  1. ArgusBot (Public)

    ArgusBot: A 24/7 supervisor Agent for Codex CLI and Claude Code CLI that keeps agents running, reviewing, and planning until the job is actually done.

    Python 301 28

  2. KABB (Public)

    Forked from HCP-AI-Research-Lab/KABB

    [ICML 2025] KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems

    Python

  3. Cost-Effective-Communication (Public)

    Official implementation of Cost-Effective Communication: Auction-based Language Agent Interaction (Fan et al., 2025).

    Python

  4. UniG2U (Public)

    Forked from nssmd/UniG2U

    Python 1

  5. 3DAlign-DAER (Public)

    Official implementation of 3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D–Text Alignment at Scale

    Python 2

  6. kolmogorovArnoldFourierNetwork/KAF (Public)

    KAF: Kolmogorov-Arnold Fourier Networks

    Jupyter Notebook 21 2