i pick up a new field fast and turn it into working AI. agents that actually DO the work (not demos). and i don't ship any of it until i've proven it works with independent checks, because untested AI is just a bluff.
- gstack: merged PR #1554, credited in v1.42.0.0
- Claude Code: reported and diagnosed session-bloat bug in the Read tool's binary-file serialization path (root cause analysis + fix sketch)
- Building AI agents: Claude Agent SDK, MCP, Anthropic SDK
- Checking AI quality: agreement scoring (Cohen's kappa, Kendall-tau), benchmarks (MT-Bench), AI-graded-by-AI (LLM-as-judge)
- Languages: Python, TypeScript
- LinkedIn → https://www.linkedin.com/in/mikeilog
- Website → https://mikeilog.com
- Email → mike@mikeilog.com
cooperation FTW · located on Earth




