Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
238 changes: 238 additions & 0 deletions ROADMAP.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,238 @@
# `tg` Roadmap — Agent-Automatable Telegram CLI

## Vision

Turn `tg` into a single static binary that an AI agent (or any script) can drive to
automate everyday Telegram work on a **personal account**: message yourself, list and
triage chats, mark chats as read, reply, upload/download files, search history, and more.

The goal is a broad capability surface — and strong agent ergonomics — delivered as a
fast, dependency-free Go binary built on [`gotd/td`](https://github.com/gotd/td), exposed
as composable subcommands. New capability should land as `tg` subcommands.

## The central gap: user-account auth

`tg` today authenticates **only as a bot** (`Config.BotToken`, `client.Auth().Bot(...)`,
and `app.Before` hard-rejects a config without a bot token). Bots cannot:

- access **Saved Messages** / "message yourself",
- list **dialogs** (`messages.getDialogs` is unavailable to bots),
- read arbitrary chat history or mark dialogs as read the way a user can,
- see the chat/folder list, contacts, or most account state.

**Every automation use case in the request requires a user session.** So user login is
Phase 0 and gates everything else. `gotd/td` already supports this: `auth.NewFlow` with a
code-prompt + 2FA handler, and `qrlogin` for QR-based login.

## Agent-first design principles

These cut across every feature and matter as much as the features themselves:

1. **Structured output.** Global `--output json|text` (default `text` for humans, but
agents pass `--output json`). Every command emits a stable, documented JSON object on
stdout. No data on stderr; logs/progress go to stderr only.
2. **Non-interactive by default.** Never block on a TTY prompt in agent mode. Auth codes,
confirmations, and 2FA come from flags / env / a pre-established session.
3. **Stable exit codes.** `0` success, distinct non-zero codes for auth failure, peer not
found, flood-wait exceeded, permission denied — so agents can branch on them.
4. **No decorative output in machine mode.** The upload progress bar and typing-action
updates must be suppressed under `--output json`.
5. **Consistent peer resolution.** One shared `--peer` grammar everywhere: `me`/`self`,
`@username`, numeric ID, phone, `t.me/...` link. Cache resolved access-hashes in the
session dir (StringSession-style sessions have no entity cache — we must keep our own).
6. **Idempotency & safety.** Destructive actions (delete history, leave, ban, delete
account-level state) require `--yes` in agent mode and are clearly marked read-only vs
mutating.
7. **Schema versioning.** Include `"schema": 1` (or similar) in JSON so agents can adapt.
8. **Account selection.** Every command accepts a global `--account <label>` (env
`TG_ACCOUNT`), defaulting to the configured default account. Reserve and thread this
flag from day one even while only one account exists, so adding multi-account later is
additive and never changes existing command signatures.

## Phased plan

### Phase 0 — Foundations (prerequisite for everything)

- [x] **Adopt [`spf13/cobra`](https://github.com/spf13/cobra) for the CLI** (replacing the
current `urfave/cli/v2`). Do this first — every later command is a cobra subcommand, so
migrating after the surface grows is costly. Requirements:
- **Rich documentation**: every command sets `Short`, a long-form `Long`, and runnable
`Example` blocks; group related commands (`cobra.Group`) so `tg --help` reads as a map of
the surface. Generate `man` pages and Markdown docs from the command tree
(`spf13/cobra/doc`) and publish them, so help stays in sync with the code.
- **Thorough autocomplete**: ship `tg completion bash|zsh|fish|powershell` (cobra's built-in
`completion` command). Register `ValidArgsFunction` for dynamic completion of the things
agents and humans actually type — `--peer` (cached dialogs/usernames from the peer cache),
`--account` labels, `--output` values, enum flags — and mark file-taking flags
(`upload`/`download`) with `MarkFlagFilename`. Completions must work without a network
round-trip where possible (read from the local session/peer cache).
- Use `cobra.Command.RunE` (error-returning) throughout; wire global flags as persistent
flags on the root; pair with `spf13/pflag` for GNU-style `--flag`/`-f` parsing.
- [ ] **QR login (primary)**: `tg login` defaults to QR, following
`gotd/td`'s `examples/qrauth.go` + `examples/userbot`. Wiring:
- `dispatcher := tg.NewUpdateDispatcher()` and
`loggedIn := qrlogin.OnLoginToken(&dispatcher)` — the channel fires on
`tg.UpdateLoginToken` when the code is scanned.
- `client.QR().Auth(ctx, loggedIn, show)`, where `show` renders the QR to stderr via
`github.com/mdp/qrterminal/v3` **and** also prints `token.URL()` (the
`tg://login?token=...` link) so it can be opened/rendered elsewhere — agent-friendly.
- 2FA fallback: on `SESSION_PASSWORD_NEEDED`, call `client.Auth().Password(ctx, pwd)`
(retry on `auth.ErrPasswordInvalid`).
- **Requires updates enabled.** Today `app.go` sets `NoUpdates: true` and registers no
`UpdateHandler`, so `loggedIn` would never be notified. The login path must register
the dispatcher as `UpdateHandler` and not disable updates; make `NoUpdates` and the
dispatcher conditional on the command.
- UX: scan once (Settings → Devices → Link Desktop Device); the session persists in
`session.FileStorage`, so all later agent invocations are headless.
- [ ] **Phone login (fallback)**: `tg login --phone` using `auth.NewFlow` (code + 2FA), for
environments where scanning a QR is inconvenient.
- [ ] Persist the user session alongside the existing bot session. Extend `Config`/`init`
to support a user session path; relax `app.Before` so a bot token is no longer
mandatory.
- [ ] **Auth mode selection**: allow a single config to hold both; commands pick user vs
bot session (most new commands are user-only).
- [ ] **`--output json` plumbing**: a small result-writer used by all commands; move logs
and progress to stderr.
- [ ] **Peer resolver + access-hash cache** in the session directory.
- [ ] **Proxy support (global)**: a single `--proxy` flag (config + `TG_PROXY` env)
accepting a URL, threaded into `telegram.Options.Resolver`:
- `socks5://`, `socks4://`, `http(s)://` → `dcs.Plain` with a
`golang.org/x/net/proxy` dialer (already in the module graph via `x/net`; no new dep).
Honor user:password and remote-DNS.
- `tg://proxy?server=&port=&secret=` / MTProxy → native `dcs.MTProxy(addr, secret, …)`,
reusing the link + secret (hex/base64url) parsing from
`gotd/td`'s `examples/mtproxy-connect`.
- Wire it at client construction so every command benefits; per-account overrides come in
Phase 7.
- [ ] **`tg whoami`**: smallest end-to-end user-session command to validate auth.

### Phase 1 — Core agent loop (the explicit asks)

These directly cover "upload files to myself, list chats, mark chats as read, replies".

- [ ] `tg chats list` — list dialogs (paged), with unread counts, pinned/muted/archived
flags, last message preview.
- [ ] `tg messages list <peer>` / `tg history <peer>` — read recent messages.
- [ ] `tg send <peer> <text>` — already exists; extend to default `me` and JSON output.
- [ ] `tg reply <peer> <message-id> <text>` — reply to a specific message.
- [ ] `tg read <peer>` — mark a chat as read.
- [ ] `tg upload` to `me` / Saved Messages — already exists; ensure `me` peer + JSON +
silent progress.
- [ ] `tg download <peer> <message-id> [--out path]` — download media.

### Phase 2 — Messaging depth

- [ ] `edit`, `delete` (single + bulk), `delete-history`.
- [ ] `forward` (single + multiple).
- [ ] `pin` / `unpin` / `unpin-all` / `pinned`.
- [ ] Reactions: `react` / `unreact` / `reactions`.
- [ ] Drafts: `draft set` / `drafts` / `draft clear`.
- [ ] Scheduled messages: `schedule` send / list / delete.
- [ ] Search: `search <peer> <query>` and `search --global <query>`.
- [ ] Polls: `poll create`.
- [ ] Message context / links: `context`, `link`.
- [ ] Albums, voice, stickers, GIFs.

### Phase 3 — Chats & contacts

- [ ] `chat get` / `chat full`.
- [ ] `mute` / `unmute` / `archive` / `unarchive`.
- [ ] `resolve <username>`, `search-public <query>`, `subscribe`.
- [ ] Contacts: list / search / add / delete / block / unblock / blocked / import / export.

### Phase 4 — Groups & channels admin

- [ ] Create group/channel, invite, leave, participants, admins, banned.
- [ ] Admin/moderation: promote/demote, ban/unban, permissions, slow mode, admin rights, recent actions.
- [ ] Edit chat title/photo/about, invite links (export/import/join by link).
- [ ] Forum topics.

### Phase 5 — Profile & folders

- [ ] Profile: get me, update profile, profile photo set/delete, privacy get/set, user info/photos/status.
- [ ] Folders / dialog filters: list / get / create / add-chat / remove-chat / delete / reorder.

### Phase 6 — Realtime (high value for agents)

- [ ] `tg watch <peer>` — stream new messages as JSON lines; the agent's input loop.
- [ ] `tg wait` — block until a new (or settled) incoming message, with timeout.
- [ ] Backed by `gotd/td` update handlers; `NoUpdates: true` (set in `app.go` today) must
become conditional.

### Phase 7 — Multiple concurrent accounts

The `--account` selector (design principle 8) ships from Phase 0, but real multi-account —
several sessions usable, and **live simultaneously** — lands here: a `label -> client` map,
an `--account <label>` selector, and a `tg accounts` listing.

- [ ] **Config for N accounts**: named accounts in the config (or env
`TG_SESSION_<LABEL>`), each with its own `session.FileStorage` file, peer-cache, and
`floodwait.Waiter`. Keep a backward-compatible single "default" account.
- [ ] **`tg accounts`** — list configured accounts + auth status.
- [ ] **`tg login --account <label>`** — mint/refresh a session per label (QR or phone).
- [ ] **Concurrent runtime**: hold a `map[label]*telegram.Client`, each running under its
own waiter; a command resolves `--account` to one client. Clients are independent and
safe to run in parallel — one flood-wait or reconnect must not stall the others.
- [ ] **Fan-out where it makes sense**: `--account all` (or repeated `--account`) for
read/broadcast commands (e.g. `chats list`, `send`, `watch`) — results keyed by label
in JSON. Strictly opt-in; single-account stays the default.
- [ ] **Realtime across accounts**: Phase 6 `watch`/`wait` can observe multiple accounts
concurrently (one dispatcher + update loop per client), merged into one JSON stream.
- [ ] **Per-account proxy**: per-label override of the global proxy (Phase 0), so each
account can route through its own SOCKS5/HTTP/MTProxy.

## Capability groups (reference)

The surface breaks into groups: `accounts`, `chats`, `messages`, `media`, `contacts`,
`groups`, `profile`, `folders`, `events`. The phases above fold each group in: Phase 1
covers the daily-driver subset of `chats`/`messages`/`media`; Phase 2 the rest of
`messages`+`media`; Phase 3 `chats`+`contacts`; Phase 4 `groups`; Phase 5
`profile`+`folders`; Phase 6 `events`; Phase 7 generalizes the whole surface to multiple
concurrent accounts (the `accounts` group).

## Technical notes

- **CLI framework:** `spf13/cobra` (+ `spf13/pflag`), replacing `urfave/cli/v2`. Chosen for
first-class shell completion (`bash`/`zsh`/`fish`/`powershell` with dynamic
`ValidArgsFunction` hooks) and doc generation (`spf13/cobra/doc` → man + Markdown). One
`*cobra.Command` per subcommand with `RunE`; global flags are persistent flags on the root;
the existing `app`/`Before` wiring moves into `PersistentPreRunE`.
- **Library:** `gotd/td` (`telegram`, `tg`, `telegram/message`, `telegram/uploader`,
`telegram/downloader`, `telegram/query` for dialog/history pagination, `auth`,
`auth/qrlogin`, `telegram/peers` for resolution/caching).
- **QR login** adds one dependency: `github.com/mdp/qrterminal/v3` (terminal QR rendering),
matching `gotd/td`'s `examples/qrauth.go`. The QR flow is also a forcing function to
introduce an update dispatcher (`tg.NewUpdateDispatcher`) and turn off the current
`NoUpdates: true` for commands that need it — which Phase 6 (realtime) needs anyway.
- **Flood-wait** is already handled via the `floodwait.Waiter` middleware — keep it; add a
distinct exit code when retries are exhausted.
- **Proxy** is configured through `telegram.Options.Resolver` (`dcs.Resolver`), not a global
dialer: `dcs.Plain{Dial: <x/net/proxy dialer>.DialContext}` for SOCKS/HTTP, or
`dcs.MTProxy(addr, secret, …)` for MTProxy. See `gotd/td`'s `telegram/dcs/example_test.go`
and `examples/mtproxy-connect`. Each account builds its own resolver, so per-account
proxies fall out naturally in Phase 7.
- **Sessions:** keep using `session.FileStorage` (per-account file); the user session lives
beside the existing bot session. Consider an export-to-string command for portability.
- **Multi-account isolation (Phase 7):** one `telegram.Client` per label, each with its own
session file, peer-cache, and `floodwait.Waiter` — no shared mutable state between
accounts, so they run concurrently without one blocking another. The runtime owns a
`map[label]*telegram.Client`; commands select by `--account`. Single-account remains the
default and costs nothing.
- **Pagination:** dialog and history listing must page (`page`/`page_size`); use
`query.GetDialogs`/`query.Messages` iterators.
- **Output stability:** define Go structs for each command's JSON result; never leak raw
MTProto types into the contract.

## Open questions

1. **Login UX for agents:** interactive `tg login` once to mint a session, then agents reuse
it headless — or support a fully headless code-delivery path? (Recommend: interactive
one-time login, headless reuse.)
2. **Keep bot support?** Recommend yes — some flows (channel posting) are fine as a bot — but
make user the default for the agent commands.
3. **Scope of v1:** propose shipping Phases 0–1 as the first milestone, since they cover the
exact asks (message self, list chats, mark read, reply, upload).
4. **Multi-account timing:** ship single-account first but reserve `--account` from Phase 0,
so full concurrent multi-account (Phase 7) is purely additive. Should fan-out
(`--account all`) be in scope, or only explicit single-label selection?
</content>
82 changes: 53 additions & 29 deletions cmd/tg/app.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,14 +3,13 @@ package main
import (
"context"
"crypto/md5" // #nosec
"errors"
"fmt"
"os"
"path/filepath"

"github.com/urfave/cli/v2"
"github.com/go-faster/errors"
"github.com/spf13/cobra"
"go.uber.org/zap"
"golang.org/x/xerrors"
"gopkg.in/yaml.v3"

"github.com/gotd/contrib/middleware/floodwait"
Expand All @@ -23,7 +22,20 @@ import (
"github.com/gotd/cli/internal/pretty"
)

// Config is the persisted CLI configuration.
type Config struct {
AppID int `yaml:"app_id"`
AppHash string `yaml:"app_hash"`
BotToken string `yaml:"bot_token"`
}

// app holds shared state and the values of the global (persistent) flags.
type app struct {
// Global flags, bound to the root command's persistent flags.
configPath string
debugInvoker bool
testServer bool

cfg Config
log *zap.Logger
waiter *floodwait.Waiter
Expand Down Expand Up @@ -56,22 +68,22 @@ func newApp() *app {
}
}

func (p *app) run(ctx context.Context, f func(ctx context.Context, api *tg.Client) error) error {
if p.debug {
p.opts.Middlewares = append(p.opts.Middlewares, pretty.Middleware())
func (a *app) run(ctx context.Context, f func(ctx context.Context, api *tg.Client) error) error {
if a.debug {
a.opts.Middlewares = append(a.opts.Middlewares, pretty.Middleware())
}

client := telegram.NewClient(p.cfg.AppID, p.cfg.AppHash, p.opts)
client := telegram.NewClient(a.cfg.AppID, a.cfg.AppHash, a.opts)

if err := p.waiter.Run(ctx, func(ctx context.Context) error {
if err := a.waiter.Run(ctx, func(ctx context.Context) error {
return client.Run(ctx, func(ctx context.Context) error {
s, err := client.Auth().Status(ctx)
if err != nil {
return xerrors.Errorf("check auth status: %w", err)
return errors.Wrap(err, "check auth status")
}
if !s.Authorized {
if _, err := client.Auth().Bot(ctx, p.cfg.BotToken); err != nil {
return xerrors.Errorf("check auth status: %w", err)
if _, err := client.Auth().Bot(ctx, a.cfg.BotToken); err != nil {
return errors.Wrap(err, "check auth status")
}
}
return f(ctx, tg.NewClient(client))
Expand All @@ -83,42 +95,54 @@ func (p *app) run(ctx context.Context, f func(ctx context.Context, api *tg.Clien
return nil
}

func (p *app) Before(c *cli.Context) error {
// HACK for init.
if c.Command.Name == "init" {
// skipConfigCommands are commands that must run without a loaded config/session.
func skipConfig(cmd *cobra.Command) bool {
for c := cmd; c != nil; c = c.Parent() {
switch c.Name() {
case "init", "docs", "completion", "help",
cobra.ShellCompRequestCmd, cobra.ShellCompNoDescRequestCmd:
return true
}
}
return false
}

// before loads config and prepares client options. It is wired as the root
// command's PersistentPreRunE, so it runs before every subcommand.
func (a *app) before(cmd *cobra.Command) error {
if skipConfig(cmd) {
return nil
}

cfgPath := c.String("config")
if cfgPath == "" {
return xerrors.Errorf("no config path provided")
if a.configPath == "" {
return errors.New("no config path provided")
}

data, err := os.ReadFile(cfgPath) // #nosec G304 // path provided via flag
data, err := os.ReadFile(a.configPath) // #nosec G304 // path provided via flag
if err != nil {
return err
}

if err := yaml.Unmarshal(data, &p.cfg); err != nil {
if err := yaml.Unmarshal(data, &a.cfg); err != nil {
return err
}

if p.cfg.BotToken == "" {
return xerrors.Errorf("no bot token provided")
if a.cfg.BotToken == "" {
return errors.New("no bot token provided")
}

// Default to same directory (near with config).
// Probably there is better way to handle this.
sessionName := fmt.Sprintf("gotd.session.%x.json", md5.Sum([]byte(p.cfg.BotToken))) // #nosec
p.opts.Logger = logzap.New(p.log.Named("tg"))
p.opts.SessionStorage = &session.FileStorage{
Path: filepath.Join(filepath.Dir(cfgPath), sessionName),
sessionName := fmt.Sprintf("gotd.session.%x.json", md5.Sum([]byte(a.cfg.BotToken))) // #nosec
a.opts.Logger = logzap.New(a.log.Named("tg"))
a.opts.SessionStorage = &session.FileStorage{
Path: filepath.Join(filepath.Dir(a.configPath), sessionName),
}
if c.Bool("test") {
p.opts.DCList = dcs.Test()
if a.testServer {
a.opts.DCList = dcs.Test()
}
if c.Bool("debug-invoker") {
p.debug = true
if a.debugInvoker {
a.debug = true
}

return nil
Expand Down
Loading
Loading