Spec: Weekly Investment Decision System¶
North Star¶
Every week, one scheduled workflow gathers portfolio, market, macro, and thesis data, then outputs:
- A recommendation per position/thesis: BUY_MORE, HOLD, TRIM, EXIT.
- Portfolio-level risk analysis with quantitative metrics (concentration, beta, Sharpe, drawdown, correlation).
- Performance measurement against benchmarks (SPY, QQQ).
- Income and tax awareness (dividends, realized gains, harvest candidates).
- Allocation drift detection with rebalancing suggestions.
- A prioritized action list with evidence and confidence.
This system is advisory-only (no auto-trading).
Primary Goals¶
- Make investment decision information easy to access and consistent.
- Tie recommendations to explicit, persistent thesis conditions and supporting evidence.
- Track total return: capital appreciation, dividend income, and tax impact.
- Measure portfolio performance against benchmarks over time.
- Detect allocation drift and concentration risk with policy-driven thresholds.
- Provide a repeatable weekly process with strong traceability and security.
- Keep implementation reusable across CLI, REST API, and event-based interfaces (Telegram, Slack, webhooks).
Core Weekly Workflow (Product-Level)¶
- Trigger weekly review run (scheduled and on-demand).
- Gather decision inputs (portfolio, market, macro, thesis, SEC filings, news, insider/institutional signals).
- Evaluate thesis state, portfolio risk state, and allocation drift.
- Compute performance vs. benchmarks, income summary, and tax snapshot.
- Produce recommendation set and prioritized actions.
- Persist review artifacts, accumulated state (SQLite), and audit records.
Required Outputs (Per Weekly Run)¶
- artifacts/reports/weekly/YYYY-MM-DD.json
- artifacts/reports/weekly/YYYY-MM-DD.md
- Structured event/audit entries (source + timestamp + lineage)
- Portfolio snapshot rows (SQLite) for longitudinal analysis
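As a rough illustration of how snapshot rows could support longitudinal analysis, the sketch below stores one row per position per run. The table name `portfolio_snapshots`, its columns, and the helper functions are assumptions made for this example, not part of the spec.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS portfolio_snapshots (
    run_id       TEXT NOT NULL,
    as_of        TEXT NOT NULL,      -- ISO date of the weekly run
    symbol       TEXT NOT NULL,
    quantity     REAL NOT NULL,
    market_value REAL NOT NULL,
    PRIMARY KEY (run_id, symbol)
)
"""

def save_snapshot(conn, run_id, as_of, positions):
    """Insert one row per position for a weekly run."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO portfolio_snapshots VALUES (?, ?, ?, ?, ?)",
        [(run_id, as_of, p["symbol"], p["quantity"], p["market_value"])
         for p in positions],
    )
    conn.commit()

def value_history(conn, symbol):
    """Return (as_of, market_value) pairs for one symbol, oldest first."""
    rows = conn.execute(
        "SELECT as_of, market_value FROM portfolio_snapshots "
        "WHERE symbol = ? ORDER BY as_of",
        (symbol,),
    )
    return rows.fetchall()

conn = sqlite3.connect(":memory:")
save_snapshot(conn, "run-1", "2024-01-07",
              [{"symbol": "AAPL", "quantity": 10, "market_value": 1800.0}])
save_snapshot(conn, "run-2", "2024-01-14",
              [{"symbol": "AAPL", "quantity": 10, "market_value": 1850.0}])
```

Keying rows by run and symbol keeps each weekly run immutable, so week-over-week queries reduce to a simple ordered select.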
Report sections¶
- Executive summary
- Performance vs. benchmark (portfolio TWR, alpha, Sharpe, max drawdown)
- Position/thesis decisions with scorecard
- Allocation drift (target vs. actual sector weights, rebalancing suggestions)
- Income summary (dividends received, upcoming ex-dates, portfolio yield)
- Tax snapshot (YTD realized gains ST/LT, harvestable unrealized losses)
- Evidence table (source link, timestamp, extracted claim, decision it supports)
- Portfolio risk snapshot and breaches (concentration, beta, correlation flags)
- Insider and institutional signals (notable buys/sells, ownership changes)
- Ranked action items
JSON schema contract (minimum top-level keys)¶
{
"metadata": { "run_id": "...", "date": "YYYY-MM-DD", "trigger": "scheduled|manual" },
"decisions": [],
"performance": {},
"income": {},
"allocation": {},
"tax": {},
"risk_snapshot": {},
"action_items": []
}
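A consumer of the report might check the contract before rendering. The following is a minimal sketch that uses only the keys listed above; the function name `validate_report` and the shape of its return value are assumptions.

```python
# Top-level keys taken from the schema contract above.
REQUIRED_KEYS = {
    "metadata", "decisions", "performance", "income",
    "allocation", "tax", "risk_snapshot", "action_items",
}
VALID_TRIGGERS = {"scheduled", "manual"}

def validate_report(report):
    """Return a list of problems; an empty list means the report passes."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(report))]
    meta = report.get("metadata", {})
    for field in ("run_id", "date", "trigger"):
        if field not in meta:
            problems.append(f"missing metadata.{field}")
    if "trigger" in meta and meta["trigger"] not in VALID_TRIGGERS:
        problems.append(f"invalid trigger: {meta['trigger']}")
    return problems
```

Returning a list rather than raising on the first problem lets a draft report surface every contract gap in one pass.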
Data Sources¶
- Portfolio and brokerage data (accounts, positions, orders, cost basis, holding periods, dividend history, realized gains/losses).
- Market data (index and sector performance, ticker quotes, earnings calendar, news, dividend yield/ex-dates).
- Macro/economic indicators (rates, inflation, labor market, VIX, sector rotation).
- SEC filings (10-K, 10-Q, 8-K, 13F, Form 4).
- Insider transactions and institutional ownership changes.
- Thesis definitions (per-symbol conditions, catalysts, exit criteria, and disconfirming signals).
- Web search (analyst coverage, breaking news via Brave Search API).
Investment Thesis Registry¶
Each tracked position has a persistent thesis file (investments/theses/<symbol>.yaml) that records:
- Entry rationale and date.
- Price target and catalyst timeline.
- Conditions for adding, trimming, and exiting.
- Disconfirming signals that would trigger review.
- Conviction tier (full, half, starter) for position sizing.
The thesis registry is the foundation for scoring alignment and generating recommendations. Without a persistent thesis, scoring becomes subjective and can drift from one session to the next.
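One way to picture a loaded thesis file is as a typed record. The sketch below mirrors the fields listed above; the class name and every field name are hypothetical, since the spec does not fix the YAML key names.

```python
from dataclasses import dataclass, field
from datetime import date

CONVICTION_TIERS = ("full", "half", "starter")  # tiers named in the spec

@dataclass
class Thesis:
    """In-memory view of one thesis file; field names are hypothetical."""
    symbol: str
    entry_date: date
    entry_rationale: str
    price_target: float
    conviction: str
    catalysts: list = field(default_factory=list)
    add_conditions: list = field(default_factory=list)
    trim_conditions: list = field(default_factory=list)
    exit_conditions: list = field(default_factory=list)
    disconfirming_signals: list = field(default_factory=list)

    def __post_init__(self):
        # Reject tiers the sizing rules would not know how to handle.
        if self.conviction not in CONVICTION_TIERS:
            raise ValueError(f"unknown conviction tier: {self.conviction}")
```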
Intelligence Model¶
Key assumption: The BOF CLI does not call LLM APIs. All AI reasoning happens in the calling agent session (e.g., Claude Code, a custom Agent SDK app, or a human reading the output). The CLI is a structured data tool — it fetches, computes, stores, and renders. The agent interprets, evaluates, and writes back conclusions via CLI commands.
Layers¶
- Layer 1 (CLI — automated): ingestion, rule-based evaluation, deterministic computation, draft report generation.
- Layer 2 (agent — external): the calling agent reads CLI outputs (rubric prompts, SEC filings, thesis conditions, gathered context), reasons over them, and writes structured results back via CLI commands (scores, evidence, annotations, narrative).
- Layer 3 (human — interactive): human-in-the-loop review and final sign-off.
Agent workflow (weekly review)¶
1. bof review weekly --dry-run — CLI gathers all data sources, renders rubric evaluation prompts per symbol.
2. Agent reads prompts and gathered context, evaluates each symbol against each rubric.
3. bof review score SYMBOL --metric M --score N --rationale "..." — agent records scores.
4. bof sec read SYMBOL --type 8-K — agent reads recent filings, extracts material facts.
5. bof thesis check SYMBOL — CLI outputs thesis conditions vs. current state; agent evaluates qualitative conditions.
6. bof review evidence SYMBOL --source-type news --claim "..." --source-url "..." --attribution "..." — agent records structured evidence.
7. bof review annotate --run-id ID --field executive_summary --value "..." — agent writes narrative sections.
8. bof review finalize --run-id ID — CLI computes composites, applies action thresholds, produces final report.
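An agent harness could wrap these commands in small helpers. The sketch below builds argument lists for the `bof review score` and `bof review finalize` commands using only the flags shown in the workflow; the helper names and the metric name used in the usage note are hypothetical.

```python
import subprocess

def score_cmd(symbol, metric, score, rationale):
    """argv for `bof review score`; flags are copied from the workflow."""
    return ["bof", "review", "score", symbol,
            "--metric", metric, "--score", str(score),
            "--rationale", rationale]

def finalize_cmd(run_id):
    """argv for `bof review finalize`."""
    return ["bof", "review", "finalize", "--run-id", run_id]

def run(argv):
    """Run one CLI step and return stdout; raises if the exit code is non-zero."""
    done = subprocess.run(argv, check=True, capture_output=True, text=True)
    return done.stdout
```

For example, `run(score_cmd("AAPL", "thesis_alignment", 4, "Services growth intact"))` would record one score, assuming a rubric metric with that name exists. Passing arguments as a list avoids shell quoting problems in free-text rationales.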
Design implications¶
- No LLM SDK dependency in the BOF codebase.
- Any agent works — the CLI is agent-agnostic. A Claude Code session, a custom agent, or a human can drive the same workflow.
- Determinism preserved — composite scores, action thresholds, and risk computations are always rule-based. Agent judgment enters only through score assignment, evidence extraction, and narrative text.
- Fallback is natural — if no agent evaluates rubrics, the CLI still produces a data-complete draft report with empty scores. A human can fill them in.
Scoring rubrics¶
Rubric YAML files in investments/scoring/ define transparent, versioned evaluation dimensions with scales, anchor descriptions, and evaluation prompt templates. The CLI renders these prompts with gathered context; the agent evaluates them. Composite math (weighted average, action thresholds) is deterministic within the CLI.
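The deterministic composite step might look like the sketch below. The 1-5 scale, the threshold values, and the choice to skip unscored metrics and renormalize the remaining weights are all assumptions for illustration; the spec only states that the math is a weighted average with action thresholds.

```python
def composite_score(scores, weights):
    """Weighted average of rubric scores; metrics without a score are skipped."""
    scored = [m for m in weights if scores.get(m) is not None]
    total_weight = sum(weights[m] for m in scored)
    if total_weight == 0:
        return None  # nothing scored yet: the draft keeps an empty composite
    return sum(scores[m] * weights[m] for m in scored) / total_weight

# Hypothetical thresholds on a 1-5 scale, checked from highest to lowest.
ACTION_THRESHOLDS = [(4.0, "BUY_MORE"), (3.0, "HOLD"), (2.0, "TRIM")]

def action_for(composite):
    """Map a composite score to an action; below every floor means EXIT."""
    if composite is None:
        return None
    for floor, action in ACTION_THRESHOLDS:
        if composite >= floor:
            return action
    return "EXIT"
```

Returning `None` when nothing is scored matches the fallback described earlier, where the CLI still produces a draft report with empty scores.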
Risk and Portfolio Analytics¶
Quantitative metrics¶
- Position concentration: single name %, top-5 %, Herfindahl-Hirschman Index (HHI).
- Sector concentration: actual vs. target allocation drift.
- Portfolio beta: weighted-average beta to SPY.
- Correlation flags: pairs of holdings with rolling correlation above threshold.
- Max drawdown: worst peak-to-trough over trailing window.
- Sharpe ratio: excess return per unit of volatility.
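The metrics above have standard textbook definitions, sketched here in plain Python. The function names, the weekly annualization factor of 52, and the use of the sample standard deviation are assumptions rather than choices stated in the spec.

```python
import math

def hhi(weights):
    """Herfindahl-Hirschman Index: sum of squared weights (fractions of 1)."""
    return sum(w * w for w in weights)

def top_n_share(weights, n=5):
    """Combined weight of the n largest positions."""
    return sum(sorted(weights, reverse=True)[:n])

def portfolio_beta(weights, betas):
    """Weighted-average beta; both dicts are keyed by symbol."""
    return sum(weights[s] * betas[s] for s in weights)

def max_drawdown(values):
    """Worst peak-to-trough decline of a value series, as a negative fraction."""
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        worst = min(worst, v / peak - 1.0)
    return worst

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=52):
    """Annualized Sharpe from periodic returns; risk_free is per period.

    Needs at least two returns with non-zero variance.
    """
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)
```

An equal-weight portfolio of N names has an HHI of 1/N, which gives a quick sanity check for the concentration number.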
Qualitative signals¶
- Insider transaction direction (net buy/sell) for held symbols.
- Institutional ownership trend (increasing/decreasing).
Policy-driven thresholds¶
Risk policy (investments/risk-policy.yaml) defines breach thresholds and severity levels. Breaches surface as high-priority action items.
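A breach check against the policy could be as simple as the sketch below. The dictionary stands in for the parsed YAML; its keys, limits, and severity labels are hypothetical, because the spec does not define the file's layout.

```python
# Stand-in for the parsed risk policy; keys and limits are hypothetical.
POLICY = {
    "single_position_pct": {"max": 0.10, "severity": "high"},
    "portfolio_beta": {"max": 1.30, "severity": "medium"},
}

def find_breaches(metrics, policy):
    """Return one breach record per metric that exceeds its policy maximum."""
    breaches = []
    for name, rule in policy.items():
        value = metrics.get(name)
        if value is not None and value > rule["max"]:
            breaches.append({
                "metric": name,
                "value": value,
                "limit": rule["max"],
                "severity": rule["severity"],
            })
    return breaches
```

Each record carries the observed value and the limit, so the resulting action item can show its own evidence.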
Performance Measurement¶
- Time-weighted return (TWR) as the standard for portfolio performance.
- Benchmark comparison (SPY, QQQ) with alpha computation.
- Historical trend analysis across weekly reports (week-over-week return, score trends).
- Accumulated state stored in SQLite for longitudinal queries.
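Time-weighted return chains sub-period returns so that deposits and withdrawals do not distort the result. The sketch below assumes each external cash flow lands at the start of its sub-period, and it treats alpha as simple excess return over the benchmark; a beta-adjusted alpha would need more inputs. Function names are illustrative.

```python
def time_weighted_return(periods):
    """Chain sub-period returns; each period is (start_value, end_value, flow).

    flow is the net external cash added at the start of the period,
    so the period growth factor is end / (start + flow).
    """
    growth = 1.0
    for start, end, flow in periods:
        growth *= end / (start + flow)
    return growth - 1.0

def excess_return(portfolio_return, benchmark_return):
    """Simple alpha: portfolio return minus benchmark return, same window."""
    return portfolio_return - benchmark_return
```

For instance, a portfolio that grows from 100 to 110, then receives a 10 deposit and ends at 126, has growth factors of 1.10 and 1.05, for a TWR of about 15.5%.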
Income and Tax Tracking¶
Income¶
- Dividend yield, ex-dates, and payout history per position.
- Income received (realized cash flow) and yield-on-cost.
- Portfolio-level yield summary.
Tax awareness¶
- Cost basis with tax-lot level visibility (from brokerage data).
- Realized gains/losses with short-term vs. long-term classification.
- Tax-loss harvesting signals (unrealized losses with wash-sale window check).
- Tax-aware rebalancing (prefer selling long-term lots, harvest short-term losses).
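The classification and wash-sale checks can be sketched with date arithmetic. This example assumes US rules of thumb (long-term means held more than one year; the wash-sale window is 30 days on either side of the sale) and is not tax advice. The lot dictionary keys and function names are hypothetical.

```python
from datetime import date, timedelta

def is_long_term(acquired, sold):
    """Long-term if sold after the one-year anniversary of acquisition."""
    try:
        anniversary = acquired.replace(year=acquired.year + 1)
    except ValueError:  # acquired on Feb 29; next year has no such day
        anniversary = acquired.replace(year=acquired.year + 1, day=28)
    return sold > anniversary

def wash_sale_risk(sale_date, purchase_dates, window_days=30):
    """True if any other purchase falls within the window around the sale."""
    window = timedelta(days=window_days)
    return any(abs(p - sale_date) <= window for p in purchase_dates)

def harvest_candidates(lots, price, today, other_buys):
    """Lots of one symbol with an unrealized loss, tagged with term and risk."""
    out = []
    for lot in lots:
        pnl = (price - lot["cost_per_share"]) * lot["shares"]
        if pnl < 0:
            out.append({
                "acquired": lot["acquired"],
                "unrealized": pnl,
                "term": "LT" if is_long_term(lot["acquired"], today) else "ST",
                "wash_sale_risk": wash_sale_risk(today, other_buys),
            })
    return out
```

Here `other_buys` is meant to hold purchase dates of the same security other than the lot being sold.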
Asset Allocation and Rebalancing¶
- Target allocations defined in investments/allocation.yaml (sector weights, max single-position concentration, cash reserve target).
- Drift detection: actual vs. target with threshold alerts.
- Rebalancing suggestions with tax-aware trade sizing.
- Position sizing tiers by conviction level (full, half, starter) linked to thesis definitions.
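Drift detection reduces to comparing two weight maps. In the sketch below, the 5-point threshold, the function names, and the sector labels in the usage note are assumptions; the trade sizing is a naive return-to-target amount and ignores the tax-lot selection described above.

```python
def allocation_drift(actual, target, threshold=0.05):
    """Return {sector: drift} where |actual - target| exceeds the threshold."""
    flagged = {}
    for sector in sorted(set(actual) | set(target)):
        drift = actual.get(sector, 0.0) - target.get(sector, 0.0)
        if abs(drift) > threshold:
            flagged[sector] = round(drift, 4)
    return flagged

def rebalance_trades(flagged, portfolio_value):
    """Dollar amount to buy (positive) or sell (negative) to return to target."""
    return {s: round(-d * portfolio_value, 2) for s, d in flagged.items()}
```

With actual weights of 40% tech, 12% health, and 48% energy against targets of 30%, 15%, and 55%, tech (+10 points) and energy (-7 points) are flagged while health stays inside the band.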
Alerts and Threshold Monitoring¶
Portfolio-level event predicates wired into the existing event service:
- Price alerts: symbol crosses user-defined threshold.
- Allocation drift alerts: drift exceeds policy threshold.
- Earnings proximity alerts: N days before earnings for held symbols.
- Dividend ex-date alerts: before ex-date for income positions.
Configuration: investments/alerts.yaml.
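The alert types above can be expressed as small pure predicates, which keeps them easy to test independently of the event service. The function names and the inclusive handling of boundary days are assumptions.

```python
from datetime import date

def price_cross(prev_price, price, threshold):
    """True when the price moves from one side of the threshold to the other."""
    return (prev_price < threshold <= price) or (prev_price > threshold >= price)

def drift_alert(drift, policy_threshold):
    """True when absolute allocation drift exceeds the policy threshold."""
    return abs(drift) > policy_threshold

def event_soon(event_date, today, n_days):
    """True from N days before the event through the event day itself.

    Works for both earnings dates and dividend ex-dates.
    """
    return 0 <= (event_date - today).days <= n_days
```

Comparing the previous and current price, rather than only the current one, makes the price alert fire once per crossing instead of on every tick past the threshold.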
Service Mode¶
The service mode (bof service run) is a long-running harness that orchestrates agent sessions and human interaction without manual invocation.
Components¶
- Event bus and router — async pub/sub (existing) routes triggers to handlers.
- Cron scheduler — triggers workflows on schedule (e.g., weekly review every Sunday evening). The cron tick is the entry point; the handler spawns a headless agent session to execute the full workflow.
- Telegram interface — bidirectional conversation with a human. The bot receives messages, routes them to an agent session for interpretation, and sends structured replies back. Supports interactive workflows: "show me AAPL thesis", "what changed this week?", "approve the review".
- Agent orchestrator — spawns headless agent sessions (e.g., claude -p) to execute multi-step workflows. Manages session lifecycle, timeouts, and output routing. The agent uses bof CLI commands as tools, reads their output, reasons, and writes back results.
- Memory store — tiered memory system that gives agents continuity across sessions.
Agent orchestration¶
When a trigger fires (cron tick, Telegram message), the service:
- Assembles context: trigger payload, relevant memory tiers, current portfolio state.
- Spawns a headless agent session with a structured prompt and access to bof CLI commands.
- The agent executes the workflow (e.g., weekly review steps 1–8), calling bof commands as needed.
- Captures agent output and routes it: report artifacts to disk, summaries to Telegram, events to audit log.
- Updates memory store with session outcomes.
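The context assembly and spawn steps could be sketched as follows. The prompt layout, function names, and 30-minute default timeout are assumptions, as is passing the prompt as the final argument to the agent command; the default command mirrors the `claude -p` example mentioned for the agent orchestrator.

```python
import json
import subprocess

def build_prompt(trigger, memory, policy_text):
    """Assemble a structured prompt from policy, memory, and trigger payload."""
    return "\n\n".join([
        "## Investment policy\n" + policy_text,
        "## Long-term memory\n" + "\n".join(f"- {m}" for m in memory),
        "## Trigger\n" + json.dumps(trigger, sort_keys=True),
    ])

def spawn_agent(prompt, agent_cmd=("claude", "-p"), timeout_s=1800):
    """Run one headless session and return (exit_code, stdout).

    subprocess.TimeoutExpired propagates if the session exceeds timeout_s,
    so the caller decides how to report or retry.
    """
    proc = subprocess.run([*agent_cmd, prompt],
                          capture_output=True, text=True, timeout=timeout_s)
    return proc.returncode, proc.stdout
```

Keeping the agent command a parameter fits the agent-agnostic design: the same harness can launch any headless agent that accepts a prompt argument.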
Telegram as conversational interface¶
The Telegram bot supports two interaction modes:
- Proactive — service-initiated messages: weekly review summary, alert notifications, earnings reminders.
- Reactive — human-initiated queries: "how is NVDA doing?", "what's my allocation drift?", "run a review now". The service spawns an agent session to interpret the query, gather data via bof commands, and compose a reply.
Memory System¶
Agents need continuity across sessions. The memory system is tiered by volatility and purpose.
Short-term memory¶
- Scope: per-session, ephemeral.
- Content: current review context, in-progress conversation state, Telegram thread context.
- Storage: passed to the agent session as prompt context. Not persisted after session ends.
- Example: "User just asked about AAPL. I already fetched the quote and thesis — don't re-fetch."
Long-term memory¶
- Scope: across sessions, durable.
- Content: past review outcomes, thesis evolution over time, what worked and what didn't, patterns observed ("every time we held through earnings on X, it worked out"), cross-report trend observations.
- Storage: SQLite tables or structured files in artifacts/memory/. Agent reads relevant entries at session start; writes new observations at session end.
- Example: "Last week's review flagged TSLA concentration at 14%. User chose to hold. It's now 16%."
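If the SQLite option is chosen, the read-at-start and write-at-end pattern might look like this sketch. The `observations` table, its columns, and the helper names are hypothetical.

```python
import sqlite3

def open_memory(path=":memory:"):
    """Open the memory store and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS observations (
            id          INTEGER PRIMARY KEY,
            recorded_at TEXT NOT NULL,
            symbol      TEXT,
            note        TEXT NOT NULL
        )
    """)
    return conn

def remember(conn, recorded_at, note, symbol=None):
    """Write one observation at session end."""
    conn.execute(
        "INSERT INTO observations (recorded_at, symbol, note) VALUES (?, ?, ?)",
        (recorded_at, symbol, note),
    )
    conn.commit()

def recall(conn, symbol, limit=5):
    """Read the most recent observations for a symbol at session start."""
    rows = conn.execute(
        "SELECT recorded_at, note FROM observations "
        "WHERE symbol = ? ORDER BY recorded_at DESC, id DESC LIMIT ?",
        (symbol, limit),
    )
    return rows.fetchall()
```

Capping recall with a limit keeps the prompt context bounded as observations accumulate over many weeks.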
Values and beliefs (investment policy)¶
- Scope: rarely changes, human-edited.
- Content: investment philosophy, risk tolerance, time horizon, sector convictions, behavioral guardrails. This is the equivalent of an Investment Policy Statement (IPS).
- Storage: investments/policy.yaml — version-controlled, human-authored.
- Example constraints:
- "I am a long-term investor (5+ year horizon)."
- "I don't short stocks."
- "Maximum 10% in any single position."
- "I believe in dollar-cost averaging into conviction positions."
- "I prefer companies with growing dividends."
- How agents use it: the policy is loaded into every agent session as foundational context. It constrains recommendations — the agent should not suggest actions that violate the policy without explicitly flagging the conflict.
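A conflict check for the position-cap constraint could be sketched as below. The function name and message format are assumptions; the 10% default comes from the example constraint above.

```python
def policy_conflicts(recommendations, weights, max_position=0.10):
    """Flag BUY_MORE recommendations on positions already at or above the cap.

    recommendations maps symbol -> action; weights maps symbol -> fraction
    of portfolio value.
    """
    conflicts = []
    for symbol, action in recommendations.items():
        weight = weights.get(symbol, 0.0)
        if action == "BUY_MORE" and weight >= max_position:
            conflicts.append(
                f"{symbol}: BUY_MORE conflicts with the {max_position:.0%} "
                f"position cap (currently {weight:.0%})"
            )
    return conflicts
```

The check flags rather than blocks, matching the requirement that a recommendation violating the policy must carry an explicit conflict notice.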
Delivery Interfaces¶
- CLI (primary) — interactive and scriptable.
- Service mode — long-running harness with cron triggers, agent orchestration, and Telegram interface.
Non-Goals (Initial)¶
- Autonomous trade execution.
- Intraday high-frequency decisions.
- Complex ML model-based signal generation.
- Full budget/expense tracking.
- Options or fixed-income instrument analysis.
Acceptance Criteria (System-Level)¶
- Weekly job runs on schedule and on demand.
- Automated pipeline produces a complete draft report covering all report sections.
- Every tracked position has a persistent thesis; recommendations map to thesis conditions with evidence.
- Portfolio risk report includes quantitative metrics (beta, Sharpe, drawdown, concentration) and identifies threshold breaches.
- Performance section shows portfolio return vs. benchmark with alpha.
- Income section shows dividends received, upcoming ex-dates, and portfolio yield.
- Tax snapshot flags harvestable losses with correct short-term/long-term classification.
- Allocation section shows target vs. actual with drift flags and rebalancing suggestions.
- Recommendations include evidence and confidence scores, and are priority-ranked.
- Outputs are consumable across multiple interfaces.
- Service mode: cron trigger spawns a headless agent that completes a full weekly review without human intervention.
- Service mode: Telegram bot supports bidirectional conversation — human can query portfolio state and receive agent-composed replies.
- Service mode: agent sessions load investment policy and long-term memory as foundational context.
Implementation Reference¶
Detailed implementation decisions, source adapters, storage architecture, event design, and the phased delivery plan are maintained in the Implementation Plan.