
Spec: Weekly Investment Decision System

North Star

Every week, run one scheduled workflow that gathers portfolio, market, macro, and thesis data, then outputs:

  1. A recommendation per position/thesis: BUY_MORE, HOLD, TRIM, EXIT.
  2. Portfolio-level risk analysis with quantitative metrics (concentration, beta, Sharpe, drawdown, correlation).
  3. Performance measurement against benchmarks (SPY, QQQ).
  4. Income and tax awareness (dividends, realized gains, harvest candidates).
  5. Allocation drift detection with rebalancing suggestions.
  6. A prioritized action list with evidence and confidence.

This system is advisory-only (no auto-trading).

Primary Goals

  1. Make investment decision information easy to access and consistent.
  2. Tie recommendations to explicit, persistent thesis conditions and supporting evidence.
  3. Track total return: capital appreciation, dividend income, and tax impact.
  4. Measure portfolio performance against benchmarks over time.
  5. Detect allocation drift and concentration risk with policy-driven thresholds.
  6. Provide a repeatable weekly process with strong traceability and security.
  7. Keep implementation reusable across CLI, REST-API, and event-based interfaces (Telegram, Slack, webhooks).

Core Weekly Workflow (Product-Level)

  1. Trigger weekly review run (scheduled and on-demand).
  2. Gather decision inputs (portfolio, market, macro, thesis, SEC filings, news, insider/institutional signals).
  3. Evaluate thesis state, portfolio risk state, and allocation drift.
  4. Compute performance vs. benchmarks, income summary, and tax snapshot.
  5. Produce recommendation set and prioritized actions.
  6. Persist review artifacts, accumulated state (SQLite), and audit records.

Required Outputs (Per Weekly Run)

  1. artifacts/reports/weekly/YYYY-MM-DD.json
  2. artifacts/reports/weekly/YYYY-MM-DD.md
  3. Structured event/audit entries (source + timestamp + lineage)
  4. Portfolio snapshot rows (SQLite) for longitudinal analysis
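
As a rough illustration of the snapshot rows in output 4, the sketch below writes one row per position per run and reads back a longitudinal total. The table name `portfolio_snapshot`, its columns, and both function names are assumptions made for this example; the spec does not define the schema.

```python
import sqlite3

def save_snapshot(conn, run_date, rows):
    """Store one row per (run_date, symbol). Schema is hypothetical."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS portfolio_snapshot ("
        " run_date TEXT NOT NULL,"
        " symbol TEXT NOT NULL,"
        " quantity REAL NOT NULL,"
        " market_value REAL NOT NULL,"
        " PRIMARY KEY (run_date, symbol))"
    )
    # INSERT OR REPLACE keeps a re-run of the same week idempotent.
    conn.executemany(
        "INSERT OR REPLACE INTO portfolio_snapshot VALUES (?, ?, ?, ?)",
        [(run_date, symbol, qty, value) for symbol, qty, value in rows],
    )
    conn.commit()

def total_value_by_date(conn):
    """Return (run_date, total market value) pairs in date order."""
    return conn.execute(
        "SELECT run_date, SUM(market_value) FROM portfolio_snapshot"
        " GROUP BY run_date ORDER BY run_date"
    ).fetchall()

conn = sqlite3.connect(":memory:")
save_snapshot(conn, "2025-01-05", [("AAPL", 10, 100.0), ("MSFT", 5, 200.0)])
save_snapshot(conn, "2025-01-12", [("AAPL", 10, 110.0), ("MSFT", 5, 205.0)])
history = total_value_by_date(conn)
```

Keying rows on the run date and symbol is one way to let an on-demand re-run overwrite the scheduled run for the same week rather than duplicate it.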

Report sections

  1. Executive summary
  2. Performance vs. benchmark (portfolio TWR, alpha, Sharpe, max drawdown)
  3. Position/thesis decisions with scorecard
  4. Allocation drift (target vs. actual sector weights, rebalancing suggestions)
  5. Income summary (dividends received, upcoming ex-dates, portfolio yield)
  6. Tax snapshot (YTD realized gains ST/LT, harvestable unrealized losses)
  7. Evidence table (source link, timestamp, extracted claim, decision it supports)
  8. Portfolio risk snapshot and breaches (concentration, beta, correlation flags)
  9. Insider and institutional signals (notable buys/sells, ownership changes)
  10. Ranked action items

JSON schema contract (minimum top-level keys)

{
  "metadata": { "run_id": "...", "date": "YYYY-MM-DD", "trigger": "scheduled|manual" },
  "decisions": [],
  "performance": {},
  "income": {},
  "allocation": {},
  "tax": {},
  "risk_snapshot": {},
  "action_items": []
}
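
A consumer of the weekly JSON report could check the contract above with a few lines of Python. This is a minimal sketch: the key names and the two trigger values come from the schema, while `REQUIRED_KEYS` and `validate_report` are hypothetical names introduced here.

```python
import json

# Minimum top-level keys and their JSON container types, per the contract.
REQUIRED_KEYS = {
    "metadata": dict,
    "decisions": list,
    "performance": dict,
    "income": dict,
    "allocation": dict,
    "tax": dict,
    "risk_snapshot": dict,
    "action_items": list,
}

def validate_report(report):
    """Return a list of problems; an empty list means the contract is met."""
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in report:
            problems.append(f"missing key: {key}")
        elif not isinstance(report[key], expected):
            kind = "object" if expected is dict else "array"
            problems.append(f"{key} must be a JSON {kind}")
    metadata = report.get("metadata")
    if isinstance(metadata, dict) and metadata.get("trigger") not in ("scheduled", "manual"):
        problems.append("metadata.trigger must be 'scheduled' or 'manual'")
    return problems

sample = json.loads("""
{
  "metadata": {"run_id": "r1", "date": "2025-01-05", "trigger": "manual"},
  "decisions": [], "performance": {}, "income": {}, "allocation": {},
  "tax": {}, "risk_snapshot": {}, "action_items": []
}
""")
problems = validate_report(sample)
```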

Data Sources

  1. Portfolio and brokerage data (accounts, positions, orders, cost basis, holding periods, dividend history, realized gains/losses).
  2. Market data (index and sector performance, ticker quotes, earnings calendar, news, dividend yield/ex-dates).
  3. Macro/economic indicators (rates, inflation, labor market, VIX, sector rotation).
  4. SEC filings (10-K, 10-Q, 8-K, 13F, Form 4).
  5. Insider transactions and institutional ownership changes.
  6. Thesis definitions (per-symbol conditions, catalysts, exit criteria, and disconfirming signals).
  7. Web search (analyst coverage, breaking news via Brave Search API).

Investment Thesis Registry

Each tracked position has a persistent thesis file (investments/theses/<symbol>.yaml) that records:

  • Entry rationale and date.
  • Price target and catalyst timeline.
  • Conditions for adding, trimming, and exiting.
  • Disconfirming signals that would trigger review.
  • Conviction tier (full, half, starter) for position sizing.

The thesis registry is the foundation for scoring alignment and generating recommendations. Without a persistent thesis, scoring is subjective session-to-session.

Intelligence Model

Key assumption: The BOF CLI does not call LLM APIs. All AI reasoning happens in the calling agent session (e.g., Claude Code, a custom Agent SDK app, or a human reading the output). The CLI is a structured data tool — it fetches, computes, stores, and renders. The agent interprets, evaluates, and writes back conclusions via CLI commands.

Layers

  • Layer 1 (CLI, automated): ingestion, rule-based evaluation, deterministic computation, draft report generation.
  • Layer 2 (agent, external): the calling agent reads CLI outputs (rubric prompts, SEC filings, thesis conditions, gathered context), reasons over them, and writes structured results back via CLI commands (scores, evidence, annotations, narrative).
  • Layer 3 (human, interactive): human-in-the-loop review and final sign-off.

Agent workflow (weekly review)

  1. bof review weekly --dry-run — CLI gathers all data sources, renders rubric evaluation prompts per symbol.
  2. Agent reads prompts and gathered context, evaluates each symbol against each rubric.
  3. bof review score SYMBOL --metric M --score N --rationale "..." — agent records scores.
  4. bof sec read SYMBOL --type 8-K — agent reads recent filings, extracts material facts.
  5. bof thesis check SYMBOL — CLI outputs thesis conditions vs. current state; agent evaluates qualitative conditions.
  6. bof review evidence SYMBOL --source-type news --claim "..." --source-url "..." --attribution "..." — agent records structured evidence.
  7. bof review annotate --run-id ID --field executive_summary --value "..." — agent writes narrative sections.
  8. bof review finalize --run-id ID — CLI computes composites, applies action thresholds, produces final report.
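
An agent harness that drives step 3 from code might build the command as an argument list rather than a shell string, so that free-text rationales need no manual quoting. The sketch below only assembles the command shown in step 3; the helper name `score_command` and the metric name `moat` are hypothetical.

```python
import shlex

def score_command(symbol, metric, score, rationale):
    """Build the argv for `bof review score` using the flags from step 3."""
    return [
        "bof", "review", "score", symbol,
        "--metric", metric,
        "--score", str(score),
        "--rationale", rationale,
    ]

argv = score_command("AAPL", "moat", 4, "Stable margins")
# shlex.join renders the argv as a copy-pasteable shell line for audit logs.
printable = shlex.join(argv)
```

Passing `argv` directly to a process launcher avoids shell interpolation of agent-written text, which fits the spec's traceability and security goals.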

Design implications

  • No LLM SDK dependency in the BOF codebase.
  • Any agent works — the CLI is agent-agnostic. A Claude Code session, a custom agent, or a human can drive the same workflow.
  • Determinism preserved — composite scores, action thresholds, and risk computations are always rule-based. Agent judgment enters only through score assignment, evidence extraction, and narrative text.
  • Fallback is natural — if no agent evaluates rubrics, the CLI still produces a data-complete draft report with empty scores. A human can fill them in.

Scoring rubrics

Rubric YAML files in investments/scoring/ define transparent, versioned evaluation dimensions with scales, anchor descriptions, and evaluation prompt templates. The CLI renders these prompts with gathered context; the agent evaluates them. Composite math (weighted average, action thresholds) is deterministic within the CLI.
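
The deterministic composite step could look roughly like the sketch below: a weighted average of agent-assigned scores mapped to one of the four recommendation labels. The 1-to-5 scale, the metric names, the weights, and the cut-offs are assumptions for illustration; the actual values would live in the rubric files.

```python
# Hypothetical cut-offs on an assumed 1-5 scale, checked from highest to lowest.
ACTION_THRESHOLDS = [(4.0, "BUY_MORE"), (3.0, "HOLD"), (2.0, "TRIM")]

def composite_score(scores, weights):
    """Weighted average over the metrics that actually received a score."""
    total_weight = sum(weights[metric] for metric in scores)
    if total_weight == 0:
        raise ValueError("no weighted scores to combine")
    return sum(scores[m] * weights[m] for m in scores) / total_weight

def action_for(score):
    """Map a composite score to a recommendation label."""
    for floor, action in ACTION_THRESHOLDS:
        if score >= floor:
            return action
    return "EXIT"

weights = {"thesis_alignment": 3.0, "valuation": 1.0}
scores = {"thesis_alignment": 4.0, "valuation": 2.0}
composite = composite_score(scores, weights)  # (4*3 + 2*1) / 4
action = action_for(composite)
```

Normalizing by the weights of scored metrics only is one possible way to handle a rubric the agent skipped; the spec does not say how missing scores are treated.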

Risk and Portfolio Analytics

Quantitative metrics

  • Position concentration: single-name %, top-5 %, Herfindahl-Hirschman Index (HHI).
  • Sector concentration: actual vs. target allocation drift.
  • Portfolio beta: weighted-average beta to SPY.
  • Correlation flags: pairs of holdings with rolling correlation above threshold.
  • Max drawdown: worst peak-to-trough over trailing window.
  • Sharpe ratio: excess return per unit of volatility.
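
The metrics listed above have standard textbook definitions, sketched here in plain Python. The function names, the weekly annualization factor of 52, and the zero default risk-free rate are assumptions for this example, not choices stated in the spec.

```python
import math
import statistics

def hhi(weights):
    """Herfindahl-Hirschman Index on fractional weights (1.0 = one position)."""
    return sum(w * w for w in weights)

def portfolio_beta(weights, betas):
    """Weighted-average beta; both dicts are keyed by symbol."""
    return sum(weights[s] * betas[s] for s in weights)

def max_drawdown(values):
    """Worst peak-to-trough decline as a negative fraction (0.0 if none)."""
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        worst = min(worst, v / peak - 1.0)
    return worst

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=52):
    """Annualized Sharpe ratio from per-period returns (weekly by default).

    Needs at least two returns with nonzero spread, otherwise this raises.
    """
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)

concentration = hhi([0.5, 0.5])
drawdown = max_drawdown([100, 120, 90, 130])
```

In the drawdown example the series peaks at 120 and falls to 90 before recovering, so the worst decline is 25%.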

Qualitative signals

  • Insider transaction direction (net buy/sell) for held symbols.
  • Institutional ownership trend (increasing/decreasing).

Policy-driven thresholds

Risk policy (investments/risk-policy.yaml) defines breach thresholds and severity levels. Breaches surface as high-priority action items.
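
A breach check against such a policy could be as small as the sketch below. The policy shape (a `max` limit and a `severity` per metric), the metric keys, and the three severity names are all assumptions; the real structure is whatever investments/risk-policy.yaml defines.

```python
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}

def find_breaches(metrics, policy):
    """Return breached limits, most severe first."""
    breaches = []
    for name, rule in policy.items():
        value = metrics.get(name)
        if value is not None and value > rule["max"]:
            breaches.append({
                "metric": name,
                "value": value,
                "limit": rule["max"],
                "severity": rule["severity"],
            })
    return sorted(breaches, key=lambda b: SEVERITY_ORDER[b["severity"]])

# Hypothetical policy, as it might look after parsing the YAML file.
policy = {
    "portfolio_beta": {"max": 1.3, "severity": "medium"},
    "single_name_pct": {"max": 0.10, "severity": "high"},
    "top5_pct": {"max": 0.60, "severity": "low"},
}
metrics = {"portfolio_beta": 1.5, "single_name_pct": 0.16, "top5_pct": 0.55}
breaches = find_breaches(metrics, policy)
```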

Performance Measurement

  • Time-weighted return (TWR) as the standard for portfolio performance.
  • Benchmark comparison (SPY, QQQ) with alpha computation.
  • Historical trend analysis across weekly reports (week-over-week return, score trends).
  • Accumulated state stored in SQLite for longitudinal queries.
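
Time-weighted return chains the growth of each sub-period between external cash flows, so deposits and withdrawals do not distort the result. The sketch below assumes each sub-period is given as a (start value, end value before the next flow) pair, and it treats alpha as simple excess return over the benchmark; the spec does not say whether a risk-adjusted alpha is intended.

```python
def time_weighted_return(subperiods):
    """Chain sub-period growth factors; each item is (start_value, end_value)."""
    growth = 1.0
    for start_value, end_value in subperiods:
        growth *= end_value / start_value
    return growth - 1.0

def excess_return(portfolio_return, benchmark_return):
    """Simple alpha: portfolio return minus benchmark return (an assumption)."""
    return portfolio_return - benchmark_return

# 100 grows to 110, then a 50 deposit makes 160, which grows to 176.
twr = time_weighted_return([(100.0, 110.0), (160.0, 176.0)])
alpha_vs_spy = excess_return(twr, 0.15)  # 0.15 is a made-up benchmark return
```

Both sub-periods grow by 10%, so the chained return is about 21% regardless of the size of the deposit.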

Income and Tax Tracking

Income

  • Dividend yield, ex-dates, and payout history per position.
  • Income received (realized cash flow) and yield-on-cost.
  • Portfolio-level yield summary.
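
The two yield figures differ only in their denominator: yield-on-cost divides by what was paid, portfolio yield by current market value. A minimal sketch, with hypothetical function names and made-up numbers:

```python
def yield_on_cost(annual_dividend_per_share, cost_basis_per_share):
    """Annual dividend relative to the original purchase price."""
    return annual_dividend_per_share / cost_basis_per_share

def portfolio_yield(positions):
    """positions: list of (market_value, annual_dividend_income) pairs."""
    total_value = sum(value for value, _ in positions)
    total_income = sum(income for _, income in positions)
    return total_income / total_value

yoc = yield_on_cost(2.0, 50.0)
py = portfolio_yield([(1000.0, 30.0), (1000.0, 10.0)])
```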

Tax awareness

  • Cost basis with tax-lot level visibility (from brokerage data).
  • Realized gains/losses with short-term vs. long-term classification.
  • Tax-loss harvesting signals (unrealized losses with wash-sale window check).
  • Tax-aware rebalancing (prefer selling long-term lots, harvest short-term losses).
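
The classification and wash-sale checks can be sketched with date arithmetic. This example assumes US rules (long-term means held for more than one year; the wash-sale window covers 30 days before and after the sale) and simplifies them; the function names are hypothetical, and `purchase_dates` is meant to hold other purchases of the same security, not the lot being sold.

```python
from datetime import date

def is_long_term(acquired, sold):
    """True when the lot was held for more than one year."""
    try:
        anniversary = acquired.replace(year=acquired.year + 1)
    except ValueError:
        # Acquired on Feb 29: fall back to Feb 28 of the following year.
        anniversary = acquired.replace(year=acquired.year + 1, day=28)
    return sold > anniversary

def in_wash_sale_window(sale_date, purchase_dates, window_days=30):
    """True if any other purchase falls within the window around the sale."""
    return any(abs((p - sale_date).days) <= window_days for p in purchase_dates)

long_term = is_long_term(date(2024, 1, 5), date(2025, 1, 6))
blocked = in_wash_sale_window(date(2025, 3, 31), [date(2025, 3, 1)])
```

A harvest candidate would then be an unrealized loss for which `in_wash_sale_window` is false for recent and planned purchases.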

Asset Allocation and Rebalancing

  • Target allocations defined in investments/allocation.yaml (sector weights, max single-position concentration, cash reserve target).
  • Drift detection: actual vs. target with threshold alerts.
  • Rebalancing suggestions with tax-aware trade sizing.
  • Position sizing tiers by conviction level (full, half, starter) linked to thesis definitions.
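
Drift detection compares actual sector weights with targets and flags any gap beyond a threshold. In the sketch below, the 5-point threshold, the sector names, and the output field names are assumptions; the suggested trade is the plain amount needed to return to target and ignores the tax-aware lot selection described above.

```python
def allocation_drift(actual_values, target_weights, threshold=0.05):
    """Compare actual sector weights with targets and size the corrective trade."""
    total = sum(actual_values.values())
    rows = []
    for sector in sorted(set(actual_values) | set(target_weights)):
        actual = actual_values.get(sector, 0.0) / total
        target = target_weights.get(sector, 0.0)
        drift = actual - target
        rows.append({
            "sector": sector,
            "actual": actual,
            "target": target,
            "drift": drift,
            "breach": abs(drift) > threshold,
            # Negative means sell, positive means buy, in currency units.
            "trade_value": -drift * total,
        })
    return rows

rows = allocation_drift(
    {"tech": 600.0, "health": 400.0},
    {"tech": 0.5, "health": 0.4, "cash": 0.1},
)
by_sector = {row["sector"]: row for row in rows}
```

Here tech is 10 points over target and cash 10 points under, so both are flagged, while health sits exactly on target.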

Alerts and Threshold Monitoring

Portfolio-level event predicates wired into the existing event service:

  • Price alerts: symbol crosses user-defined threshold.
  • Allocation drift alerts: drift exceeds policy threshold.
  • Earnings proximity alerts: N days before earnings for held symbols.
  • Dividend ex-date alerts: before ex-date for income positions.

Configuration: investments/alerts.yaml.
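
Two of the predicates above reduce to short pure functions, sketched here. The function names and signatures are hypothetical, and how they would be registered with the existing event service is not covered by this spec excerpt.

```python
from datetime import date

def earnings_within(today, earnings_date, n_days):
    """True when earnings are between 0 and n_days days away."""
    days_until = (earnings_date - today).days
    return 0 <= days_until <= n_days

def crossed_threshold(previous_price, price, threshold):
    """True when the price moved across the threshold in either direction."""
    crossed_up = previous_price < threshold <= price
    crossed_down = previous_price > threshold >= price
    return crossed_up or crossed_down

upcoming = earnings_within(date(2025, 1, 1), date(2025, 1, 6), 5)
fired = crossed_threshold(99.0, 101.0, 100.0)
```

Comparing against the previous price, rather than testing only the current one, keeps an alert from firing on every tick while the price stays beyond the threshold.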

Service Mode

The service mode (bof service run) is a long-running harness that orchestrates agent sessions and human interaction without manual invocation.

Components

  1. Event bus and router — async pub/sub (existing) routes triggers to handlers.
  2. Cron scheduler — triggers workflows on schedule (e.g., weekly review every Sunday evening). The cron tick is the entry point; the handler spawns a headless agent session to execute the full workflow.
  3. Telegram interface — bidirectional conversation with a human. The bot receives messages, routes them to an agent session for interpretation, and sends structured replies back. Supports interactive workflows: "show me AAPL thesis", "what changed this week?", "approve the review".
  4. Agent orchestrator — spawns headless agent sessions (e.g., claude -p) to execute multi-step workflows. Manages session lifecycle, timeouts, and output routing. The agent uses bof CLI commands as tools, reads their output, reasons, and writes back results.
  5. Memory store — tiered memory system that gives agents continuity across sessions.

Agent orchestration

When a trigger fires (cron tick, Telegram message), the service:

  1. Assembles context: trigger payload, relevant memory tiers, current portfolio state.
  2. Spawns a headless agent session with a structured prompt and access to bof CLI commands.
  3. The agent executes the workflow (e.g., weekly review steps 1–8), calling bof commands as needed.
  4. Captures agent output and routes it: report artifacts to disk, summaries to Telegram, events to audit log.
  5. Updates memory store with session outcomes.
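
Steps 2 and 4, spawning a session with a timeout and capturing its output, might be sketched with the standard library as follows. The function name, the returned dictionary shape, and the idea of passing the prompt on standard input are assumptions; the example runs a small Python child process as a stand-in for the real agent command.

```python
import subprocess
import sys

def run_agent_session(argv, prompt, timeout_s):
    """Run one headless session and report its status and captured output."""
    try:
        result = subprocess.run(
            argv,
            input=prompt,          # prompt delivery via stdin is an assumption
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "output": ""}
    status = "ok" if result.returncode == 0 else "failed"
    return {"status": status, "output": result.stdout}

# Stand-in child process that echoes the prompt in upper case.
stand_in = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
outcome = run_agent_session(stand_in, "weekly review", timeout_s=30)
```

The status field gives the service a simple basis for routing: successful output to report artifacts and Telegram, timeouts and failures to the audit log.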

Telegram as conversational interface

The Telegram bot supports two interaction modes:

  • Proactive — service-initiated messages: weekly review summary, alert notifications, earnings reminders.
  • Reactive — human-initiated queries: "how is NVDA doing?", "what's my allocation drift?", "run a review now". The service spawns an agent session to interpret the query, gather data via bof commands, and compose a reply.

Memory System

Agents need continuity across sessions. The memory system is tiered by volatility and purpose.

Short-term memory

  • Scope: per-session, ephemeral.
  • Content: current review context, in-progress conversation state, Telegram thread context.
  • Storage: passed to the agent session as prompt context. Not persisted after session ends.
  • Example: "User just asked about AAPL. I already fetched the quote and thesis — don't re-fetch."

Long-term memory

  • Scope: across sessions, durable.
  • Content: past review outcomes, thesis evolution over time, what worked and what didn't, patterns observed ("every time we held through earnings on X, it worked out"), cross-report trend observations.
  • Storage: SQLite tables or structured files in artifacts/memory/. Agent reads relevant entries at session start; writes new observations at session end.
  • Example: "Last week's review flagged TSLA concentration at 14%. User chose to hold. It's now 16%."

Values and beliefs (investment policy)

  • Scope: rarely changes, human-edited.
  • Content: investment philosophy, risk tolerance, time horizon, sector convictions, behavioral guardrails. This is the equivalent of an Investment Policy Statement (IPS).
  • Storage: investments/policy.yaml — version-controlled, human-authored.
  • Example constraints:
      • "I am a long-term investor (5+ year horizon)."
      • "I don't short stocks."
      • "Maximum 10% in any single position."
      • "I believe in dollar-cost averaging into conviction positions."
      • "I prefer companies with growing dividends."
  • How agents use it: the policy is loaded into every agent session as foundational context. It constrains recommendations — the agent should not suggest actions that violate the policy without explicitly flagging the conflict.

Delivery Interfaces

  1. CLI (primary) — interactive and scriptable.
  2. Service mode — long-running harness with cron triggers, agent orchestration, and Telegram interface.

Non-Goals (Initial)

  1. Autonomous trade execution.
  2. Intraday high-frequency decisions.
  3. Complex ML model-based signal generation.
  4. Full budget/expense tracking.
  5. Options or fixed-income instrument analysis.

Acceptance Criteria (System-Level)

  1. Weekly job runs on schedule and on demand.
  2. Automated pipeline produces a complete draft report covering all report sections.
  3. Every tracked position has a persistent thesis; recommendations map to thesis conditions with evidence.
  4. Portfolio risk report includes quantitative metrics (beta, Sharpe, drawdown, concentration) and identifies threshold breaches.
  5. Performance section shows portfolio return vs. benchmark with alpha.
  6. Income section shows dividends received, upcoming ex-dates, and portfolio yield.
  7. Tax snapshot flags harvestable losses with correct short-term/long-term classification.
  8. Allocation section shows target vs. actual with drift flags and rebalancing suggestions.
  9. Recommendations include evidence and confidence scores, and are priority-ranked.
  10. Outputs are consumable across multiple interfaces.
  11. Service mode: cron trigger spawns a headless agent that completes a full weekly review without human intervention.
  12. Service mode: Telegram bot supports bidirectional conversation — human can query portfolio state and receive agent-composed replies.
  13. Service mode: agent sessions load investment policy and long-term memory as foundational context.

Implementation Reference

Detailed implementation decisions, source adapters, storage architecture, event design, and the phased delivery plan are maintained in the Implementation Plan.