Product Definition

North Star

Provide the most comprehensive, relevant context to help guide financial decisions for a personal portfolio. Every week, run one workflow that gathers portfolio, market, macro, and thesis data, then outputs:

  1. A recommendation per position/thesis: BUY_MORE, HOLD, TRIM, EXIT.
  2. Portfolio-level risk analysis with quantitative metrics (concentration, beta, Sharpe, drawdown, correlation).
  3. Performance measurement against benchmarks (SPY, QQQ).
  4. Income and tax awareness (dividends, realized gains, harvest candidates).
  5. Allocation drift detection with rebalancing suggestions.
  6. A prioritized action list with evidence and confidence.

This system is advisory-only (no auto-trading).

Primary Goals

  1. Make investment decision information easy to access and consistent.
  2. Tie recommendations to persistent thesis hypotheses and accumulated evidence — not one-off judgments.
  3. Track total return: capital appreciation, dividend income, and tax impact.
  4. Measure portfolio performance against benchmarks over time.
  5. Detect allocation drift and concentration risk with policy-driven thresholds.
  6. Provide a repeatable review process (daily triage + weekly assessment) with strong traceability and security.
  7. Keep the implementation reusable across CLI, REST API, and event-based interfaces (Telegram, Slack, webhooks).

Core Review Workflow

One unified pipeline serves both daily triage and weekly assessment. A depth parameter (light or full) controls how much data is gathered; the agent SOP determines the depth of analysis.

Phases

Phase 1 — Gather (automation, openfin review gather --depth light|full)

Fetch data for all symbols in the derived watchlist. Persist raw inputs to data_snapshots before any reasoning. Query thesis_hypotheses for active and recently resolved hypotheses. Compute thesis health and time pressure per thesis. Build and persist thesis packets and holding packets.

| Gather step                                       | light (daily) | full (weekly) |
|---------------------------------------------------|---------------|---------------|
| Positions, market, macro, watchlist, news, quotes | yes           | yes           |
| Forex, commodities, earnings dates                | yes           | yes           |
| Hypotheses from DB                                | yes           | yes           |
| Web search (Brave)                                | no            | yes           |
| Prior scores, analysis signals, social signals    | no            | yes           |

Both depths produce the same artifact structure:

{RUN_ID}/
  inputs.json          # raw gathered data
  overview.md          # formatted summary
  theses/{SLUG}.md     # thesis packets (narrative, hypotheses, per-symbol data)
  holdings/{SYM}.md    # holding packets (position, market, news context)

Phase 2 — Assess (agent, driven by SOP)

The agent reads thesis packets and holding packets, then:

  1. Assesses prior hypotheses against new data — confirm, invalidate, or keep watching. Records evidence linked to each hypothesis.
  2. Generates new hypotheses from the thesis narrative + current data. Each has a claim, invalidation criteria, and optional time horizon.
  3. (Weekly only) Scores rubric dimensions per symbol: thesis_alignment, news_sentiment, valuation_signal, social_signal.
  4. Writes results back via CLI commands (review hypothesis create/update, review evidence, review score, review annotate).

Phase 3 — Compute (automation, weekly only, openfin review finalize)

Deterministic computation from agent inputs:

  1. Composite score = weighted rubric scores.
  2. Thesis health = f(hypothesis confirmed/invalidated ratio, recency-weighted).
  3. Time pressure = f(thesis time_horizon, elapsed time).
  4. Action recommendation = f(composite, thesis_health, time_pressure, position context).
  5. Persist decisions with entry_zone, exit_trigger, sizing_note to DB.
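The deterministic math in steps 1 and 4 can be sketched in a few lines. The weights, thresholds, and branch logic below are illustrative assumptions, not the shipped rubric or policy configuration:

```python
# Hypothetical sketch of Phase 3 math. Real weights and thresholds live in
# versioned rubric/policy config; these values are assumed for illustration.

RUBRIC_WEIGHTS = {
    "thesis_alignment": 0.4,
    "news_sentiment": 0.2,
    "valuation_signal": 0.25,
    "social_signal": 0.15,
}

def composite_score(rubric_scores: dict[str, float]) -> float:
    """Weighted average of the rubric scores the agent assigned (0-10 scale)."""
    total_weight = sum(RUBRIC_WEIGHTS[d] for d in rubric_scores)
    return sum(RUBRIC_WEIGHTS[d] * s for d, s in rubric_scores.items()) / total_weight

def action_for(composite: float, thesis_health: str, time_pressure: str) -> str:
    """Map composite + thesis context to a recommendation (illustrative rules)."""
    if thesis_health == "failing":
        return "EXIT" if time_pressure in ("late", "overdue") else "TRIM"
    if composite >= 7.5 and thesis_health == "strong":
        return "BUY_MORE"
    if composite < 4.0:
        return "TRIM"
    return "HOLD"
```

Note how thesis health and time pressure gate the composite: the same score can yield HOLD or EXIT depending on thesis context, which is the behavior described under "Relationship to scoring" below.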

Phase 4 — Render (automation, weekly only)

  1. report.md: per-thesis health and hypothesis lifecycle, per-symbol scores and actions, portfolio risk snapshot.
  2. Timeline events: hypothesis_created, hypothesis_confirmed, hypothesis_invalidated, evidence, score_review.

SOP differentiation

|            | Daily (light)                                       | Weekly (full)                                       |
|------------|-----------------------------------------------------|-----------------------------------------------------|
| Purpose    | Morning triage — what changed, what's urgent        | Full assessment cycle                               |
| Hypotheses | Flag approaching horizons or contradictory evidence | Assess all active, confirm/invalidate, generate new |
| Scoring    | No rubric scoring                                   | Full rubric scoring per symbol                      |
| Evidence   | Record only if notable                              | Record all relevant findings                        |
| Output     | Executive summary annotation                        | Full annotations, scores, evidence, decisions       |
| Finalize   | No finalize step                                    | Finalize → composites, actions, report.md           |

Required Outputs (Per Review Run)

  1. Structured artifacts (JSON) with all gathered data, decisions, and scorecards.
  2. Human-readable packets (Markdown) — thesis-level and holding-level.
  3. Structured event/audit entries (source + timestamp + lineage).
  4. Historical snapshots of all fetched data for longitudinal analysis.

Report Sections (Weekly)

  1. Executive summary
  2. Performance vs. benchmark (portfolio TWR, alpha, Sharpe, max drawdown)
  3. Thesis health and hypothesis lifecycle (confirmed, invalidated, active)
  4. Position/thesis decisions with scorecard, entry/exit zones, sizing notes
  5. Allocation drift (target vs. actual sector weights, rebalancing suggestions)
  6. Income summary (dividends received, upcoming ex-dates, portfolio yield)
  7. Tax snapshot (YTD realized gains ST/LT, harvestable unrealized losses)
  8. Evidence table (source, timestamp, claim, direction, linked hypothesis)
  9. Portfolio risk snapshot and breaches (concentration, beta, correlation flags)
  10. Ranked action items

Data Sources

  1. Portfolio and brokerage data (accounts, positions, orders, cost basis, holding periods, dividend history, realized gains/losses).
  2. Market data (index and sector performance, ticker quotes, earnings calendar, news, dividend yield/ex-dates).
  3. Macro/economic indicators (rates, inflation, labor market, VIX, sector rotation).
  4. SEC filings (10-K, 10-Q, 8-K, 13F, Form 4).
  5. Insider transactions and institutional ownership changes.
  6. Thesis definitions (story-focused, multi-symbol narratives with time horizons and AI-generated hypotheses).
  7. Web search (analyst coverage, breaking news).
  8. Historical data snapshots (stored quotes, positions, news over time for trend analysis and agent context enrichment).

Investment Thesis Registry

Theses are story-focused, not ticker-focused. Each thesis captures an investment narrative (e.g., "AI infrastructure spending extends") that implicates multiple symbols. YAML files live at ~/.openfin/theses/{slug}.yaml (user data dir, seeded from investments/theses/).

Thesis structure (YAML — human-authored)

slug: ai-compute-hardware
title: AI Compute Hardware Cycle
time_horizon: 18 months
narrative: |
  Hyperscaler and enterprise AI training/inference workloads are driving
  unprecedented demand for GPUs, custom ASICs, and HBM...
status: active
symbols: [NVDA, ASML, MRVL, TSM]
notes: ""
  • Narrative: the complete expression of the belief — what you think, why, what to watch for, what would change your mind. This is the AI's primary input for hypothesis generation. A specific narrative with falsifiable claims produces testable hypotheses.
  • Symbols: flat list of ticker strings relevant to the narrative. No role, conviction, conditions, or targets — those are portfolio-level concerns derived from positions + hypothesis health.
  • Time horizon: how long the thesis needs to play out. Cascades to hypotheses as their default horizon.
  • Status: active, archived, or draft.

Hypotheses (DB — AI-generated)

Falsifiable claims ("if A then B, invalidated by C") that persist across review runs and track thesis health. Scoped to a thesis, not a symbol or run — a hypothesis outlives the review that created it.

Each review, the agent:

  1. Assesses active hypotheses against new data (confirm, invalidate, or keep watching).
  2. Generates new hypotheses from the narrative + current data.
  3. Links evidence to hypotheses, building an audit trail of what was observed and what it meant.

Thesis health (computed at query time)

Derived from hypothesis outcomes — confirmed/invalidated ratio, recency-weighted. A hypothesis invalidated this week counts more than one confirmed 8 weeks ago.

| Status    | Condition                                         |
|-----------|---------------------------------------------------|
| untested  | No hypotheses resolved yet                        |
| strong    | Confirmed ratio > 0.7                             |
| mixed     | Confirmed ratio 0.4–0.7                           |
| weakening | Confirmed ratio < 0.4                             |
| failing   | Confirmed ratio < 0.2, or 3+ recent invalidations |
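A minimal sketch of this computation, with band cutoffs taken from the table. The exponential half-life is an assumed parameter, not the shipped decay constant:

```python
from datetime import datetime, timezone

HALF_LIFE_DAYS = 28.0  # assumed: a resolution loses half its weight every 4 weeks

def thesis_health(outcomes, now=None):
    """outcomes: list of (status, resolved_at) pairs,
    status in {'confirmed', 'invalidated'}."""
    now = now or datetime.now(timezone.utc)
    confirmed = invalidated = 0.0
    recent_invalidations = 0
    for status, resolved_at in outcomes:
        age_days = (now - resolved_at).total_seconds() / 86400
        weight = 0.5 ** (age_days / HALF_LIFE_DAYS)  # recency weighting
        if status == "confirmed":
            confirmed += weight
        else:
            invalidated += weight
            if age_days <= HALF_LIFE_DAYS:
                recent_invalidations += 1
    total = confirmed + invalidated
    if total == 0:
        return "untested"
    ratio = confirmed / total
    if ratio < 0.2 or recent_invalidations >= 3:
        return "failing"
    if ratio < 0.4:
        return "weakening"
    if ratio <= 0.7:
        return "mixed"
    return "strong"
```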

Time pressure (computed at query time)

Derived from thesis time_horizon vs elapsed time. Modulates how aggressively invalidation translates to action urgency.

| Level   | Horizon elapsed |
|---------|-----------------|
| early   | < 25%           |
| mid     | 25–60%          |
| late    | 60–90%          |
| overdue | > 90%           |
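The banding is a direct mapping from elapsed fraction to level. A sketch (horizon parsing, e.g. "18 months" to days, is assumed to happen upstream):

```python
from datetime import date

def time_pressure(started: date, horizon_days: int, today: date) -> str:
    """Map elapsed fraction of the thesis horizon to a pressure level."""
    elapsed = (today - started).days / horizon_days
    if elapsed < 0.25:
        return "early"
    if elapsed < 0.60:
        return "mid"
    if elapsed <= 0.90:
        return "late"
    return "overdue"
```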

Derived watchlist

There is no manually maintained watchlist file. The watchlist is computed as the union of:

  1. All symbols from active theses (including a special watchlist.yaml for ad-hoc tickers).
  2. Portfolio holdings from the latest position snapshots in the database.

watchlist_sources() returns {symbol: [source_labels]} so every symbol has provenance (which thesis, portfolio, or adhoc).
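The merge can be sketched as follows. Thesis loading and the positions query are stubbed out as plain arguments, and the label format (thesis:{slug}, portfolio, adhoc) is an assumption:

```python
def watchlist_sources(theses: dict[str, list[str]],
                      holdings: list[str],
                      adhoc: list[str]) -> dict[str, list[str]]:
    """Union of thesis symbols, portfolio holdings, and ad-hoc tickers,
    each symbol mapped to the labels explaining why it is watched."""
    sources: dict[str, list[str]] = {}

    def add(symbol: str, label: str) -> None:
        sources.setdefault(symbol, []).append(label)

    for slug, symbols in theses.items():
        for sym in symbols:
            add(sym, f"thesis:{slug}")
    for sym in holdings:
        add(sym, "portfolio")
    for sym in adhoc:
        add(sym, "adhoc")
    return sources
```

A symbol held in the portfolio and named by a thesis carries both labels, which is the provenance guarantee described above.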

Persistence model

  • YAML is authoritative for reading thesis state. Agents and humans edit YAML directly.
  • DB is append-only audit log. save_thesis() writes YAML to disk AND inserts a thesis_snapshots row. This enables historical queries: "what did this thesis look like 3 months ago?", "when did we add/remove a symbol?"
  • Hypotheses live in DB only (thesis_hypotheses table). They are AI-generated, persist across reviews, and form the basis for thesis health computation.
  • Evidence links to hypotheses via optional hypothesis_id on decision_evidence. This connects factual observations to the causal claims they support or challenge.
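The dual-write in save_thesis() can be sketched as below. The thesis_snapshots columns are assumed names, and the file payload is serialized as JSON here only to stay stdlib-only; the real system writes YAML:

```python
import json
import sqlite3
from pathlib import Path

def save_thesis(thesis: dict, theses_dir: Path, db: sqlite3.Connection) -> None:
    # 1. Write the authoritative file (the readable source of truth).
    path = theses_dir / f"{thesis['slug']}.yaml"
    path.write_text(json.dumps(thesis, indent=2))
    # 2. Append an immutable snapshot row for historical queries
    #    ("what did this thesis look like 3 months ago?").
    db.execute(
        "INSERT INTO thesis_snapshots (slug, payload, saved_at) "
        "VALUES (?, ?, datetime('now'))",
        (thesis["slug"], json.dumps(thesis)),
    )
    db.commit()
```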

Relationship to scoring

The thesis registry is the foundation for scoring alignment and generating recommendations. Each symbol's scoring packet includes the thesis narrative and hypothesis context for all theses that contain it. Thesis health and time pressure modulate how scores translate to actions — same composite score, different urgency depending on whether the thesis is strong/early vs. weakening/late.

System Data Flow

Understanding the dependency chain is critical for working on any part of the system.

┌─────────────────────────────────────────────────────────────┐
│                    SOURCE OF TRUTH                           │
│                                                             │
│  YAML Theses (~/.openfin/theses/*.yaml)                    │
│  Brokerage API (positions, orders, accounts)                │
│  Market APIs (quotes, news, earnings, SEC)                  │
│  Macro APIs (Fed, Treasury, CPI)                            │
│  thesis_hypotheses (DB — AI-generated, persists across runs)│
└────────────────────────┬────────────────────────────────────┘
                         │
              ┌──────────▼──────────┐
              │  Phase 1: GATHER    │  Automated
              │  (light or full)    │  All fetches → data_snapshots
              │                     │  Hypotheses queried from DB
              │  Thesis health +    │  Thesis packets + holding packets
              │  time pressure      │  built and persisted
              └──────────┬──────────┘
                         │
          ┌──────────────┼──────────────┐
          ▼              ▼              ▼
   ┌────────────┐ ┌───────────┐ ┌──────────────┐
   │ Review     │ │ API /     │ │ Telegram     │
   │ (agent)    │ │ Frontend  │ │ Bot          │
   └─────┬──────┘ └───────────┘ └──────────────┘
         │
    ┌─────▼──────────────┐
    │  Phase 2: ASSESS   │  Agent (SOP-driven)
    │                    │  Assess hypotheses against new data
    │  Hypotheses ←──────│  Generate new hypotheses from narrative
    │  Evidence ─────────│  Link evidence to hypotheses
    │  Scores (weekly)   │  Score rubrics per symbol
    └─────┬──────────────┘
          │
    ┌─────▼──────────────┐
    │  Phase 3: COMPUTE  │  Automated (weekly only)
    │                    │  Composites, thesis health, time pressure
    │  Actions ←─────────│  f(composite, health, pressure)
    │  Decisions ────────│  entry_zone, exit_trigger, sizing_note
    └─────┬──────────────┘
          │
    ┌─────▼──────────────┐
    │  Phase 4: RENDER   │  Automated (weekly only)
    │                    │  report.md, timeline events
    │ Decisions + Report │  hypothesis lifecycle, scores, actions
    └────────────────────┘

Key invariants

  1. Collect → Persist → Reason → Decide. All pipelines (daily/weekly/ad-hoc). Persist raw inputs before reasoning. Never reason over uncaptured data.
  2. Every fetch produces a snapshot. Quotes, positions, news, macro — all persisted to data_snapshots with batch_id for traceability.
  3. News carries symbol provenance. Each NewsArticle has a symbol field. The overview groups news by thesis; the frontend/API can filter by symbol.
  4. Scoring is per-symbol, context is per-thesis. You buy/sell tickers, but you reason about stories. Each scoring packet includes the thesis narrative and hypothesis context.
  5. The system never calls LLM APIs. The system fetches, computes, stores, renders. The agent (Claude Code, SDK app, or human) reads context and writes back scores/evidence/hypotheses.
  6. Hypotheses are autoregressive. Each review, the agent sees its prior hypotheses + new data, assesses what held up, and evolves its analysis. This builds cumulative thesis health over time.

Intelligence Model

The system does not call LLM APIs directly. All AI reasoning happens in the calling agent session (Claude Code, a custom Agent SDK app, or a human). The system is a structured data tool — it fetches, computes, stores, and renders. The agent interprets, evaluates, and writes back conclusions.

Layers

  • Layer 1 (automated): data ingestion, hypothesis query, thesis health/time pressure computation, packet building, deterministic composite math, report rendering.
  • Layer 2 (agent): reads thesis packets (narrative + hypotheses + data), assesses hypotheses against new evidence, generates new hypotheses, scores rubric dimensions, writes structured results back via CLI commands.
  • Layer 3 (human): reviews reports, edits thesis narratives, provides final sign-off on actions.

Autoregressive Review

The review workflow is autoregressive — each run, the agent sees its prior hypotheses and reasoning alongside new data. This creates a feedback loop:

  1. Week N: Agent reads thesis narrative, generates hypotheses H1–H3.
  2. Week N+1: Agent sees H1–H3 + new data. Confirms H1 (evidence linked), keeps H2 active, generates H4.
  3. Week N+2: Agent sees H2, H4 + new data. Invalidates H2 (resolution recorded), thesis health shifts from strong to mixed.

Thesis health accumulates over time. The human writes the narrative; the AI generates and tracks the testable claims. Neither operates in a vacuum — the narrative evolves as hypotheses confirm or fail, and the AI's analysis deepens as it builds history.

Scoring Rubrics

Rubric definitions provide transparent, versioned evaluation dimensions with scales, anchor descriptions, and evaluation prompt templates. The system renders prompts with gathered context; the agent evaluates them. Composite math (weighted average, action thresholds) is deterministic. Thesis health and time pressure modulate how composites translate to action urgency.

Design Implications

  • No LLM SDK dependency in the codebase.
  • Any agent can drive the workflow — the system is agent-agnostic.
  • Determinism preserved — composite scores, action thresholds, thesis health, and risk computations are always rule-based. Agent judgment enters only through hypothesis assessment, score assignment, evidence extraction, and narrative text.
  • Fallback is natural — without agent evaluation, the system still produces a data-complete draft report with empty scores. A human can fill them in.

Risk and Portfolio Analytics

Quantitative Metrics

  • Position concentration: single name %, top-5 %, HHI index.
  • Sector concentration: actual vs. target allocation drift.
  • Portfolio beta: weighted-average beta to SPY.
  • Correlation flags: pairs of holdings with rolling correlation above threshold.
  • Max drawdown: worst peak-to-trough over trailing window.
  • Sharpe ratio: excess return per unit of volatility.
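Three of these metrics are simple enough to sketch directly (beta and correlation need benchmark return series and are omitted). Annualization and the risk-free input are left as illustrative defaults:

```python
import statistics

def hhi(weights: list[float]) -> float:
    """Herfindahl-Hirschman index of position weights (weights sum to 1).
    1/N for equal weights; approaches 1 as concentration rises."""
    return sum(w * w for w in weights)

def sharpe(returns: list[float], risk_free_per_period: float = 0.0) -> float:
    """Excess return per unit of volatility over the sample periods."""
    excess = [r - risk_free_per_period for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)

def max_drawdown(equity_curve: list[float]) -> float:
    """Worst peak-to-trough decline, as a fraction (0.25 == -25%)."""
    peak, worst = equity_curve[0], 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```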

Qualitative Signals

  • Insider transaction direction (net buy/sell) for held symbols.
  • Institutional ownership trend (increasing/decreasing).
  • These surface as evidence linked to hypotheses — the agent assesses their significance in context rather than as standalone alerts.

Policy-Driven Thresholds

A risk policy defines breach thresholds and severity levels. Breaches surface as high-priority action items.

Performance Measurement

  • Time-weighted return (TWR) as the standard for portfolio performance.
  • Benchmark comparison (SPY, QQQ) with alpha computation.
  • Historical trend analysis across weekly reports (week-over-week return, score trends).
  • Accumulated state stored for longitudinal queries.
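TWR chains sub-period returns between external cash flows, so deposits and withdrawals do not distort measured performance. A minimal sketch (the non-beta-adjusted alpha is a simplification; a beta-adjusted variant would need the portfolio beta from the risk metrics):

```python
import math

def time_weighted_return(period_returns: list[float]) -> float:
    """period_returns: per-sub-period returns between cash flows,
    e.g. 0.02 for +2%. Returns the chained (geometric) total return."""
    growth = math.prod(1 + r for r in period_returns)
    return growth - 1

def simple_alpha(portfolio_twr: float, benchmark_twr: float) -> float:
    """Naive excess return vs. the benchmark over the same window."""
    return portfolio_twr - benchmark_twr
```

Note the asymmetry chaining captures: +10% followed by -10% is a net loss of 1%, not zero.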

Income and Tax Tracking

Income

  • Dividend yield, ex-dates, and payout history per position.
  • Income received (realized cash flow) and yield-on-cost.
  • Portfolio-level yield summary.

Tax Awareness

  • Cost basis with tax-lot level visibility.
  • Realized gains/losses with short-term vs. long-term classification.
  • Tax-loss harvesting signals (unrealized losses with wash-sale window check).
  • Tax-aware rebalancing (prefer selling long-term lots, harvest short-term losses).
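The harvesting signal above can be sketched as a lot filter: flag unrealized losses only when no purchase of the same symbol falls inside the 30-day wash-sale window. Lot field names here are assumptions, not the real schema, and a full check would also consider planned repurchases:

```python
from datetime import date

WASH_SALE_DAYS = 30

def harvest_candidates(lots: list[dict], today: date) -> list[dict]:
    """Return tax lots with an unrealized loss and no recent buy of
    the same symbol (which would trigger the wash-sale rule)."""
    recent_buys = {
        lot["symbol"]
        for lot in lots
        if (today - lot["acquired"]).days <= WASH_SALE_DAYS
    }
    return [
        lot for lot in lots
        if lot["market_value"] < lot["cost_basis"]  # unrealized loss
        and lot["symbol"] not in recent_buys        # wash-sale window clear
    ]
```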

Asset Allocation and Rebalancing

  • Target allocations by sector (weights, max single-position concentration, cash reserve target).
  • Drift detection: actual vs. target with threshold alerts.
  • Rebalancing suggestions with tax-aware trade sizing.
  • Position sizing informed by thesis health, time pressure, and hypothesis outcomes.
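Drift detection itself is a small deterministic comparison. A sketch, with a 3-percentage-point threshold as an illustrative default rather than the policy value:

```python
def allocation_drift(actual: dict[str, float],
                     target: dict[str, float],
                     threshold: float = 0.03) -> dict[str, float]:
    """Return {sector: drift} for sectors whose |actual - target|
    exceeds the policy threshold. Positive drift means overweight."""
    sectors = set(actual) | set(target)
    drifts = {s: actual.get(s, 0.0) - target.get(s, 0.0) for s in sectors}
    return {s: d for s, d in drifts.items() if abs(d) > threshold}
```

The flagged sectors then feed the tax-aware rebalancing suggestions (e.g. trim the overweight sector by selling long-term lots first).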

Alerts and Threshold Monitoring

Portfolio-level event predicates wired into the event service:

  • Price alerts: symbol crosses user-defined threshold.
  • Allocation drift alerts: drift exceeds policy threshold.
  • Earnings proximity alerts: N days before earnings for held symbols.
  • Dividend ex-date alerts: before ex-date for income positions.
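Two of these predicates, sketched as pure functions; the event-service wiring, payload shapes, and the 5-day earnings lead time are assumptions:

```python
from datetime import date

def price_alert(quote: float, threshold: float, direction: str) -> bool:
    """True when the quote crosses the user-defined threshold."""
    return quote >= threshold if direction == "above" else quote <= threshold

def earnings_proximity_alert(earnings_date: date, today: date, days: int = 5) -> bool:
    """True when a held symbol's earnings date is within the next N days."""
    delta = (earnings_date - today).days
    return 0 <= delta <= days
```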

Service Mode

A long-running harness that orchestrates agent sessions and human interaction without manual invocation.

Agent Orchestration

When a trigger fires (cron tick, Telegram message, alert), the service:

  1. Assembles context: trigger payload, investment policy, relevant memory, current portfolio state.
  2. Spawns a headless agent session with the assembled context and access to CLI commands as tools.
  3. The agent executes the workflow autonomously — gathering data, reasoning, writing back scores/evidence/narrative.
  4. Captures output and routes it: report artifacts to disk, summaries to Telegram, events to audit log.
  5. Updates memory with session outcomes.

Telegram Interface

Supports two interaction modes:

  • Proactive — service-initiated messages: weekly review summary, alert notifications, earnings reminders.
  • Reactive — human-initiated queries: "how is NVDA doing?", "what's my allocation drift?", "run a review now". The service spawns an agent session to gather data and compose a reply.

Memory System

Agents need continuity across sessions. Memory is tiered by volatility and purpose.

Short-Term Memory

  • Per-session, ephemeral.
  • Current review context, in-progress conversation state, Telegram thread context.
  • Passed to the agent session as prompt context. Not persisted after session ends.

Long-Term Memory

  • Across sessions, durable.
  • Past review outcomes, thesis evolution, cross-week observations, patterns ("AAPL has beaten earnings 4 quarters in a row"), what worked and what didn't.
  • Agent reads relevant entries at session start; writes new observations at session end.
  • Observations older than a configurable window are surfaced with lower relevance or archived.

Investment Policy (Values and Beliefs)

  • Rarely changes, human-edited.
  • Investment philosophy, risk tolerance, time horizon, sector convictions, behavioral guardrails.
  • Equivalent to an Investment Policy Statement (IPS) in traditional advisory.
  • Loaded into every agent session as foundational context — constrains what the agent should recommend.

Delivery Interfaces

  1. CLI (primary) — interactive and scriptable. Agent drives the review via CLI commands.
  2. Web API + frontend — FastAPI serving the same models, React frontend for browsing reviews, theses, timelines.
  3. Service mode — long-running harness with cron triggers, agent orchestration, and Telegram interface.

Non-Goals

  1. Autonomous trade execution.
  2. Intraday high-frequency decisions.
  3. Complex ML model-based signal generation.
  4. Full budget/expense tracking.
  5. Options or fixed income instrument analysis.

Acceptance Criteria (System-Level)

  1. Review pipeline runs on schedule (daily light, weekly full) and on demand.
  2. Gather phase produces thesis packets and holding packets with all data context, persisted hypotheses, thesis health, and time pressure.
  3. Every tracked position has a persistent thesis with AI-generated hypotheses; recommendations map to hypothesis health and accumulated evidence.
  4. Hypothesis lifecycle is tracked across reviews — created, confirmed, invalidated, revised — with linked evidence.
  5. Portfolio risk report includes quantitative metrics (beta, Sharpe, drawdown, concentration) and identifies threshold breaches.
  6. Performance section shows portfolio return vs. benchmark with alpha.
  7. Income section shows dividends received, upcoming ex-dates, and portfolio yield.
  8. Tax snapshot flags harvestable losses with correct short-term/long-term classification.
  9. Allocation section shows target vs. actual with drift flags and rebalancing suggestions.
  10. Recommendations include evidence, confidence scores, entry/exit zones, and are priority-ranked.
  11. Outputs are consumable across multiple interfaces (CLI, API, Telegram).
  12. Service mode: cron trigger spawns a headless agent that completes a full weekly review without human intervention.
  13. Service mode: Telegram bot supports bidirectional conversation — human can query portfolio state and receive agent-composed replies.
  14. Service mode: agent sessions load investment policy and long-term memory as foundational context.

Implementation Reference

Architecture and codebase structure are in Architecture.