Data Model

This document describes what data the system manages, how it's represented in memory (Pydantic models), and how it's persisted (SQLite database + filesystem artifacts).

Domain Concepts

| Term | Definition | Authored by | Persisted in |
|---|---|---|---|
| Thesis | A falsifiable investment belief with a narrative, symbols, and time horizon | Human | YAML (~/.openfin/theses/) |
| Hypothesis | A causal, falsifiable claim: "if A then B, invalidated by C" | AI | DB (thesis_hypotheses) |
| Evidence | A factual claim from data (news, filing, price action) with direction | AI | DB (decision_evidence) |
| Score | Per-symbol rubric rating (thesis_alignment, news_sentiment, etc.) | AI | DB (rubric_scores) |
| Thesis health | Derived from hypothesis outcomes (confirmed/invalidated ratio, recency-weighted) | Automation | Computed at query time |
| Time pressure | Derived from thesis time_horizon vs. elapsed time | Automation | Computed at query time |
| Decision | Action recommendation informed by scores, thesis health, time pressure, position context | Automation + AI | DB + filesystem |

In-Memory Models

All in-memory data shapes are Pydantic models in src/openfin/models/. They are the single source of truth for data structure — DB schemas and API responses derive from them.

Market Data (models/market.py)

| Model | Purpose | Key fields |
|---|---|---|
| QuoteSnapshot | Ticker valuation snapshot | symbol, price, pe_ratio, forward_pe, peg_ratio, beta, ev_ebitda_ttm, 52w range |
| IndexPerformance | Major index price + change | symbol, price, period_change_pct |
| SectorPerformance | Sector ETF performance | name, symbol, period_change_pct |
| WatchlistPerformance | Per-symbol price and changes | symbol, price, day_change_pct, week_change_pct |
| NewsArticle | News headline with provenance | title, publisher, link, symbol, published_at |
| SearchResult | Web search result | title, description, url, source, query |
| EarningsEntry | Upcoming earnings date | symbol, earnings_date |

Brokerage Data (models/brokerage.py)

| Model | Purpose | Key fields |
|---|---|---|
| Position | Current holding | symbol, quantity, market_value, avg_price, day_change_pct, total_gain_pct, first_bought |
| Order | Trade order | symbol, action, quantity, price, status, submitted_at, filled_at |
| BrokerageAccount | Account summary | id, cash_balance, total_value, currency |

Macro Data (models/macro.py)

| Model | Purpose | Key fields |
|---|---|---|
| MacroIndicator | FRED economic indicator | series_id, name, value, unit, observation_date |

Analysis Models (models/analysis.py)

| Model | Purpose | Key fields |
|---|---|---|
| AnalysisDimension | A single scored dimension | name, score [-1,1], weight [0,1], explanation, raw_metrics |
| RiskFlag | Risk condition | category, description, confidence_penalty [0,1] |
| StockAnalysisSignal | Stock analysis output | ticker, recommendation (BUY/HOLD/SELL), confidence, final_score, dimensions[], risk_flags[] |

Social Signal Models (models/social.py)

| Model | Purpose | Key fields |
|---|---|---|
| SocialSignalSummary | Aggregated social signals per ticker | buy/sell/short counts, confidence breakdown, channel_data[] |
| ChannelSignal | Per-channel signal data | channel_name, win_rate, signal_count, dominant_action |
| RadarItem | Notable non-watchlist ticker | Social signal activity summary |

Review Pipeline Models (models/report.py)

These models define the artifacts produced by the review pipeline. Each maps to a JSON file on disk and may also write through to DB tables.

Context Packets (Phase 1 output — what the agent reads)

ThesisPacket                         HoldingPacket
├── slug, title, narrative           ├── symbol, quantity, market_value
├── time_horizon                     ├── avg_price, cost_basis, pnl
├── thesis_health, time_pressure     ├── total_gain_pct, portfolio_weight_pct
├── active_hypotheses: list[str]     ├── tax_status
├── resolved_hypotheses: list[str]   ├── market_context, news_context
├── upcoming_earnings: list[str]     ├── search_context, prior_scores_context
└── symbols: list[ThesisPacketSymbol]├── analysis_signal_context
    ├── ticker                       ├── social_signal_context
    ├── market_context               └── thesis_refs: list[str]
    ├── news_context
    ├── search_context
    ├── position_context
    ├── prior_scores_context
    ├── analysis_signal_context
    └── social_signal_context

Context fields are pre-rendered text strings — the agent reads them directly without further lookups.

Scoring & Evidence

| Model | Purpose | Key fields |
|---|---|---|
| RubricScore | Single metric score | metric, score (float), rationale |
| SymbolScorecard | All scores for one symbol | symbol, scores[], composite_score, composite_confidence |
| ScoringEntry | Score within ReviewScoring | score (float), rationale |
| EvidenceRecord | Factual claim from data | source_type, claim, direction (confirming/disconfirming/neutral), source_url, attribution |
| Decision | Action recommendation | symbol, action (BUY_MORE/HOLD/TRIM/EXIT), confidence, scorecard, evidence[], entry_zone, exit_trigger, sizing_note |

Review Artifacts (one JSON file each)

| Model | Phase | File | Purpose |
|---|---|---|---|
| ReviewInputs | 1 | inputs.json | Raw market data, positions, indices, macro, news, search, macro_events |
| ReviewContext | 1 | context.json | Per-symbol SymbolScoringContext + rubric prompts |
| ReviewScoring | 2-3 | scoring.json | scores: {symbol: {metric: ScoringEntry}}, evidence: {symbol: [EvidenceRecord]} |
| ReviewDecisions | 6 | decisions.json | decisions[], risk_snapshot |
| ReviewSummary | 1+5 | summary.json | Narratives (executive_summary, portfolio_summary, etc.), action_items[] |
| WeeklyReport | reconstituted | | Composite of all artifacts via from_artifacts() |

Database Persistence

Local SQLite (artifacts/openfin.db) in WAL mode. Managed by SQLAlchemy ORM with Alembic migrations. All DB writes are best-effort — CLI commands behave identically whether or not the database is available.
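Enabling WAL mode is a single pragma on the SQLite connection. The project manages this through SQLAlchemy, but the pragma itself is the same; a minimal sketch with the stdlib sqlite3 driver:

```python
import os
import sqlite3
import tempfile

# WAL mode must be set on a file-backed database (it does not apply
# to :memory:); the path here is a stand-in for artifacts/openfin.db.
db_path = os.path.join(tempfile.mkdtemp(), "openfin.db")

conn = sqlite3.connect(db_path)
# PRAGMA journal_mode returns the mode actually in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.close()
```

WAL is a property of the database file, so once set, every subsequent CLI invocation that opens the file gets concurrent-read behavior without re-issuing the pragma.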

Schema

```mermaid
erDiagram
    review_runs ||--o{ rubric_scores : "run_id"
    review_runs ||--o{ decision_evidence : "run_id"
    review_runs ||--o{ review_annotations : "run_id"
    thesis_hypotheses }o--o{ decision_evidence : "hypothesis_id"

    data_snapshots {
        int id PK
        string data_type "quote, position, news, etc."
        string symbol "nullable"
        text payload "JSON with _v"
        datetime retrieved_at
        string batch_id "groups rows from a single fetch"
    }
    thesis_snapshots {
        int id PK
        string slug
        int version
        string status
        string title
        text payload "full thesis YAML as JSON"
        text symbols_csv
        datetime created_at
    }
    thesis_hypotheses {
        string id PK "uuid"
        string thesis_slug
        string run_id "nullable, provenance"
        text claim "if A then B"
        text invalidation "unless C"
        string status "active, confirmed, invalidated, revised"
        string time_horizon "nullable"
        datetime created_at
        datetime resolved_at "nullable"
        text resolution "nullable, what happened"
    }
    review_runs {
        int id PK
        string run_id UK
        date run_date
        string trigger "cli, telegram, cron"
        string status "collecting, scoring, finalized"
        datetime started_at
        datetime finalized_at "nullable"
    }
    rubric_scores {
        int id PK
        string run_id FK
        date run_date
        string symbol
        string metric
        float score
        text rationale
        float composite_score
        float composite_confidence
        string action "BUY_MORE, HOLD, TRIM, EXIT"
        datetime created_at
    }
    decision_evidence {
        int id PK
        string run_id FK
        string symbol
        string source_type
        text claim
        text source_url
        string attribution
        string direction "confirming, disconfirming, neutral"
        string hypothesis_id "nullable, FK to thesis_hypotheses"
        datetime created_at
    }
    review_annotations {
        int id PK
        string run_id FK
        string field "e.g. executive_summary, obs:AAPL"
        string symbol "nullable, extracted from obs:TICKER"
        string thesis_slug "nullable, extracted from obs:slug"
        text value
        datetime created_at
    }
```

Tables

| Table | Purpose | Key columns | Write pattern |
|---|---|---|---|
| data_snapshots | Timestamped API fetch results | data_type, symbol, payload (JSON), batch_id | Append-only, one row per fetched item |
| thesis_snapshots | Point-in-time thesis YAML snapshots | slug, status, payload (JSON), symbols_csv | Append-only on thesis save |
| thesis_hypotheses | AI-generated falsifiable claims | thesis_slug, claim, invalidation, status | Created by AI, updated across runs |
| review_runs | Review session lifecycle | run_id, status, trigger | Created at gather, finalized at end |
| rubric_scores | Per-metric scores from reviews | symbol, metric, score, composite_score, action | Written per-score, composites backfilled at finalize |
| decision_evidence | Source citations backing decisions | symbol, source_type, claim, direction, hypothesis_id | Append-only per review |
| review_annotations | Agent observations and summaries | run_id, field, value, symbol, thesis_slug | Append-only, last-write-wins on read |

Indexes

| Table | Index | Columns |
|---|---|---|
| data_snapshots | ix_data_snapshots_type_symbol_time | data_type, symbol, retrieved_at |
| data_snapshots | ix_data_snapshots_retrieved_at | retrieved_at |
| data_snapshots | ix_data_snapshots_batch_id | batch_id |
| rubric_scores | ix_rubric_scores_symbol_date | symbol, run_date |
| rubric_scores | ix_rubric_scores_run_id | run_id |
| decision_evidence | ix_decision_evidence_run_id | run_id |
| decision_evidence | ix_decision_evidence_symbol | symbol |
| review_runs | ix_review_runs_run_id | run_id |
| review_runs | ix_review_runs_run_date | run_date |
| review_annotations | ix_review_annotations_run_id | run_id |
| review_annotations | ix_review_annotations_symbol | symbol |
| review_annotations | ix_review_annotations_thesis_slug | thesis_slug |
| thesis_snapshots | ix_thesis_snapshots_slug_created | slug, created_at |
| thesis_hypotheses | ix_thesis_hypotheses_slug | thesis_slug |
| thesis_hypotheses | ix_thesis_hypotheses_status | status |

Pydantic → DB Serialization

Data snapshots: model → JSON blob

Every API fetch is stored as a data_snapshots row. The Pydantic model is the in-memory shape; the DB stores it as a versioned JSON blob:

Pydantic model (e.g. Position)
    │ .model_dump_json()
    ▼
JSON string
    │ stamp(data_type, json) → adds _v field
    ▼
DataSnapshot row (data_type, symbol, payload=stamped_json, batch_id)

On read, upgrade(data_type, payload) walks the version chain, strips _v, returns a clean dict for Model.model_validate().
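The stamp/upgrade pair can be sketched as follows. This is an illustrative reimplementation, not the code in db/payload_version.py: the version table is invented for the example, and the upgrade chain walk is elided (it is shown in the Payload Versioning section below).

```python
import json

# Illustrative per-type schema versions; the real table lives in
# db/payload_version.py.
CURRENT_VERSIONS = {"news": 2}


def stamp(data_type: str, payload_json: str) -> str:
    """Inject the current schema version as _v before the row is written."""
    payload = json.loads(payload_json)
    payload["_v"] = CURRENT_VERSIONS.get(data_type, 1)
    return json.dumps(payload)


def upgrade(data_type: str, raw: str) -> dict:
    """Strip _v and return a clean dict for Model.model_validate().

    The real implementation first walks the registered upgrade chain
    from the payload's version to current; that walk is elided here.
    """
    payload = json.loads(raw)
    payload.pop("_v", None)
    return payload


# Round trip: model_dump_json() output in, validate-ready dict out.
stored = stamp("news", '{"title": "Chip demand surges"}')
clean = upgrade("news", stored)
```

The invariant is that `_v` exists only inside the payload column; models never see it.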

| data_type | Pydantic model | Source | symbol column |
|---|---|---|---|
| quote | QuoteSnapshot | Finnhub | Ticker |
| position | Position | SnapTrade | Ticker |
| order | Order | SnapTrade | Ticker |
| news | NewsArticle | Finnhub | Ticker |
| macro | MacroSnapshot | FRED | Indicator name |
| market_index | IndexPerformance | Finnhub | Index name |
| market_sector | SectorPerformance | Finnhub | Sector name |
| search | SearchResult | Brave Search | Query string |
| watchlist | WatchlistPerformance | Local config | Ticker |
| earnings | EarningsEntry | Finnhub | Ticker |
| forex | | Finnhub | Currency pair |
| commodity | | Finnhub | Commodity name |
| stock_analysis | StockAnalysisSignal | Computed | Ticker |
| social_signal | SocialSignalSummary | Discord | Ticker |
| social_radar | RadarItem | Discord | Ticker |

Scores: SymbolScorecard → rubric_scores rows

SymbolScorecard
    ├── symbol
    ├── scores: [RubricScore(metric, score, rationale), ...]
    ├── composite_score
    └── composite_confidence
        │ save_scorecard()
        ▼
    One RubricScoreRow per metric
    (composite fields backfilled at finalize)

Individual scores are written immediately during scoring (with action=PENDING). At finalize, composite scores and the final action are backfilled.
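The flattening described above can be sketched as follows. The row shape and helper name are assumptions for illustration; the real write path is save_scorecard() against the ORM.

```python
from dataclasses import dataclass


@dataclass
class RubricScore:
    metric: str
    score: float
    rationale: str


def scorecard_to_rows(run_id: str, symbol: str,
                      scores: list[RubricScore]) -> list[dict]:
    """One rubric_scores row per metric; action stays PENDING until
    finalize backfills composites and the final action."""
    return [
        {"run_id": run_id, "symbol": symbol, "metric": s.metric,
         "score": s.score, "rationale": s.rationale, "action": "PENDING"}
        for s in scores
    ]


rows = scorecard_to_rows("run-1", "NVDA", [
    RubricScore("thesis_alignment", 8.0, "On thesis"),
    RubricScore("news_sentiment", 6.5, "Mixed headlines"),
])
```

Writing per-metric rows immediately is what makes partial progress survivable: an interrupted review leaves PENDING rows rather than nothing.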

Evidence: EvidenceRecord → decision_evidence rows

EvidenceRecord
    ├── source_type (sec_filing, news, market_data, macro, thesis)
    ├── claim, source_url, attribution
    ├── direction (confirming, disconfirming, neutral)
    └── retrieved_at
        │ save_evidence(run_id, symbol, evidence[], hypothesis_id)
        ▼
    DecisionEvidence row per record
    (hypothesis_id links to thesis_hypotheses)

Annotations: text → review_annotations rows

Annotation (field="obs:AAPL", value="Price broke support")
    │ save_annotation() extracts target from prefix
    ▼
ReviewAnnotationRow
    ├── run_id, field, value
    ├── symbol = "AAPL" (extracted from "obs:")
    └── thesis_slug = null

The obs: prefix convention: a target without hyphens is treated as a ticker, so obs:AAPL extracts symbol=AAPL; a hyphenated target is treated as a thesis slug, so obs:oil-price-shock extracts thesis_slug=oil-price-shock.

Hypotheses: CLI → thesis_hypotheses rows

Hypotheses are created by the AI via CLI commands during reviews, persisted directly as DB rows. They are not Pydantic models — they are ORM objects (ThesisHypothesis) that persist across review runs. Hypothesis status evolves over multiple reviews as the AI assesses them against new data.

Filesystem Artifacts

Review artifact tree

artifacts/reports/weekly/{DATE}/{RUN_ID}/
├── metadata.json          # run_id, date, trigger
├── inputs.json            # ReviewInputs: raw market data, positions, macro_events
├── context.json           # ReviewContext: per-symbol scoring contexts + rubric prompts
├── summary.json           # ReviewSummary: narratives, action items, portfolio context
├── scoring.json           # ReviewScoring: rubric scores + evidence per symbol
├── observations.json      # Accumulated obs: annotations (filesystem audit trail)
├── decisions.json         # ReviewDecisions: composites, actions, risk snapshot
├── overview.md            # Formatted summary with catalyst calendar
├── report.md              # Final rendered report
├── theses/
│   └── {SLUG}.md          # ThesisPacket rendered as markdown
└── holdings/
    └── {SYMBOL}.md        # HoldingPacket rendered as markdown

Dual-write pattern

The review pipeline uses the filesystem as the source of truth with best-effort DB write-through:

  • Filesystem — all review artifacts (JSON + markdown) are written to the run directory
  • DB write-through — scores, evidence, annotations, and run status are also written to DB tables (swallows exceptions with a warning if DB is unavailable)
  • observations.json — accumulates obs: field annotations during the review, persisted to DB as review_annotations rows

Thesis YAML

Theses live in ~/.openfin/theses/{slug}.yaml (user data directory). Each save also inserts a thesis_snapshots row for audit history.

slug: ai-compute-hardware
title: AI Compute Hardware Cycle
time_horizon: 18 months
narrative: |
  Hyperscaler and enterprise AI training/inference workloads are driving
  unprecedented demand for GPUs, custom ASICs, and HBM...
status: active
symbols: [NVDA, ASML, MRVL, TSM]
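The save path is a dual write: the YAML file plus an append-only thesis_snapshots audit row. A simplified sketch (paths, table schema, and YAML emission are all reduced for illustration; the real code serializes the full thesis):

```python
import json
import sqlite3
import tempfile
from pathlib import Path

thesis = {"slug": "ai-compute-hardware", "status": "active",
          "title": "AI Compute Hardware Cycle", "symbols": ["NVDA", "ASML"]}

# 1. YAML file is the source of truth (emission simplified to two keys;
#    a temp dir stands in for ~/.openfin/theses/).
root = Path(tempfile.mkdtemp())
(root / f"{thesis['slug']}.yaml").write_text(
    f"slug: {thesis['slug']}\nstatus: {thesis['status']}\n")

# 2. Append-only snapshot row stores the full thesis as JSON for audit.
conn = sqlite3.connect(root / "openfin.db")
conn.execute("""CREATE TABLE IF NOT EXISTS thesis_snapshots
                (id INTEGER PRIMARY KEY, slug TEXT, status TEXT,
                 payload TEXT)""")
conn.execute(
    "INSERT INTO thesis_snapshots (slug, status, payload) VALUES (?, ?, ?)",
    (thesis["slug"], thesis["status"], json.dumps(thesis)))
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM thesis_snapshots").fetchone()[0]
conn.close()
```

Because snapshot rows are never updated, `SELECT ... ORDER BY created_at` over one slug replays the full edit history of a thesis.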

Macro calendar

Manually maintained at investments/macro_calendar.yaml. Filtered to the next 30 days at gather time, rendered in overview.md with countdown labels.

Payload Versioning

Every JSON payload carries a _v field (schema version) managed by db/payload_version.py.

Write path: stamp(data_type, json) injects _v with the current version.

Read path: upgrade(data_type, raw) walks the upgrade chain from the payload's version to current, then strips _v for model validation.

Current versions:

| Data type | Version | Notes |
|---|---|---|
| position, order, quote, market_index, market_sector, watchlist, earnings, search, macro, forex, commodity, stock_analysis, social_signal, social_radar | 1 | Initial shape |
| news | 2 | Added published_at timestamp |
| review_inputs | 3 | v2 added portfolio_context; v3 added macro_events |
| review_context | 3 | v2 added prior_scores_context, analysis_signal_context; v3 added social_signal_context |
| review_summary | 2 | Added portfolio_context |
| review_scoring | 1 | |
| review_decisions | 1 | |

Upgrade functions are registered with @register_upgrade(data_type, from_version=N) and chain forward: v0 → v1 → v2 → current. Missing upgrades are assumed forward-compatible (new fields get Pydantic defaults; removed fields are ignored by model_validate).
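A minimal sketch of the registry and chain walk. The decorator name comes from the text; the registry internals and the example upgrade body are illustrative (it mirrors the news v2 change from the table above, where published_at was added):

```python
# Illustrative registry: (data_type, from_version) → upgrade function.
_UPGRADES = {}


def register_upgrade(data_type: str, from_version: int):
    def decorator(fn):
        _UPGRADES[(data_type, from_version)] = fn
        return fn
    return decorator


@register_upgrade("news", from_version=1)
def _news_v1_to_v2(payload: dict) -> dict:
    # v2 added published_at; old payloads get an explicit null.
    payload.setdefault("published_at", None)
    payload["_v"] = 2
    return payload


def upgrade(data_type: str, payload: dict, current: int) -> dict:
    """Walk registered upgrades from the payload's version to current,
    then strip _v so the dict is ready for model_validate()."""
    while payload.get("_v", 1) < current:
        payload = _UPGRADES[(data_type, payload.get("_v", 1))](payload)
    payload.pop("_v", None)
    return payload


clean = upgrade("news", {"_v": 1, "title": "Fed holds rates"}, current=2)
```

Each function only knows one hop, so adding v4 later means writing one new function rather than touching the whole chain.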

Computed Values (not stored)

These are derived at query time from stored data:

| Value | Formula | States |
|---|---|---|
| Thesis health | f(confirmed, invalidated, active, recency_weights) | untested, strong (>0.7), mixed (0.4–0.7), weakening (<0.4), failing (<0.2) |
| Time pressure | f(thesis.time_horizon, thesis.created, now) | early (<25%), mid (25–60%), late (60–90%), overdue (>90%) |
| Composite score | Weighted rubric average (thesis_alignment 35%, news_sentiment 25%, valuation_signal 25%, social_signal 15%) | 0–10 scale |
| Action | f(composite, thesis_health, time_pressure) | BUY_MORE, HOLD, TRIM, EXIT |
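The composite formula follows directly from the stated weights. A sketch under two assumptions: metric scores are already on the 0–10 scale, and a missing metric renormalizes the remaining weights rather than counting as zero:

```python
WEIGHTS = {
    "thesis_alignment": 0.35,
    "news_sentiment": 0.25,
    "valuation_signal": 0.25,
    "social_signal": 0.15,
}


def composite_score(scores: dict[str, float]) -> float:
    """Weighted rubric average over whichever metrics are present.

    Renormalizing by the sum of present weights is an assumption; the
    project may instead require all four metrics.
    """
    total_weight = sum(WEIGHTS[m] for m in scores)
    return sum(WEIGHTS[m] * s for m, s in scores.items()) / total_weight


score = composite_score({"thesis_alignment": 8.0, "news_sentiment": 6.0,
                         "valuation_signal": 6.0, "social_signal": 4.0})
```

With all four metrics present the weights sum to 1.0, so the result is the plain weighted average (6.4 for the inputs above).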

CLI Reference

Database commands (openfin tools db)

| Command | Purpose |
|---|---|
| tools db snapshots | Summary of snapshot types with counts and date ranges |
| tools db history DATA_TYPE | Query historical snapshots (options: --symbol, --since, --until, --limit, --json) |
| tools db latest-snapshot DATA_TYPE | Most recent snapshot for a type (options: --symbol, --json) |
| tools db score-history SYMBOL | Rubric score history across runs (options: --metric, --since, --limit, --json) |
| tools db latest SYMBOLS | Latest composite score for symbols |
| tools db evidence SYMBOL | Evidence records for a symbol (options: --run-id, --limit) |
| tools db tables | List tables and row counts |
| tools db query SQL | Read-only SQL (SELECT only) |
| tools db migrate | Run Alembic migrations |

Review commands (openfin review)

| Command | Purpose |
|---|---|
| review list | Recent review runs |
| review show RUN_ID | Metadata, annotations, scores for a run |
| review lookup TARGET | Observations for a ticker or thesis slug across runs |
| review annotate | Write an annotation (options: --run-id, --field, --value) |
| review hypothesis create SLUG | Create hypothesis (options: --claim, --invalidation, --run-id, --horizon) |
| review hypothesis update ID | Update hypothesis status (options: --status, --resolution, --run-id) |

Direct SQLite access

sqlite3 artifacts/openfin.db ".schema data_snapshots"
sqlite3 -header -column artifacts/openfin.db \
  "SELECT data_type, COUNT(*) FROM data_snapshots GROUP BY data_type;"

Migrations

Schema changes use Alembic. Migration scripts live in src/openfin/migrations/versions/.

openfin tools db migrate                            # apply pending
alembic revision --autogenerate -m "description"    # create new

Design Notes

  • WAL mode — concurrent read safety; multiple CLI invocations can read while a write is in progress.
  • Best-effort persistence — FinanceApplication._store_snapshots() catches all exceptions. The CLI never fails due to a database error.
  • Batch IDs — each save_snapshots() call generates a UUID, letting you reconstruct exact data available at any point in time.
  • Write-through scoring — individual rubric scores written immediately during scoring, composites backfilled at finalize. Partial progress preserved if interrupted.
  • Hypothesis persistence — hypotheses outlive the review run that created them. run_id is provenance (which review generated it), but the hypothesis is scoped to a thesis and evolves across runs.
  • Annotation convention — obs: prefix for per-target observations. Non-prefixed fields (executive_summary, etc.) also write through to summary.json.
  • Thesis snapshots — save_thesis() inserts a snapshot row on each YAML write, creating an audit trail of how investment stories evolve.