Data Model

This document describes what data the system manages, how it's represented in memory (Pydantic models), and how it's persisted (SQLite database + filesystem artifacts).

Domain Concepts

| Term | Definition | Authored by | Persisted in |
|---|---|---|---|
| Thesis | A falsifiable investment belief with a narrative, symbols, and time horizon | Human | YAML (~/.openfin/theses/) |
| Hypothesis | A causal, falsifiable claim: "if A then B, invalidated by C" | AI | DB (thesis_hypotheses) |
| Evidence | A factual claim from data (news, filing, price action) with direction | AI | DB (decision_evidence) |
| Score | Per-symbol rubric rating (thesis_alignment, news_sentiment, etc.) | AI | DB (rubric_scores) |
| Thesis health | Derived from hypothesis outcomes (confirmed/invalidated ratio, recency-weighted) | Automation | Computed at query time |
| Time pressure | Derived from thesis time_horizon vs. elapsed time | Automation | Computed at query time |
| Decision | Action recommendation informed by scores, thesis health, time pressure, position context | Automation + AI | DB + filesystem |

In-Memory Models

All in-memory data shapes are Pydantic models in src/openfin/models/. They are the single source of truth for data structure — DB schemas and API responses derive from them.

Market Data (models/market.py)

| Model | Purpose | Key fields |
|---|---|---|
| QuoteSnapshot | Ticker valuation snapshot | symbol, price, pe_ratio, forward_pe, peg_ratio, beta, ev_ebitda_ttm, 52w range |
| IndexPerformance | Major index price + change | symbol, price, period_change_pct |
| SectorPerformance | Sector ETF performance | name, symbol, period_change_pct |
| WatchlistPerformance | Per-symbol price and changes | symbol, price, day_change_pct, week_change_pct |
| NewsArticle | News headline with provenance | title, publisher, link, symbol, published_at |
| SearchResult | Web search result | title, description, url, source, query |
| EarningsEntry | Upcoming earnings date | symbol, earnings_date |

Brokerage Data (models/brokerage.py)

| Model | Purpose | Key fields |
|---|---|---|
| Position | Current holding | symbol, quantity, market_value, avg_price, day_change_pct, total_gain_pct, first_bought |
| Order | Trade order | symbol, action, quantity, price, status, submitted_at, filled_at |
| BrokerageAccount | Account summary | id, cash_balance, total_value, currency |

Macro Data (models/macro.py)

| Model | Purpose | Key fields |
|---|---|---|
| MacroIndicator | FRED economic indicator | series_id, name, value, unit, observation_date |

Analysis Models (models/analysis.py)

| Model | Purpose | Key fields |
|---|---|---|
| AnalysisDimension | A single scored dimension | name, score [-1,1], weight [0,1], explanation, raw_metrics |
| RiskFlag | Risk condition | category, description, confidence_penalty [0,1] |
| StockAnalysisSignal | Stock analysis output | ticker, recommendation (BUY/HOLD/SELL), confidence, final_score, dimensions[], risk_flags[] |

Social Signal Models (models/social.py)

| Model | Purpose | Key fields |
|---|---|---|
| SocialSignalSummary | Aggregated social signals per ticker | buy/sell/short counts, confidence breakdown, channel_data[] |
| ChannelSignal | Per-channel signal data | channel_name, win_rate, signal_count, dominant_action |
| RadarItem | Notable non-watchlist ticker | Social signal activity summary |

Review Pipeline Models (models/report.py)

These models define the artifacts produced by the review pipeline. Each maps to a JSON file on disk and may also write through to DB tables.

Context Packets (Phase 1 output — what the agent reads)

ThesisPacket                         HoldingPacket
├── slug, title, narrative           ├── symbol, quantity, market_value
├── time_horizon                     ├── avg_price, cost_basis, pnl
├── thesis_health, time_pressure     ├── total_gain_pct, portfolio_weight_pct
├── active_hypotheses: list[str]     ├── tax_status
├── resolved_hypotheses: list[str]   ├── market_context, news_context
├── upcoming_earnings: list[str]     ├── search_context, prior_scores_context
└── symbols: list[ThesisPacketSymbol]├── analysis_signal_context
    ├── ticker                       ├── social_signal_context
    ├── market_context               └── thesis_refs: list[str]
    ├── news_context
    ├── search_context
    ├── position_context
    ├── prior_scores_context
    ├── analysis_signal_context
    └── social_signal_context

Context fields are pre-rendered text strings — the agent reads them directly without further lookups.

Scoring & Evidence

| Model | Purpose | Key fields |
|---|---|---|
| RubricScore | Single metric score | metric, score (float), rationale |
| SymbolScorecard | All scores for one symbol | symbol, scores[], composite_score, composite_confidence |
| ScoringEntry | Score within ReviewScoring | score (float), rationale |
| EvidenceRecord | Factual claim from data | source_type, claim, direction (confirming/disconfirming/neutral), source_url, attribution |
| Decision | Action recommendation | symbol, action (BUY_MORE/HOLD/TRIM/EXIT), confidence, scorecard, evidence[], entry_zone, exit_trigger, sizing_note |

Review Artifacts (one JSON file each)

| Model | Phase | File | Purpose |
|---|---|---|---|
| ReviewInputs | 1 | inputs.json | Raw market data, positions, indices, macro, news, search, macro_events |
| ReviewContext | 1 | context.json | Per-symbol SymbolScoringContext + rubric prompts |
| ReviewScoring | 2-3 | scoring.json | scores: {symbol: {metric: ScoringEntry}}, evidence: {symbol: [EvidenceRecord]} |
| ReviewDecisions | 6 | decisions.json | decisions[], risk_snapshot |
| ReviewSummary | 1+5 | summary.json | Narratives (executive_summary, portfolio_summary, etc.), action_items[] |
| WeeklyReport | reconstituted | | Composite of all artifacts via from_artifacts() |

Database Persistence

Local SQLite (artifacts/openfin.db) in WAL mode. Managed by SQLAlchemy ORM with Alembic migrations. All DB writes are best-effort — CLI commands behave identically whether or not the database is available.
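Enabling WAL mode is a single pragma on the SQLite connection. The project manages this through SQLAlchemy, but the pragma itself is the same; a minimal sketch with the stdlib sqlite3 driver:

```python
import os
import sqlite3
import tempfile

# WAL mode must be set on a file-backed database (it does not apply
# to :memory:); the path here is a stand-in for artifacts/openfin.db.
db_path = os.path.join(tempfile.mkdtemp(), "openfin.db")

conn = sqlite3.connect(db_path)
# PRAGMA journal_mode returns the mode actually in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.close()
```

WAL is a property of the database file, so once set, every subsequent CLI invocation that opens the file gets concurrent-read behavior without re-issuing the pragma.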

Schema

```mermaid
erDiagram
    review_runs ||--o{ rubric_scores : "run_id"
    review_runs ||--o{ decision_evidence : "run_id"
    review_runs ||--o{ review_annotations : "run_id"
    thesis_hypotheses }o--o{ decision_evidence : "hypothesis_id"

    data_snapshots {
        int id PK
        string data_type "quote, position, news, etc."
        string symbol "nullable"
        text payload "JSON with _v"
        datetime retrieved_at
        string batch_id "groups rows from a single fetch"
    }
    thesis_snapshots {
        int id PK
        string slug
        int version
        string status
        string title
        text payload "full thesis YAML as JSON"
        text symbols_csv
        datetime created_at
    }
    thesis_hypotheses {
        string id PK "uuid"
        string thesis_slug
        string run_id "nullable, provenance"
        text claim "if A then B"
        text invalidation "unless C"
        string status "active, confirmed, invalidated, revised"
        string time_horizon "nullable"
        datetime created_at
        datetime resolved_at "nullable"
        text resolution "nullable, what happened"
    }
    review_runs {
        int id PK
        string run_id UK
        date run_date
        string trigger "cli, telegram, cron"
        string status "collecting, scoring, finalized"
        datetime started_at
        datetime finalized_at "nullable"
    }
    rubric_scores {
        int id PK
        string run_id FK
        date run_date
        string symbol
        string metric
        float score
        text rationale
        float composite_score
        float composite_confidence
        string action "BUY_MORE, HOLD, TRIM, EXIT"
        datetime created_at
    }
    decision_evidence {
        int id PK
        string run_id FK
        string symbol
        string source_type
        text claim
        text source_url
        string attribution
        string direction "confirming, disconfirming, neutral"
        string hypothesis_id "nullable, FK to thesis_hypotheses"
        datetime created_at
    }
    review_annotations {
        int id PK
        string run_id FK
        string field "e.g. executive_summary, obs:AAPL"
        string symbol "nullable, extracted from obs:TICKER"
        string thesis_slug "nullable, extracted from obs:slug"
        text value
        datetime created_at
    }
```

Tables

| Table | Purpose | Key columns | Write pattern |
|---|---|---|---|
| data_snapshots | Timestamped API fetch results | data_type, symbol, payload (JSON), batch_id | Append-only, one row per fetched item |
| thesis_snapshots | Point-in-time thesis YAML snapshots | slug, status, payload (JSON), symbols_csv | Append-only on thesis save |
| thesis_hypotheses | AI-generated falsifiable claims | thesis_slug, claim, invalidation, status | Created by AI, updated across runs |
| review_runs | Review session lifecycle | run_id, status, trigger | Created at gather, finalized at end |
| rubric_scores | Per-metric scores from reviews | symbol, metric, score, composite_score, action | Written per-score, composites backfilled at finalize |
| decision_evidence | Source citations backing decisions | symbol, source_type, claim, direction, hypothesis_id | Append-only per review |
| review_annotations | Agent observations and summaries | run_id, field, value, symbol, thesis_slug | Append-only, last-write-wins on read |

Indexes

| Table | Index | Columns |
|---|---|---|
| data_snapshots | ix_data_snapshots_type_symbol_time | data_type, symbol, retrieved_at |
| data_snapshots | ix_data_snapshots_retrieved_at | retrieved_at |
| data_snapshots | ix_data_snapshots_batch_id | batch_id |
| rubric_scores | ix_rubric_scores_symbol_date | symbol, run_date |
| rubric_scores | ix_rubric_scores_run_id | run_id |
| decision_evidence | ix_decision_evidence_run_id | run_id |
| decision_evidence | ix_decision_evidence_symbol | symbol |
| review_runs | ix_review_runs_run_id | run_id |
| review_runs | ix_review_runs_run_date | run_date |
| review_annotations | ix_review_annotations_run_id | run_id |
| review_annotations | ix_review_annotations_symbol | symbol |
| review_annotations | ix_review_annotations_thesis_slug | thesis_slug |
| thesis_snapshots | ix_thesis_snapshots_slug_created | slug, created_at |
| thesis_hypotheses | ix_thesis_hypotheses_slug | thesis_slug |
| thesis_hypotheses | ix_thesis_hypotheses_status | status |

Pydantic → DB Serialization

Data snapshots: model → JSON blob

Every API fetch is stored as a data_snapshots row. The Pydantic model is the in-memory shape; the DB stores it as a versioned JSON blob:

Pydantic model (e.g. Position)
    │ .model_dump_json()
    ▼
JSON string
    │ stamp(data_type, json) → adds _v field
    ▼
DataSnapshot row (data_type, symbol, payload=stamped_json, batch_id)

On read, upgrade(data_type, payload) walks the version chain, strips _v, returns a clean dict for Model.model_validate().
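The stamp/upgrade pair can be sketched as follows. This is an illustrative reimplementation, not the code in db/payload_version.py: the version table is invented for the example, and the upgrade chain walk is elided (it is shown in the Payload Versioning section below).

```python
import json

# Illustrative per-type schema versions; the real table lives in
# db/payload_version.py.
CURRENT_VERSIONS = {"news": 2}


def stamp(data_type: str, payload_json: str) -> str:
    """Inject the current schema version as _v before the row is written."""
    payload = json.loads(payload_json)
    payload["_v"] = CURRENT_VERSIONS.get(data_type, 1)
    return json.dumps(payload)


def upgrade(data_type: str, raw: str) -> dict:
    """Strip _v and return a clean dict for Model.model_validate().

    The real implementation first walks the registered upgrade chain
    from the payload's version to current; that walk is elided here.
    """
    payload = json.loads(raw)
    payload.pop("_v", None)
    return payload


# Round trip: model_dump_json() output in, validate-ready dict out.
stored = stamp("news", '{"title": "Chip demand surges"}')
clean = upgrade("news", stored)
```

The invariant is that `_v` exists only inside the payload column; models never see it.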

| data_type | Pydantic model | Source | symbol column |
|---|---|---|---|
| quote | QuoteSnapshot | Finnhub | Ticker |
| position | Position | SnapTrade | Ticker |
| order | Order | SnapTrade | Ticker |
| news | NewsArticle | Finnhub | Ticker |
| macro | MacroSnapshot | FRED | Indicator name |
| market_index | IndexPerformance | Finnhub | Index name |
| market_sector | SectorPerformance | Finnhub | Sector name |
| search | SearchResult | Brave Search | Query string |
| watchlist | WatchlistPerformance | Local config | Ticker |
| earnings | EarningsEntry | Finnhub | Ticker |
| forex | | Finnhub | Currency pair |
| commodity | | Finnhub | Commodity name |
| stock_analysis | StockAnalysisSignal | Computed | Ticker |
| social_signal | SocialSignalSummary | Discord | Ticker |
| social_radar | RadarItem | Discord | Ticker |

Scores: SymbolScorecard → rubric_scores rows

SymbolScorecard
    ├── symbol
    ├── scores: [RubricScore(metric, score, rationale), ...]
    ├── composite_score
    └── composite_confidence
        │ save_scorecard()
        ▼
    One RubricScoreRow per metric
    (composite fields backfilled at finalize)

Individual scores are written immediately during scoring (with action=PENDING). At finalize, composite scores and the final action are backfilled.
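The flattening described above can be sketched as follows. The row shape and helper name are assumptions for illustration; the real write path is save_scorecard() against the ORM.

```python
from dataclasses import dataclass


@dataclass
class RubricScore:
    metric: str
    score: float
    rationale: str


def scorecard_to_rows(run_id: str, symbol: str,
                      scores: list[RubricScore]) -> list[dict]:
    """One rubric_scores row per metric; action stays PENDING until
    finalize backfills composites and the final action."""
    return [
        {"run_id": run_id, "symbol": symbol, "metric": s.metric,
         "score": s.score, "rationale": s.rationale, "action": "PENDING"}
        for s in scores
    ]


rows = scorecard_to_rows("run-1", "NVDA", [
    RubricScore("thesis_alignment", 8.0, "On thesis"),
    RubricScore("news_sentiment", 6.5, "Mixed headlines"),
])
```

Writing per-metric rows immediately is what makes partial progress survivable: an interrupted review leaves PENDING rows rather than nothing.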

Evidence: EvidenceRecord → decision_evidence rows

EvidenceRecord
    ├── source_type (sec_filing, news, market_data, macro, thesis)
    ├── claim, source_url, attribution
    ├── direction (confirming, disconfirming, neutral)
    └── retrieved_at
        │ save_evidence(run_id, symbol, evidence[], hypothesis_id)
        ▼
    DecisionEvidence row per record
    (hypothesis_id links to thesis_hypotheses)

Annotations: text → review_annotations rows

Annotation (field="obs:AAPL", value="Price broke support")
    │ save_annotation() extracts target from prefix
    ▼
ReviewAnnotationRow
    ├── run_id, field, value
    ├── symbol = "AAPL" (extracted from "obs:")
    └── thesis_slug = null

The obs: prefix convention: a target without hyphens is treated as a ticker, so obs:AAPL extracts symbol=AAPL; a hyphenated target is treated as a thesis slug, so obs:oil-price-shock extracts thesis_slug=oil-price-shock.

Hypotheses: CLI → thesis_hypotheses rows

Hypotheses are created by the AI via CLI commands during reviews, persisted directly as DB rows. They are not Pydantic models — they are ORM objects (ThesisHypothesis) that persist across review runs. Hypothesis status evolves over multiple reviews as the AI assesses them against new data.

Filesystem Artifacts

Review artifact tree

artifacts/reports/weekly/{DATE}/{RUN_ID}/
├── metadata.json          # run_id, date, trigger
├── inputs.json            # ReviewInputs: raw market data, positions, macro_events
├── context.json           # ReviewContext: per-symbol scoring contexts + rubric prompts
├── summary.json           # ReviewSummary: narratives, action items, portfolio context
├── scoring.json           # ReviewScoring: rubric scores + evidence per symbol
├── observations.json      # Accumulated obs: annotations (filesystem audit trail)
├── decisions.json         # ReviewDecisions: composites, actions, risk snapshot
├── overview.md            # Formatted summary with catalyst calendar
├── report.md              # Final rendered report
├── theses/
│   └── {SLUG}.md          # ThesisPacket rendered as markdown
└── holdings/
    └── {SYMBOL}.md        # HoldingPacket rendered as markdown

Dual-write pattern

The review pipeline uses the filesystem as the source of truth with best-effort DB write-through:

  • Filesystem — all review artifacts (JSON + markdown) are written to the run directory
  • DB write-through — scores, evidence, annotations, and run status are also written to DB tables (swallows exceptions with a warning if DB is unavailable)
  • observations.json — accumulates obs: field annotations during the review, persisted to DB as review_annotations rows

Thesis YAML

Theses live in ~/.openfin/theses/{slug}.yaml (user data directory). Each save also inserts a thesis_snapshots row for audit history.

slug: ai-compute-hardware
title: AI Compute Hardware Cycle
time_horizon: 18 months
narrative: |
  Hyperscaler and enterprise AI training/inference workloads are driving
  unprecedented demand for GPUs, custom ASICs, and HBM...
status: active
symbols: [NVDA, ASML, MRVL, TSM]
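The save path is a dual write: the YAML file plus an append-only thesis_snapshots audit row. A simplified sketch (paths, table schema, and YAML emission are all reduced for illustration; the real code serializes the full thesis):

```python
import json
import sqlite3
import tempfile
from pathlib import Path

thesis = {"slug": "ai-compute-hardware", "status": "active",
          "title": "AI Compute Hardware Cycle", "symbols": ["NVDA", "ASML"]}

# 1. YAML file is the source of truth (emission simplified to two keys;
#    a temp dir stands in for ~/.openfin/theses/).
root = Path(tempfile.mkdtemp())
(root / f"{thesis['slug']}.yaml").write_text(
    f"slug: {thesis['slug']}\nstatus: {thesis['status']}\n")

# 2. Append-only snapshot row stores the full thesis as JSON for audit.
conn = sqlite3.connect(root / "openfin.db")
conn.execute("""CREATE TABLE IF NOT EXISTS thesis_snapshots
                (id INTEGER PRIMARY KEY, slug TEXT, status TEXT,
                 payload TEXT)""")
conn.execute(
    "INSERT INTO thesis_snapshots (slug, status, payload) VALUES (?, ?, ?)",
    (thesis["slug"], thesis["status"], json.dumps(thesis)))
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM thesis_snapshots").fetchone()[0]
conn.close()
```

Because snapshot rows are never updated, `SELECT ... ORDER BY created_at` over one slug replays the full edit history of a thesis.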

Macro calendar

Manually maintained at investments/macro_calendar.yaml. Filtered to the next 30 days at gather time, rendered in overview.md with countdown labels.

Payload Versioning

Every JSON payload carries a _v field (schema version) managed by db/payload_version.py.

Write path: stamp(data_type, json) injects _v with the current version.

Read path: upgrade(data_type, raw) walks the upgrade chain from the payload's version to current, then strips _v for model validation.

Current versions:

| Data type | Version | Notes |
|---|---|---|
| position, order, quote, market_index, market_sector, watchlist, earnings, search, macro, forex, commodity, stock_analysis, social_signal, social_radar | 1 | Initial shape |
| news | 2 | Added published_at timestamp |
| review_inputs | 3 | v2 added portfolio_context; v3 added macro_events |
| review_context | 3 | v2 added prior_scores_context, analysis_signal_context; v3 added social_signal_context |
| review_summary | 2 | Added portfolio_context |
| review_scoring | 1 | |
| review_decisions | 1 | |

Upgrade functions are registered with @register_upgrade(data_type, from_version=N) and chain forward: v0 → v1 → v2 → current. Missing upgrades are assumed forward-compatible (new fields get Pydantic defaults; removed fields are ignored by model_validate).
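A minimal sketch of the registry and chain walk. The decorator name comes from the text; the registry internals and the example upgrade body are illustrative (it mirrors the news v2 change from the table above, where published_at was added):

```python
# Illustrative registry: (data_type, from_version) → upgrade function.
_UPGRADES = {}


def register_upgrade(data_type: str, from_version: int):
    def decorator(fn):
        _UPGRADES[(data_type, from_version)] = fn
        return fn
    return decorator


@register_upgrade("news", from_version=1)
def _news_v1_to_v2(payload: dict) -> dict:
    # v2 added published_at; old payloads get an explicit null.
    payload.setdefault("published_at", None)
    payload["_v"] = 2
    return payload


def upgrade(data_type: str, payload: dict, current: int) -> dict:
    """Walk registered upgrades from the payload's version to current,
    then strip _v so the dict is ready for model_validate()."""
    while payload.get("_v", 1) < current:
        payload = _UPGRADES[(data_type, payload.get("_v", 1))](payload)
    payload.pop("_v", None)
    return payload


clean = upgrade("news", {"_v": 1, "title": "Fed holds rates"}, current=2)
```

Each function only knows one hop, so adding v4 later means writing one new function rather than touching the whole chain.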

Computed Values (not stored)

These are derived at query time from stored data:

| Value | Formula | States |
|---|---|---|
| Thesis health | f(confirmed, invalidated, active, recency_weights) | untested, strong (>0.7), mixed (0.4–0.7), weakening (<0.4), failing (<0.2) |
| Time pressure | f(thesis.time_horizon, thesis.created, now) | early (<25%), mid (25–60%), late (60–90%), overdue (>90%) |
| Composite score | Weighted rubric average (thesis_alignment 35%, news_sentiment 25%, valuation_signal 25%, social_signal 15%) | 0–10 scale |
| Action | f(composite, thesis_health, time_pressure) | BUY_MORE, HOLD, TRIM, EXIT |
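The composite formula follows directly from the stated weights. A sketch under two assumptions: metric scores are already on the 0–10 scale, and a missing metric renormalizes the remaining weights rather than counting as zero:

```python
WEIGHTS = {
    "thesis_alignment": 0.35,
    "news_sentiment": 0.25,
    "valuation_signal": 0.25,
    "social_signal": 0.15,
}


def composite_score(scores: dict[str, float]) -> float:
    """Weighted rubric average over whichever metrics are present.

    Renormalizing by the sum of present weights is an assumption; the
    project may instead require all four metrics.
    """
    total_weight = sum(WEIGHTS[m] for m in scores)
    return sum(WEIGHTS[m] * s for m, s in scores.items()) / total_weight


score = composite_score({"thesis_alignment": 8.0, "news_sentiment": 6.0,
                         "valuation_signal": 6.0, "social_signal": 4.0})
```

With all four metrics present the weights sum to 1.0, so the result is the plain weighted average (6.4 for the inputs above).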

CLI Reference

Database commands (openfin tools db)

| Command | Purpose |
|---|---|
| tools db snapshots | Summary of snapshot types with counts and date ranges |
| tools db history DATA_TYPE | Query historical snapshots (options: --symbol, --since, --until, --limit, --json) |
| tools db latest-snapshot DATA_TYPE | Most recent snapshot for a type (options: --symbol, --json) |
| tools db score-history SYMBOL | Rubric score history across runs (options: --metric, --since, --limit, --json) |
| tools db latest SYMBOLS | Latest composite score for symbols |
| tools db evidence SYMBOL | Evidence records for a symbol (options: --run-id, --limit) |
| tools db tables | List tables and row counts |
| tools db query SQL | Read-only SQL (SELECT only) |
| tools db migrate | Run Alembic migrations |

Review commands (openfin review)

| Command | Purpose |
|---|---|
| review list | Recent review runs |
| review show RUN_ID | Metadata, annotations, scores for a run |
| review lookup TARGET | Observations for a ticker or thesis slug across runs |
| review annotate | Write an annotation (options: --run-id, --field, --value) |
| review hypothesis create SLUG | Create hypothesis (options: --claim, --invalidation, --run-id, --horizon) |
| review hypothesis update ID | Update hypothesis status (options: --status, --resolution, --run-id) |

Direct SQLite access

sqlite3 artifacts/openfin.db ".schema data_snapshots"
sqlite3 -header -column artifacts/openfin.db \
  "SELECT data_type, COUNT(*) FROM data_snapshots GROUP BY data_type;"

Migrations

Schema changes use Alembic. Migration scripts live in src/openfin/migrations/versions/.

openfin tools db migrate                            # apply pending
alembic revision --autogenerate -m "description"    # create new

Design Notes

  • WAL mode — concurrent read safety; multiple CLI invocations can read while a write is in progress.
  • Best-effort persistence — FinanceApplication._store_snapshots() catches all exceptions. The CLI never fails due to a database error.
  • Batch IDs — each save_snapshots() call generates a UUID, letting you reconstruct exact data available at any point in time.
  • Write-through scoring — individual rubric scores written immediately during scoring, composites backfilled at finalize. Partial progress preserved if interrupted.
  • Hypothesis persistence — hypotheses outlive the review run that created them. run_id is provenance (which review generated it), but the hypothesis is scoped to a thesis and evolves across runs.
  • Annotation convention — obs: prefix for per-target observations. Non-prefixed fields (executive_summary, etc.) also write through to summary.json.
  • Thesis snapshots — save_thesis() inserts a snapshot row on each YAML write, creating an audit trail of how investment stories evolve.