# Data Model
This document describes what data the system manages, how it's represented in memory (Pydantic models), and how it's persisted (SQLite database + filesystem artifacts).
## Domain Concepts
| Term | Definition | Authored by | Persisted in |
|---|---|---|---|
| Thesis | A falsifiable investment belief with a narrative, symbols, and time horizon | Human | YAML (~/.openfin/theses/) |
| Hypothesis | A causal, falsifiable claim: "if A then B, invalidated by C" | AI | DB (thesis_hypotheses) |
| Evidence | A factual claim from data (news, filing, price action) with direction | AI | DB (decision_evidence) |
| Score | Per-symbol rubric rating (thesis_alignment, news_sentiment, etc.) | AI | DB (rubric_scores) |
| Thesis health | Derived from hypothesis outcomes (confirmed/invalidated ratio, recency-weighted) | Automation | Computed at query time |
| Time pressure | Derived from thesis time_horizon vs elapsed time | Automation | Computed at query time |
| Decision | Action recommendation informed by scores, thesis health, time pressure, position context | Automation + AI | DB + filesystem |
## In-Memory Models
All in-memory data shapes are Pydantic models in src/openfin/models/. They are the single source of truth for data structure — DB schemas and API responses derive from them.
### Market Data (`models/market.py`)

| Model | Purpose | Key fields |
|---|---|---|
| `QuoteSnapshot` | Ticker valuation snapshot | symbol, price, pe_ratio, forward_pe, peg_ratio, beta, ev_ebitda_ttm, 52w range |
| `IndexPerformance` | Major index price + change | symbol, price, period_change_pct |
| `SectorPerformance` | Sector ETF performance | name, symbol, period_change_pct |
| `WatchlistPerformance` | Per-symbol price and changes | symbol, price, day_change_pct, week_change_pct |
| `NewsArticle` | News headline with provenance | title, publisher, link, symbol, published_at |
| `SearchResult` | Web search result | title, description, url, source, query |
| `EarningsEntry` | Upcoming earnings date | symbol, earnings_date |
### Brokerage Data (`models/brokerage.py`)

| Model | Purpose | Key fields |
|---|---|---|
| `Position` | Current holding | symbol, quantity, market_value, avg_price, day_change_pct, total_gain_pct, first_bought |
| `Order` | Trade order | symbol, action, quantity, price, status, submitted_at, filled_at |
| `BrokerageAccount` | Account summary | id, cash_balance, total_value, currency |
### Macro Data (`models/macro.py`)

| Model | Purpose | Key fields |
|---|---|---|
| `MacroIndicator` | FRED economic indicator | series_id, name, value, unit, observation_date |
### Analysis Models (`models/analysis.py`)

| Model | Purpose | Key fields |
|---|---|---|
| `AnalysisDimension` | A single scored dimension | name, score [-1, 1], weight [0, 1], explanation, raw_metrics |
| `RiskFlag` | Risk condition | category, description, confidence_penalty [0, 1] |
| `StockAnalysisSignal` | Stock analysis output | ticker, recommendation (BUY/HOLD/SELL), confidence, final_score, dimensions[], risk_flags[] |
### Social Signal Models (`models/social.py`)

| Model | Purpose | Key fields |
|---|---|---|
| `SocialSignalSummary` | Aggregated social signals per ticker | buy/sell/short counts, confidence breakdown, channel_data[] |
| `ChannelSignal` | Per-channel signal data | channel_name, win_rate, signal_count, dominant_action |
| `RadarItem` | Notable non-watchlist ticker | Social signal activity summary |
### Review Pipeline Models (`models/report.py`)
These models define the artifacts produced by the review pipeline. Each maps to a JSON file on disk and may also write through to DB tables.
#### Context Packets (Phase 1 output — what the agent reads)
```
ThesisPacket
├── slug, title, narrative
├── time_horizon
├── thesis_health, time_pressure
├── active_hypotheses: list[str]
├── resolved_hypotheses: list[str]
├── upcoming_earnings: list[str]
└── symbols: list[ThesisPacketSymbol]
    ├── ticker
    ├── market_context
    ├── news_context
    ├── search_context
    ├── position_context
    ├── prior_scores_context
    ├── analysis_signal_context
    └── social_signal_context

HoldingPacket
├── symbol, quantity, market_value
├── avg_price, cost_basis, pnl
├── total_gain_pct, portfolio_weight_pct
├── tax_status
├── market_context, news_context
├── search_context, prior_scores_context
├── analysis_signal_context
├── social_signal_context
└── thesis_refs: list[str]
```
Context fields are pre-rendered text strings — the agent reads them directly without further lookups.
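To illustrate the pre-rendered pattern (the dataclass and renderer below are simplified stand-ins, not the real Pydantic models), a context field is formatted once at gather time and the agent reads the string verbatim:

```python
from dataclasses import dataclass

@dataclass
class HoldingPacketSketch:
    """Simplified stand-in for HoldingPacket: context fields are plain strings."""
    symbol: str
    market_context: str  # pre-rendered text the agent reads without further lookups

def render_market_context(symbol: str, price: float, day_change_pct: float) -> str:
    # Hypothetical renderer: all formatting happens before the agent sees it.
    return f"{symbol} trades at ${price:.2f} ({day_change_pct:+.1f}% today)."

packet = HoldingPacketSketch("AAPL", render_market_context("AAPL", 189.50, -1.2))
```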
#### Scoring & Evidence

| Model | Purpose | Key fields |
|---|---|---|
| `RubricScore` | Single metric score | metric, score (float), rationale |
| `SymbolScorecard` | All scores for one symbol | symbol, scores[], composite_score, composite_confidence |
| `ScoringEntry` | Score within `ReviewScoring` | score (float), rationale |
| `EvidenceRecord` | Factual claim from data | source_type, claim, direction (confirming/disconfirming/neutral), source_url, attribution |
| `Decision` | Action recommendation | symbol, action (BUY_MORE/HOLD/TRIM/EXIT), confidence, scorecard, evidence[], entry_zone, exit_trigger, sizing_note |
#### Review Artifacts (one JSON file each)

| Model | Phase | File | Purpose |
|---|---|---|---|
| `ReviewInputs` | 1 | `inputs.json` | Raw market data, positions, indices, macro, news, search, macro_events |
| `ReviewContext` | 1 | `context.json` | Per-symbol `SymbolScoringContext` + rubric prompts |
| `ReviewScoring` | 2-3 | `scoring.json` | scores: {symbol: {metric: ScoringEntry}}, evidence: {symbol: [EvidenceRecord]} |
| `ReviewDecisions` | 6 | `decisions.json` | decisions[], risk_snapshot |
| `ReviewSummary` | 1+5 | `summary.json` | Narratives (executive_summary, portfolio_summary, etc.), action_items[] |
| `WeeklyReport` | — | reconstituted | Composite of all artifacts via `from_artifacts()` |
## Database Persistence
Local SQLite (`artifacts/openfin.db`) in WAL mode. Managed by SQLAlchemy ORM with Alembic migrations. All DB writes are best-effort — CLI commands behave identically whether or not the database is available.
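A stdlib sketch of those two behaviors (the function names here are illustrative, not the project's actual API; the real code goes through SQLAlchemy):

```python
import sqlite3

def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers proceed while a write is in flight
    return conn

def best_effort_write(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> bool:
    # Mirrors the best-effort contract: a DB failure warns instead of raising,
    # so the CLI command that triggered the write still succeeds.
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.Error as exc:
        print(f"warning: snapshot not persisted: {exc}")
        return False
```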
### Schema

#### Tables

| Table | Purpose | Key columns | Write pattern |
|---|---|---|---|
| `data_snapshots` | Timestamped API fetch results | data_type, symbol, payload (JSON), batch_id | Append-only, one row per fetched item |
| `thesis_snapshots` | Point-in-time thesis YAML snapshots | slug, status, payload (JSON), symbols_csv | Append-only on thesis save |
| `thesis_hypotheses` | AI-generated falsifiable claims | thesis_slug, claim, invalidation, status | Created by AI, updated across runs |
| `review_runs` | Review session lifecycle | run_id, status, trigger | Created at gather, finalized at end |
| `rubric_scores` | Per-metric scores from reviews | symbol, metric, score, composite_score, action | Written per-score, composites backfilled at finalize |
| `decision_evidence` | Source citations backing decisions | symbol, source_type, claim, direction, hypothesis_id | Append-only per review |
| `review_annotations` | Agent observations and summaries | run_id, field, value, symbol, thesis_slug | Append-only, last-write-wins on read |
#### Indexes

| Table | Index | Columns |
|---|---|---|
| `data_snapshots` | `ix_data_snapshots_type_symbol_time` | data_type, symbol, retrieved_at |
| `data_snapshots` | `ix_data_snapshots_retrieved_at` | retrieved_at |
| `data_snapshots` | `ix_data_snapshots_batch_id` | batch_id |
| `rubric_scores` | `ix_rubric_scores_symbol_date` | symbol, run_date |
| `rubric_scores` | `ix_rubric_scores_run_id` | run_id |
| `decision_evidence` | `ix_decision_evidence_run_id` | run_id |
| `decision_evidence` | `ix_decision_evidence_symbol` | symbol |
| `review_runs` | `ix_review_runs_run_id` | run_id |
| `review_runs` | `ix_review_runs_run_date` | run_date |
| `review_annotations` | `ix_review_annotations_run_id` | run_id |
| `review_annotations` | `ix_review_annotations_symbol` | symbol |
| `review_annotations` | `ix_review_annotations_thesis_slug` | thesis_slug |
| `thesis_snapshots` | `ix_thesis_snapshots_slug_created` | slug, created_at |
| `thesis_hypotheses` | `ix_thesis_hypotheses_slug` | thesis_slug |
| `thesis_hypotheses` | `ix_thesis_hypotheses_status` | status |
### Pydantic → DB Serialization

#### Data snapshots: model → JSON blob
Every API fetch is stored as a `data_snapshots` row. The Pydantic model is the in-memory shape; the DB stores it as a versioned JSON blob:
```
Pydantic model (e.g. Position)
  │  .model_dump_json()
  ▼
JSON string
  │  stamp(data_type, json) → adds _v field
  ▼
DataSnapshot row (data_type, symbol, payload=stamped_json, batch_id)
```
On read, `upgrade(data_type, payload)` walks the version chain, strips `_v`, and returns a clean dict for `Model.model_validate()`.
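A stripped-down sketch of the round trip (the real implementations in `db/payload_version.py` also walk multi-step upgrade chains, elided here; the version map is an illustrative subset):

```python
import json

CURRENT_VERSIONS = {"news": 2}  # illustrative subset of the real version table

def stamp(data_type: str, payload_json: str) -> str:
    # Write path: inject the current schema version into the blob.
    doc = json.loads(payload_json)
    doc["_v"] = CURRENT_VERSIONS.get(data_type, 1)
    return json.dumps(doc)

def upgrade(data_type: str, raw: str) -> dict:
    # Read path (upgrade chain elided): strip _v so the dict is clean
    # for Model.model_validate().
    doc = json.loads(raw)
    doc.pop("_v", None)
    return doc
```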
| data_type | Pydantic model | Source | symbol column |
|---|---|---|---|
| `quote` | `QuoteSnapshot` | Finnhub | Ticker |
| `position` | `Position` | SnapTrade | Ticker |
| `order` | `Order` | SnapTrade | Ticker |
| `news` | `NewsArticle` | Finnhub | Ticker |
| `macro` | `MacroSnapshot` | FRED | Indicator name |
| `market_index` | `IndexPerformance` | Finnhub | Index name |
| `market_sector` | `SectorPerformance` | Finnhub | Sector name |
| `search` | `SearchResult` | Brave Search | Query string |
| `watchlist` | `WatchlistPerformance` | Local config | Ticker |
| `earnings` | `EarningsEntry` | Finnhub | Ticker |
| `forex` | — | Finnhub | Currency pair |
| `commodity` | — | Finnhub | Commodity name |
| `stock_analysis` | `StockAnalysisSignal` | Computed | Ticker |
| `social_signal` | `SocialSignalSummary` | Discord | Ticker |
| `social_radar` | `RadarItem` | Discord | Ticker |
#### Scores: `SymbolScorecard` → `rubric_scores` rows
```
SymbolScorecard
├── symbol
├── scores: [RubricScore(metric, score, rationale), ...]
├── composite_score
└── composite_confidence
  │  save_scorecard()
  ▼
One RubricScoreRow per metric
(composite fields backfilled at finalize)
```
Individual scores are written immediately during scoring (with action=PENDING). At finalize, composite scores and the final action are backfilled.
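In sketch form, with plain dicts standing in for the Pydantic models and ORM rows (the function name is hypothetical):

```python
def scorecard_to_rows(scorecard: dict) -> list[dict]:
    # One rubric_scores row per metric. Composite and action are unknown
    # during scoring, so they start empty/PENDING and are backfilled at finalize.
    return [
        {
            "symbol": scorecard["symbol"],
            "metric": s["metric"],
            "score": s["score"],
            "composite_score": None,   # backfilled at finalize
            "action": "PENDING",       # backfilled at finalize
        }
        for s in scorecard["scores"]
    ]
```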
#### Evidence: `EvidenceRecord` → `decision_evidence` rows
```
EvidenceRecord
├── source_type (sec_filing, news, market_data, macro, thesis)
├── claim, source_url, attribution
├── direction (confirming, disconfirming, neutral)
└── retrieved_at
  │  save_evidence(run_id, symbol, evidence[], hypothesis_id)
  ▼
DecisionEvidence row per record
(hypothesis_id links to thesis_hypotheses)
```
#### Annotations: text → `review_annotations` rows
```
Annotation (field="obs:AAPL", value="Price broke support")
  │  save_annotation() extracts target from prefix
  ▼
ReviewAnnotationRow
├── run_id, field, value
├── symbol = "AAPL" (extracted from "obs:")
└── thesis_slug = null
```
The `obs:` prefix convention: `obs:AAPL` extracts symbol=AAPL; `obs:oil-price-shock` (contains hyphens) extracts thesis_slug=oil-price-shock.
#### Hypotheses: CLI → `thesis_hypotheses` rows
Hypotheses are created by the AI via CLI commands during reviews and persisted directly as DB rows. They are not Pydantic models — they are ORM objects (`ThesisHypothesis`) that persist across review runs. Hypothesis status evolves over multiple reviews as the AI assesses them against new data.
## Filesystem Artifacts
### Review artifact tree
```
artifacts/reports/weekly/{DATE}/{RUN_ID}/
├── metadata.json       # run_id, date, trigger
├── inputs.json         # ReviewInputs: raw market data, positions, macro_events
├── context.json        # ReviewContext: per-symbol scoring contexts + rubric prompts
├── summary.json        # ReviewSummary: narratives, action items, portfolio context
├── scoring.json        # ReviewScoring: rubric scores + evidence per symbol
├── observations.json   # Accumulated obs: annotations (filesystem audit trail)
├── decisions.json      # ReviewDecisions: composites, actions, risk snapshot
├── overview.md         # Formatted summary with catalyst calendar
├── report.md           # Final rendered report
├── theses/
│   └── {SLUG}.md       # ThesisPacket rendered as markdown
└── holdings/
    └── {SYMBOL}.md     # HoldingPacket rendered as markdown
```
### Dual-write pattern
The review pipeline treats the filesystem as the source of truth, with best-effort DB write-through:
- Filesystem — all review artifacts (JSON + markdown) are written to the run directory
- DB write-through — scores, evidence, annotations, and run status are also written to DB tables (swallows exceptions with a warning if DB is unavailable)
- `observations.json` — accumulates `obs:` field annotations during the review, persisted to DB as `review_annotations` rows
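A sketch of the pattern (names hypothetical): the artifact always reaches disk, and a DB failure downgrades to a warning:

```python
import json
from pathlib import Path

def dual_write(run_dir: Path, name: str, payload: dict, db_write) -> Path:
    # Filesystem first: this write is the source of truth and must succeed.
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / name
    path.write_text(json.dumps(payload, indent=2))
    try:
        db_write(payload)  # best-effort write-through to the DB tables
    except Exception as exc:
        print(f"warning: DB write-through failed: {exc}")
    return path
```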
### Thesis YAML
Theses live in `~/.openfin/theses/{slug}.yaml` (user data directory). Each save also inserts a `thesis_snapshots` row for audit history.
```yaml
slug: ai-compute-hardware
title: AI Compute Hardware Cycle
time_horizon: 18 months
narrative: |
  Hyperscaler and enterprise AI training/inference workloads are driving
  unprecedented demand for GPUs, custom ASICs, and HBM...
status: active
symbols: [NVDA, ASML, MRVL, TSM]
```
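The audit row can be sketched from the schema above (`snapshot_row` is a hypothetical helper; the real write happens inside `save_thesis()`):

```python
import json
from datetime import datetime, timezone

def snapshot_row(thesis: dict) -> dict:
    # Shape mirrors the thesis_snapshots columns: slug, status, payload, symbols_csv.
    return {
        "slug": thesis["slug"],
        "status": thesis["status"],
        "symbols_csv": ",".join(thesis["symbols"]),
        "payload": json.dumps(thesis),  # full point-in-time copy for audit history
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```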
### Macro calendar
Maintained by hand at `investments/macro_calendar.yaml`. Filtered to the next 30 days at gather time and rendered in `overview.md` with countdown labels.
## Payload Versioning
Every JSON payload carries a `_v` field (schema version) managed by `db/payload_version.py`.

- Write path: `stamp(data_type, json)` injects `_v` with the current version.
- Read path: `upgrade(data_type, raw)` walks the upgrade chain from the payload's version to current, then strips `_v` for model validation.
Current versions:
| Data type | Version | Notes |
|---|---|---|
| position, order, quote, market_index, market_sector, watchlist, earnings, search, macro, forex, commodity, stock_analysis, social_signal, social_radar | 1 | Initial shape |
| news | 2 | Added published_at timestamp |
| review_inputs | 3 | v2 added portfolio_context; v3 added macro_events |
| review_context | 3 | v2 added prior_scores_context, analysis_signal_context; v3 added social_signal_context |
| review_summary | 2 | Added portfolio_context |
| review_scoring | 1 | — |
| review_decisions | 1 | — |
Upgrade functions are registered with `@register_upgrade(data_type, from_version=N)` and chain forward: v0 → v1 → v2 → current. A missing upgrade step is assumed forward-compatible (new fields get Pydantic defaults; removed fields are ignored by `model_validate`).
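A minimal sketch of the registry and chain walk (the decorator name matches the text above; the registry internals and `upgrade` signature are illustrative):

```python
from typing import Callable

_UPGRADES: dict[tuple[str, int], Callable[[dict], dict]] = {}

def register_upgrade(data_type: str, from_version: int):
    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        _UPGRADES[(data_type, from_version)] = fn
        return fn
    return decorator

@register_upgrade("news", from_version=1)
def _news_v1_to_v2(doc: dict) -> dict:
    doc.setdefault("published_at", None)  # v2 added published_at
    return doc

def upgrade(data_type: str, doc: dict, current: int) -> dict:
    v = doc.pop("_v", 1)
    while v < current:
        step = _UPGRADES.get((data_type, v))
        if step is not None:
            doc = step(doc)
        # Missing step: assume forward-compatible, pass the doc through unchanged.
        v += 1
    return doc
```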
## Computed Values (not stored)
These are derived at query time from stored data:
| Value | Formula | States |
|---|---|---|
| Thesis health | `f(confirmed, invalidated, active, recency_weights)` | untested, strong (>0.7), mixed (0.4–0.7), weakening (<0.4), failing (<0.2) |
| Time pressure | `f(thesis.time_horizon, thesis.created, now)` | early (<25%), mid (25–60%), late (60–90%), overdue (>90%) |
| Composite score | Weighted rubric average (thesis_alignment 35%, news_sentiment 25%, valuation_signal 25%, social_signal 15%) | 0–10 scale |
| Action | `f(composite, thesis_health, time_pressure)` | BUY_MORE, HOLD, TRIM, EXIT |
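The composite and time-pressure derivations can be sketched directly from the table (weights and cutoffs as listed; function names are illustrative):

```python
RUBRIC_WEIGHTS = {
    "thesis_alignment": 0.35,
    "news_sentiment": 0.25,
    "valuation_signal": 0.25,
    "social_signal": 0.15,
}

def composite_score(scores: dict[str, float]) -> float:
    # Weighted rubric average on the 0-10 scale.
    return sum(RUBRIC_WEIGHTS[metric] * score for metric, score in scores.items())

def time_pressure(elapsed_days: float, horizon_days: float) -> str:
    # Fraction of the thesis time horizon already consumed.
    pct = elapsed_days / horizon_days
    if pct < 0.25:
        return "early"
    if pct < 0.60:
        return "mid"
    if pct <= 0.90:
        return "late"
    return "overdue"
```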
## CLI Reference
### Database commands (`openfin tools db`)
| Command | Purpose |
|---|---|
| `tools db snapshots` | Summary of snapshot types with counts and date ranges |
| `tools db history DATA_TYPE` | Query historical snapshots (options: `--symbol`, `--since`, `--until`, `--limit`, `--json`) |
| `tools db latest-snapshot DATA_TYPE` | Most recent snapshot for a type (options: `--symbol`, `--json`) |
| `tools db score-history SYMBOL` | Rubric score history across runs (options: `--metric`, `--since`, `--limit`, `--json`) |
| `tools db latest SYMBOLS` | Latest composite score for symbols |
| `tools db evidence SYMBOL` | Evidence records for a symbol (options: `--run-id`, `--limit`) |
| `tools db tables` | List tables and row counts |
| `tools db query SQL` | Read-only SQL (SELECT only) |
| `tools db migrate` | Run Alembic migrations |
### Review commands (`openfin review`)
| Command | Purpose |
|---|---|
| `review list` | Recent review runs |
| `review show RUN_ID` | Metadata, annotations, scores for a run |
| `review lookup TARGET` | Observations for a ticker or thesis slug across runs |
| `review annotate` | Write an annotation (options: `--run-id`, `--field`, `--value`) |
| `review hypothesis create SLUG` | Create hypothesis (options: `--claim`, `--invalidation`, `--run-id`, `--horizon`) |
| `review hypothesis update ID` | Update hypothesis status (options: `--status`, `--resolution`, `--run-id`) |
### Direct SQLite access
```shell
sqlite3 artifacts/openfin.db ".schema data_snapshots"
sqlite3 -header -column artifacts/openfin.db \
  "SELECT data_type, COUNT(*) FROM data_snapshots GROUP BY data_type;"
```
## Migrations
Schema changes use Alembic. Migration scripts live in `src/openfin/migrations/versions/`.
```shell
openfin tools db migrate                          # apply pending
alembic revision --autogenerate -m "description"  # create new
```
## Design Notes
- WAL mode — concurrent read safety; multiple CLI invocations can read while a write is in progress.
- Best-effort persistence — `FinanceApplication._store_snapshots()` catches all exceptions. The CLI never fails due to a database error.
- Batch IDs — each `save_snapshots()` call generates a UUID, letting you reconstruct the exact data available at any point in time.
- Write-through scoring — individual rubric scores are written immediately during scoring; composites are backfilled at finalize. Partial progress is preserved if interrupted.
- Hypothesis persistence — hypotheses outlive the review run that created them. `run_id` is provenance (which review generated it), but the hypothesis is scoped to a thesis and evolves across runs.
- Annotation convention — `obs:` prefix for per-target observations. Non-prefixed fields (executive_summary, etc.) also write through to `summary.json`.
- Thesis snapshots — `save_thesis()` inserts a snapshot row on each YAML write, creating an audit trail of how investment stories evolve.