System Verification SOP

When to Run

  • After thesis model or watchlist changes
  • After DB schema migrations
  • After CLI command changes
  • After weekly review or daily brief pipeline changes
  • Before cutting a release or merging a large PR

Automated vs. Manual

Phases 2, 2b, 4, 4b, and the versioning half of Phase 5 are fully automated (no live APIs):

uv run python scripts/system-verification.py          # all automated phases
uv run python scripts/system-verification.py --phase 4 # single phase
uv run python scripts/system-verification.py --list    # list phases

Phases 3, 3b, 3c, and 6 require live API keys and are manual/agent-driven (below).

Phase 1 — Unit Tests

Run the full test suite. All tests must pass, excluding any pre-existing failures.

uv run pytest --tb=short -q

If the full suite hangs (some integration tests can be slow), run the core modules:

uv run pytest tests/test_thesis.py tests/test_watchlist.py tests/test_db.py tests/test_api_schemas.py tests/test_brokerage.py -v --tb=short

Pass criteria: zero new failures.

Phase 2 — Thesis System [automated]

Automated: uv run python scripts/system-verification.py --phase 2

Checks: load theses, symbol collection (plain string lists), time_horizon field present, adhoc add/remove roundtrip (temp dir), watchlist derivation with provenance.
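
The derivation-with-provenance check can be pictured with a minimal sketch. The data shapes here are assumptions for illustration (plain symbol lists per thesis, watchlist as symbol → contributing slugs); the real models live in the openfin codebase:

```python
# Sketch: derive a watchlist with provenance from thesis symbol lists.
# Data shapes are hypothetical -- the real models live in the openfin codebase.

def derive_watchlist(theses: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each symbol to the sorted slugs of the theses that mention it."""
    watchlist: dict[str, list[str]] = {}
    for slug, symbols in theses.items():
        for symbol in symbols:
            watchlist.setdefault(symbol, []).append(slug)
    return {sym: sorted(slugs) for sym, slugs in watchlist.items()}

theses = {
    "ai-capex": ["NVDA", "AVGO"],
    "cloud-margins": ["MSFT", "NVDA"],
}
watchlist = derive_watchlist(theses)
print(watchlist["NVDA"])  # ['ai-capex', 'cloud-margins']
```

Every watchlist symbol should trace back to at least one thesis slug, which is what `openfin watchlist sources` surfaces.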

For manual spot-checks or when debugging:

openfin thesis list                    # Should show all theses with symbols
openfin thesis show <any-slug>         # Should display thesis details (narrative, symbols, time_horizon)
openfin thesis status <any-slug>       # Should display health, time pressure, active hypotheses
openfin thesis symbols                 # Should show all symbols with thesis provenance
openfin watchlist list                 # Should show derived watchlist with Source column
openfin watchlist sources              # Should show provenance for every symbol

Pass criteria: zero failures in automated script. CLI commands return expected output with no tracebacks.

Phase 2b — Hypothesis System [automated]

Automated: uv run python scripts/system-verification.py --phase 2b

Checks (in-memory DB): hypothesis CRUD (create, read, list, update), status transitions (active → confirmed/invalidated), resolved_at timestamp, thesis health computation (recency-weighted confirmed/invalidated ratio), time pressure computation from horizon string parsing.
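
The health computation can be illustrated with a toy recency weighting. The exponential half-life below is an assumption for illustration, not necessarily the script's exact formula:

```python
from datetime import datetime, timedelta

def thesis_health(resolutions: list[tuple[datetime, str]], now: datetime,
                  half_life_days: float = 90.0) -> float:
    """Recency-weighted confirmed ratio in [0, 1].

    resolutions: (resolved_at, status) pairs with status "confirmed" or
    "invalidated". Each pair is weighted by 0.5 ** (age_days / half_life_days),
    so newer resolutions dominate. Returns 0.5 (neutral) with no resolutions.
    """
    confirmed = total = 0.0
    for resolved_at, status in resolutions:
        age_days = (now - resolved_at).total_seconds() / 86400
        weight = 0.5 ** (age_days / half_life_days)
        total += weight
        if status == "confirmed":
            confirmed += weight
    return confirmed / total if total else 0.5

now = datetime(2025, 1, 1)
history = [
    (now - timedelta(days=10), "confirmed"),
    (now - timedelta(days=200), "invalidated"),
]
# The fresh confirmation outweighs the stale invalidation (ratio > 0.5).
print(round(thesis_health(history, now), 2))
```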

For manual spot-checks:

openfin review hypothesis create <slug> --claim "If A then B" --invalidation "Unless C"
openfin review hypothesis list <slug>
openfin review hypothesis update <id> --status confirmed --resolution "A happened"

Pass criteria: zero failures in automated script. CLI commands create/update/list hypotheses without errors.

Phase 3 — Data Pipeline [manual — live APIs]

Verify market data fetching and snapshot persistence.

openfin init                           # All API keys valid
openfin market quote AAPL              # Should return a quote
openfin research news AAPL --limit 3   # Should return news articles
openfin market overview                # Should return indices + sectors

Verify snapshots landed in DB:

openfin tools db snapshots              # Should show recent snapshot counts
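
To inspect counts programmatically, a small helper can tally recent rows per type. The `data_snapshots(data_type, retrieved_at)` columns are assumed from the Phase 5 queries; adjust if the schema differs:

```python
import sqlite3
from collections import Counter

def snapshot_counts(con: sqlite3.Connection, limit: int = 50) -> Counter:
    """Tally the data_type of the most recent snapshot rows."""
    rows = con.execute(
        "SELECT data_type FROM data_snapshots "
        "ORDER BY retrieved_at DESC LIMIT ?", (limit,)
    ).fetchall()
    return Counter(dt for (dt,) in rows)

# con = sqlite3.connect("artifacts/openfin.db")
# print(snapshot_counts(con))  # expect quote/news/overview types after Phase 3
```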

Pass criteria: data commands return results, no empty responses for known-good symbols.

Phase 3b — Stock Analysis [manual — live APIs]

Verify the automated stock analysis pipeline.

openfin analysis stock AAPL              # Should return BUY/HOLD/SELL with 8 dimensions
openfin analysis stock AAPL,MSFT --json  # Should return valid JSON with signals array

Verify snapshot persistence:

openfin tools db snapshots               # Should include stock_analysis type

Check output for:

  • Recommendation (BUY/HOLD/SELL) with confidence percentage
  • At least 4 scored dimensions (some may be unavailable depending on ticker)
  • Risk flags section (if any timing/overbought/geopolitical risks detected)
  • Supporting points and caveats present
  • No tracebacks
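
A structural check on the `--json` output can automate most of that list. The field names (`signals`, `recommendation`, `dimensions`, `score`) are assumptions inferred from the checklist; adjust them to the actual schema:

```python
VALID_RECS = {"BUY", "HOLD", "SELL"}

def validate_analysis(payload: dict) -> list[str]:
    """Return a list of problems found in an analysis JSON payload."""
    problems: list[str] = []
    for signal in payload.get("signals", []):
        if signal.get("recommendation") not in VALID_RECS:
            problems.append(f"bad recommendation: {signal.get('recommendation')!r}")
        scored = [d for d in signal.get("dimensions", []) if d.get("score") is not None]
        if len(scored) < 4:
            problems.append(f"only {len(scored)} scored dimensions")
    return problems

# import json, subprocess
# raw = subprocess.run(["openfin", "analysis", "stock", "AAPL", "--json"],
#                      capture_output=True, text=True, check=True).stdout
# assert validate_analysis(json.loads(raw)) == []
```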

Pass criteria: analysis completes for a known-good ticker (e.g. AAPL), produces a recommendation with multiple dimensions scored, snapshot persisted to DB.

Phase 3c — Social Signals [manual — live APIs]

Verify Discord social signal data fetching and persistence. Requires Cloudflare D1 credentials ([cloudflare] in ~/.openfin/credentials.toml).

openfin social signals --symbols NVDA       # Should return signal counts + channel table
openfin social signals --symbols NVDA --json # Should return valid JSON with SocialSignalSummary
openfin social radar                         # Should return non-watchlist tickers passing quality filter
openfin social radar --json                  # Should return valid JSON array of RadarItems

Verify snapshots landed in DB:

openfin tools db snapshots                   # Should include social_signal and social_radar types

Verify graceful degradation (temporarily unset credentials):

CLOUDFLARE_API_TOKEN= openfin social signals --symbols NVDA
# Should print warning and "No social signal data found", no crash
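
The behavior being checked follows a common degrade-gracefully pattern, sketched here with a hypothetical fetcher (the real credential handling lives in the openfin codebase):

```python
import os

def fetch_social_signals(symbols: list[str]) -> list[dict]:
    """Return social signal rows, or warn and return [] when Cloudflare
    credentials are missing or empty -- never crash (hypothetical sketch)."""
    if not os.environ.get("CLOUDFLARE_API_TOKEN"):
        print("warning: Cloudflare credentials missing; skipping social signals")
        return []
    raise NotImplementedError("live D1 query elided in this sketch")

os.environ.pop("CLOUDFLARE_API_TOKEN", None)
print(fetch_social_signals(["NVDA"]))  # warns, then prints []
```

Note that `os.environ.get(...)` treats `CLOUDFLARE_API_TOKEN=` (empty string) the same as unset, matching the shell invocation above.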

Pass criteria: signals/radar commands return data for known-active tickers, snapshots persisted, graceful degradation on missing credentials.

Phase 4 — Review Full Depth (Dry Run) [automated]

Automated: uv run python scripts/system-verification.py --phase 4

Uses the unified ReviewOrchestrator with depth="full". Checks (mocked app, no live APIs):

  • PortfolioContext populated: total value, cost basis, P&L, weights sum ~100%
  • portfolio_summary enriched: avg price, P&L, weight, gain fields present
  • RiskSnapshot has real metrics (not placeholder): concentration, volatility flags
  • Thesis packets: market context populated, position enriched with P&L/weight, hypothesis fields present (time_horizon, thesis_health, time_pressure)
  • Holding packets: avg_price, cost_basis, pnl, weight, tax_status all non-empty
  • Symbol scoring contexts: position enriched with P&L/weight
  • Overview markdown: Portfolio Snapshot and Risk Snapshot sections present
  • Persistence: inputs.json, context.json, summary.json all have _v, portfolio_context
  • Artifact dirs: holdings/, context/ (deprecated) created
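
The weights-sum assertion from the checklist is simple enough to express directly (percent weights and the 0.5 tolerance are assumptions for illustration):

```python
def check_weights(weights: dict[str, float], tol: float = 0.5) -> None:
    """Assert that position weights, in percent, sum to ~100 within tol."""
    total = sum(weights.values())
    assert abs(total - 100.0) <= tol, f"weights sum to {total:.2f}%, expected ~100%"

check_weights({"AAPL": 40.1, "MSFT": 35.0, "NVDA": 24.9})  # ok: sums to 100.0
```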

For a live data dry run (hits real APIs):

openfin review gather --depth full --dry-run
# or the alias:
openfin review weekly --dry-run

For a full persisted run with artifact inspection:

openfin review gather --depth full --force
ls artifacts/reports/weekly/$(date +%Y-%m-%d)/*/

Pass criteria: zero failures in automated script. Live dry run shows real data in all sections.

Phase 4b — Review Light Depth (Dry Run) [automated]

Automated: uv run python scripts/system-verification.py --phase 4b

Uses the unified ReviewOrchestrator with depth="light". Checks (mocked app, no live APIs):

  • Returns WeeklyReport (same type as full depth)
  • Portfolio context populated
  • Thesis packets present
  • Overview markdown has Portfolio Snapshot
  • Search results empty (not gathered in light depth)
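
Those invariants can be asserted generically on two report payloads. The dict keys below are hypothetical stand-ins for the WeeklyReport fields:

```python
def check_light_report(light: dict, full: dict) -> None:
    """Light depth should share full depth's structure, with searches empty."""
    assert set(light) == set(full), "light/full reports diverge in structure"
    assert light["search_results"] == [], "light depth should not gather searches"
    assert light["portfolio_context"], "portfolio context must still be populated"
    assert light["thesis_packets"], "thesis packets must still be present"

full = {"portfolio_context": {"total_value": 1.0},
        "thesis_packets": [{"slug": "ai-capex"}],
        "search_results": [{"query": "..."}]}
light = {**full, "search_results": []}
check_light_report(light, full)  # passes
```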

For a live data dry run:

openfin review gather --depth light --dry-run
# or the alias:
openfin review daily --dry-run

Pass criteria: zero failures in automated script. Live dry run has all sections populated.

Phase 5 — Payload Versioning [automated] + DB Integrity [manual]

Automated: uv run python scripts/system-verification.py --phase 5

Checks (pure logic, no DB):

  • stamp() injects _v matching CURRENT_VERSIONS
  • upgrade() strips _v, preserves original fields
  • Pre-versioning payloads (no _v) upgrade gracefully
  • Review artifact upgrades: v0→v3 adds portfolio_context, prior_scores_context, analysis_signal_context, social_signal_context
  • Upgraded payloads validate against Pydantic models (ReviewInputs, ReviewContext, ReviewSummary)
  • Stamp→upgrade roundtrip is identity
  • Every data type with version >1 has an upgrade chain from v0
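
The stamp/upgrade contract looks roughly like this — a minimal sketch with a single hypothetical CURRENT_VERSIONS entry and one collapsed upgrade step (the real chains are per-version, in the codebase):

```python
CURRENT_VERSIONS = {"review_inputs": 3}  # hypothetical single entry

def stamp(data_type: str, payload: dict) -> dict:
    """Return a copy with _v injected from CURRENT_VERSIONS."""
    return {**payload, "_v": CURRENT_VERSIONS[data_type]}

def upgrade(data_type: str, payload: dict) -> dict:
    """Strip _v and upgrade; payloads without _v are treated as v0."""
    out = dict(payload)
    version = out.pop("_v", 0)
    if data_type == "review_inputs" and version < 3:
        # v0 -> v3 collapsed: later versions added portfolio_context et al.
        out.setdefault("portfolio_context", None)
    return out

payload = {"positions": ["AAPL"]}
assert upgrade("review_inputs", stamp("review_inputs", payload)) == payload  # roundtrip
assert "portfolio_context" in upgrade("review_inputs", {"positions": ["AAPL"]})  # v0 upgrade
```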

Manual — DB schema and data integrity:

uv run alembic current             # Should show latest revision
uv run alembic check               # Should report no pending migrations

Spot-check thesis snapshots:

sqlite3 artifacts/openfin.db "SELECT slug, COUNT(*) FROM thesis_snapshots GROUP BY slug;"

Verify payload versioning in DB — recent snapshots should carry _v:

sqlite3 artifacts/openfin.db "SELECT data_type, json_extract(payload, '$._v') as v FROM data_snapshots ORDER BY retrieved_at DESC LIMIT 5;"
# Should show _v=1 (or current version) for recent rows; NULL for pre-versioning rows is expected

Pass criteria: zero failures in automated script. Alembic reports head revision, recent data_snapshots have _v.

Phase 6 — API [manual — if applicable]

Only run if API/frontend changes were made.

uv run pytest tests/test_api.py tests/test_api_routes.py tests/test_api_schemas.py --tb=short

Verify OpenAPI spec is in sync with models:

uv run python -c "from openfin.api.main import app; import json; print(json.dumps(app.openapi(), indent=2))" > /dev/null

Pass criteria: API tests pass, OpenAPI spec generates without errors.

Quick Smoke Test (Minimum)

When time is limited, run the automated checks + one live API command:

uv run python scripts/system-verification.py    # all automated phases (no APIs)
uv run pytest tests/test_thesis.py tests/test_watchlist.py -q  # core unit tests
openfin analysis stock AAPL                      # one live API check

If all three pass, the core system is healthy.

For a full live-data check, also run:

openfin review gather --depth full --dry-run     # Check: Portfolio Snapshot, Risk Snapshot, theses/holdings packets
openfin review gather --depth light --dry-run    # Check: light-depth produces same artifact structure
openfin thesis status <any-slug>                 # Check: health + time pressure + hypotheses
openfin social signals --symbols NVDA            # Check: signal counts + channel table (if Cloudflare D1 configured)