# System Verification SOP

## When to Run
- After thesis model or watchlist changes
- After DB schema migrations
- After CLI command changes
- After weekly review or daily brief pipeline changes
- Before cutting a release or merging a large PR
## Automated vs. Manual

Phases 2, 2b, 4, 4b, and 5 are fully automated (no live APIs):

```bash
uv run python scripts/system-verification.py            # all automated phases
uv run python scripts/system-verification.py --phase 4  # single phase
uv run python scripts/system-verification.py --list     # list phases
```

Phases 3, 3b, 3c, and 6 require live API keys and are manual/agent-driven (see the phase sections below).
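In CI or a pre-merge hook, the automated phases chain naturally with fail-fast shell settings. A minimal sketch using only the commands above:

```bash
#!/usr/bin/env bash
set -euo pipefail  # abort on the first failing step

uv run python scripts/system-verification.py  # all automated phases
uv run pytest --tb=short -q                   # Phase 1 unit tests
```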
## Phase 1 — Unit Tests

Run the full test suite. All tests must pass (pre-existing failures excluded).

```bash
uv run pytest --tb=short -q
```
If the full suite hangs (some integration tests can be slow), run the core modules:

```bash
uv run pytest tests/test_thesis.py tests/test_watchlist.py tests/test_db.py tests/test_api_schemas.py tests/test_brokerage.py -v --tb=short
```
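If a hang is suspected rather than observed, the coreutils `timeout` wrapper gives the suite a hard cap without any pytest plugins (on macOS this requires GNU coreutils; the 10-minute budget is an assumption, tune to your machine):

```bash
timeout 600 uv run pytest --tb=short -q || echo "suite failed or exceeded 10 minutes"
```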
Pass criteria: zero new failures.
## Phase 2 — Thesis System [automated]

Automated: `uv run python scripts/system-verification.py --phase 2`
Checks: load theses, symbol collection (plain string lists), time_horizon field present, adhoc add/remove roundtrip (temp dir), watchlist derivation with provenance.
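The add/remove roundtrip guards against state leaking between runs. A minimal sketch of the idea with hypothetical stand-in callables (the real check in scripts/system-verification.py uses the thesis module's actual API, which may differ):

```python
import tempfile
from pathlib import Path

def adhoc_roundtrip_ok(add_symbol, remove_symbol, load_symbols) -> bool:
    """add_symbol/remove_symbol/load_symbols are hypothetical stand-ins
    for the adhoc-watchlist API; each takes a root directory first."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        add_symbol(root, "TEST")
        added = "TEST" in load_symbols(root)        # visible after add
        remove_symbol(root, "TEST")
        removed = "TEST" not in load_symbols(root)  # gone after remove
        return added and removed
```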
For manual spot-checks or when debugging:

```bash
openfin thesis list              # Should show all theses with symbols
openfin thesis show <any-slug>   # Should display thesis details (narrative, symbols, time_horizon)
openfin thesis status <any-slug> # Should display health, time pressure, active hypotheses
openfin thesis symbols           # Should show all symbols with thesis provenance
openfin watchlist list           # Should show derived watchlist with Source column
openfin watchlist sources        # Should show provenance for every symbol
```
Pass criteria: zero failures in automated script. CLI commands return expected output with no tracebacks.
## Phase 2b — Hypothesis System [automated]

Automated: `uv run python scripts/system-verification.py --phase 2b`
Checks (in-memory DB): hypothesis CRUD (create, read, list, update), status transitions (active → confirmed/invalidated), resolved_at timestamp, thesis health computation (recency-weighted confirmed/invalidated ratio), time pressure computation from horizon string parsing.
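The health number is easiest to reason about as a recency-weighted ratio. A minimal sketch assuming exponential decay with a 90-day half-life (the production weighting in the thesis module may differ):

```python
from datetime import datetime, timezone

def thesis_health(resolutions, half_life_days: float = 90.0) -> float:
    """Recency-weighted confirmed/invalidated ratio in [0, 1].

    resolutions: iterable of (resolved_at, confirmed) pairs, where
    resolved_at is a timezone-aware datetime and confirmed is a bool.
    """
    now = datetime.now(timezone.utc)
    confirmed_w = invalidated_w = 0.0
    for resolved_at, confirmed in resolutions:
        age_days = (now - resolved_at).total_seconds() / 86400
        weight = 0.5 ** (age_days / half_life_days)  # recent outcomes count more
        if confirmed:
            confirmed_w += weight
        else:
            invalidated_w += weight
    total = confirmed_w + invalidated_w
    return confirmed_w / total if total else 0.5  # neutral with nothing resolved
```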
For manual spot-checks:

```bash
openfin review hypothesis create <slug> --claim "If A then B" --invalidation "Unless C"
openfin review hypothesis list <slug>
openfin review hypothesis update <id> --status confirmed --resolution "A happened"
```
Pass criteria: zero failures in automated script. CLI commands create/update/list hypotheses without errors.
## Phase 3 — Data Pipeline [manual — live APIs]
Verify market data fetching and snapshot persistence.
```bash
openfin init                          # All API keys valid
openfin market quote AAPL             # Should return a quote
openfin research news AAPL --limit 3  # Should return news articles
openfin market overview               # Should return indices + sectors
```
Verify snapshots landed in DB:
```bash
openfin tools db snapshots   # Should show recent snapshot counts
```
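If the CLI summary is ambiguous, the same question can be put directly to the snapshot table. This assumes the `artifacts/openfin.db` path and the `data_snapshots` columns shown in Phase 5, and that `retrieved_at` stores ISO-8601 timestamps so the string comparison works:

```bash
sqlite3 artifacts/openfin.db \
  "SELECT data_type, COUNT(*) FROM data_snapshots
   WHERE retrieved_at > datetime('now', '-1 day')
   GROUP BY data_type;"
```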
Pass criteria: data commands return results, no empty responses for known-good symbols.
## Phase 3b — Stock Analysis [manual — live APIs]
Verify the automated stock analysis pipeline.
```bash
openfin analysis stock AAPL              # Should return BUY/HOLD/SELL with 8 dimensions
openfin analysis stock AAPL,MSFT --json  # Should return valid JSON with signals array
```
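The JSON contract can be asserted mechanically with the stdlib. This assumes `signals` is a top-level key, as the comment above suggests:

```bash
openfin analysis stock AAPL,MSFT --json | uv run python -c '
import json, sys
data = json.load(sys.stdin)
assert data["signals"], "expected a non-empty signals array"
print(len(data["signals"]), "signals")
'
```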
Verify snapshot persistence:
```bash
openfin tools db snapshots   # Should include stock_analysis type
```
Check output for:

- Recommendation (BUY/HOLD/SELL) with confidence percentage
- At least 4 scored dimensions (some may be unavailable depending on ticker)
- Risk flags section (if any timing/overbought/geopolitical risks detected)
- Supporting points and caveats present
- No tracebacks
Pass criteria: analysis completes for a known-good ticker (e.g. AAPL), produces a recommendation with multiple dimensions scored, snapshot persisted to DB.
## Phase 3c — Social Signals [manual — live APIs]

Verify Discord social signal data fetching and persistence. Requires Cloudflare D1 credentials (a `[cloudflare]` section in `~/.openfin/credentials.toml`).
```bash
openfin social signals --symbols NVDA         # Should return signal counts + channel table
openfin social signals --symbols NVDA --json  # Should return valid JSON with SocialSignalSummary
openfin social radar                          # Should return non-watchlist tickers passing quality filter
openfin social radar --json                   # Should return valid JSON array of RadarItems
```
Verify snapshots landed in DB:
```bash
openfin tools db snapshots   # Should include social_signal and social_radar types
```
Verify graceful degradation (temporarily unset credentials):

```bash
CLOUDFLARE_API_TOKEN= openfin social signals --symbols NVDA
# Should print a warning and "No social signal data found", not crash
```
Pass criteria: signals/radar commands return data for known-active tickers, snapshots persisted, graceful degradation on missing credentials.
## Phase 4 — Review Full Depth (Dry Run) [automated]

Automated: `uv run python scripts/system-verification.py --phase 4`

Uses the unified `ReviewOrchestrator` with `depth="full"`. Checks (mocked app, no live APIs):
- PortfolioContext populated: total value, cost basis, P&L, weights sum to ~100% (see the sketch after this list)
- portfolio_summary enriched: avg price, P&L, weight, gain fields present
- RiskSnapshot has real metrics (not placeholder): concentration, volatility flags
- Thesis packets: market context populated, position enriched with P&L/weight, hypothesis fields present (time_horizon, thesis_health, time_pressure)
- Holding packets: avg_price, cost_basis, pnl, weight, tax_status all non-empty
- Symbol scoring contexts: position enriched with P&L/weight
- Overview markdown: Portfolio Snapshot and Risk Snapshot sections present
- Persistence: inputs.json, context.json, summary.json all have _v, portfolio_context
- Artifact dirs: holdings/, context/ (deprecated) created
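The context assertions above reduce to simple invariants. A hypothetical sketch of their shape (field names are stand-ins; the real checks live in scripts/system-verification.py):

```python
def check_portfolio_context(ctx: dict, tol: float = 0.5) -> None:
    """ctx is a hypothetical dict form of PortfolioContext."""
    assert ctx["total_value"] > 0
    assert ctx["cost_basis"] > 0
    weights = [pos["weight"] for pos in ctx["positions"]]
    # float noise from per-position rounding means ~100%, not exactly 100%
    assert abs(sum(weights) - 100.0) <= tol, f"weights sum to {sum(weights):.2f}%"
```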
For a live data dry run (hits real APIs):
```bash
openfin review gather --depth full --dry-run
# or the alias:
openfin review weekly --dry-run
```
For a full persisted run with artifact inspection:
```bash
openfin review gather --depth full --force
ls artifacts/reports/weekly/$(date +%Y-%m-%d)/*/
```
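A quick way to confirm the persisted artifacts carry the versioned fields, assuming the directory layout above and the `_v`/`portfolio_context` keys listed in the checks:

```bash
for f in artifacts/reports/weekly/$(date +%Y-%m-%d)/*/summary.json; do
  [ -f "$f" ] || continue  # skip if no run exists for today
  uv run python -c '
import json, sys
doc = json.load(open(sys.argv[1]))
assert "_v" in doc and "portfolio_context" in doc
' "$f" && echo "OK: $f"
done
```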
Pass criteria: zero failures in automated script. Live dry run shows real data in all sections.
## Phase 4b — Review Light Depth (Dry Run) [automated]

Automated: `uv run python scripts/system-verification.py --phase 4b`

Uses the unified `ReviewOrchestrator` with `depth="light"`. Checks (mocked app, no live APIs):
- Returns WeeklyReport (same type as full depth)
- Portfolio context populated
- Thesis packets present
- Overview markdown has Portfolio Snapshot
- Search results empty (not gathered in light depth)
For a live data dry run:
```bash
openfin review gather --depth light --dry-run
# or the alias:
openfin review daily --dry-run
```
Pass criteria: zero failures in automated script. Live dry run has all sections populated.
## Phase 5 — Payload Versioning [automated] + DB Integrity [manual]

Automated: `uv run python scripts/system-verification.py --phase 5`
Checks (pure logic, no DB):
- `stamp()` injects `_v` matching `CURRENT_VERSIONS`
- `upgrade()` strips `_v`, preserves original fields
- Pre-versioning payloads (no `_v`) upgrade gracefully
- Review artifact upgrades: v0→v3 adds `portfolio_context`, `prior_scores_context`, `analysis_signal_context`, `social_signal_context`
- Upgraded payloads validate against Pydantic models (`ReviewInputs`, `ReviewContext`, `ReviewSummary`)
- Stamp→upgrade roundtrip is identity (see the sketch after this list)
- Every data type with version >1 has an upgrade chain from v0
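The stamp→upgrade roundtrip is the key invariant. A minimal sketch with hypothetical signatures (the real functions live in the versioning module and also run per-version migration steps):

```python
CURRENT_VERSIONS = {"thesis_snapshot": 1}  # illustrative only

def stamp(payload: dict, data_type: str) -> dict:
    """Inject the current schema version under _v."""
    return {**payload, "_v": CURRENT_VERSIONS[data_type]}

def upgrade(payload: dict, data_type: str) -> dict:
    """Strip _v; the real code would also apply v(n)->v(n+1) upgrade steps."""
    out = dict(payload)
    out.pop("_v", None)  # pre-versioning payloads (no _v) pass through too
    return out

original = {"slug": "example", "score": 1.0}
assert upgrade(stamp(original, "thesis_snapshot"), "thesis_snapshot") == original
```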
Manual — DB schema and data integrity:
```bash
uv run alembic current   # Should show latest revision
uv run alembic check     # Should report no pending migrations
```
Spot-check thesis snapshots:
```bash
sqlite3 artifacts/openfin.db "SELECT slug, COUNT(*) FROM thesis_snapshots GROUP BY slug;"
```
Verify payload versioning in DB — recent snapshots should carry `_v`:

```bash
sqlite3 artifacts/openfin.db "SELECT data_type, json_extract(payload, '$._v') AS v FROM data_snapshots ORDER BY retrieved_at DESC LIMIT 5;"
# Should show _v=1 (or the current version) for recent rows; NULL for pre-versioning rows is expected
```
Pass criteria: zero failures in automated script. Alembic reports head revision, recent data_snapshots have _v.
## Phase 6 — API [manual — if applicable]
Only run if API/frontend changes were made.
```bash
uv run pytest tests/test_api.py tests/test_api_routes.py tests/test_api_schemas.py --tb=short
```
Verify OpenAPI spec is in sync with models:
```bash
uv run python -c "from openfin.api.main import app; import json; print(json.dumps(app.openapi(), indent=2))" > /dev/null
```
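Beyond "it generates", a quick sanity signal is the route count, which can be eyeballed against expectations (this assumes `app` is a FastAPI instance, which the `app.openapi()` call above implies):

```bash
uv run python -c "from openfin.api.main import app; print(len(app.openapi()['paths']), 'paths')"
```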
Pass criteria: API tests pass, OpenAPI spec generates without errors.
## Quick Smoke Test (Minimum)
When time is limited, run the automated checks + one live API command:
```bash
uv run python scripts/system-verification.py                   # all automated phases (no APIs)
uv run pytest tests/test_thesis.py tests/test_watchlist.py -q  # core unit tests
openfin analysis stock AAPL                                    # one live API check
```
If all three pass, the core system is healthy.
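The three commands chain cleanly so a failure stops the smoke test early:

```bash
uv run python scripts/system-verification.py \
  && uv run pytest tests/test_thesis.py tests/test_watchlist.py -q \
  && openfin analysis stock AAPL \
  && echo "core system healthy"
```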
For a full live-data check, also run:
```bash
openfin review gather --depth full --dry-run   # Check: Portfolio Snapshot, Risk Snapshot, theses/holdings packets
openfin review gather --depth light --dry-run  # Check: light depth produces the same artifact structure
openfin thesis status <any-slug>               # Check: health + time pressure + hypotheses
openfin social signals --symbols NVDA          # Check: signal counts + channel table (if Cloudflare D1 configured)
```