Backtesting Engine
The backtesting engine (arb_bot/backtest/) answers one question:
"If this strategy and these exit parameters had been live over a historical window,
what would have happened?" It powers exit-parameter tuning, full strategy replay,
and parallel grid search — exposed through a CLI, async API jobs, and the
/backtest dashboard tab.
Philosophy — replay the live code, never re-implement it
A backtest is only trustworthy if it exercises the same code that trades live.
This engine never reimplements scanner or exit logic. Instead, it drives the real
DoubleCalendarScanner, StretchedDoubleCalendarScanner,
SkewedDCSScanner, DirectionalDiagonalScanner,
IronCondorScanner, and MasterDCScanner classes, and the real
evaluate_dc_exit() / evaluate_dcs_exit() / evaluate_ddc_exit() /
evaluate_ic_exit() helpers from execution/position_monitor.py.
Two small shims make that possible without touching a line of live strategy code:
| Shim | What it does |
|---|---|
| MockDhanClient | Returns historical option chains and LTPs in the exact DhanHQ response shape scanners expect, so scanners run unmodified against the past. Security IDs are deterministic (hash of symbol/strike/expiry/type) so they are stable across processes. |
| patch_simulation_environment() | For each simulated cycle, it patches now_ist() to return the historical instant and temporarily applies the validated Config overrides, then restores everything on exit. Scanners and exit helpers read Config directly, so a swept parameter changes a real live decision. |
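Python's built-in hash() is salted per process for strings (see PYTHONHASHSEED), so an ID that must match across worker processes needs a fixed digest instead. The sketch below shows one way to derive such an ID; the key layout, SHA-256, and 63-bit truncation are assumptions for illustration, and security_id is a hypothetical name, not MockDhanClient's actual scheme.

```python
import hashlib
from datetime import date


def security_id(symbol: str, strike: float, expiry: date, option_type: str) -> int:
    """Hypothetical stable ID: identical inputs give the same ID in every process."""
    # Canonical key; fixed strike precision so 24000 and 24000.0 map to the same ID.
    key = f"{symbol.upper()}|{strike:.2f}|{expiry.isoformat()}|{option_type.upper()}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Take the first 8 bytes and clear the top bit to get a positive 63-bit integer.
    return int.from_bytes(digest[:8], "big") & 0x7FFF_FFFF_FFFF_FFFF
```

Unlike hash(), the result does not depend on interpreter start-up state, so a spawned worker and the parent agree on every ID.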
- get_expiries() resolves historical expiries against the simulated date, not today.
- Config patches change live exit decisions and are fully restored after every cycle.
- Scanner signals are converted into the same persisted trade dicts that live execution builds; raw signals are never fed to exit helpers.
- P&L comes from the live dc_pnl() / dcs_pnl() / ddc_pnl() / ic_pnl() helpers, using the historical leg prices of the current simulated cycle. Moving spot without moving option prices does not change P&L.
- Calendar position state is updated before exit evaluation, mirroring the live monitor-before-scan cycle.
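The patch-and-restore guarantee can be pictured as a context manager. This is a minimal sketch, assuming a stand-in Config class and a namespace that holds now_ist(); the attribute names and the simulated_cycle helper are hypothetical, not the real patch_simulation_environment(). The property that matters is that restoration happens in a finally block, so overrides are undone even when a cycle raises.

```python
import contextlib
from datetime import datetime
from types import SimpleNamespace


class Config:
    """Stand-in for the live Config class (attribute names are illustrative)."""
    DC_PROFIT_TARGET_PCT = 0.30
    DC_STOP_TENT_WIDTH = 1.20


# Stand-in for the module that defines now_ist(); its real location is not shown here.
clock = SimpleNamespace(now_ist=datetime.now)


@contextlib.contextmanager
def simulated_cycle(instant, overrides, config=Config, clock_ns=clock):
    """Hypothetical helper: freeze now_ist() at `instant`, apply overrides, always restore."""
    unknown = [key for key in overrides if not hasattr(config, key)]
    if unknown:
        raise ValueError(f"unknown parameter key(s): {unknown}")
    saved = {key: getattr(config, key) for key in overrides}
    saved_now = clock_ns.now_ist
    try:
        clock_ns.now_ist = lambda: instant
        for key, value in overrides.items():
            setattr(config, key, value)
        yield
    finally:
        # Runs on normal exit and on exceptions, so no override outlives its cycle.
        clock_ns.now_ist = saved_now
        for key, value in saved.items():
            setattr(config, key, value)
```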
The idea — how one historical day is simulated
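The engine walks every trading day in [start, end] and, on each day, runs the same cycle order as the live bot: open positions are updated and evaluated for exits before scanners look for new entries.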
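A rough sketch of the day walk with the monitor-before-scan ordering follows. The weekday-minus-holidays calendar, the default 09:20 cycle, and the monitor/scan callables are hypothetical placeholders for the engine's own components.

```python
from datetime import date, datetime, time, timedelta


def trading_days(start: date, end: date, holidays=()) -> list:
    """Weekdays in [start, end] minus the given holidays (a simplified calendar)."""
    skip = set(holidays)
    days, day = [], start
    while day <= end:
        if day.weekday() < 5 and day not in skip:
            days.append(day)
        day += timedelta(days=1)
    return days


def replay(start, end, monitor, scan, cycle_times=(time(9, 20),), holidays=()):
    """Drive every simulated cycle in live order: monitor first, then scan."""
    for day in trading_days(start, end, holidays):
        for cycle_time in cycle_times:
            instant = datetime.combine(day, cycle_time)
            monitor(instant)  # refresh open positions and evaluate exits first
            scan(instant)     # only then look for new entries
```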
Exit reasons, which the live helpers return as human-readable strings, are normalized to stable enum-like codes for
reporting: PROFIT_TARGET, PNL_STOP, TENT_BREAK, TIME_STOP,
IV_STOP, SHORT_STRIKE_BREACH, TRAILING_STOP, plus END_OF_RANGE.
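Normalization can be as simple as an ordered keyword match. The patterns below are illustrative guesses, since the live helpers' exact strings are not documented here; order matters because "trailing stop" also contains "stop". END_OF_RANGE is presumably assigned by the engine itself to positions still open when the window ends. Raising on an unrecognized reason, rather than silently bucketing it, is a design choice of this sketch so that a newly added live exit reason surfaces immediately.

```python
import re

END_OF_RANGE = "END_OF_RANGE"  # presumably assigned by the engine, not by live helpers

# Ordered (code, pattern) pairs; first match wins. Patterns are illustrative guesses.
_EXIT_PATTERNS = [
    ("TRAILING_STOP", r"\btrailing\b"),
    ("PROFIT_TARGET", r"\bprofit\s+target\b"),
    ("TENT_BREAK", r"\btent\b"),
    ("SHORT_STRIKE_BREACH", r"\bshort\s+strike\b"),
    ("IV_STOP", r"\biv\b"),
    ("TIME_STOP", r"\btime\s+stop\b"),
    ("PNL_STOP", r"\bp&?n?l\s+stop\b|\bstop\s*loss\b"),
]


def normalize_exit_reason(reason: str) -> str:
    """Map a human-readable exit string to a stable reporting code."""
    text = reason.lower()
    for code, pattern in _EXIT_PATTERNS:
        if re.search(pattern, text):
            return code
    raise ValueError(f"unrecognized exit reason: {reason!r}")
```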
Data fidelity ladder
Leg prices come from a CompositeSource that tries each source in priority order and falls back
cleanly — the actual source used is recorded on every OptionPrice.
| # | Source | Resolution | Provides |
|---|---|---|---|
| 1 | SQLiteSnapshotSource (iv_snapshots) | ~5 min, recent dates | Spot, ATM IV, and near-expiry smile LTPs (highest fidelity). |
| 2 | NSEBhavCopySource (bhavcopy_ohlc) | End-of-day, months/years | NIFTY/BANKNIFTY option closes, opening spot estimate, closing spot/IV estimate, and an open-to-close minimum realized range from the NSE UDiFF F&O common bhavcopy. |
| 3 | option_pricer (Black-Scholes) | Synthetic | Last-resort fill estimate from spot + ATM IV when no recorded price exists. |
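A minimal sketch of the fallback ladder, assuming each source returns None on a miss. OptionPrice here carries only ltp and source; DictSource and the string leg keys are hypothetical stand-ins for the SQLite, bhavcopy, and Black-Scholes sources, whose real interfaces are not shown in this document.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class OptionPrice:
    ltp: float
    source: str  # which rung of the ladder produced this price


class DictSource:
    """Toy source backed by a dict; stands in for SQLite, bhavcopy, or a pricer."""

    def __init__(self, name: str, prices: dict):
        self.name = name
        self.prices = prices

    def get_price(self, leg_key: str, at: datetime) -> Optional[float]:
        return self.prices.get(leg_key)  # None means "no data here, try the next rung"


class CompositeSource:
    def __init__(self, sources):
        self.sources = list(sources)  # highest fidelity first

    def get_price(self, leg_key: str, at: datetime) -> Optional[OptionPrice]:
        for src in self.sources:
            ltp = src.get_price(leg_key, at)
            if ltp is not None:
                return OptionPrice(ltp=ltp, source=src.name)
        return None  # every rung missed
```

Recording the winning rung on every price is what lets a report show how much of a backtest rests on synthetic fills.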
Dhan live quotes are intentionally excluded — the broker API does not serve historical option-chain quotes.
Far-expiry and out-of-smile legs that the IV snapshot cannot cover fall through to bhavcopy or Black-Scholes
rather than being faked. Bhavcopy has no intraday quote timestamps, so it is never exposed to a morning cycle:
bhavcopy-only days replay at 15:30 IST using closing observations. This prevents same-day EOD prices
from leaking into a simulated 09:20 entry.
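The no-lookahead rule reduces to two small checks: which cycle times a day gets, and whether an end-of-day source may be read at a given simulated instant. In this sketch the names EOD_ONLY_SOURCES, replay_times, and source_allowed, and the "bhavcopy" source label, are hypothetical.

```python
from datetime import datetime, time

EOD_REPLAY = time(15, 30)
EOD_ONLY_SOURCES = {"bhavcopy"}  # sources with no intraday quote timestamp


def replay_times(snapshot_times):
    """Intraday snapshot times if the day has any; otherwise a single 15:30 cycle."""
    return sorted(snapshot_times) if snapshot_times else [EOD_REPLAY]


def source_allowed(source: str, sim_instant: datetime) -> bool:
    """Block end-of-day observations from any cycle before the close."""
    if source in EOD_ONLY_SOURCES:
        return sim_instant.time() >= EOD_REPLAY
    return True
```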
DDC_DISABLED and DCS_SKEW_DISABLED remain unchanged globally and are overridden only
inside an isolated DDC, DCS_SKEW, or MDC backtest worker.
Using it — CLI
Download historical bhavcopy data, run a single backtest, or sweep a parameter grid:
```bash
# 1. Inspect one archive header, then bulk-download a range
python -m arb_bot.backtest download --date 2026-06-03 --inspect-header
python -m arb_bot.backtest download --start 2026-01-01 --end today
python -m arb_bot.backtest download --status

# 2. Single run with a pinned exit parameter
python -m arb_bot.backtest run \
  --strategy DC --start 2026-05-18 --end 2026-06-07 \
  --param DC_PROFIT_TARGET_PCT=0.28

# 3. Grid search: defaults, or mix swept (start:end:step) and pinned params
python -m arb_bot.backtest grid-search --strategy MDC --start 2026-05-18 --end 2026-06-07 --use-defaults
python -m arb_bot.backtest grid-search --strategy DC --start 2026-05-18 --end 2026-06-07 \
  --param DC_STOP_TENT_WIDTH=1.10:1.50:0.10 \
  --param DC_PROFIT_TARGET_PCT=0.32 \
  --rank-by sharpe
```
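The start:end:step syntax expands into a Cartesian product of parameter combinations. This sketch assumes the end value is inclusive and uses Decimal so repeated 0.10 steps do not accumulate float error; parse_param and expand_grid are hypothetical helpers, not the CLI's actual parser.

```python
import itertools
from decimal import Decimal


def parse_param(spec: str) -> tuple[str, list[float]]:
    """Parse KEY=value (pinned) or KEY=start:end:step (swept, end assumed inclusive)."""
    key, _, raw = spec.partition("=")
    if not key or not raw:
        raise ValueError(f"expected KEY=VALUE, got {spec!r}")
    parts = raw.split(":")
    if len(parts) == 1:
        return key, [float(parts[0])]
    if len(parts) != 3:
        raise ValueError(f"expected start:end:step, got {raw!r}")
    start, end, step = (Decimal(p) for p in parts)
    if step <= 0:
        raise ValueError("step must be positive")
    values, value = [], start
    while value <= end:
        values.append(float(value))
        value += step  # exact decimal arithmetic, so 1.50 is not skipped
    return key, values


def expand_grid(specs):
    """Yield one {key: value} dict per combination of all parameter values."""
    parsed = [parse_param(s) for s in specs]
    keys = [key for key, _ in parsed]
    for combo in itertools.product(*(values for _, values in parsed)):
        yield dict(zip(keys, combo))
```

With the two --param flags from the DC grid-search command, this yields five combinations (DC_STOP_TENT_WIDTH from 1.10 to 1.50), each with DC_PROFIT_TARGET_PCT pinned at 0.32.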
Using it — dashboard & API
The /backtest tab runs the same engine asynchronously. Pick
Single run or Grid search, choose a strategy and date range, add parameter
rows (scalar for single runs, comma-separated lists for grids), and submit. The page polls every two seconds
and renders metrics cards, a cumulative P&L curve, and a trade table; in grid mode, clicking a ranked row
loads that combination's trades.
| Method | Path | Description |
|---|---|---|
| POST | /api/backtest/run | Start a single run; returns run_id. |
| GET | /api/backtest/runs/{run_id} | Trades + metrics; poll until status: complete. |
| POST | /api/backtest/grid-search | Start a grid search; returns job_id. |
| GET | /api/backtest/grid-search/{job_id} | Ranked results; poll until complete. |
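A script can follow the same poll-until-complete pattern the dashboard uses. This sketch covers only the polling half: the base URL, the "error" status value, and any response fields beyond status are assumptions, and the POST request bodies are omitted because their schema is not documented here. The fetch, sleep, and clock callables are injectable so the loop can be exercised without a server.

```python
import json
import time
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # hypothetical dashboard address


def fetch_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def poll_until_done(fetch, url, interval=2.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll `url` until status is "complete"; raise on "error" or timeout."""
    deadline = clock() + timeout
    while True:
        body = fetch(url)
        status = body.get("status")
        if status == "complete":
            return body
        if status == "error":  # assumed status name for a failed job
            raise RuntimeError(f"job failed: {body}")
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout}s: {url}")
        sleep(interval)
```

Typical use would be poll_until_done(fetch_json, f"{BASE_URL}/api/backtest/runs/{run_id}"), with run_id taken from the POST response.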
Each run or grid-search job executes in a ProcessPoolExecutor worker. Because the
engine temporarily mutates Config class attributes, process isolation guarantees that one job's overrides
can never leak into another job or into the live bot. Workers receive the database db_path (a string),
never a live source or SQLite connection. An unknown parameter key raises ValueError before any
simulation starts. Workers are started with the spawn multiprocessing context, never
fork: forking the threaded uvicorn dashboard would inherit held locks (e.g. the logging lock), and the
first log call inside a worker would deadlock, leaving the job stuck in running. A result timeout turns
a genuinely stuck worker into an error rather than an indefinite hang.
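A minimal sketch of that job launcher, assuming a placeholder worker: KNOWN_PARAMS, validate_overrides, backtest_worker, and run_job are hypothetical names, and the worker body does no real simulation. It shows the three rules from the paragraph above: validate before spawning, pass only a db_path string, and use the spawn context with a result timeout.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

KNOWN_PARAMS = {"DC_PROFIT_TARGET_PCT", "DC_STOP_TENT_WIDTH"}  # illustrative subset


def validate_overrides(overrides: dict) -> dict:
    """Reject unknown keys before any worker process is started."""
    unknown = sorted(set(overrides) - KNOWN_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameter key(s): {unknown}")
    return overrides


def backtest_worker(db_path: str, strategy: str, overrides: dict) -> dict:
    """PLACEHOLDER worker. A real one would open its own sources from db_path,
    apply overrides to this process's Config only, and run the simulation."""
    return {"status": "complete", "strategy": strategy, "db_path": db_path,
            "overrides": dict(overrides)}


def run_job(db_path: str, strategy: str, overrides: dict,
            result_timeout: float = 1800.0) -> dict:
    validate_overrides(overrides)
    # "spawn" starts a fresh interpreter, so locks held by the parent's threads
    # (e.g. the logging lock) are not inherited the way they would be with fork.
    ctx = multiprocessing.get_context("spawn")
    pool = ProcessPoolExecutor(max_workers=1, mp_context=ctx)
    timed_out = False
    try:
        future = pool.submit(backtest_worker, db_path, strategy, overrides)
        return future.result(timeout=result_timeout)
    except FutureTimeout:
        timed_out = True
        return {"status": "error", "error": f"no result within {result_timeout}s"}
    finally:
        # After a timeout, return immediately instead of joining the stuck worker.
        # shutdown(wait=False) does not kill that process; it only stops waiting.
        pool.shutdown(wait=not timed_out, cancel_futures=timed_out)
```

When calling run_job from a standalone script rather than the dashboard, guard the call with if __name__ == "__main__": because spawn re-imports the main module in each worker.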
Reading the results
Results combine run-level summary metrics with per-trade detail from each completed BacktestTrade, including full per-leg prices:
| Field | Meaning |
|---|---|
| net_pnl | Total P&L after entry + close transaction costs (default ranking metric). |
| win_rate / profit_factor | Wins ÷ trades; gross wins ÷ gross losses. |
| max_drawdown | Largest peak-to-trough drop on the cumulative equity curve. |
| sharpe | Mean ÷ std-dev of trade P&Ls; omitted when there are fewer than 5 trades. |
| mfe / mae | Max favorable / adverse excursion per trade: how far it ran for and against you. |
| low_sample | Flag set when a combination has < 3 trades; such rows are excluded from grid ranking. |
| entry_spot | NIFTY spot at entry. For IC, the ATM strike is used as a proxy when a snapshot is unavailable. |
| entry_iv | ATM IV at entry: near_iv for DC/DCS/DCS_SKEW, atm_iv for IC/DDC. |
| legs | Per-leg list: [{label, strike, expiry, option_type, side, entry_price, exit_price}]. IC → 4 legs; DC → 4; DCS/DCS_SKEW → 8 (4 ATM + 4 OTM); DDC → 2 (short near, long far). exit_price is the LTP on the day the trade was closed. |
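The summary metrics in the table can be reproduced from a list of per-trade net P&Ls. This is a sketch under stated assumptions: P&Ls are already net of costs, the equity curve starts at zero, Sharpe uses the sample standard deviation and is not annualized, and profit_factor is infinite when there are wins but no losses. Summary and summarize are hypothetical names, not the engine's actual types.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class Summary:
    net_pnl: float
    trades: int
    win_rate: float
    profit_factor: float
    max_drawdown: float
    sharpe: Optional[float]
    low_sample: bool


def summarize(pnls: list[float]) -> Summary:
    n = len(pnls)
    wins = [p for p in pnls if p > 0]
    gross_win = sum(wins)
    gross_loss = -sum(p for p in pnls if p < 0)
    if gross_loss > 0:
        profit_factor = gross_win / gross_loss
    else:
        profit_factor = math.inf if gross_win > 0 else 0.0
    # Walk the cumulative equity curve, tracking the largest drop from a running peak.
    equity = peak = max_dd = 0.0
    for p in pnls:
        equity += p
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)
    sharpe = None
    if n >= 5:  # omitted below 5 trades, as in the table
        sd = statistics.stdev(pnls)
        sharpe = statistics.mean(pnls) / sd if sd > 0 else None
    return Summary(sum(pnls), n, len(wins) / n if n else 0.0,
                   profit_factor, max_dd, sharpe, n < 3)
```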
Combinations with fewer than 3 trades are flagged LOW_SAMPLE and ranked last, and a whole grid producing very few total trades is a sign that the window
or filters are too tight. Prefer parameters that are both good and well-sampled, and confirm them out-of-sample
before changing a live *_DRY_RUN flag.
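Ranking with low-sample rows pushed to the bottom is a single sort with a tuple key. In this sketch, rank_combinations and the row dict layout are hypothetical; rows missing the chosen metric (for example sharpe below 5 trades) sort after every row that has it, and higher_is_better=False covers metrics such as max_drawdown.

```python
def rank_combinations(rows, metric="net_pnl", higher_is_better=True):
    """Sort grid rows best-first by `metric`; LOW_SAMPLE rows always go last."""
    sign = -1.0 if higher_is_better else 1.0

    def key(row):
        value = row.get(metric)
        return (
            bool(row.get("low_sample")),  # False sorts before True
            value is None,                # rows without the metric after rows with it
            sign * value if value is not None else 0.0,
        )

    return sorted(rows, key=key)
```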
Scope
This MVP replays historical end-of-day and 5-minute data. The bhavcopy realized range is only the estimated opening-to-closing spot
move, not the true intraday high-low range. Out of scope (future work): walk-forward validation, intraday
tick-level simulation, BOX-spread backtesting (already covered by backtest_recovery.py), and live
paper trading through the engine.