Backtesting Engine

The backtesting engine (arb_bot/backtest/) answers one question: "If this strategy and these exit parameters had been live over a historical window, what would have happened?" It powers exit-parameter tuning, full strategy replay, and parallel grid search — exposed through a CLI, async API jobs, and the /backtest dashboard tab.

Philosophy — replay the live code, never re-implement it

A backtest is only trustworthy if it exercises the same code that trades live. This engine never re-writes scanner or exit logic. Instead it drives the real DoubleCalendarScanner, StretchedDoubleCalendarScanner, SkewedDCSScanner, DirectionalDiagonalScanner, IronCondorScanner, and MasterDCScanner classes, and the real evaluate_dc_exit() / evaluate_dcs_exit() / evaluate_ddc_exit() / evaluate_ic_exit() helpers from execution/position_monitor.py.

Two small shims make that possible without touching a line of live strategy code:

| Shim | What it does |
| --- | --- |
| MockDhanClient | Returns historical option chains and LTPs in the exact DhanHQ response shape scanners expect, so scanners run unmodified against the past. Security IDs are deterministic (hash of symbol/strike/expiry/type) so they are stable across processes. |
| patch_simulation_environment() | For each simulated cycle it patches now_ist() to the historical instant and temporarily patches the validated Config overrides, then restores everything on exit. Scanners and exit helpers read Config directly, so a swept parameter changes a real live decision. |
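The stability requirement on security IDs matters because Python's built-in hash() for strings is salted per process unless PYTHONHASHSEED is fixed. A minimal sketch of a process-stable ID, assuming a hypothetical helper name and an arbitrary 32-bit width (neither is taken from the engine), could look like this:

```python
import hashlib


def deterministic_security_id(symbol: str, strike: float, expiry: str, option_type: str) -> int:
    """Hypothetical helper: derive a stable numeric ID from the contract fields."""
    # Canonicalise the fields so 24500 and 24500.0 produce the same key.
    key = f"{symbol.upper()}|{float(strike):.2f}|{expiry}|{option_type.upper()}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep 4 bytes: a non-negative int below 2**32 (the width is an assumption).
    return int.from_bytes(digest[:4], "big")
```

Because the digest depends only on the input bytes, a worker process and the parent compute the same ID for the same contract.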
Correctness guarantees (each covered by a test)
  • Historical get_expiries() resolves against the simulated date, not today.
  • Config patches change live exit decisions and are fully restored after every cycle.
  • Scanner signals are converted into the same persisted trade dicts live execution builds — raw signals are never fed to exit helpers.
  • P&L comes from the live dc_pnl()/dcs_pnl()/ddc_pnl()/ic_pnl() helpers using current historical leg prices. Moving spot without moving option prices does not change P&L.
  • Calendar position state is updated before exit evaluation, mirroring the live monitor-before-scan cycle.
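The patch-and-restore guarantee can be illustrated with a small context manager. This is a sketch under stated assumptions, not the engine's patch_simulation_environment(): Config and timeutil below are stand-ins, and patched_cycle is a hypothetical name.

```python
import types
from contextlib import contextmanager
from datetime import datetime
from unittest import mock


class Config:
    """Stand-in for the real validated Config class."""
    DC_PROFIT_TARGET_PCT = 0.32


# Stand-in for the module that owns the live now_ist() helper.
timeutil = types.SimpleNamespace(now_ist=datetime.now)


@contextmanager
def patched_cycle(simulated_now: datetime, overrides: dict):
    """Freeze the clock and apply Config overrides for one simulated cycle."""
    unknown = sorted(k for k in overrides if not hasattr(Config, k))
    if unknown:
        raise ValueError(f"unknown Config keys: {unknown}")
    saved = {k: getattr(Config, k) for k in overrides}
    with mock.patch.object(timeutil, "now_ist", return_value=simulated_now):
        try:
            for key, value in overrides.items():
                setattr(Config, key, value)
            yield
        finally:
            # Restore even if the cycle raised, so nothing leaks into the next one.
            for key, value in saved.items():
                setattr(Config, key, value)
```

Inside the with block, code that reads Config.DC_PROFIT_TARGET_PCT or calls timeutil.now_ist() sees the simulated values; after the block both are back to their originals.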

The idea — how one historical day is simulated

The engine walks every trading day in [start, end] and, on each day, runs the live cycle order:

1. patch time + config
   └── snapshot-backed day → 09:20 IST · bhavcopy-only day → 15:30 IST · Config overrides applied for this cycle only
2. monitor open trades
   └── reprice every leg from the data source · update_calendar_position_state() · live exit helper · close on decision
3. scan for a new entry
   └── the same scanner instance runs against surviving open trades · a signal becomes a simulated trade
4. end of range
   └── any trade still open after the last replayed day is force-closed with reason END_OF_RANGE
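The cycle order above can be sketched as a small driver loop. Everything here is illustrative: replay, replay_instant, and the monitor/scan callables are hypothetical names, trades are plain dicts, and the days iterable is assumed to contain trading days only.

```python
from datetime import date, datetime, time

MORNING = time(9, 20)  # snapshot-backed days
CLOSE = time(15, 30)   # bhavcopy-only days


def replay_instant(day: date, has_snapshot: bool) -> datetime:
    """Pick the simulated clock time for one day (naive datetime, read as IST)."""
    return datetime.combine(day, MORNING if has_snapshot else CLOSE)


def replay(days, has_snapshot, monitor, scan):
    """Monitor open trades first, then scan, then force-close leftovers."""
    open_trades, closed = [], []
    for day in days:
        now = replay_instant(day, has_snapshot(day))
        # Step 2: exits are evaluated before any new entry is considered.
        still_open = []
        for trade in open_trades:
            reason = monitor(trade, now)
            if reason:
                closed.append({**trade, "exit_reason": reason, "exit_time": now})
            else:
                still_open.append(trade)
        open_trades = still_open
        # Step 3: the scanner sees only trades that survived monitoring.
        signal = scan(now, open_trades)
        if signal:
            open_trades.append({**signal, "entry_time": now})
    # Step 4: nothing stays open past the last replayed day.
    closed += [{**t, "exit_reason": "END_OF_RANGE"} for t in open_trades]
    return closed
```

In the real engine the monitor and scan steps would be the live exit helpers and scanner classes, run inside the patched time and Config environment.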

Exit reasons returned by the live helpers as human strings are normalized to stable enum-like codes for reporting: PROFIT_TARGET, PNL_STOP, TENT_BREAK, TIME_STOP, IV_STOP, SHORT_STRIKE_BREACH, TRAILING_STOP, plus END_OF_RANGE.
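One way to do this normalization is an ordered list of keyword rules. The codes below are the ones listed above, but the keywords are illustrative guesses: the actual strings returned by the live helpers are not shown here, and normalize_exit_reason is a hypothetical name.

```python
import re

# Order matters: "trailing" is checked before "profit" so that a message such as
# "trailing stop after profit peak" is not misread as a profit target.
_RULES = [
    (re.compile(r"\btrailing\b", re.I), "TRAILING_STOP"),
    (re.compile(r"\bprofit\b", re.I), "PROFIT_TARGET"),
    (re.compile(r"\btent\b", re.I), "TENT_BREAK"),
    (re.compile(r"\bshort strike\b", re.I), "SHORT_STRIKE_BREACH"),
    (re.compile(r"\biv\b", re.I), "IV_STOP"),
    (re.compile(r"\b(time|dte|expiry)\b", re.I), "TIME_STOP"),
    (re.compile(r"\b(loss|pnl)\b", re.I), "PNL_STOP"),
]


def normalize_exit_reason(text: str) -> str:
    """Map a human-readable exit message to a stable reporting code."""
    for pattern, code in _RULES:
        if pattern.search(text):
            return code
    # Failing loudly surfaces new live messages instead of silently mislabeling them.
    raise ValueError(f"unrecognized exit reason: {text!r}")
```

END_OF_RANGE needs no rule in this sketch because the engine assigns it directly rather than receiving it from a live helper.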

Data fidelity ladder

Leg prices come from a CompositeSource that tries each source in priority order and falls back cleanly — the actual source used is recorded on every OptionPrice.

| # | Source | Resolution | Provides |
| --- | --- | --- | --- |
| 1 | SQLiteSnapshotSource (iv_snapshots) | ~5 min, recent dates | Spot, ATM IV, and near-expiry smile LTPs (highest fidelity). |
| 2 | NSEBhavCopySource (bhavcopy_ohlc) | End-of-day, months/years | NIFTY/BANKNIFTY option closes, opening spot estimate, closing spot/IV estimate, and an open-to-close minimum realized range from the NSE UDiFF F&O common bhavcopy. |
| 3 | option_pricer (Black-Scholes) | Synthetic | Last-resort fill estimate from spot + ATM IV when no recorded price exists. |
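The priority-order fallback can be sketched as follows. This is a simplified illustration, not the engine's CompositeSource: the lookup callables, the leg_key string, and the two-field OptionPrice are assumptions made for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class OptionPrice:
    ltp: float
    source: str  # which rung of the ladder produced the price


# A lookup returns a price, or None when that source has no record for the leg.
Lookup = Callable[[str], Optional[float]]


class CompositeSource:
    """Try each (name, lookup) pair in priority order and record the winner."""

    def __init__(self, sources: Sequence[Tuple[str, Lookup]]):
        self._sources = list(sources)

    def price(self, leg_key: str) -> OptionPrice:
        for name, lookup in self._sources:
            ltp = lookup(leg_key)
            if ltp is not None:
                return OptionPrice(ltp=ltp, source=name)
        raise LookupError(f"no source could price {leg_key}")
```

Recording the source name on every price makes it possible to audit afterwards how much of a backtest rested on recorded quotes versus synthetic fills.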

Dhan live quotes are intentionally excluded — the broker API does not serve historical option-chain quotes. Far-expiry and out-of-smile legs that the IV snapshot cannot cover fall through to bhavcopy or Black-Scholes rather than being faked. Bhavcopy has no intraday quote timestamps, so it is never exposed to a morning cycle: bhavcopy-only days replay at 15:30 IST using closing observations. This prevents same-day EOD prices from leaking into a simulated 09:20 entry.
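For reference, the last-resort rung is the standard Black-Scholes formula for European options. The sketch below is a textbook version with an assumed signature (bs_price, with CE/PE type codes and a zero default rate); the engine's option_pricer may differ in inputs and conventions.

```python
from math import erf, exp, log, sqrt


def _norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_price(spot: float, strike: float, years: float, iv: float,
             option_type: str, rate: float = 0.0) -> float:
    """Black-Scholes price of a European call ("CE") or put ("PE")."""
    if years <= 0 or iv <= 0:
        # At or past expiry only intrinsic value remains.
        intrinsic = spot - strike if option_type == "CE" else strike - spot
        return max(intrinsic, 0.0)
    d1 = (log(spot / strike) + (rate + 0.5 * iv * iv) * years) / (iv * sqrt(years))
    d2 = d1 - iv * sqrt(years)
    if option_type == "CE":
        return spot * _norm_cdf(d1) - strike * exp(-rate * years) * _norm_cdf(d2)
    return strike * exp(-rate * years) * _norm_cdf(-d2) - spot * _norm_cdf(-d1)
```

Pricing every strike off a single ATM IV ignores the volatility smile, which is one reason this rung sits below recorded prices in the ladder.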

Directional and MDC replay inputs

DDC receives the historical current-day opening spot when the source provides it; yesterday's close is never substituted as today's open. MDC receives the same day-open plus the source's historical realized range so its DCS, DCS_SKEW, DDC, and IC regime branches are not forced into the zero-range DC branch. Live safety flags DDC_DISABLED and DCS_SKEW_DISABLED remain unchanged globally and are overridden only inside an isolated DDC, DCS_SKEW, or MDC backtest worker.

Using it — CLI

Download historical bhavcopy data, run a single backtest, or sweep a parameter grid:

# 1. Inspect one archive header, then bulk-download a range
python -m arb_bot.backtest download --date 2026-06-03 --inspect-header
python -m arb_bot.backtest download --start 2026-01-01 --end today
python -m arb_bot.backtest download --status

# 2. Single run with a pinned exit parameter
python -m arb_bot.backtest run \
  --strategy DC --start 2026-05-18 --end 2026-06-07 \
  --param DC_PROFIT_TARGET_PCT=0.28

# 3. Grid search — defaults, or mix swept (start:end:step) and pinned params
python -m arb_bot.backtest grid-search --strategy MDC --start 2026-05-18 --end 2026-06-07 --use-defaults
python -m arb_bot.backtest grid-search --strategy DC --start 2026-05-18 --end 2026-06-07 \
  --param DC_STOP_TENT_WIDTH=1.10:1.50:0.10 \
  --param DC_PROFIT_TARGET_PCT=0.32 \
  --rank-by sharpe
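To make the sweep syntax concrete, here is a sketch of how start:end:step specs could expand into grid combinations. It assumes numeric values and an inclusive end point, and parse_param and expand_grid are hypothetical names rather than the CLI's own functions. Decimal is used so that stepping by 0.10 does not accumulate binary floating-point drift.

```python
from decimal import Decimal
from itertools import product


def parse_param(spec: str):
    """Parse 'KEY=value' (pinned) or 'KEY=start:end:step' (swept, end inclusive)."""
    key, _, raw = spec.partition("=")
    parts = raw.split(":")
    if len(parts) == 1:
        return key, [float(raw)]
    if len(parts) != 3:
        raise ValueError(f"expected value or start:end:step, got {raw!r}")
    start, end, step = (Decimal(p) for p in parts)
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    while start <= end:
        values.append(float(start))
        start += step
    return key, values


def expand_grid(specs):
    """Cartesian product of every parameter's value list, one dict per combination."""
    keys, value_lists = zip(*(parse_param(s) for s in specs))
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]
```

Under these assumptions the DC example above expands to five combinations: five tent widths, each paired with the pinned profit target of 0.32.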

Using it — dashboard & API

The /backtest tab runs the same engine asynchronously. Pick Single run or Grid search, choose a strategy and date range, add parameter rows (scalar for single runs, comma-separated lists for grids), and submit. The page polls every two seconds and renders metrics cards, a cumulative P&L curve, and a trade table; in grid mode, clicking a ranked row loads that combination's trades.

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/backtest/run | Start a single run; returns run_id. |
| GET | /api/backtest/runs/{run_id} | Trades + metrics; poll until status: complete. |
| POST | /api/backtest/grid-search | Start a grid search; returns job_id. |
| GET | /api/backtest/grid-search/{job_id} | Ranked results; poll until complete. |
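A client can follow the same poll-until-complete pattern the dashboard uses. The sketch below keeps the HTTP call out of scope by accepting a fetch callable; poll_until_done is a hypothetical name, and treating "error" as a terminal status value is an assumption, since only "complete" is documented above.

```python
import time
from typing import Callable


def poll_until_done(fetch: Callable[[], dict], interval_s: float = 2.0,
                    timeout_s: float = 600.0, sleep=time.sleep) -> dict:
    """Call fetch() until the payload reports a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        payload = fetch()
        # "error" as a terminal status is an assumption in this sketch.
        if payload.get("status") in ("complete", "error"):
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError("backtest job did not finish in time")
        sleep(interval_s)
```

Here fetch would wrap a GET on /api/backtest/runs/{run_id} or /api/backtest/grid-search/{job_id} and return the decoded JSON body.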

Worker-process isolation

Every run and every grid combination executes in its own ProcessPoolExecutor worker. Because the engine temporarily mutates Config class attributes, process isolation guarantees one job's overrides can never leak into another job or into the live bot. Workers receive the database db_path (a string), never a live source or SQLite connection. An unknown parameter key raises ValueError before any simulation starts.

Workers are started with the spawn multiprocessing context, never fork: forking the threaded uvicorn dashboard would inherit held locks (e.g. the logging lock) and the first log call inside a worker would deadlock, leaving the job stuck in running. A result timeout turns a genuinely stuck worker into an error rather than an indefinite hang.
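The isolation pattern can be sketched with the standard library alone. The names run_isolated, simulate, validate_overrides, and ALLOWED_KEYS are illustrative, the allowed-key set is a made-up subset, and the default timeout is an arbitrary assumption.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

ALLOWED_KEYS = {"DC_PROFIT_TARGET_PCT", "DC_STOP_TENT_WIDTH"}  # illustrative subset


def validate_overrides(overrides: dict) -> None:
    unknown = sorted(set(overrides) - ALLOWED_KEYS)
    if unknown:
        raise ValueError(f"unknown parameter keys: {unknown}")


def simulate(db_path: str, overrides: dict) -> dict:
    """Placeholder worker body; a real one would open its own connection from db_path."""
    return {"db_path": db_path, "overrides": overrides}


def run_isolated(db_path: str, overrides: dict, timeout_s: float = 900.0) -> dict:
    validate_overrides(overrides)  # fail before any process is started
    ctx = multiprocessing.get_context("spawn")  # never fork a threaded server
    pool = ProcessPoolExecutor(max_workers=1, mp_context=ctx)
    try:
        future = pool.submit(simulate, db_path, overrides)
        return {"status": "complete", "result": future.result(timeout=timeout_s)}
    except FutureTimeout:
        return {"status": "error", "error": f"worker exceeded {timeout_s}s"}
    finally:
        # Do not block on a stuck worker; reaping that process is left out of this sketch.
        pool.shutdown(wait=False, cancel_futures=True)
```

Under spawn, the worker function must be importable at module level and only picklable arguments such as the db_path string can cross the process boundary, which is consistent with passing a path rather than a connection. Script entry points that call run_isolated also need an if __name__ == "__main__": guard.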

Reading the results

Results combine run-level summary metrics with per-trade fields; each completed BacktestTrade carries full per-leg detail:

| Field | Meaning |
| --- | --- |
| net_pnl | Total P&L after entry + close transaction costs (default ranking metric). |
| win_rate / profit_factor | Wins ÷ trades; gross wins ÷ gross losses. |
| max_drawdown | Largest peak-to-trough drop on the cumulative equity curve. |
| sharpe | Mean ÷ std-dev of trade P&Ls; omitted when fewer than 5 trades. |
| mfe / mae | Max favorable / adverse excursion per trade: how far it ran for and against you. |
| low_sample | Flag set when a combination has < 3 trades; such rows are excluded from grid ranking. |
| entry_spot | NIFTY spot at entry. For IC, the ATM strike is used as a proxy when a snapshot is unavailable. |
| entry_iv | ATM IV at entry: near_iv for DC/DCS/DCS_SKEW, atm_iv for IC/DDC. |
| legs | Per-leg list: [{label, strike, expiry, option_type, side, entry_price, exit_price}]. IC → 4 legs; DC → 4; DCS/DCS_SKEW → 8 (4 ATM + 4 OTM); DDC → 2 (short near, long far). exit_price is the LTP on the day the trade was closed. |
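The summary metrics can be reproduced from a list of per-trade net P&Ls. This is a sketch of the definitions in the table, not the engine's code: summarize is a hypothetical name, the equity curve is assumed to start at zero, and the sample standard deviation is an assumption because the table does not say which estimator is used.

```python
from statistics import mean, stdev


def summarize(pnls):
    """Compute run-level metrics from per-trade net P&Ls in chronological order."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    equity = peak = max_dd = 0.0
    for p in pnls:
        equity += p
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)  # largest peak-to-trough drop so far
    metrics = {
        "net_pnl": sum(pnls),
        "win_rate": len(wins) / len(pnls) if pnls else 0.0,
        # Undefined without losing trades, so report None instead of dividing by zero.
        "profit_factor": sum(wins) / abs(sum(losses)) if losses else None,
        "max_drawdown": max_dd,
        "low_sample": len(pnls) < 3,
    }
    # Sharpe is omitted below 5 trades (and when all P&Ls are identical).
    if len(pnls) >= 5 and stdev(pnls) > 0:
        metrics["sharpe"] = mean(pnls) / stdev(pnls)
    return metrics
```

For P&Ls of 100, -50, 200, -100, 50 the equity curve is 100, 50, 250, 150, 200, so the maximum drawdown is 100 (from the 250 peak to 150) and the profit factor is 350 ÷ 150.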

Guard against overfitting

A grid combination that wins on two trades is noise, not signal. Combinations below three trades are flagged LOW_SAMPLE and ranked last, and a whole grid producing very few total trades is a sign the window or filters are too tight. Prefer parameters that are good and well-sampled, and confirm out-of-sample before changing a live *_DRY_RUN flag.
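Ranking low-sample combinations last can be expressed as a compound sort key. The sketch assumes each grid row is a dict with a low_sample flag and the chosen metric, and rank_grid is a hypothetical name; rows missing the metric (for example sharpe on a short run) sort after rows that have it.

```python
def rank_grid(rows, rank_by="net_pnl"):
    """Sort well-sampled rows by the metric, best first; low-sample rows go last."""
    def sort_key(row):
        value = row.get(rank_by)
        # Tuples sort left to right: well-sampled first, then rows that have
        # the metric, then the metric itself from highest to lowest.
        return (row["low_sample"], value is None, -(value or 0.0))
    return sorted(rows, key=sort_key)
```

With this key, a two-trade combination showing the highest net P&L still lands below every well-sampled row, which matches the intent of the guard.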

Scope

This MVP is historical EOD/5-min replay. Bhavcopy realized range is only the estimated opening-to-closing spot move, not the true intraday high-low range. Out of scope (future work): walk-forward validation, intraday tick-level simulation, BOX-spread backtesting (already covered by backtest_recovery.py), and live paper trading through the engine.