finance datagen
Standard financial data generation
Overview
finance-datagen produces synthetic financial time series for
testing, demos, and benchmarking the rest of the finance-* stack
without relying on real market data. The numerical core is implemented
in Rust and emits Apache Arrow RecordBatch values; the Python layer
wraps each generator so the public API returns polars.DataFrame
objects.
All public generator classes inherit from DataGenerator, a pydantic
base model that validates typed parameters on construction. Call
.generate() to get the output table, or next(generator) for one-shot
iterator-style use. Convenience functions such as generate_prices(...),
generate_gbm(...), and generate_signal(...) instantiate the matching model
and return the result of its .generate() call.
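The pattern described above (validated parameters, a .generate() method, iterator-style access, and a thin convenience wrapper) can be sketched roughly as follows. This is a minimal illustration only: SketchGenerator and generate_sketch are hypothetical names, a standard-library dataclass stands in for the pydantic base model, and a plain dict of columns stands in for the polars.DataFrame the real API returns.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchGenerator:
    """Hypothetical stand-in for a DataGenerator subclass (not the real class)."""

    n_rows: int = 5
    start: float = 100.0
    seed: Optional[int] = None

    def __post_init__(self) -> None:
        # Validate parameters on construction, as a pydantic model would.
        if self.n_rows <= 0:
            raise ValueError("n_rows must be positive")
        if self.start <= 0:
            raise ValueError("start must be positive")

    def generate(self) -> dict:
        # Column-oriented table; the real API returns a polars.DataFrame.
        rng = random.Random(self.seed)
        prices = [self.start]
        for _ in range(self.n_rows - 1):
            prices.append(prices[-1] * (1.0 + rng.gauss(0.0, 0.01)))
        return {"step": list(range(self.n_rows)), "price": prices}

    def __iter__(self):
        return self

    def __next__(self) -> dict:
        # Iterator-style use: next(generator) returns one generated table.
        return self.generate()


def generate_sketch(**params) -> dict:
    # Convenience wrapper: instantiate the model and return .generate().
    return SketchGenerator(**params).generate()
```

With this shape, generate_sketch(n_rows=3, seed=1) and SketchGenerator(n_rows=3, seed=1).generate() return the same table, and an invalid parameter such as n_rows=0 fails at construction time rather than during generation.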
Generators
Price models (Rust core)
| Symbol | Model | Output columns |
|---|---|---|
| | Geometric Brownian Motion (log-Euler) | |
| | Heston (1993) stochastic volatility (full-truncation Euler) | |
| | GARCH(1,1) returns | |
| | OHLCV synthesis from any close series | |
Price-path convenience wrappers are also exported as generate_prices,
generate_gbm, generate_heston, and generate_garch. generate_prices is a
plain alias for generate_gbm, intended for examples and tests that want a
model-neutral name.
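To make the "log-Euler" label for the GBM model concrete, here is a minimal pure-Python sketch of the scheme: the log-price is advanced by a drift term plus a scaled normal draw, then exponentiated. The function name gbm_log_euler and its parameters are assumptions for illustration, and it uses Python's random module rather than the ChaCha8 RNG of the Rust core, so its values will not match the library's output.

```python
import math
import random


def gbm_log_euler(s0, mu, sigma, n_steps, dt=1.0 / 252.0, seed=None):
    """Simulate one GBM path by stepping the log-price.

    log S_{t+dt} = log S_t + (mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z
    """
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma**2) * dt
    vol = sigma * math.sqrt(dt)
    log_price = math.log(s0)
    path = [s0]
    for _ in range(n_steps):
        # Z is a standard normal draw; the update is exact for GBM in log space.
        log_price += drift + vol * rng.gauss(0.0, 1.0)
        path.append(math.exp(log_price))
    return path
```

Stepping in log space keeps every simulated price strictly positive, which a naive Euler step on the price itself does not guarantee.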
Python generators
| Symbol | Output |
|---|---|
| | Long-form |
| | Wide |
| | |
| | Long-form position panel |
| | Transaction log with enum-backed side/position-effect labels and explicit costs |
| | Enum-backed order fixtures with side, order type, status, and time-in-force |
| | Enum-backed execution fixtures for simulated fills |
| | Correlated multi-asset GBM panel |
| | Markov regime-switching price path |
| | Participation-rate impact curves with temporary, permanent, and total impact in bps |
| | PCA-style factor loadings, factor returns, and specific variance |
| | Barra-style enum-backed sector/style loadings plus specific variance |
| | Symmetric positive semidefinite factor covariance matrix |
| | Positive idiosyncratic variance vector |
Every Python generator has a matching generate_* convenience wrapper,
including the legacy generate_signal, generate_factor_loadings, and
generate_benchmark functions.
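As an illustration of one entry in the table above, the Markov regime-switching price path, the following sketch alternates between a calm and a turbulent regime according to fixed stay probabilities. The function name, the two-regime setup, and all parameter values are assumptions chosen for the example; the library's actual parameterization is not shown on this page.

```python
import math
import random


def regime_switching_path(n_steps, seed=None):
    """Two-regime price path: calm (0) and turbulent (1)."""
    rng = random.Random(seed)
    # Illustrative per-step parameters: (drift, volatility) for each regime.
    params = {0: (0.0005, 0.005), 1: (-0.001, 0.02)}
    # Probability of staying in the current regime at each step.
    stay = {0: 0.98, 1: 0.90}
    regime, price = 0, 100.0
    regimes, prices = [regime], [price]
    for _ in range(n_steps):
        if rng.random() > stay[regime]:
            regime = 1 - regime  # switch to the other regime
        mu, sigma = params[regime]
        # Log-return drawn from the active regime's distribution.
        price *= math.exp(mu + sigma * rng.gauss(0.0, 1.0))
        regimes.append(regime)
        prices.append(price)
    return regimes, prices
```

Returning the regime labels alongside the prices makes it easy to check downstream regime-detection code against a known ground truth.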
All Rust generators accept an optional seed: int for bit-reproducible
output across platforms (ChaCha8 RNG); the Python generators accept a
seed that is used to initialize numpy.random.default_rng.
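The effect of seeding on the Python side can be sketched with numpy.random.default_rng directly. The helper make_factor_covariance below is hypothetical (it is not part of finance-datagen); it builds a symmetric positive semidefinite matrix from seeded normal draws to show that the same seed reproduces the same output.

```python
import numpy as np


def make_factor_covariance(n_factors, seed=None):
    """Hypothetical helper: random symmetric PSD matrix from a seeded RNG."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_factors, n_factors))
    # A @ A.T is symmetric positive semidefinite by construction.
    return a @ a.T / n_factors


first = make_factor_covariance(4, seed=0)
second = make_factor_covariance(4, seed=0)
other = make_factor_covariance(4, seed=1)

print(np.array_equal(first, second))  # same seed, same matrix
print(np.array_equal(first, other))   # different seed, different matrix
```

Passing seed=None lets NumPy draw fresh entropy from the operating system, so repeated calls then produce different matrices.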
Portfolio, transaction, order, execution, and market-model generators
also support enum-backed metadata columns where applicable, including
currency, exchange, region, instrument_type, market_type, and
venue_type. Portfolio and transaction generators can use
finance-dates.Calendar exchange calendars so generated dates and
timestamps align with actual business days and session hours.
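The idea of aligning generated dates with business days can be illustrated with a standard-library sketch. It only skips weekends; holidays and session hours, which an exchange calendar would also handle, are ignored. The function weekday_dates is a hypothetical helper and does not reflect the finance-dates API, which is not shown on this page.

```python
from datetime import date, timedelta


def weekday_dates(start, n_dates):
    """Return the first n_dates Monday-to-Friday dates on or after start.

    Holidays and session hours are ignored here; a real exchange calendar
    would exclude them.
    """
    dates = []
    current = start
    while len(dates) < n_dates:
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            dates.append(current)
        current += timedelta(days=1)
    return dates


# 2024-01-05 is a Friday, so the weekend of 6-7 January is skipped.
print(weekday_dates(date(2024, 1, 5), 3))
```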
Quick start
from finance_datagen import OrdersGenerator, generate_prices, generate_signal, ohlc_from_close
closes = generate_prices(symbol="ACME", seed=0)
bars = ohlc_from_close(closes["price"], symbol="ACME", seed=0)
signal = generate_signal(n_dates=20, n_assets=50, seed=0)
orders = OrdersGenerator(n_dates=3, n_assets=5, orders_per_day=10, exchange="XNYS", currency="USD", seed=0).generate()
See the Data page for model math, parameter ranges, and output schemas, and the API page for a complete function-level reference.
Architecture
The Rust core (rust/src/) is polars-free: every generator builds
an arrow_array::RecordBatch and returns it through the
Arrow C Data Interface PyCapsule via pyo3-arrow. The Python wrappers
call polars.from_arrow(batch) on the receiving end. This keeps the
polars-rs and polars-py codebases on opposite sides of a stable ABI
boundary, avoiding the binary-incompatibility issues that come with
linking polars from both Rust and CPython.