finance datagen

Standard financial data generation

Overview

finance-datagen produces synthetic financial time series for testing, demos, and benchmarking the rest of the finance-* stack without relying on real market data. The numerical core is implemented in Rust and emits Apache Arrow RecordBatch values; the Python layer wraps each generator so the public API returns polars.DataFrame objects.

All public generator classes inherit from DataGenerator, a pydantic base model that validates typed parameters on construction. Use .generate() for the table output, or next(generator) for one-shot iterator-style use. Convenience functions such as generate_prices(...), generate_gbm(...), and generate_signal(...) instantiate the matching model and return .generate().
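The construct, validate, generate pattern described above can be sketched with the standard library alone. This is a hypothetical stand-in, not the library's code: ToyGenerator, generate_toy, and their fields are invented for illustration, and a dataclass with __post_init__ checks replaces pydantic to keep the example dependency-free. The next() behaviour shown (each call returns one freshly generated table) is one plausible reading of the iterator-style use, not a confirmed detail.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyGenerator:
    """Hypothetical stand-in for a DataGenerator subclass (not the real API)."""

    n_rows: int = 5
    start_price: float = 100.0
    seed: Optional[int] = None

    def __post_init__(self) -> None:
        # Validate parameters on construction, as a pydantic model would.
        if self.n_rows <= 0:
            raise ValueError("n_rows must be positive")
        if self.start_price <= 0:
            raise ValueError("start_price must be positive")

    def generate(self) -> dict:
        # A plain dict of columns stands in for the polars.DataFrame output.
        rng = random.Random(self.seed)
        prices = [self.start_price]
        for _ in range(self.n_rows - 1):
            prices.append(prices[-1] * (1.0 + rng.gauss(0.0, 0.01)))
        return {"step": list(range(self.n_rows)), "price": prices}

    def __iter__(self):
        return self

    def __next__(self) -> dict:
        # Assumed semantics: next(generator) yields one generated table.
        return self.generate()


def generate_toy(**params) -> dict:
    """Convenience wrapper: instantiate the model and return .generate()."""
    return ToyGenerator(**params).generate()


table = generate_toy(n_rows=4, seed=1)
```

Because the seed is stored on the model, the wrapper and next(ToyGenerator(n_rows=4, seed=1)) return the same table in this sketch.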

Generators

Price models (Rust core)

| Symbol | Model | Output columns |
| --- | --- | --- |
| GBMGenerator | Geometric Brownian Motion (log-Euler) | timestamp, symbol, price |
| HestonGenerator | Heston (1993) stochastic volatility (full-truncation Euler) | timestamp, symbol, price, variance |
| GARCHGenerator | GARCH(1,1) returns | timestamp, symbol, price, return, sigma |
| ohlc_from_close | OHLCV synthesis from any close series | timestamp, symbol, open, high, low, close, volume |

Price-path convenience wrappers are also exported as generate_prices, generate_gbm, generate_heston, and generate_garch. generate_prices is a plain alias for generate_gbm, intended for examples and tests that want a model-neutral name.
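For readers unfamiliar with the "log-Euler" label on GBMGenerator, the scheme steps the log-price rather than the price, which keeps every simulated price strictly positive. The following is a minimal sketch under stated assumptions: gbm_log_euler and its parameter names are hypothetical, and it uses Python's standard random module rather than the Rust core's ChaCha8 RNG, so its numbers will not match the library's output.

```python
import math
import random


def gbm_log_euler(s0, mu, sigma, dt, n_steps, seed=None):
    """Simulate one GBM path by stepping the log-price (illustrative only).

    log S_{t+dt} = log S_t + (mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z
    """
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma**2) * dt   # deterministic part of each step
    vol = sigma * math.sqrt(dt)          # scale of the Gaussian shock
    log_s = math.log(s0)
    path = [s0]
    for _ in range(n_steps):
        log_s += drift + vol * rng.gauss(0.0, 1.0)
        path.append(math.exp(log_s))
    return path


# One year of daily steps with a fixed seed for repeatability.
path = gbm_log_euler(s0=100.0, mu=0.05, sigma=0.2, dt=1 / 252, n_steps=252, seed=0)
```

With sigma set to zero the path reduces to deterministic growth at rate mu, which is a convenient sanity check for any implementation of this scheme.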

Python generators

| Symbol | Output |
| --- | --- |
| SignalGenerator | Long-form [date, symbol, signal, fwd_returns] with target Pearson IC |
| FactorLoadingsGenerator | Wide [symbol, market, value, momentum, size, quality] Barra-style loadings |
| BenchmarkGenerator | [date, benchmark] Gaussian benchmark return series |
| PositionsGenerator | Long-form position panel [date, symbol, price, quantity, market_value, weight] |
| TransactionsGenerator | Transaction log with enum-backed side/position-effect labels and explicit costs |
| OrdersGenerator | Enum-backed order fixtures with side, order type, status, and time-in-force |
| ExecutionsGenerator | Enum-backed execution fixtures for simulated fills |
| MultiAssetGBMGenerator | Correlated multi-asset GBM panel [timestamp, symbol, price, return] |
| RegimeSwitchingGenerator | Markov regime-switching price path [timestamp, symbol, price, return, regime] |
| MarketImpactCurveGenerator | Participation-rate impact curves with temporary, permanent, and total impact in bps |
| StatisticalRiskModelGenerator | PCA-style factor loadings, factor returns, and specific variance |
| FundamentalRiskModelGenerator | Barra-style enum-backed sector/style loadings plus specific variance |
| FactorCovarianceGenerator | Symmetric positive semidefinite factor covariance matrix |
| SpecificVarianceGenerator | Positive idiosyncratic variance vector |

Every Python generator has a matching generate_* convenience wrapper, including the legacy generate_signal, generate_factor_loadings, and generate_benchmark functions.
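The "target Pearson IC" on SignalGenerator refers to the correlation between the signal and forward returns. One common way to hit a target correlation is to mix the signal with independent noise. The sketch below assumes that construction; it is not necessarily how SignalGenerator does it, and signal_with_target_ic, pearson, and ret_vol are hypothetical names.

```python
import math
import random


def signal_with_target_ic(n, target_ic, ret_vol=0.02, seed=None):
    """Draw (signal, fwd_returns) pairs whose population correlation is target_ic."""
    rng = random.Random(seed)
    mix = math.sqrt(1.0 - target_ic**2)  # weight on the independent noise
    signal, fwd_returns = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)    # the signal
        eps = rng.gauss(0.0, 1.0)  # noise independent of the signal
        signal.append(z)
        # Scaling by ret_vol changes the return magnitude, not the correlation.
        fwd_returns.append(ret_vol * (target_ic * z + mix * eps))
    return signal, fwd_returns


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


signal, fwd = signal_with_target_ic(n=20_000, target_ic=0.05, seed=0)
realized_ic = pearson(signal, fwd)
```

The realized IC matches the target only in expectation; on a finite sample it scatters around the target with a standard error of roughly 1/sqrt(n).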

All Rust generators accept an optional seed: int for bit-reproducible output across platforms (ChaCha8 RNG); the Python generators accept a seed that is passed to numpy.random.default_rng.

Portfolio, transaction, order, execution, and market-model generators also support enum-backed metadata columns where applicable, including currency, exchange, region, instrument_type, market_type, and venue_type. Portfolio and transaction generators can use finance-dates.Calendar exchange calendars so generated dates and timestamps align with actual business days and session hours.

Quick start

```python
from finance_datagen import OrdersGenerator, generate_prices, generate_signal, ohlc_from_close

closes = generate_prices(symbol="ACME", seed=0)
bars   = ohlc_from_close(closes["price"], symbol="ACME", seed=0)
signal = generate_signal(n_dates=20, n_assets=50, seed=0)
orders = OrdersGenerator(
    n_dates=3,
    n_assets=5,
    orders_per_day=10,
    exchange="XNYS",
    currency="USD",
    seed=0,
).generate()
```

See the Data page for model math, parameter ranges, and output schemas, and the API page for a complete function-level reference.

Architecture

The Rust core (rust/src/) is polars-free: every generator builds an arrow_array::RecordBatch and returns it through the Arrow C Data Interface PyCapsule via pyo3-arrow. The Python wrappers call polars.from_arrow(batch) on the receiving end. This keeps the polars-rs and polars-py codebases on opposite sides of a stable ABI boundary, avoiding the binary-incompatibility issues that come with linking polars from both Rust and CPython.