```{toctree}
---
caption: ""
maxdepth: 2
hidden: true
---
docs/src/DATA.md
docs/src/API.md
```

# finance datagen

Standard financial data generation

[![Build Status](https://github.com/prettygoodcapital/finance-datagen/actions/workflows/build.yaml/badge.svg?branch=main&event=push)](https://github.com/prettygoodcapital/finance-datagen/actions/workflows/build.yaml)
[![codecov](https://codecov.io/gh/prettygoodcapital/finance-datagen/branch/main/graph/badge.svg)](https://codecov.io/gh/prettygoodcapital/finance-datagen)
[![License](https://img.shields.io/github/license/prettygoodcapital/finance-datagen)](https://github.com/prettygoodcapital/finance-datagen)
[![PyPI](https://img.shields.io/pypi/v/finance-datagen.svg)](https://pypi.python.org/pypi/finance-datagen)

## Overview

`finance-datagen` produces **synthetic** financial time series for testing, demos, and benchmarking the rest of the `finance-*` stack without relying on real market data.

The numerical core is implemented in Rust and emits Apache Arrow `RecordBatch` values; the Python layer wraps each generator so the public API returns `polars.DataFrame` objects.

All public generator classes inherit from `DataGenerator`, a pydantic base model that validates typed parameters on construction. Use `.generate()` for the table output, or `next(generator)` for one-shot iterator-style use. Convenience functions such as `generate_prices(...)`, `generate_gbm(...)`, and `generate_signal(...)` instantiate the matching model and return `.generate()`.
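The construct, validate, then generate flow described above can be illustrated with the standard library alone. This is a minimal sketch, not the library's implementation: `ToyGenerator`, `ToyBenchmark`, and `generate_toy_benchmark` are hypothetical names, a `dataclass` with `__post_init__` checks stands in for pydantic validation, and a list of row dicts stands in for the `polars.DataFrame` the real API returns.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyGenerator:
    """Hypothetical stand-in for a DataGenerator-style base class."""

    seed: Optional[int] = None

    def generate(self) -> list:
        raise NotImplementedError

    def __iter__(self):
        return self

    def __next__(self) -> list:
        # Iterator-style access: next(gen) hands back a generated table.
        return self.generate()


@dataclass
class ToyBenchmark(ToyGenerator):
    """Hypothetical Gaussian benchmark-return generator ([date, benchmark])."""

    n_dates: int = 5
    mean: float = 0.0
    vol: float = 0.01

    def __post_init__(self) -> None:
        # Reject bad parameters at construction time (pydantic's job in the real library).
        if self.n_dates < 1:
            raise ValueError("n_dates must be >= 1")
        if self.vol < 0:
            raise ValueError("vol must be non-negative")

    def generate(self) -> list:
        rng = random.Random(self.seed)  # fresh seeded RNG per call
        return [
            {"date": d, "benchmark": rng.gauss(self.mean, self.vol)}
            for d in range(self.n_dates)
        ]


def generate_toy_benchmark(**kwargs) -> list:
    """Hypothetical convenience wrapper: build the model, return .generate()."""
    return ToyBenchmark(**kwargs).generate()
```

Because `generate()` reseeds its own RNG in this sketch, `next(gen)` and the convenience wrapper return identical rows for the same `seed`.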
### Generators

#### Price models (Rust core)

| Symbol            | Model                                                       | Output columns                                      |
| ----------------- | ----------------------------------------------------------- | --------------------------------------------------- |
| `GBMGenerator`    | Geometric Brownian Motion (log-Euler)                       | `timestamp, symbol, price`                          |
| `HestonGenerator` | Heston (1993) stochastic volatility (full-truncation Euler) | `timestamp, symbol, price, variance`                |
| `GARCHGenerator`  | GARCH(1,1) returns                                          | `timestamp, symbol, price, return, sigma`           |
| `ohlc_from_close` | OHLCV synthesis from any close series                       | `timestamp, symbol, open, high, low, close, volume` |

Price-path convenience wrappers are also exported as `generate_prices`, `generate_gbm`, `generate_heston`, and `generate_garch`. `generate_prices` is a plain alias for `generate_gbm` for examples and tests that want a model-neutral name.

#### Python generators

| Symbol                          | Output                                                                              |
| ------------------------------- | ----------------------------------------------------------------------------------- |
| `SignalGenerator`               | Long-form `[date, symbol, signal, fwd_returns]` with target Pearson IC              |
| `FactorLoadingsGenerator`       | Wide `[symbol, market, value, momentum, size, quality]` Barra-style loadings        |
| `BenchmarkGenerator`            | `[date, benchmark]` Gaussian benchmark return series                                |
| `PositionsGenerator`            | Long-form position panel `[date, symbol, price, quantity, market_value, weight]`    |
| `TransactionsGenerator`         | Transaction log with enum-backed side/position-effect labels and explicit costs     |
| `OrdersGenerator`               | Enum-backed order fixtures with side, order type, status, and time-in-force         |
| `ExecutionsGenerator`           | Enum-backed execution fixtures for simulated fills                                  |
| `MultiAssetGBMGenerator`        | Correlated multi-asset GBM panel `[timestamp, symbol, price, return]`               |
| `RegimeSwitchingGenerator`      | Markov regime-switching price path `[timestamp, symbol, price, return, regime]`     |
| `MarketImpactCurveGenerator`    | Participation-rate impact curves with temporary, permanent, and total impact in bps |
| `StatisticalRiskModelGenerator` | PCA-style factor loadings, factor returns, and specific variance                    |
| `FundamentalRiskModelGenerator` | Barra-style enum-backed sector/style loadings plus specific variance                |
| `FactorCovarianceGenerator`     | Symmetric positive semidefinite factor covariance matrix                            |
| `SpecificVarianceGenerator`     | Positive idiosyncratic variance vector                                              |

Every Python generator has a matching `generate_*` convenience wrapper, including the legacy `generate_signal`, `generate_factor_loadings`, and `generate_benchmark` functions.

All Rust generators accept an optional `seed: int` for bit-reproducible output across platforms (ChaCha8 RNG); the Python generators accept a `seed` for `numpy.random.default_rng`.

Portfolio, transaction, order, execution, and market-model generators also support enum-backed metadata columns where applicable, including `currency`, `exchange`, `region`, `instrument_type`, `market_type`, and `venue_type`.

Portfolio and transaction generators can use `finance-dates.Calendar` exchange calendars so generated dates and timestamps align with actual business days and session hours.

### Quick start

```python
from finance_datagen import OrdersGenerator, generate_prices, generate_signal, ohlc_from_close

# GBM close prices for one symbol (generate_prices is an alias for generate_gbm)
closes = generate_prices(symbol="ACME", seed=0)

# OHLCV bars synthesized from the close series
bars = ohlc_from_close(closes["price"], symbol="ACME", seed=0)

# Long-form signal panel with forward returns
signal = generate_signal(n_dates=20, n_assets=50, seed=0)

# Enum-backed order fixtures tagged with exchange and currency metadata
orders = OrdersGenerator(
    n_dates=3,
    n_assets=5,
    orders_per_day=10,
    exchange="XNYS",
    currency="USD",
    seed=0,
).generate()
```

See the [Data](docs/src/DATA.md) page for model math, parameter ranges, and output schemas, and the [API](docs/src/API.md) page for a complete function-level reference.
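The log-Euler scheme listed for `GBMGenerator` steps the log price rather than the price, which keeps every simulated price positive and, for constant drift and volatility, samples the exact GBM transition. The pure-Python sketch below is illustrative only: the function name `gbm_log_euler` and its parameters (`s0`, `mu`, `sigma`, `dt`, `n_steps`) are assumptions rather than the library's API, and it uses the standard-library Mersenne Twister `random.Random` instead of the Rust core's ChaCha8 RNG, so its numbers will not match `generate_gbm` even for the same seed.

```python
import math
import random


def gbm_log_euler(s0, mu, sigma, dt, n_steps, seed=None):
    """Hypothetical illustration: one GBM path via the log-Euler scheme.

    S[t+dt] = S[t] * exp((mu - sigma**2 / 2) * dt + sigma * sqrt(dt) * Z),
    with Z ~ N(0, 1) drawn independently at each step.
    """
    rng = random.Random(seed)  # seeded RNG -> reproducible path
    drift = (mu - 0.5 * sigma**2) * dt  # Ito-corrected drift of log S
    diffusion = sigma * math.sqrt(dt)
    prices = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        # Step in log space so prices stay strictly positive.
        prices.append(prices[-1] * math.exp(drift + diffusion * z))
    return prices


path_a = gbm_log_euler(100.0, 0.05, 0.2, 1 / 252, 5, seed=0)
path_b = gbm_log_euler(100.0, 0.05, 0.2, 1 / 252, 5, seed=0)
assert path_a == path_b  # same seed, same path
```

The returned list includes the starting price, so `n_steps` steps yield `n_steps + 1` prices.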
### Architecture

The Rust core (`rust/src/`) is **polars-free**: every generator builds an `arrow_array::RecordBatch` and hands it to Python through the [Arrow C Data Interface](https://arrow.apache.org/docs/format/CDataInterface.html) as a PyCapsule via `pyo3-arrow`. The Python wrappers call `polars.from_arrow(batch)` on the receiving end.

This keeps the polars-rs and polars-py codebases on opposite sides of a stable ABI boundary, avoiding the binary-incompatibility issues that come with linking polars from both Rust and CPython.