API¶
finance-datagen exposes the following public symbols at the top level
of the package:
from finance_datagen import (
    DataGenerator,
    # Rust-backed price-path generators
    GBMGenerator,
    HestonGenerator,
    GARCHGenerator,
    ohlc_from_close,
    generate_prices,
    generate_gbm,
    generate_heston,
    generate_garch,
    # Python cross-sectional generators
    SignalGenerator,
    FactorLoadingsGenerator,
    BenchmarkGenerator,
    generate_signal,
    generate_factor_loadings,
    generate_benchmark,
    # Python portfolio, market, and risk-model generators
    PositionsGenerator,
    TransactionsGenerator,
    OrdersGenerator,
    ExecutionsGenerator,
    MultiAssetGBMGenerator,
    RegimeSwitchingGenerator,
    MarketImpactCurveGenerator,
    StatisticalRiskModelGenerator,
    FundamentalRiskModelGenerator,
    FactorCovarianceGenerator,
    SpecificVarianceGenerator,
    generate_positions,
    generate_transactions,
    generate_orders,
    generate_executions,
    generate_multi_asset_gbm,
    generate_regime_switching,
    generate_market_impact_curve,
    generate_statistical_risk_model,
    generate_fundamental_risk_model,
    generate_factor_covariance,
    generate_specific_variance,
)
Each generator class inherits from DataGenerator, a pydantic base
model. Instantiate with typed parameters, then call .generate() to
obtain the synthetic output. next(generator) is also supported as a
one-shot iterator convenience. The generate_* functions are thin
wrappers that instantiate the matching model for validation and return
.generate().
For the precise math, parameter ranges, and output schemas of every model, see the Data page.
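The class-plus-wrapper pattern described above can be sketched with the standard library alone. ToyGenerator and generate_toy are hypothetical names used only for illustration; they are not part of finance_datagen, and a dataclass with manual validation stands in for the pydantic base model.

```python
from dataclasses import dataclass


@dataclass
class ToyGenerator:
    """Hypothetical stand-in for a generator class (not a finance_datagen symbol)."""

    n_steps: int = 3
    start: float = 100.0

    def __post_init__(self):
        # Validate parameters at construction time, as a typed model would.
        if self.n_steps <= 0:
            raise ValueError("n_steps must be > 0")

    def generate(self):
        # Produce the synthetic output; here a trivial ramp of n_steps + 1 values.
        return [self.start + i for i in range(self.n_steps + 1)]

    def __iter__(self):
        return self

    def __next__(self):
        # next(generator) simply delegates to generate().
        return self.generate()


def generate_toy(n_steps=3, start=100.0):
    # Thin wrapper: build the model (which validates), then return its output.
    return ToyGenerator(n_steps=n_steps, start=start).generate()
```

Under these assumptions, ToyGenerator(n_steps=2).generate(), next(ToyGenerator(n_steps=2)), and generate_toy(n_steps=2) all return the same list.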
Quick start¶
from finance_datagen import generate_prices, ohlc_from_close
# 1 year of daily log-normal closes, deterministic.
prices = generate_prices(symbol="ACME", seed=0)
# Synthesize OHLCV bars around the closes.
bars = ohlc_from_close(prices["price"], symbol="ACME", seed=0)
print(bars.head())
Reference¶
- class finance_datagen.DataGenerator[source]¶
Bases: BaseModel, Generic[OutputT], ABC
Pydantic base class for table-generating models.
- class finance_datagen.GBMGenerator(*, s0: Annotated[float, Gt(gt=0)] = 100.0, mu: float = 0.05, sigma: Annotated[float, Ge(ge=0)] = 0.2, dt: Annotated[float, Gt(gt=0)] = 0.003968253968253968, n_steps: Annotated[int, Gt(gt=0)] = 252, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None)[source]¶
Bases: DataGenerator[DataFrame]
Geometric Brownian Motion price generator.
Discretizes the SDE \(dS_t = \mu S_t\, dt + \sigma S_t\, dW_t\) exactly in log-space. Returns a polars DataFrame with columns [timestamp, symbol, price] of length n_steps + 1.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
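The exact log-space discretization mentioned for GBMGenerator is the update \(\log S_{t+\Delta t} = \log S_t + (\mu - \tfrac{1}{2}\sigma^2)\,\Delta t + \sigma\sqrt{\Delta t}\,Z\) with \(Z \sim N(0, 1)\). The following is a minimal pure-Python sketch of that update, not the Rust implementation; gbm_path is a hypothetical name, and its random stream will not match the library's for the same seed.

```python
import math
import random


def gbm_path(s0=100.0, mu=0.05, sigma=0.2, dt=1 / 252, n_steps=252, seed=None):
    """Illustrative GBM path of length n_steps + 1 (hypothetical helper)."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma**2) * dt  # Ito-corrected drift per step
    vol = sigma * math.sqrt(dt)         # per-step standard deviation
    log_s = math.log(s0)
    path = [s0]
    for _ in range(n_steps):
        # The log-price increment is exactly Gaussian, so no discretization bias.
        log_s += drift + vol * rng.gauss(0.0, 1.0)
        path.append(math.exp(log_s))
    return path
```

Because the update is applied to the log-price, every simulated price stays strictly positive, and with sigma=0 the path reduces to deterministic growth at rate mu.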
- class finance_datagen.HestonGenerator(*, s0: Annotated[float, Gt(gt=0)] = 100.0, v0: Annotated[float, Ge(ge=0)] = 0.04, mu: float = 0.05, kappa: Annotated[float, Ge(ge=0)] = 2.0, theta: Annotated[float, Ge(ge=0)] = 0.04, xi: Annotated[float, Ge(ge=0)] = 0.3, rho: Annotated[float, Ge(ge=-1.0), Le(le=1.0)] = -0.7, dt: Annotated[float, Gt(gt=0)] = 0.003968253968253968, n_steps: Annotated[int, Gt(gt=0)] = 252, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None)[source]¶
Bases: DataGenerator[DataFrame]
Heston stochastic-volatility price generator.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
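For intuition about the HestonGenerator parameters, here is a minimal pure-Python sketch. It reads kappa, theta, xi, and rho in their conventional Heston roles (mean-reversion speed, long-run variance, volatility of variance, and price/variance correlation) and uses a full-truncation Euler scheme. Both the scheme and the helper name heston_path are assumptions; this page does not state which discretization the library uses.

```python
import math
import random


def heston_path(s0=100.0, v0=0.04, mu=0.05, kappa=2.0, theta=0.04, xi=0.3,
                rho=-0.7, dt=1 / 252, n_steps=252, seed=None):
    """Illustrative Heston path via full-truncation Euler (hypothetical helper)."""
    rng = random.Random(seed)
    log_s, v = math.log(s0), v0
    prices = [s0]
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Correlate the variance shock with the price shock.
        zv = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        v_pos = max(v, 0.0)  # full truncation: use max(v, 0) wherever v enters
        log_s += (mu - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z1
        v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * zv
        prices.append(math.exp(log_s))
    return prices
```

With xi=0 and v0 equal to theta the variance stays constant, so the sketch collapses to a GBM with sigma equal to the square root of theta.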
- class finance_datagen.GARCHGenerator(*, s0: Annotated[float, Gt(gt=0)] = 100.0, mu: float = 0.0, omega: Annotated[float, Ge(ge=0)] = 1e-06, alpha: Annotated[float, Ge(ge=0)] = 0.05, beta: Annotated[float, Ge(ge=0)] = 0.9, n_steps: Annotated[int, Gt(gt=0)] = 252, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None)[source]¶
Bases: DataGenerator[DataFrame]
GARCH(1,1) discrete-time return generator.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
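A GARCH(1,1) process updates the conditional variance as \(\sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2\). The sketch below illustrates that recursion in pure Python. It assumes log-returns of the form mu + eps and an initial variance equal to the unconditional variance omega / (1 - alpha - beta); neither choice is confirmed by this page, and garch_path is a hypothetical name.

```python
import math
import random


def garch_path(s0=100.0, mu=0.0, omega=1e-6, alpha=0.05, beta=0.9,
               n_steps=252, seed=None):
    """Illustrative GARCH(1,1) prices and returns (hypothetical helper)."""
    rng = random.Random(seed)
    persistence = alpha + beta
    # Assumption: start at the unconditional variance when it exists.
    var = omega / (1.0 - persistence) if persistence < 1.0 else omega
    price = s0
    prices, returns = [s0], []
    for _ in range(n_steps):
        eps = math.sqrt(var) * rng.gauss(0.0, 1.0)  # shock with current variance
        r = mu + eps                                # log-return for this step
        price *= math.exp(r)
        returns.append(r)
        prices.append(price)
        # GARCH(1,1) recursion for the next step's conditional variance.
        var = omega + alpha * eps**2 + beta * var
    return prices, returns
```

With the default parameters the persistence alpha + beta is 0.95, so volatility shocks decay slowly and the returns show visible volatility clustering.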
- finance_datagen.ohlc_from_close(close, intrabar_vol: float = 0.005, base_volume: float = 1000000.0, vol_factor: float = 50000000.0, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None) DataFrame[source]¶
Construct an OHLCV bar series from a close-price series.
- finance_datagen.generate_prices(s0: float = 100.0, mu: float = 0.05, sigma: float = 0.2, dt: float = 0.003968253968253968, n_steps: int = 252, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None) DataFrame¶
Generate a synthetic price path using Geometric Brownian Motion.
- finance_datagen.generate_gbm(s0: float = 100.0, mu: float = 0.05, sigma: float = 0.2, dt: float = 0.003968253968253968, n_steps: int = 252, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None) DataFrame[source]¶
Generate a synthetic price path using Geometric Brownian Motion.
- finance_datagen.generate_heston(s0: float = 100.0, v0: float = 0.04, mu: float = 0.05, kappa: float = 2.0, theta: float = 0.04, xi: float = 0.3, rho: float = -0.7, dt: float = 0.003968253968253968, n_steps: int = 252, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None) DataFrame[source]¶
Generate a Heston price path.
- finance_datagen.generate_garch(s0: float = 100.0, mu: float = 0.0, omega: float = 1e-06, alpha: float = 0.05, beta: float = 0.9, n_steps: int = 252, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None) DataFrame[source]¶
Generate a GARCH price and return path.
- class finance_datagen.SignalGenerator(*, n_dates: Annotated[int, Gt(gt=0)] = 252, n_assets: Annotated[int, Gt(gt=1)] = 50, ic: Annotated[float, Gt(gt=-1.0), Lt(lt=1.0)] = 0.05, return_vol: Annotated[float, Gt(gt=0)] = 0.02, seed: int | None = None, start: date | None = None, symbols: tuple[str, ...] | None = None)[source]¶
Bases: DataGenerator[DataFrame]
Generate a long-form signal and forward-return panel.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
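One way to picture the ic and return_vol parameters of SignalGenerator is a single cross-section in which the forward return is a mix of the signal and independent noise. The sketch below assumes ic is the target correlation between signal and forward return and that return_vol is the standard deviation of the forward return. This page does not confirm the library's construction, and signal_cross_section and pearson are hypothetical helpers.

```python
import math
import random


def signal_cross_section(n_assets=50, ic=0.05, return_vol=0.02, seed=None):
    """Illustrative signal/forward-return pair for one date (hypothetical helper)."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(n_assets)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_assets)]
    mix = math.sqrt(1.0 - ic * ic)
    # ic * signal + sqrt(1 - ic^2) * noise has unit variance and correlation ic.
    fwd_returns = [return_vol * (ic * s + mix * e) for s, e in zip(signal, noise)]
    return signal, fwd_returns


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)
```

With only 50 assets the realized correlation on any one date is noisy; it approaches ic only on average or for large cross-sections.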
- class finance_datagen.FactorLoadingsGenerator(*, n_assets: Annotated[int, Gt(gt=1)] = 50, factors: tuple[str, ...] = ('market', 'value', 'momentum', 'size', 'quality'), seed: int | None = None, symbols: tuple[str, ...] | None = None)[source]¶
Bases: DataGenerator[DataFrame]
Generate Barra-style factor loadings.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class finance_datagen.BenchmarkGenerator(*, n_dates: Annotated[int, Gt(gt=0)] = 252, annual_return: float = 0.08, annual_vol: Annotated[float, Ge(ge=0)] = 0.16, periods_per_year: Annotated[int, Gt(gt=0)] = 252, seed: int | None = None, start: date | None = None)[source]¶
Bases: DataGenerator[DataFrame]
Generate an independent Gaussian benchmark return series.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- finance_datagen.generate_signal(n_dates: int = 252, n_assets: int = 50, ic: float = 0.05, return_vol: float = 0.02, seed: int | None = None, start: date | None = None, symbols: Sequence[str] | None = None) DataFrame[source]¶
Generate a long-form panel with columns [date, symbol, signal, fwd_returns].
- finance_datagen.generate_factor_loadings(n_assets: int = 50, factors: Sequence[str] = ('market', 'value', 'momentum', 'size', 'quality'), seed: int | None = None, symbols: Sequence[str] | None = None) DataFrame[source]¶
Generate Barra-style factor loadings.
- finance_datagen.generate_benchmark(n_dates: int = 252, annual_return: float = 0.08, annual_vol: float = 0.16, periods_per_year: int = 252, seed: int | None = None, start: date | None = None) DataFrame[source]¶
Generate a benchmark return series.
- class finance_datagen.PositionsGenerator(*, n_dates: Annotated[int, Gt(gt=0)] = 252, n_assets: Annotated[int, Gt(gt=0)] = 50, portfolio_value: Annotated[float, Gt(gt=0)] = 1000000.0, gross_exposure: Annotated[float, Gt(gt=0)] = 1.0, average_price: Annotated[float, Gt(gt=0)] = 100.0, price_vol: Annotated[float, Ge(ge=0)] = 0.02, seed: int | None = None, start: date | None = None, symbols: tuple[str, ...] | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False)[source]¶
Bases: DataGenerator[DataFrame]
Generate a long-form synthetic positions table.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class finance_datagen.TransactionsGenerator(*, n_dates: Annotated[int, Gt(gt=0)] = 252, n_assets: Annotated[int, Gt(gt=0)] = 50, trades_per_day: Annotated[int, Gt(gt=0)] = 25, average_price: Annotated[float, Gt(gt=0)] = 100.0, price_vol: Annotated[float, Ge(ge=0)] = 0.25, max_amount: Annotated[int, Gt(gt=0)] = 1000, commission: Annotated[float, Ge(ge=0)] = 1.0, fee_bps: Annotated[float, Ge(ge=0)] = 0.2, bps: Annotated[float, Ge(ge=0)] = 5.0, seed: int | None = None, start: date | None = None, symbols: tuple[str, ...] | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False)[source]¶
Bases: DataGenerator[DataFrame]
Generate a synthetic transaction log for post-trade tests.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class finance_datagen.OrdersGenerator(*, n_dates: Annotated[int, Gt(gt=0)] = 252, n_assets: Annotated[int, Gt(gt=0)] = 50, orders_per_day: Annotated[int, Gt(gt=0)] = 25, average_price: Annotated[float, Gt(gt=0)] = 100.0, price_vol: Annotated[float, Ge(ge=0)] = 0.2, max_quantity: Annotated[int, Gt(gt=0)] = 1000, seed: int | None = None, start: date | None = None, symbols: tuple[str, ...] | None = None, exchange: str | None = None, currency: str | None = None, include_region: bool = False)[source]¶
Bases: DataGenerator[DataFrame]
Generate enum-backed synthetic order fixtures.
- generate() DataFrame[source]¶
Return a DataFrame with columns [timestamp, symbol, order_id, side, order_type, quantity, limit_price, order_status, time_in_force].
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class finance_datagen.ExecutionsGenerator(*, n_dates: Annotated[int, Gt(gt=0)] = 252, n_assets: Annotated[int, Gt(gt=0)] = 50, executions_per_day: Annotated[int, Gt(gt=0)] = 30, average_price: Annotated[float, Gt(gt=0)] = 100.0, price_vol: Annotated[float, Ge(ge=0)] = 0.2, max_quantity: Annotated[int, Gt(gt=0)] = 1000, seed: int | None = None, start: date | None = None, symbols: tuple[str, ...] | None = None, exchange: str | None = None, currency: str | None = None, include_region: bool = False)[source]¶
Bases: DataGenerator[DataFrame]
Generate synthetic execution fixtures tied to synthetic orders.
- generate() DataFrame[source]¶
Return a DataFrame with columns [timestamp, order_id, symbol, side, price, quantity, liquidity_flag].
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class finance_datagen.MultiAssetGBMGenerator(*, n_steps: Annotated[int, Gt(gt=0)] = 252, n_assets: Annotated[int, Gt(gt=0)] = 10, s0: float | tuple[float, ...] = 100.0, mu: float | tuple[float, ...] = 0.05, sigma: float | tuple[float, ...] = 0.2, dt: Annotated[float, Gt(gt=0)] = 0.003968253968253968, rho: Annotated[float, Gt(gt=-1.0), Lt(lt=1.0)] = 0.3, corr: tuple[tuple[float, ...], ...] | None = None, symbols: tuple[str, ...] | None = None, start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None)[source]¶
Bases: DataGenerator[DataFrame]
Generate correlated multi-asset GBM paths in long form.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class finance_datagen.RegimeSwitchingGenerator(*, s0: Annotated[float, Gt(gt=0)] = 100.0, n_steps: Annotated[int, Gt(gt=0)] = 252, transition_matrix: tuple[tuple[float, ...], ...] = ((0.95, 0.05), (0.1, 0.9)), regime_mu: tuple[float, ...] = (0.0004, -0.0003), regime_sigma: tuple[float, ...] = (0.008, 0.025), symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None)[source]¶
Bases: DataGenerator[DataFrame]
Generate a single price path with Markov switching return regimes.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
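The Markov-switching idea behind RegimeSwitchingGenerator can be sketched in pure Python: row i of transition_matrix gives the probabilities of moving from regime i to each regime, and each step draws a Gaussian log-return from the current regime. The sketch assumes regime_mu and regime_sigma are per-step log-return mean and standard deviation (the default magnitudes suggest this, but the page does not say so) and that the chain starts in regime 0. regime_switching_path is a hypothetical name.

```python
import math
import random


def regime_switching_path(s0=100.0, n_steps=252,
                          transition_matrix=((0.95, 0.05), (0.1, 0.9)),
                          regime_mu=(0.0004, -0.0003),
                          regime_sigma=(0.008, 0.025), seed=None):
    """Illustrative Markov-switching price path (hypothetical helper)."""
    rng = random.Random(seed)
    states = list(range(len(transition_matrix)))
    state = 0  # assumption: start in the first regime
    log_s = math.log(s0)
    prices, regimes = [s0], [state]
    for _ in range(n_steps):
        # Draw the next regime from the row belonging to the current regime.
        state = rng.choices(states, weights=transition_matrix[state])[0]
        # Draw this step's log-return from the new regime's distribution.
        log_s += rng.gauss(regime_mu[state], regime_sigma[state])
        prices.append(math.exp(log_s))
        regimes.append(state)
    return prices, regimes
```

With the default matrix, the calm regime persists with probability 0.95 per step and the turbulent regime with probability 0.9, so the path alternates between long quiet stretches and shorter volatile ones.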
- class finance_datagen.MarketImpactCurveGenerator(*, n_assets: Annotated[int, FieldInfo(annotation=NoneType, required=True, metadata=[Gt(gt=0)])] | None = None, symbols: tuple[str, ...] | None = None, participation_rates: tuple[Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Gt(gt=0)])], ...] = (0.01, 0.042222222222222223, 0.07444444444444444, 0.10666666666666666, 0.1388888888888889, 0.1711111111111111, 0.20333333333333334, 0.23555555555555557, 0.2677777777777778, 0.3), average_adv: Annotated[float, Gt(gt=0)] = 1000000.0, average_volatility: Annotated[float, Gt(gt=0)] = 0.02, temporary_impact_coef: Annotated[float, Ge(ge=0)] = 0.5, permanent_impact_coef: Annotated[float, Ge(ge=0)] = 0.1, seed: int | None = None, market_type: str | None = None, venue_type: str | None = None)[source]¶
Bases: DataGenerator[DataFrame]
Generate Almgren-Chriss-style impact curves by participation rate.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class finance_datagen.StatisticalRiskModelGenerator(*, n_dates: Annotated[int, Gt(gt=1)] = 252, n_assets: Annotated[int, Gt(gt=1)] = 50, n_factors: Annotated[int, Gt(gt=0)] = 5, factor_vol: Annotated[float, Gt(gt=0)] = 0.01, idiosyncratic_vol: Annotated[float, Gt(gt=0)] = 0.015, seed: int | None = None, start: date | None = None, symbols: tuple[str, ...] | None = None)[source]¶
Bases: DataGenerator[dict[str, DataFrame]]
Generate PCA-style statistical factor model components.
- generate() dict[str, DataFrame][source]¶
Return factor loadings, factor returns, and specific variance.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class finance_datagen.FundamentalRiskModelGenerator(*, n_assets: Annotated[int, Gt(gt=1)] = 50, sectors: tuple[str, ...] = ('Energy', 'Materials', 'Industrials', 'ConsumerDiscretionary', 'ConsumerStaples', 'HealthCare', 'Financials', 'InformationTechnology', 'CommunicationServices', 'Utilities', 'RealEstate'), style_factors: tuple[str, ...] = ('value', 'momentum', 'size', 'quality', 'low_vol', 'growth'), seed: int | None = None, symbols: tuple[str, ...] | None = None)[source]¶
Bases: DataGenerator[DataFrame]
Generate Barra-style sector and style-factor exposure data.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class finance_datagen.FactorCovarianceGenerator(*, factors: tuple[str, ...] = ('market', 'sector', 'value', 'momentum', 'size', 'quality', 'low_vol', 'growth'), factor_vol: Annotated[float, Gt(gt=0)] = 0.16, eigen_decay: Annotated[float, Gt(gt=0.0), Le(le=1.0)] = 0.75, base_corr: Annotated[float, Gt(gt=-1.0), Lt(lt=1.0)] = 0.25, seed: int | None = None)[source]¶
Bases: DataGenerator[DataFrame]
Generate a symmetric positive semidefinite factor covariance matrix.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class finance_datagen.SpecificVarianceGenerator(*, n_assets: Annotated[int, Gt(gt=0)] = 50, target_vol: Annotated[float, Gt(gt=0)] = 0.25, dispersion: Annotated[float, Ge(ge=0)] = 0.35, seed: int | None = None, symbols: tuple[str, ...] | None = None)[source]¶
Bases: DataGenerator[DataFrame]
Generate a positive idiosyncratic variance vector.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- finance_datagen.generate_positions(n_dates: int = 252, n_assets: int = 50, portfolio_value: float = 1000000.0, gross_exposure: float = 1.0, average_price: float = 100.0, price_vol: float = 0.02, seed: int | None = None, start: date | None = None, symbols: Sequence[str] | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False) DataFrame[source]¶
Generate a synthetic positions table.
- finance_datagen.generate_transactions(n_dates: int = 252, n_assets: int = 50, trades_per_day: int = 25, average_price: float = 100.0, price_vol: float = 0.25, max_amount: int = 1000, commission: float = 1.0, fee_bps: float = 0.2, bps: float = 5.0, seed: int | None = None, start: date | None = None, symbols: Sequence[str] | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False) DataFrame[source]¶
Generate a synthetic transaction log.
- finance_datagen.generate_orders(n_dates: int = 252, n_assets: int = 50, orders_per_day: int = 25, average_price: float = 100.0, price_vol: float = 0.2, max_quantity: int = 1000, seed: int | None = None, start: date | None = None, symbols: Sequence[str] | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False) DataFrame[source]¶
Generate synthetic order fixtures.
- finance_datagen.generate_executions(n_dates: int = 252, n_assets: int = 50, executions_per_day: int = 30, average_price: float = 100.0, price_vol: float = 0.2, max_quantity: int = 1000, seed: int | None = None, start: date | None = None, symbols: Sequence[str] | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False) DataFrame[source]¶
Generate synthetic execution fixtures.
- finance_datagen.generate_multi_asset_gbm(n_steps: int = 252, n_assets: int = 10, s0: float | Sequence[float] = 100.0, mu: float | Sequence[float] = 0.05, sigma: float | Sequence[float] = 0.2, dt: float = 0.003968253968253968, rho: float = 0.3, corr: Sequence[Sequence[float]] | None = None, symbols: Sequence[str] | None = None, start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None) DataFrame[source]¶
Generate correlated multi-asset GBM paths in long form.
- finance_datagen.generate_regime_switching(s0: float = 100.0, n_steps: int = 252, transition_matrix: Sequence[Sequence[float]] = ((0.95, 0.05), (0.1, 0.9)), regime_mu: Sequence[float] = (0.0004, -0.0003), regime_sigma: Sequence[float] = (0.008, 0.025), symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None) DataFrame[source]¶
Generate a single price path with Markov switching return regimes.
- finance_datagen.generate_market_impact_curve(n_assets: int | None = None, symbols: Sequence[str] | None = None, participation_rates: Sequence[float] = (0.01, 0.042222222222222223, 0.07444444444444444, 0.10666666666666666, 0.1388888888888889, 0.1711111111111111, 0.20333333333333334, 0.23555555555555557, 0.2677777777777778, 0.3), average_adv: float = 1000000.0, average_volatility: float = 0.02, temporary_impact_coef: float = 0.5, permanent_impact_coef: float = 0.1, seed: int | None = None, market_type: str | None = None, venue_type: str | None = None) DataFrame[source]¶
Generate market-impact curves by participation rate.
- finance_datagen.generate_statistical_risk_model(n_dates: int = 252, n_assets: int = 50, n_factors: int = 5, factor_vol: float = 0.01, idiosyncratic_vol: float = 0.015, seed: int | None = None, start: date | None = None, symbols: Sequence[str] | None = None) dict[str, DataFrame][source]¶
Generate PCA-style statistical factor model components.
- finance_datagen.generate_fundamental_risk_model(n_assets: int = 50, sectors: Sequence[str] = ('Energy', 'Materials', 'Industrials', 'ConsumerDiscretionary', 'ConsumerStaples', 'HealthCare', 'Financials', 'InformationTechnology', 'CommunicationServices', 'Utilities', 'RealEstate'), style_factors: Sequence[str] = ('value', 'momentum', 'size', 'quality', 'low_vol', 'growth'), seed: int | None = None, symbols: Sequence[str] | None = None) DataFrame[source]¶
Generate Barra-style sector and style-factor exposure data.
- finance_datagen.generate_factor_covariance(factors: Sequence[str] = ('market', 'sector', 'value', 'momentum', 'size', 'quality', 'low_vol', 'growth'), factor_vol: float = 0.16, eigen_decay: float = 0.75, base_corr: float = 0.25, seed: int | None = None) DataFrame[source]¶
Generate a symmetric positive semidefinite factor covariance matrix.
- finance_datagen.generate_specific_variance(n_assets: int = 50, target_vol: float = 0.25, dispersion: float = 0.35, seed: int | None = None, symbols: Sequence[str] | None = None) DataFrame[source]¶
Generate a positive idiosyncratic variance vector.
Post-trade fixtures¶
from finance_datagen import (
    ExecutionsGenerator,
    OrdersGenerator,
    PositionsGenerator,
    TransactionsGenerator,
    generate_executions,
    generate_orders,
    generate_positions,
    generate_transactions,
)
positions = PositionsGenerator(n_dates=20, n_assets=50, seed=0).generate()
transactions = TransactionsGenerator(n_dates=20, n_assets=50, seed=0).generate()
orders = OrdersGenerator(n_dates=20, n_assets=50, seed=0).generate()
executions = ExecutionsGenerator(n_dates=20, n_assets=50, seed=0).generate()
# Equivalent convenience wrappers are available:
positions = generate_positions(n_dates=20, n_assets=50, seed=0)
transactions = generate_transactions(n_dates=20, n_assets=50, seed=0)
orders = generate_orders(n_dates=20, n_assets=50, seed=0)
executions = generate_executions(n_dates=20, n_assets=50, seed=0)
Risk-model fixtures¶
from finance_datagen import StatisticalRiskModelGenerator
model = StatisticalRiskModelGenerator(n_dates=252, n_assets=100, n_factors=5, seed=0).generate()
factor_loadings = model["factor_loadings"]
factor_returns = model["factor_returns"]
specific_variance = model["specific_variance"]
Recipes¶
Composing models¶
ohlc_from_close is generator-agnostic: feed it any close series.
from finance_datagen import HestonGenerator, ohlc_from_close
px = HestonGenerator(seed=42).generate()
bars = ohlc_from_close(px["price"], symbol="HEST", seed=42)
Long horizons¶
from finance_datagen import GBMGenerator
# 10 years of daily data (252 steps per year at the default dt of 1/252)
GBMGenerator(n_steps=252 * 10, seed=0).generate()
Custom timestamp grids¶
step_ms and start_ms are independent of dt, so you can produce a
high-frequency timestamp grid for visual inspection while keeping the
SDE on a daily scale:
from finance_datagen import GBMGenerator
GBMGenerator(
    n_steps=1000,
    dt=1 / 252,  # daily-scale variance
    start_ms=1_700_000_000_000,
    step_ms=60_000,  # 1-minute timestamps
    seed=0,
).generate()
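To make the separation between the timestamp grid and the SDE clock concrete, the sketch below builds the grid by hand. It assumes row i carries the timestamp start_ms + i * step_ms and that there are n_steps + 1 rows, as stated for GBMGenerator; timestamp_grid is a hypothetical helper, not a library function.

```python
def timestamp_grid(start_ms=0, step_ms=86_400_000, n_steps=252):
    """Illustrative millisecond timestamps for n_steps + 1 rows (hypothetical helper)."""
    return [start_ms + i * step_ms for i in range(n_steps + 1)]


# Same settings as the recipe above: 1,000 one-minute steps.
grid = timestamp_grid(start_ms=1_700_000_000_000, step_ms=60_000, n_steps=1000)
wall_clock_minutes = (grid[-1] - grid[0]) / 60_000  # span of the timestamps
model_years = 1000 * (1 / 252)                      # span of the SDE clock
```

Under these assumptions the timestamps cover 1,000 minutes of wall-clock time while the price process accumulates roughly 3.97 years of drift and variance, so the resulting series should not be read as realistic minute bars.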
Bypassing the polars wrapper¶
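If you need the raw pyarrow.RecordBatch (e.g. to write Parquet
without round-tripping through polars), import the Rust-backed class
directly. Note that finance_datagen.finance_datagen is a private module
outside the SemVer contract (see Versioning), so this recipe may break
between releases: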
from finance_datagen.finance_datagen import GBMGenerator as RustGBM
import pyarrow.parquet as pq
import pyarrow as pa
batch = RustGBM(seed=0).record_batch()
pq.write_table(pa.Table.from_batches([batch]), "gbm.parquet")
Versioning¶
The current version is exposed at finance_datagen.__version__. The
public API consists of the symbols listed above. Private members
(_inner, _rb_to_polars, finance_datagen.finance_datagen.*) are
not part of the SemVer contract and may change between releases.