API

finance-datagen exposes the following public symbols at the top of the package:

from finance_datagen import (
    DataGenerator,
    # Rust-backed price-path generators
    GBMGenerator,
    HestonGenerator,
    GARCHGenerator,
    ohlc_from_close,
    generate_prices,
    generate_gbm,
    generate_heston,
    generate_garch,
    # Python cross-sectional generators
    SignalGenerator,
    FactorLoadingsGenerator,
    BenchmarkGenerator,
    generate_signal,
    generate_factor_loadings,
    generate_benchmark,
    # Python portfolio, market, and risk-model generators
    PositionsGenerator,
    TransactionsGenerator,
    OrdersGenerator,
    ExecutionsGenerator,
    MultiAssetGBMGenerator,
    RegimeSwitchingGenerator,
    MarketImpactCurveGenerator,
    StatisticalRiskModelGenerator,
    FundamentalRiskModelGenerator,
    FactorCovarianceGenerator,
    SpecificVarianceGenerator,
    generate_positions,
    generate_transactions,
    generate_orders,
    generate_executions,
    generate_multi_asset_gbm,
    generate_regime_switching,
    generate_market_impact_curve,
    generate_statistical_risk_model,
    generate_fundamental_risk_model,
    generate_factor_covariance,
    generate_specific_variance,
)

Each generator class inherits from DataGenerator, a pydantic base model: instantiate it with typed, validated parameters, then call .generate() to obtain the synthetic output. next(generator) is also supported as a one-shot iterator convenience. The generate_* functions are thin wrappers that construct the matching model (so the same validation applies) and return the result of its .generate() call. ohlc_from_close is a plain function rather than a generator class.

For the precise math, parameter ranges, and output schemas of every model, see the Data page.


Quick start

from finance_datagen import generate_prices, ohlc_from_close

# One year (252 daily steps) of log-normal closes; seed=0 makes it deterministic.
prices = generate_prices(symbol="ACME", seed=0)

# Synthesize OHLCV bars around the closes.
bars = ohlc_from_close(prices["price"], symbol="ACME", seed=0)

print(bars.head())

Reference

class finance_datagen.DataGenerator[source]

Bases: BaseModel, Generic[OutputT], ABC

Pydantic base class for table-generating models.

abstractmethod generate() OutputT[source]

Generate and return the synthetic dataset.

class finance_datagen.GBMGenerator(*, s0: Annotated[float, Gt(gt=0)] = 100.0, mu: float = 0.05, sigma: Annotated[float, Ge(ge=0)] = 0.2, dt: Annotated[float, Gt(gt=0)] = 0.003968253968253968, n_steps: Annotated[int, Gt(gt=0)] = 252, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None)[source]

Bases: DataGenerator[DataFrame]

Geometric Brownian Motion price generator.

Discretizes the SDE \(dS_t = \mu S_t\, dt + \sigma S_t\, dW_t\) exactly in log-space. Returns a polars DataFrame with columns [timestamp, symbol, price] and n_steps + 1 rows (the initial price s0 followed by one row per step).
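As an illustration of what "exactly in log-space" means, each step can be taken as \(S_{t+\Delta t} = S_t \exp\big((\mu - \tfrac{1}{2}\sigma^2)\Delta t + \sigma\sqrt{\Delta t}\,Z\big)\) with \(Z \sim N(0, 1)\), which introduces no discretization error for GBM. The sketch below is a stdlib-only Python rendering of that update under this assumption; it is not the Rust implementation, its random stream will not match GBMGenerator for the same seed, and the helper name gbm_path is hypothetical.

```python
import math
import random


def gbm_path(s0=100.0, mu=0.05, sigma=0.2, dt=1 / 252, n_steps=252, seed=None):
    """Exact log-space GBM sketch: returns n_steps + 1 prices starting at s0."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma**2) * dt  # Ito-corrected drift of log S per step
    diffusion = sigma * math.sqrt(dt)   # standard deviation of each log increment
    prices = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        prices.append(prices[-1] * math.exp(drift + diffusion * z))
    return prices


path = gbm_path(seed=0)
print(len(path), path[0])  # 253 100.0
```

The n_steps + 1 length mirrors the row count documented for GBMGenerator.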

generate() DataFrame[source]

Simulate the path and return it as a polars DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class finance_datagen.HestonGenerator(*, s0: Annotated[float, Gt(gt=0)] = 100.0, v0: Annotated[float, Ge(ge=0)] = 0.04, mu: float = 0.05, kappa: Annotated[float, Ge(ge=0)] = 2.0, theta: Annotated[float, Ge(ge=0)] = 0.04, xi: Annotated[float, Ge(ge=0)] = 0.3, rho: Annotated[float, Ge(ge=-1.0), Le(le=1.0)] = -0.7, dt: Annotated[float, Gt(gt=0)] = 0.003968253968253968, n_steps: Annotated[int, Gt(gt=0)] = 252, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None)[source]

Bases: DataGenerator[DataFrame]

Heston stochastic-volatility price generator.
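For reference, the Heston model couples the price with its variance: \(dS_t = \mu S_t\,dt + \sqrt{v_t}\, S_t\,dW^S_t\) and \(dv_t = \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW^v_t\), with \(d\langle W^S, W^v\rangle_t = \rho\,dt\); the constructor arguments v0, kappa, theta, xi, and rho presumably map onto these symbols. This page does not say which discretization the Rust core uses, so the sketch below assumes a full-truncation Euler scheme for the variance and a log-Euler step for the price. heston_path is a hypothetical stdlib-only helper for illustration.

```python
import math
import random


def heston_path(s0=100.0, v0=0.04, mu=0.05, kappa=2.0, theta=0.04, xi=0.3,
                rho=-0.7, dt=1 / 252, n_steps=252, seed=None):
    """Full-truncation Euler sketch of Heston; returns (prices, variances)."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    s, v = s0, v0
    prices, variances = [s], [v]
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        # Correlate the variance shock with the price shock through rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        v_pos = max(v, 0.0)  # full truncation: use v+ in both drift and diffusion
        # Log-Euler price step keeps the price strictly positive.
        s *= math.exp((mu - 0.5 * v_pos) * dt + math.sqrt(v_pos) * sqrt_dt * z1)
        # The raw variance state may dip below zero; only v+ is ever used.
        v = v + kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos) * sqrt_dt * z2
        prices.append(s)
        variances.append(max(v, 0.0))  # report the effective (truncated) variance
    return prices, variances
```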

generate() DataFrame[source]

Simulate the path and return it as a polars DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class finance_datagen.GARCHGenerator(*, s0: Annotated[float, Gt(gt=0)] = 100.0, mu: float = 0.0, omega: Annotated[float, Ge(ge=0)] = 1e-06, alpha: Annotated[float, Ge(ge=0)] = 0.05, beta: Annotated[float, Ge(ge=0)] = 0.9, n_steps: Annotated[int, Gt(gt=0)] = 252, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None)[source]

Bases: DataGenerator[DataFrame]

GARCH(1,1) discrete-time return generator.
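A GARCH(1,1) process sets \(r_t = \mu + \varepsilon_t\), \(\varepsilon_t = \sigma_t z_t\), and \(\sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2\), which is where omega, alpha, and beta enter. When \(\alpha + \beta < 1\) the unconditional variance is \(\omega / (1 - \alpha - \beta)\); with the defaults that is 1e-6 / 0.05 = 2e-5, about 0.45% daily volatility. The sketch below assumes the path starts at that unconditional variance and compounds r_t as a log return; neither choice is confirmed by this page, and garch_path is a hypothetical helper.

```python
import math
import random


def garch_path(s0=100.0, mu=0.0, omega=1e-6, alpha=0.05, beta=0.9,
               n_steps=252, seed=None):
    """GARCH(1,1) sketch: returns (prices, returns) with n_steps returns."""
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    prices, returns = [s0], []
    for _ in range(n_steps):
        eps = math.sqrt(var) * rng.gauss(0.0, 1.0)  # shock with today's variance
        r = mu + eps
        returns.append(r)
        prices.append(prices[-1] * math.exp(r))  # treat r as a log return
        var = omega + alpha * eps**2 + beta * var  # conditional variance for next step
    return prices, returns
```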

generate() DataFrame[source]

Simulate the path and return it as a polars DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

finance_datagen.ohlc_from_close(close, intrabar_vol: float = 0.005, base_volume: float = 1000000.0, vol_factor: float = 50000000.0, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None) DataFrame[source]

Construct an OHLCV bar series from a close-price series.
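The exact bar-construction rule is not documented here. The following hypothetical sketch shows one plausible reading of the parameters: each bar opens at the previous close, intrabar_vol widens the high and low beyond the open/close body, and volume grows from base_volume by vol_factor times the absolute log return. ohlc_sketch and its output keys are illustrative names, not the library's API.

```python
import math
import random


def ohlc_sketch(close, intrabar_vol=0.005, base_volume=1_000_000.0,
                vol_factor=50_000_000.0, seed=None):
    """Hypothetical OHLCV construction from closes; returns a list of dict bars."""
    rng = random.Random(seed)
    bars = []
    prev = close[0]
    for c in close:
        o = prev  # open at the previous close (the first bar opens at its own close)
        body_hi, body_lo = max(o, c), min(o, c)
        # Wicks extend beyond the body by a half-normal multiple of intrabar_vol.
        hi = body_hi * (1.0 + abs(rng.gauss(0.0, intrabar_vol)))
        lo = body_lo * (1.0 - abs(rng.gauss(0.0, intrabar_vol)))
        # Volume rises with the absolute log return of the bar.
        vol = base_volume + vol_factor * abs(math.log(c / o))
        bars.append({"open": o, "high": hi, "low": lo, "close": c, "volume": vol})
        prev = c
    return bars
```

Whatever the real rule is, invariants such as high >= max(open, close) and low <= min(open, close) make useful assertions against the polars output as well.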

finance_datagen.generate_prices(s0: float = 100.0, mu: float = 0.05, sigma: float = 0.2, dt: float = 0.003968253968253968, n_steps: int = 252, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None) DataFrame

Generate a synthetic price path using Geometric Brownian Motion.

finance_datagen.generate_gbm(s0: float = 100.0, mu: float = 0.05, sigma: float = 0.2, dt: float = 0.003968253968253968, n_steps: int = 252, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None) DataFrame[source]

Generate a synthetic price path using Geometric Brownian Motion.

finance_datagen.generate_heston(s0: float = 100.0, v0: float = 0.04, mu: float = 0.05, kappa: float = 2.0, theta: float = 0.04, xi: float = 0.3, rho: float = -0.7, dt: float = 0.003968253968253968, n_steps: int = 252, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None) DataFrame[source]

Generate a Heston price path.

finance_datagen.generate_garch(s0: float = 100.0, mu: float = 0.0, omega: float = 1e-06, alpha: float = 0.05, beta: float = 0.9, n_steps: int = 252, symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None) DataFrame[source]

Generate a GARCH price and return path.

class finance_datagen.SignalGenerator(*, n_dates: Annotated[int, Gt(gt=0)] = 252, n_assets: Annotated[int, Gt(gt=1)] = 50, ic: Annotated[float, Gt(gt=-1.0), Lt(lt=1.0)] = 0.05, return_vol: Annotated[float, Gt(gt=0)] = 0.02, seed: int | None = None, start: date | None = None, symbols: tuple[str, ...] | None = None)[source]

Bases: DataGenerator[DataFrame]

Generate a long-form signal and forward-return panel.
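One common way to hit a target information coefficient is to mix a standardized signal with independent noise, \(r = \sigma_r\big(\mathrm{IC}\cdot z + \sqrt{1 - \mathrm{IC}^2}\,\varepsilon\big)\), which gives \(\operatorname{corr}(z, r) = \mathrm{IC}\) in expectation. Whether SignalGenerator uses this construction, and whether ic is a Pearson or a rank correlation, is an assumption; signal_panel, pearson, and the S000-style symbols are hypothetical.

```python
import math
import random


def signal_panel(n_dates=252, n_assets=50, ic=0.05, return_vol=0.02, seed=None):
    """Rows of (date_index, symbol, signal, fwd_return) with Pearson corr ~ ic."""
    rng = random.Random(seed)
    noise_weight = math.sqrt(1.0 - ic**2)  # keeps the mixed term at unit variance
    rows = []
    for d in range(n_dates):
        for a in range(n_assets):
            z = rng.gauss(0.0, 1.0)  # standardized signal
            e = rng.gauss(0.0, 1.0)  # independent noise
            rows.append((d, f"S{a:03d}", z, return_vol * (ic * z + noise_weight * e)))
    return rows


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

Scaling by return_vol changes the size of the forward returns but not their correlation with the signal.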

generate() DataFrame[source]

Return [date, symbol, signal, fwd_returns].

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class finance_datagen.FactorLoadingsGenerator(*, n_assets: Annotated[int, Gt(gt=1)] = 50, factors: tuple[str, ...] = ('market', 'value', 'momentum', 'size', 'quality'), seed: int | None = None, symbols: tuple[str, ...] | None = None)[source]

Bases: DataGenerator[DataFrame]

Generate Barra-style factor loadings.

generate() DataFrame[source]

Return symbol plus one column per factor.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class finance_datagen.BenchmarkGenerator(*, n_dates: Annotated[int, Gt(gt=0)] = 252, annual_return: float = 0.08, annual_vol: Annotated[float, Ge(ge=0)] = 0.16, periods_per_year: Annotated[int, Gt(gt=0)] = 252, seed: int | None = None, start: date | None = None)[source]

Bases: DataGenerator[DataFrame]

Generate an independent Gaussian benchmark return series.
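A natural reading of annual_return, annual_vol, and periods_per_year is that the mean scales linearly with time and the volatility with its square root, so the defaults give a per-period mean of 0.08 / 252 ≈ 0.000317 and a per-period volatility of 0.16 / √252 ≈ 0.0101. The stdlib sketch below assumes that scaling; benchmark_returns is a hypothetical helper, not the library code.

```python
import math
import random


def benchmark_returns(n_dates=252, annual_return=0.08, annual_vol=0.16,
                      periods_per_year=252, seed=None):
    """i.i.d. Gaussian per-period returns scaled from annual inputs."""
    mu_p = annual_return / periods_per_year             # mean scales with time
    sigma_p = annual_vol / math.sqrt(periods_per_year)  # vol scales with sqrt(time)
    rng = random.Random(seed)
    return [rng.gauss(mu_p, sigma_p) for _ in range(n_dates)]
```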

generate() DataFrame[source]

Return [date, benchmark].

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

finance_datagen.generate_signal(n_dates: int = 252, n_assets: int = 50, ic: float = 0.05, return_vol: float = 0.02, seed: int | None = None, start: date | None = None, symbols: Sequence[str] | None = None) DataFrame[source]

Generate a long-form panel [date, symbol, signal, fwd_returns].

finance_datagen.generate_factor_loadings(n_assets: int = 50, factors: Sequence[str] = ('market', 'value', 'momentum', 'size', 'quality'), seed: int | None = None, symbols: Sequence[str] | None = None) DataFrame[source]

Generate Barra-style factor loadings.

finance_datagen.generate_benchmark(n_dates: int = 252, annual_return: float = 0.08, annual_vol: float = 0.16, periods_per_year: int = 252, seed: int | None = None, start: date | None = None) DataFrame[source]

Generate a benchmark return series.

class finance_datagen.PositionsGenerator(*, n_dates: Annotated[int, Gt(gt=0)] = 252, n_assets: Annotated[int, Gt(gt=0)] = 50, portfolio_value: Annotated[float, Gt(gt=0)] = 1000000.0, gross_exposure: Annotated[float, Gt(gt=0)] = 1.0, average_price: Annotated[float, Gt(gt=0)] = 100.0, price_vol: Annotated[float, Ge(ge=0)] = 0.02, seed: int | None = None, start: date | None = None, symbols: tuple[str, ...] | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False)[source]

Bases: DataGenerator[DataFrame]

Generate a long-form synthetic positions table.

generate() DataFrame[source]

Return [date, symbol, price, quantity, market_value, weight].
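The output columns are linked by simple accounting identities: market_value = price × quantity and weight = market_value / portfolio_value, with gross_exposure presumably equal to the sum of absolute weights. The sketch below builds a single date under those assumptions, allowing short positions and fractional quantities for simplicity; position_snapshot is hypothetical, and whether PositionsGenerator permits shorts is not stated on this page.

```python
import random


def position_snapshot(n_assets=50, portfolio_value=1_000_000.0,
                      gross_exposure=1.0, average_price=100.0, seed=None):
    """One date of positions whose absolute weights sum to gross_exposure."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n_assets)]  # signed raw scores
    scale = gross_exposure / sum(abs(x) for x in raw)     # normalize gross exposure
    rows = []
    for i, x in enumerate(raw):
        weight = x * scale
        market_value = weight * portfolio_value
        price = average_price  # hypothetical: flat price for illustration
        rows.append({
            "symbol": f"S{i:03d}",
            "price": price,
            "quantity": market_value / price,  # fractional shares in this sketch
            "market_value": market_value,
            "weight": weight,
        })
    return rows
```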

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class finance_datagen.TransactionsGenerator(*, n_dates: Annotated[int, Gt(gt=0)] = 252, n_assets: Annotated[int, Gt(gt=0)] = 50, trades_per_day: Annotated[int, Gt(gt=0)] = 25, average_price: Annotated[float, Gt(gt=0)] = 100.0, price_vol: Annotated[float, Ge(ge=0)] = 0.25, max_amount: Annotated[int, Gt(gt=0)] = 1000, commission: Annotated[float, Ge(ge=0)] = 1.0, fee_bps: Annotated[float, Ge(ge=0)] = 0.2, bps: Annotated[float, Ge(ge=0)] = 5.0, seed: int | None = None, start: date | None = None, symbols: tuple[str, ...] | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False)[source]

Bases: DataGenerator[DataFrame]

Generate a synthetic transaction log for post-trade tests.

generate() DataFrame[source]

Return transaction rows with side labels and explicit costs.
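This page does not spell out how commission, fee_bps, and bps combine. As a hypothetical worked example, treat commission as a flat per-trade charge and both basis-point parameters as fractions of notional (1 bp = 0.0001): a 1,000-share trade at 100.0 has a notional of 100,000, so fees are 2.0, the bps cost is 50.0, and the total is 53.0. transaction_cost and its keys are illustrative names.

```python
def transaction_cost(price, amount, commission=1.0, fee_bps=0.2, bps=5.0):
    """Hypothetical cost breakdown for one fill; 1 bp = 1 / 10_000 of notional."""
    notional = abs(price * amount)        # sells (negative amount) cost the same
    fees = notional * fee_bps / 10_000    # exchange/regulatory-style fees
    slippage = notional * bps / 10_000    # spread/impact-style cost
    return {
        "notional": notional,
        "commission": commission,
        "fees": fees,
        "slippage": slippage,
        "total": commission + fees + slippage,
    }
```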

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class finance_datagen.OrdersGenerator(*, n_dates: Annotated[int, Gt(gt=0)] = 252, n_assets: Annotated[int, Gt(gt=0)] = 50, orders_per_day: Annotated[int, Gt(gt=0)] = 25, average_price: Annotated[float, Gt(gt=0)] = 100.0, price_vol: Annotated[float, Ge(ge=0)] = 0.2, max_quantity: Annotated[int, Gt(gt=0)] = 1000, seed: int | None = None, start: date | None = None, symbols: tuple[str, ...] | None = None, exchange: str | None = None, currency: str | None = None, include_region: bool = False)[source]

Bases: DataGenerator[DataFrame]

Generate enum-backed synthetic order fixtures.

generate() DataFrame[source]

Return [timestamp, symbol, order_id, side, order_type, quantity, limit_price, order_status, time_in_force].

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class finance_datagen.ExecutionsGenerator(*, n_dates: Annotated[int, Gt(gt=0)] = 252, n_assets: Annotated[int, Gt(gt=0)] = 50, executions_per_day: Annotated[int, Gt(gt=0)] = 30, average_price: Annotated[float, Gt(gt=0)] = 100.0, price_vol: Annotated[float, Ge(ge=0)] = 0.2, max_quantity: Annotated[int, Gt(gt=0)] = 1000, seed: int | None = None, start: date | None = None, symbols: tuple[str, ...] | None = None, exchange: str | None = None, currency: str | None = None, include_region: bool = False)[source]

Bases: DataGenerator[DataFrame]

Generate synthetic execution fixtures tied to synthetic orders.

generate() DataFrame[source]

Return [timestamp, order_id, symbol, side, price, quantity, liquidity_flag].

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class finance_datagen.MultiAssetGBMGenerator(*, n_steps: Annotated[int, Gt(gt=0)] = 252, n_assets: Annotated[int, Gt(gt=0)] = 10, s0: float | tuple[float, ...] = 100.0, mu: float | tuple[float, ...] = 0.05, sigma: float | tuple[float, ...] = 0.2, dt: Annotated[float, Gt(gt=0)] = 0.003968253968253968, rho: Annotated[float, Gt(gt=-1.0), Lt(lt=1.0)] = 0.3, corr: tuple[tuple[float, ...], ...] | None = None, symbols: tuple[str, ...] | None = None, start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None)[source]

Bases: DataGenerator[DataFrame]

Generate correlated multi-asset GBM paths in long form.
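Correlated Brownian increments are typically obtained by multiplying independent normals by a Cholesky factor \(L\) of the correlation matrix, so that \(LL^\top = C\). The sketch below assumes that, when corr is None, rho describes a constant pairwise correlation, and it handles only scalar s0, mu, and sigma; cholesky, equicorrelation, and multi_asset_gbm are hypothetical stdlib-only helpers. A constant-correlation matrix is positive definite only for \(-1/(n-1) < \rho < 1\), so in this sketch strongly negative rho values raise for larger n even though they satisfy the (-1, 1) field constraint.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L @ L.T == a; a must be positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def equicorrelation(n, rho):
    """n x n correlation matrix with rho on every off-diagonal entry."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def multi_asset_gbm(n_steps=252, n_assets=10, s0=100.0, mu=0.05, sigma=0.2,
                    dt=1 / 252, rho=0.3, seed=None):
    """Per-asset price lists driven by correlated normals z = L @ e."""
    rng = random.Random(seed)
    low = cholesky(equicorrelation(n_assets, rho))
    drift = (mu - 0.5 * sigma**2) * dt
    vol = sigma * math.sqrt(dt)
    paths = [[s0] for _ in range(n_assets)]
    for _ in range(n_steps):
        e = [rng.gauss(0.0, 1.0) for _ in range(n_assets)]
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(n_assets)]
        for i in range(n_assets):
            paths[i].append(paths[i][-1] * math.exp(drift + vol * z[i]))
    return paths
```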

generate() DataFrame[source]

Return [timestamp, symbol, price, return] in long form.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class finance_datagen.RegimeSwitchingGenerator(*, s0: Annotated[float, Gt(gt=0)] = 100.0, n_steps: Annotated[int, Gt(gt=0)] = 252, transition_matrix: tuple[tuple[float, ...], ...] = ((0.95, 0.05), (0.1, 0.9)), regime_mu: tuple[float, ...] = (0.0004, -0.0003), regime_sigma: tuple[float, ...] = (0.008, 0.025), symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None)[source]

Bases: DataGenerator[DataFrame]

Generate a single price path with Markov switching return regimes.
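A Markov-switching path draws each return from the current regime's Gaussian, with mean regime_mu[k] and standard deviation regime_sigma[k], then picks the next regime from row k of transition_matrix. For a two-state chain the long-run share of time in regime 0 is \(p_{10} / (p_{01} + p_{10})\), which is 0.1 / 0.15 = 2/3 for the defaults. In the sketch below, the starting regime (0) and log-return compounding are assumptions; regime_switching_path and two_state_stationary are hypothetical helpers.

```python
import math
import random


def regime_switching_path(s0=100.0, n_steps=252,
                          transition_matrix=((0.95, 0.05), (0.1, 0.9)),
                          regime_mu=(0.0004, -0.0003),
                          regime_sigma=(0.008, 0.025), seed=None):
    """Markov-switching Gaussian log returns; returns (prices, returns, regimes)."""
    rng = random.Random(seed)
    state = 0  # hypothetical: start in regime 0
    prices, returns, regimes = [s0], [], []
    for _ in range(n_steps):
        r = rng.gauss(regime_mu[state], regime_sigma[state])
        returns.append(r)
        regimes.append(state)
        prices.append(prices[-1] * math.exp(r))
        # Draw the next regime from row `state` by inverting its cumulative sum.
        row = transition_matrix[state]
        u, cum = rng.random(), 0.0
        for nxt, p in enumerate(row):
            cum += p
            if u < cum:
                state = nxt
                break
        else:
            state = len(row) - 1  # guard against rows summing to slightly < 1
    return prices, returns, regimes


def two_state_stationary(p):
    """Long-run share of time in each regime for a 2x2 transition matrix."""
    p01, p10 = p[0][1], p[1][0]
    pi0 = p10 / (p01 + p10)
    return pi0, 1.0 - pi0
```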

generate() DataFrame[source]

Return [timestamp, symbol, price, return, regime].

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class finance_datagen.MarketImpactCurveGenerator(*, n_assets: Annotated[int, FieldInfo(annotation=NoneType, required=True, metadata=[Gt(gt=0)])] | None = None, symbols: tuple[str, ...] | None = None, participation_rates: tuple[Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Gt(gt=0)])], ...] = (0.01, 0.042222222222222223, 0.07444444444444444, 0.10666666666666666, 0.1388888888888889, 0.1711111111111111, 0.20333333333333334, 0.23555555555555557, 0.2677777777777778, 0.3), average_adv: Annotated[float, Gt(gt=0)] = 1000000.0, average_volatility: Annotated[float, Gt(gt=0)] = 0.02, temporary_impact_coef: Annotated[float, Ge(ge=0)] = 0.5, permanent_impact_coef: Annotated[float, Ge(ge=0)] = 0.1, seed: int | None = None, market_type: str | None = None, venue_type: str | None = None)[source]

Bases: DataGenerator[DataFrame]

Generate Almgren-Chriss-style impact curves by participation rate.
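The default participation_rates are ten evenly spaced points from 0.01 to 0.3, matching numpy.linspace(0.01, 0.3, 10) to floating-point precision. The impact formula itself is not given on this page; the sketch below assumes a linear Almgren-Chriss-style form in which temporary and permanent impact are each a coefficient times average_volatility times the participation rate. That form, the omission of average_adv, the output keys, and the names default_participation_grid and impact_curve are all hypothetical.

```python
def default_participation_grid(lo=0.01, hi=0.3, n=10):
    """n evenly spaced rates from lo to hi inclusive (like numpy.linspace)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def impact_curve(participation_rates, average_volatility=0.02,
                 temporary_impact_coef=0.5, permanent_impact_coef=0.1):
    """Hypothetical linear impact per participation rate, in return units."""
    rows = []
    for x in participation_rates:
        temporary = temporary_impact_coef * average_volatility * x
        permanent = permanent_impact_coef * average_volatility * x
        rows.append({
            "participation_rate": x,
            "temporary_impact": temporary,
            "permanent_impact": permanent,
            "total_impact": temporary + permanent,
        })
    return rows
```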

generate() DataFrame[source]

Return an impact curve for every symbol and participation rate.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class finance_datagen.StatisticalRiskModelGenerator(*, n_dates: Annotated[int, Gt(gt=1)] = 252, n_assets: Annotated[int, Gt(gt=1)] = 50, n_factors: Annotated[int, Gt(gt=0)] = 5, factor_vol: Annotated[float, Gt(gt=0)] = 0.01, idiosyncratic_vol: Annotated[float, Gt(gt=0)] = 0.015, seed: int | None = None, start: date | None = None, symbols: tuple[str, ...] | None = None)[source]

Bases: DataGenerator[dict[str, DataFrame]]

Generate PCA-style statistical factor model components.

generate() dict[str, DataFrame][source]

Return a dict with the keys "factor_loadings", "factor_returns", and "specific_variance", each mapping to a DataFrame.
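The three components combine through the standard factor-model identity \(\Sigma = B F B^\top + D\), where \(B\) holds the asset-by-factor loadings, \(F\) is the covariance of the factor returns, and \(D\) is diagonal with the specific variances. The stdlib sketch below works on plain nested lists because the exact column layout of each DataFrame is not documented here; asset_covariance and sample_cov are hypothetical helpers you would feed after converting the frames.

```python
def sample_cov(series):
    """Sample covariance (n - 1 denominator) of k series, each a list over dates."""
    k, t = len(series), len(series[0])
    means = [sum(s) / t for s in series]
    return [[sum((series[a][i] - means[a]) * (series[b][i] - means[b])
                 for i in range(t)) / (t - 1)
             for b in range(k)]
            for a in range(k)]


def asset_covariance(loadings, factor_cov, specific_var):
    """Sigma = B F B^T + diag(d) for B (assets x factors), F (factors x factors)."""
    n, k = len(loadings), len(factor_cov)
    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            systematic = sum(loadings[i][a] * factor_cov[a][b] * loadings[j][b]
                             for a in range(k) for b in range(k))
            sigma[i][j] = systematic + (specific_var[i] if i == j else 0.0)
    return sigma
```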

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class finance_datagen.FundamentalRiskModelGenerator(*, n_assets: Annotated[int, Gt(gt=1)] = 50, sectors: tuple[str, ...] = ('Energy', 'Materials', 'Industrials', 'ConsumerDiscretionary', 'ConsumerStaples', 'HealthCare', 'Financials', 'InformationTechnology', 'CommunicationServices', 'Utilities', 'RealEstate'), style_factors: tuple[str, ...] = ('value', 'momentum', 'size', 'quality', 'low_vol', 'growth'), seed: int | None = None, symbols: tuple[str, ...] | None = None)[source]

Bases: DataGenerator[DataFrame]

Generate Barra-style sector and style-factor exposure data.

generate() DataFrame[source]

Return wide factor loadings together with a strictly positive specific variance for each asset.
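Barra-style fundamental models usually give each asset a 0/1 dummy for its sector and continuous, roughly standardized exposures to the style factors. The sketch below assumes exactly that layout, with one row per asset; the specific-variance rule, the shortened sector and style tuples, and the name fundamental_exposures are hypothetical.

```python
import random


def fundamental_exposures(n_assets=50, sectors=("Energy", "Financials", "Utilities"),
                          style_factors=("value", "momentum"), seed=None):
    """Rows of one-hot sector dummies, style z-scores, and a positive variance."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_assets):
        sector = rng.choice(sectors)
        row = {"symbol": f"S{i:03d}"}
        # One-hot sector membership: exactly one sector column is 1.0.
        row.update({s: 1.0 if s == sector else 0.0 for s in sectors})
        # Style exposures drawn as standard-normal z-scores.
        row.update({f: rng.gauss(0.0, 1.0) for f in style_factors})
        # Squaring a strictly positive vol keeps the variance strictly positive.
        row["specific_variance"] = (0.2 * (1.0 + abs(rng.gauss(0.0, 0.3)))) ** 2
        rows.append(row)
    return rows
```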

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class finance_datagen.FactorCovarianceGenerator(*, factors: tuple[str, ...] = ('market', 'sector', 'value', 'momentum', 'size', 'quality', 'low_vol', 'growth'), factor_vol: Annotated[float, Gt(gt=0)] = 0.16, eigen_decay: Annotated[float, Gt(gt=0.0), Le(le=1.0)] = 0.75, base_corr: Annotated[float, Gt(gt=-1.0), Lt(lt=1.0)] = 0.25, seed: int | None = None)[source]

Bases: DataGenerator[DataFrame]

Generate a symmetric positive semidefinite factor covariance matrix.
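One way to guarantee a symmetric positive semidefinite result is to fix the eigenvalues and rotate them with an orthogonal matrix: \(\Sigma = \sigma_f^2\, H \Lambda H\) with \(\Lambda = \operatorname{diag}(1, d, d^2, \dots)\) and \(H = I - 2vv^\top / (v^\top v)\) a Householder reflection, which is both symmetric and orthogonal. This sketch assumes that eigen_decay sets a geometric eigenvalue spectrum and factor_vol the overall scale; it ignores base_corr, and factor_covariance here is a hypothetical helper rather than the generator's algorithm.

```python
import random


def factor_covariance(factors, factor_vol=0.16, eigen_decay=0.75, seed=None):
    """Sigma = vol^2 * H diag(decay^k) H with H a random Householder reflection."""
    rng = random.Random(seed)
    n = len(factors)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    vv = sum(x * x for x in v)
    # H = I - 2 v v^T / (v^T v) is symmetric and orthogonal, so H D H is
    # symmetric PSD with the same eigenvalues as the diagonal matrix D.
    h = [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv for j in range(n)]
         for i in range(n)]
    eig = [eigen_decay**k for k in range(n)]  # geometric spectrum, all > 0
    scale = factor_vol**2
    return [[scale * sum(h[i][k] * eig[k] * h[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)]
```

With eigen_decay = 1.0 every eigenvalue is 1, so the result collapses to factor_vol² times the identity.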

generate() DataFrame[source]

Return a wide covariance matrix with a leading factor column.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class finance_datagen.SpecificVarianceGenerator(*, n_assets: Annotated[int, Gt(gt=0)] = 50, target_vol: Annotated[float, Gt(gt=0)] = 0.25, dispersion: Annotated[float, Ge(ge=0)] = 0.35, seed: int | None = None, symbols: tuple[str, ...] | None = None)[source]

Bases: DataGenerator[DataFrame]

Generate a positive idiosyncratic variance vector.
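A lognormal draw is a simple way to keep every variance positive while controlling its spread: \(\sigma_i = \sigma_{\text{target}} \exp(\delta z_i - \delta^2/2)\), whose mean is exactly \(\sigma_{\text{target}}\). Reading target_vol as \(\sigma_{\text{target}}\) and dispersion as \(\delta\) is an assumption, and specific_variance_sketch below is a hypothetical helper.

```python
import math
import random


def specific_variance_sketch(n_assets=50, target_vol=0.25, dispersion=0.35, seed=None):
    """(symbol, variance) pairs from lognormal idiosyncratic vols."""
    rng = random.Random(seed)
    out = []
    for i in range(n_assets):
        # exp(...) > 0, so every variance is strictly positive; the
        # -dispersion**2 / 2 term centres the mean vol on target_vol.
        vol = target_vol * math.exp(dispersion * rng.gauss(0.0, 1.0) - 0.5 * dispersion**2)
        out.append((f"S{i:03d}", vol**2))
    return out
```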

generate() DataFrame[source]

Return [symbol, specific_variance].

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

finance_datagen.generate_positions(n_dates: int = 252, n_assets: int = 50, portfolio_value: float = 1000000.0, gross_exposure: float = 1.0, average_price: float = 100.0, price_vol: float = 0.02, seed: int | None = None, start: date | None = None, symbols: Sequence[str] | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False) DataFrame[source]

Generate a synthetic positions table.

finance_datagen.generate_transactions(n_dates: int = 252, n_assets: int = 50, trades_per_day: int = 25, average_price: float = 100.0, price_vol: float = 0.25, max_amount: int = 1000, commission: float = 1.0, fee_bps: float = 0.2, bps: float = 5.0, seed: int | None = None, start: date | None = None, symbols: Sequence[str] | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False) DataFrame[source]

Generate a synthetic transaction log.

finance_datagen.generate_orders(n_dates: int = 252, n_assets: int = 50, orders_per_day: int = 25, average_price: float = 100.0, price_vol: float = 0.2, max_quantity: int = 1000, seed: int | None = None, start: date | None = None, symbols: Sequence[str] | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False) DataFrame[source]

Generate synthetic order fixtures.

finance_datagen.generate_executions(n_dates: int = 252, n_assets: int = 50, executions_per_day: int = 30, average_price: float = 100.0, price_vol: float = 0.2, max_quantity: int = 1000, seed: int | None = None, start: date | None = None, symbols: Sequence[str] | None = None, currency: str | None = None, exchange: str | None = None, include_region: bool = False) DataFrame[source]

Generate synthetic execution fixtures.

finance_datagen.generate_multi_asset_gbm(n_steps: int = 252, n_assets: int = 10, s0: float | Sequence[float] = 100.0, mu: float | Sequence[float] = 0.05, sigma: float | Sequence[float] = 0.2, dt: float = 0.003968253968253968, rho: float = 0.3, corr: Sequence[Sequence[float]] | None = None, symbols: Sequence[str] | None = None, start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None) DataFrame[source]

Generate correlated multi-asset GBM paths in long form.

finance_datagen.generate_regime_switching(s0: float = 100.0, n_steps: int = 252, transition_matrix: Sequence[Sequence[float]] = ((0.95, 0.05), (0.1, 0.9)), regime_mu: Sequence[float] = (0.0004, -0.0003), regime_sigma: Sequence[float] = (0.008, 0.025), symbol: str = 'SYM', start_ms: int = 0, step_ms: int = 86400000, seed: int | None = None, instrument_type: str | None = None, market_type: str | None = None, venue_type: str | None = None) DataFrame[source]

Generate a single price path with Markov switching return regimes.

finance_datagen.generate_market_impact_curve(n_assets: int | None = None, symbols: Sequence[str] | None = None, participation_rates: Sequence[float] = (0.01, 0.042222222222222223, 0.07444444444444444, 0.10666666666666666, 0.1388888888888889, 0.1711111111111111, 0.20333333333333334, 0.23555555555555557, 0.2677777777777778, 0.3), average_adv: float = 1000000.0, average_volatility: float = 0.02, temporary_impact_coef: float = 0.5, permanent_impact_coef: float = 0.1, seed: int | None = None, market_type: str | None = None, venue_type: str | None = None) DataFrame[source]

Generate market-impact curves by participation rate.

finance_datagen.generate_statistical_risk_model(n_dates: int = 252, n_assets: int = 50, n_factors: int = 5, factor_vol: float = 0.01, idiosyncratic_vol: float = 0.015, seed: int | None = None, start: date | None = None, symbols: Sequence[str] | None = None) dict[str, DataFrame][source]

Generate PCA-style statistical factor model components.

finance_datagen.generate_fundamental_risk_model(n_assets: int = 50, sectors: Sequence[str] = ('Energy', 'Materials', 'Industrials', 'ConsumerDiscretionary', 'ConsumerStaples', 'HealthCare', 'Financials', 'InformationTechnology', 'CommunicationServices', 'Utilities', 'RealEstate'), style_factors: Sequence[str] = ('value', 'momentum', 'size', 'quality', 'low_vol', 'growth'), seed: int | None = None, symbols: Sequence[str] | None = None) DataFrame[source]

Generate Barra-style sector and style-factor exposure data.

finance_datagen.generate_factor_covariance(factors: Sequence[str] = ('market', 'sector', 'value', 'momentum', 'size', 'quality', 'low_vol', 'growth'), factor_vol: float = 0.16, eigen_decay: float = 0.75, base_corr: float = 0.25, seed: int | None = None) DataFrame[source]

Generate a symmetric positive semidefinite factor covariance matrix.

finance_datagen.generate_specific_variance(n_assets: int = 50, target_vol: float = 0.25, dispersion: float = 0.35, seed: int | None = None, symbols: Sequence[str] | None = None) DataFrame[source]

Generate a positive idiosyncratic variance vector.

Post-trade fixtures

from finance_datagen import (
    ExecutionsGenerator,
    OrdersGenerator,
    PositionsGenerator,
    TransactionsGenerator,
    generate_executions,
    generate_orders,
    generate_positions,
    generate_transactions,
)

positions = PositionsGenerator(n_dates=20, n_assets=50, seed=0).generate()
transactions = TransactionsGenerator(n_dates=20, n_assets=50, seed=0).generate()
orders = OrdersGenerator(n_dates=20, n_assets=50, seed=0).generate()
executions = ExecutionsGenerator(n_dates=20, n_assets=50, seed=0).generate()

# Equivalent convenience wrappers are available:
positions = generate_positions(n_dates=20, n_assets=50, seed=0)
transactions = generate_transactions(n_dates=20, n_assets=50, seed=0)
orders = generate_orders(n_dates=20, n_assets=50, seed=0)
executions = generate_executions(n_dates=20, n_assets=50, seed=0)

Risk-model fixtures

from finance_datagen import StatisticalRiskModelGenerator

model = StatisticalRiskModelGenerator(n_dates=252, n_assets=100, n_factors=5, seed=0).generate()
factor_loadings = model["factor_loadings"]
factor_returns = model["factor_returns"]
specific_variance = model["specific_variance"]

Recipes

Composing models

ohlc_from_close is generator-agnostic: feed it any close series, such as the price column of any path generator.

from finance_datagen import HestonGenerator, ohlc_from_close

px = HestonGenerator(seed=42).generate()
bars = ohlc_from_close(px["price"], symbol="HEST", seed=42)

Long horizons

from finance_datagen import GBMGenerator

# 10 years of daily data (2,520 steps, so 2,521 rows)
GBMGenerator(n_steps=252 * 10, seed=0).generate()

Custom timestamp grids

step_ms and start_ms are independent of dt, so you can produce a high-frequency timestamp grid for visual inspection while keeping the SDE on a daily scale:

from finance_datagen import GBMGenerator

GBMGenerator(
    n_steps=1000,
    dt=1/252,                # daily-scale variance
    start_ms=1_700_000_000_000,
    step_ms=60_000,          # 1-minute timestamps
    seed=0,
).generate()
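Timestamps here are plain epoch milliseconds, start_ms + i × step_ms, so the grid above spans 1,000 minutes of wall-clock time while the simulated variance corresponds to 1,000 × (1/252) ≈ 3.97 years. The stdlib sketch below reproduces that arithmetic; timestamp_grid is a hypothetical helper, and the n_steps + 1 grid size is assumed from the GBMGenerator row count.

```python
from datetime import datetime, timezone


def timestamp_grid(n_steps=1000, start_ms=1_700_000_000_000, step_ms=60_000):
    """n_steps + 1 epoch-millisecond timestamps, evenly spaced by step_ms."""
    return [start_ms + i * step_ms for i in range(n_steps + 1)]


grid = timestamp_grid()
first = datetime.fromtimestamp(grid[0] / 1000, tz=timezone.utc)
last = datetime.fromtimestamp(grid[-1] / 1000, tz=timezone.utc)
print(first.isoformat(), last - first)  # 2023-11-14T22:13:20+00:00 16:40:00
```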

Bypassing the polars wrapper

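If you need the raw pyarrow.RecordBatch (e.g. to write Parquet without round-tripping through polars), use the Rust extension module directly. Note that finance_datagen.finance_datagen is internal and not covered by the SemVer guarantees described under Versioning:

import pyarrow as pa
import pyarrow.parquet as pq
from finance_datagen.finance_datagen import GBMGenerator as RustGBM

# record_batch() returns the simulated path as a pyarrow.RecordBatch.
batch = RustGBM(seed=0).record_batch()
pq.write_table(pa.Table.from_batches([batch]), "gbm.parquet")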

Versioning

The current version is exposed at finance_datagen.__version__. The public API is the symbols listed above; private members (_inner, _rb_to_polars, finance_datagen.finance_datagen.*) are not part of the SemVer contract and may change between releases.