maxent_disaggregation.shares#

This module provides functions for sampling from Dirichlet and generalized Dirichlet distributions, as well as hybrid approaches, given specified means (shares) and standard deviations (sds) for the shares. It supports maximum entropy Dirichlet sampling, bias correction, and robust handling of edge cases such as missing or partially specified parameters.

Functions#

  • generalized_dirichlet(n, shares, sds):

    Generate random samples from a Generalised Dirichlet distribution with given shares and standard deviations.

  • dirichlet_max_ent(n, shares, **kwargs):

    Generate samples from a Dirichlet distribution with maximum entropy given input shares.

  • sample_shares(n, shares, sds=None, grad_based=False, threshold_shares=0.1, threshold_sd=0.2, **kwargs):

    This is the main function which handles all the different cases and samples from a distribution of shares based on given means and standard deviations, using the appropriate distribution or a hybrid approach depending on the completeness of the input information.

  • hybrid_dirichlet(shares, size=None, sds=None, max_rel_bias=0.10, max_iter_bias_fix=20, max_iter_beta_sampling=1e3, **kwargs):

    Sample shares in the case of partial mean and sd information using a hybrid Dirichlet distribution with iterative bias correction.

  • sample_dirichlet(shares, size=None, gamma_par=None, threshold_dirichlet=0.01, force_nonzero_samples=True, **kwargs):

    Wrapper to sample from a Dirichlet distribution with given shares and gamma concentration parameter, with pragmatic handling of small shape parameters to avoid numerical issues.

  • check_sample_means_and_sds(sample, shares, sds, threshold_shares=0.1, threshold_sd=0.2, suppress_warnings=False):

    Check if the sample means and standard deviations deviate more than the specified thresholds from the specified shares and standard deviations, raising warnings if so.

  • sample_from_beta(n, shares, sds, fix=True, max_iter=1e3):

    Generate random samples from independent Beta distributions with specified means and standard deviations, ensuring that the sum of samples across columns does not exceed 1 for each row.

  • The module is robust to missing or partially specified input parameters, using uniform priors or hybrid approaches as needed.

  • Warnings are raised if input parameters are inconsistent or if generated samples deviate significantly from specified means or standard deviations.

  • The module is intended for probabilistic modeling of compositional data, such as branching ratios or shares that sum to one.

Module Contents#

Functions#

generalized_dirichlet(n, shares, sds[, seed])

Generate random samples from a Generalised Dirichlet distribution with given shares and standard deviations.

dirichlet_max_ent(n, shares[, seed])

Generate samples from a Dirichlet distribution with maximum entropy.

sample_shares(n, shares[, sds, grad_based, ...])

Samples from a distribution of shares based on given means and standard deviations.

hybrid_dirichlet(shares[, size, sds, max_rel_bias, ...])

Sample shares in the case of partial mean and sd information using a hybrid Dirichlet distribution with iterative bias correction.

sample_dirichlet(shares[, size, gamma_par, ...])

A wrapper function to sample from a Dirichlet distribution with a given set of shares and gamma concentration parameter.

check_sample_means_and_sds(sample, shares, sds[, ...])

Check if the sample means and standard deviations deviate more than the specified thresholds from the specified shares and standard deviations.

sample_from_beta(n, shares, sds[, fix, max_iter, seed])

Generate random samples from independent Beta distributions with specified means (shares) and standard deviations (sds), ensuring that the sum of samples across columns does not exceed 1 for each row.

maxent_disaggregation.shares.generalized_dirichlet(n, shares, sds, seed=None)[source]#

Generate random samples from a Generalised Dirichlet distribution with given shares and standard deviations.

Reference:#

Plessis, Sylvain, Nathalie Carrasco, and Pascal Pernot. “Knowledge-Based Probabilistic Representations of Branching Ratios in Chemical Networks: The Case of Dissociative Recombinations.” The Journal of Chemical Physics 133, no. 13 (October 7, 2010): 134110. https://doi.org/10.1063/1.3479907.

Parameters:#

n (int): Number of samples to generate.

shares (array-like): Best-guess (mean) values for the shares. Must sum to 1.

sds (array-like): Array of standard deviations for the shares.

Returns:#

tuple: A tuple containing:
  • sample (ndarray): An array of shape (n, len(shares)) containing the generated samples.

  • None: Placeholder for compatibility with other functions (always returns None).
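As a rough illustration of the idea, the sketch below draws independent Gamma variates whose means and standard deviations match the requested shares and sds, then normalizes each row so it sums to 1. This moment-matching construction is an assumption made for illustration and is not confirmed to be what the module does internally; the name `generalized_dirichlet_sketch` is hypothetical, and only the standard library is used so the block runs without NumPy.

```python
import random


def generalized_dirichlet_sketch(n, shares, sds, seed=None):
    """Draw n compositions by normalizing independent Gamma variates."""
    rng = random.Random(seed)
    # Assumed moment matching: each Gamma has mean = share and sd = sd,
    # which gives shape = (share / sd)^2 and scale = sd^2 / share.
    shapes = [(m / s) ** 2 for m, s in zip(shares, sds)]
    scales = [s ** 2 / m for m, s in zip(shares, sds)]
    sample = []
    for _ in range(n):
        draws = [rng.gammavariate(k, theta) for k, theta in zip(shapes, scales)]
        total = sum(draws)
        # Normalizing makes every row a valid composition (sums to 1).
        sample.append([d / total for d in draws])
    return sample


shares = [0.5, 0.3, 0.2]
sds = [0.05, 0.03, 0.02]
sample = generalized_dirichlet_sketch(5000, shares, sds, seed=1)
col_means = [sum(row[j] for row in sample) / len(sample) for j in range(len(shares))]
```

Normalization shifts the realized means and standard deviations slightly away from the inputs, which is one reason a check of the sample moments against the specification is useful.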

maxent_disaggregation.shares.dirichlet_max_ent(n: int, shares: numpy.ndarray | list, seed=None, **kwargs)[source]#

Generate samples from a Dirichlet distribution with maximum entropy. This function computes the gamma parameter that maximizes the entropy of the Dirichlet distribution given the input shares. It then generates n samples from the resulting Dirichlet distribution.

Parameters:
  • n (int) – The number of samples to generate.

  • shares (numpy.ndarray | list) – The input shares (probabilities) that define the Dirichlet distribution.

  • **kwargs – Additional keyword arguments passed to the find_gamma_maxent function.

Returns:

A tuple containing:
  • sample (ndarray): An array of shape (n, len(shares)) containing the generated samples.

  • gamma_par (float): The computed gamma parameter that maximizes the entropy of the Dirichlet distribution.

Return type:

tuple
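The following is a minimal sketch of what "maximum entropy given the shares" can mean, assuming the Dirichlet parameters are alpha_i = gamma * share_i and that gamma is chosen to maximize the differential entropy of the distribution. The search strategy (a log-spaced grid followed by a local ternary search), the bounds, and all function names are illustrative assumptions; the module's own find_gamma_maxent may work differently.

```python
import math
import random


def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def dirichlet_entropy(alpha):
    """Differential entropy of a Dirichlet distribution with parameters alpha."""
    a0 = sum(alpha)
    log_beta = sum(math.lgamma(a) for a in alpha) - math.lgamma(a0)
    return (log_beta + (a0 - len(alpha)) * digamma(a0)
            - sum((a - 1.0) * digamma(a) for a in alpha))


def find_gamma_sketch(shares, lo=1e-2, hi=1e4, grid=400, refine=60):
    """Search gamma in [lo, hi] maximizing the entropy of Dirichlet(gamma * shares)."""
    def entropy_at(log_gamma):
        gamma = math.exp(log_gamma)
        return dirichlet_entropy([gamma * s for s in shares])

    # Coarse pass on a log-spaced grid.
    step = (math.log(hi) - math.log(lo)) / (grid - 1)
    logs = [math.log(lo) + i * step for i in range(grid)]
    best = max(range(grid), key=lambda i: entropy_at(logs[i]))
    a, b = logs[max(best - 1, 0)], logs[min(best + 1, grid - 1)]
    # Local ternary search between the neighbours of the best grid point.
    for _ in range(refine):
        m1, m2 = a + (b - a) / 3.0, b - (b - a) / 3.0
        if entropy_at(m1) < entropy_at(m2):
            a = m1
        else:
            b = m2
    return math.exp(0.5 * (a + b))


def dirichlet_sample(n, alpha, seed=None):
    """Sample Dirichlet(alpha) rows by normalizing Gamma(alpha_i, 1) draws."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(draws)
        rows.append([d / total for d in draws])
    return rows


shares = [0.5, 0.3, 0.2]
gamma_par = find_gamma_sketch(shares)
sample = dirichlet_sample(1000, [gamma_par * s for s in shares], seed=0)
```

A useful sanity check is the symmetric case: for equal shares the entropy is largest for the flat Dirichlet (all alpha_i = 1), so the search should return gamma close to the number of shares.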

maxent_disaggregation.shares.sample_shares(n: int, shares: numpy.ndarray | list, sds: numpy.ndarray | list = None, grad_based: bool = False, threshold_shares: float = 0.1, threshold_sd: float = 0.2, suppress_warnings: bool = False, seed: int = None, **kwargs)[source]#

Samples from a distribution of shares based on given means and standard deviations.

This function generates samples of shares using either a generalized Dirichlet distribution, a maximum entropy Dirichlet distribution, or a combination of both, depending on the availability of mean and standard deviation inputs.

Parameters:#

n : int

Number of samples to generate.

shares : np.ndarray | list

Array or list of mean values for the shares. These should sum to 1 if fully specified.

sds : np.ndarray | list, optional

Array or list of standard deviations for the shares. If not provided, defaults to NaN.

grad_based : bool, optional

Whether to use gradient-based optimization for maximum entropy Dirichlet sampling. Default is False.

threshold_shares : float, optional

Threshold for the relative difference between the sample mean and the specified shares. If the difference exceeds this threshold, a warning is raised. Default is 0.1 (10%).

threshold_sd : float, optional

Threshold for the relative difference between the sample standard deviation and the specified sds. If the difference exceeds this threshold, a warning is raised. Default is 0.2 (20%).

suppress_warnings : bool, optional

If True, suppress warnings about sample means and standard deviations deviating from the specified values. Default is False.

seed : int, optional

Random seed for reproducibility. Default is None.

**kwargs : dict

Additional keyword arguments passed to the underlying sampling functions.

Returns:#

sample : np.ndarray

A 2D array of shape (n, K), where K is the number of shares, containing the sampled values.

gamma_par : np.ndarray

Parameters of the Dirichlet or generalized Dirichlet distribution used for sampling.

Notes:#

  • If both means and standard deviations are provided for all shares, the generalized Dirichlet distribution is used.

  • If only means are provided, the maximum entropy Dirichlet distribution is used.

  • If no means are provided, a uniform Dirichlet distribution is used.

  • If a mix of known and unknown means/standard deviations is provided, a hierarchical approach is used to sample the shares (function called hybrid_dirichlet).

  • The function raises warnings if standard deviations are provided without corresponding mean values, as this is not recommended.
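The case distinction in the notes above can be sketched as a small dispatcher. This is only an illustration of the decision logic as described: missing values are assumed to be encoded as NaN, the function name `choose_strategy` and the label "uniform_dirichlet" are hypothetical, and edge cases such as standard deviations supplied without means are not modeled.

```python
import math


def choose_strategy(shares, sds=None):
    """Return a label for the sampling approach implied by the available inputs."""
    k = len(shares)
    sds = [math.nan] * k if sds is None else sds
    mean_known = [not math.isnan(m) for m in shares]
    sd_known = [not math.isnan(s) for s in sds]

    if not any(mean_known):
        # No means at all: uniform Dirichlet (label is hypothetical).
        return "uniform_dirichlet"
    if all(mean_known) and all(sd_known):
        # Means and sds for every share.
        return "generalized_dirichlet"
    if all(mean_known) and not any(sd_known):
        # Means only.
        return "dirichlet_max_ent"
    # Any mix of known and unknown means or sds.
    return "hybrid_dirichlet"


full = choose_strategy([0.5, 0.3, 0.2], [0.05, 0.03, 0.02])
means_only = choose_strategy([0.5, 0.3, 0.2])
nothing = choose_strategy([math.nan, math.nan, math.nan])
mixed = choose_strategy([0.5, math.nan, math.nan], [0.05, math.nan, math.nan])
```

With the module installed, the corresponding call would presumably look like `sample, gamma_par = sample_shares(1000, [0.5, 0.3, 0.2])`, based on the signature documented above.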

maxent_disaggregation.shares.hybrid_dirichlet(shares, size=None, sds=None, max_rel_bias=0.1, max_iter_bias_fix=20, max_iter_beta_sampling=1000.0, seed=None, **kwargs)[source]#

Function to sample in the case of partial mean and sd information using a hybrid Dirichlet distribution with iterative bias correction. Samples are generated from a combination of beta distributions for shares with both mean and sd, and a maximum-entropy Dirichlet distribution for shares with only mean values. This function iteratively adjusts the standard deviations of shares that exceed a specified relative bias threshold, ensuring that the final samples meet the desired accuracy in terms of relative bias.

Parameters:#

shares : array-like

Array of mean values for the shares. These should sum to 1 if fully specified.

size : int

Number of samples to generate.

sds : array-like, optional

Array of standard deviations for the shares. If not provided, defaults to NaN.

max_rel_bias : float, optional

Maximum relative bias allowed for the generated samples. Default is 0.10 (10%).

max_iter_bias_fix : int, optional

Maximum number of iterations for bias correction. Default is 20.

max_iter_beta_sampling : int, optional

Maximum number of iterations for beta sampling. Default is 1e3.

**kwargs : dict

Additional keyword arguments passed to the underlying sampling functions.

Returns:#

sample : np.ndarray

A 2D array of shape (size, len(shares)) containing the sampled values.
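To make the combination of Beta and Dirichlet parts concrete, here is a simplified sketch under several stated assumptions: shares with a known sd are drawn from moment-matched Beta distributions, the remaining probability mass is split among the mean-only shares by a Dirichlet draw, and `gamma_par` is passed in by hand rather than obtained from a maximum-entropy search. How the module actually couples the two parts, and its iterative bias correction, are not reproduced here; the sketch only computes the relative bias that such a correction would monitor. All names are hypothetical.

```python
import math
import random


def beta_params(mean, sd):
    """Moment-matched Beta parameters (a, b) for a given mean and sd."""
    nu = mean * (1.0 - mean) / sd ** 2 - 1.0
    return mean * nu, (1.0 - mean) * nu


def hybrid_sketch(shares, sds, size, gamma_par, seed=None):
    rng = random.Random(seed)
    with_sd = [j for j, s in enumerate(sds) if not math.isnan(s)]
    mean_only = [j for j, s in enumerate(sds) if math.isnan(s)]
    rest = sum(shares[j] for j in mean_only)
    # Dirichlet parameters for the mean-only block, rescaled to sum to gamma_par.
    alphas = [gamma_par * shares[j] / rest for j in mean_only]
    rows = []
    while len(rows) < size:
        row = [0.0] * len(shares)
        for j in with_sd:
            a, b = beta_params(shares[j], sds[j])
            row[j] = rng.betavariate(a, b)
        left = 1.0 - sum(row)
        if left < 0.0:
            continue  # Beta draws used more than the whole; try again.
        draws = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(draws)
        for j, d in zip(mean_only, draws):
            row[j] = left * d / total  # split the leftover mass
        rows.append(row)
    return rows


shares = [0.6, 0.25, 0.15]
sds = [0.05, math.nan, math.nan]
sample = hybrid_sketch(shares, sds, size=4000, gamma_par=2.0, seed=3)
means = [sum(r[j] for r in sample) / len(sample) for j in range(len(shares))]
rel_bias = [abs(m - s) / s for m, s in zip(means, shares)]
```

Comparing each entry of `rel_bias` with `max_rel_bias` identifies the shares that a bias-correction loop would need to revisit.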

maxent_disaggregation.shares.sample_dirichlet(shares, size=None, gamma_par=None, threshold_dirichlet=0.01, force_nonzero_samples=True, seed=None, **kwargs)[source]#

A wrapper function to sample from a Dirichlet distribution with a given set of shares and gamma concentration parameter.

It differs from the default Dirichlet distribution in how it handles small shape parameters. For each variable i whose shape parameter (alpha_i = gamma_par * share_i) is below a threshold, a fallback parametrization of the Gamma distribution (which is used for sampling from the Dirichlet distribution) is applied to avoid zero or near-zero samples. This is especially useful for very small shape parameters, which can cause numerical issues in the Dirichlet sampling. The following pragmatic workaround is used:

  • alpha_i = 1 (shape) for shares below threshold

  • rate = 1 / alpha_i ensuring less extreme values.

For more details, see the discussion of small shape values in the R documentation for rgamma() and the references there (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/GammaDist.html). This approach helps mitigate issues where numeric precision can push small Gamma-distributed values to zero. Note, however, that this fix changes the expectation values (means) of the sampled parameters, so they can deviate from the input shares. If this is undesired, set force_nonzero_samples=False.

Parameters:#

size : int

The number of samples to generate.

shares : array-like

The input shares (probabilities) that define the Dirichlet distribution.

gamma_par : float

The gamma parameter that scales the shares for the Dirichlet distribution.

threshold_dirichlet : float

The threshold below which the shares are adjusted to avoid zero sampling. Default is 0.01.

force_nonzero_samples : bool

If True, forces non-zero samples by adjusting the alphas and the rate/scale of the Gamma distribution. This may bias the sample means. If False, uses the original scipy implementation of the Dirichlet distribution. With very small alphas this may produce a large number of zeros in the samples due to numerical issues, although the means remain unbiased. Default is True.

Methods:#

sample():

Generates samples from the Dirichlet distribution.
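The small-shape workaround can be sketched with Gamma draws that are normalized row by row. The reading assumed here is that, for a component whose alpha_i falls below the threshold, the Gamma shape is set to 1 and the rate to 1 / alpha_i (using the original alpha_i), so the Gamma mean stays at alpha_i while the distribution becomes less extreme. That interpretation, and the name `sample_dirichlet_sketch`, are assumptions for illustration only.

```python
import random


def sample_dirichlet_sketch(shares, size, gamma_par, threshold_dirichlet=0.01,
                            force_nonzero_samples=True, seed=None):
    rng = random.Random(seed)
    alphas = [gamma_par * s for s in shares]
    rows = []
    for _ in range(size):
        draws = []
        for alpha in alphas:
            if force_nonzero_samples and alpha < threshold_dirichlet:
                # Fallback (assumed reading): shape 1 and rate 1 / alpha,
                # i.e. scale alpha, so the Gamma mean remains alpha.
                draws.append(rng.gammavariate(1.0, alpha))
            else:
                # Standard construction: Gamma(shape=alpha, scale=1).
                draws.append(rng.gammavariate(alpha, 1.0))
        total = sum(draws)
        rows.append([d / total for d in draws])
    return rows


shares = [0.6, 0.3995, 0.0005]  # third alpha is 0.005, below the threshold
fixed = sample_dirichlet_sketch(shares, size=2000, gamma_par=10.0, seed=7)
smallest = min(row[2] for row in fixed)
```

Because the fallback changes the shape of the Gamma draw, the resulting sample means for the affected components need not match the input shares exactly, which is the trade-off described above.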

maxent_disaggregation.shares.check_sample_means_and_sds(sample, shares, sds, threshold_shares=0.1, threshold_sd=0.2, suppress_warnings=False)[source]#

Check if the sample means and standard deviations deviate more than the specified thresholds from the specified shares and standard deviations. If they do, a warning is raised.

Parameters:#

sample : np.ndarray

The generated samples from the Dirichlet distribution.

shares : np.ndarray

The specified shares (mean values) for the Dirichlet distribution.

sds : np.ndarray

The specified standard deviations for the shares.

threshold_shares : float, optional

The threshold for the relative difference between the sample mean and the specified shares. If the difference exceeds this threshold, a warning is raised. Default is 0.1 (10%).

threshold_sd : float, optional

The threshold for the relative difference between the sample standard deviation and the specified sds. If the difference exceeds this threshold, a warning is raised. Default is 0.2 (20%).

suppress_warnings : bool, optional

If True, suppress all warnings from this function. Default is False.
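A check of this kind can be sketched as follows, assuming the "relative difference" is |sample statistic - specified value| / specified value and that every share and sd is specified (no NaN handling). The sample standard deviation with n - 1 in the denominator is used here; which estimator the module uses is not stated. The names `find_deviations` and `check_sketch` are hypothetical.

```python
import statistics
import warnings


def find_deviations(sample, shares, sds, threshold_shares=0.1, threshold_sd=0.2):
    """Return one message per share whose sample mean or sd is off by too much."""
    issues = []
    for j, (share, sd) in enumerate(zip(shares, sds)):
        column = [row[j] for row in sample]
        rel_mean = abs(statistics.fmean(column) - share) / share
        rel_sd = abs(statistics.stdev(column) - sd) / sd
        if rel_mean > threshold_shares:
            issues.append(f"share {j}: sample mean off by {rel_mean:.0%}")
        if rel_sd > threshold_sd:
            issues.append(f"share {j}: sample sd off by {rel_sd:.0%}")
    return issues


def check_sketch(sample, shares, sds, suppress_warnings=False, **thresholds):
    """Emit a warning for each deviation unless warnings are suppressed."""
    for message in find_deviations(sample, shares, sds, **thresholds):
        if not suppress_warnings:
            warnings.warn(message)


sample = [[0.4, 0.6], [0.6, 0.4], [0.5, 0.5]]
check_sketch(sample, [0.5, 0.5], [0.1, 0.1])  # consistent inputs: no warning
ok = find_deviations(sample, [0.5, 0.5], [0.1, 0.1])
bad = find_deviations(sample, [0.3, 0.7], [0.1, 0.1])
```

Separating the detection step from the warning step keeps the logic easy to test and makes `suppress_warnings` a thin switch.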

maxent_disaggregation.shares.sample_from_beta(n, shares, sds, fix=True, max_iter=1000.0, seed=None)[source]#

Generate random samples from independent Beta distributions with specified means (shares) and standard deviations (sds), ensuring that the sum of samples across columns does not exceed 1 for each row.

Parameters:
  • n (int) – Number of samples to generate.

  • shares (array-like) – Array of mean values (between 0 and 1) for each Beta distribution.

  • sds (array-like) – Array of standard deviations for each Beta distribution.

  • fix (bool, optional (default=True)) – If True, automatically adjust invalid variance values to the maximum allowed for the given mean. If False, raise a ValueError when invalid parameter combinations are detected.

  • max_iter (int, optional (default=1e3)) – Maximum number of iterations to attempt resampling rows where the sum exceeds 1.

Returns:

x – An (n, k) array of samples, where k is the length of shares, such that each row sums to less than or equal to 1.

Return type:

ndarray

Raises:

ValueError – If the provided standard deviation is too large for the given mean (unless fix=True), or if a valid sample cannot be generated within max_iter iterations.

Notes

The function ensures that for each sample (row), the sum across all Beta-distributed variables does not exceed 1 by resampling as needed.
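The resampling idea can be sketched with moment-matched Beta parameters and simple rejection of rows whose sum exceeds 1. The moment matching (a = mean * nu, b = (1 - mean) * nu with nu = mean * (1 - mean) / sd^2 - 1) is the standard one for a Beta distribution; the row-wise retry loop is an assumption about how the resampling might be organized, and the `fix=True` variance adjustment is not modeled. The function names are hypothetical.

```python
import random


def beta_params(mean, sd):
    """Moment-matched Beta parameters; raises if sd is too large for the mean."""
    nu = mean * (1.0 - mean) / sd ** 2 - 1.0
    if nu <= 0.0:
        raise ValueError(f"sd={sd} is too large for mean={mean}")
    return mean * nu, (1.0 - mean) * nu


def sample_from_beta_sketch(n, shares, sds, max_iter=1000, seed=None):
    rng = random.Random(seed)
    params = [beta_params(m, s) for m, s in zip(shares, sds)]
    rows = []
    for _ in range(n):
        for _ in range(int(max_iter)):
            row = [rng.betavariate(a, b) for a, b in params]
            if sum(row) <= 1.0:
                rows.append(row)
                break
        else:
            # No valid row was found within max_iter attempts.
            raise ValueError("could not draw a row with sum <= 1")
    return rows


rows = sample_from_beta_sketch(500, [0.3, 0.2], [0.05, 0.04], seed=0)
```

Rejecting rows conditions the draws on the sum constraint, so when the specified means add up to nearly 1 the accepted samples can be shifted noticeably relative to the inputs.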