:py:mod:`maxent_disaggregation.aggregate`
=========================================

.. py:module:: maxent_disaggregation.aggregate


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   maxent_disaggregation.aggregate.sample_aggregate
   maxent_disaggregation.aggregate.sample_truncnorm
   maxent_disaggregation.aggregate.estimate_truncnormparams
   maxent_disaggregation.aggregate.check_sample_vs_input


.. py:function:: sample_aggregate(n: int, mean: float = None, sd: float = None, low_bound: float = 0, high_bound: float = np.inf, log: bool = True, suppress_warnings: bool = False, seed: int = None) -> numpy.ndarray

   Generate random samples of the aggregate value. The sampling distribution is chosen internally from the information the user provides.

   :param n: The number of samples to generate.
   :type n: int
   :param mean: The best guess of the aggregate value.
   :param sd: The standard deviation of the aggregate value.
   :param low_bound: The lower boundary of the aggregate value.
   :param high_bound: The upper boundary of the aggregate value.
   :param log: If True, the lognormal distribution is used for the aggregate value when a mean and a standard deviation are provided. If False, samples are drawn from a truncated normal distribution, which is the maximum entropy solution but produces a biased mean. Default is True.
   :param suppress_warnings: If True, suppress warnings about sample statistics deviating from input values. Default is False.
   :type suppress_warnings: bool, optional
   :param seed: Random seed for reproducibility. Default is None.
   :type seed: int, optional


.. py:function:: sample_truncnorm(obs_mean, obs_std, a=None, b=None, size=1000, seed=None)

   Draw random samples from a truncated normal distribution given observed mean, standard deviation, and bounds.

   :param obs_mean: Observed mean used to infer the underlying normal distribution's location.
   :type obs_mean: float
   :param obs_std: Observed standard deviation used to infer the underlying normal distribution's scale.
   :type obs_std: float
   :param a: Lower truncation bound (in the same units as ``obs_mean`` and ``obs_std``). If None, defaults to 0.
   :type a: float
   :param b: Upper truncation bound (in the same units as ``obs_mean`` and ``obs_std``). If None, defaults to infinity.
   :type b: float
   :param size: Number of random samples to draw. Default is 1000.
   :type size: int, optional
   :param seed: Random seed for reproducibility. Default is None.

   :returns: 1-D array of random variates drawn from the truncated normal distribution.
   :rtype: numpy.ndarray

   .. rubric:: Notes

   This function relies on ``estimate_truncnormparams(obs_mean, obs_std, a, b)`` to compute the parameters ``(mu, sigma, alpha, beta)`` passed to ``scipy.stats.truncnorm.rvs``. Here ``mu`` and ``sigma`` are the location and scale of the underlying normal distribution, and ``alpha`` and ``beta`` are the standardized truncation limits accepted by ``scipy.stats.truncnorm``.

   .. rubric:: Examples

   >>> samples = sample_truncnorm(10.0, 2.0, 5.0, 15.0, size=500)
   >>> samples.shape
   (500,)


.. py:function:: estimate_truncnormparams(obs_mean, obs_std, a, b, mu_init=None, sigma_init=None, mean_weight=10)

   Estimate the Gaussian parameters of a truncated normal distribution given observed statistics.

   This function finds the parameters (mu, sigma) of a truncated normal distribution that best match the observed mean and standard deviation, given truncation bounds.

   :param obs_mean: The observed mean of the truncated distribution.
   :type obs_mean: float
   :param obs_std: The observed standard deviation of the truncated distribution.
   :type obs_std: float
   :param a: The lower truncation bound.
   :type a: float
   :param b: The upper truncation bound.
   :type b: float
   :param mu_init: Initial guess for the location parameter (mu). Defaults to obs_mean.
   :type mu_init: float, optional
   :param sigma_init: Initial guess for the scale parameter (sigma). Defaults to obs_std.
   :type sigma_init: float, optional
   :param mean_weight: Weighting factor for the mean relative to the standard deviation in the optimization objective. Higher values prioritize matching the mean more closely. Default is 10. Adjust as needed.
   :type mean_weight: float, optional

   :returns: * **mu_opt** (*float*) -- Optimal location parameter of the underlying normal distribution.
             * **sigma_opt** (*float*) -- Optimal scale parameter of the underlying normal distribution.
             * **alpha_opt** (*float*) -- Standardized lower truncation bound: (a - mu_opt) / sigma_opt.
             * **beta_opt** (*float*) -- Standardized upper truncation bound: (b - mu_opt) / sigma_opt.

   .. rubric:: Notes

   The function uses least-squares optimization to minimize the difference between the theoretical and observed moments of the truncated normal distribution. The scale parameter (sigma) is optimized on a log scale to ensure positivity without boundary issues.

   .. rubric:: Examples

   >>> mu, sigma, alpha, beta = estimate_truncnormparams(5.0, 1.5, 0, 10)
   >>> print(f"Estimated mu: {mu:.2f}, sigma: {sigma:.2f}")


.. py:function:: check_sample_vs_input(mean, sd, low_bound, high_bound, samples, threshold_shares=0.05, threshold_sd=0.2, suppress_warnings=False)

   Check whether the sample mean and standard deviation are close to the input values. Raise a warning if the mean or the standard deviation deviates beyond its threshold. Raise a ValueError if samples fall outside the specified bounds.

   :param mean: The input mean value.
   :type mean: float
   :param sd: The input standard deviation value.
   :type sd: float
   :param low_bound: The lower bound used in sampling.
   :type low_bound: float
   :param high_bound: The upper bound used in sampling.
   :type high_bound: float
   :param samples: The array of sampled values.
   :type samples: numpy.ndarray
   :param threshold_shares: The relative tolerance for mean comparison. Default is 0.05 (5%).
   :type threshold_shares: float, optional
   :param threshold_sd: The relative tolerance for standard deviation comparison. Default is 0.2 (20%).
   :type threshold_sd: float, optional
   :param suppress_warnings: If True, suppress warnings about sample statistics deviating from input values. Default is False.
   :type suppress_warnings: bool, optional

   :returns: None. A warning is issued if the sample mean or standard deviation deviates from the input value by more than its threshold, and a ValueError is raised if samples fall outside the specified bounds.
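
   .. rubric:: Examples

   The checks described above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the helper name ``check_sample_sketch``, its ``threshold_mean`` argument, the formula ``abs(sample - input) / abs(input)`` for the relative deviation, and the use of the ``n - 1`` sample standard deviation are all assumptions made for this example.

   .. code-block:: python

      import random
      import statistics
      import warnings


      def check_sample_sketch(mean, sd, low_bound, high_bound, samples,
                              threshold_mean=0.05, threshold_sd=0.2):
          """Hypothetical stand-in for the documented checks."""
          # Hard failure: at least one sample lies outside the bounds.
          if min(samples) < low_bound or max(samples) > high_bound:
              raise ValueError("samples fall outside the specified bounds")

          sample_mean = statistics.fmean(samples)
          sample_sd = statistics.stdev(samples)  # assumption: n - 1 estimator

          messages = []
          # Assumed definition: deviation relative to the input value.
          if abs(sample_mean - mean) > threshold_mean * abs(mean):
              messages.append(f"sample mean {sample_mean:.3f} deviates from {mean}")
          if abs(sample_sd - sd) > threshold_sd * abs(sd):
              messages.append(f"sample sd {sample_sd:.3f} deviates from {sd}")

          # Soft failure: report each deviation as a warning.
          for message in messages:
              warnings.warn(message)
          return messages


      # Rejection-sample a normal(10, 2) restricted to [5, 15] as test data.
      rng = random.Random(0)
      samples = []
      while len(samples) < 5000:
          x = rng.gauss(10.0, 2.0)
          if 5.0 <= x <= 15.0:
              samples.append(x)

      problems = check_sample_sketch(10.0, 2.0, 5.0, 15.0, samples)
      print(problems)

   Unlike the documented function, the sketch returns the list of messages so that the outcome is easy to inspect. Bounds violations are treated as errors because they indicate a faulty sampler, whereas moment deviations can be legitimate when truncation is strong.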