maxent_disaggregation.aggregate
Module Contents
Functions
sample_aggregate | Generate random aggregate values based on the information provided.
sample_truncnorm | Draw random samples from a truncated normal distribution.
estimate_truncnormparams | Estimate the Gaussian parameters of a truncated normal distribution given observed statistics.
check_sample_vs_input | Check if the sample mean and standard deviation are close to the input values.
- maxent_disaggregation.aggregate.sample_aggregate(n: int, mean: float = None, sd: float = None, low_bound: float = 0, high_bound: float = np.inf, log: bool = True, suppress_warnings: bool = False, seed: int = None) → numpy.ndarray
Generate random aggregate values based on the information provided. The distribution to sample from is determined internally from the arguments the user supplies.
- Parameters:
n (int) – The number of samples to generate.
mean (float, optional) – The best guess of the aggregate value. Default is None.
sd (float, optional) – The standard deviation of the aggregate value. Default is None.
low_bound (float, optional) – The lower boundary of the aggregate value. Default is 0.
high_bound (float, optional) – The upper boundary of the aggregate value. Default is np.inf.
log (bool, optional) – If True, the lognormal distribution is used for the aggregate value when both a mean and a standard deviation are provided. If False, samples are drawn from a truncated normal distribution, which is the maximum entropy solution but produces a biased mean. Default is True.
suppress_warnings (bool, optional) – If True, suppress warnings about sample statistics deviating from input values. Default is False.
seed (int, optional) – Random seed for reproducibility. Default is None.
- Returns:
Array of sampled aggregate values.
- Return type:
numpy.ndarray
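To make the log=True case concrete, the sketch below shows one common way to turn a mean and a standard deviation into lognormal parameters by moment matching. It uses only the standard library, the helper name lognormal_params is hypothetical, and whether sample_aggregate parameterizes its lognormal in exactly this way is an assumption, not something this page states.

```python
import math
import random
import statistics


def lognormal_params(mean: float, sd: float) -> tuple[float, float]:
    """Return (mu, sigma) of ln(X) such that X has the given mean and sd.

    Hypothetical helper for illustration; not part of maxent_disaggregation.
    """
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    sigma2 = math.log(1.0 + (sd / mean) ** 2)  # variance of ln(X)
    mu = math.log(mean) - 0.5 * sigma2         # mean of ln(X)
    return mu, math.sqrt(sigma2)


rng = random.Random(42)  # fixed seed for reproducibility
mu, sigma = lognormal_params(mean=10.0, sd=2.0)
samples = [rng.lognormvariate(mu, sigma) for _ in range(20_000)]

sample_mean = statistics.fmean(samples)
sample_sd = statistics.stdev(samples)
print(f"sample mean: {sample_mean:.3f}, sample sd: {sample_sd:.3f}")
```

Because the lognormal has support on the positive reals only, matching its moments in this way reproduces the requested mean without any truncation at zero.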
- maxent_disaggregation.aggregate.sample_truncnorm(obs_mean, obs_std, a=None, b=None, size=1000, seed=None)
Draw random samples from a truncated normal distribution given observed mean, standard deviation, and bounds.
- Parameters:
obs_mean (float) – Observed mean used to infer the underlying normal distribution’s location.
obs_std (float) – Observed standard deviation used to infer the underlying normal distribution’s scale.
a (float) – Lower truncation bound (expressed in the same units as obs_mean/obs_std). If None, defaults to 0.
b (float) – Upper truncation bound (expressed in the same units as obs_mean/obs_std). If None, defaults to infinity.
size (int, optional) – Number of random samples to draw. Default is 1000.
seed (int, optional) – Random seed for reproducibility. Default is None.
- Returns:
1-D array of random variates drawn from the truncated normal distribution.
- Return type:
numpy.ndarray
Notes
This function relies on estimate_truncnormparams(obs_mean, obs_std, a, b) to compute parameters (mu, sigma, alpha, beta) suitable for scipy.stats.truncnorm.rvs, where mu and sigma are the location and scale of the underlying normal distribution and alpha, beta are the standardized truncation limits accepted by scipy.stats.truncnorm.
Examples
>>> samples = sample_truncnorm(10.0, 2.0, 5.0, 15.0, size=500)
>>> samples.shape
(500,)
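The relationship between the bounds a, b and the standardized limits alpha, beta mentioned in the Notes can be illustrated with a standard-library sketch of inverse-CDF sampling. The function truncnorm_sketch is hypothetical and takes the location and scale of the underlying normal directly; it skips the parameter-estimation step and does not claim to reproduce how sample_truncnorm calls scipy.stats.truncnorm.

```python
import random
from statistics import NormalDist


def truncnorm_sketch(mu, sigma, a=0.0, b=float("inf"), size=1000, seed=None):
    """Inverse-CDF sampling from Normal(mu, sigma) truncated to [a, b].

    Hypothetical illustration; mu and sigma describe the underlying normal,
    not the observed mean and standard deviation.
    """
    rng = random.Random(seed)
    std = NormalDist()            # standard normal distribution
    alpha = (a - mu) / sigma      # standardized lower limit
    beta = (b - mu) / sigma       # standardized upper limit
    p_lo, p_hi = std.cdf(alpha), std.cdf(beta)
    eps = 1e-12
    out = []
    for _ in range(size):
        u = p_lo + (p_hi - p_lo) * rng.random()
        u = min(max(u, eps), 1.0 - eps)   # inv_cdf needs 0 < u < 1
        x = mu + sigma * std.inv_cdf(u)
        out.append(min(max(x, a), b))     # guard against rounding at the limits
    return out


samples = truncnorm_sketch(10.0, 2.0, a=5.0, b=15.0, size=500, seed=0)
print(len(samples), min(samples), max(samples))
```

With bounds placed symmetrically around mu, as here, the truncated mean stays at mu; with asymmetric bounds it shifts, which is why the library first estimates the underlying parameters from the observed statistics.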
- maxent_disaggregation.aggregate.estimate_truncnormparams(obs_mean, obs_std, a, b, mu_init=None, sigma_init=None, mean_weight=10)
Estimate the Gaussian parameters of a truncated normal distribution given observed statistics. This function finds the parameters (mu, sigma) of a truncated normal distribution that best match the observed mean and standard deviation, given truncation bounds.
- Parameters:
obs_mean (float) – The observed mean of the truncated distribution.
obs_std (float) – The observed standard deviation of the truncated distribution.
a (float) – The lower truncation bound.
b (float) – The upper truncation bound.
mu_init (float, optional) – Initial guess for the location parameter (mu). Defaults to obs_mean.
sigma_init (float, optional) – Initial guess for the scale parameter (sigma). Defaults to obs_std.
mean_weight (float, optional) – Weighting factor for the mean relative to the standard deviation in the optimization objective. Higher values prioritize matching the mean more closely. Default is 10. Adjust as needed.
- Returns:
mu_opt (float) – Optimal location parameter of the underlying normal distribution.
sigma_opt (float) – Optimal scale parameter of the underlying normal distribution.
alpha_opt (float) – Standardized lower truncation bound: (a - mu_opt) / sigma_opt.
beta_opt (float) – Standardized upper truncation bound: (b - mu_opt) / sigma_opt.
Notes
The function uses least-squares optimization to minimize the difference between the theoretical and observed moments of the truncated normal distribution. The scale parameter (sigma) is optimized on a log scale, which keeps it positive without requiring explicit bound constraints.
Examples
>>> mu, sigma, alpha, beta = estimate_truncnormparams(5.0, 1.5, 0, 10)
>>> print(f"Estimated mu: {mu:.2f}, sigma: {sigma:.2f}")
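The moment matching described in the Notes can be pictured with the sketch below, which computes the theoretical mean and standard deviation of a truncated normal and builds a residual vector from them. The names truncated_moments and residuals are hypothetical, and the way mean_weight scales the mean residual here is an assumption about the objective rather than the library's confirmed implementation.

```python
import math


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_moments(mu, sigma, a, b):
    """Mean and standard deviation of Normal(mu, sigma) truncated to [a, b]."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    z = _cdf(beta) - _cdf(alpha)  # probability mass inside the bounds
    # z * pdf(z) tends to 0 at infinite limits; avoid evaluating inf * 0
    a_term = alpha * _pdf(alpha) if math.isfinite(alpha) else 0.0
    b_term = beta * _pdf(beta) if math.isfinite(beta) else 0.0
    shift = (_pdf(alpha) - _pdf(beta)) / z
    mean = mu + sigma * shift
    var = sigma ** 2 * (1.0 + (a_term - b_term) / z - shift ** 2)
    return mean, math.sqrt(var)


def residuals(params, obs_mean, obs_std, a, b, mean_weight=10.0):
    """Weighted moment mismatches; sigma is parameterized as exp(log_sigma)."""
    mu, log_sigma = params
    sigma = math.exp(log_sigma)  # always positive, no bound needed
    mean, std = truncated_moments(mu, sigma, a, b)
    return [mean_weight * (mean - obs_mean), std - obs_std]


# Underlying N(0, 1) truncated to [0, inf): the truncated mean is not 0.
half_mean, half_std = truncated_moments(0.0, 1.0, 0.0, math.inf)
print(f"half-normal mean: {half_mean:.4f}, sd: {half_std:.4f}")
print(residuals([5.0, math.log(1.5)], 5.0, 1.5, 0.0, 10.0))
```

A residual vector of this shape could be passed to a generic least-squares solver such as scipy.optimize.least_squares, starting from (obs_mean, log(obs_std)). The half-normal case shows why the location of the underlying normal generally differs from the observed mean.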
- maxent_disaggregation.aggregate.check_sample_vs_input(mean, sd, low_bound, high_bound, samples, threshold_shares=0.05, threshold_sd=0.2, suppress_warnings=False)
Check if the sample mean and standard deviation are close to the input values. Raise warnings if the mean and standard deviation deviate beyond specified thresholds. Raise a ValueError if samples fall outside the specified bounds.
- Parameters:
mean (float) – The input mean value.
sd (float) – The input standard deviation value.
low_bound (float) – The lower bound used in sampling.
high_bound (float) – The upper bound used in sampling.
samples (numpy.ndarray) – The array of sampled values.
threshold_shares (float, optional) – The relative tolerance for mean comparison. Default is 0.05 (5%).
threshold_sd (float, optional) – The relative tolerance for standard deviation comparison. Default is 0.2 (20%).
suppress_warnings (bool, optional) – If True, suppress warnings about sample statistics deviating from input values. Default is False.
- Returns:
None
- Warns:
A warning is issued if the sample mean or standard deviation deviates from the input value by more than the corresponding threshold, unless suppress_warnings is True.
- Raises:
ValueError – If any samples fall outside the specified bounds.
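A minimal sketch of this kind of check, using only the standard library, is shown below. The function check_sample_sketch and its parameter threshold_mean are hypothetical, the use of warnings.warn is an assumption about how warnings are emitted, and the sketch returns the relative deviations purely for illustration, whereas check_sample_vs_input returns None. It also assumes a non-zero input mean.

```python
import statistics
import warnings


def check_sample_sketch(mean, sd, low_bound, high_bound, samples,
                        threshold_mean=0.05, threshold_sd=0.2):
    """Compare sample statistics with the inputs (hypothetical illustration)."""
    if min(samples) < low_bound or max(samples) > high_bound:
        raise ValueError("samples fall outside the specified bounds")
    # Relative deviations of the sample mean and standard deviation
    rel_mean = abs(statistics.fmean(samples) - mean) / abs(mean)
    rel_sd = abs(statistics.stdev(samples) - sd) / sd
    if rel_mean > threshold_mean:
        warnings.warn(f"sample mean deviates by {rel_mean:.1%} from input")
    if rel_sd > threshold_sd:
        warnings.warn(f"sample sd deviates by {rel_sd:.1%} from input")
    return rel_mean, rel_sd


samples = [9.0, 10.0, 11.0, 10.5, 9.5]
rel_mean, rel_sd = check_sample_sketch(10.0, 0.8, 0.0, 20.0, samples)
print(f"relative mean deviation: {rel_mean:.3f}, relative sd deviation: {rel_sd:.3f}")

# An upper bound below the largest sample triggers the ValueError.
try:
    check_sample_sketch(10.0, 0.8, 0.0, 10.5, samples)
except ValueError as exc:
    bounds_error = str(exc)
    print("ValueError:", bounds_error)
```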