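I'm very much a beginner, and I'm working on a finite-horizon dynamic optimization program: my reward function is subject to capital that accumulates as a function of the control, and to a finite resource that is depleted by the same control. However, my resource constraint function doesn't seem to work, because the resource ends up negative:

The non-negativity constraint: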

# Non-negativity constraint for finite resource

def resource_constraint(controls):
    finite_resource = initial_state[0]
    for t in range(horizon):
        finite_resource -= controls[t]
        if finite_resource < 0:
            return finite_resource  # Constraint violated, return the negative value
    return 0  # Constraint satisfied, return 0

It is used in the optimization routine like this:

# Define constraint for finite resource
finite_resource_constraint = {'type': 'ineq', 'fun': resource_constraint}

# Solve the optimization problem with constraints
result = minimize(optimization_problem, initial_controls, method='SLSQP', bounds=control_bounds_list, constraints=finite_resource_constraint)

The full program is:

import numpy as np
from scipy.optimize import minimize

# Problem parameters
horizon = 40  # Time horizon
state_dim = 2  # Dimension of the state
initial_state = np.array([100.0, 0.0])  # Initial state: [finite resource, accumulated resource]

# Dynamics function
def dynamics_function(state, control):
    finite_resource = state[0] - control  # Depletion of the finite resource
    accumulated_resource = state[1] + 0.2 * control  # Accumulation of the second resource
    return np.array([finite_resource, accumulated_resource])

# Reward function to be maximized
def reward_function(state, control):
    return 1 - control**2 + 2*state[1]  # Example production utility, to be maximized

# Non-negativity constraint for finite resource
def resource_constraint(controls):
    finite_resource = initial_state[0]
    for t in range(horizon):
        finite_resource -= controls[t]
        if finite_resource < 0:
            return finite_resource  # Constraint violated, return the negative value
    return 0  # Constraint satisfied, return 0


# Define the optimization problem
def optimization_problem(controls):
    total_reward = 0
    state = initial_state.copy()

    for t in range(horizon):
        control = controls[t]
        total_reward += reward_function(state, control)
        state = dynamics_function(state, control)

    return -total_reward  # Maximize the total reward (minimize the negative)

# Initial guess for controls
initial_controls = np.zeros(horizon)

# Define bounds for controls (production rate)
control_bounds = (0, np.inf)  # Production rate bounds with unbounded upper limit
control_bounds_list = [control_bounds] * horizon

# Define constraint for finite resource
finite_resource_constraint = {'type': 'ineq', 'fun': resource_constraint}

# Solve the optimization problem with constraints
result = minimize(optimization_problem, initial_controls, method='SLSQP', bounds=control_bounds_list, constraints=finite_resource_constraint)

# Extract the optimal controls (production rates)
optimal_controls = result.x

# Calculate the finite resource at the end of optimization
final_state = initial_state.copy()
for t in range(horizon):
    final_state = dynamics_function(final_state, optimal_controls[t])

print("Optimal Production Rates:", optimal_controls)
print("Optimal Utility:", -result.fun)
print("Final Finite Resource:", final_state[0])

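I expected the resource to stay non-negative, but it doesn't. Do you know where I went wrong?

Thanks in advance.

Recommended answer

Quite a few simplifications are called for. The most important change is that the constraint function should return the complete series (`control_series` below, the resource remaining after each time step, found from the cumulative sum of the control variable), and let SciPy interpret every value in that series as needing to be non-negative: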

import numpy as np
from scipy.optimize import minimize, Bounds, NonlinearConstraint

horizon = 40   # Time horizon
initial_state = np.array((100, 0))  # finite resource, accumulated resource

# Depletion of the finite resource,
# Accumulation of the second resource
control_coef = np.array((-1, 0.2))


def dynamics_function(state: np.ndarray, control: float) -> np.ndarray:
    return state + control_coef*control


def reward_function(state: np.ndarray, control: float) -> float:
    """Reward function to be maximized"""
    finite, accumulated = state
    return 1 - control**2 + 2*accumulated  # Example production utility, to be maximized


def resource_constraint(controls: np.ndarray) -> np.ndarray:
    """Non-negativity constraint for finite resource"""
    finite, accumulated = initial_state
    control_series = finite - controls.cumsum()
    return control_series


def optimization_problem(controls: np.ndarray) -> float:
    total_reward = 0
    state = initial_state

    for control in controls:
        total_reward += reward_function(state, control)
        state = dynamics_function(state, control)

    return -total_reward  # Maximize the total reward (minimize the cost)


def main() -> None:
    result = minimize(
        fun=optimization_problem,
        x0=np.zeros(horizon),
        bounds=Bounds(lb=0),
        constraints=NonlinearConstraint(fun=resource_constraint, lb=0, ub=np.inf),
    )

    optimal_controls = result.x   # production rates
    final_state = initial_state
    for control in optimal_controls:
        final_state = dynamics_function(final_state, control)

    print('Optimal production rates:')
    print(optimal_controls)
    print(f'Optimal utility: {-result.fun:.2f}')
    print(f'Final resources: {final_state[0]:.2f} finite, {final_state[1]:.2f} accumulated')


if __name__ == '__main__':
    main()

Output:

Optimal production rates:
[6.22507154 6.02506764 5.82506307 5.62505724 5.42505567 5.22503537
 5.02504236 4.82503844 4.62502625 4.42501332 4.22500823 4.02501148
 3.8250004  3.62500542 3.42498674 3.22498371 3.02496703 2.82496198
 2.62495751 2.42494841 2.22494338 2.02494237 1.82493205 1.62494446
 1.4249527  1.22497927 1.02498866 0.82499873 0.62502126 0.42503274
 0.22499934 0.02496324 0.         0.         0.         0.
 0.         0.         0.         0.        ]
Optimal utility: 776.62
Final resources: 0.00 finite, 20.00 accumulated
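
To see why the scalar constraint from the question gave the solver so little to work with, it may help to compare the two versions numerically. The sketch below is illustrative only: `scalar_constraint` reproduces the question's function, `vector_constraint` mirrors the one above, and `finite_difference_gradient` is a hypothetical helper written for this example (it is not a SciPy function). The two test points, `feasible` and `infeasible`, are arbitrary choices.

```python
import numpy as np

horizon = 40
initial_resource = 100.0


def scalar_constraint(controls: np.ndarray) -> float:
    """The question's version: 0 when feasible, else the first negative level."""
    remaining = initial_resource
    for control in controls:
        remaining -= control
        if remaining < 0:
            return remaining
    return 0.0


def vector_constraint(controls: np.ndarray) -> np.ndarray:
    """The answer's version: resource remaining after every time step."""
    return initial_resource - controls.cumsum()


def finite_difference_gradient(fun, x: np.ndarray, step: float = 1e-6) -> np.ndarray:
    """Hypothetical helper: forward-difference gradient of a scalar function."""
    base = fun(x)
    gradient = np.empty_like(x)
    for i in range(x.size):
        shifted = x.copy()
        shifted[i] += step
        gradient[i] = (fun(shifted) - base) / step
    return gradient


feasible = np.full(horizon, 2.0)    # uses 80 of the 100 units
infeasible = np.full(horizon, 3.0)  # would use 120 units

scalar_gradient = finite_difference_gradient(scalar_constraint, feasible)

print('Scalar constraint at feasible point:', scalar_constraint(feasible))
print('Largest scalar gradient entry:', np.abs(scalar_gradient).max())
print('Smallest remaining resource (vector):', vector_constraint(feasible).min())
print('Scalar constraint at infeasible point:', scalar_constraint(infeasible))
print('Smallest remaining resource (vector):', vector_constraint(infeasible).min())
```

At the feasible point the scalar version returns 0 with an all-zero gradient, so a gradient-based solver cannot tell how far it is from the boundary, while the vector version reports 20 units left. At the infeasible point the scalar version only reports the first shortfall (-2), not the final one (-20).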

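Since the remaining resource is a linear function of the controls, the same restriction could also be written as a `LinearConstraint` with a lower-triangular matrix of ones, which gives the solver an exact constraint Jacobian. This is an alternative sketch, not the approach used above; the names `negative_total_reward`, `cumulative_sum_matrix` and `accumulation_rate` are assumptions introduced for this example, and the objective is a vectorized rewrite of the loop in `optimization_problem`.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, minimize

horizon = 40
initial_resource = 100.0
accumulation_rate = 0.2  # share of each control added to the accumulated resource


def negative_total_reward(controls: np.ndarray) -> float:
    """Vectorized form of the reward loop: the reward at step t sees the
    resource accumulated from the controls applied before step t."""
    accumulated_before = accumulation_rate * (np.cumsum(controls) - controls)
    rewards = 1 - controls**2 + 2*accumulated_before
    return -rewards.sum()


# Row t sums controls[0..t], so row t of (matrix @ controls) is the
# total consumption up to and including step t.
cumulative_sum_matrix = np.tril(np.ones((horizon, horizon)))

result = minimize(
    fun=negative_total_reward,
    x0=np.zeros(horizon),
    method='SLSQP',
    bounds=Bounds(0, np.inf),
    # Cumulative consumption may never exceed the initial resource
    constraints=LinearConstraint(cumulative_sum_matrix, -np.inf, initial_resource),
)

remaining = initial_resource - np.cumsum(result.x)
print(f'Utility: {-result.fun:.2f}')
print(f'Smallest remaining resource: {remaining.min():.4f}')
```

If the solver converges, the utility should land close to the value reported above, with the remaining resource non-negative up to solver tolerance.
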