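I am very new to this, and I am working on a finite-horizon dynamic optimization program: my reward function depends on capital that accumulates as a function of the control, and on a finite resource that is depleted by the same control. However, my resource constraint function does not seem to work, because the resource ends up negative:
Non-negativity constraint: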
# Non-negativity constraint for finite resource
def resource_constraint(controls):
    finite_resource = initial_state[0]
    for t in range(horizon):
        finite_resource -= controls[t]
        if finite_resource < 0:
            return finite_resource  # Constraint violated, return the negative value
    return 0  # Constraint satisfied, return 0
This is passed to the optimization routine as follows:
# Define constraint for finite resource
finite_resource_constraint = {'type': 'ineq', 'fun': resource_constraint}
# Solve the optimization problem with constraints
result = minimize(optimization_problem, initial_controls, method='SLSQP', bounds=control_bounds_list, constraints=finite_resource_constraint)
The full program is:
import numpy as np
from scipy.optimize import minimize
# Problem parameters
horizon = 40 # Time horizon
state_dim = 2 # Dimension of the state
initial_state = np.array([100.0, 0.0]) # Initial state: [finite resource, accumulated resource]
# Dynamics function
def dynamics_function(state, control):
    finite_resource = state[0] - control  # Depletion of the finite resource
    accumulated_resource = state[1] + 0.2 * control  # Accumulation of the second resource
    return np.array([finite_resource, accumulated_resource])
# Reward function to be maximized
def reward_function(state, control):
    return 1 - control**2 + 2*state[1]  # Example production utility, to be maximized
# Non-negativity constraint for finite resource
def resource_constraint(controls):
    finite_resource = initial_state[0]
    for t in range(horizon):
        finite_resource -= controls[t]
        if finite_resource < 0:
            return finite_resource  # Constraint violated, return the negative value
    return 0  # Constraint satisfied, return 0
# Define the optimization problem
def optimization_problem(controls):
    total_reward = 0
    state = initial_state.copy()
    for t in range(horizon):
        control = controls[t]
        total_reward += reward_function(state, control)
        state = dynamics_function(state, control)
    return -total_reward  # Maximize the total reward (minimize the negative)
# Initial guess for controls
initial_controls = np.zeros(horizon)
# Define bounds for controls (production rate)
control_bounds = (0, np.inf) # Production rate bounds with unbounded upper limit
control_bounds_list = [control_bounds] * horizon
# Define constraint for finite resource
finite_resource_constraint = {'type': 'ineq', 'fun': resource_constraint}
# Solve the optimization problem with constraints
result = minimize(optimization_problem, initial_controls, method='SLSQP', bounds=control_bounds_list, constraints=finite_resource_constraint)
# Extract the optimal controls (production rates)
optimal_controls = result.x
# Calculate the finite resource at the end of optimization
final_state = initial_state.copy()
for t in range(horizon):
    final_state = dynamics_function(final_state, optimal_controls[t])
print("Optimal Production Rates:", optimal_controls)
print("Optimal Utility:", -result.fun)
print("Final Finite Resource:", final_state[0])
I expected the resource to stay non-negative, but it does not. Do you know where I went wrong?
Thanks in advance.
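One possible explanation, offered as a hedged sketch rather than a confirmed diagnosis: `resource_constraint` returns a constant 0 everywhere in the feasible region, so its gradient there is zero, and it only reports the first violation. SLSQP is a gradient-based method, so a flat, non-smooth constraint like this may give it no usable information. The variant below instead returns the remaining resource after every period as a vector, which SLSQP then requires to be non-negative elementwise. The names `remaining_resource` and `negative_total_reward` are hypothetical and not part of the program above; the objective is a vectorized rewrite that assumes the same reward and dynamics.

```python
import numpy as np
from scipy.optimize import minimize

horizon = 40
initial_resource = 100.0  # same value as initial_state[0] above

def remaining_resource(controls):
    # Resource left after each period; for an 'ineq' constraint
    # SLSQP requires every entry of the returned array to be >= 0.
    return initial_resource - np.cumsum(controls)

def negative_total_reward(controls):
    # Accumulated resource at the start of each period:
    # 0.2 times the sum of all earlier controls.
    accumulated = 0.2 * (np.cumsum(controls) - controls)
    # Same per-period reward as reward_function, summed and negated.
    return -np.sum(1 - controls**2 + 2 * accumulated)

result = minimize(
    negative_total_reward,
    np.zeros(horizon),
    method='SLSQP',
    bounds=[(0, None)] * horizon,
    constraints={'type': 'ineq', 'fun': remaining_resource},
)

print("Final Finite Resource:", remaining_resource(result.x)[-1])
```

Because the controls are bounded below by zero, the last entry of the vector is the binding one, so returning only `initial_resource - np.sum(controls)` should be an equivalent and smaller constraint under that assumption.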