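I'm trying to find an efficient algorithm for this problem:

Given an unsorted array of numbers, split it into several contiguous subarrays, each of length between a and b, so that the sum over all subarrays of the difference between the subarray's maximum and minimum is as large as possible. The order of the numbers must be preserved.

Examples: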

a = 3, b = 7
input: [5, 8, 4, 5, 1, 3, 5, 1, 3, 1]
answer: [[5, 8, 4], [5, 1, 3], [5, 1, 3, 1]] (diff sum is 12)

a = 3, b = 4
input: [1, 6, 2, 2, 5, 2, 8, 1, 5, 6]
answer: [[1, 6, 2], [2, 5, 2, 8], [1, 5, 6]] (diff sum is 16)

a = 4, b = 5
input: [5, 8, 4, 5, 1, 3, 5, 1, 3, 1, 2]
answer: splitting is impossible

So far, the only solution I have come up with is to try all possible combinations of subarrays.

from collections import deque

def partition_array(numbers, min_len, max_len):
  max_diff_subarray = None

  queue = deque()

  for end in range(min_len - 1, max_len):
    if end < len(numbers):
      diff = max(numbers[0:end + 1]) - min(numbers[0:end + 1])
      queue.append(Subarray(previous=None, start=0, end=end, diff_sum=diff))

  while queue:
    subarray = queue.popleft()

    if subarray.end == len(numbers) - 1:
      if max_diff_subarray is None:
        max_diff_subarray = subarray
      elif max_diff_subarray.diff_sum < subarray.diff_sum:
        max_diff_subarray = subarray
      continue

    start = subarray.end + 1

    for end in range(start + min_len - 1, start + max_len):
      if end < len(numbers):
        diff = max(numbers[start:end + 1]) - min(numbers[start:end + 1])
        queue.append(Subarray(previous=subarray, start=start, end=end, diff_sum=subarray.diff_sum + diff))
      else:
        break

  return max_diff_subarray

class Subarray:
  def __init__(self, previous=None, start=0, end=0, diff_sum=0):
    self.previous = previous
    self.start = start
    self.end = end
    self.diff_sum = diff_sum

numbers = [5, 8, 4, 5, 1, 3, 5, 1, 3, 1]
a = 3
b = 7
result = partition_array(numbers, a, b)
print(result.diff_sum)

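Is there a more time-efficient solution?

Recommended answer

First, let's solve a simpler problem. Let's run through an array and report the minimum and maximum of every window of a fixed size.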

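from collections import deque

def window_mins_maxes(size, array):
    # Yields (i, min, max) for each window array[i - size:i], so i is the
    # exclusive end of the window. Assumes 1 <= size <= len(array).
    # Each value deque is monotonic and keeps the current window's min or
    # max at the front; the position deques hold the matching indices.
    min_values = deque()
    min_positions = deque()
    max_values = deque()
    max_positions = deque()

    for i, value in enumerate(array):
        if size <= i:
            yield (i, min_values[0], max_values[0])
            # Drop the front entry if it is about to slide out of the window.
            if min_positions[0] <= i - size:
                min_values.popleft()
                min_positions.popleft()

            if max_positions[0] <= i - size:
                max_values.popleft()
                max_positions.popleft()

        # Entries that are no smaller than value can never be the min again.
        while 0 < len(min_values) and value <= min_values[-1]:
            min_values.pop()
            min_positions.pop()
        min_values.append(value)
        min_positions.append(i)

        # Entries that are no larger than value can never be the max again.
        while 0 < len(max_values) and max_values[-1] <= value:
            max_values.pop()
            max_positions.pop()
        max_values.append(value)
        max_positions.append(i)

    # The last window ends at the end of the array.
    yield (len(array), min_values[0], max_values[0])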

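This clearly takes O(size) memory. What is less obvious is that it takes O(n) time to process an array of length n, but an amortized analysis shows it. Charge each element for three things: the one comparison against the deque entry that it fails to pop, the comparison by which some later element pops it, and its own append. That accounts for every operation (though not in the order in which they happen), and it is a constant amount of work per element.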
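The amortized argument can be checked empirically. The following is a minimal sketch, not part of the original answer: the helper name count_min_deque_work is hypothetical, and it tracks only the window minimum while counting deque appends and pops.

```python
from collections import deque

def count_min_deque_work(size, array):
    """Hypothetical helper: count deque appends and pops for the window min."""
    positions = deque()  # indices whose values increase from front to back
    appends = pops = 0
    for i, value in enumerate(array):
        # Drop the front index once it falls out of the window ending at i.
        if positions and positions[0] <= i - size:
            positions.popleft()
            pops += 1
        # Drop candidates that the new value makes useless.
        while positions and value <= array[positions[-1]]:
            positions.pop()
            pops += 1
        positions.append(i)
        appends += 1
    return appends, pops

appends, pops = count_min_deque_work(3, [5, 8, 4, 5, 1, 3, 5, 1, 3, 1])
print(appends, pops)
```

Every index is appended exactly once and can be removed at most once, so the number of pops never exceeds the number of appends, whatever the window size.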

Also note that the memory needed for this part of the solution fits within O(n).
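To make the output convention concrete: each triple (i, min, max) describes the window array[i - size:i], so i is the exclusive end of the window. A naive reference that produces the same triples by slicing might look like this. The name naive_window_mins_maxes is an assumption for illustration, and the sketch costs O(n * size) time rather than O(n).

```python
def naive_window_mins_maxes(size, array):
    """Slow reference: recompute min and max of every window from scratch."""
    for end in range(size, len(array) + 1):
        window = array[end - size:end]
        yield (end, min(window), max(window))

print(list(naive_window_mins_maxes(3, [5, 8, 4, 5, 1])))
# [(3, 4, 8), (4, 4, 8), (5, 1, 5)]
```

Comparing its output with the deque-based generator on random inputs is a cheap way to test the faster version.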

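So far I would consider this a well-known dynamic programming problem. Now let's make it more challenging.

We will tackle the partition problem as a traditional dynamic programming problem. We build an array best_weight that holds the weight of the best partition of the prefix ending at each position, and an array prev_index that holds the start of the last subarray of that partition, the one ending just before that position.

To build them, we use the algorithm above to take a previously found partition and append a subarray of length min_len to it. If the result is better than what is recorded so far, we save its information in those arrays. We then scan forward from there, extending the subarray one element at a time until it reaches max_len. Then we move on to the next possible start of a subarray.

Once the arrays are filled, we recover the answer by following prev_index back from the end.

Here is what it looks like: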
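Written as a recurrence, the table can be described as follows. The notation is an assumption for illustration only: W(j) stands for best_weight[j], x_k for numbers[k], n for len(numbers), and a, b for min_len and max_len.

```latex
\begin{aligned}
W(0) &= 0,\\
W(j) &= \max_{\substack{a \le j - i \le b \\ W(i)\ \text{defined}}}
        \Bigl( W(i) + \max_{i \le k < j} x_k - \min_{i \le k < j} x_k \Bigr),
        \qquad 1 \le j \le n .
\end{aligned}
```

W(j) stays undefined when no valid i exists, which corresponds to a None entry, and the final answer is W(n).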

def partition_array(numbers, min_len, max_len):
    if max_len < min_len or len(numbers) < min_len:
        return (None, None)

    best_weight = [None for _ in numbers]
    prev_index = [None for _ in numbers]

    # Need an extra entry for off of the end of the array.
    best_weight.append(None)
    prev_index.append(None)

    best_weight[0] = 0

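    # Each i is the exclusive end of a min_len window starting at i - min_len.
    for i, min_value, max_value in window_mins_maxes(min_len, numbers):
        window_start_weight = best_weight[i - min_len]
        if window_start_weight is not None:
            # Grow the subarray one element at a time; j is its exclusive end.
            j = i
            while j - i < max_len - min_len and j < len(numbers):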
                new_weight = window_start_weight + max_value - min_value
                if best_weight[j] is None or best_weight[j] < new_weight:
                    best_weight[j] = new_weight
                    prev_index[j] = i - min_len

                if numbers[j] < min_value:
                    min_value = numbers[j]
                if max_value < numbers[j]:
                    max_value = numbers[j]
                j += 1

            # And fill in the longest value.
            new_weight = window_start_weight + max_value - min_value
            if best_weight[j] is None or best_weight[j] < new_weight:
                best_weight[j] = new_weight
                prev_index[j] = i - min_len

    if best_weight[-1] is None:
        return (None, None)
    else:
        path = [len(numbers)]
        while prev_index[path[-1]] is not None:
            path.append(prev_index[path[-1]])
        path = list(reversed(path))
        partitioned = [numbers[path[i]:path[i+1]] for i in range(len(path)-1)]
        return (best_weight[-1], partitioned)

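Note that we do O(1) work for each possible start and length, so the running time is O((max_len + 1 - min_len)*n). The data structures we use are all bounded above by O(n) in size. That gives the overall efficiency that I promised in the comments.

Now let's test it.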

print(partition_array([5, 8, 4, 5, 1, 3, 5, 1, 3, 1], 3, 7))
print(partition_array([1, 6, 2, 2, 5, 2, 8, 1, 5, 6], 3, 4))
print(partition_array([5, 8, 4, 5, 1, 3, 5, 1, 3, 1, 2], 4, 5))

The output is:

(12, [[5, 8, 4], [5, 1, 3], [5, 1, 3, 1]])
(16, [[1, 6, 2], [2, 5, 2, 8], [1, 5, 6]])
(None, None)
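One way to sanity-check these sums is a direct, slower evaluation of the same dynamic program that recomputes each subarray's minimum and maximum by slicing. This is only a sketch: the name best_partition_sum is hypothetical, it assumes min_len >= 1, and it returns just the best sum (or None when no split exists), not the partition itself.

```python
def best_partition_sum(numbers, min_len, max_len):
    """Slow cross-check: best[end] is the best sum for numbers[:end]."""
    n = len(numbers)
    best = [None] * (n + 1)
    best[0] = 0  # the empty prefix has weight 0
    for end in range(1, n + 1):
        for length in range(min_len, max_len + 1):
            start = end - length
            if start < 0:
                break  # longer subarrays would start before the array
            if best[start] is None:
                continue  # numbers[:start] cannot be partitioned
            piece = numbers[start:end]
            candidate = best[start] + max(piece) - min(piece)
            if best[end] is None or best[end] < candidate:
                best[end] = candidate
    return best[n]

print(best_partition_sum([5, 8, 4, 5, 1, 3, 5, 1, 3, 1], 3, 7))     # 12
print(best_partition_sum([1, 6, 2, 2, 5, 2, 8, 1, 5, 6], 3, 4))     # 16
print(best_partition_sum([5, 8, 4, 5, 1, 3, 5, 1, 3, 1, 2], 4, 5))  # None
```

Because it slices on every step, this version does roughly max_len times more work per (start, length) pair than the solution above, so it is only useful for verifying small inputs.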
