在序列元素只能与相邻元素交换的要求下,如何计算两个序列的数字距离?不允许插入和删除.距离函数必须确定从一个字符串到达另一个字符串所需的交换操作数量.此外,距离函数不会为不同长度的序列定义,或者如果序列不共享相同的符号.最后,您可以假设序列中没有重复项.

根据我的研究,我需要像Damerau–Levenshtein distance这样的东西,但我不知道如何正确修改距离函数.

Example

对于序列ABC,距离为1的仅有两个有效变换是BACACB.

Code

Rust Playground

use std::collections::HashSet;
use std::hash::Hash;

fn dist<T: PartialEq + Eq + Hash>(lhs: &[T], rhs: &[T]) -> Option<usize> {
    // Distance function is not defined for differing lengths
    if lhs.len() != rhs.len() {
        return None;
    }

    let lhs_c: HashSet<_> = lhs.iter().collect();
    let rhs_c: HashSet<_> = rhs.iter().collect();

    // Distance function is not defined if an element occurs more than once in each sequence
    if lhs.len() != lhs_c.len() || rhs.len() != rhs_c.len() {
        return None;
    }

    // Distance function is not defined if the sequences don't share all elements
    if lhs_c != rhs_c {
        return None;
    } 

    // Calculate the edit distance of `lhs` and `rhs` with only adjacent transpositions.
    todo!()
}

#[test]
fn dist_is_none_for_differing_lengths() {
    assert_eq!(None, dist(&["b", "a", "c"], &["a", "b", "c", "d"]));
}

#[test]
fn dist_is_none_with_duplicates() {
    assert_eq!(None, dist(&["b", "a", "c", "a"], &["a", "b", "c", "d"]));
    assert_eq!(None, dist(&["a", "b", "c", "d"], &["a", "b", "c", "c"]));
}

#[test]
fn dist_is_none_with_non_empty_difference() {
    assert_eq!(None, dist(&["a", "b", "c"], &["d", "e", "f"]));
}

#[test]
fn dist_bac_to_abc_is_one() {
    assert_eq!(Some(1), dist(&["b", "a", "c"], &["a", "b", "c"]));
}

#[test]
fn dist_bca_to_bac_is_one() {
    assert_eq!(Some(1), dist(&["b", "c", "a"], &["b", "a", "c"]));
}

#[test]
fn dist_cab_to_abc_is_two() {
    assert_eq!(Some(2), dist(&["c", "a", "b"], &["a", "b", "c"]));
}

#[test]
fn dist_cdab_to_abcd_is_four() {
    assert_eq!(Some(4), dist(&["c", "d", "a", "b"], &["a", "b", "c", "d"]));
}

推荐答案

您可以创建从元素到数字的映射,当将其应用于其中一个序列时,将对其进行排序.然后您可以将此映射应用于其他序列.之后,您可以计算对第二个序列进行气泡排序所需的交换数:(plaground)

use std::collections::HashMap;
use core::hash::Hash;
use std::collections::HashSet;

fn create_mapping<T: Eq + Hash>(seq: &[T]) -> HashMap<&T, usize> {
    let mut ret = HashMap::new();
    for (i, elem) in seq.iter().enumerate() {
        ret.insert(elem, i);
    }
    ret
}

fn dist<T: Eq + Hash>(lhs: &[T], rhs: &[T]) -> Option<usize> {
    // Distance function is not defined for differing lengths
    if lhs.len() != rhs.len() {
        return None;
    }
    let lhs_c: HashSet<_> = lhs.iter().collect();
    let rhs_c: HashSet<_> = rhs.iter().collect();
    // Distance function is not defined if an element occurs more than once in each sequence
    if lhs.len() != lhs_c.len() || rhs.len() != rhs_c.len() {
        return None;
    }
    // Distance function is not defined if the sequences don't share all elements
    if lhs_c != rhs_c {
        return None;
    } 
    // Calculate the edit distance of `lhs` and `rhs` with only adjacent transpositions.
    let mapping = create_mapping(lhs);
    let rhs_mapped: Vec<usize> = rhs.iter().map(|e|mapping[e]).collect();
    Some(bubble_sort_length(&rhs_mapped))
}

fn bubble_sort_length(seq: &[usize]) -> usize {
    let mut ret = 0;
    for i in 0..seq.len() {
        for j in (i+1)..seq.len() {
            if seq[i]>seq[j] {
                ret += 1;
            }
        }
    }
    ret
}

#[test]
fn dist_is_none_for_differing_lengths() {
    assert_eq!(None, dist(&["b", "a", "c"], &["a", "b", "c", "d"]));
}

#[test]
fn dist_is_none_with_duplicates() {
    assert_eq!(None, dist(&["b", "a", "c", "a"], &["a", "b", "c", "d"]));
    assert_eq!(None, dist(&["a", "b", "c", "d"], &["a", "b", "c", "c"]));
}

#[test]
fn dist_is_none_with_non_empty_difference() {
    assert_eq!(None, dist(&["a", "b", "c"], &["d", "e", "f"]));
}

#[test]
fn dist_bac_to_abc_is_one() {
    assert_eq!(Some(1), dist(&["b", "a", "c"], &["a", "b", "c"]));
}

#[test]
fn dist_bca_to_bac_is_one() {
    assert_eq!(Some(1), dist(&["b", "c", "a"], &["b", "a", "c"]));
}

#[test]
fn dist_cab_to_abc_is_two() {
    assert_eq!(Some(2), dist(&["c", "a", "b"], &["a", "b", "c"]));
}

#[test]
fn dist_cdab_to_abcd_is_four() {
    assert_eq!(Some(4), dist(&["c", "d", "a", "b"], &["a", "b", "c", "d"]));
}

该算法的时间复杂度为O(n²),其中n是序列的长度.可能有更好的方法来实现bubble_sort_length功能(参见声称时间复杂度为O(n*log(n))https://www.geeksforgeeks.org/inversion-count-in-array-using-merge-sort/)

Rust相关问答推荐

如何在 struct 中填充缓冲区并同时显示它?

如何在不安全的代码中初始化枚举 struct

为什么复印是豆荚的一个重要特征?

如何装箱生命周期相关联的两个对象?

常量泛型和类型枚举箱有重叠的用途吗?

如何仅使用http机箱发送http请求?

无法将记录器向下转换回原始 struct

如何为 struct 字段设置新值并在Ruust中的可变方法中返回旧值

为什么`AlternateScreen`在读取输入键时需要按Enter键?

为什么Option类型try块需要类型注释?

Rust将String上的迭代器转换为&;[&;str]

为什么我必须使用 PhantomData?在这种情况下它在做什么?

go 重并堆积MPSC通道消息

为什么Rust中无法推断生命周期?

Rust Redis 中的 HSET 命令问题

使用 traits 时,borrow 的值不会存在足够长的时间

提取 struct 生成宏中字段出现的索引

将数据序列化为 struct 模型,其中两个字段的数据是根据 struct 中的其他字段计算的

为什么 Rust 标准库同时为 Thing 和 &Thing 实现特征?

基于名称是否存在的条件编译