Background

In PowerShell, it is quite common to build a hash table to quickly access objects by a specific property, e.g. to base an index on the LastName:

$List =  ConvertFrom-Csv @'
Id, LastName, FirstName, Country
 1, Aerts,    Ronald,    Belgium
 2, Berg,     Ashly,     Germany
 3, Cook,     James,     England
 4, Duval,    Frank,     France
 5, Lyberg,   Ash,       England
 6, Fischer,  Adam,      Germany
'@

$Index = @{}
$List |ForEach-Object { $Index[$_.LastName] = $_ }

$Index.Cook

Id LastName FirstName Country
-- -------- --------- -------
3  Cook     James     England

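In some cases it is required to build the index on two (or even more) properties, e.g. the FirstName and the LastName. For this you might create a multi-dimensional key, e.g.: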

$Index = @{}
$List |ForEach-Object {
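     # Create the inner hashtable only once per FirstName, so earlier entries are not overwritten.
     if (-not $Index.ContainsKey($_.FirstName)) { $Index[$_.FirstName] = @{} }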
     $Index[$_.FirstName][$_.LastName] = $_
}

$Index.James.Cook

Id LastName FirstName Country
-- -------- --------- -------
3  Cook     James     England

But it is easier (and possibly even faster) to just concatenate the two properties, if only for checking whether an entry exists: with nested hashtables, a check such as $Index['James'].ContainsKey('Cook') raises an error if the FirstName doesn't exist.
To join the properties, a delimiter is required between them, otherwise different property lists might end up as the same key, as in this example: AshlyBerg and AshLyberg.

$Index = @{}
$List |ForEach-Object { $Index["$($_.FirstName)`t$($_.LastName)"] = $_ }

$Index."James`tCook"

Id LastName FirstName Country
-- -------- --------- -------
3  Cook     James     England
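
To make the collision risk concrete, here is a minimal sketch that builds the index once without and once with a delimiter. The names $People, $NoDelimiter and $WithDelimiter are illustrative and not part of the example above. It relies on hashtables created with @{} comparing string keys case-insensitively, which is why AshlyBerg and AshLyberg end up as the same key:

```powershell
# Hypothetical sample data mirroring the list above.
$People = @(
    [pscustomobject]@{ Id = 1; LastName = 'Aerts';   FirstName = 'Ronald' }
    [pscustomobject]@{ Id = 2; LastName = 'Berg';    FirstName = 'Ashly'  }
    [pscustomobject]@{ Id = 3; LastName = 'Cook';    FirstName = 'James'  }
    [pscustomobject]@{ Id = 4; LastName = 'Duval';   FirstName = 'Frank'  }
    [pscustomobject]@{ Id = 5; LastName = 'Lyberg';  FirstName = 'Ash'    }
    [pscustomobject]@{ Id = 6; LastName = 'Fischer'; FirstName = 'Adam'   }
)

$NoDelimiter   = @{}
$WithDelimiter = @{}
foreach ($Person in $People) {
    # Plain concatenation: "Ashly" + "Berg" and "Ash" + "Lyberg" differ only in case.
    $NoDelimiter["$($Person.FirstName)$($Person.LastName)"] = $Person
    # A tab between the components keeps the two keys apart.
    $WithDelimiter["$($Person.FirstName)`t$($Person.LastName)"] = $Person
}

"Without delimiter: $($NoDelimiter.Count) keys for $($People.Count) objects"
"With delimiter:    $($WithDelimiter.Count) keys for $($People.Count) objects"
```

Without the delimiter the index silently ends up one key short, and the later object replaces the earlier one under the shared key.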

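Note: the above are Minimal, Reproducible Examples. In real life, I have run into the questions below several times, including when joining objects in general, where the background and the number of properties used in the index are variable.

Questions:

  1. Is it good practice to join (concatenate) properties for such a situation?
  2. If yes, is there a (standard?) delimiter for this? (Meaning a character, or sequence of characters, that should never be used or exist in a property name.)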

Recommended answer

There is no built-in separator for multi-component hashtable (dictionary) keys.

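As for a custom separator: your best bet for a character that is very unlikely to occur in a component itself is NUL (the character with code point 0x0), which you can represent as "`0" in PowerShell. However, performing convention-based string operations on every lookup is awkward (e.g. $Index."James`0Cook") and generally only works if stringifying the key components is feasible, or if they are all strings to begin with, as in your example.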
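
If you do go with the separator convention, wrapping it in a small helper at least keeps the convention in one place. The following is a minimal sketch under stated assumptions: Get-CompositeKey is a hypothetical helper name (not a built-in command), and every key component is assumed to stringify meaningfully and never to contain NUL itself:

```powershell
# Hypothetical helper: joins the key components with NUL ("`0").
function Get-CompositeKey {
    param([Parameter(Mandatory)] [object[]] $Component)
    $Component -join "`0"
}

# Hypothetical sample data (a subset of the list from the question).
$People = @(
    [pscustomobject]@{ Id = 2; LastName = 'Berg';   FirstName = 'Ashly' }
    [pscustomobject]@{ Id = 3; LastName = 'Cook';   FirstName = 'James' }
    [pscustomobject]@{ Id = 5; LastName = 'Lyberg'; FirstName = 'Ash'   }
)

$Index = @{}
foreach ($Person in $People) {
    $Index[(Get-CompositeKey $Person.FirstName, $Person.LastName)] = $Person
}

# Lookups and existence checks go through the same helper.
$Index[(Get-CompositeKey 'James', 'Cook')]
$Index.ContainsKey((Get-CompositeKey 'Ash', 'Lyberg'))
```

The helper does not remove the underlying limitation: the key is still a string, so components that are not strings lose their type on the way in.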

Using arrays for multi-component keys is syntactically preferable, but using collections generally does not work as-is, because .NET reference types generally do not compare distinct instances meaningfully, even if they happen to represent the same data - see this answer.

  • Note: The following assumes that the elements of collections serving as keys do compare meaningfully (are themselves strings, .NET value types, or .NET reference types with custom equality logic). If that assumption doesn't hold, there's no robust general solution, but a best-effort approach based on CLIXML serialization, shown in the linked answer and which you yourself have proposed, may work.
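
To illustrate why arrays don't work as keys as-is, here is a small sketch (the variable names are illustrative) showing that a hashtable with the default comparer only finds an array key when given the very same array instance, whereas a structural comparison of the elements does report equality:

```powershell
# A hashtable with the default (reference-based) comparison for non-string keys.
$Index = [hashtable]::new()
$Key   = 'James', 'Cook'
$Other = 'James', 'Cook'     # distinct array instance with equal elements

$Index.Add($Key, 'entry for James Cook')

$Index.ContainsKey($Key)     # True: the very same array instance
$Index.ContainsKey($Other)   # False: equal elements, but a different instance

# Reference equality vs. element-by-element (structural) equality.
[object]::Equals($Key, $Other)                               # False
([System.Collections.IStructuralEquatable] $Key).Equals(
    $Other, [Collections.StructuralComparisons]::StructuralEqualityComparer)  # True
```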

zett42's helpful answer uses tuples, which do perform meaningful comparisons of distinct instances whose members contain equal data.
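
The tuple idea can be sketched roughly as follows. This is an illustration of the technique rather than zett42's actual code; the explicit [Tuple[string,string]] type and the sample data are assumptions made for this sketch:

```powershell
# Hypothetical sample data (a subset of the list from the question).
$People = @(
    [pscustomobject]@{ Id = 2; LastName = 'Berg';   FirstName = 'Ashly' }
    [pscustomobject]@{ Id = 3; LastName = 'Cook';   FirstName = 'James' }
    [pscustomobject]@{ Id = 5; LastName = 'Lyberg'; FirstName = 'Ash'   }
)

$Index = [hashtable]::new()
foreach ($Person in $People) {
    # Tuples with equal members compare as equal and share a hash code.
    $Key = [Tuple[string,string]]::new($Person.FirstName, $Person.LastName)
    $Index.Add($Key, $Person)
}

# A new tuple instance with the same members finds the entry.
$Lookup = [Tuple[string,string]]::new('James', 'Cook')
$Index[$Lookup]

# The member comparison is case-sensitive, so this key is not found.
$Index.ContainsKey([Tuple[string,string]]::new('james', 'cook'))
```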

There is a way of making regular PowerShell arrays work as hashtable keys, in a manner that only adds complexity to creating the hashtable (calling a constructor), while allowing regular array syntax for additions / updates and lookups (e.g. $Index.('James', 'Cook')).

  • Note: The following works equally with [ordered] hashtables, which, however, must be referred to by their true type name so as to be able to call a constructor, namely [System.Collections.Specialized.OrderedDictionary].
    However, it does not work with generic dictionaries ([System.Collections.Generic.Dictionary[TKey, TValue]]).
# Sample objects for the hashtable.
$list =  ConvertFrom-Csv @'
Id, LastName, FirstName, Country
 1, Aerts,    Ronald,    Belgium
 2, Berg,     Ashly,     Germany
 3, Cook,     James,     England
 4, Duval,    Frank,     France
 5, Lyberg,   Ash,       England
 6, Fischer,  Adam,      Germany
'@

# Initialize the hashtable with a structural equality comparer, i.e.
# a comparer that compares the *elements* of the array and only returns $true
# if *all* compare equal.
# This relies on the fact that [System.Array] implements the
# [System.Collections.IStructuralEquatable] interface.
$dict = [hashtable]::new([Collections.StructuralComparisons]::StructuralEqualityComparer)

# Add entries that map the combination of first name and last name 
# to each object in $list.
# Note the regular array syntax.
$list.ForEach({ $dict.($_.FirstName, $_.LastName) = $_ })

# Use regular array syntax for lookups too.
# Note: CASE MATTERS
$dict.('James', 'Cook')

Important: Unlike regular PowerShell hashtables, the above performs case-SENSITIVE comparisons (as zett42's tuple solution does).
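
For the ordered variant mentioned in the earlier note, a minimal sketch could look as follows. The variable names are illustrative, and the comparisons are case-sensitive here too:

```powershell
# The [ordered] literal cannot take a comparer, so call the constructor of the
# underlying type directly.
$Comparer = [Collections.StructuralComparisons]::StructuralEqualityComparer
$Ordered  = [System.Collections.Specialized.OrderedDictionary]::new($Comparer)

# Add entries keyed by (FirstName, LastName) arrays.
$Ordered.Add(@('Ronald', 'Aerts'), 1)
$Ordered.Add(@('James', 'Cook'), 3)

# A distinct array with equal elements is recognized as the same key.
$Ordered.Contains(@('James', 'Cook'))

# Regular array syntax for the lookup, as with the hashtable above.
$Ordered.('James', 'Cook')
```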

Making the comparisons case-INSENSITIVE requires more work, because a custom implementation of the [System.Collections.IEqualityComparer] interface is required, namely a case-insensitive implementation of what [System.Collections.StructuralComparisons]::StructuralEqualityComparer provides:

# Case-insensitive IEqualityComparer implementation for arrays.
# See the bottom section of this answer for a better .NET 7+ alternative.
class CaseInsensitiveArrayEqualityComparer: System.Collections.IEqualityComparer {
  [bool] Equals([object] $o1, [object] $o2) {
    if ($o1 -isnot [array] -or $o2 -isnot [array]) { return $false }
    return ([System.Collections.IStructuralEquatable] $o1).Equals($o2, [System.StringComparer]::InvariantCultureIgnoreCase)
  }
  [int] GetHashCode([object] $o) {
    if ($o -isnot [Array]) { return $o.GetHashCode() }
    [int] $hashCode = 0
    foreach ($el in $o) {
      if ($null -eq $el) { 
        continue
      } elseif ($el -is [string]) {
        $hashCode = $hashCode -bxor $el.ToLowerInvariant().GetHashCode()
      } else {
        $hashCode = $hashCode -bxor $el.GetHashCode()
      }
    }
    return $hashCode
  }
}

$list =  ConvertFrom-Csv @'
Id, LastName, FirstName, Country
 1, Aerts,    Ronald,    Belgium
 2, Berg,     Ashly,     Germany
 3, Cook,     James,     England
 4, Duval,    Frank,     France
 5, Lyberg,   Ash,       England
 6, Fischer,  Adam,      Germany
'@

# Pass the custom IEqualityComparer to the constructor.
$dict = [hashtable]::new([CaseInsensitiveArrayEqualityComparer]::new())

$list.ForEach({ $dict.($_.FirstName, $_.LastName) = $_ })

# Now, case does NOT matter.
$dict.('james', 'cook')

A note on the .GetHashCode() implementation in the custom comparer class above:

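  • A custom .GetHashCode() implementation is required to return the same hash code (an [int] value) for all objects that compare as equal (that is, if $o1 -eq $o2 is $true, $o1.GetHashCode() and $o2.GetHashCode() must return the same value).

  • While hash codes aren't required to be unique (and cannot be in all cases), ideally as few objects as possible share the same hash code, as that reduces the number of so-called collisions, which decrease the lookup efficiency of hash tables - see the relevant Wikipedia article for background information.

  • The implementation above uses a fairly simple, -bxor-based (bitwise XOR) algorithm, which results in the same hash code for two arrays that have the same elements, but in a different order.

    • The .GetHashCode() help topic shows more sophisticated approaches, including using an auxiliary tuple instance, as its hash code algorithm is order-aware. While simple, this approach is computationally expensive, and more work is needed for better performance. See the bottom section for a .NET 7+ option.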
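
The order-insensitivity of the -bxor scheme can be checked with a short sketch. Get-XorHash is a hypothetical helper that reproduces the loop from the comparer class above for string elements only; the tuple comparison at the end illustrates the order-aware alternative mentioned in the last bullet:

```powershell
# Hypothetical helper: same -bxor scheme as in the comparer class, strings only.
function Get-XorHash {
    param([string[]] $Element)
    [int] $HashCode = 0
    foreach ($El in $Element) {
        $HashCode = $HashCode -bxor $El.ToLowerInvariant().GetHashCode()
    }
    $HashCode
}

$Forward  = Get-XorHash 'James', 'Cook'
$Reversed = Get-XorHash 'Cook', 'James'

# XOR is commutative, so element order has no effect on the result.
$Forward -eq $Reversed

# An auxiliary tuple takes the order into account; the two hash codes are
# expected to differ in practice, although that is not strictly guaranteed.
$TupleForward  = [Tuple[string,string]]::new('james', 'cook').GetHashCode()
$TupleReversed = [Tuple[string,string]]::new('cook', 'james').GetHashCode()
$TupleForward -eq $TupleReversed
```

Equal hash codes for reordered arrays do not break correctness, because Equals() still tells such arrays apart; they only cost lookup efficiency when such keys actually occur.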

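Below is zett42's collision test code (adapted), which determines how many among 1000 arrays with a given number of elements that are random string values result in the same hash code, i.e. produce collisions, and calculates a collision percentage from that. If you need to improve the efficiency of the implementation above, you can use this code to test it.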

# Create an instance of the custom comparer defined above.
$cmp = [CaseInsensitiveArrayEqualityComparer]::new()

$numArrays = 1000
foreach ($elementCount in 2..5 + 10) {

  $numUniqueHashes = (
      1..$numArrays | 
      ForEach-Object { 
        $cmp.GetHashCode(@(1..$elementCount | ForEach-Object { "$(New-Guid)" })) 
      } |
      Sort-Object -Unique
    ).Count

  [pscustomobject] @{
    ElementCount = $elementCount
    CollisionPercentage = '{0:P2}' -f (($numArrays - $numUniqueHashes) / $numArrays)
  }

}

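The code above produces results for arrays with 2 through 5 elements, as well as with 10 elements; the specific percentages vary from run to run, given the randomness of the string values.

It seems that the -bxor approach is sufficient to prevent collisions, at least with random strings and without including variations of arrays that differ in element order only.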


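Superior custom equality comparer implementation in .NET 7+ (requires at least a preview version of PowerShell 7.3):

zett42 points out that [HashCode]::Combine(), available in .NET 7+, allows for a more efficient implementation, because it:

  • is order-aware
  • allows determining a hash code for multiple values in a single operation.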

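Note:

  • The method is limited to at most 8 array elements - but for multi-component keys that should be enough.

  • The values to combine (the array elements in this case) must be passed as individual arguments to the method - passing the array as a whole doesn't work as intended. This makes the implementation somewhat cumbersome.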

# .NET 7+ / PowerShell 7.3+
# Case-insensitive IEqualityComparer implementation for arrays
# using [HashCode]::Combine() - limited to 8 elements.
class CaseInsensitiveArrayEqualityComparer: System.Collections.IEqualityComparer {
  [bool] Equals([object] $o1, [object] $o2) {
    if ($o1 -isnot [array] -or $o2 -isnot [array]) { return $false }
    return ([System.Collections.IStructuralEquatable] $o1).Equals($o2, [System.StringComparer]::InvariantCultureIgnoreCase)
  }
  [int] GetHashCode([object] $o) {
    if ($o -isnot [Array] -or 0 -eq $o.Count) { return $o.GetHashCode() }
    $o = $o.ForEach({ $_ -is [string] ? $_.ToLowerInvariant() : $_ })
    $hashCode = switch ($o.Count) {
      1 { [HashCode]::Combine($o[0]) }
      2 { [HashCode]::Combine($o[0], $o[1]) }
      3 { [HashCode]::Combine($o[0], $o[1], $o[2]) }
      4 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3]) }
      5 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4]) }
      6 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4], $o[5]) }
      7 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4], $o[5], $o[6]) }
      8 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4], $o[5], $o[6], $o[7]) }
      default { throw 'Not implemented for more than 8 array elements.' }
    }
    return $hashCode
  }
}
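
The two notes on [HashCode]::Combine() can be explored with a short sketch. The variable names are illustrative, and it assumes a PowerShell version whose runtime provides [System.HashCode]:

```powershell
# Same values in the same order yield the same hash code within one session.
$Forward = [System.HashCode]::Combine('james', 'cook')
$Again   = [System.HashCode]::Combine('james', 'cook')
$Forward -eq $Again

# Order-aware: swapping the values is expected to change the hash code
# (not strictly guaranteed, since hash codes can collide).
$Reversed = [System.HashCode]::Combine('cook', 'james')
$Forward -eq $Reversed

# Passing the array as a whole hashes the array object itself rather than its
# elements, so this is not expected to match $Forward.
$AsWhole = [System.HashCode]::Combine(@('james', 'cook'))
$AsWhole -eq $Forward
```

This is why the class above spells out one Combine() call per element count instead of handing over the array.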
