Background

In PowerShell, it is quite common to build a hash table for quick access to objects by a specific property, e.g. to base the index on LastName:

$List =  ConvertFrom-Csv @'
Id, LastName, FirstName, Country
 1, Aerts,    Ronald,    Belgium
 2, Berg,     Ashly,     Germany
 3, Cook,     James,     England
 4, Duval,    Frank,     France
 5, Lyberg,   Ash,       England
 6, Fischer,  Adam,      Germany
'@

$Index = @{}
$List |ForEach-Object { $Index[$_.LastName] = $_ }

$Index.Cook

Id LastName FirstName Country
-- -------- --------- -------
3  Cook     James     England

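In some cases, it is required to index on two (or even more) properties, e.g. the FirstName and the LastName. For this, you might create a multi-dimensional key, e.g.: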

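$Index = @{}
$List |ForEach-Object {
     # Create the inner hashtable only once per first name, so that persons
     # sharing a first name don't overwrite each other's entries.
     if (-not $Index.ContainsKey($_.FirstName)) { $Index[$_.FirstName] = @{} }
     $Index[$_.FirstName][$_.LastName] = $_
}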

$Index.James.Cook

Id LastName FirstName Country
-- -------- --------- -------
3  Cook     James     England

But it is easier (and possibly even faster) to just concatenate the two properties, if only for checking the existence of an entry: with nested hashtables, $Index['James'].ContainsKey('Cook') fails with an error if the FirstName doesn't exist.
To join the properties, a delimiter between them is required; otherwise, different property lists might end up as the same key, as in this example: AshlyBerg and AshLyberg.

$Index = @{}
$List |ForEach-Object { $Index["$($_.FirstName)`t$($_.LastName)"] = $_ }

$Index."James`tCook"

Id LastName FirstName Country
-- -------- --------- -------
3  Cook     James     England
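To illustrate the point about delimiters, here is a minimal sketch using the two rows mentioned above. It relies on PowerShell hashtables comparing string keys case-insensitively by default, which is why AshlyBerg and AshLyberg end up as the same key when no delimiter is used:

```powershell
# The two sample rows whose names collide when joined without a delimiter.
$rows = @(
  [pscustomobject] @{ Id = 2; FirstName = 'Ashly'; LastName = 'Berg' }
  [pscustomobject] @{ Id = 5; FirstName = 'Ash';   LastName = 'Lyberg' }
)

# Without a delimiter: 'AshlyBerg' and 'AshLyberg' are the same key in a
# (case-insensitive) PowerShell hashtable, so the second row overwrites the first.
$plain = @{}
$rows.ForEach({ $plain["$($_.FirstName)$($_.LastName)"] = $_ })
"Keys without delimiter: $($plain.Count)"

# With a tab delimiter, the keys stay distinct.
$delimited = @{}
$rows.ForEach({ $delimited["$($_.FirstName)`t$($_.LastName)"] = $_ })
"Keys with delimiter: $($delimited.Count)"
```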

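Note: the above are Minimal, Reproducible Examples. In real life, I have come across the questions below several times, including when joining objects in general, where the background and the number of properties used in the index are variable.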

Questions:

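  1. Is it a good practice to join (concatenate) properties for such situations?
  2. If yes, is there a (standard?) delimiter for this? (meaning a character or character sequence that should not be used in, or exist in, a property name)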

Answer

There is no built-in separator for multi-component hashtable (dictionary) keys.

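As for a custom separator: your best bet for a character that is unlikely to occur in the components themselves is NUL (the character with code point 0x0), which you can represent as "`0" in PowerShell. However, performing convention-based string operations on every lookup is awkward (e.g. $Index."James`0Cook"), and it generally only works if stringifying the key components is feasible, or if they are all strings to begin with, as in your example.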
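A minimal sketch of the NUL-separator approach for a variable number of properties follows. Get-CompositeKey is a hypothetical helper name, and the sketch assumes that all property values stringify meaningfully:

```powershell
# Hypothetical helper: joins the values of the given properties of an object with NUL ("`0").
# Assumes each property value can be meaningfully stringified.
function Get-CompositeKey {
  param(
    [Parameter(Mandatory)] [object] $InputObject,
    [Parameter(Mandatory)] [string[]] $Property
  )
  # -join stringifies each property value before joining.
  $Property.ForEach({ $InputObject.$_ }) -join "`0"
}

$people = @(
  [pscustomobject] @{ Id = 2; FirstName = 'Ashly'; LastName = 'Berg' }
  [pscustomobject] @{ Id = 3; FirstName = 'James'; LastName = 'Cook' }
  [pscustomobject] @{ Id = 5; FirstName = 'Ash';   LastName = 'Lyberg' }
)

$index = @{}
foreach ($p in $people) {
  $index[(Get-CompositeKey -InputObject $p -Property FirstName, LastName)] = $p
}

# Lookups must spell out the same NUL-separated convention.
$index."James`0Cook".Id
```

Because the key is still a string, the hashtable's usual case-insensitive lookup applies.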

Using arrays for multi-component keys is syntactically preferable, but using collections generally does not work as-is, because .NET reference types in general do not compare distinct instances meaningfully, even if they happen to represent the same data; see this answer.
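The following minimal sketch shows the problem: a regular hashtable compares array keys by reference, so an array with identical elements but a different identity is not found:

```powershell
# A regular hashtable falls back to the keys' own equality logic,
# which for arrays is reference equality.
$h = @{}
$key = [object[]] ('James', 'Cook')
$h.Add($key, 'James Cook')

# Same array instance: found.
$h.ContainsKey($key)
# Equal elements, but a different array instance: not found.
$h.ContainsKey([object[]] ('James', 'Cook'))
```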

  • Note: The following assumes that the elements of collections serving as keys do compare meaningfully (are themselves strings or .NET value types, or .NET reference types with custom equality logic). If that assumption doesn't hold, there's no robust general solution, but a best-effort approach based on CLIXML serialization, shown in the linked answer, may work; you yourself have proposed it.

zett42's helpful answer uses tuples, which do perform meaningful comparisons of distinct instances whose members contain equal data.
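A minimal sketch of the tuple idea (not zett42's exact code), using [Tuple]::Create() to build two-component keys:

```powershell
$people = @(
  [pscustomobject] @{ Id = 3; FirstName = 'James'; LastName = 'Cook' }
  [pscustomobject] @{ Id = 5; FirstName = 'Ash';   LastName = 'Lyberg' }
)

# Tuples compare by content, so a separately created tuple with
# equal components locates the same entry.
$index = @{}
foreach ($p in $people) {
  $index[[Tuple]::Create($p.FirstName, $p.LastName)] = $p
}

$index[[Tuple]::Create('James', 'Cook')].Id

# String components are compared case-sensitively.
$index.ContainsKey([Tuple]::Create('james', 'cook'))
```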

There is a way to make regular PowerShell arrays work as hashtable keys, in a manner that only adds complexity to creating the hashtable (calling a constructor), while allowing regular array syntax for additions / updates and lookups (e.g., $Index.('James', 'Cook')).

  • Note: The following works equally with [ordered] hashtables, which, however, must be referred to by their true type name so as to be able to call a constructor, namely [System.Collections.Specialized.OrderedDictionary].
    However, it does not work with generic dictionaries ([System.Collections.Generic.Dictionary[TKey, TValue]]).

# Sample objects for the hashtable.
$list =  ConvertFrom-Csv @'
Id, LastName, FirstName, Country
 1, Aerts,    Ronald,    Belgium
 2, Berg,     Ashly,     Germany
 3, Cook,     James,     England
 4, Duval,    Frank,     France
 5, Lyberg,   Ash,       England
 6, Fischer,  Adam,      Germany
'@

# Initialize the hashtable with a structural equality comparer, i.e.
# a comparer that compares the *elements* of the array and only returns $true
# if *all* compare equal.
# This relies on the fact that [System.Array] implements the
# [System.Collections.IStructuralEquatable] interface.
$dict = [hashtable]::new([Collections.StructuralComparisons]::StructuralEqualityComparer)

# Add entries that map the combination of first name and last name 
# to each object in $list.
# Note the regular array syntax.
$list.ForEach({ $dict.($_.FirstName, $_.LastName) = $_ })

# Use regular array syntax for lookups too.
# Note: CASE MATTERS
$dict.('James', 'Cook')
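Per the note above, the same comparer can be passed to an ordered dictionary's constructor when insertion order matters. A minimal sketch, using the .Add() and .Contains() methods to sidestep PowerShell's index syntax, which treats an array inside [...] as a list of separate keys:

```powershell
# An ordered dictionary initialized with the structural equality comparer.
$dict = [System.Collections.Specialized.OrderedDictionary]::new([System.Collections.StructuralComparisons]::StructuralEqualityComparer)

# .Add() receives each array as a single key object.
$dict.Add(@('James', 'Cook'), 'James Cook')
$dict.Add(@('Ash', 'Lyberg'), 'Ash Lyberg')

# A new array instance with equal elements is found...
$dict.Contains(@('James', 'Cook'))
# ...but, as with the hashtable above, case matters.
$dict.Contains(@('james', 'cook'))

# Values are enumerated in insertion order.
@($dict.Values)[0]
```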

Important: Unlike regular PowerShell hashtables, the above performs case-SENSITIVE comparisons (as zett42's tuple solution does).

Making the comparisons case-INSENSITIVE requires more work, because it requires a custom implementation of the [System.Collections.IEqualityComparer] interface, namely a case-insensitive implementation of what [System.Collections.StructuralComparisons]::StructuralEqualityComparer provides:

# Case-insensitive IEqualityComparer implementation for arrays.
# See the bottom section of this answer for a better .NET 7+ alternative.
class CaseInsensitiveArrayEqualityComparer: System.Collections.IEqualityComparer {
  [bool] Equals([object] $o1, [object] $o2) {
    if ($o1 -isnot [array] -or $o2 -isnot [array]) { return $false }
    return ([System.Collections.IStructuralEquatable] $o1).Equals($o2, [System.StringComparer]::InvariantCultureIgnoreCase)
  }
  [int] GetHashCode([object] $o) {
    if ($o -isnot [Array]) { return $o.GetHashCode() }
    [int] $hashCode = 0
    foreach ($el in $o) {
      if ($null -eq $el) { 
        continue
      } elseif ($el -is [string]) {
        $hashCode = $hashCode -bxor $el.ToLowerInvariant().GetHashCode()
      } else {
        $hashCode = $hashCode -bxor $el.GetHashCode()
      }
    }
    return $hashCode
  }
}

$list =  ConvertFrom-Csv @'
Id, LastName, FirstName, Country
 1, Aerts,    Ronald,    Belgium
 2, Berg,     Ashly,     Germany
 3, Cook,     James,     England
 4, Duval,    Frank,     France
 5, Lyberg,   Ash,       England
 6, Fischer,  Adam,      Germany
'@

# Pass the custom IEqualityComparer to the constructor.
$dict = [hashtable]::new([CaseInsensitiveArrayEqualityComparer]::new())

$list.ForEach({ $dict.($_.FirstName, $_.LastName) = $_ })

# Now, case does NOT matter.
$dict.('james', 'cook')

Notes on the .GetHashCode() implementation in the custom comparer class above:

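  • A custom .GetHashCode() implementation is required to return the same hash code (an [int] value) for all objects that compare as equal (that is, if $o1 -eq $o2 is $true, $o1.GetHashCode() and $o2.GetHashCode() must return the same value).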

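  • While hash codes aren't required to be unique (and can't be in all cases), ideally as few objects as possible share the same hash code, as that reduces the number of so-called collisions, which decrease a hash table's lookup efficiency; see the relevant Wikipedia article for background information.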

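  • The implementation above uses a fairly simple -bxor-based (bitwise XOR) algorithm, which results in the same hash code for two arrays that have the same elements, but in different order.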

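    • The .GetHashCode() help topic shows more sophisticated approaches, including using an auxiliary tuple instance, as its hash code algorithm is order-aware. While simple, this approach is computationally expensive, and more work is needed for better performance. See the bottom section for a .NET 7+ option.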
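The order-insensitivity of the -bxor approach follows directly from XOR being commutative. This sketch isolates the -bxor loop from the comparer's .GetHashCode() method (Get-XorHash is a hypothetical name, and it assumes string elements only):

```powershell
# Hypothetical stand-alone version of the comparer's -bxor loop (string elements only).
function Get-XorHash([object[]] $Elements) {
  [int] $hashCode = 0
  foreach ($el in $Elements) {
    $hashCode = $hashCode -bxor $el.ToLowerInvariant().GetHashCode()
  }
  $hashCode
}

$a = Get-XorHash 'James', 'Cook'
$b = Get-XorHash 'Cook', 'James'

# XOR is commutative, so element order does not affect the result.
$a -eq $b
```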

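The following code, adapted from zett42's collision test code, determines how many among 1,000 arrays with a given number of elements that are random string values result in the same hash code, i.e. produce collisions, and calculates a collision percentage from that. If you need to improve the efficiency of the implementation above, you can use this code to test it.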

# Create an instance of the custom comparer defined above.
$cmp = [CaseInsensitiveArrayEqualityComparer]::new()

$numArrays = 1000
foreach ($elementCount in 2..5 + 10) {

  $numUniqueHashes = (
      1..$numArrays | 
      ForEach-Object { 
        $cmp.GetHashCode(@(1..$elementCount | ForEach-Object { "$(New-Guid)" })) 
      } |
      Sort-Object -Unique
    ).Count

  [pscustomobject] @{
    ElementCount = $elementCount
    CollisionPercentage = '{0:P2}' -f (($numArrays - $numUniqueHashes) / $numArrays)
  }

}

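Running this code against the implementation above, for arrays with 2 to 5 elements as well as 10 elements, yields sample results whose specific percentages vary due to the randomness of the string values.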

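It seems that the -bxor approach is sufficient to prevent collisions, at least with random strings and not accounting for variations of arrays that differ in element order only.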


Superior custom equality comparer implementation in .NET 7+ (requires at least a preview version of PowerShell 7.3):

As zett42 points out, [HashCode]::Combine(), available in .NET 7+, allows for a more efficient implementation, because it:

  • is order-aware
  • allows determining a hash code for multiple values in a single operation.
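In contrast to the -bxor approach, a quick check of the order-awareness, assuming a PowerShell version in which [HashCode] is available, as this section requires:

```powershell
# [HashCode]::Combine() mixes its arguments in sequence, so swapping them
# (barring an extremely unlikely collision) produces a different hash code.
$forward  = [HashCode]::Combine('james', 'cook')
$reversed = [HashCode]::Combine('cook', 'james')
$forward -ne $reversed

# Within one process, the same values in the same order yield the same hash code.
[HashCode]::Combine('james', 'cook') -eq $forward
```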

Notes:

  • This method is limited to at most 8 array elements, but for multi-component keys that should be enough.

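  • The values to combine (the array elements, in this case) must be passed to the method as individual arguments; passing the array as a whole doesn't work as intended. This makes the implementation somewhat cumbersome.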

# .NET 7+ / PowerShell 7.3+
# Case-insensitive IEqualityComparer implementation for arrays
# using [HashCode]::Combine() - limited to 8 elements.
class CaseInsensitiveArrayEqualityComparer: System.Collections.IEqualityComparer {
  [bool] Equals([object] $o1, [object] $o2) {
    if ($o1 -isnot [array] -or $o2 -isnot [array]) { return $false }
    return ([System.Collections.IStructuralEquatable] $o1).Equals($o2, [System.StringComparer]::InvariantCultureIgnoreCase)
  }
  [int] GetHashCode([object] $o) {
    if ($o -isnot [Array] -or 0 -eq $o.Count) { return $o.GetHashCode() }
    $o = $o.ForEach({ $_ -is [string] ? $_.ToLowerInvariant() : $_ })
    $hashCode = switch ($o.Count) {
      1 { [HashCode]::Combine($o[0]) }
      2 { [HashCode]::Combine($o[0], $o[1]) }
      3 { [HashCode]::Combine($o[0], $o[1], $o[2]) }
      4 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3]) }
      5 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4]) }
      6 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4], $o[5]) }
      7 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4], $o[5], $o[6]) }
      8 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4], $o[5], $o[6], $o[7]) }
      default { throw 'Not implemented for more than 8 array elements.' }
    }
    return $hashCode
  }
}
