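I am using Apache POI to read data from an Excel file and convert it into a list of objects. Now I want to extract any duplicates, identified by the rules below, into a second list of the same type, and also get the list of non-duplicates.
Conditions for identifying a duplicate:
- Name
- Email
- Phone number
- GST number
A match on any one of these attributes makes a record a duplicate. In other words, the conditions are combined with OR, not AND.
Party class: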
public class Party {
private String name;
private Long number;
private String email;
private String address;
private BigDecimal openingBalance;
private LocalDateTime openingDate;
private String gstNumber;
// Getter Setter Skipped
}
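To make the OR rule concrete, here is a minimal sketch of a pairwise check. `PartyKeys` and `isDuplicate` are hypothetical names introduced only for illustration; `PartyKeys` is a reduced stand-in for `Party` that holds just the four fields listed above. Note that `Objects.equals` treats two `null` values as equal, which may or may not be what is wanted for blank cells.

```java
import java.util.Objects;

public class DuplicateRule {

    // Hypothetical reduced view of Party: only the four fields used for matching.
    record PartyKeys(String name, String email, Long number, String gstNumber) {}

    // Two records are duplicates if ANY one of the four fields matches (OR, not AND).
    static boolean isDuplicate(PartyKeys a, PartyKeys b) {
        return Objects.equals(a.name(), b.name())
                || Objects.equals(a.email(), b.email())
                || Objects.equals(a.number(), b.number())
                || Objects.equals(a.gstNumber(), b.gstNumber());
    }

    public static void main(String[] args) {
        var first = new PartyKeys("Valid Party", "Valid", 1234567890L, "Valid");
        var second = new PartyKeys("Valid Party", "Valid Email", 7593612247L, "Valid GST");
        // Same name, everything else different: still a duplicate under the OR rule.
        System.out.println(isDuplicate(first, second));
    }
}
```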
Assume this is the list returned by the data-processing logic so far:
var firstParty = new Party();
firstParty.setName("Valid Party");
firstParty.setAddress("Valid");
firstParty.setEmail("Valid");
firstParty.setGstNumber("Valid");
firstParty.setNumber(1234567890L);
firstParty.setOpeningBalance(BigDecimal.ZERO);
firstParty.setOpeningDate(DateUtil.getDDMMDateFromString("01/01/2020"));
var secondParty = new Party();
secondParty.setName("Valid Party");
secondParty.setAddress("Valid Address");
secondParty.setEmail("Valid Email");
secondParty.setGstNumber("Valid GST");
secondParty.setNumber(7593612247L);
secondParty.setOpeningBalance(BigDecimal.ZERO);
secondParty.setOpeningDate(DateUtil.getDDMMDateFromString("01/01/2020"));
var thirdParty = new Party();
thirdParty.setName("Valid Party 1");
thirdParty.setAddress("address");
thirdParty.setEmail("email");
thirdParty.setGstNumber("gst");
thirdParty.setNumber(7593612888L);
thirdParty.setOpeningBalance(BigDecimal.ZERO);
thirdParty.setOpeningDate(DateUtil.getDDMMDateFromString("01/01/2020"));
var validParties = List.of(firstParty, secondParty, thirdParty);
What I have tried so far:
var partyNameOccurrenceMap = validParties.parallelStream()
.map(Party::getName)
.collect(Collectors.groupingBy(Function.identity(), HashMap::new, Collectors.counting()));
var partyNameOccurrenceMapCopy = SerializationUtils.clone(partyNameOccurrenceMap);
var duplicateParties = validParties.stream()
.filter(party-> {
var occurrence = partyNameOccurrenceMap.get(party.getName());
if (occurrence > 1) {
partyNameOccurrenceMap.put(party.getName(), occurrence - 1);
return true;
}
return false;
})
.toList();
var nonDuplicateParties = validParties.stream()
.filter(party -> {
var occurrence = partyNameOccurrenceMapCopy.get(party.getName());
if (occurrence > 1) {
partyNameOccurrenceMapCopy.put(party.getName(), occurrence - 1);
return false;
}
return true;
})
.toList();
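The code above only checks the party name, but we also need to check email, phone number, and GST number.
The code above works, but I am concerned about its readability, conciseness, and performance, because the dataset can be fairly large, for example 10k rows in an Excel file.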
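One possible direction, offered as a minimal sketch rather than a definitive solution: keep one `HashSet` per key field and split the list in a single pass, which is roughly linear in the number of rows. The names `DuplicateSplitter`, `Split`, and `split` are hypothetical, and the nested `Party` record is a reduced stand-in for the real class. The sketch assumes the four fields are non-null and that the first occurrence is kept while later matches are flagged; that differs from the code above, which flags the earlier occurrences and keeps the last one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateSplitter {

    // Hypothetical stand-in for Party, reduced to the four fields used for matching.
    record Party(String name, String email, Long number, String gstNumber) {}

    // Result holder: rows kept as unique, and rows flagged as duplicates.
    record Split(List<Party> nonDuplicates, List<Party> duplicates) {}

    static Split split(List<Party> parties) {
        Set<String> names = new HashSet<>();
        Set<String> emails = new HashSet<>();
        Set<Long> numbers = new HashSet<>();
        Set<String> gstNumbers = new HashSet<>();
        List<Party> unique = new ArrayList<>();
        List<Party> duplicates = new ArrayList<>();
        for (Party p : parties) {
            // Set.add returns false when the value was already present.
            // The non-short-circuit | ensures every key is recorded even after a match.
            boolean seenBefore = !names.add(p.name())
                    | !emails.add(p.email())
                    | !numbers.add(p.number())
                    | !gstNumbers.add(p.gstNumber());
            (seenBefore ? duplicates : unique).add(p);
        }
        return new Split(unique, duplicates);
    }

    public static void main(String[] args) {
        var parties = List.of(
                new Party("Valid Party", "Valid", 1234567890L, "Valid"),
                new Party("Valid Party", "Valid Email", 7593612247L, "Valid GST"),
                new Party("Valid Party 1", "email", 7593612888L, "gst"));
        var result = split(parties);
        System.out.println("non-duplicates: " + result.nonDuplicates());
        System.out.println("duplicates: " + result.duplicates());
    }
}
```

With the sample data from the question, the second party shares its name with the first and so lands in the duplicates list. Because the keys of flagged rows are also recorded, a later row that matches only a flagged row is flagged too; whether that is desired depends on the business rule.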