I have the following column in a dataset (sample exported as a dictionary):

'amenities': {1913: '[Cooking basics, Hair dryer, Fire extinguisher, Microwave, Refrigerator, Dedicated workspace, Pocket wifi, Lock on bedroom door, Dishes and silverware, Private living room, Essentials, Free street parking, Oven, Paid parking off premises, Hot water, Paid washer  In building, Extra pillows and blankets, Hangers, Backyard, Shampoo, TV with standard cable, Stove, Heating, First aid kit, Self check-in, Iron, Smoke alarm, Lockbox, Bed linens, Kitchen, Patio or balcony, Coffee maker, Crib, Wifi]',
  11765: '[Cooking basics, Hair dryer, Courtyard view, Coffee maker: drip coffee maker, Standalone high chair - available upon request, Outdoor furniture, Fire extinguisher, Coffee, Microwave, Shared backyard  Fully fenced, Refrigerator, Dishes and silverware, TV, Essentials, Free washer  In unit, Heating - split type ductless system, Oven, Hot water, Extra pillows and blankets, Toaster, Hangers, Long term stays allowed, Trash compactor, Shower gel, Private patio or balcony, Clothing storage: closet, Cleaning products, Drying rack for clothing, City skyline view, Shampoo, Central air conditioning, Garden view, Freezer, Hot water kettle, Host greets you, Dishwasher, Babysitter recommendations, Wifi, Elevator, Private entrance, First aid kit, Baby bath - available upon request, Safe, AC - split type ductless system, Room-darkening shades, Iron, Smoke alarm, Bed linens, Wine glasses, Kitchen, Body soap, Dining table, Sun loungers, Crib, Stainless steel electric stove, Conditioner, Pack n play/Travel crib - available upon request, Free parking on premises]',
  9320: '[Air conditioning, Free street parking, Fire extinguisher, Dedicated workspace, First aid kit, Paid street parking off premises]'}

What I am trying to do is clean this column using a dictionary I created by hand (see the sample below; the full one has more than 150 entries).

{'Silver refrigerator': 'Refrigerator',
 'Electronia refrigerator': 'Refrigerator',
 'Kunft refrigerator': 'Refrigerator',
 'com zona de congelao refrigerator': 'Refrigerator',
 'BEKO  refrigerator': 'Refrigerator',
 'Teka  refrigerator': 'Refrigerator',
 'Desconhecida refrigerator': 'Refrigerator',
 'Pequeno com espao de congelao refrigerator': 'Refrigerator',
 'SMEG  refrigerator': 'Refrigerator',
 'Indiferente refrigerator': 'Refrigerator',
 'ORIMA refrigerator': 'Refrigerator',
 'Hotpoint refrigerator': 'Refrigerator',
 'JOCEL refrigerator': 'Refrigerator',
 'Frigorico com congelador de encastre - BALAY refrigerator': 'Refrigerator',
 'Grote Koelkast refrigerator': 'Refrigerator',
 'SMEG refrigerator': 'Refrigerator',
 'Samsung refrigerator': 'Refrigerator',
 'Americano refrigerator': 'Refrigerator',
 'Candy  refrigerator': 'Refrigerator',
 'Bosch refrigerator': 'Refrigerator',
 'Lg  refrigerator': 'Refrigerator',
 'Resort access': 'Resort access'}

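Specifically, I am trying to check whether a dictionary key appears in the list and, if it does, replace it with the corresponding dictionary value.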

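I wrote the following function, but it does not work. The output is a list of lists in which each element is just a single letter. When I run the same function on a simple example, it works fine. What am I doing wrong?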

def clean_words(word_list, replacement_dict):
    cleaned_words = [replacement_dict.get(word, word) for word in word_list]
    return cleaned_words

df['amenities'] = df['amenities'].apply(clean_words, replacement_dict=replacement_dict)

Recommended answer

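The problem is that each cell holds a string, not a list, so iterating over it yields individual characters. Unfortunately, because the inner strings are not quoted, the value is not a valid Python list representation, and you cannot parse it with ast.literal_eval.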
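To make the diagnosis concrete, here is a minimal sketch using a shortened, made-up cell value in the same shape as the sample data. The variable names `raw`, `chars`, and `parsed` are only for illustration.

```python
import ast

# A shortened cell value in the same shape as the 'amenities' column
raw = '[Air conditioning, Free street parking]'

# Iterating over a str yields one character at a time,
# which is why the original function returned single letters
chars = [c for c in raw]
print(chars[:4])  # ['[', 'A', 'i', 'r']

# The items are not quoted, so this is not a valid Python literal
try:
    ast.literal_eval(raw)
    parsed = True
except (ValueError, SyntaxError):
    parsed = False
print(parsed)  # False
```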

One option is to split the string yourself:

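def clean_words(word_list, replacement_dict):
    # word_list is a string such as '[a, b, c]': drop the brackets and split on ', '
    items = word_list[1:-1].split(', ')
    # Replace each item that has a dictionary entry, then rebuild the bracketed string
    return '[%s]' % ', '.join(replacement_dict.get(word, word) for word in items)

df['amenities'] = df['amenities'].apply(clean_words, replacement_dict=replacement_dict)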
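If you would rather end up with real Python lists than bracketed strings, a variant along these lines may help. It is only a sketch: the name `clean_amenities` and the two-entry dictionary are hypothetical, and it assumes no amenity name itself contains the `', '` separator.

```python
def clean_amenities(raw, replacement_dict):
    """Parse a '[a, b, c]' string into a list and normalize each item."""
    # Drop the surrounding brackets, then split on the ', ' separator
    items = raw.strip()[1:-1].split(', ')
    # Strip stray whitespace, skip empty items, and fall back to the
    # original item when the dictionary has no entry for it
    return [replacement_dict.get(item.strip(), item.strip())
            for item in items if item.strip()]

# Hypothetical two-entry dictionary in the same shape as the question's
replacement_dict = {'Samsung refrigerator': 'Refrigerator',
                    'Bosch refrigerator': 'Refrigerator'}

sample = '[Wifi, Samsung refrigerator, Kitchen]'
result = clean_amenities(sample, replacement_dict)
print(result)  # ['Wifi', 'Refrigerator', 'Kitchen']
```

It can be applied to the column the same way as before, with df['amenities'].apply(clean_amenities, replacement_dict=replacement_dict). Note that lookups are exact matches, so keys with double spaces (such as 'BEKO  refrigerator') only match items spelled identically.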
