I have problems filtering numeric data in pandas.
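I have 10,000 rows of data, and I need to filter them to keep the rows where the value in column 3 is greater than 10.
The data in column 3 has dtype object, and the cells in that column contain 3 kinds of data: a dot (no value), 12.25 (a single value), or 12,45,12.5 (multiple values separated by commas).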
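I have tried:
- using the str methods to split the values on ','
- filtering the split values for those greater than 10
- then using df.loc to filter the main dataframe with the filtered column (condition: values from the filtered column == the same column in the main dataframe)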
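The steps above can be sketched roughly as follows. This is only a minimal sketch under some assumptions: the small dataframe is a hypothetical subset of the real data, the column is named CADD_phred as in the data sample, and a row is kept if any of its comma-separated values is greater than 10 (whether "any" or "all" is wanted is an assumption here). Instead of comparing the filtered column back against the main dataframe with ==, it builds a boolean mask on the original index.

```python
import pandas as pd

# Hypothetical miniature of the real data (a few rows from the sample).
df = pd.DataFrame({
    "POS": [20482821, 20482980, 20485630, 109274570, 197043092],
    "CADD_phred": [14.27, ".", "20.1,19.64", 1.551, 10.3],
})

# Cast everything to str, split on ',', and put each value on its own row.
# explode() repeats the original index label for every split value.
values = df["CADD_phred"].astype(str).str.split(",").explode()

# Convert to numbers; '.' cannot be parsed and becomes NaN.
values = pd.to_numeric(values, errors="coerce")

# Assumption: keep a row if ANY of its values is greater than 10.
# NaN > 10 evaluates to False, so rows containing only '.' are dropped.
mask = values.gt(10).groupby(level=0).any()

# The mask has one entry per original row, so it can index df directly.
filtered = df.loc[mask]
print(filtered)
```

If every value in a cell has to exceed 10, swapping .any() for .all() should give that behaviour, although a cell containing only '.' would then still be excluded because NaN never compares greater than 10.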
#Data sample
{'POS': {0: 20482821,
1: 20482980,
2: 20483463,
3: 20485526,
4: 20485536,
5: 20485630,
6: 20485811,
7: 20485948,
8: 109274570,
9: 109274623,
10: 109274677,
11: 109274857,
12: 109274968,
13: 109275216,
14: 109275325,
15: 109275506,
16: 109275536,
17: 109275600,
18: 109275641,
19: 109275648,
20: 109275684,
21: 197042891,
22: 197042926,
23: 197043092,
24: 197043111},
'CHROM': {0: 'chr1',
1: 'chr1',
2: 'chr1',
3: 'chr1',
4: 'chr1',
5: 'chr1',
6: 'chr1',
7: 'chr1',
8: 'chr1',
9: 'chr1',
10: 'chr1',
11: 'chr1',
12: 'chr1',
13: 'chr1',
14: 'chr1',
15: 'chr1',
16: 'chr1',
17: 'chr1',
18: 'chr1',
19: 'chr1',
20: 'chr1',
21: 'chr3',
22: 'chr3',
23: 'chr3',
24: 'chr3'},
'CADD_phred': {0: 14.27,
1: '.',
2: '.',
3: 17.1,
4: 17.61,
5: '20.1,19.64',
6: 15.99,
7: 15.95,
8: 1.551,
9: 5.142,
10: 14.05,
11: 6.579,
12: 1.225,
13: 14.38,
14: 5.841,
15: 3.85,
16: 4.373,
17: '.',
18: 16.95,
19: 16.94,
20: 3.067,
21: '.',
22: 5.925,
23: 10.3,
24: 9.495}}