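I am trying to parse data from a .txt file whose format is not easy to handle. The delimiter is the space character. The file contains one variable-length field, which is the fifth column (counting from the left). I therefore parse the first four columns from the left, and then parse from the right until I reach the variable-length field. That part works. My main problem is that some fields are sometimes completely empty; see the third row, where the fourth column is blank. As a result, my code cannot parse the data accurately, and in the output file the parsing is wrong for some rows. Is it possible to skip empty fields in such a way that sscanf still recognizes them? A hint on how to parse this data correctly would be much appreciated. The code is on OnlineGDB: https://onlinegdb.com/8rOBlIfMhU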


#include <stdio.h>
#include <string.h>

#define BUF 1500

// reverse a string
char *strrev(char *str)
{
      char *p1, *p2;

      if (! str || ! *str)
            return str;
      for (p1 = str, p2 = str + strlen(str) - 1; p2 > p1; ++p1, --p2)
      {
            *p1 ^= *p2;
            *p2 ^= *p1;
            *p1 ^= *p2;
      }
      return str;
}

int main()
{
    FILE *ptr=fopen("/tmp/abc123", "w");
    fputs(
   "10000   07/01/1986   68391610   68391610   OPTIMUM MANUFACTURING INC             OMFGA          7952    10                 10396      3       3         3990       3990     39       399    03/12/1986   OMFGA                     Q          A         R       -2.56250         1000         .             2.75000        2.37500        .             .             .       C             C           3680       2    30/01/1986      .          .            .         .              .                 .       .          .             .          .             .             .        1.00000     1.00             .            .            .        .        .      1        1        9       2       0.013809     0.013800     0.011061     0.011046     0.014954\n"
   "12781   30/11/1970   84857L10   50558810   LACLEDE GAS CO                        LG            21080    11                     0      1       1         4925       2741     27       274             .                             N          A         R       25.00000         3500         .            25.00000       24.00000        .             .             .      0.041667      0.041667     4141       .             .      .          .            .         .              .                 .       .          .             .          .             .             .        4.00000     4.0              .            .            .        .        .      .        .        .       .       0.016698     0.016439     0.021276     0.020949     0.014779\n"
   "13901   27/05/1955   02209S10              PHILIP MORRIS & CO LTD                              21398    11                     0      1       1         2110       2111     21       211             .                             N          A         R       42.00000         4400       40.87500       42.00000       40.87500        .             .             .      0.030675      0.030675      2887       .             .      .          .            .         .              .                 .       .          .             .          .             .             .        2479.29   576.000            .            .            .        .        .      .        .        .       .       0.001626     0.001543     0.001477     0.001381      .\n"    
   "13901   31/05/1955   02209S10              PHILIP MORRIS & CO LTD                              21398    11                     0      1       1         2110       2111     21       211             .                             N          A         R       41.37500         5600       42.12500       42.12500       41.00000        .             .             .     -0.014881     -0.014881      2887       .             .      .          .            .         .              .                 .       .          .             .          .             .             .        2479.29   576.000            .            .            .        .        .      .        .        .       .       0.000496     0.000165    -0.000448    -0.000851      .\n"    
   "13901   01/06/1955   02209S10              PHILIP MORRIS INC                                   21398    11                     0      1       1         2110       2111     21       211    01/07/1962                             N          A         R       40.00000        11300       40.87500       40.87500       40.00000        .             .             .     -0.033233     -0.033233      2887       2    29/12/1955      .          .            .         .              .                 .       .          .             .          .             .             .        2479.29   576.000            .            .            .        .        .      .        .        .       .       0.001683     0.001476    -0.000496    -0.000724      .\n"      
   "13901   02/06/1955   02209S10              PHILIP MORRIS INC                                   21398    11                     0      1       1         2110       2111     21       211             .                             N          A         R       39.87500         9600       40.00000       40.12500       39.87500        .             .             .     -0.003125     -0.003125      2887       .             .      .          .            .         .              .                 .       .          .             .          .             .             .        2479.29   576.000            .            .            .        .        .      .        .        .       .       0.003036     0.002973     0.002027     0.001912      .\n"      
   "13901   03/06/1955   02209S10              PHILIP MORRIS INC                                   21398    11                     0      1       1         2110       2111     21       211             .                             N          A         R       40.12500         5500       40.00000       40.62500       40.00000        .             .             .      0.006270      0.006270      2887       .             .      .          .            .         .              .                 .       .          .             .          .             .             .        2479.29   576.000            .            .            .        .        .      .        .        .       .       0.006440     0.006420     0.004233     0.004141      .\n"      
  ,ptr);
    fclose(ptr);

    FILE *fp, *fpp;
    fp=fopen("/tmp/abc123","r");
        char puffer[BUF];
        char a[1000],b[1000],c[1000],d[1000],e[1000],f[1000],g[1000],h[1000],i[1000],j[1000],k[1000],l[1000],m[1000],n[1000],o[1000],p[1000],q[1000],r[1000],s[1000],tt[1000],u[1000],v[1000]; // a->PERMNO; b->date; c->CUSIP; d->NCUSIP; e->COMNAM; f->DIVAMT; g->CFACPR
        char w[1000],x[1000],y[1000],z[1000],aa[1000],ab[1000],ac[1000],ad[1000],ae[1000],af[1000],ag[1000],ah[1000],ai[1000],aj[1000],ak[1000],al[1000],am[1000],an[1000],ao[1000],ap[1000];
        char aq[1000],ar[1000],as[1000],at[1000],au[1000],av[1000],aw[1000],ax[1000],ay[1000],az[1000],ba[1000],bb[1000],bc[1000],bd[1000],be[1000],bf[1000],bg[1000],bh[1000],bi[1000],bj[1000],bk[1000],bl[1000] ;
      
    fpp=fopen("output.txt","w");

    if(fpp==NULL)
    {
        printf("file could not be opened\n");
        return 1;
    }
    
  while(fgets(puffer, BUF, fp) != NULL)
    {
        int n1,n2;
        char t[1000];
        //parse first four columns from left side
        if( 4==sscanf(puffer,"%s%s%s%s%n",a,b,c,d,&n1) )
        //parse 57 columns from the right side
        if( 57 ==sscanf(strrev(strcpy(t,puffer)),"%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%n",bl,bk,bj,bi,bh,bg,bf,be,bd,bc,bb,ba,az,ay,ax,aw,av,au,at,as,ar,aq,ap,ao,an,am,al,ak,aj,ai,ah,ag,af,ae,ad,ac,ab,aa,z,y,x,w,v,u,tt,s,r,q,p,o,n,m,l,k,j,i,h,g,f,&n2));
        //parse the variable field, is simply what is left in the middle.
        if( 1==sscanf(puffer+n1+1,"%[^\n]",e) )
        e[strlen(e)-n2]=0,a,b,c,d,e;
                strrev(f), strrev(g),strrev(h), strrev(i), strrev(j), strrev(k),
                strrev(l),strrev(m), strrev(n), strrev(o), strrev(p), strrev(q);
                strrev(r), strrev(s), strrev(tt), strrev(u), strrev(v), strrev(w),
                strrev(x), strrev(y), strrev(z), strrev(aa), strrev(ab), strrev(ac),
                strrev(ad), strrev(ae), strrev(af), strrev(ag), strrev(ah), strrev(ai),
                strrev(aj), strrev(ak),strrev(al), strrev(am), strrev(an), strrev(ao),
                strrev(ap), strrev(aq), strrev(ar), strrev(as), strrev(at), strrev(au),
                strrev(av), strrev(aw), strrev(ax), strrev(ay), strrev(az), strrev(ba);
                strrev(bb), strrev(bc), strrev(bd), strrev(be), strrev(bf), strrev(bg);
                strrev(bh), strrev(bi);
        // print first 5 columns in the console
         printf("%s %s %s %s %s\n",a, b, c, d, e);
        // print all parsed columns in output.txt file
         fprintf(fpp,"%s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s\n",a,b,c,d,e,f, g, h, i, j, k, l, m ,n, o,p,q,r,s,tt,u,v,w,x,y,z,aa,ab,ac,ad,ae,af,ag,ah,ai,aj,ak,al,am,an,ao,ap,aq,ar,as,at,au,av,aw,ax,ay,az,ba,bb,bc,bd,be,bf,bg,bh,bi);
       
    }
     
    fclose(ptr);
    return 0;
}

Recommended answer

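If all columns have a fixed width (including padding), the problem described in your question is easy to solve. This appears to be the case for every column except the 30th/31st. Since you have stated that this inconsistency in the input is not intentional, I will only address the first 10 columns, as those are the ones your question asks about.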

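Your input data appears to have the following format:

Each line in the file

  • starts with column #1, which consists of 5 characters, followed by 3 spaces,
  • followed by column #2, which consists of 10 characters, followed by 3 spaces,
  • followed by column #3, which consists of 8 characters, followed by 3 spaces,
  • followed by column #4, which consists of 8 characters (which may all be spaces), followed by 3 spaces,
  • followed by column #5, which consists of 25 characters (which may all be spaces), followed by 13 spaces,
  • followed by column #6, which consists of 5 characters (which may all be spaces), followed by 9 spaces,
  • followed by column #7, which consists of 5 characters (which may all be spaces), followed by 4 spaces,
  • followed by column #8, which consists of 2 characters, followed by 17 spaces,
  • followed by column #9, which consists of 5 characters (which may all be spaces), followed by 6 spaces,
  • followed by column #10, which consists of 1 character, followed by 7 spaces,
  • followed by further input, which I will ignore for the reason stated above.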

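If you have more input data than you posted, the rules above may not be correct. In the data you did not show us, some field could be wider, leaving less space padding between two columns. For demonstration purposes, however, I will assume that the rules above hold. You may have to adjust them to match your actual input data.

If you tell the program the length of the data in each column and the number of padding spaces that follow it, the program can determine which characters belong to which column. That way it can also find and read empty columns that consist only of spaces.

I do not recommend using the %s conversion format specifier with sscanf, because it skips any leading whitespace characters. Those space characters must not be skipped, because they have to be counted in order to locate the empty fields. Instead, I recommend processing the characters individually.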

Here is an example that uses the first 10 columns of the input data:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NUM_COLUMNS 10
#define MAX_COLUMN_LENGTH 200

//NOTE: This number should not be set too high, otherwise a
//stack overflow may occur. On non-embedded platforms, a
//size of up to 500,000 should generally be safe.
#define MAX_LINE_LENGTH 8000

int main( void )
{
    //local variable declarations
    FILE *fp;
    char line[MAX_LINE_LENGTH+1];

    //define the column lengths (data length and padding length)
    const struct column_length
    {
        int data;
        int padding;

    } column_lengths[NUM_COLUMNS] =
    {
        {  5,  3 },
        { 10,  3 },
        {  8,  3 },
        {  8,  3 },
        { 25, 13 },
        {  5,  9 },
        {  5,  4 },
        {  2, 17 },
        {  5,  6 },
        {  1,  0 }
    };

    //open temporary file
    fp = tmpfile();
    if ( fp == NULL )
    {
        fprintf( stderr, "Error opening temporary file!\n" );
        exit( EXIT_FAILURE );
    }

    //write data to temporary file
    fputs(
       "10000   07/01/1986   68391610   68391610   OPTIMUM MANUFACTURING INC             OMFGA          7952    10                 10396      3\n"
       "12781   30/11/1970   84857L10   50558810   LACLEDE GAS CO                        LG            21080    11                     0      1\n"
       "13901   27/05/1955   02209S10              PHILIP MORRIS & CO LTD                              21398    11                     0      1\n"
       "13901   31/05/1955   02209S10              PHILIP MORRIS & CO LTD                              21398    11                     0      1\n"
       "13901   01/06/1955   02209S10              PHILIP MORRIS INC                                   21398    11                     0      1\n"
       "13901   02/06/1955   02209S10              PHILIP MORRIS INC                                   21398    11                     0      1\n"
       "13901   03/06/1955   02209S10              PHILIP MORRIS INC                                   21398    11                     0      1\n",
       fp
    );

    //seek back to the beginning of the temporary file
    rewind( fp );

    //process one row per loop iteration
    for ( int row = 0; fgets(line,sizeof line,fp) != NULL; row++ )
    {
        //2D array for reading in all fields of a row
        char fields[NUM_COLUMNS][MAX_COLUMN_LENGTH+1];

        //this pointer will always point to the next
        //character of the line to process
        const char *p = line;

        printf( "Processing row #%d:\n", row );

        //verify that an entire line was read
        if ( strchr(line,'\n') == NULL && !feof(fp) )
        {
            fprintf(
                stderr,
                "Error processing row #%d:\n"
                "Buffer was too small to read the entire line.\n"
                "The macro constant MAX_LINE_LENGTH may have to be increased.\n",
                row
            );
            exit( EXIT_FAILURE );
        }

        //process one column per loop iteration
        for ( int col = 0; col < NUM_COLUMNS; col++ )
        {
            //verify that buffer size is large enough
            if ( column_lengths[col].data >= (int)sizeof fields[0] )
            {
                fprintf(
                    stderr,
                    "Error processing column #%d on row #%d: Buffer is too small!\n"
                    "The macro constant MAX_COLUMN_LENGTH must be increased.\n",
                    col, row
                );
                exit( EXIT_FAILURE );
            }

            //extract data of the field
            for ( int i = 0; i < column_lengths[col].data; i++ )
            {
                if ( *p == '\0' || *p == '\n' )
                {
                    fprintf(
                        stderr,
                        "Error processing column #%d on row #%d:\n"
                        "Unexpected end of input encountered while reading the data area of a field!\n",
                        col, row
                    );
                    exit( EXIT_FAILURE );
                }

                fields[col][i] = *p;

                p++;
            }

            //add terminating null character
            fields[col][column_lengths[col].data] = '\0';

            //print the extracted field
            printf( "  Field #%d: \"%s\"\n", col, fields[col] );

            //skip padding and verify that it consists only of spaces
            for ( int i = 0; i < column_lengths[col].padding; i++ )
            {
                if ( *p != ' ' )
                {
                    if ( *p == '\0' || *p == '\n' )
                    {
                        fprintf(
                            stderr,
                            "Error processing column #%d on row #%d:\n"
                            "Unexpected end of input encountered while skipping the padding area of a field!\n",
                            col, row
                        );
                    }
                    else
                    {
                        fprintf(
                            stderr,
                            "Error processing column #%d on row #%d:\n"
                            "Non-space padding character encountered!\n",
                            col, row
                        );
                    }

                    exit( EXIT_FAILURE );
                }

                p++;
            }
        }

        printf( "\n" );
    }

    fclose( fp );
}

This program has the following output:

Processing row #0:
  Field #0: "10000"
  Field #1: "07/01/1986"
  Field #2: "68391610"
  Field #3: "68391610"
  Field #4: "OPTIMUM MANUFACTURING INC"
  Field #5: "OMFGA"
  Field #6: " 7952"
  Field #7: "10"
  Field #8: "10396"
  Field #9: "3"

Processing row #1:
  Field #0: "12781"
  Field #1: "30/11/1970"
  Field #2: "84857L10"
  Field #3: "50558810"
  Field #4: "LACLEDE GAS CO           "
  Field #5: "LG   "
  Field #6: "21080"
  Field #7: "11"
  Field #8: "    0"
  Field #9: "1"

Processing row #2:
  Field #0: "13901"
  Field #1: "27/05/1955"
  Field #2: "02209S10"
  Field #3: "        "
  Field #4: "PHILIP MORRIS & CO LTD   "
  Field #5: "     "
  Field #6: "21398"
  Field #7: "11"
  Field #8: "    0"
  Field #9: "1"

Processing row #3:
  Field #0: "13901"
  Field #1: "31/05/1955"
  Field #2: "02209S10"
  Field #3: "        "
  Field #4: "PHILIP MORRIS & CO LTD   "
  Field #5: "     "
  Field #6: "21398"
  Field #7: "11"
  Field #8: "    0"
  Field #9: "1"

Processing row #4:
  Field #0: "13901"
  Field #1: "01/06/1955"
  Field #2: "02209S10"
  Field #3: "        "
  Field #4: "PHILIP MORRIS INC        "
  Field #5: "     "
  Field #6: "21398"
  Field #7: "11"
  Field #8: "    0"
  Field #9: "1"

Processing row #5:
  Field #0: "13901"
  Field #1: "02/06/1955"
  Field #2: "02209S10"
  Field #3: "        "
  Field #4: "PHILIP MORRIS INC        "
  Field #5: "     "
  Field #6: "21398"
  Field #7: "11"
  Field #8: "    0"
  Field #9: "1"

Processing row #6:
  Field #0: "13901"
  Field #1: "03/06/1955"
  Field #2: "02209S10"
  Field #3: "        "
  Field #4: "PHILIP MORRIS INC        "
  Field #5: "     "
  Field #6: "21398"
  Field #7: "11"
  Field #8: "    0"
  Field #9: "1"

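As you can see, the empty fields were read correctly.

If necessary, the fields can be processed further afterwards, for example by removing all leading and trailing space characters and by converting numbers to int values with the function strtol.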
