C++ Question Help parsing variable length C-string

Zatnikitelman · Apr 20, 2012

I'm having some problems parsing a variable length string of numbers in C++. Basically, I have a string that has an int, a sequence of ints equal to the first int, then several doubles. For example:

5 8 9 10 12 14 0.0 0.0 1.0 1.0 0.0 0.0 45.0

The number of ints after the first one (8 9 10 12 14 in the example, originally shown in bold) is equal to the first number. So if the first number is 1, I'll have only one more int before the 0.0 0.0 1.0 1.0 0.0 0.0 45.0 sequence; if it's 3, I'll have three ints in there, and so on. I'm having some problems parsing this string, however. If it were fixed width, I'd just use sscanf with the proper format, but the code won't know how many ints are in the bolded section until it reads the first int.
This is my latest attempt at parsing this using vertical pipes around the variable length section, but it won't work because the ints might be more than one character long and in this solution, I'm only incrementing one character at a time:

Code:

// char* strr = "5|8|9|10|12|14|0.0 0.0 1.0 1.0 0.0 0.0 45.0" <- effectively the state of things at this point
char* pch = strtok(strr,"|");
int i = 0;
while(pch != NULL)
{
	sscanf(pch,"%d",ram->grouplist[i]);
	pch = strtok(NULL,"|");
	i++;
}
char strv[1024];
strcpy(strv,strr);
sscanf(strv,"%lf %lf %lf %lf %lf %lf %lf",ram->pos.x,ram->pos.y,ram->pos.z,ram->axis.x,ram->axis.y,ram->axis.z,ram->angle);

So does anyone have any ideas how to parse this?
Thanks for any help you can provide,
Matt

Urwumpe · Apr 20, 2012

Did you already look at the Boost.Regex library? AFAIR, it has captures like Lua that you could use.

jedidia · Apr 20, 2012

Here's how I do it (or, better put, how vchamp does it, since I ripped this from his code):

Code:

        vector<string> tokens;
        Tokenize(strr, tokens);
        for (UINT i = 0; i < tokens.size(); i++)
        {
            double paramvalue;
            sscanf(tokens[i].c_str(), "%lf", &paramvalue);
            //now do with the value what you need to do with it
        }

You can also tokenize using other delimiters (space is the default), and you can of course further tokenize a string you received from tokenizing. It lets you comfortably group values by different delimiters. For example, 1,2,3 4,5,6 7,8,9 can be split into three strings, "1,2,3", "4,5,6" and "7,8,9", by tokenizing with a space delimiter; tokenizing each of those strings with a comma delimiter then gives you the individual values.

So in your case, you could tokenize using a | delimiter, then take the last token and tokenize it with a space delimiter.
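
Tokenize is not a standard library function and its source isn't shown in this thread, so the following is only a guess at what such a helper might look like. The signature, the default delimiter argument and the choice to skip empty tokens are all assumptions made for illustration:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the Tokenize helper used above: splits `input`
// on any character in `delimiters` and appends the non-empty pieces to `tokens`.
void Tokenize(const std::string& input,
              std::vector<std::string>& tokens,
              const std::string& delimiters = " ")
{
    std::string::size_type start = input.find_first_not_of(delimiters);
    while (start != std::string::npos)
    {
        std::string::size_type end = input.find_first_of(delimiters, start);
        // If end is npos, substr clamps the count to the rest of the string.
        tokens.push_back(input.substr(start, end - start));
        start = input.find_first_not_of(delimiters, end);
    }
}
```

Calling it once with the default delimiter on "1,2,3 4,5,6 7,8,9" should give the three comma groups, and calling it again on one group with "," as the delimiter should give the individual values.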

Hielor · Apr 20, 2012

Can't you just use a space in strtok instead of a pipe?

Zatnikitelman · Apr 21, 2012

Ok, I've gone all sorts of directions today, and I've ended up with Hielor's idea sort of. But I'm still having a problem. I'm using codepad.org to prototype my code, and it keeps either timing out, or giving segmentation fault errors. Here's what I've come up with so far and I'm tired of staring at it so I'm going to post it and see if anyone sees anything I missed.
Thanks.

Code:

int main()
{
char strr[] = {"5 8 9 10 12 14 0.0 0.0 1.0 1.0 0.0 0.0 45.0"};
int x = 0;
sscanf(strr,"%d",&x);
printf("%d\n",x);
char strc[1024];
strcpy(strc,strr);
puts(strr);
char* pch = strtok(strr," ");
int i = 0;
int grouplist[5];
//The while loop is where the errors appear, in this configuration, it times out on codepad.
while(pch != NULL)
{
        if(i<5)
        {
	sscanf(pch,"%d",&grouplist[i]);
        pch = strtok(NULL," ");        
        }
        i++;
}
for(int k = 0; k<5; k++)
{
   //printf("Value %d: %d\n",k,grouplist[k]);
}
puts(pch);
char strv[1024];
strcpy(strv,strc);
puts(strv);
}

Hielor · Apr 21, 2012

Zatnikitelman said:
Ok, I've gone all sorts of directions today, and I've ended up with Hielor's idea sort of. But I'm still having a problem. I'm using codepad.org to prototype my code, and it keeps either timing out, or giving segmentation fault errors. Here's what I've come up with so far and I'm tired of staring at it so I'm going to post it and see if anyone sees anything I missed.
Thanks.

Code:

...
while(pch != NULL)
{
    if(i<5)
    {
        sscanf(pch,"%d",&grouplist[i]);
        pch = strtok(NULL," ");
    }
    i++;
}
...

The while loop times out because it doesn't have a proper exit condition. For i >= 5, the while loop doesn't do anything other than increment i. Think about it--if i >= 5, the if condition fails, so your loop becomes:

Code:

while(pch != NULL)
{
    i++;
}

Pretty easy to see why that times out...

Zatnikitelman · Apr 21, 2012

Except the while loop should exit when pch finally equals null, i isn't the exit condition.

Hielor · Apr 21, 2012

Zatnikitelman said:
Except the while loop should exit when pch finally equals null, i isn't the exit condition.

Nice theory, but that's not what the code does.

Think: once i has reached 5, how does pch become null?

Hint: it can't.
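
For reference, here is one way the loop could be restructured so that the token pointer advances on every pass and the array bound is part of the loop condition. This is only a sketch; parseLeadingInts and its parameters are made-up names, not something from the posts above:

```cpp
#include <cstdio>
#include <cstring>

// Splits `line` on spaces in place (strtok writes '\0' into it) and stores up
// to `capacity` leading ints in `out`. Returns how many ints were stored.
// parseLeadingInts is a hypothetical name, not from the thread.
int parseLeadingInts(char* line, int* out, int capacity)
{
    int count = 0;
    char* pch = std::strtok(line, " ");
    while (pch != NULL && count < capacity)
    {
        if (std::sscanf(pch, "%d", &out[count]) != 1)
            break;                      // token does not start with a number
        ++count;
        pch = std::strtok(NULL, " ");   // advance every pass, so pch can reach NULL
    }
    return count;
}
```

Note that %d will happily read the leading 0 of a token like "0.0", so `capacity` has to be the number of ints actually expected, otherwise the doubles get picked up as ints too.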

jthill · Apr 22, 2012

Here's a very rough sketch:

Code:

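    int nints;
    int *ints;       // must point at storage for at least nints ints before the loops run
    double *doubles; // likewise, must point at enough storage for the doubles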

    const char *thestring = "5 1 2 3 4 5 1.3";
    int scanned;

    sscanf(thestring," %d%n", &nints, &scanned);
    thestring += scanned;
    
    while ( nints-- ) {
        sscanf(thestring," %d%n", ints++, &scanned);
        thestring += scanned;
    }

    while ( sscanf(thestring," %lf%n", doubles++, &scanned) == 1 )
        thestring += scanned;
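
The sketch above leaves storage and error checking out. A fuller version of the same %n idea might look like the following, assuming std::vector is acceptable for holding the results; ParsedLine and parseLine are hypothetical names chosen for this example:

```cpp
#include <cstdio>
#include <vector>

// Hypothetical container for the parsed line; the names are illustrative.
struct ParsedLine
{
    std::vector<int> ints;
    std::vector<double> doubles;
};

// Returns false if the count or one of the promised ints is missing.
bool parseLine(const char* s, ParsedLine& out)
{
    int nints = 0;
    int scanned = 0;
    if (std::sscanf(s, " %d%n", &nints, &scanned) != 1 || nints < 0)
        return false;
    s += scanned;                       // step past the count

    for (int i = 0; i < nints; ++i)
    {
        int value = 0;
        if (std::sscanf(s, " %d%n", &value, &scanned) != 1)
            return false;
        out.ints.push_back(value);
        s += scanned;                   // step past the int just read
    }

    // Whatever is left is read as doubles until sscanf stops matching.
    double d = 0.0;
    while (std::sscanf(s, " %lf%n", &d, &scanned) == 1)
    {
        out.doubles.push_back(d);
        s += scanned;
    }
    return true;
}
```

Checking the return value of sscanf on each read matters here, because %n is only written when the conversions before it succeed.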

Zatnikitelman · Apr 25, 2012

Thanks everyone who responded, and jthill in particular. I had forgotten about using %n in the sscanf format string and I think I have a solution. I'm not able to test yet, but here's what I've come up with so far.

Code:

int main()
{
const char* strr = "5 5 6 7 8 9 1.1 1.2 0.0 0.0 0.0 0.0 45";
int n = 2;
char strs[1024];
strcpy(strs,strr+n);
int i = 0;
int ints;
while(i<5)
{
	int n2 = 0;
	sscanf(strs,"%d%n",&ints,&n2);
	strcpy(strs,strs+n2);
	i++;
}
puts(strs);
}

Yes, the number of characters I'm trying to read is hardcoded in that example, but for testing purposes, it seems to work on codepad.
Thanks again!

Hielor · Apr 25, 2012

Zatnikitelman said:
Thanks everyone who responded, and jthill in particular. I had forgotten about using %n in the sscanf format string and I think I have a solution. I'm not able to test yet, but here's what I've come up with so far.

Code:

int main()
{
const char* strr = "5 5 6 7 8 9 1.1 1.2 0.0 0.0 0.0 0.0 45";
int n = 2;
char strs[1024];
strcpy(strs,strr+n);
int i = 0;
int ints;
while(i<5)
{
	int n2 = 0;
	sscanf(strs,"%d%n",&ints,&n2);
	strcpy(strs,strs+n2);
	i++;
}
puts(strs);
}

Yes, the number of characters I'm trying to read is hardcoded in that example, but for testing purposes, it seems to work on codepad.
Thanks again!

The usage of strs and strcpy seems odd and inefficient to me, especially since you're violating the strcpy guidance ("destination...should not overlap in memory with source.")

Basically that means that you might end up with unexpected behavior on some systems, depending on how strcpy is implemented.

Why can't you just have "strs" be a const char * into strr and then move it accordingly?

Code:

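#include <cstdio>

int main()
{
    const char* strr = "5 5 6 7 8 9 1.1 1.2 0.0 0.0 0.0 0.0 45";
    int n = 2;                    // skips the leading count ("5 "), hardcoded as in your example
    const char * strs = strr + n; // read cursor into strr; no copy needed
    int i = 0;
    int ints;
    while(i<5)
    {
        int n2 = 0;
        sscanf(strs,"%d%n",&ints,&n2); // %n stores how many characters were consumed
        strs += n2;                    // move past the int that was just read
        i++;
    }
    puts(strs); // whatever is left after the five ints
}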

agentgonzo · Apr 26, 2012

Zatnikitelman said:
but here's what I've come up with so far.

In honesty the code's fairly ugly. You're copying to a second string which seems a bit pointless given you can just increment the pointer along the string as jthill suggested and read from that changed pointer. If you don't want to change the pointer then you can just add the desired value to the pointer rather than incrementing it.
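
As a rough illustration of the second option (adding an offset instead of moving the pointer), something along these lines should work. skipInts is a made-up helper name, and the -1 error value is just one possible convention:

```cpp
#include <cstdio>

// Skips `count` whitespace-separated ints starting at `text` and returns the
// offset of the first unread character, or -1 if an int could not be read.
// `text` itself is never modified; skipInts is a hypothetical name.
int skipInts(const char* text, int count)
{
    int offset = 0;
    for (int i = 0; i < count; ++i)
    {
        int value = 0;
        int consumed = 0;
        if (std::sscanf(text + offset, "%d%n", &value, &consumed) != 1)
            return -1;
        offset += consumed;   // accumulate instead of advancing the pointer
    }
    return offset;
}
```

After a successful call, text + offset points at the remainder of the string, so the doubles can be read from there with a single sscanf.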
