C++ Question Help parsing variable length C-string

Zatnikitelman

I'm having some problems parsing a variable length string of numbers in C++. Basically, I have a string that has an int, then a sequence of ints whose length equals that first int, then several doubles. For example:
5 8 9 10 12 14 0.0 0.0 1.0 1.0 0.0 0.0 45.0
The number of ints after the first one (8 9 10 12 14 above) is equal to the first number. So if the first number is 1, there is only one more number before the 0.0 0.0 1.0 1.0 0.0 0.0 45.0 sequence; if it is 3, there are three, and so on. If the string had a fixed number of fields, I'd just use sscanf with the proper format, but the code won't know how many ints are in the variable section until it reads the first int.
This is my latest attempt at parsing this using vertical pipes around the variable length section, but it won't work because the ints might be more than one character long and in this solution, I'm only incrementing one character at a time:
Code:
// char* strr = "5|8|9|10|12|14|0.0 0.0 1.0 1.0 0.0 0.0 45.0" <- effectively the state of things at this point
char* pch = strtok(strr,"|");
int i = 0;
while(pch != NULL)
{
	sscanf(pch,"%d",ram->grouplist[i]);
	pch = strtok(NULL,"|");
	i++;
}
char strv[1024];
strcpy(strv,strr);
sscanf(strv,"%lf %lf %lf %lf %lf %lf %lf",ram->pos.x,ram->pos.y,ram->pos.z,ram->axis.x,ram->axis.y,ram->axis.z,ram->angle);
So does anyone have any ideas how to parse this?
Thanks for any help you can provide,
Matt
 
Did you already look at the Boost.Regex library? AFAIR, it has captures like Lua that you could use.
 
Here's how I do it (or, better put, how vchamp does it, since I ripped this from his code):

Code:
        vector<string> tokens;
        Tokenize(strr, tokens);   // splits strr on spaces by default
        for (size_t i = 0; i < tokens.size(); i++)
        {
            double paramvalue;
            if (sscanf(tokens[i].c_str(), "%lf", &paramvalue) != 1)
                continue;   // skip tokens that are not numbers
            //now do with the value what you need to do with it
        }

You can also tokenize using other delimiters (space is the default), and you can of course further tokenize a string you got from tokenizing. It lets you comfortably group values by different delimiters. For example, 1,2,3 4,5,6 7,8,9 can be split into the three strings "1,2,3", "4,5,6", "7,8,9" by tokenizing with a space delimiter, and then into the individual values by tokenizing each of those strings with a comma delimiter.

So in your case, you could tokenize using a | delimiter, then take the last token and tokenize it with a space delimiter.
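
The Tokenize helper itself isn't shown in this post, so here is a minimal sketch of what such a function could look like. The signature and the default delimiter argument are assumptions made for illustration, not vchamp's actual code.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the Tokenize helper mentioned above (assumed
// signature): splits str on any character in delims and appends the
// non-empty pieces to tokens.
void Tokenize(const std::string& str, std::vector<std::string>& tokens,
              const std::string& delims = " ")
{
    std::string::size_type start = str.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::string::size_type end = str.find_first_of(delims, start);
        // substr clamps the length, so end == npos takes the rest of the string
        tokens.push_back(str.substr(start, end - start));
        start = str.find_first_not_of(delims, end);
    }
}
```

With this, Tokenize("1,2,3 4,5,6 7,8,9", groups) would fill groups with three strings, and Tokenize(groups[1], values, ",") would then split "4,5,6" into its individual values.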
 
Can't you just use a space in strtok instead of a pipe?
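
A rough sketch of the space-delimited strtok idea, assuming the input sits in a writable char buffer (strtok overwrites the delimiters it finds). The Parsed struct and the ParseWithStrtok name are made up for this example.

```cpp
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical result holder for this sketch.
struct Parsed {
    std::vector<int> ints;
    std::vector<double> doubles;
};

// Splits buf on spaces: the first token is the count, the next `count`
// tokens are ints, and everything after that is treated as a double.
// buf must be writable because strtok modifies it.
bool ParseWithStrtok(char* buf, Parsed& out)
{
    char* tok = std::strtok(buf, " ");
    if (tok == NULL)
        return false;                  // empty input
    int count = std::atoi(tok);
    for (int i = 0; i < count; ++i) {
        tok = std::strtok(NULL, " ");
        if (tok == NULL)
            return false;              // fewer ints than the count promised
        out.ints.push_back(std::atoi(tok));
    }
    while ((tok = std::strtok(NULL, " ")) != NULL)
        out.doubles.push_back(std::atof(tok));
    return true;
}
```

Note that atoi and atof do no error reporting, so a real parser would probably want strtol/strtod or sscanf with a return-value check instead.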
 
Ok, I've gone all sorts of directions today, and I've ended up with Hielor's idea sort of. But I'm still having a problem. I'm using codepad.org to prototype my code, and it keeps either timing out, or giving segmentation fault errors. Here's what I've come up with so far and I'm tired of staring at it so I'm going to post it and see if anyone sees anything I missed.
Thanks.
Code:
int main()
{
char strr[] = {"5 8 9 10 12 14 0.0 0.0 1.0 1.0 0.0 0.0 45.0"};
int x = 0;
sscanf(strr,"%d",&x);
printf("%d\n",x);
char strc[1024];
strcpy(strc,strr);
puts(strr);
char* pch = strtok(strr," ");
int i = 0;
int grouplist[5];
//The while loop is where the errors appear, in this configuration, it times out on codepad.
while(pch != NULL)
{
        if(i<5)
        {
	sscanf(pch,"%d",&grouplist[i]);
        pch = strtok(NULL," ");        
        }
        i++;
}
for(int k = 0; k<5; k++)
{
   //printf("Value %d: %d\n",k,grouplist[k]);
}
puts(pch);
char strv[1024];
strcpy(strv,strc);
puts(strv);
}
 
Code:
...
while(pch != NULL)
{
        if(i<5)
        {
    sscanf(pch,"%d",&grouplist[i]);
        pch = strtok(NULL," ");        
        }
        i++;
}
...
The while loop times out because it doesn't have a proper exit condition. For i >= 5, the while loop doesn't do anything other than increment i. Think about it--if i >= 5, the if condition fails, so your loop becomes:
Code:
while(pch != NULL)
{
    i++;
}

Pretty easy to see why that times out...
 
Except the while loop should exit when pch finally equals null, i isn't the exit condition.
Nice theory, but that's not what the code does.

Think: once i has reached 5, how does pch become null?

Hint: it can't.
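
One possible way to restructure the loop so it always makes progress: put the limit into the loop condition and advance pch on every pass. This is only a sketch; the ReadInts wrapper and its parameters are hypothetical names, not part of the code posted above.

```cpp
#include <cstdio>
#include <cstring>

// Reads at most maxInts ints from the space-separated tokens of buf into out.
// count receives how many were stored. Returns the first token that was not
// consumed, or NULL if the tokens ran out. buf is modified by strtok.
char* ReadInts(char* buf, int* out, int maxInts, int& count)
{
    char* pch = std::strtok(buf, " ");
    count = 0;
    // Both conditions are checked on every pass, and pch advances on every
    // pass, so the loop cannot spin forever once the limit is reached.
    while (pch != NULL && count < maxInts) {
        if (std::sscanf(pch, "%d", &out[count]) != 1)
            break;                     // token is not an int
        pch = std::strtok(NULL, " ");
        ++count;
    }
    return pch;
}
```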
 
Here's a very rough sketch:
Code:
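    // rough sketch: no bounds or error checks
    int nints;
    int intbuf[64];               // backing storage; 64 is an arbitrary cap
    double dblbuf[64];
    int *ints = intbuf;
    double *doubles = dblbuf;

    const char *thestring = "5 1 2 3 4 5 1.3";
    int scanned;                  // %n stores the number of characters consumed

    sscanf(thestring," %d%n", &nints, &scanned);
    thestring += scanned;
    
    while ( nints-- ) {
        sscanf(thestring," %d%n", ints++, &scanned);
        thestring += scanned;
    }

    // stops at the first token that is not a double, or at end of string
    while ( sscanf(thestring," %lf%n", doubles, &scanned) == 1 ) {
        doubles++;
        thestring += scanned;
    }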
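
For reference, the same %n idea can be wrapped into a self-contained function with return-value checks. This is a sketch under assumptions: the Record struct and the ParseRecord name are invented for the example, and std::vector is swapped in for the raw pointers so no buffer sizes need to be guessed.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical result holder for this sketch.
struct Record {
    std::vector<int> ints;
    std::vector<double> doubles;
};

// Parses "<count> <count ints> <doubles...>" from s without modifying it.
// Returns false if the count or any of the promised ints is missing.
bool ParseRecord(const char* s, Record& out)
{
    int count = 0;
    int used = 0;                      // filled by %n: characters consumed
    if (std::sscanf(s, " %d%n", &count, &used) != 1 || count < 0)
        return false;
    s += used;

    for (int i = 0; i < count; ++i) {
        int value = 0;
        if (std::sscanf(s, " %d%n", &value, &used) != 1)
            return false;
        out.ints.push_back(value);
        s += used;
    }

    double d = 0.0;
    // sscanf returns EOF at the end of the string, which ends the loop
    while (std::sscanf(s, " %lf%n", &d, &used) == 1) {
        out.doubles.push_back(d);
        s += used;
    }
    return true;
}
```

Checking that sscanf returned 1 before using `used` matters here, because %n is not written when the conversion before it fails.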
 
Thanks everyone who responded, and jthill in particular. I had forgotten about using %n in the sscanf format string and I think I have a solution. I'm not able to test yet, but here's what I've come up with so far.
Code:
int main()
{
const char* strr = "5 5 6 7 8 9 1.1 1.2 0.0 0.0 0.0 0.0 45";
int n = 2;
char strs[1024];
strcpy(strs,strr+n);
int i = 0;
int ints;
while(i<5)
{
	int n2 = 0;
	sscanf(strs,"%d%n",&ints,&n2);
	strcpy(strs,strs+n2);
	i++;
}
puts(strs);
}
Yes, the number of characters I'm trying to read is hardcoded in that example, but for testing purposes, it seems to work on codepad.
Thanks again!
 
The usage of strs and strcpy seems odd and inefficient to me, especially since you're violating the strcpy guidance ("destination...should not overlap in memory with source.")

Overlapping source and destination is undefined behavior for strcpy, so you might end up with unexpected results on some systems, depending on how strcpy is implemented.

Why can't you just have "strs" be a const char * into strr and then move it accordingly?

Code:
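#include <cstdio>

int main()
{
    const char* strr = "5 5 6 7 8 9 1.1 1.2 0.0 0.0 0.0 0.0 45";
    int n = 2;                    // skip the leading count "5 " (hardcoded for testing)
    const char * strs = strr + n; // read position inside strr; nothing is copied
    int i = 0;
    int ints;
    while(i<5)
    {
        int n2 = 0;
        sscanf(strs,"%d%n",&ints,&n2);
        strs += n2;               // advance past the int just read
        i++;
    }
    puts(strs);                   // prints the unparsed remainder (the doubles)
}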
 
but here's what I've come up with so far.
In honesty the code's fairly ugly. You're copying to a second string which seems a bit pointless given you can just increment the pointer along the string as jthill suggested and read from that changed pointer. If you don't want to change the pointer then you can just add the desired value to the pointer rather than incrementing it.
 