JOIN
Get Time
forums   
Search | Watch Thread  |  My Post History  |  My Watches  |  User Settings
View: Flat (oldest first)  | Threaded  | Tree
Previous Thread  |  Next Thread
Re: 2.1. Parsing String Input (response to post by orchidmajumder) | Reply
orchidmajumber, i know this thread isn't the place for this. But I would like some advice and help with a dynamic programming problems. Can you help me? please e-mail me oliversun93@gmail.com
Re: 2.1. Parsing String Input (response to post by Nickolas) | Reply
I think using stringstream (with the ignore function) may come handy for splitting strings delimited by space,comma or any character delimiter.
#include<cstdio>
#include<iostream>
#include<vector>
#include<sstream>
using namespace std;
 
template<class T> vector<T> split(string s,char c=' ') {
    stringstream A(s);
	vector<T> res;
	T t;
	while (A>>t) {
        res.push_back(t);
        if(A.peek()==c)
            A.ignore();
	}
	return res;
}
 
int main() {
    string s;
    getline(cin,s);
    vector<int>d=split<int>(s,',');
    for(int i=0;i<d.size();i++)
        cout<<d[i]<<"  ";
    return 0;
}
Re: 2.1. Parsing String Input (response to post by Nickolas) | Reply
That's why I use Java :P
Re: 2.1. Parsing String Input (response to post by Nickolas) | Reply
I can't make it any better. split function has STL input and output, and must work properly for arbitrary string length. That's why it is so ugly.

As I've said, the main goal is to show that there is a cool strtok library function.
Re: 2.1. Parsing String Input (response to post by syg96) | Reply
Do you mind just posting the proper C-styled way to write split() using strtok?
Re: 2.1. Parsing String Input (response to post by Nickolas) | Reply
I'm sorry but: now it looks even worse=) Though is it correct now.

Personally I prefer not to mess with manual dynamic allocation in olympic programming=) Only C++ containers and fixed-size arrays.

I think you can leave it as it is now. Coders who prefer C-style would do it in proper way themselves. The key idea is to show strtok, not C string and deallocation routines=)
Re: 2.1. Parsing String Input (response to post by syg96) | Reply
That's exactly why I prefer to use STL :-) Does it look better now?
Re: 2.1. Parsing String Input (response to post by Nickolas) | Reply
The added split code contains memory leak=)
delim and buf are duplicated strings - free should be called to deallocate them.

Though it will work well in SRM... But anyway it is bad style and at least comment should be added.

C programmers have to follow strict discipline for their low-level possibilities=)
Re: 2.1. Parsing String Input (response to post by syg96) | Reply
Agreed, added strtok-based implementation of split(). Thanks!
Re: 2.1. Parsing String Input (response to post by Nickolas) | Reply
How about C parsing functions? They are also useful. Personally I always use strtok instead of stringstream.

Example:
There is a string S which is a text. The separators are: space, period and comma. There may be several separators between words and separators before first and after last words.

char buff[1024];
strcpy(buff, S.c_str());
for (char *word = strtok(buff, " .,"); word; word = strtok(0, " .,"))
    ProcessWord(word);


Be careful with char[] size and remember that "buff" is changed by strtok.
The last: as with any C vs C++ library functions: strtok may be faster than stringstream.
2.1. Parsing String Input | Reply
Problem

Algorithm and especially Marathon problems often give you the input as strings which have to be parsed before actually processing them. A typical problem is: you are given a string which contains fragments of text separated with spaces, and you have to extract these fragments as an array of individual strings.

Solution

C++
C++ is the only language of TopCoder which has no built-in method string.Split, so you'll have to write one yourself, for example, like this:
#include <sstream>
vector<string> split(string s) {
	stringstream A(s);
	vector<string> res;
	string t;
	while (A>>t) res.push_back(t);
	return res;
}
vector<string> fragments = split(input);

Java
String[] fragments = input.split(" ");

C#
string[] fragment = input.Split(new char[]{' '}, StringSplitOptions.RemoveEmptyEntries);

VB
dim fragment() as string
fragment = input.Split(" ")

Python
fragment = input.split();


Discussion

In most problems the string fragmentation task is present in its very basic form: text fragments are strings separated with single spaces, and they the only information to be extracted from the string, like in ToolsBox. But this task can also come in lots of flavors:
- text fragments can be separated with more than one space, and the string can contain leading and trailing spaces as well, like in problem VLNString;
- the separators can be non-space characters, like in DirectoryTree;
- you might need to extract not only the text fragments themselves, but also the values of separators, like in SimpleCalculator;
- text fragments within the string can represent numerical data, like in HiddenNumbers;
- fragments can be numbers and text, mixed in given format, like in SimpleCalculator.

All approaches to the basic task and its versions can be roughly divided into three groups:

1. Manual. As the name implies, this approach means doing the task "by hand", processing the string character by character. Well, this can be done, but what's the point of doing something by hand when there is a library function that does the job? Remember that in SRMs speed is important, and parsing is rarely the main part of the problem, so is should be really fast to type, and manual parsing definitely is not.
The only exception from this rule are specific tasks where the string contains a small number of short fragments in specified format, like getting moment of time from notation "HH:MM". In this case writing
hours = (time[0]-'0')*10 + (time[1]-'0');
minutes = (time[3]-'0')*10 + (time[4]-'0');
takes not much longer than using special functions and doesn't require recalling their signatures.

2. Using special parsing functions. All languages, and especially C++, have lots of built-in classes and functions for different parsing tasks. The tricky part is knowing them in details and choosing the proper one for each task, which is important both for writing your own solution and for challenging other solutions. For example, using standard method string.split (in languages where it is provided) for tasks like VLNString will produce extra zero-length strings for each extra space between the text fragments and for each leading or trailing space, which should be taken into account when processing the result. Most languages allow several solutions for each parsing task; following are several examples not covered in "Solution" section.

SimpleCalculator, Java
(using non-space characters as delimiters and including delimiter characters as fragments)
import java.util.*;
public class SimpleCalculator {
	public int calculate(String input)
	{	StringTokenizer st = new StringTokenizer(input, "+-/*", true);
		int n1 = Integer.parseInt(st.nextToken());
		String op = st.nextToken();
		int n2 = Integer.parseInt(st.nextToken());
		//process
	}
}

SimpleCalculator, C++
(parsing string of given format with sscanf)
#include <string>
#include <stdio.h>
 
using namespace std;
 
class SimpleCalculator {
	public:
	int calculate(string input)
	{	int n1,n2;
		char op;
		sscanf(input.c_str(),"%d%c%d",&n1,&op,&n2);
		//process
	}
};

C++ also provides a variety of C-styled parsing functions, which use strings represented as char *. They require some extra actions to convert STL strings to C-style strings and vice versa, but are generally faster than corresponding STL-based methods, so they can come handy if added to pre-written code.

For example, function strtok looks for delimiters in a string and replaces them with null character (one delimiter at a time). The split function based on strtok would look like this:

vector<string> split(string s, string del = " ")
{	char *word, *buf, *delim;
	vector<string> res;
	delim = strdup(del.c_str());
	buf = strdup(s.c_str());
	for (word = strtok(buf, delim); word; word = strtok(0, delim))
		res.push_back(string(word));
        free(buf);
        free(delim);
	return res;
}

3. Other methods. These include using classes or functions not meant for parsing. Following are two examples of such methods.

3.1. C++, stringstream class.
stringstream provides an interface to manipulate strings as if they were input/output streams, and it's a universal parsing tool. For example, consider a task where all text fragments in the string represent numbers of the same type (integer or floating point). In all other languages you split the string into fragments first, and then convert each fragment into a number. In C++, you can have one universal split() method and use it for splitting the string into an array of literally anything:

#include <sstream>
#include <string>
using namespace std;
 
template<class T> vector<T> split(string s) {
	stringstream A(s);
	vector<T> res;
	T t;
	while (A>>t) res.PB(t);
	return res;
}
 
vector<string> string_fragments = split<string>(input);
vector<int> numbers = split<int>(input);

You can also use stringstream to parse the string into fragments of different data types.

SimpleCalculator, C++
#include <string>
#include <sstream>
#include <stdio.h>
 
using namespace std;
 
class SimpleCalculator {
	public:
	int calculate(string input)
	{	int n1,n2;
		char op;
		stringstream s;
		s<<input;
		s>>n1>>op>>n2;
		//process
	}
};

3.2. Java, java.util.regexp package.
This package contains classes for matching character sequences against patterns specified by regular expression. If we are looking to extract all numbers from a string with any characters as separators, we can use regular expression to specify what we think is a number, and extract all matches of this regular expression from the string.

HiddenNumbers, Java

import java.util.regex.*;
public class HiddenNumbers {
	public String[] findAll(String[] text)
	{	String t="";
		for (int i=0; i<text.length; i++)
			t += text[i];
		Pattern p = Pattern.compile("[0-9]+");
		Matcher m = p.matcher(t);
		while (m.find())
		{	//current number is stored in m.group() as a String
			//store it as needed
		}
		//process
	}
}
RSS