JOIN
Get Time
forums   
Search | Watch Thread  |  My Post History  |  My Watches  |  User Settings
View: Flat  | Threaded  | Tree
Previous Thread  |  Next Thread
2.1. Parsing String Input | Reply
Problem

Algorithm and especially Marathon problems often give you the input as strings which have to be parsed before actually processing them. A typical problem is: you are given a string which contains fragments of text separated with spaces, and you have to extract these fragments as an array of individual strings.

Solution

C++
C++ is the only language of TopCoder which has no built-in method string.Split, so you'll have to write one yourself, for example, like this:
#include <sstream>
vector<string> split(string s) {
	stringstream A(s);
	vector<string> res;
	string t;
	while (A>>t) res.push_back(t);
	return res;
}
vector<string> fragments = split(input);

Java
String[] fragments = input.split(" ");

C#
string[] fragment = input.Split(new char[]{' '}, StringSplitOptions.RemoveEmptyEntries);

VB
dim fragment() as string
fragment = input.Split(" ")

Python
fragment = input.split();


Discussion

In most problems the string fragmentation task is present in its very basic form: text fragments are strings separated with single spaces, and they the only information to be extracted from the string, like in ToolsBox. But this task can also come in lots of flavors:
- text fragments can be separated with more than one space, and the string can contain leading and trailing spaces as well, like in problem VLNString;
- the separators can be non-space characters, like in DirectoryTree;
- you might need to extract not only the text fragments themselves, but also the values of separators, like in SimpleCalculator;
- text fragments within the string can represent numerical data, like in HiddenNumbers;
- fragments can be numbers and text, mixed in given format, like in SimpleCalculator.

All approaches to the basic task and its versions can be roughly divided into three groups:

1. Manual. As the name implies, this approach means doing the task "by hand", processing the string character by character. Well, this can be done, but what's the point of doing something by hand when there is a library function that does the job? Remember that in SRMs speed is important, and parsing is rarely the main part of the problem, so is should be really fast to type, and manual parsing definitely is not.
The only exception from this rule are specific tasks where the string contains a small number of short fragments in specified format, like getting moment of time from notation "HH:MM". In this case writing
hours = (time[0]-'0')*10 + (time[1]-'0');
minutes = (time[3]-'0')*10 + (time[4]-'0');
takes not much longer than using special functions and doesn't require recalling their signatures.

2. Using special parsing functions. All languages, and especially C++, have lots of built-in classes and functions for different parsing tasks. The tricky part is knowing them in details and choosing the proper one for each task, which is important both for writing your own solution and for challenging other solutions. For example, using standard method string.split (in languages where it is provided) for tasks like VLNString will produce extra zero-length strings for each extra space between the text fragments and for each leading or trailing space, which should be taken into account when processing the result. Most languages allow several solutions for each parsing task; following are several examples not covered in "Solution" section.

SimpleCalculator, Java
(using non-space characters as delimiters and including delimiter characters as fragments)
import java.util.*;
public class SimpleCalculator {
	public int calculate(String input)
	{	StringTokenizer st = new StringTokenizer(input, "+-/*", true);
		int n1 = Integer.parseInt(st.nextToken());
		String op = st.nextToken();
		int n2 = Integer.parseInt(st.nextToken());
		//process
	}
}

SimpleCalculator, C++
(parsing string of given format with sscanf)
#include <string>
#include <stdio.h>
 
using namespace std;
 
class SimpleCalculator {
	public:
	int calculate(string input)
	{	int n1,n2;
		char op;
		sscanf(input.c_str(),"%d%c%d",&n1,&op,&n2);
		//process
	}
};

C++ also provides a variety of C-styled parsing functions, which use strings represented as char *. They require some extra actions to convert STL strings to C-style strings and vice versa, but are generally faster than corresponding STL-based methods, so they can come handy if added to pre-written code.

For example, function strtok looks for delimiters in a string and replaces them with null character (one delimiter at a time). The split function based on strtok would look like this:

vector<string> split(string s, string del = " ")
{	char *word, *buf, *delim;
	vector<string> res;
	delim = strdup(del.c_str());
	buf = strdup(s.c_str());
	for (word = strtok(buf, delim); word; word = strtok(0, delim))
		res.push_back(string(word));
        free(buf);
        free(delim);
	return res;
}

3. Other methods. These include using classes or functions not meant for parsing. Following are two examples of such methods.

3.1. C++, stringstream class.
stringstream provides an interface to manipulate strings as if they were input/output streams, and it's a universal parsing tool. For example, consider a task where all text fragments in the string represent numbers of the same type (integer or floating point). In all other languages you split the string into fragments first, and then convert each fragment into a number. In C++, you can have one universal split() method and use it for splitting the string into an array of literally anything:

#include <sstream>
#include <string>
using namespace std;
 
template<class T> vector<T> split(string s) {
	stringstream A(s);
	vector<T> res;
	T t;
	while (A>>t) res.PB(t);
	return res;
}
 
vector<string> string_fragments = split<string>(input);
vector<int> numbers = split<int>(input);

You can also use stringstream to parse the string into fragments of different data types.

SimpleCalculator, C++
#include <string>
#include <sstream>
#include <stdio.h>
 
using namespace std;
 
class SimpleCalculator {
	public:
	int calculate(string input)
	{	int n1,n2;
		char op;
		stringstream s;
		s<<input;
		s>>n1>>op>>n2;
		//process
	}
};

3.2. Java, java.util.regexp package.
This package contains classes for matching character sequences against patterns specified by regular expression. If we are looking to extract all numbers from a string with any characters as separators, we can use regular expression to specify what we think is a number, and extract all matches of this regular expression from the string.

HiddenNumbers, Java

import java.util.regex.*;
public class HiddenNumbers {
	public String[] findAll(String[] text)
	{	String t="";
		for (int i=0; i<text.length; i++)
			t += text[i];
		Pattern p = Pattern.compile("[0-9]+");
		Matcher m = p.matcher(t);
		while (m.find())
		{	//current number is stored in m.group() as a String
			//store it as needed
		}
		//process
	}
}
Subject Author Date
2.1. Parsing String Input Nickolas Apr 26, 2010 at 1:28 PM EDT
Re: 2.1. Parsing String Input syg96 Oct 29, 2010 at 12:22 AM EDT
Re: 2.1. Parsing String Input Nickolas Jan 9, 2011 at 4:10 AM EST
Re: 2.1. Parsing String Input syg96 Jan 9, 2011 at 5:04 AM EST
Re: 2.1. Parsing String Input Nickolas Jan 9, 2011 at 6:25 AM EST
Re: 2.1. Parsing String Input syg96 Jan 9, 2011 at 6:49 AM EST
Re: 2.1. Parsing String Input Nickolas Jan 9, 2011 at 6:53 AM EST
Re: 2.1. Parsing String Input syg96 Jan 9, 2011 at 7:03 AM EST
Re: 2.1. Parsing String Input dimkadimon Jan 9, 2011 at 10:31 PM EST
Re: 2.1. Parsing String Input orchidmajumder Sep 1, 2013 at 3:21 PM EDT
Re: 2.1. Parsing String Input OliverSun Dec 1, 2013 at 1:13 PM EST
RSS