JOIN
Get Time
forums   
Search | Watch Thread  |  My Post History  |  My Watches  |  User Settings
View: Flat (newest first)  | Threaded  | Tree
Previous Thread  |  Next Thread
Working with Strings | Reply
Problem:
Strings are an important data structure used in many common languages. Strings are stored as a sequence of characters. Strings can be used to represent text, as well as other objects, such as 2D maps. This recipe will describe how to use Strings.

Solution:
A String variable can be initialized in a number of ways:
//Java
String s = "Hello World";

'Visual Basic
Dim s As String = "Hello " & "World"


A common operation is to extract the contents of a String. We can extract either a particular character or a whole substring, which will also have type String:
//Java
//print the second character of s
System.out.println(s.charAt(1));
 
//extract contents between first and last 'l's in s
String res=s.substring(s.indexOf('l')+1, s.lastIndexOf('l'));
System.out.println(res);	//prints "lo Wor"

'Visual Basic
'print the first 5 characters of s
Print Left$(s,5)      'prints "Hello"
'print 3 characters of s starting with the fifth character
Print Mid$(s,5,3)      'prints "o W"


Strings in Java are immutable, meaning that we cannot modify their contents directly. Instead, we can use a StringBuffer object, which can be modified character-by-character. Visual Basic does not suffer from this problem:
'Visual Basic
'replace the last 5 characters of s with “Earth”
Right$(s,5) = "Earth"
Print s     'prints "Hello Earth"


Often we need to check whether a String contains a particular substring. In the following we check whether s contains "el":
//Java
if (s.indexOf("el")>=0)
	System.out.println(s+" contains el");

'Visual Basic
If InStr(s, "el") >= 1 Then
	Print s & " contains el"


We can compare two Strings based on their lexicographical (dictionary) order. This is useful when we want to sort Strings:
//Java
int r=string1.compareTo(string2);
//r = 0 if the Strings are identical (case-sensitive)
//r < 0 if string1 is lexicographically less than string2
//r > 0 if string1 is lexicographically greater than string2

'Visual Basic
result = StrComp(string1,string2,vbTextCompare)





Regular expressions are a very powerful tool. They can be used to search for predefined patterns in Strings, amongst other things (LINK to the article on Regular Expressions by Dan[Popovici] and mariusmuja). Lets check whether the String matches a regular expression of the form [0 or more numbers] [0 or more characters] [0 or more numbers]:
//Java
if ("123abcd45".matches("[0-9]*[a-zA-Z]*[0-9]*"))
	System.out.println("String has the form numbers-characters-numbers");

'Visual Basic
If "123abcd45" Like "[0-9]*[a-zA-Z]*[0-9]*" Then
	Print "String has the form numbers-characters-numbers"


Strings can also be parsed as regular expressions. Here we will parse the String into words, ie., text separated by space:
//Java
String[] words=s.split(" ");
for (int i=words.length-1; i>=0; i--)
    System.out.println(words[i]+" ");  //prints "World Hello "

'Visual Basic
Dim word As String=s.Split(" ")(1)
Print word	'prints the second word "World"



Strings can also be used to simplify methods on integers. We will now obtain the most significant digit of an integer by converting it to a String:
//Java
int i=12311;
int digit=(""+i).charAt(0)-'0';
System.out.println("The first digit of "+i+" is "+digit);

Lets compute the number of digits in an integer:
//Java
int digits=(""+i).length();
System.out.println("Number of digits in "+i+" is "+digits);
Re: Working with Strings (response to post by dimkadimon) | Reply
To be honest I don't like this recipe. Currently it is very basic. If someone wanted to learn about Strings they would just read the documentation for their language. Including more complex tricks doesn't seem very useful either, because then the article becomes just a bunch of hacks. Another problem is that Strings are very specific to the programming language and it is hard to keep the article general. I am not very familiar with String manipulation in other languages. So could someone check my VB code?
Re: Working with Strings (response to post by dimkadimon) | Reply
I'll have to think a bit more about "Working with ..." type of recipes. Most likely they really will be a bunch of tricks and hacks "how to do something in 4 languages".

Regular expressions will probably be covered with a separate recipe in "Miscellaneous Problems" chapter.
Re: Working with Strings (response to post by Nickolas) | Reply
I think to make the recipe useful it has to be different for each language. And VB will be a problem=)

And there are a lot of things to add here!
Probably the most dumb way to write the recipe for a particular language is to look briefly over 100 submissions of various problems and note all the string manipulation operations that are done often.

Edit: I've just found the "Parsing String Input" recipe, so there is no need to mention it here.
Re: Working with Strings (response to post by dimkadimon) | Reply
Problem:
Strings are an important data structure used in many common languages. Strings are stored as a sequence of characters. Strings can be used to represent text, as well as other objects, such as 2D maps. This recipe will describe how to use Strings.

Solution:

Java
Java is the only language on TopCoder in which Strings are immutable, meaning that we cannot modify their contents directly. Instead, we can use a StringBuffer object, which can be modified character-by-character.
Java Strings support regular expressions and splitting on characters.
String s = "Hello World";  //initialization
s.charAt(1);  //access character 'e'
s.substring(1,4);  //contents between 1st and 3rd character: "ell"
s.length();  //string length: 11
s.indexOf("el");  //first occurrence of "el": 1
s.compareTo(s2);  //string comparison
s=s+"!";  //string concatenation
 
String[] words=s.split(" ");  //split s into space-separated Strings: "Hello" and "World"
"123abcd45".matches("[0-9]*[a-zA-Z]*[0-9]*");  //regular expressions. This will return true, because it has format number-character-number



C++
string s = "Hello World";  //initialization
s[1];  //access character 'e'
substr(s,1,3);  //contents between 1st and 3rd character: "ell"
s.size();  //string length: 11
s.compare(s2);  //string comparison
s=s+"!";  //string concatenation
 
s[1]='u';  //change 'e' to 'u'

C++ strings do not support splitting on characters, but this can achieved using istringstream:
string word;
istringstream iss(s);
while( iss >> word )     
  //here you can manipulate each word




C#
C# strings support regular expressions, splitting on characters and iteration:
string s = "Hello World";  //initialization
s[1];  //access character 'e'
s.Substring(s,1,3);  //contents between 1st and 3rd character: "ell"
s.Length;  //string length: 11
s.CompareTo(s2);  //string comparison
s=s+"!";  //string concatenation
 
string[] words=s.Split(' ');  //split s into space-separated Strings: "Hello" and "World"
//regular expressions. This will return true, because it has format number-character-number
System.Text.RegularExpressions.Regex.IsMatch("123abcd45", "[0-9]*[a-zA-Z]*[0-9]*");
foreach(char c in s)  //iterate over characters in s
s[1]='u';  //change 'e' to 'u'



Discussion:
The above is only a summary of the main String functions. Each language comes with many other string functions and the best way to find them is to check the corresponding APIs:

Java: http://download.oracle.com/javase/6/docs/api/java/lang/String.html
C++: http://www.cplusplus.com/reference/string/string/
C#: http://msdn.microsoft.com/en-us/library/ms228364.aspx

Note that the string comparison functions return 0 if the two strings are equal, a negative number if the first string is lexicographically earlier than the second one and a positive number otherwise.




EDIT: This is my first draft attempt at rewriting this recipe. I think it is better than before, but still needs more work. I am not really familiar with C++ and C# strings, so it would be great if someone checks it. I started writing the VB strings, but then got really confused, because some of their functions are 0-based and others are 1-based. It would be great if somebody else writes that part.
EDIT2: Turns out that C# does have regular expressions.
RSS