Working with strings

Concatenation

Strings, as has been explained, consist of sequences of characters. Once created, strings can be combined and manipulated in a number of ways. One of the operators that you can use with strings is the concatenation operator, whose symbol is the plus sign (‘+’). You can use this operator to combine two or more existing strings into a longer string.

In [ ]:
firstName = "Jane"
lastName = "Austen"
fullName = firstName + " " + lastName
sentence = 'Pride and Prejudice was written by ' + fullName

print(sentence)

The code above contains several concatenations. First, the variables ‘firstName’ and ‘lastName’ are combined into a longer string, which is stored in the variable ‘fullName’. A very short string, consisting of a single space, is placed between the first name and the last name. The next line concatenates ‘fullName’ with the beginning of a sentence, and the last line prints the result: 'Pride and Prejudice was written by Jane Austen'.
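The + operator can only join strings to other strings. The sketch below assumes you want to include a number in a sentence; the variable `year` is a hypothetical example. The number must first be converted to a string with the built-in str() function. Without that conversion, Python raises a TypeError.

```python
title = 'Pride and Prejudice'
year = 1813  # a number (int), not a string

# Concatenating a string and an int directly fails with a TypeError
try:
    title + ' was published in ' + year
except TypeError:
    print('A string and a number cannot be joined with +')

# Converting the number with str() first makes the concatenation work
sentence = title + ' was published in ' + str(year)
print(sentence)
## This prints 'Pride and Prejudice was published in 1813'
```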

Selecting individual characters of a string

When you create a string, it is useful to bear in mind that all the individual characters that make up the string are numbered. These numbers are referred to as indices, and the first character has index 0. You can access an individual character by appending a pair of square brackets to the name of the string variable and supplying the index of the character you want within these brackets. When you use a negative number (e.g. -1 or -2), Python counts from the end of the string instead: -1 refers to the last character, -2 to the second-to-last, and so on.

In [ ]:
name = 'Albert Einstein'

print( name[0] )
## This prints 'A'

print( name[4] )
## This prints 'r'

print( name[-1] )
## This prints 'n'
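The sketch below shows two more rules for indices, reusing the string 'Albert Einstein'. A negative index is shorthand for counting back from the length of the string. An index past the end of the string raises an IndexError. The value 100 is an arbitrary choice that simply lies beyond the last character.

```python
name = 'Albert Einstein'

# -2 selects the second-to-last character
print( name[-2] )
## This prints 'i'

# name[-1] refers to the same character as name[len(name) - 1]
print( name[-1] == name[len(name) - 1] )
## This prints True

# An index beyond the end of the string raises an IndexError
try:
    print( name[100] )
except IndexError:
    print('Index out of range')
## This prints 'Index out of range'
```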

You can also extract a range of characters from a string by providing the start position and the end position of the substring you want, separated by a colon. Note that the character at the end position is not included: title[0:5] returns the characters at indices 0 up to and including 4. Substrings created in this way are referred to as ‘slices’.

In [ ]:
title = 'Romeo and Juliet'

print( title[0:5] )
# this prints 'Romeo'

print( title[10:16])
# this prints 'Juliet'
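Either index of a slice may be left out, and negative indices work in slices as well. The following short sketch reuses the title from the example above.

```python
title = 'Romeo and Juliet'

# Omitting the start index means "from the beginning of the string"
print( title[:5] )
# this prints 'Romeo'

# Omitting the end index means "up to and including the last character"
print( title[10:] )
# this prints 'Juliet'

# A negative start index counts from the end: the last six characters
print( title[-6:] )
# this prints 'Juliet'
```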

Functions and methods for strings

len() is a built-in function that returns the length of a string, i.e. the number of characters it contains. Spaces and punctuation marks are counted as characters too.
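The length of a string connects directly to its indices. Because counting starts at 0, the last valid index is always one less than the length. A small sketch:

```python
title = 'Romeo and Juliet'

print( len(title) )
## This prints 16

# A single space counts as one character; an empty string has length 0
print( len(' ') )
## This prints 1
print( len('') )
## This prints 0

# The index of the last character is len(title) - 1
lastIndex = len(title) - 1
print( title[lastIndex] )
## This prints 't'
```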

The value of a string variable can also be processed using methods. A method is similar to a function: like a function, it is a name that represents a specific action. While functions such as len() can be used independently, methods always belong to a specific type of object, such as a string. To call a method, you append its name to the name of the variable and separate the two with a period, as in title.upper(). The following are some of the methods available for string objects:

Method      Description
upper()     Returns a copy of the string with all characters converted to upper case.
lower()     Returns a copy of the string with all characters converted to lower case.
strip()     Returns a copy of the string with white space (such as spaces, newlines or tabs) removed from the beginning and the end.
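strip() only removes white space at the two ends of a string. White space between words is left untouched. The sketch below uses an arbitrary example string that begins with a tab (\t) and ends with a newline (\n).

```python
text = '\t  Jane   Austen \n'

cleaned = text.strip()
print( cleaned )
## This prints 'Jane   Austen' (the three spaces in the middle remain)

print( len(text), len(cleaned) )
## This prints 18 13
```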

The code below gives an indication of how these methods may be used.

In [ ]:
title = "   The Hitchhiker's Guide to the Galaxy   "

title = title.strip()
# This removes the spaces before and after the original string

print( len(title) )
## This prints 36

print( title.lower() )
## This prints "the hitchhiker's guide to the galaxy"

print( title.upper() )
## This prints "THE HITCHHIKER'S GUIDE TO THE GALAXY"
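Python strings cannot be modified in place. Methods such as upper() and strip() therefore return a new string and leave the original unchanged, which is why the example above assigns the result of strip() back to title. The sketch below illustrates this and also shows that method calls can be chained. The variable names `shouting` and `messy` are illustrative only.

```python
title = "The Hitchhiker's Guide to the Galaxy"

# upper() returns a NEW string; the variable title is not changed
shouting = title.upper()
print( title )
## This prints "The Hitchhiker's Guide to the Galaxy"
print( shouting )
## This prints "THE HITCHHIKER'S GUIDE TO THE GALAXY"

# Chained calls: lower() is applied to the result of strip()
messy = "   Don't Panic   "
print( messy.strip().lower() )
## This prints "don't panic"
```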

Finally, the method 'index()' searches a string for a substring. If the string contains the substring, the method returns a number indicating its starting position (i.e. its index). If the substring occurs more than once, index() always returns the position of the first occurrence. If the string does not contain the substring passed as a parameter, the method raises an error (a ValueError).

This method is particularly useful in combination with string slices, as is demonstrated below.

In [ ]:
email = 'person@test.com'

startIndex = email.index('@')

username = email[ 0 : startIndex ]
print(username)
## prints 'person'

domain = email[ startIndex + 1 : len(email) ]
print(domain)
## prints 'test.com'
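Because index() raises a ValueError when the substring is missing, it can help to check first, for instance with the `in` operator. Another option is the string method find(), which returns -1 instead of raising an error. The sketch below reuses the email address from the example above; the character '#' is an arbitrary substring chosen because it does not occur in that address.

```python
email = 'person@test.com'

# index() reports only the FIRST occurrence of a substring
print( email.index('t') )
## prints 7

# Check with 'in' before calling index() to avoid a ValueError
if '#' in email:
    print( email.index('#') )
else:
    print('No # found')
## prints 'No # found'

# find() behaves like index() but returns -1 when nothing is found
print( email.find('#') )
## prints -1
```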