SECTION D | unit 4 string handling

FIPP


1. Explain in brief: Accessing Values in Strings, Updating Strings and Escape Characters

Accessing Values in Strings

Python does not have a separate character type; a single character is simply a string of length one, and so it is also a substring.

To access substrings, use the square brackets for slicing along with the index or indices to obtain your substring. For example −

#!/usr/bin/env python3

var1 = 'Hello World!'
var2 = "Python Programming"

print("var1[0]:", var1[0])
print("var2[1:5]:", var2[1:5])

When the above code is executed, it produces the following result −

var1[0]: H
var2[1:5]: ytho

Updating Strings

Strings are immutable, so they cannot be changed in place. You can "update" an existing string by (re)assigning a variable to another string. The new value can be related to its previous value or can be a completely different string. For example −

#!/usr/bin/env python3

var1 = 'Hello World!'
print("Updated String :-", var1[:6] + 'Python')

When the above code is executed, it produces the following result −

Updated String :- Hello Python
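A minimal Python 3 sketch of why "updating" means rebinding: strings are immutable, so assigning to one character raises TypeError, while slicing plus concatenation builds a new string that the name then refers to. The name error_message is only illustrative.

```python
var1 = 'Hello World!'

# Strings are immutable: item assignment raises TypeError.
error_message = None
try:
    var1[0] = 'J'
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'str' object does not support item assignment

# "Updating" really builds a new string and rebinds var1 to it.
var1 = var1[:6] + 'Python'
print(var1)  # Hello Python
```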

Escape Characters

The following table lists escape (non-printable) characters that can be represented with backslash notation.

Escape characters are interpreted in both single-quoted and double-quoted strings.

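Backslash notation	Hexadecimal character	Description
\\	0x5c	Backslash
\'	0x27	Single quote
\"	0x22	Double quote
\a	0x07	Bell or alert
\b	0x08	Backspace
\f	0x0c	Formfeed
\n	0x0a	Newline
\nnn		Octal notation, where each n is in the range 0-7 (up to three digits)
\r	0x0d	Carriage return
\t	0x09	Tab
\v	0x0b	Vertical tab
\xnn		Hexadecimal notation, where each n is in the range 0-9, a-f, or A-F (exactly two digits)

Sequences such as \e (escape) or \s (space), found in some other languages, are not recognised by Python; an unrecognised escape leaves the backslash in the string, and recent Python versions warn about it.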
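The backslash sequences from the table can be checked in Python 3 by looking at string lengths and printed values. This short sketch (variable names are illustrative) also shows that a raw string keeps the backslash.

```python
# Each escape sequence stands for exactly one character.
tab_and_newline = "a\tb\nc"
octal_a = "\101"          # octal 101 = decimal 65 = 'A'
hex_a = "\x41"            # hexadecimal 41 = decimal 65 = 'A'
quotes = 'It\'s a "quote"'
backslash = "C:\\temp"    # \\ produces one literal backslash

print(len(tab_and_newline))   # 5: a, TAB, b, NEWLINE, c
print(octal_a, hex_a)         # A A
print(quotes)                 # It's a "quote"
print(backslash)              # C:\temp

# A raw string keeps the backslash and the letter after it.
raw = r"\n"
print(len(raw))               # 2
```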

2. What are string special operators and string formatting operators?

String Special Operators

Assume string variable a holds 'Hello' and variable b holds 'Python', then −

Operator	Description	Example
+	Concatenation - Adds values on either side of the operator	a + b will give HelloPython
*	Repetition - Creates new strings, concatenating multiple copies of the same string	a*2 will give HelloHello
[]	Index - Gives the character at the given index	a[1] will give e
[ : ]	Range Slice - Gives the characters from the given range	a[1:4] will give ell
in	Membership - Returns True if a substring exists in the given string	'H' in a will give True
not in	Membership - Returns True if a substring does not exist in the given string	'M' not in a will give True
r/R	Raw String - Suppresses the special meaning of escape characters. The syntax for raw strings is exactly the same as for normal strings except for the raw string prefix, the letter "r", which precedes the quotation marks. The "r" can be lowercase (r) or uppercase (R) and must be placed immediately before the first quote mark.	print(r'\n') prints \n and print(R'\n') prints \n
%	Format - Performs string formatting	See the next section
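The operators in the table can be tried directly with the same a and b; note that the membership operators return the booleans True or False. A small Python 3 sketch:

```python
a = 'Hello'
b = 'Python'

concatenated = a + b           # 'HelloPython'
repeated = a * 2               # 'HelloHello'
second_char = a[1]             # 'e' (indexing)
range_slice = a[1:4]           # 'ell' (the end index 4 is excluded)
has_h = 'H' in a               # True
lacks_m = 'M' not in a         # True
raw_text = r'\n'               # two characters: a backslash and 'n'
formatted = '%s %s' % (a, b)   # 'Hello Python'

print(concatenated, repeated, second_char, range_slice)
print(has_h, lacks_m, len(raw_text), formatted)
```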

String Formatting Operator

One of Python's coolest features is the string format operator %. When its left operand is a string, % performs formatting (rather than computing a remainder), which makes up for the lack of functions from C's printf() family. Following is a simple example −

#!/usr/bin/env python3

print("My name is %s and weight is %d kg!" % ('Zara', 21))

When the above code is executed, it produces the following result −

My name is Zara and weight is 21 kg!

Here is the complete list of symbols that can be used along with % −

Format Symbol	Conversion
%c	Character
%s	String conversion via str() prior to formatting
%i	Signed decimal integer
%d	Signed decimal integer
%u	Obsolete; identical to %d in Python 3
%o	Octal integer
%x	Hexadecimal integer (lowercase letters)
%X	Hexadecimal integer (UPPERcase letters)
%e	Exponential notation (with lowercase 'e')
%E	Exponential notation (with UPPERcase 'E')
%f	Floating point real number
%g	The shorter of %f and %e
%G	The shorter of %f and %E

Other supported symbols and functionality are listed in the following table −

Symbol	Functionality
*	Argument specifies width or precision
-	Left justification
+	Display the sign
<sp>	Leave a blank space before a positive number
#	Use the alternate form: add the octal prefix '0o' or the hexadecimal prefix '0x' or '0X', depending on whether 'x' or 'X' is used.
0	Pad from left with zeros (instead of spaces)
%	'%%' leaves you with a single literal '%'
(var)	Mapping variable (dictionary arguments)
m.n	m is the minimum total width and n is the number of digits to display after the decimal point (if applicable)
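A sketch exercising the conversion symbols and flags from the two tables above with the % operator in Python 3. The numbers and names are arbitrary sample values, and each comment gives the resulting string.

```python
# Conversion symbols
char_demo = "%c%c" % ('P', 121)           # %c takes a 1-char string or an int code point -> 'Py'
base_demo = "%o %x %X" % (8, 255, 255)    # '10 ff FF'
exp_demo = "%e" % 1234.5                  # '1.234500e+03'

# Flags, width and precision
width_demo = "[%8.3f]" % 3.14159          # width 8, 3 decimals -> '[   3.142]'
left_demo = "[%-6d]" % 42                 # left-justified -> '[42    ]'
sign_demo = "%+d %+d" % (5, -5)           # '+5 -5'
space_demo = "[% d]" % 7                  # blank before a positive number -> '[ 7]'
alt_demo = "%#o %#x %#X" % (8, 255, 255)  # '0o10 0xff 0XFF'
zero_demo = "%05d" % 42                   # '00042'
star_demo = "[%*d]" % (5, 42)             # width taken from the argument -> '[   42]'
percent_demo = "100%% sure" % ()          # '100% sure'
mapping_demo = "%(name)s weighs %(kg)d kg" % {'name': 'Zara', 'kg': 21}

for line in (char_demo, base_demo, exp_demo, width_demo, left_demo, sign_demo,
             space_demo, alt_demo, zero_demo, star_demo, percent_demo, mapping_demo):
    print(line)
```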

3. Write some of the Built-in String Methods

Python includes the following built-in methods to manipulate strings −

Sr.No.	Methods with Description
1	capitalize() Returns a copy of the string with its first character capitalized and the rest lowercased.
2	center(width[, fillchar]) Returns the string centered in a string of length width, padded with fillchar (a space by default).
3	count(sub[, start[, end]]) Counts how many times sub occurs in the string, or in the slice string[start:end] if start and end are given.
4	decode(encoding='utf-8', errors='strict') In Python 3 this is a method of bytes, not str: it decodes the bytes using the codec registered for encoding and returns a str. The encoding defaults to UTF-8.
5	encode(encoding='utf-8', errors='strict') Returns an encoded version of the string as a bytes object; on error, the default is to raise a UnicodeEncodeError (a subclass of ValueError) unless errors is given as 'ignore' or 'replace'.
6	endswith(suffix[, start[, end]]) Returns True if the string (or string[start:end], when start and end are given) ends with suffix, and False otherwise.
7	expandtabs(tabsize=8) Expands tabs in the string to spaces, using a tab stop every tabsize columns (8 by default).
8	find(sub[, start[, end]]) Returns the lowest index at which sub occurs in the string (or in string[start:end]), and -1 if it is not found.
9	index(sub[, start[, end]]) Same as find(), but raises a ValueError if sub is not found.
10	isalnum() Returns True if the string has at least 1 character and all characters are alphanumeric, and False otherwise.
11	isalpha() Returns True if the string has at least 1 character and all characters are alphabetic, and False otherwise.
12	isdigit() Returns True if the string has at least 1 character and all characters are digits, and False otherwise.
13	islower() Returns True if the string has at least 1 cased character and all cased characters are lowercase, and False otherwise.
14	isnumeric() Returns True if the string has at least 1 character and all characters are numeric, and False otherwise.
15	isspace() Returns True if the string has at least 1 character and all characters are whitespace, and False otherwise.
16	istitle() Returns True if the string is properly "titlecased", and False otherwise.
17	isupper() Returns True if the string has at least 1 cased character and all cased characters are uppercase, and False otherwise.
18	join(iterable) Concatenates the strings in iterable into one string, using the string the method is called on as the separator.
19	len(string) Returns the length of the string (len() is a built-in function, not a string method).
20	ljust(width[, fillchar]) Returns the string left-justified in a string of length width, padded with fillchar (a space by default).
21	lower() Converts all uppercase letters in the string to lowercase.
22	lstrip([chars]) Removes leading characters found in chars (whitespace by default).
23	maketrans(x[, y[, z]]) Static method str.maketrans() that returns a translation table for use with translate().
24	max(str) Returns the character with the highest Unicode code point in the string (max() is a built-in function).
25	min(str) Returns the character with the lowest Unicode code point in the string (min() is a built-in function).
26	replace(old, new[, count]) Replaces all occurrences of old with new, or only the first count occurrences if count is given.
27	rfind(sub[, start[, end]]) Same as find(), but returns the highest index (searches from the right).
28	rindex(sub[, start[, end]]) Same as index(), but returns the highest index (searches from the right).
29	rjust(width[, fillchar]) Returns the string right-justified in a string of length width, padded with fillchar (a space by default).
30	rstrip([chars]) Removes trailing characters found in chars (whitespace by default).
31	split(sep=None, maxsplit=-1) Splits the string at the delimiter sep (runs of whitespace if sep is not given) and returns a list of substrings; at most maxsplit splits are done if it is given.
32	splitlines(keepends=False) Splits the string at line boundaries and returns a list of the lines, with the line breaks removed unless keepends is true.
33	startswith(prefix[, start[, end]]) Returns True if the string (or string[start:end], when start and end are given) starts with prefix, and False otherwise.
34	strip([chars]) Performs both lstrip() and rstrip() on the string.
35	swapcase() Inverts the case of all letters in the string.
36	title() Returns a "titlecased" version of the string, in which each word begins with an uppercase letter and the remaining characters are lowercase.
37	translate(table) Returns a copy of the string in which each character is mapped through the translation table; characters mapped to None are deleted.
38	upper() Converts lowercase letters in the string to uppercase.
39	zfill(width) Returns the string left-padded with zeros to a total of width characters; intended for numbers, zfill() keeps a leading + or - sign in front of the zeros.
40	isdecimal() Returns True if the string has at least 1 character and all characters are decimal characters, and False otherwise.
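A few of these methods in action, as a Python 3 sketch with arbitrary sample strings. In Python 3, encode() turns a str into bytes, and decode() is called on those bytes to get the str back.

```python
s = "  hello, world  ".strip()              # 'hello, world'

capitalized = s.capitalize()                # 'Hello, world'
titled = s.title()                          # 'Hello, World'
centered = s.center(16, '*')                # '**hello, world**'
o_count = s.count('o')                      # 2
found, missing = s.find('world'), s.find('xyz')   # 7, -1
replaced = s.replace('world', 'Python')     # 'hello, Python'
parts = s.split(', ')                       # ['hello', 'world']
joined = '-'.join(['a', 'b', 'c'])          # 'a-b-c'
zero_padded = ('42'.zfill(5), '-42'.zfill(5))     # ('00042', '-0042')
expanded = 'a\tb'.expandtabs(4)             # 'a' + 3 spaces + 'b'

# translate() with a table built by str.maketrans(): 'l' -> '0', 'o' -> '1'
table = str.maketrans('lo', '01')
translated = 'hello'.translate(table)       # 'he001'

# encode() gives bytes; decode() on the bytes gives the str back
data = 'héllo'.encode('utf-8')              # b'h\xc3\xa9llo'
decoded = data.decode('utf-8')              # 'héllo'

# index() differs from find() by raising ValueError when nothing is found
index_raised = False
try:
    s.index('xyz')
except ValueError:
    index_raised = True

print(capitalized, titled, centered, o_count, found, missing)
print(replaced, parts, joined, zero_padded, translated, decoded, index_raised)
```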

4. Define Unicode string

Introduction

Models that process natural language often handle different languages with different character sets. Unicode is a standard encoding system that is used to represent characters from almost all languages. Each character is encoded using a unique integer code point between 0 and 0x10FFFF. A Unicode string is a sequence of zero or more code points.
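In plain Python 3 (no TensorFlow needed), ord() and chr() convert between characters and their code points, and encode() shows how many bytes each code point occupies in UTF-8. A minimal sketch with an arbitrary sample string:

```python
text = "A语😊"

code_points = [ord(ch) for ch in text]             # [65, 35821, 128522]
as_hex = [hex(cp) for cp in code_points]           # ['0x41', '0x8bed', '0x1f60a']
rebuilt = "".join(chr(cp) for cp in code_points)   # back to "A语😊"

utf8 = text.encode("utf-8")
print(len(text), "code points;", len(utf8), "UTF-8 bytes")  # 3 code points; 8 UTF-8 bytes

# 0x10FFFF is the largest code point chr() accepts; one past it is rejected.
largest = chr(0x10FFFF)
try:
    chr(0x110000)
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
```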

This part shows how to represent Unicode strings in TensorFlow and manipulate them using Unicode equivalents of standard string ops, including separating Unicode strings into tokens based on script detection.

import tensorflow as tf

The tf.string data type

The basic TensorFlow tf.string dtype allows you to build tensors of byte strings. Unicode strings are UTF-8 encoded by default.

tf.constant(u"Thanks 😊")

<tf.Tensor: shape=(), dtype=string, numpy=b'Thanks \xf0\x9f\x98\x8a'>

A tf.string tensor can hold byte strings of varying lengths because the byte strings are treated as atomic units. The string length is not included in the tensor dimensions.

tf.constant([u"You're", u"welcome!"]).shape

TensorShape([2])

Note: When using Python to construct strings, the handling of Unicode differs between Python 2 and Python 3. In Python 2, Unicode strings are indicated by the "u" prefix, as above. In Python 3, strings are Unicode by default (the "u" prefix is still accepted but has no effect).

5. Write in brief about the representation of Unicode

Representing Unicode

There are two standard ways to represent a Unicode string in TensorFlow:

String scalar: the sequence of code points is encoded using a known character encoding.
int32 vector: each position contains a single code point.

For example, the following three values all represent the Unicode string "语言处理" (which means "language processing" in Chinese):

# Unicode string, represented as a UTF-8 encoded string scalar.
text_utf8 = tf.constant(u"语言处理")
text_utf8

<tf.Tensor: shape=(), dtype=string, numpy=b'\xe8\xaf\xad\xe8\xa8\x80\xe5\xa4\x84\xe7\x90\x86'>

# Unicode string, represented as a UTF-16-BE encoded string scalar.
text_utf16be = tf.constant(u"语言处理".encode("UTF-16-BE"))
text_utf16be

<tf.Tensor: shape=(), dtype=string, numpy=b'\x8b\xed\x8a\x00Y\x04t\x06'>

# Unicode string, represented as a vector of Unicode code points.
text_chars = tf.constant([ord(char) for char in u"语言处理"])
text_chars

<tf.Tensor: shape=(4,), dtype=int32, numpy=array([35821, 35328, 22788, 29702], dtype=int32)>
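For comparison, the same three representations can be produced in plain Python 3 with str.encode() and ord(). This is only an analogue of the tensors above, not the TensorFlow API, and the variable names are illustrative.

```python
text = "语言处理"

utf8_scalar = text.encode("UTF-8")            # like text_utf8 (one byte string)
utf16be_scalar = text.encode("UTF-16-BE")     # like text_utf16be
code_point_vector = [ord(ch) for ch in text]  # like text_chars

print(utf8_scalar)        # b'\xe8\xaf\xad\xe8\xa8\x80\xe5\xa4\x84\xe7\x90\x86'
print(utf16be_scalar)     # b'\x8b\xed\x8a\x00Y\x04t\x06'
print(code_point_vector)  # [35821, 35328, 22788, 29702]
```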

Converting between representations

TensorFlow provides operations to convert between these different representations:

tf.strings.unicode_decode: Converts an encoded string scalar to a vector of code points.
tf.strings.unicode_encode: Converts a vector of code points to an encoded string scalar.
tf.strings.unicode_transcode: Converts an encoded string scalar to a different encoding.

tf.strings.unicode_decode(text_utf8,
input_encoding='UTF-8')

<tf.Tensor: shape=(4,), dtype=int32, numpy=array([35821, 35328, 22788, 29702], dtype=int32)>

tf.strings.unicode_encode(text_chars,
output_encoding='UTF-8')

<tf.Tensor: shape=(), dtype=string, numpy=b'\xe8\xaf\xad\xe8\xa8\x80\xe5\xa4\x84\xe7\x90\x86'>

tf.strings.unicode_transcode(text_utf8,
input_encoding='UTF8',
output_encoding='UTF-16-BE')

<tf.Tensor: shape=(), dtype=string, numpy=b'\x8b\xed\x8a\x00Y\x04t\x06'>
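The three conversions can be mirrored for a single string in plain Python 3. The helper names below (decode_to_code_points, encode_code_points, transcode) are hypothetical and only imitate what the TensorFlow ops compute; they work on one string rather than on tensors.

```python
from typing import List


def decode_to_code_points(data: bytes, encoding: str) -> List[int]:
    """Rough analogue of unicode_decode: encoded bytes -> list of code points."""
    return [ord(ch) for ch in data.decode(encoding)]


def encode_code_points(code_points: List[int], encoding: str) -> bytes:
    """Rough analogue of unicode_encode: code points -> encoded bytes."""
    return "".join(chr(cp) for cp in code_points).encode(encoding)


def transcode(data: bytes, input_encoding: str, output_encoding: str) -> bytes:
    """Rough analogue of unicode_transcode: decode to str, then re-encode."""
    return data.decode(input_encoding).encode(output_encoding)


text_utf8 = "语言处理".encode("UTF-8")
chars = decode_to_code_points(text_utf8, "UTF-8")
print(chars)                                             # [35821, 35328, 22788, 29702]
print(encode_code_points(chars, "UTF-8") == text_utf8)   # True
print(transcode(text_utf8, "UTF-8", "UTF-16-BE"))        # b'\x8b\xed\x8a\x00Y\x04t\x06'
```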

Batch dimensions

When decoding multiple strings, the number of characters in each string may not be equal. The result is a tf.RaggedTensor, where the length of the innermost dimension varies depending on the number of characters in each string:

# A batch of Unicode strings, each represented as a UTF8-encoded string.
batch_utf8 = [s.encode('UTF-8') for s in
              [u'hÃllo', u'What is the weather tomorrow', u'Göödnight', u'😊']]
batch_chars_ragged = tf.strings.unicode_decode(batch_utf8,
                                               input_encoding='UTF-8')
for sentence_chars in batch_chars_ragged.to_list():
    print(sentence_chars)

[104, 195, 108, 108, 111]

[87, 104, 97, 116, 32, 105, 115, 32, 116, 104, 101, 32, 119, 101, 97, 116, 104, 101, 114, 32, 116, 111, 109, 111, 114, 114, 111, 119]

[71, 246, 246, 100, 110, 105, 103, 104, 116]

[128522]

You can use this tf.RaggedTensor directly, or convert it to a dense tf.Tensor with padding or a tf.SparseTensor using the methods tf.RaggedTensor.to_tensor and tf.RaggedTensor.to_sparse.

batch_chars_padded = batch_chars_ragged.to_tensor(default_value=-1)
print(batch_chars_padded.numpy())

[[   104    195    108    108    111     -1     -1     -1     -1     -1
      -1     -1     -1     -1     -1     -1     -1     -1     -1     -1
      -1     -1     -1     -1     -1     -1     -1     -1]
 [    87    104     97    116     32    105    115     32    116    104
     101     32    119    101     97    116    104    101    114     32
     116    111    109    111    114    114    111    119]
 [    71    246    246    100    110    105    103    104    116     -1
      -1     -1     -1     -1     -1     -1     -1     -1     -1     -1
      -1     -1     -1     -1     -1     -1     -1     -1]
 [128522     -1     -1     -1     -1     -1     -1     -1     -1     -1
      -1     -1     -1     -1     -1     -1     -1     -1     -1     -1
      -1     -1     -1     -1     -1     -1     -1     -1]]

batch_chars_sparse = batch_chars_ragged.to_sparse()

When encoding multiple strings with the same lengths, a tf.Tensor may be used as input:

tf.strings.unicode_encode([[99, 97, 116], [100, 111, 103], [99, 111, 119]],
                          output_encoding='UTF-8')

<tf.Tensor: shape=(3,), dtype=string, numpy=array([b'cat', b'dog', b'cow'], dtype=object)>

When encoding multiple strings with varying length, a tf.RaggedTensor should be used as input:

tf.strings.unicode_encode(batch_chars_ragged, output_encoding='UTF-8')

<tf.Tensor: shape=(4,), dtype=string, numpy=
array([b'h\xc3\x83llo', b'What is the weather tomorrow',
       b'G\xc3\xb6\xc3\xb6dnight', b'\xf0\x9f\x98\x8a'], dtype=object)>

If you have a tensor with multiple strings in padded or sparse format, then convert it to a tf.RaggedTensor before calling unicode_encode:

tf.strings.unicode_encode(
    tf.RaggedTensor.from_sparse(batch_chars_sparse),
    output_encoding='UTF-8')

<tf.Tensor: shape=(4,), dtype=string, numpy=
array([b'h\xc3\x83llo', b'What is the weather tomorrow',
       b'G\xc3\xb6\xc3\xb6dnight', b'\xf0\x9f\x98\x8a'], dtype=object)>

tf.strings.unicode_encode(
    tf.RaggedTensor.from_tensor(batch_chars_padded, padding=-1),
    output_encoding='UTF-8')

<tf.Tensor: shape=(4,), dtype=string, numpy=
array([b'h\xc3\x83llo', b'What is the weather tomorrow',
       b'G\xc3\xb6\xc3\xb6dnight', b'\xf0\x9f\x98\x8a'], dtype=object)>
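What to_tensor(default_value=-1) and from_tensor(padding=-1) do can be sketched with ordinary Python lists: pad every row to the longest length, then strip the trailing padding to recover the ragged rows. The helpers to_padded and from_padded are hypothetical names, not TensorFlow functions; stripping is safe here because -1 is never a valid code point.

```python
PAD = -1  # never a valid code point, so it cannot be confused with real data


def to_padded(rows, pad=PAD):
    """Pad every row on the right so all rows have the length of the longest one."""
    width = max((len(row) for row in rows), default=0)
    return [row + [pad] * (width - len(row)) for row in rows]


def from_padded(rows, pad=PAD):
    """Drop trailing padding values from every row."""
    result = []
    for row in rows:
        end = len(row)
        while end > 0 and row[end - 1] == pad:
            end -= 1
        result.append(row[:end])
    return result


batch = ['hÃllo', 'What is the weather tomorrow', 'Göödnight', '😊']
ragged = [[ord(ch) for ch in s] for s in batch]   # rows of different lengths
padded = to_padded(ragged)                         # every row now has 28 entries
restored = from_padded(padded)                     # padding removed again
encoded = [''.join(chr(cp) for cp in row).encode('UTF-8') for row in restored]
print(encoded)
```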

6. What are Unicode operations?

Character length

The tf.strings.length operation has a parameter unit, which indicates how lengths should be computed. The unit parameter defaults to "BYTE", but it can be set to other values, such as "UTF8_CHAR" or "UTF16_CHAR", to determine the number of Unicode codepoints in each encoded string.

# Note that the final character takes up 4 bytes in UTF8.
thanks = u'Thanks 😊'.encode('UTF-8')
num_bytes = tf.strings.length(thanks).numpy()
num_chars = tf.strings.length(thanks, unit='UTF8_CHAR').numpy()
print('{} bytes; {} UTF-8 characters'.format(num_bytes, num_chars))

11 bytes; 8 UTF-8 characters
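The difference between the two units matches the difference between the byte length and the character length of a Python 3 string. A plain-Python sketch (not the TensorFlow API):

```python
thanks = 'Thanks 😊'

num_bytes = len(thanks.encode('UTF-8'))  # like unit='BYTE': the emoji alone is 4 bytes
num_chars = len(thanks)                  # like unit='UTF8_CHAR': counts code points
print('{} bytes; {} UTF-8 characters'.format(num_bytes, num_chars))
# 11 bytes; 8 UTF-8 characters
```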

Character substrings

Similarly, the tf.strings.substr operation accepts the "unit" parameter, and uses it to determine what kind of offsets the "pos" and "len" parameters contain.

# default: unit='BYTE'. With len=1, we return a single byte.
tf.strings.substr(thanks,pos=7,len=1).numpy()

b'\xf0'

# Specifying unit='UTF8_CHAR', we return a single character, which in this case
# is 4 bytes.
print(tf.strings.substr(thanks,pos=7,len=1, unit='UTF8_CHAR').numpy())

b'\xf0\x9f\x98\x8a'
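The same byte-versus-character distinction applies when slicing. In plain Python 3, slicing the encoded bytes takes one byte, while slicing the str takes one whole character (a sketch, not the TensorFlow API):

```python
thanks = 'Thanks 😊'
thanks_utf8 = thanks.encode('UTF-8')

byte_substr = thanks_utf8[7:8]             # like unit='BYTE': one byte of the emoji
char_substr = thanks[7:8].encode('UTF-8')  # like unit='UTF8_CHAR': the whole emoji
print(byte_substr)   # b'\xf0'
print(char_substr)   # b'\xf0\x9f\x98\x8a'
```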

Split Unicode strings

The tf.strings.unicode_split operation splits unicode strings into substrings of individual characters:

tf.strings.unicode_split(thanks, 'UTF-8').numpy()

array([b'T', b'h', b'a', b'n', b'k', b's', b' ', b'\xf0\x9f\x98\x8a'],
      dtype=object)

Byte offsets for characters

To align the character tensor generated by tf.strings.unicode_decode with the original string, it's useful to know the offset for where each character begins. The method tf.strings.unicode_decode_with_offsets is similar to unicode_decode, except that it returns a second tensor containing the start offset of each character.

codepoints, offsets = tf.strings.unicode_decode_with_offsets(u"🎈🎉🎊", 'UTF-8')

for (codepoint, offset) in zip(codepoints.numpy(), offsets.numpy()):
    print("At byte offset {}: codepoint {}".format(offset, codepoint))

At byte offset 0: codepoint 127880

At byte offset 4: codepoint 127881

At byte offset 8: codepoint 127882
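Byte offsets can be computed in plain Python 3 by adding up the UTF-8 length of each character. The helper decode_with_offsets below is a hypothetical name that only mirrors the output of the TensorFlow op for a single string; the last line shows that splitting into characters (as unicode_split does) is plain iteration.

```python
from typing import List, Tuple


def decode_with_offsets(text: str) -> Tuple[List[int], List[int]]:
    """Return the code points of text and the UTF-8 byte offset where each one starts."""
    code_points, offsets = [], []
    offset = 0
    for ch in text:
        code_points.append(ord(ch))
        offsets.append(offset)
        offset += len(ch.encode('UTF-8'))  # advance by this character's byte length
    return code_points, offsets


codepoints, offsets = decode_with_offsets("🎈🎉🎊")
for codepoint, offset in zip(codepoints, offsets):
    print("At byte offset {}: codepoint {}".format(offset, codepoint))

# Splitting into single characters, each encoded as UTF-8:
pieces = [ch.encode('UTF-8') for ch in 'Thanks 😊']
```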

7. Write an Example: Simple segmentation

Segmentation is the task of splitting text into word-like units. This is often easy when space characters are used to separate words, but some languages (like Chinese and Japanese) do not use spaces, and some languages (like German) contain long compounds that must be split in order to analyze their meaning. In web text, different languages and scripts are frequently mixed together, as in "NY株価" (New York Stock Exchange).

We can perform very rough segmentation (without implementing any ML models) by using changes in script to approximate word boundaries. This will work for strings like the "NY株価" example above. It will also work for most languages that use spaces, as the space characters of various scripts are all classified as USCRIPT_COMMON, a special script code that differs from that of any actual text.

# dtype: string; shape: [num_sentences]
#
# The sentences to process. Edit this line to try out different inputs!
sentence_texts = [u'Hello, world.', u'世界こんにちは']

First, we decode the sentences into character codepoints, and find the script identifier for each character.

# dtype: int32; shape: [num_sentences, (num_chars_per_sentence)]
#
# sentence_char_codepoint[i, j] is the codepoint for the j'th character in
# the i'th sentence.
sentence_char_codepoint=tf.strings.unicode_decode(sentence_texts,'UTF-8')
print(sentence_char_codepoint)

# dtype: int32; shape: [num_sentences, (num_chars_per_sentence)]
#
# sentence_char_scripts[i, j] is the unicode script of the j'th character in
# the i'th sentence.
sentence_char_script=tf.strings.unicode_script(sentence_char_codepoint)
print(sentence_char_script)

<tf.RaggedTensor [[72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100, 46], [19990, 30028, 12371, 12435, 12395, 12385, 12399]]>

<tf.RaggedTensor [[25, 25, 25, 25, 25, 0, 0, 25, 25, 25, 25, 25, 0], [17, 17, 20, 20, 20, 20, 20]]>

Next, we use those script identifiers to determine where word boundaries should be added. We add a word boundary at the beginning of each sentence, and for each character whose script differs from the previous character:

# dtype: bool; shape: [num_sentences, (num_chars_per_sentence)]
#
# sentence_char_starts_word[i, j] is True if the j'th character in the i'th
# sentence is the start of a word.
sentence_char_starts_word=tf.concat(
[tf.fill([sentence_char_script.nrows(),1],True),
tf.not_equal(sentence_char_script[:,1:],sentence_char_script[:,:-1])],
axis=1)

# dtype: int64; shape: [num_words]
#
# word_starts[i] is the index of the character that starts the i'th word (in
# the flattened list of characters from all sentences).
word_starts=tf.squeeze(tf.where(sentence_char_starts_word.values), axis=1)
print(word_starts)

tf.Tensor([ 0  5  7 12 13 15], shape=(6,), dtype=int64)

We can then use those start offsets to build a RaggedTensor containing the list of words from all batches:

# dtype: int32; shape: [num_words, (num_chars_per_word)]
#
# word_char_codepoint[i, j] is the codepoint for the j'th character in the
# i'th word.
word_char_codepoint=tf.RaggedTensor.from_row_starts(
values=sentence_char_codepoint.values,
row_starts=word_starts)
print(word_char_codepoint)

<tf.RaggedTensor [[72, 101, 108, 108, 111], [44, 32], [119, 111, 114, 108, 100], [46], [19990, 30028], [12371, 12435, 12395, 12385, 12399]]>

And finally, we can segment the word codepoints RaggedTensor back into sentences:

# dtype: int64; shape: [num_sentences]
#
# sentence_num_words[i] is the number of words in the i'th sentence.
sentence_num_words=tf.reduce_sum(
tf.cast(sentence_char_starts_word, tf.int64),
axis=1)

# dtype: int32; shape: [num_sentences, (num_words_per_sentence), (num_chars_per_word)]
#
# sentence_word_char_codepoint[i, j, k] is the codepoint for the k'th character
# in the j'th word in the i'th sentence.
sentence_word_char_codepoint=tf.RaggedTensor.from_row_lengths(
values=word_char_codepoint,
row_lengths=sentence_num_words)
print(sentence_word_char_codepoint)

<tf.RaggedTensor [[[72, 101, 108, 108, 111], [44, 32], [119, 111, 114, 108, 100], [46]], [[19990, 30028], [12371, 12435, 12395, 12385, 12399]]]>

To make the final result easier to read, we can encode it back into UTF-8 strings:

tf.strings.unicode_encode(sentence_word_char_codepoint, 'UTF-8').to_list()

[[b'Hello', b', ', b'world', b'.'],
 [b'\xe4\xb8\x96\xe7\x95\x8c',
  b'\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf']]
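The same idea (start a new word wherever the script changes) can be sketched in plain Python 3. The standard library does not expose the Unicode script property, so coarse_script is a hypothetical helper that recognises only a few code point ranges (Latin letters, Hiragana, Katakana and CJK ideographs) and treats everything else as "Common"; it is far less complete than the script data used by tf.strings.unicode_script.

```python
from typing import List


def coarse_script(ch: str) -> str:
    """Hypothetical, very rough script classifier covering only a few ranges."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "Hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "Katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "Han"
    if ch.isalpha() and cp < 0x0250:
        return "Latin"
    return "Common"  # punctuation, spaces, digits and anything not listed above


def segment(text: str) -> List[str]:
    """Start a new word at the first character and wherever the script changes."""
    words: List[str] = []
    previous_script = None
    for ch in text:
        script = coarse_script(ch)
        if words and script == previous_script:
            words[-1] += ch       # same script: extend the current word
        else:
            words.append(ch)      # script changed: begin a new word
        previous_script = script
    return words


for sentence in ['Hello, world.', '世界こんにちは', 'NY株価']:
    print(segment(sentence))
```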

8. What are some of the string manipulation operations? Explain in brief.

To manipulate strings, we can use some of Python's built-in methods.

Creation

>>> word = "Hello World"
>>> print(word)
Hello World

Accessing

Use [ ] to access characters in a string

>>> word = "Hello World"
>>> letter = word[0]
>>> print(letter)
H

Length

>>> word = "Hello World"
>>> len(word)
11

Finding

>>> word = "Hello World"
>>> print(word.count('l'))  # count how many times 'l' occurs in the string
3
>>> print(word.find("H"))  # find the index of the letter H in the string
0
>>> print(word.index("World"))  # find the index where the letters World start
6

Count

>>> s = "Count, the number of spaces"
>>> print(s.count(' '))
4

Slicing

Use [start:end] to get a range of characters (a slice).

Keep in mind that Python, like many other languages, starts counting from 0.

word = "Hello World"

print(word[0])     # get one character of the word: H
print(word[0:1])   # get one character of the word (same as above): H
print(word[0:3])   # get the first three characters: Hel
print(word[:3])    # get the first three characters: Hel
print(word[-3:])   # get the last three characters: rld
print(word[3:])    # get all but the first three characters: lo World
print(word[:-3])   # get all but the last three characters: Hello Wo

word = "Hello World"

word[start:end]   # items start through end-1
word[start:]      # items start through the rest of the string
word[:end]        # items from the beginning through end-1
word[:]           # the whole string

Split Strings

>>> word = "Hello World"
>>> word.split(' ')  # split on the space character
['Hello', 'World']

Startswith / Endswith

>>> word = "Hello World"
>>> word.startswith("H")
True
>>> word.endswith("d")
True
>>> word.endswith("w")
False

Repeat Strings

>>> print("." * 10)  # prints ten dots
..........

Replacing

>>> word = "Hello World"
>>> word.replace("Hello", "Goodbye")
'Goodbye World'

Changing Upper and Lower Case Strings

>>> string = "Hello World"
>>> print(string.upper())
HELLO WORLD
>>> print(string.lower())
hello world
>>> print(string.title())
Hello World
>>> print(string.capitalize())
Hello world
>>> print(string.swapcase())
hELLO wORLD

Reversing

>>> string = "Hello World"
>>> print(''.join(reversed(string)))
dlroW olleH
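Besides reversed(), slicing with a step of -1 is a common idiom for reversing a string; both approaches give the same result. A short Python 3 sketch:

```python
string = "Hello World"

reversed_by_slice = string[::-1]               # a step of -1 walks the string backwards
reversed_by_join = ''.join(reversed(string))   # reversed() yields characters from the end

print(reversed_by_slice)  # dlroW olleH
```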

Strip

Python strings have the strip(), lstrip() and rstrip() methods for removing characters from the ends of a string.

If the characters to be removed are not specified, whitespace is removed.

Strip off newline characters from the end of the string:

>>> word = "Hello World\n"
>>> print(word.strip('\n'))
Hello World

strip()    # removes from both ends
lstrip()   # removes leading characters (left strip)
rstrip()   # removes trailing characters (right strip)

>>> word = "   xyz   "
>>> word
'   xyz   '
>>> word.strip()
'xyz'
>>> word.lstrip()
'xyz   '
>>> word.rstrip()
'   xyz'

Concatenation

To concatenate strings in Python use the “+” operator.

"Hello " + "World"        # = "Hello World"

"Hello " + "World" + "!"  # = "Hello World!"

Join

>>> word = "Hello World"
>>> print(":".join(word))  # add a : between every character
H:e:l:l:o: :W:o:r:l:d
>>> print(" ".join(word))  # add a space between every character
H e l l o   W o r l d

9. Explain how to compare strings in Python

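You can use the operators >, <, >=, <=, == and != to compare two strings. Python compares strings lexicographically, that is, character by character using each character's Unicode code point (for ASCII characters this is the same as the ASCII value).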

Suppose str1 is "Mary" and str2 is "Mac". The first characters of str1 and str2 (M and M) are compared. As they are equal, the second characters (a and a) are compared. Because they are also equal, the third characters (r and c) are compared. Because r has a greater ASCII value than c, str1 is greater than str2.
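The character-by-character rule can be made visible with a small helper; first_difference is a hypothetical name used only for illustration, and ord() shows the code points being compared. The last line shows a common surprise: uppercase letters sort before lowercase ones.

```python
def first_difference(s1, s2):
    """Return (index, char1, char2) for the first differing position, or None."""
    for index, (c1, c2) in enumerate(zip(s1, s2)):
        if c1 != c2:
            return index, c1, c2
    return None  # one string is a prefix of the other, or they are equal


str1, str2 = "Mary", "Mac"
index, c1, c2 = first_difference(str1, str2)
print(index, c1, ord(c1), c2, ord(c2))   # 2 r 114 c 99
print(str1 > str2)                        # True

# 'B' (66) has a smaller code point than 'a' (97):
print(ord('B'), ord('a'), "Banana" < "apple")   # 66 97 True
```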

Here are some more examples:

>>> "tim" == "tie"

False

>>> "free" != "freedom"

True

>>> "arrow" > "aron"

True

>>> "right" >= "left"

True

>>> "teeth" < "tee"

False

>>> "yellow" <= "fellow"

False

>>> "abc" > ""

True

>>>

Try it out:


print("tim" == "tie")
print("free" != "freedom")
print("arrow" > "aron")
print("right" >= "left")
print("teeth" < "tee")
print("yellow" <= "fellow")
print("abc" > "")

False
True
True
True
False
False
True

10. Explain String Concatenation in Python

String concatenation is the technique of combining two or more strings. We can perform string concatenation in the following ways:

Using + operator
Using join() method
Using % operator
Using format() function

Using + Operator

It is very easy to use the + operator for string concatenation. This operator can be used to add multiple strings together. However, both operands must be strings.

Note: Strings are immutable, so concatenation never changes the original strings; it always creates a new string, which is usually assigned to a new variable.

Example:

# Python program to demonstrate
# string concatenation

# Defining strings
var1 = "Hello "
var2 = "World"

# + operator is used to combine strings
var3 = var1 + var2
print(var3)

Output:

Hello World

Here, the variable var1 stores the string “Hello ” and variable var2 stores the string “World”. The + operator combines the strings stored in var1 and var2, and the result is stored in another variable, var3.
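Because + requires both operands to be strings, adding a number raises TypeError; converting it with str() first is one common fix. A minimal Python 3 sketch (the variable names are illustrative):

```python
age = 21

concat_error = None
try:
    message = "Age: " + age          # str + int is not allowed
except TypeError as exc:
    concat_error = exc

message = "Age: " + str(age)         # convert explicitly first
print(message)                       # Age: 21
```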

Using join() Method

The join() method is a string method that returns a string in which the elements of a sequence have been joined, with the string it is called on as the separator.

Example:

# Python program to demonstrate
# string concatenation

var1 = "Hello"
var2 = "World"

# join() method is used to combine the strings
print("".join([var1, var2]))

# join() method is used here to combine
# the strings with a space (" ") as separator
var3 = " ".join([var1, var2])
print(var3)

Output:

HelloWorld

Hello World

In the above example, the variable var1 stores the string “Hello” and variable var2 stores the string “World”. The join() method combines the strings stored in var1 and var2. join() accepts any iterable of strings (a list, a tuple and so on) as its argument, and it can contain any number of elements. The second call uses a space as the separator and stores the combined string in another variable, var3.
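join() is not limited to lists: any iterable of strings works, including tuples, generator expressions and even a string itself, but every element must already be a str. A Python 3 sketch with illustrative values:

```python
var1 = "Hello"
var2 = "World"

from_tuple = " ".join((var1, var2))                          # 'Hello World'
from_generator = "-".join(w.upper() for w in [var1, var2])   # 'HELLO-WORLD'
from_string = ",".join("abc")                                # characters of a str: 'a,b,c'

# Every element must be a str; numbers need str() first.
join_error = None
try:
    " ".join([var1, 2024])
except TypeError as exc:
    join_error = exc
fixed = " ".join([var1, str(2024)])                          # 'Hello 2024'
print(from_tuple, from_generator, from_string, fixed)
```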

Using % Operator

We can use the % operator for string formatting, and it can also be used for string concatenation. It is useful when we want to concatenate strings and perform simple formatting.

Example:

# Python program to demonstrate
# string concatenation

var1 = "Hello"
var2 = "World"

# % operator is used here to combine the strings
print("%s %s" % (var1, var2))

Output:

Hello World

Here, the % operator combines the strings stored in var1 and var2. %s is a placeholder for a string value: each value in the tuple replaces one %s in turn, producing “Hello World”.

Using format() function

str.format() is one of the string formatting methods in Python, which allows multiple substitutions and value formatting. This method lets us concatenate elements within a string through positional formatting.

Example:

# Python program to demonstrate
# string concatenation

var1 = "Hello"
var2 = "World"

# format function is used here to
# combine the strings
print("{} {}".format(var1, var2))

# store the result in another variable
var3 = "{} {}".format(var1, var2)
print(var3)

Output:

Hello World
Hello World

Here, the format() method combines the strings stored in var1 and var2; the first call prints the result directly, and the second stores it in another variable, var3, before printing it. The curly braces {} mark where each value is placed: the first argument goes into the first pair of braces and the second argument into the second pair. Each print() call outputs “Hello World”.
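Besides empty braces, str.format() also accepts explicit positions, keyword names and format specifications inside the braces. A short Python 3 sketch using the same var1 and var2:

```python
var1 = "Hello"
var2 = "World"

by_position = "{} {}".format(var1, var2)                          # 'Hello World'
by_index = "{1} {0}".format(var1, var2)                           # 'World Hello'
by_name = "{greeting}, {name}!".format(greeting=var1, name=var2)  # 'Hello, World!'
padded = "[{:>7}]".format(var1)                                   # right-aligned in width 7: '[  Hello]'

print(by_position, by_index, by_name, padded)
```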
