
Strings

A string represents words, sentences, and other text. Since text is a sequence of characters, a string is stored as an array of characters. However, it is a special kind of character array that must satisfy an additional constraint.

Definition

String

A string must satisfy two rules: 1. It is an array of characters. 2. It ends with a null character (i.e., '\0', corresponding to binary 00000000 or simply 0).

Based on the definition above, an array of characters that does not satisfy the second rule is not a string.

A String

The following arrays are strings:

  1. char code[7] = {'c', 's', '2', '1', '0', '0', '\0'}
  2. char name[] = {'C','o','m','p',' ','O','r','g',0}
  3. char who[] = {65, 100, 105, 0}

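Note the use of the integer 0 in the second example as opposed to the null character '\0' in the first example. This is accepted because of the implicit conversion from int (typically 4 bytes) to char (1 byte): the two are interchangeable for small integers, and '\0' and 0 are the same value. This is taken to the extreme in the third example, where every element is written as its character code (65, 100 and 105 are the ASCII codes of 'A', 'd' and 'i').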
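As a small sketch of this interchangeability, the third array can be compared against an ordinary string literal. The helper name who_is_adi is hypothetical (it is not part of these notes), and the codes assume an ASCII-compatible character set.

```c
#include <string.h>

/* The third example: every element is given as a character code. */
static const char who[] = {65, 100, 105, 0};

/* Hypothetical helper: returns 1 when who holds the same text as "Adi". */
int who_is_adi(void) {
  return strcmp(who, "Adi") == 0;
}
```

Under the ASCII assumption, the function returns 1 because the array holds exactly 'A', 'd', 'i' and the terminating '\0'.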

Not A String

The following arrays are not strings:

  1. char code[7] = {'c', 's', '1', '0', '1', '0', 'e'}
  2. int name[] = {'C','o','m','p',' ','O','r','g',0}
  3. int who[] = {65, 100, 105}

The first example violates the second rule because it does not end with a null character. The second example violates the first rule because it is not an array of characters. We take this to the extreme again with the third example that violates both rules.
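The second rule can be checked mechanically when the size of the array is known. The sketch below is only an illustration: is_terminated is a hypothetical helper, and it accepts a '\0' anywhere inside the array, because a string may end before the last slot.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if a '\0' appears within the first
   size elements of arr, i.e. the array satisfies the second rule. */
int is_terminated(const char *arr, size_t size) {
  size_t i;
  for (i = 0; i < size; i++) {
    if (arr[i] == '\0')
      return 1;
  }
  return 0;
}
```

Called with the first array of each list above, it returns 1 for {'c','s','2','1','0','0','\0'} and 0 for {'c','s','1','0','1','0','e'}.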

Using the definition, we can then declare a string as an array of characters with a certain maximum size [1]. We can then either initialise it during declaration using an array initialiser {elem, elem, ...} or assign each character to an index of the array. Luckily for us, C provides a simpler way to initialise a string with a syntax you are familiar with, the double-quotes.

No Single-Quote String

If you are coming from Python or JavaScript, you may have developed a kind of muscle-memory to simply use single-quotes because then you do not have to press the Shift button. If that is the case, you need to undo that muscle-memory for C.

CreateString.c

/* Assigning Characters */
char str[6]; // declaration
str[0] = 'e';
str[1] = 'g';
str[2] = 'g';
str[3] = '\0'; // without '\0', it is just an array of characters and not a string

/* Declare + Initialise */
char fruit1[] = "apple"; // do not need '\0' as it is automatically added
char fruit2[] = {'a','p','p','l','e','\0'}; // manually add '\0'

In the example above, the string str is of particular interest. We allocated 6 slots in the array but use only 4 of them. In memory, indices 0 to 3 hold 'e', 'g', 'g' and '\0', while indices 4 and 5 are never assigned and may contain arbitrary values (figure: memory layout of "egg").

Input/Output

The strings that we have declared above are currently fixed according to the program text. What if we want to read user input? And a similar problem is, how do we print a string?

The format specifier for a string is %s. So now, we can use the scanf() and printf() functions to read and print a string respectively.

StrIO.c

scanf("%s", str);    // reads until white space
                     // note that we do not need to use &str
                     // since it is already an address
printf("%s\n", str); // prints until '\0' excluding '\0'

This immediately poses a problem. How do we know we have allocated enough space for the string? Remember, a string is an array of characters, and an array has to be declared with a maximum size. However, the number of characters being read may be larger than this maximum size, in which case scanf() writes past the end of the array.
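One common mitigation, sketched here as an aside, is to put a maximum field width in the format specifier so that at most that many characters are stored. The sketch uses sscanf() on an existing line so that it is easy to try; the same "%9s" works with scanf(). The helper name first_word and the LENGTH macro are assumptions made for illustration.

```c
#include <stdio.h>

#define LENGTH 10

/* Hypothetical helper: copies the first whitespace-delimited word of
   line into out, storing at most LENGTH-1 characters plus '\0'.
   Returns 1 if a word was stored, 0 otherwise. */
int first_word(const char *line, char out[LENGTH]) {
  /* The width 9 is written as a literal and must equal LENGTH-1. */
  return sscanf(line, "%9s", out) == 1;
}
```

With the input "Supercalifragilistic", only the first 9 characters ("Supercali") are stored, so the 10-slot array is never overrun.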

As a side note, the size of the array should be the maximum number of characters to be read +1. The +1 is important because we need to accommodate the terminating null character '\0', which takes up one slot in the array.
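The +1 rule can be written down as a one-line calculation. This is a minimal sketch; slots_needed is a hypothetical name used only here.

```c
#include <string.h>

/* Hypothetical helper: number of array slots needed to store s,
   counting the terminating '\0'. */
size_t slots_needed(const char *s) {
  return strlen(s) + 1;
}
```

For "apple" this gives 6, which matches sizeof "apple": the compiler also reserves one slot for the '\0' of a string literal.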

For safer input reading, C provides an alternative function that takes the maximum size of the array to store into. It is usually paired with a printing function that automatically adds a newline '\n'.

String I/O

fgets Prototype
char* fgets(char *str, int n, FILE *stream);
puts Prototype
int puts(const char *str);

Important Notes

  • fgets()

    1. str is the string to store the characters (i.e., array/pointer of characters).
    2. n is the maximum size of str. At most n-1 characters are read, leaving one slot for '\0'.
    3. stream is the input stream. For keyboard, it is the standard input stdin.
    4. The return value of type char* is the same as the str parameter on success, and NULL if nothing could be read (end of input or an error). We often ignore this result in small examples.
    5. Note that the function also stops reading input when a newline is read. This newline is stored in the string.
  • puts()

    1. str is the string to be printed.
    2. The return value is a non-negative integer if the operation is successful. Otherwise, the function returns EOF. We typically ignore this result.
    3. Note that a newline character is automatically added to the printed output.
gets(str)

There is another function called gets(str) to read a string interactively. However, it has no way to limit how many characters are stored and can therefore overflow the array. For this security reason it was removed from the C11 standard, so we avoid it and use the fgets() function instead.

Because fgets() may store the newline character '\n', we need to remove it if it was actually read. The typical procedure is to check the last character of the string: if it is '\n', we replace it with '\0' to terminate the string there instead.

In the template below, we use the function strlen(str) to find the number of characters of the string. This will be explained when we talk about string functions. For now, it is sufficient to note that the function strlen(str) returns the number of characters in the string str excluding '\0'.

Reading with fgets
fgets(str, size, stdin); // stdin is the stream corresponding to user keyboard
len = strlen(str);
if (len > 0 && str[len - 1] == '\n') // len > 0 guards against an empty string
  str[len - 1] = '\0';   // a single-statement if may omit the { }
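An alternative sketch of the same clean-up uses the standard function strcspn() from <string.h>, which avoids the separate length check. The helper name strip_newline is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: cuts s at its first '\n', if there is one. */
void strip_newline(char *s) {
  /* strcspn returns the index of the first '\n', or strlen(s) when
     there is none, so this write always lands inside the string. */
  s[strcspn(s, "\n")] = '\0';
}
```

After fgets(str, size, stdin), calling strip_newline(str) turns "My book\n" into "My book" and leaves a string without a newline unchanged.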

Differences Between scanf/printf and fgets/puts

StringIO1.c
#include <stdio.h>
#define LENGTH 10

int main(void) {
  char str[LENGTH];

  printf("Enter string (at most %d characters): ", LENGTH-1);
  scanf("%s", str); 
  printf("str = %s\n", str); 
  return 0;
}

StringIO2.c
#include <stdio.h>
#define LENGTH 10

int main(void) {
  char str[LENGTH];

  printf("Enter string (at most %d characters): ", LENGTH-1);
  fgets(str, LENGTH, stdin);
  printf("str = ");
  puts(str);
  return 0;
}

Try the examples above with the following input:

My book

You will find the difference in behaviour for the two programs. The outputs are shown below:

StringIO1.c (input is 'My book')
str = My
StringIO2.c (input is 'My book'; the output ends with an extra empty line)
str = My book


Newlines

Notice how we have two newline characters when we use the combination of fgets() and puts(). This is because fgets() also stores the newline character, so the string that was read is already "My book\n". When we print it using puts(), another newline is added, so the characters actually printed are "My book\n\n". This is the source of the two newline characters being printed.

Remove Vowels

Write a program RemoveVowels.c to remove all vowels in a given input string. You may assume that the input string has at most 100 characters.

Sample Run
Enter a string (at most 100 characters): How Have you been, James?
New string: Hw Hv y bn, Jms?
RemoveVowels.c
#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(void) {
  int i, len, count = 0;
  char str[101], newstr[101];

  printf("Enter a string (at most 100 characters): ");
  if (fgets(str, 101, stdin) == NULL) // what happens if you use scanf() here?
    return 1;                         // nothing could be read
  len = strlen(str);                  // strlen() returns number of char in string
  if (len > 0 && str[len - 1] == '\n') {
    str[len - 1] = '\0';              // drop the newline kept by fgets()
    len--;                            // the string is now one character shorter
  }

  for (i = 0; i < len; i++) {
    switch (toupper((unsigned char)str[i])) {
      case 'A': case 'E':
      case 'I': case 'O': case 'U': break; // vowel: skip it
      default: newstr[count++] = str[i];   // anything else: keep it
    }
  }
  newstr[count] = '\0';
  printf("New string: %s\n", newstr);
  return 0;
}
  1. To use the function strlen(), we need #include <string.h>.
  2. To use the function toupper(), we need #include <ctype.h>.
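As a variation on the program above, the vowels can also be removed in place, without a second array, by keeping separate read and write positions. This is only a sketch of one possible approach; the function name remove_vowels is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: removes vowels from s in place. */
void remove_vowels(char *s) {
  size_t read, write = 0;
  for (read = 0; s[read] != '\0'; read++) {
    switch (toupper((unsigned char)s[read])) {
      case 'A': case 'E': case 'I': case 'O': case 'U':
        break;                  /* vowel: do not copy it */
      default:
        s[write++] = s[read];   /* keep every other character */
    }
  }
  s[write] = '\0';              /* terminate the shortened string */
}
```

The write position never overtakes the read position, so no character is overwritten before it has been examined.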

String Functions

C provides a library of string functions. To use them, we must include the header #include <string.h>. Some of the commonly used string functions are listed below:

  1. strlen(s) (string length)
    • Returns the number of characters in s, excluding the terminating '\0'.
  2. strcmp(s1,s2) (string compare)
    • Compare the ASCII values of the corresponding characters in strings s1 and s2 pairwise.
    • The return value should satisfy the following condition:
      • Negative: If string s1 is lexicographically less than the string s2.
      • Positive: If string s1 is lexicographically greater than the string s2.
      • Zero: If string s1 is lexicographically equal to the string s2.
  3. strncmp(s1,s2,n) (string compare up to n)
    • Compare at most the first n characters of string s1 and string s2.
  4. strcpy(dst,src) (string copy)

    • Copy the string pointed to by src into an array pointed to by dst.
    • The return value is dst.
    • Important Note
      • The following assignment statement does not work.

        Invalid Assignment
        char name[10];
        name = "Matthew";

        The reason is the same as the reason that we cannot assign a whole array after its declaration: a string is still an array, so this is not allowed.
      • If the string to be copied is too long, strcpy() writes past the end of dst and overwrites whatever memory follows it. This is undefined behaviour.

        Too Long
        char name[10];
        strcpy(name, "A very long name");

        Here name has 10 slots but the string needs 17 (16 characters plus '\0'), so 7 bytes are written beyond the end of the array (figure: Long String).

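  5. strncpy(dst,src,n) (string copy up to n)

    • Copy at most the first n characters of the string pointed to by src to dst. If src has n or more characters, no '\0' is written, so dst is not a string until we terminate it ourselves.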
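The "too long" case above can be handled with strncpy() as long as the terminator is added by hand. The sketch below shows one way to do it; copy_truncated is a hypothetical helper, and it assumes dst_size is at least 1.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst, cutting it short if needed
   so that dst (of dst_size slots, dst_size > 0) always ends with '\0'. */
void copy_truncated(char *dst, size_t dst_size, const char *src) {
  strncpy(dst, src, dst_size - 1); /* copies at most dst_size-1 characters */
  dst[dst_size - 1] = '\0';        /* strncpy does not add this when src is too long */
}
```

With char name[10], copying "A very long name" this way leaves name holding "A very lo" (9 characters plus '\0') instead of writing past the array. Whether silent truncation is acceptable depends on the program; this is a design choice, not a rule.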

Importance of Null Character

The two rules in the definition of a string above are strict, and all the string functions above rely on them. In particular, the string functions as well as printf() with %s will not work properly without the terminating null character. In many cases, a string that is not properly terminated with '\0' will result in illegal memory access.

To make this clearer, we describe the functions above, except the "up to n" variants, using pseudocode. This also explains the (possibly) surprising return values of string comparison.

Pseudo-Codes of String Functions

String Length
# input : a string s
# output: the length of the string s in res
res := 0
while (s[res] != '\0') do
  res := res + 1
end
String Compare
# input : a string s1
#         a string s2
# output: negative if s1 < s2
#         positive if s1 > s2
#         zero     if s1 = s2
#         result stored in res
idx := 0
res := 0                    # stays zero if the strings are equal
while (s1[idx] != '\0' and s2[idx] != '\0') do
  tmp := s1[idx] - s2[idx]  # a simple way to satisfy the output
  if (tmp != 0) do
    res := tmp
    break
  end
  idx := idx + 1            # move on to the next pair of characters
end
if (s1[idx] != '\0' or s2[idx] != '\0') do
  res := s1[idx] - s2[idx]  # a simple way to satisfy the output
end
String Copy
# input : a string dst
#         a string src
# output: dst holds a copy of src, including the '\0'
idx := 0
while (src[idx] != '\0') do
  dst[idx] := src[idx] # copying
  idx      := idx + 1
end
dst[idx] := src[idx]   # don't forget the '\0'
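The three pseudocode listings translate almost line by line into C. The versions below are a sketch for study, not the library's actual implementation. The my_ prefixes are hypothetical names chosen to avoid clashing with <string.h>, and my_strcmp compares characters as unsigned char, as the standard strcmp() is specified to do.

```c
#include <stddef.h>

/* Counts characters up to, but not including, the '\0'. */
size_t my_strlen(const char *s) {
  size_t res = 0;
  while (s[res] != '\0')
    res++;
  return res;
}

/* Returns negative, zero or positive, like strcmp(). */
int my_strcmp(const char *s1, const char *s2) {
  size_t idx = 0;
  /* Advance while both strings continue and agree. */
  while (s1[idx] != '\0' && s1[idx] == s2[idx])
    idx++;
  /* Difference at the first mismatch, or 0 if both ended together. */
  return (unsigned char)s1[idx] - (unsigned char)s2[idx];
}

/* Copies src, including its '\0', into dst and returns dst.
   The caller must make sure dst is large enough. */
char *my_strcpy(char *dst, const char *src) {
  size_t idx = 0;
  while (src[idx] != '\0') {
    dst[idx] = src[idx];
    idx++;
  }
  dst[idx] = '\0';
  return dst;
}
```

All three loops stop only when they meet '\0', which is why an unterminated array makes them read (or write) beyond the array.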

WithoutNullChar.c

#include <stdio.h>
#include <string.h>

int main(void) {
  char str[10];

  str[0] = 'a';
  str[1] = 'p';
  str[2] = 'p';
  str[3] = 'l';
  str[4] = 'e';

  printf("Length = %zu\n", strlen(str)); // %zu because strlen() returns size_t
  printf("str = %s\n", str); 

  return 0;
}

If you simply run it, you may not see any problem: the default Clang build may leave the array filled with 0, and since 0 is the null character, the string appears to be properly terminated. This is not guaranteed by the language. To see the problem, compile the program with GCC:

  1. Click on "Shell" tab.
  2. Compile with GCC using gcc main.c.
  3. Execute the code using ./a.out.

You may see the following output (results may vary).

Possible Output
Length = 8
str = apple�
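One way to avoid this problem, sketched below, is to zero-initialise the array at declaration so that every slot we do not assign already holds '\0'. The function name apple_length is hypothetical and only wraps the example so that its result can be checked.

```c
#include <string.h>

/* Hypothetical helper: builds "apple" in a zero-initialised array
   and returns its length. */
size_t apple_length(void) {
  char str[10] = {0};  /* every slot starts as '\0' */

  str[0] = 'a';
  str[1] = 'p';
  str[2] = 'p';
  str[3] = 'l';
  str[4] = 'e';        /* str[5] is still '\0', so str is a string */

  return strlen(str);
}
```

Assigning str[5] = '\0' explicitly after the last character works just as well; either way the length is reliably 5 regardless of the compiler.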

  1. A maximum size is needed because an array declaration requires us to specify the size of the array.