Description
The goal of this tutorial is to guide you through creating a function that reads a line from a file descriptor, so that repeated calls to the function read a text file one line at a time. We will cover how to manage buffers, use a static variable for data persistence, and properly allocate and free memory.
Single File / Input
We start by calling get_next_line() with a file descriptor. This is either 0, when we want to read from standard input, or a descriptor returned by the function open(). The initial call looks like this:
int fd = open("my_file", O_RDONLY);
get_next_line(fd);
// or
get_next_line(0);
get_next_line() returns a string: a pointer to the next line read from the file. A line is every character up to and including the next \n, or up to EOF if the file does not end with a newline.
So, in order to be able to access our retrieved line, we need to add a variable to store it:
char *my_line;
int fd = open("my_file", O_RDONLY);
my_line = get_next_line(fd);
// or
my_line = get_next_line(0);
Memory Buffer Allocation
From there, we can step into our get_next_line() function.
The first thing we need is to allocate memory for our buffer. It will hold the characters returned by each read of our file.
For that, we also need the size (number of bytes) we are accepting every time we read the file. Let's call it BUFFER_SIZE.
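The tutorial does not show where BUFFER_SIZE is defined. One common arrangement, assumed here rather than taken from the original code, is a preprocessor macro with a fallback value, so the chunk size can be changed at compile time with a flag such as -D BUFFER_SIZE=5. The default of 42 and the helper name buffer_alloc_size() are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fallback used only when the compiler was not given -D BUFFER_SIZE=<n>. */
#ifndef BUFFER_SIZE
# define BUFFER_SIZE 42
#endif

/* Bytes needed for one chunk: BUFFER_SIZE characters plus one spare byte. */
size_t buffer_alloc_size(void)
{
    assert(BUFFER_SIZE > 0);
    return ((size_t)BUFFER_SIZE + 1);
}
```

Keeping the value in a macro means the same source can be tested with very small and very large chunk sizes without editing the code.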
char *get_next_line(int fd)
{
    char *buffer;
    buffer = allocate_buffer(fd, BUFFER_SIZE);
    if (!buffer)
        return (NULL);
    // ... the rest of the function is built up below
}
This new function, allocate_buffer(int fd, int buffer_size), returns a block of memory large enough to hold the buffer_size bytes (or characters) that each read of our file can produce, so they can be stored in buffer and processed further. At this point, we are not concerned with stopping on \n or EOF.
The function needs to be sure we are processing valid fd and buffer_size parameters:
char *allocate_buffer(int fd, int buffer_size)
{
    if (fd < 0 || buffer_size <= 0)
        return (NULL);
    // ... the allocation comes next
}
As an addendum, remember that we are opening our file with int fd = open("my_file", O_RDONLY);. Upon successful completion, open() opens the file and returns a non-negative integer representing the lowest-numbered unused file descriptor. Otherwise, it returns -1, and no files are created or modified. A negative fd can therefore never be valid. On the same note, we do not want buffer_size to be 0 or less, because that would mean asking for zero or a negative number of bytes on every pass, which makes no sense.
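As a small illustration of that return-value contract, the sketch below opens a path read-only and reports whether a usable descriptor came back. The helper name can_open_for_reading() is hypothetical and not part of the tutorial's code.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if `path` can be opened read-only, 0 if open() reports failure. */
int can_open_for_reading(const char *path)
{
    int fd;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return (0);
    assert(fd >= 0); /* a successful open() never yields a negative fd */
    close(fd);
    return (1);
}
```

Checking for -1 right after open() is what lets allocate_buffer() treat any negative fd as invalid later on.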
Next, we need to allocate enough space for a char array and return it, so it can be assigned to buffer in get_next_line().
char *allocate_buffer(int fd, int buffer_size)
{
    char *buffer;
    if (fd < 0 || buffer_size <= 0)
        return (NULL);
    buffer = malloc(sizeof(char) * (buffer_size + 1));
    if (!buffer)
        return (NULL);
    return (buffer);
}
We allocate buffer_size + 1 bytes because we want room to end the buffer with a null terminator \0, so the buffer can be processed as a regular string back in the get_next_line() function.
Main Logic: Loop & Stash
Now we can come back to our get_next_line() function.
From here we want to iterate through our file until we find a \n or EOF. The problem we have now is that we do not have a good way to store our data. Explaining it with an example: say we are reading our file every 5 bytes (BUFFER_SIZE = 5). If no \n or EOF is found, we want to store it somewhere and continue reading (and adding the subsequent bytes to the previously read ones) until we reach a \n or EOF.
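To make the BUFFER_SIZE = 5 scenario concrete, here is a minimal sketch that works on an in-memory string instead of a file. The function chunks_until_newline() is a hypothetical helper, not part of get_next_line(); it only counts how many fixed-size chunks have to be consumed before a \n shows up, and it ignores the final zero-byte read that signals EOF on a real file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Counts how many chunks of `chunk` bytes must be taken from `text`
 * before one of them contains '\n' (or the text runs out).
 */
size_t chunks_until_newline(const char *text, size_t chunk)
{
    size_t len;
    size_t offset;
    size_t count;
    size_t n;

    if (!text || chunk == 0)
        return (0);
    len = strlen(text);
    offset = 0;
    count = 0;
    while (offset < len)
    {
        n = len - offset;
        if (n > chunk)
            n = chunk;
        count++;
        if (memchr(text + offset, '\n', n))
            break;
        offset += n;
    }
    return (count);
}

void demo_chunks_until_newline(void)
{
    /* Chunks of 5: "Hello", ", wor", "ld\nBy": newline seen in the 3rd. */
    assert(chunks_until_newline("Hello, world\nBye", 5) == 3);
    /* No newline at all: the single chunk "abc" is all there is. */
    assert(chunks_until_newline("abc", 5) == 1);
}
```

Note that the third chunk also carries "By", which belongs to the next line. That leftover is why the accumulated data has to survive between calls.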
This is where we can create a persistent variable called stash. It will be responsible for accumulating all passes of our function, concatenating all bytes (characters), until we find a \n or EOF.
The static keyword in C, when applied to a variable declared inside a function, gives us exactly that. A static local variable is created once, is initialized to zero (NULL for a pointer) when no initializer is given, and keeps its value after the function returns. If the same function is called again, we can continue using what is stored in our statically declared variable.
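The persistence described here can be seen in isolation with a short sketch. Both functions below are hypothetical illustrations and do not appear in get_next_line(); the second one mimics how a static pointer starts out as NULL on the very first call.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many times it has been called so far. */
int count_calls(void)
{
    static int calls; /* starts at 0 and survives between calls */

    calls++;
    return (calls);
}

/* Returns 1 only on the first call, while the static pointer is still NULL. */
int is_first_call(void)
{
    static const char *seen;

    if (seen == NULL)
    {
        seen = "called";
        return (1);
    }
    return (0);
}

void demo_static(void)
{
    int first;
    int second;

    first = count_calls();
    second = count_calls();
    assert(first == 1 && second == 2);
    first = is_first_call();
    second = is_first_call();
    assert(first == 1 && second == 0);
}
```

The same NULL-on-first-call behavior is what the stash relies on: no explicit initialization is needed before the first read.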
char *get_next_line(int fd)
{
    static char *stash;
    char *buffer;
    ssize_t bytes_read;
    buffer = allocate_buffer(fd, BUFFER_SIZE);
    if (!buffer)
        return (NULL);
    bytes_read = 1;
    while (newline_index(stash) == -1 && bytes_read > 0)
    {
        // do something.
    }
}
This is how our code looks thus far. We need a function that checks whether the stash contains a \n (newline_index(stash)), a way of entering our loop (bytes_read = 1), and a way of exiting it afterwards (by updating the bytes_read value). The check and the exit are both handled by the loop condition: while (newline_index(stash) == -1 && bytes_read > 0). This reads as: "While our stash does not contain a \n and the number of bytes read was greater than zero (because 0 means we reached EOF), continue reading the file and adding to the stash".
So let's start by defining our newline_index(const char *str).
We want it to accept a const string because we do not want to modify anything inside this function; we just want to check if there is a \n somewhere. We also want it to return where the \n is located if it encounters one (this will be helpful afterwards).
int newline_index(const char *str)
{
    const char *ptr;
    ptr = str;
    if (!ptr)
        return (-1);
    while (*ptr)
    {
        if (*ptr == '\n')
            return ((int)(ptr - str));
        ptr++;
    }
    return (-1);
}
This function is pretty straightforward. If the string pointer is NULL, it returns -1. Otherwise, it iterates through str searching for a \n. If it finds one, it returns its index; if it reaches the end of the string without finding one, it returns -1.
This explains why we compare newline_index(stash) against -1 in the loop condition of our get_next_line() function.
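The return values described above can be checked with a few sample strings. The block repeats newline_index() so that it compiles on its own; demo_newline_index() is a hypothetical test helper added for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the first '\n' in str, or -1 if str is NULL or has no newline. */
int newline_index(const char *str)
{
    const char *ptr;

    ptr = str;
    if (!ptr)
        return (-1);
    while (*ptr)
    {
        if (*ptr == '\n')
            return ((int)(ptr - str));
        ptr++;
    }
    return (-1);
}

void demo_newline_index(void)
{
    assert(newline_index("ab\ncd") == 2);      /* newline in the middle */
    assert(newline_index("\nrest") == 0);      /* newline first */
    assert(newline_index("no newline") == -1); /* keep reading */
    assert(newline_index("") == -1);           /* empty stash */
    assert(newline_index(NULL) == -1);         /* stash before the first read */
}
```

The NULL case matters on the first call, when the static stash has not been assigned yet and the loop must still be entered.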
With this out of the way, we only need to find a way to check how many bytes were retrieved from our file in order to have all the building blocks to continue our function. The read() function returns the number of bytes read, 0 on EOF, and -1 on an error. This will be handy to update our already existing bytes_read, since anything other than a 0 or a negative number is a valid amount to be processed.
char *get_next_line(int fd)
{
    static char *stash;
    char *buffer;
    ssize_t bytes_read;
    buffer = allocate_buffer(fd, BUFFER_SIZE);
    if (!buffer)
        return (NULL);
    bytes_read = 1;
    while (newline_index(stash) == -1 && bytes_read > 0)
    {
        bytes_read = read(fd, buffer, BUFFER_SIZE);
        if (bytes_read == -1)
        {
            free(buffer);
            return (NULL);
        }
    }
}
The read() function tries to read up to BUFFER_SIZE bytes from fd (our file) and stores them in our buffer variable. If there is an error (read() returning -1), we need to free our buffer (since it was allocated with malloc()) to avoid a memory leak, and return NULL.
After a successful read, some number of bytes is stored in our buffer variable. Remember that we allocated one extra byte for a null terminator \0; we now need to write it. Since the bytes_read variable holds the number of characters (bytes) written to buffer, the terminator belongs at index bytes_read, right after the last character read.
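The three possible outcomes of read() (a positive count, 0 at EOF, -1 on error) and the placement of the terminator can be observed with a pipe instead of a file. This is only a sketch: read_chunk() and demo_read_chunk() are hypothetical names, and the chunk size of 5 is chosen to match the earlier example.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Reads at most `cap` bytes from `fd` into `buf`, which must have room for
 * cap + 1 bytes, and null-terminates what was read. Returns read()'s result.
 */
ssize_t read_chunk(int fd, char *buf, size_t cap)
{
    ssize_t bytes_read;

    bytes_read = read(fd, buf, cap);
    if (bytes_read >= 0)
        buf[bytes_read] = '\0';
    return (bytes_read);
}

/* Sends 6 bytes through a pipe and reads them back in chunks of 5. */
int demo_read_chunk(void)
{
    int     ends[2];
    char    buf[5 + 1];
    ssize_t got;

    if (pipe(ends) == -1)
        return (-1);
    if (write(ends[1], "hi\nyou", 6) != 6)
    {
        close(ends[0]);
        close(ends[1]);
        return (-1);
    }
    close(ends[1]); /* no writer left: EOF follows the data */
    got = read_chunk(ends[0], buf, 5);
    assert(got == 5 && strcmp(buf, "hi\nyo") == 0);
    got = read_chunk(ends[0], buf, 5);
    assert(got == 1 && strcmp(buf, "u") == 0); /* short read */
    got = read_chunk(ends[0], buf, 5);
    assert(got == 0 && buf[0] == '\0');        /* EOF */
    close(ends[0]);
    return (0);
}
```

The second read returns fewer bytes than requested, which is why the terminator must go at index bytes_read and not at index BUFFER_SIZE.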
// Inside the while loop...
bytes_read = read(fd, buffer, BUFFER_SIZE);
if (bytes_read == -1)
{
    free(buffer);
    return (NULL);
}
buffer[bytes_read] = '\0';
We now have a buffer holding a certain number of characters, and we need to append it to whatever already exists in our stash. Remember that buffer contains only what was read in the current iteration of our while loop, so we keep concatenating it to our stash until a newline \n is found. We can achieve this with a slightly modified version of strjoin:
stash = strjoin_gnl(stash, buffer);
This function needs to accomplish a few things:
- If the buffer is NULL, we should return NULL. An empty buffer (as happens at EOF) is handled carefully: it simply adds nothing to the stash.
- During the first iteration, it needs to add the buffer to an empty stash.
- During all other iterations, it needs to concatenate the buffer to the current stash.
- Allocate memory for the new size of the stash (since it can be growing).
- Null terminate the stash after concatenating buffer and what was in stash.
To keep track of it all, let's create a few variables:
char *strjoin_gnl(char *old_stash, char *buffer)
{
    size_t old_stash_len;
    size_t buffer_len;
    char *result;
}
Here, we pass our current stash as the old_stash function parameter. The name was chosen to make clear that this is the stash before the new buffer is appended to it.
To concatenate our stash (old_stash) with our buffer properly, we need their lengths. We will also be returning the result variable (this is where our concatenated old_stash and buffer will reside).
Let's start with our prerequisites:
- buffer is invalid:
if (!buffer)
    return (NULL);
- old_stash_len: during the first iteration, old_stash might be NULL.
if (old_stash)
    old_stash_len = _strlen(old_stash);
else
    old_stash_len = 0;
This needs to be phrased like that because, during the first iteration, our static stash is still NULL and must not be passed to _strlen().
- buffer_len:
buffer_len = _strlen(buffer);
At this point, we can start allocating memory for our new stash. Putting it all together:
char *strjoin_gnl(char *old_stash, char *buffer)
{
    size_t old_stash_len;
    size_t buffer_len;
    char *result;
    if (!buffer)
        return (NULL);
    if (old_stash)
        old_stash_len = _strlen(old_stash);
    else
        old_stash_len = 0;
    buffer_len = _strlen(buffer);
    result = allocate_strjoin_gnl(old_stash_len, buffer_len);
    if (!result)
    {
        free(old_stash);
        return (NULL);
    }
    // ...
}
Our allocate_strjoin_gnl() function removes the "burden" of allocating inside strjoin_gnl(), making the code a bit cleaner and more modular. If it fails, we free old_stash (the caller is about to overwrite its only pointer to it) and return NULL instead of writing through a null pointer.
char *allocate_strjoin_gnl(size_t old_stash_len, size_t buffer_len)
{
    char *result;
    result = (char *)malloc(sizeof(char) * (old_stash_len + buffer_len + 1));
    if (!result)
        return (NULL);
    return (result);
}
The parameters are size_t to match the lengths computed in strjoin_gnl(). We allocate one extra byte (+1) because we need a null terminator at the end of our stash.
Continuing our strjoin_gnl() function, we now need to add all elements contained in our buffer to the end of our stash. Since we will be using pointer arithmetic to traverse result and old_stash, let's declare 2 more variables:
char *head;
char const *old_stash_it;
From here we can set head to our result, since it's the one we want to keep as the pointer to the beginning of our string, and also set old_stash_it to the start of our old_stash to iterate through it without losing old_stash (so we can free it later).
char *strjoin_gnl(char *old_stash, char *buffer)
{
    // ... (vars and alloc)
    old_stash_it = old_stash;
    // ...
    head = result;
Now we have all the elements to start concatenating our strings.
    if (old_stash)
    {
        while (*old_stash_it)
            *result++ = *old_stash_it++;
    }
    while (*buffer)
        *result++ = *buffer++;
    *result = '\0';
    if (old_stash)
        free(old_stash);
    return (head);
}
Here we first check that our old_stash pointer is not NULL, then iterate through it with old_stash_it, copying each character into result.
Following that, we keep advancing the same result pointer to append the contents of buffer.
After that, we write a null terminator after the last copied character and free our old_stash, since we won't use it anymore. We return head, which still points to the beginning of our result string.
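The accumulation across several reads can be traced with a condensed variant of strjoin_gnl(). This sketch is an assumption-laden rewrite, not the tutorial's exact code: it uses the standard strlen() and memcpy() in place of _strlen() and the manual copy loops, and it frees the old stash when the allocation fails.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a new string holding old_stash followed by buffer; frees old_stash. */
char *strjoin_gnl(char *old_stash, const char *buffer)
{
    size_t old_len;
    size_t buf_len;
    char   *result;

    if (!buffer)
        return (NULL);
    old_len = 0;
    if (old_stash)
        old_len = strlen(old_stash);
    buf_len = strlen(buffer);
    result = malloc(old_len + buf_len + 1);
    if (!result)
    {
        free(old_stash);
        return (NULL);
    }
    if (old_stash)
        memcpy(result, old_stash, old_len);
    memcpy(result + old_len, buffer, buf_len + 1); /* copies the '\0' too */
    free(old_stash);
    return (result);
}

void demo_strjoin_gnl(void)
{
    char *stash;

    stash = NULL;                        /* first call: nothing stored yet */
    stash = strjoin_gnl(stash, "Hello");
    assert(stash && strcmp(stash, "Hello") == 0);
    stash = strjoin_gnl(stash, ", wor");
    assert(stash && strcmp(stash, "Hello, wor") == 0);
    stash = strjoin_gnl(stash, "ld\nBy");
    assert(stash && strcmp(stash, "Hello, world\nBy") == 0);
    stash = strjoin_gnl(stash, "");      /* empty buffer, as at EOF */
    assert(stash && strcmp(stash, "Hello, world\nBy") == 0);
    free(stash);
}
```

Because every call frees the previous stash and returns a fresh allocation, the caller only ever has one block to keep track of.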
Helper Functions & Cleanup
From here we are back at our get_next_line() function.
We can store what we created with strjoin_gnl() into our static char *stash.
char *get_next_line(int fd)
{
    static char *stash;
    char *buffer;
    ssize_t bytes_read;
    buffer = allocate_buffer(fd, BUFFER_SIZE);
    if (!buffer)
        return (NULL);
    bytes_read = 1;
    while (newline_index(stash) == -1 && bytes_read > 0)
    {
        bytes_read = read(fd, buffer, BUFFER_SIZE);
        if (bytes_read == -1)
        {
            free(buffer);
            return (NULL);
        }
        buffer[bytes_read] = '\0';
        stash = strjoin_gnl(stash, buffer);
        if (!stash)
        {
            free(buffer);
            return (NULL);
        }
    }
    free(buffer);
}
This is all our while loop needs to contain; it keeps reading the file until it finds a newline \n in stash or the file ends. If strjoin_gnl() fails and returns NULL, we free the buffer and stop instead of looping on.
After that, we can free the buffer.
From here onwards there are only a few more things that we need to do. We need to extract our line from stash, up to and including the first \n, since stash itself does not guarantee that it ends precisely on a \n. We also need to update our stash to be used again during another call of our get_next_line() function (we need to remove the extracted line from stash).
We need to store our extracted line in a variable char *line, update our stash, and return our line.
    // ... inside get_next_line, after the loop ...
    free(buffer);
    line = extract_line(stash);
    stash = update_stash(stash);
    return (line);
}
This is the whole function structure. We still need to create extract_line() and update_stash().
Let's start with extract_line(char *str). The objective is simple: find in stash the first occurrence of either a \n or the terminating \0 and return the string that came before it (including the \n, if there is one). We can reuse newline_index() here.
char *extract_line(char *str)
{
    char *extracted_line;
    int index;
    index = newline_index(str);
    // ...
}
If a valid index is found, we can extract the line using a simple function like _substr(char const *s, unsigned int start, size_t len).
Below are the helper functions; _memmove() is required for _substr() to work correctly. (_strlen() is not shown here; it is assumed to behave like the standard strlen().)
char *_substr(char const *s, unsigned int start, size_t len)
{
    size_t s_len;
    size_t copy_len;
    char *sub;
    if (!s)
        return (NULL);
    s_len = _strlen(s);
    if (start >= s_len)
        copy_len = 0;
    else if (len > s_len - start)
        copy_len = s_len - start;
    else
        copy_len = len;
    sub = (char *)malloc(sizeof(char) * (copy_len + 1));
    if (!sub)
        return (NULL);
    if (copy_len > 0)
        _memmove(sub, s + start, copy_len);
    sub[copy_len] = '\0';
    return (sub);
}
void *_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *ptr_d;
    const unsigned char *ptr_s;
    if (!dest && !src)
        return (NULL);
    ptr_d = (unsigned char *)dest;
    ptr_s = (const unsigned char *)src;
    if (ptr_d > ptr_s)
    {
        // dest starts after src: copy backwards so overlapping bytes survive
        ptr_d += n;
        ptr_s += n;
        while (n-- > 0)
            *--ptr_d = *--ptr_s;
    }
    else
    {
        while (n-- > 0)
            *ptr_d++ = *ptr_s++;
    }
    return (dest);
}
Back in extract_line(), we can assign and return the extracted_line variable.
char *extract_line(char *str)
{
    char *extracted_line;
    int index;
    index = newline_index(str);
    // Handle EOF / No newline found
    if (index == -1)
    {
        if (!str || str[0] == '\0')
            return (NULL);
        index = _strlen(str);
        return (_substr(str, 0, index));
    }
    extracted_line = _substr(str, 0, index + 1);
    return (extracted_line);
}
We use index + 1 as the length because we want the returned string to include the newline \n. When no newline is found, we return whatever is left in the stash as the final line, or NULL if nothing is left.
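The three cases handled by extract_line() (a newline is present, only a final unterminated line remains, nothing remains) can be exercised with a compact variant. This sketch swaps newline_index() and _substr() for the standard strchr(), strlen() and memcpy(), so it is an equivalent illustration under that assumption and not the tutorial's own helper chain.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of the first line in stash, or NULL. */
char *extract_line(const char *stash)
{
    const char *newline;
    size_t     len;
    char       *line;

    if (!stash || stash[0] == '\0')
        return (NULL);
    newline = strchr(stash, '\n');
    if (newline)
        len = (size_t)(newline - stash) + 1; /* index + 1 keeps the '\n' */
    else
        len = strlen(stash);                 /* last line, no '\n' */
    line = malloc(len + 1);
    if (!line)
        return (NULL);
    memcpy(line, stash, len);
    line[len] = '\0';
    return (line);
}

void demo_extract_line(void)
{
    char *line;

    line = extract_line("ab\ncd");
    assert(line && strcmp(line, "ab\n") == 0);
    free(line);
    line = extract_line("tail");
    assert(line && strcmp(line, "tail") == 0);
    free(line);
    line = extract_line("");
    assert(line == NULL);
    line = extract_line(NULL);
    assert(line == NULL);
}
```

Returning NULL for an empty stash is what eventually ends the caller's reading loop once the file is exhausted.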
For our last function update_stash(), we need to remove the first line found (including the new line character) from the beginning of the stash. We also need to free stash when all the input has been processed.
char *update_stash(char *stash)
{
    int i;
    int j;
    char *new_stash;
    if (!stash)
        return (NULL);
    i = 0;
    while (stash[i] && stash[i] != '\n')
        i++;
    if (!stash[i])
    {
        free(stash);
        return (NULL);
    }
    new_stash = malloc(sizeof(char) * (_strlen(stash) - i + 1));
    if (!new_stash)
    {
        free(stash);
        return (NULL);
    }
    i++;
    j = 0;
    while (stash[i])
        new_stash[j++] = stash[i++];
    new_stash[j] = '\0';
    free(stash);
    return (new_stash);
}
The check at the top covers the case where stash is already NULL, for example after a failed allocation. The first while loop then stops at either a \n or the terminating \0. Next we check !stash[i]. If it is true, we reached the end of the stash without finding a newline, which means everything in it has just been returned as the last line, so we free the stash and return NULL.
Otherwise, we allocate memory for the rest of the string, copy it, free the old stash, and return the new one.
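A short trace shows what remains in the stash after each call. The variant below is a sketch that uses the standard strchr(), strlen() and memcpy() in place of the index loops; dup_string() is a hypothetical helper that only exists to give the demo a heap-allocated stash to start from.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Frees stash and returns a copy of what follows its first '\n', or NULL. */
char *update_stash(char *stash)
{
    char   *newline;
    char   *rest;
    size_t len;

    if (!stash)
        return (NULL);
    newline = strchr(stash, '\n');
    if (!newline)
    {
        free(stash);
        return (NULL);
    }
    len = strlen(newline + 1);
    rest = malloc(len + 1);
    if (rest)
        memcpy(rest, newline + 1, len + 1); /* includes the '\0' */
    free(stash);
    return (rest);
}

/* Heap copy of s, so update_stash() is allowed to free it. */
static char *dup_string(const char *s)
{
    size_t size;
    char   *copy;

    size = strlen(s) + 1;
    copy = malloc(size);
    if (copy)
        memcpy(copy, s, size);
    return (copy);
}

void demo_update_stash(void)
{
    char *stash;

    stash = update_stash(dup_string("ab\ncd"));
    assert(stash && strcmp(stash, "cd") == 0);
    stash = update_stash(stash);      /* "cd" has no newline: freed */
    assert(stash == NULL);
    stash = update_stash(dup_string("ab\n"));
    assert(stash && stash[0] == '\0'); /* nothing after the newline */
    free(stash);
}
```

When the newline is the last character, the result is an empty string and not NULL; the next call to get_next_line() then reads again and cleans it up once EOF is confirmed.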
With that, our function is complete. We now only need to call it in a loop when wanting to read a file:
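#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
// The prototypes of get_next_line() and _strlen() must also be visible here.
int main(void)
{
    int fd = open("my_file", O_RDONLY);
    char *my_line;
    if (fd == -1)
        return (1);
    while ((my_line = get_next_line(fd)) != NULL)
    {
        write(1, my_line, _strlen(my_line));
        free(my_line);
    }
    close(fd);
    return (0);
}
Here in our main, we open the file, stop early if open() failed, and then call get_next_line() in a loop, assigning its return value to a variable, printing that variable, and freeing it. This prints the whole file, line by line.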
You can find the full code at: get_next_line