Reading binary data in C#

In the C# newsgroup, I've seen quite a lot of code for reading in data from a file like this:

// Bad code! Do not use!
FileStream fs = File.OpenRead(filename);
byte[] data = new byte[fs.Length];
fs.Read (data, 0, data.Length);   // Return value ignored!

This code is far from guaranteed to work. In particular, the Read call could put just, say, the first 10 bytes of the file into the buffer: it is only guaranteed to block until some data is available (or the end of the stream is reached), not until all of the requested data is available. That's where the return value (which is ignored in the above code) is vital. You need to cope with the case where you can't read all of the data in one go, and loop round until you've read what you want. Here's a method which you can use if you want to read from a stream into the whole of an array, not stopping until it's finished:

/// <summary>
/// Reads data into a complete array, throwing an EndOfStreamException
/// if the stream runs out of data first. Any IOException from the
/// underlying calls is allowed to propagate.
/// </summary>
/// <param name="stream">The stream to read data from</param>
/// <param name="data">The array to read bytes into. The array
/// will be completely filled from the stream, so an appropriate
/// size must be given.</param>
public static void ReadWholeArray (Stream stream, byte[] data)
{
    int offset=0;
    int remaining = data.Length;
    while (remaining > 0)
    {
        int read = stream.Read(data, offset, remaining);
        if (read <= 0)
            throw new EndOfStreamException 
                (String.Format("End of stream reached with {0} bytes left to read", remaining));
        remaining -= read;
        offset += read;
    }
}
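
For instance, the faulty snippet at the start could be rewritten along these lines (just a sketch, assuming the ReadWholeArray method above is in scope and filename holds the path to read):

// Read the whole of a file into a byte array, using ReadWholeArray
// to make sure every byte really is read.
using (FileStream fs = File.OpenRead(filename))
{
    byte[] data = new byte[fs.Length];
    ReadWholeArray(fs, data);
    // data now contains the complete contents of the file
}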

Sometimes, you don't know the length of the stream in advance (for instance a network stream) and just want to read the whole lot into a buffer. Here's a method to do just that:

/// <summary>
/// Reads data from a stream until the end is reached. The
/// data is returned as a byte array. An IOException is
/// thrown if any of the underlying IO calls fail.
/// </summary>
/// <param name="stream">The stream to read data from</param>
public static byte[] ReadFully (Stream stream)
{
    byte[] buffer = new byte[32768];
    using (MemoryStream ms = new MemoryStream())
    {
        while (true)
        {
            int read = stream.Read (buffer, 0, buffer.Length);
            if (read <= 0)
                return ms.ToArray();
            ms.Write (buffer, 0, read);
        }
    }
}
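
As a rough illustration of when you'd use this (networkStream here is just a hypothetical variable - any stream of unknown length will do):

// Read an entire response whose length isn't known in advance,
// e.g. from a NetworkStream.
byte[] response = ReadFully(networkStream);
Console.WriteLine("Received {0} bytes", response.Length);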

While the above is simple, it's not terribly efficient, as it ends up copying the data at the very end, and probably several times in between. Here's some code which works well if you know the expected length of the data to start with. (While you could use Stream.Length, it isn't supported for all streams.)

/// <summary>
/// Reads data from a stream until the end is reached. The
/// data is returned as a byte array. An IOException is
/// thrown if any of the underlying IO calls fail.
/// </summary>
/// <param name="stream">The stream to read data from</param>
/// <param name="initialLength">The initial buffer length</param>
public static byte[] ReadFully (Stream stream, int initialLength)
{
    // If we've been passed an unhelpful initial length, just
    // use 32K.
    if (initialLength < 1)
    {
        initialLength = 32768;
    }
    
    byte[] buffer = new byte[initialLength];
    int read=0;
    
    int chunk;
    while ( (chunk = stream.Read(buffer, read, buffer.Length-read)) > 0)
    {
        read += chunk;
        
        // If we've reached the end of our buffer, check to see if there's
        // any more information
        if (read == buffer.Length)
        {
            int nextByte = stream.ReadByte();
            
            // End of stream? If so, we're done
            if (nextByte==-1)
            {
                return buffer;
            }
            
            // Nope. Resize the buffer, put in the byte we've just
            // read, and continue
            byte[] newBuffer = new byte[buffer.Length*2];
            Array.Copy(buffer, newBuffer, buffer.Length);
            newBuffer[read]=(byte)nextByte;
            buffer = newBuffer;
            read++;
        }
    }
    // Buffer is now too big. Shrink it.
    byte[] ret = new byte[read];
    Array.Copy(buffer, ret, read);
    return ret;
}
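
For example, when reading from a file you could use the file's length as the initial buffer size, avoiding any resizing. This is only a sketch, and the cast assumes the file is comfortably under 2GB:

// Use the file length as a hint for the initial buffer size.
using (FileStream fs = File.OpenRead(filename))
{
    byte[] data = ReadFully(fs, (int)fs.Length);
}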

Using code such as the above, whether synchronously or asynchronously, you shouldn't come across the kinds of error that can otherwise occur, such as data which appears to be corrupted or truncated.

Note that because all of the data ends up in a single byte array, this code can't cope with data of more than 2GB or so - .NET arrays are indexed with an int, and I believe all CLR implementations at the time of this writing will choke when asked to create such a large object anyway. That restriction may be relaxed in the future, but the code above would need changes to take advantage of it.

