How to read a large file from SQL Server? - C#

I tried to read a file (650 megabytes) from SQL Server:
using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
{
    if (reader.Read())
    {
        using (var dbStream = reader.GetStream(0))
        {
            if (!reader.IsDBNull(0))
            {
                stream.Position = 0;
                dbStream.CopyTo(stream, 256);
            }
            dbStream.Close();
        }
    }
    reader.Close();
}
But I got an OutOfMemoryException on CopyTo().
With small files, this code snippet works fine. How can I handle large files?

You can read and write the data to a temp file in small chunks. You can see an example on MSDN - Retrieving Binary Data.
// Column index in the result set.
const int colIdx = 0;
// Writes the BLOB to a file (*.bmp).
FileStream stream;
// Streams the BLOB to the FileStream object.
BinaryWriter writer;
// Size of the BLOB buffer.
int bufferSize = 100;
// The BLOB byte[] buffer to be filled by GetBytes.
byte[] outByte = new byte[bufferSize];
// The bytes returned from GetBytes.
long retval;
// The starting position in the BLOB output.
long startIndex = 0;

// Open the connection and read data into the DataReader.
connection.Open();
SqlDataReader reader = command.ExecuteReader(CommandBehavior.SequentialAccess);

while (reader.Read())
{
    // Create a file to hold the output.
    stream = new FileStream(
        "some-physical-file-name-to-dump-data.bmp", FileMode.OpenOrCreate, FileAccess.Write);
    writer = new BinaryWriter(stream);

    // Reset the starting byte for the new BLOB.
    startIndex = 0;

    // Read bytes into outByte[] and retain the number of bytes returned.
    retval = reader.GetBytes(colIdx, startIndex, outByte, 0, bufferSize);

    // Continue while there are bytes beyond the size of the buffer.
    while (retval == bufferSize)
    {
        writer.Write(outByte);
        writer.Flush();

        // Reposition the start index to the end of the last buffer and fill the buffer.
        startIndex += bufferSize;
        retval = reader.GetBytes(colIdx, startIndex, outByte, 0, bufferSize);
    }

    // Write the remaining buffer.
    writer.Write(outByte, 0, (int)retval);
    writer.Flush();

    // Close the output file.
    writer.Close();
    stream.Close();
}

// Close the reader and the connection.
reader.Close();
connection.Close();
Make sure you are using SqlDataReader with CommandBehavior.SequentialAccess; note this line in the above code snippet.
SqlDataReader reader = command.ExecuteReader(CommandBehavior.SequentialAccess);
More information on the CommandBehavior enum can be found here.
EDIT:
Let me clarify. I agree with @MickyD: the cause of the problem is not whether you are using CommandBehavior.SequentialAccess or not, but reading the large file all at once.
I emphasized it because it is commonly missed by developers: they tend to read files in chunks, but without setting CommandBehavior.SequentialAccess they will encounter other problems. Although it is already present in the original question, it is highlighted in my answer to make the point for any newcomers.

@MatthewWatson yeah, var stream = new MemoryStream(); What is not right with it? – Kliver Max
Your problem is not whether or not you are using:
`command.ExecuteReader(CommandBehavior.SequentialAccess)`
...which you are, as we can see; nor is it that your stream copy buffer size is too big (it's actually tiny), but rather that you are using MemoryStream as you indicated in the comments above. More than likely you are loading the 650MB file twice: once from SQL and once into the MemoryStream, thus leading to your OutOfMemoryException.
Though the solution is to instead write to a file stream, the cause of the problem wasn't highlighted in the accepted answer. Unless you know the cause of a problem, you won't learn to avoid such issues in the future.
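To make that concrete, here is a minimal sketch of streaming the BLOB column straight to disk instead of into a MemoryStream. The query, parameter, column index, and output path are placeholders (assumptions, not the asker's actual schema); the pattern assumes the BLOB is the first column selected:
using (var command = new SqlCommand("SELECT FileData FROM Files WHERE Id = @id", connection))
{
    command.Parameters.AddWithValue("@id", fileId);
    using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
    {
        if (reader.Read() && !reader.IsDBNull(0))
        {
            using (var dbStream = reader.GetStream(0))
            using (var fileStream = new FileStream(@"C:\Temp\file.bin", FileMode.Create, FileAccess.Write))
            {
                // CopyTo pulls a small chunk at a time; only the copy buffer lives in memory,
                // never the whole 650 MB value.
                dbStream.CopyTo(fileStream, 81920);
            }
        }
    }
}
The key point is the destination: a FileStream keeps memory usage flat regardless of the BLOB size, whereas a MemoryStream has to hold the whole value at once.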

Related

Out of memory using BinaryWriter in C# on adding files to a zip file

I am trying to add files to a zip file, preserving the directory structure. The code below basically works as long as I do not have files of a few hundred MB to zip. If I just zip a directory with one file of about 250 MB (on a system with plenty of memory, BTW), I get an OutOfMemory exception on the writer.Write() line.
I already modified the code to read in chunks, as it first failed when I read/wrote the whole file at once. I don't know why it still fails.
using (FileStream zipToOpen = new FileStream(cZipName, eFileMode))
ZipArchiveEntry readmeEntry = archive.CreateEntry(cFileToBackup);
using (BinaryWriter writer = new BinaryWriter(readmeEntry.Open()))
{
    FileStream fsData = null; // Load file into FileStream
    fsData = new FileStream(cFileFull, FileMode.Open, FileAccess.Read);
    {
        byte[] buffer = new byte[1024];
        int bytesRead = 0;
        while ((bytesRead = fsData.Read(buffer, 0, buffer.Length)) > 0)
        {
            writer.Write(buffer, 0, bytesRead); // here it fails
            fsData.Flush(); // -> CHANGED THIS TO writer.Flush() SOLVED IT - nearly..
        }
    }
    fsData.Close();
}
EDIT: Arkadiusz K was right that I used the flush on the reader, not the writer. After changing that, the program zips files of 1 GB or more where it stopped at 100 MB before. However, I get another exception when I try to zip e.g. a 6 GB file - it stops with: System.IO.IOException was unhandled: Stream was too long. Source=mscorlib
StackTrace:
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count) (etc)
Does anybody have an idea why it still fails? I'd say the code should now properly read and write 1 KB at a time.
First of all, I'd really like to format your code and make it as succinct as it should be:
var readmeEntry = archive.CreateEntry(cFileToBackup);
using (var fsData = new FileStream(cFileFull, FileMode.Open, FileAccess.Read))
using (var writer = new BinaryWriter(readmeEntry.Open()))
{
    var buffer = new byte[1024];
    int bytesRead;
    while ((bytesRead = fsData.Read(buffer, 0, buffer.Length)) > 0)
    {
        writer.Write(buffer, 0, bytesRead); // here it fails
        writer.Flush();
    }
}
Now, to explain why it fails:
BinaryWriter is a stream writer. When it writes a string to the stream, it writes it length-prefixed; from the documentation:
Length-prefixed means that this method first writes the length of the
string, in bytes, when encoded with the BinaryWriter instance's
current encoding to the stream. This value is written as an unsigned
integer. This method then writes that many bytes to the stream.
In your case, in order to write to the file, the data is written to a MemoryStream first. Here, the MemoryStream is the backing store stream; refer to the diagram at http://kcshadow.net/wpdeveloper/sites/default/files/streamd3.png.
Because your system's memory is only about 6-8 GB, or because your application has only been allocated that much memory, the backing store stream is expanded as far as it can go when you attempt to zip a 6 GB file, and then the exception is thrown.
Regarding your EDIT: I ran into the same issue. After some digging, I discovered that zipFileEntry.Open() returns a WrappedStream, which is the underlying stream (the one that cannot be flushed until you have finished writing to it).
This WrappedStream is the problem: its maximum length is about 2 GB. I couldn't find a way to get around this, so I ended up using a different compression library altogether.
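For what it's worth, here is a minimal sketch using ZipArchiveMode.Create instead of Update. As far as I know, Create mode deflates entry data straight to the output FileStream rather than buffering it in a MemoryStream, so it may sidestep the limit described above - treat that as an assumption to verify rather than a confirmed fix. The variable names are the ones from the question:
using (var zipToOpen = new FileStream(cZipName, FileMode.Create))
using (var archive = new ZipArchive(zipToOpen, ZipArchiveMode.Create))
using (var fsData = new FileStream(cFileFull, FileMode.Open, FileAccess.Read))
using (var entryStream = archive.CreateEntry(cFileToBackup).Open())
{
    // Streams the source file into the entry in chunks; nothing holds the whole file in memory.
    fsData.CopyTo(entryStream);
}
Note that Create mode can only write a brand-new archive; it is not a drop-in replacement if you need to append to an existing zip.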

GZipStream - write not writing all compressed data even with flush?

I've got a pesky problem with GZipStream targeting .NET 3.5. This is my first time working with GZipStream; I have modeled my code after a number of tutorials, including here, and I'm still stuck.
My app serializes a DataTable to XML and inserts it into a database, storing the compressed data in a varbinary(max) field along with the original length of the uncompressed buffer. Then, when I need it, I retrieve this data, decompress it, and recreate the DataTable. The decompression is what seems to fail.
EDIT: Sadly, after changing GetBuffer to ToArray as suggested, my issue remains. Code updated below.
Compress code:
DataTable dt = new DataTable("MyUnit");
//do stuff with dt
//okay... now compress the table
using (MemoryStream xmlstream = new MemoryStream())
{
    //instead of stream, use xmlwriter?
    System.Xml.XmlWriterSettings settings = new System.Xml.XmlWriterSettings();
    settings.Encoding = Encoding.GetEncoding(1252);
    settings.Indent = false;
    System.Xml.XmlWriter writer = System.Xml.XmlWriter.Create(xmlstream, settings);
    try
    {
        dt.WriteXml(writer);
        writer.Flush();
    }
    catch (ArgumentException)
    {
        //likely an encoding issue... okay, base64 encode it
        var base64 = Convert.ToBase64String(xmlstream.ToArray());
        xmlstream.Write(Encoding.GetEncoding(1252).GetBytes(base64), 0, Encoding.GetEncoding(1252).GetBytes(base64).Length);
    }
    using (MemoryStream zipstream = new MemoryStream())
    {
        GZipStream zip = new GZipStream(zipstream, CompressionMode.Compress);
        log.DebugFormat("Compressing commands...");
        zip.Write(xmlstream.GetBuffer(), 0, xmlstream.ToArray().Length);
        zip.Flush();
        float ratio = (float)zipstream.ToArray().Length / (float)xmlstream.ToArray().Length;
        log.InfoFormat("Resulting compressed size is {0:P2} of original", ratio);
        using (SqlCommand cmd = new SqlCommand())
        {
            cmd.CommandText = "INSERT INTO tinydup (lastid, command, compressedlength) VALUES (@lastid, @compressed, @length)";
            cmd.Connection = db;
            cmd.Parameters.Add("@lastid", SqlDbType.Int).Value = lastid;
            cmd.Parameters.Add("@compressed", SqlDbType.VarBinary).Value = zipstream.ToArray();
            cmd.Parameters.Add("@length", SqlDbType.Int).Value = xmlstream.ToArray().Length;
            cmd.ExecuteNonQuery();
        }
    }
}
Decompress Code:
/* This is an encapsulation of what I get from the database
public class DupUnit
{
    public uint lastid;
    public uint complength;
    public byte[] compressed;
}*/
//I have already retrieved my list of work to do from the database in a List<DupUnit> dupunits
foreach (DupUnit unit in dupunits)
{
    DataSet ds = new DataSet();
    //DataTable dt = new DataTable();
    //uncompress and extract to original datatable
    try
    {
        using (MemoryStream zipstream = new MemoryStream(unit.compressed))
        {
            GZipStream zip = new GZipStream(zipstream, CompressionMode.Decompress);
            byte[] xmlbits = new byte[unit.complength];
            //WHY ARE YOU ALWAYS 0!!!!!!!!
            int bytesdecompressed = zip.Read(xmlbits, 0, unit.compressed.Length);
            MemoryStream xmlstream = new MemoryStream(xmlbits);
            log.DebugFormat("Uncompressed XML against {0} is: {1}", m_source.DSN, Encoding.GetEncoding(1252).GetString(xmlstream.ToArray()));
            try
            {
                ds.ReadXml(xmlstream);
            }
            catch (Exception)
            {
                //it may have been base64 encoded... decode first.
                ds.ReadXml(Encoding.GetEncoding(1254).GetString(
                    Convert.FromBase64String(
                        Encoding.GetEncoding(1254).GetString(xmlstream.ToArray())))
                );
            }
            xmlstream.Dispose();
        }
    }
    catch (Exception e)
    {
        log.Error(e);
        Thread.Sleep(1000); //sleep a sec!
        continue;
    }
}
Note the comment above... bytesdecompressed is always 0. Any ideas? Am I doing it wrong?
EDIT 2:
So this is weird. I added the following debug code to the decompression routine:
GZipStream zip = new GZipStream(zipstream, CompressionMode.Decompress);
byte[] xmlbits = new byte[unit.complength];
int offset = 0;
while (zip.CanRead && offset < xmlbits.Length)
{
    while (zip.Read(xmlbits, offset, 1) == 0) ;
    offset++;
}
When debugging, sometimes that loop would complete, but other times it would hang. When I'd stop the debugging, it would be at byte 1600 out of 1616. I'd continue, but it wouldn't move at all.
EDIT 3: The bug appears to be in the compress code. For whatever reason, it is not saving all of the data. When I try to decompress the data using a third party gzip mechanism, I only get part of the original data.
I'd start a bounty, but I really don't have much reputation to give as of now :-(
Finally found the answer. The compressed data wasn't complete because GZipStream.Flush() does absolutely nothing to ensure that all of the data is out of the buffer - you need to use GZipStream.Close() as pointed out here. Of course, if you get a bad compress, it all goes downhill - if you try to decompress it, you will always get 0 returned from the Read().
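In other words, the compression side should look roughly like this minimal sketch (xmlBytes standing in for xmlstream.ToArray() from the code above). The using block, or an explicit Close(), is what forces GZipStream to emit its final blocks before you read the result:
byte[] compressedBytes;
using (var zipstream = new MemoryStream())
{
    using (var zip = new GZipStream(zipstream, CompressionMode.Compress, true)) // leaveOpen: true
    {
        zip.Write(xmlBytes, 0, xmlBytes.Length);
    } // disposing/closing the GZipStream here flushes the remaining compressed blocks into zipstream
    compressedBytes = zipstream.ToArray(); // now safe to store in the varbinary(max) column
}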
I'd say this line, at least, is the most wrong:
cmd.Parameters.Add("@compressed", SqlDbType.VarBinary).Value = zipstream.GetBuffer();
MemoryStream.GetBuffer:
Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method.
It should also be noted that the zip format works by first locating data stored at the end of the file - so if you've stored more data than was required, the entries expected at the "end" of the file won't be where a reader looks for them.
As an aside, I'd also recommend a different name for your compressedlength column - I'd initially taken it (despite your narrative) as being intended to store, well, the length of the compressed data (and written part of my answer to address that). Maybe originalLength would be a better name?
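A tiny illustration of the GetBuffer/ToArray difference, using nothing beyond MemoryStream itself:
var ms = new MemoryStream();
ms.Write(Encoding.ASCII.GetBytes("test"), 0, 4);
byte[] raw = ms.GetBuffer();  // length is the internal capacity (e.g. 256), padded with unused bytes
byte[] data = ms.ToArray();   // length is exactly 4 - this is what belongs in the database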

C# How to read a single file with normal and XML text elements

I am receiving a stream of data from a web service and trying to save the contents of the stream to a file. The stream contains standard lines of text alongside large chunks of XML data (on a single line). The size of the file is about 800 MB.
Problem: I receive an out-of-memory exception when I process the XML section of each line.
==start file
line 1
line 2
<?xml version=.....huge line etc</xml>
line 3
line4
<?xml version=.....huge line etc</xml>
==end file
Here is the current code; as you can see, when it reads in the huge XML line, the memory spikes.
string readLine;
using (StreamReader reader = new StreamReader(downloadStream))
{
    while ((readLine = reader.ReadLine()) != null)
    {
        streamWriter.WriteLine(readLine); //writes to file
    }
}
I was trying to think of a solution where I used both a TextReader/StreamReader and an XmlTextReader in combination to process each section. As I get to the XML section I could switch to the XmlTextReader and use the Read() method to read each node, thus stopping the memory spike.
Any suggestions on how I could do this? Alternatively, could I create a custom XmlTextReader that is able to read in these lines? Any pointers for this?
Updated
A further problem is that I need to read this file back in and split out the two XML sections into separate XML files! I converted the solution to write the file using a binary writer and then started to read the file back in using a binary reader. I have text processing to detect the start of the XML section, and specifically which XML section, so I can map it to the correct file. However, this causes problems reading in the binary file and doing the detection...
using (BinaryReader reader = new BinaryReader(savedFileStream))
{
    while ((streamLine = reader.ReadString()) != null)
    {
        if (streamLine.StartsWith("<?xml version=\"1.0\" ?><tag1"))
            //xml file 1
        else if (streamLine.StartsWith("<?xml version=\"1.0\" ?><tag2"))
            //xml file 2
The XML may contain all of its content as one single line, so you'd probably be better off using a binary reader/writer where you can decide the read/write size.
An example is below; here we read BUFFER_SIZE bytes on each iteration:
Stream s = new MemoryStream();
Stream outputStream = new MemoryStream();
int BUFFER_SIZE = 1024;
using (BinaryReader reader = new BinaryReader(s))
{
    BinaryWriter writer = new BinaryWriter(outputStream);
    byte[] buffer = new byte[BUFFER_SIZE];
    int read = buffer.Length;
    while (read != 0)
    {
        read = reader.Read(buffer, 0, BUFFER_SIZE);
        writer.Write(buffer, 0, read);
    }
    writer.Flush();
    writer.Close();
}
I don't know if this causes you problems with encodings etc, but I think you will have to read the file as binary.
If all you want to do is copy one stream to another without modifying the data, you don't need the Stream text or binary helpers (StreamReader, StreamWriter, BinaryReader, BinaryWriter, etc.), simply copy the stream.
internal static class StreamExtensions
{
    public static void CopyTo(this Stream readStream, Stream writeStream)
    {
        byte[] buffer = new byte[4096];
        int read;
        while ((read = readStream.Read(buffer, 0, buffer.Length)) > 0)
            writeStream.Write(buffer, 0, read);
    }
}
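Usage is then a one-liner; note that on .NET 4 and later Stream already ships an equivalent built-in CopyTo, so the extension is only needed on older frameworks. The output path below is a placeholder:
using (var output = new FileStream(@"C:\Temp\copy.dat", FileMode.Create, FileAccess.Write))
{
    downloadStream.CopyTo(output); // picks up the extension above (or the built-in CopyTo on .NET 4+)
}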
I think there may be a memory leak.
Are you getting the out-of-memory exception after processing a few lines, or on the first line itself?
Also, there is no streamWriter.Flush() inside the while loop - don't you think there should be one?

Raw Stream Has Data, Deflate Returns Zero Bytes

I'm reading data (an adCenter report, as it happens), which is supposed to be zipped. Reading the contents with an ordinary stream, I get a couple thousand bytes of gibberish, so this seems reasonable. So I feed the stream to DeflateStream.
First, it reports "Block length does not match with its complement." A brief search suggests that there is a two-byte prefix, and indeed if I call ReadByte() twice before opening DeflateStream, the exception goes away.
However, DeflateStream now returns nothing at all. I've spent most of the afternoon chasing leads on this, with no luck. Help me, StackOverflow, you're my only hope! Can anyone tell me what I'm missing?
Here's the code. Naturally I only enabled one of the two commented blocks at a time when testing.
_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
{
    // Skip the zlib prefix, which conflicts with the deflate specification
    compressed.ReadByte(); compressed.ReadByte();

    // Reports reading 3,000-odd bytes, followed by random characters
    /*byte[] buffer = new byte[4096];
    int bytesRead = compressed.Read(buffer, 0, 4096);
    Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
    string content = Encoding.ASCII.GetString(buffer, 0, bytesRead);
    Console.WriteLine(content);*/

    using (DeflateStream decompressed = new DeflateStream(compressed, CompressionMode.Decompress))
    {
        // Reports reading 0 bytes, and no output
        /*byte[] buffer = new byte[4096];
        int bytesRead = decompressed.Read(buffer, 0, 4096);
        Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
        string content = Encoding.ASCII.GetString(buffer, 0, bytesRead);
        Console.WriteLine(content);*/

        using (StreamReader reader = new StreamReader(decompressed))
            while (reader.EndOfStream == false)
                _results.Add(reader.ReadLine().Split('\t'));
    }
}
As you can probably guess from the last line, the unzipped content should be TDT.
Just for fun, I tried decompressing with GZipStream, but it reports that the magic number is not correct. MS' docs just say "The downloaded report is compressed by using zip compression. You must unzip the report before you can use its contents."
Here's the code that finally worked. I had to save the content out to a file and read it back in. This does not seem reasonable, but for the small quantities of data I'm working with it's acceptable, so I'll take it!
WebRequest request = HttpWebRequest.Create(reportURL);
WebResponse response = request.GetResponse();
_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
{
    // Save the content to a temporary location
    string zipFilePath = @"\\Server\Folder\adCenter\Temp.zip";
    using (StreamWriter file = new StreamWriter(zipFilePath))
    {
        compressed.CopyTo(file.BaseStream);
        file.Flush();
    }

    // Get the first file from the temporary zip
    ZipFile zipFile = ZipFile.Read(zipFilePath);
    if (zipFile.Entries.Count > 1) throw new ApplicationException("Found " + zipFile.Entries.Count.ToString("#,##0") + " entries in the report; expected 1.");
    ZipEntry report = zipFile[0];

    // Extract the data
    using (MemoryStream decompressed = new MemoryStream())
    {
        report.Extract(decompressed);
        decompressed.Position = 0; // Note that the stream does NOT start at the beginning
        using (StreamReader reader = new StreamReader(decompressed))
            while (reader.EndOfStream == false)
                _results.Add(reader.ReadLine().Split('\t'));
    }
}
You will find that DeflateStream is hugely limited in what data it will decompress. In fact, if you are expecting entire ZIP files, it will be of no use at all.
There are hundreds of (mostly small) variations of ZIP files, and DeflateStream will get along with only two or three of them.
The best approach is likely to use a dedicated library for reading zip files/streams, such as DotNetZip or SharpZipLib (somewhat unmaintained).
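If you are on .NET 4.5 or later, the built-in System.IO.Compression.ZipArchive can also read a whole zip from a stream, which may be enough for a single-entry report like this one. A minimal sketch, reusing the names from the question and buffering the response first so the archive can seek:
using (Stream compressed = response.GetResponseStream())
using (var buffered = new MemoryStream())
{
    compressed.CopyTo(buffered);   // buffer the response so the archive can seek to its central directory
    buffered.Position = 0;
    using (var archive = new ZipArchive(buffered, ZipArchiveMode.Read))
    using (var reader = new StreamReader(archive.Entries[0].Open()))
    {
        while (!reader.EndOfStream)
            _results.Add(reader.ReadLine().Split('\t'));
    }
}
Since the report here is only a few thousand bytes, buffering it in memory is harmless.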
You could write the stream to a file and try my tool Precomp on it. If you use it like this:
precomp -c- -v [name of input file]
any ZIP/gZip stream(s) inside the file will be detected and some verbose information will be reported (position and length of the stream). Additionally, if they can be decompressed and recompressed bit-to-bit identical, the output file will contain the decompressed stream(s).
Precomp detects ZIP/gZip (and some other) streams anywhere in the file, so you won't have to worry about header bytes or garbage at the beginning of the file.
If it doesn't detect a stream like this, try to add -slow, which detects deflate streams even if they don't have a ZIP/gZip header. If this fails, you can try -brute which even detects deflate streams that lack the two byte header, but this will be extremely slow and can cause false positives.
After that, you'll know if there is a (valid) deflate stream in the file and if so, the additional information should help you to decompress other reports correctly using zLib decompression routines or similar.

C# TCP file transfer - Images semi-transferred

I am developing a TCP file transfer client-server program. At the moment I am able to send text files and other file formats perfectly fine, such as .zip, with all contents intact on the server end. However, when I transfer a .gif, the end result is a gif with the same size as the original but with only part of the image showing, as if most of the bytes were lost or not written correctly on the server end.
The client sends a 1KB header packet with the name and size of the file to the server. The server then responds with OK if ready, and then creates a fileBuffer as large as the file to be sent.
Here is some code to demonstrate my problem:
// Server-side method snippet dealing with data being sent
while (true)
{
    // Spin the data in
    if (streams[0].DataAvailable)
    {
        streams[0].Read(fileBuffer, 0, fileBuffer.Length);
        break;
    }
}

// Finished receiving file, write from buffer to created file
FileStream fs = File.Open(LOCAL_FOLDER + fileName, FileMode.CreateNew, FileAccess.Write);
fs.Write(fileBuffer, 0, fileBuffer.Length);
fs.Close();
Print("File successfully received.");

// Client-side method snippet dealing with a file send
while (true)
{
    con.Read(ackBuffer, 0, ackBuffer.Length);
    // Wait for OK response to start sending
    if (Encoding.ASCII.GetString(ackBuffer) == "OK")
    {
        // Convert file to bytes
        FileStream fs = new FileStream(inPath, FileMode.Open, FileAccess.Read);
        fileBuffer = new byte[fs.Length];
        fs.Read(fileBuffer, 0, (int)fs.Length);
        fs.Close();

        con.Write(fileBuffer, 0, fileBuffer.Length);
        con.Flush();
        break;
    }
}
I've tried a BinaryWriter instead of just using the FileStream, with the same result.
Am I incorrect in believing successful file transfer to be as simple as conversion to bytes, transportation, and then conversion back to filename/type?
All help/advice much appreciated.
It's not about your image - it's about your code.
If your image bytes were lost or not written correctly, that means your file transfer code is wrong, and even the .zip file or any other file you receive is going to be corrupted.
It's a huge mistake to set the byte buffer length to the file size. Imagine you're going to send a large file of about 1 GB - then it's going to take 1 GB of RAM. For an ideal transfer you should loop over the file as you send it.
Here is a way to send/receive files nicely with no size limitation.
Send File
using (FileStream fs = new FileStream(srcPath, FileMode.Open, FileAccess.Read))
{
    long fileSize = fs.Length;
    long sum = 0; // sum here is the total of sent bytes.
    int count = 0;
    data = new byte[1024]; // 1 KB buffer .. you might use a different size also.
    while (sum < fileSize)
    {
        count = fs.Read(data, 0, data.Length);
        network.Write(data, 0, count);
        sum += count;
    }
    network.Flush();
}
Receive File
long fileSize = // the file size that you are going to receive.
using (FileStream fs = new FileStream(destPath, FileMode.Create, FileAccess.Write))
{
    int count = 0;
    long sum = 0; // sum here is the total of received bytes.
    data = new byte[1024 * 8]; // 8 KB buffer .. you might use a smaller size also.
    while (sum < fileSize)
    {
        if (network.DataAvailable)
        {
            count = network.Read(data, 0, data.Length);
            fs.Write(data, 0, count);
            sum += count;
        }
    }
}
happy coding :)
When you write over TCP, the data can arrive in a number of packets. I think your early tests happened to fit into one packet, but this gif file is arriving in 2 or more. So when you call Read, you'll only get what's arrived so far - you'll need to check repeatedly until you've got as many bytes as the header told you to expect.
I found Beej's guide to network programming a big help when doing some work with TCP.
As others have pointed out, the data doesn't necessarily all arrive at once, and your code is overwriting the beginning of the buffer each time through the loop. The more robust way to write your reading loop is to read as many bytes as are available and increment a counter to keep track of how many bytes have been read so far so that you know where to put them in the buffer. Something like this works well:
int totalBytesRead = 0;
int bytesRead;
do
{
    bytesRead = streams[0].Read(fileBuffer, totalBytesRead, fileBuffer.Length - totalBytesRead);
    totalBytesRead += bytesRead;
} while (bytesRead != 0);
Stream.Read will return 0 when there's no data left to read.
Doing things this way will perform better than reading a byte at a time. It also gives you a way to ensure that you read the proper number of bytes. If totalBytesRead is not equal to the number of bytes you expected when the loop is finished, then something bad happened.
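If the 1KB header already told you the file size, a variation of the same loop can stop once exactly that many bytes have arrived. expectedSize below is a placeholder for the size parsed from that header:
int totalBytesRead = 0;
while (totalBytesRead < expectedSize)
{
    int bytesRead = streams[0].Read(fileBuffer, totalBytesRead, expectedSize - totalBytesRead);
    if (bytesRead == 0)
        throw new IOException("Connection closed before the full file arrived.");
    totalBytesRead += bytesRead;
}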
Thanks for your input, tvanfosson. I tinkered around with my code and managed to get it working. The synchronicity between my client and server was off. I took your advice, though, and replaced the single read with reading one byte at a time.
