Best way to read a short array from disk in C#?

I have to write and read 4GB short[] arrays to and from disk. I have found a function to write the arrays, but I am struggling to write the code to read the array back from disk. I normally code in other languages, so please forgive me if my attempt so far is a bit pathetic:
using UnityEngine;
using System.Collections;
using System.IO;

public class RWShort : MonoBehaviour
{
    public static void WriteShortArray(short[] values, string path)
    {
        using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        {
            using (BinaryWriter bw = new BinaryWriter(fs))
            {
                foreach (short value in values)
                {
                    bw.Write(value);
                }
            }
        }
    } //Above is fine, here is where I am confused:

    public static short[] ReadShortArray(string path)
    {
        byte[] thisByteArray = File.ReadAllBytes(fileName);
        short[] thisShortArray = new short[thisByteArray.length / 2];
        for (int i = 0; i < 10; i += 2)
        {
            thisShortArray[i] = ? convert from byte array;
        }
        return thisShortArray;
    }
}

Shorts are two bytes, so you have to read two bytes at a time. I'd also recommend using yield return like this so that you aren't trying to pull everything into memory in one go. Though if you need all of the shorts together that won't help you; it depends on what you're doing with them.
void Main()
{
    short[] values = new short[] {
        1, 999, 200, short.MinValue, short.MaxValue
    };

    WriteShortArray(values, @"C:\temp\shorts.txt");

    foreach (var shortInfile in ReadShortArray(@"C:\temp\shorts.txt"))
    {
        Console.WriteLine(shortInfile);
    }
}
public static void WriteShortArray(short[] values, string path)
{
    using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
    {
        using (BinaryWriter bw = new BinaryWriter(fs))
        {
            foreach (short value in values)
            {
                bw.Write(value);
            }
        }
    }
}
public static IEnumerable<short> ReadShortArray(string path)
{
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    using (BinaryReader br = new BinaryReader(fs))
    {
        byte[] buffer = new byte[2];
        while (br.Read(buffer, 0, 2) > 0)
            yield return (short)(buffer[0] | (buffer[1] << 8));
    }
}
You could also define it this way, taking advantage of the BinaryReader:
public static IEnumerable<short> ReadShortArray(string path)
{
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    using (BinaryReader br = new BinaryReader(fs))
    {
        while (br.BaseStream.Position < br.BaseStream.Length)
            yield return br.ReadInt16();
    }
}

Memory-mapping the file is your friend. There's a MemoryMappedViewAccessor.ReadInt16 function that lets you read the data directly, as short, out of the OS disk cache, a Write() overload that accepts an Int16, and ReadArray and WriteArray functions for when you're calling code that needs a traditional .NET array.
Overview of using Memory-mapped files in .NET on MSDN
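A minimal sketch of what that could look like (the class and method names are mine, and it assumes the whole file fits in a single view and within normal .NET array limits):

using System.IO;
using System.IO.MemoryMappedFiles;

public static class ShortArrayMmf
{
    // Writes a short[] to disk through a memory-mapped view accessor.
    public static void WriteShorts(short[] values, string path)
    {
        long byteLength = (long)values.Length * sizeof(short);
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Create, null, byteLength))
        using (var accessor = mmf.CreateViewAccessor(0, byteLength))
        {
            // WriteArray copies the managed array into the mapped view in one call.
            accessor.WriteArray(0, values, 0, values.Length);
        }
    }

    // Reads the whole file back into a short[] with ReadArray.
    public static short[] ReadShorts(string path)
    {
        long byteLength = new FileInfo(path).Length;
        var values = new short[byteLength / sizeof(short)];
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var accessor = mmf.CreateViewAccessor(0, byteLength))
        {
            accessor.ReadArray(0, values, 0, values.Length);
            return values;
        }
    }
}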
If you want to do it with ordinary file I/O, use a block size of 1 or 2 megabytes and the Buffer.BlockCopy function to move data en masse between byte[] and short[], and use the FileStream functions that accept a byte[]. Forget about BinaryWriter or BinaryReader, forget about doing 2 bytes at a time.
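A rough sketch of that block-copy approach (the names and the 1 MB block size are my own; note that Buffer.BlockCopy takes int byte offsets, so arrays over 2 GB would need extra slicing and gcAllowVeryLargeObjects handling that isn't shown here):

using System;
using System.IO;

public static class ShortArrayBlockIo
{
    private const int BlockBytes = 1024 * 1024; // assumed 1 MB block size

    // Writes a short[] in large blocks, converting via Buffer.BlockCopy.
    public static void WriteShorts(short[] values, string path)
    {
        byte[] buffer = new byte[BlockBytes];
        long totalBytes = (long)values.Length * sizeof(short);
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            long offset = 0;
            while (offset < totalBytes)
            {
                int chunk = (int)Math.Min(BlockBytes, totalBytes - offset);
                // Offsets and counts for Buffer.BlockCopy are in bytes.
                Buffer.BlockCopy(values, (int)offset, buffer, 0, chunk);
                fs.Write(buffer, 0, chunk);
                offset += chunk;
            }
        }
    }

    // Reads the file back into a short[] in large blocks.
    public static short[] ReadShorts(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            var values = new short[fs.Length / sizeof(short)];
            byte[] buffer = new byte[BlockBytes];
            int offset = 0; // byte offset into the short[]; caps this sketch at 2 GB
            int bytesRead;
            while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                Buffer.BlockCopy(buffer, 0, values, offset, bytesRead);
                offset += bytesRead;
            }
            return values;
        }
    }
}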
It's also possible to do the I/O directly into a .NET array with the help of p/invoke; see my answer using ReadFile and passing the FileStream object's SafeFileHandle property here. But even though this has no extra copies, it still shouldn't keep up with the memory-mapped ReadArray and WriteArray calls.

Related

Use FileShare.ReadWrite with HttpPostedFile in ASP.NET MVC

I am using the code below to save a posted file to the server, but that file is being read continually, and I need to use FileShare.ReadWrite so I don't get a file-locked error.
httpRequest.Files[0].SaveAs(filePath);
Below is my reading method. What is the right way to save the HttpPostedFile with the best performance?
using (var fileStream = new FileStream(
    fileLocation,
    FileMode.Open,
    FileAccess.Read,
    FileShare.ReadWrite))
{
    using (var streamReader = new StreamReader(fileStream))
    {
        xDocument = XDocument.Parse(streamReader.ReadToEnd());
    }
}
Is this my best option?
using (var memoryStream = new MemoryStream())
{
    httpRequest.Files[0].InputStream.CopyTo(memoryStream);
    var bytes = memoryStream.ToArray();

    using (var fs = File.Open(filePath, FileMode.OpenOrCreate, FileAccess.Write, FileShare.ReadWrite))
    {
        fs.Write(bytes, 0, bytes.Length);
    }
}
Problem:
You want a "Write Once, Read Many" lock.
Assumptions:
File is small (average write operation is 5000 ms)
No other write or read operations (only one program with 2 functions)
You read the file a lot more than you write to it
Solution
using System;
using System.IO;
using System.Threading;
using System.Web.Mvc;

namespace stackoverflow_56307594.Controllers
{
    public class HomeController : Controller
    {
        public ActionResult A()
        {
            readFile();
            return View();
        }

        public ActionResult B()
        {
            writeFile();
            return View();
        }

        private static object writeLock = new Object();

        private void readFile()
        {
            while (!Monitor.TryEnter(writeLock, 5000)) ; // wait up to 5000 ms for the writeLock (serializing access)
            try
            {
                using (var stream = new FileStream("filePath", FileMode.Open, FileAccess.Read, FileShare.Read))
                using (var reader = new StreamReader(stream))
                {
                    // active read
                    // xDocument = XDocument.Parse(reader.ReadToEnd());
                }
            }
            finally
            {
                Monitor.Exit(writeLock); // release the lock so writers are not blocked forever
            }
        }

        private void writeFile()
        {
            lock (writeLock)
            {
                FileStream stream = null;
                while (stream == null) // wait for the active read
                {
                    try
                    {
                        stream = new FileStream("filePath", FileMode.Open, FileAccess.ReadWrite, FileShare.None);
                    }
                    catch (IOException)
                    {
                        // will fail while a read is still active because of FileShare.None; the while (stream == null) loop retries
                    }
                }
                using (stream)
                {
                    Request.Files[0].InputStream.CopyTo(stream);
                }
            } // unlock
        }
    }
}
Note:
I did not load-test this or even try the solution on a web server.
I only tested it on paper 😁
Refs:
locking - How long will a C# lock wait, and what if the code crashes during the lock? - Stack Overflow
c# - Deleting files in use - Stack Overflow
multithreading - Is there a way to detect if an object is locked? - Stack Overflow
Implementing Singleton in C# | Microsoft Docs
c# - Using the same lock for multiple methods - Stack Overflow
c# - Write-Once, Read-Many Lock - Stack Overflow
c# lock write once read many - Google Search
FileShare Enum (System.IO) | Microsoft Docs

How to Compress Large Files C#

I am using this method to compress files and it works great until I get to a file that is 2.4 GB; then it gives me an overflow error:
void CompressThis(string inFile, string compressedFileName)
{
    FileStream sourceFile = File.OpenRead(inFile);
    FileStream destinationFile = File.Create(compressedFileName);

    byte[] buffer = new byte[sourceFile.Length];
    sourceFile.Read(buffer, 0, buffer.Length);

    using (GZipStream output = new GZipStream(destinationFile,
        CompressionMode.Compress))
    {
        output.Write(buffer, 0, buffer.Length);
    }

    // Close the files.
    sourceFile.Close();
    destinationFile.Close();
}
What can I do to compress huge files?
You should not write the whole file into memory. Use Stream.CopyTo instead. This method reads the bytes from the current stream and writes them to another stream using a specified buffer size (81,920 bytes by default).
Also, you don't need to close Stream objects if you use the using keyword.
void CompressThis(string inFile, string compressedFileName)
{
    using (FileStream sourceFile = File.OpenRead(inFile))
    using (FileStream destinationFile = File.Create(compressedFileName))
    using (GZipStream output = new GZipStream(destinationFile, CompressionMode.Compress))
    {
        sourceFile.CopyTo(output);
    }
}
You can find a more complete example on Microsoft Docs (formerly MSDN).
You're trying to allocate all of this into memory. That just isn't necessary, you can feed the input stream directly into the output stream.
Alternative solution for the zip format, without loading the whole file into memory:
using (var sourceFileStream = new FileStream(this.GetFilePath(sourceFileName), FileMode.Open))
{
    using (var destinationStream =
        new FileStream(this.GetFilePath(zipFileName), FileMode.Create, FileAccess.ReadWrite))
    {
        using (var archive = new ZipArchive(destinationStream, ZipArchiveMode.Create, true))
        {
            var file = archive.CreateEntry(sourceFileName, CompressionLevel.Optimal);

            using (var entryStream = file.Open())
            {
                // CopyToAsync streams the source into the zip entry without buffering it all in memory.
                await sourceFileStream.CopyToAsync(entryStream);
            }
        }
    }
}
This solution writes directly from the input stream to the output stream.

How can I use ReadAllLines with a gzipped file

Is there a way to use the one-liner ReadAllLines on a gzipped file?
var pnDates = File.ReadAllLines(@"C:\myfile.gz");
Can I put GZipStream wrapper around the file some how?
No, File.ReadAllLines() treats the specified file as a text file, and a gzipped file isn't one. It's trivial to do it yourself:
public IEnumerable<string> ReadAllZippedLines(string filename)
{
    using (var fileStream = File.OpenRead(filename))
    {
        using (var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress))
        {
            using (var reader = new StreamReader(gzipStream))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    yield return line;
                }
            }
        }
    }
}
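Usage would then look something like this (the path is just a placeholder), streaming over the lines without loading the whole file:

foreach (var line in ReadAllZippedLines(@"C:\myfile.gz"))
{
    Console.WriteLine(line);
}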
There is no such thing built-in. You'll have to write yourself a small utility function.
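A minimal sketch of such a utility, assuming UTF-8 text and mirroring the string[] return type of File.ReadAllLines (the method name is mine):

using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static string[] ReadAllGZippedLines(string path)
{
    var lines = new List<string>();
    using (var fileStream = File.OpenRead(path))
    using (var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress))
    using (var reader = new StreamReader(gzipStream))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line); // collect every decompressed line
        }
    }
    return lines.ToArray();
}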
You'd have to inflate the file first, as the gzip algorithm deals with byte data, not text, and incorporates a CRC. This should work for you:
EDIT - I can't comment for some reason, so this is for the bytesToCompress question.
byte[] decompressedBytes;
using (FileStream fileToDecompress = File.Open(@"C:\myfile.gz", FileMode.Open))
using (GZipStream decompressionStream = new GZipStream(fileToDecompress, CompressionMode.Decompress))
using (MemoryStream decompressed = new MemoryStream())
{
    // Copy the whole decompressed stream instead of a single fixed-size read.
    decompressionStream.CopyTo(decompressed);
    decompressedBytes = decompressed.ToArray();
}
var pnDates = System.Text.Encoding.UTF8.GetString(decompressedBytes);

C# Decompress .GZip to file

I have this Code
using System.IO;
using System.IO.Compression;
...
UnGzip2File("input.gz","output.xls");
That call runs the procedure below. It runs without error, but afterwards input.gz is empty and the created output.xls is also empty. At the start, input.gz was 12 MB. What am I doing wrong? Or do you have a better/working solution?
public static void UnGzip2File(string inputPath, string outputPath)
{
    FileStream inputFileStream = new FileStream(inputPath, FileMode.Create);
    FileStream outputFileStream = new FileStream(outputPath, FileMode.Create);

    using (GZipStream gzipStream = new GZipStream(inputFileStream, CompressionMode.Decompress))
    {
        byte[] bytes = new byte[4096];
        int n;
        // To be sure the whole file is correctly read,
        // you should call FileStream.Read method in a loop,
        // even if in the most cases the whole file is read in a single call of FileStream.Read method.
        while ((n = gzipStream.Read(bytes, 0, bytes.Length)) != 0)
        {
            outputFileStream.Write(bytes, 0, n);
        }
    }

    outputFileStream.Dispose();
    inputFileStream.Dispose();
}
Opening the FileStream with FileMode.Create will overwrite the existing file, as documented here. This causes the input file to be empty when you try to decompress it, which in turn leads to an empty output file.
Below is a working code sample. Note that it is async; this can be changed by leaving out async/await, calling the regular CopyTo method, and changing the return type to void.
public static async Task DecompressGZip(string inputPath, string outputPath)
{
    using (var input = File.OpenRead(inputPath))
    using (var output = File.OpenWrite(outputPath))
    using (var gz = new GZipStream(input, CompressionMode.Decompress))
    {
        await gz.CopyToAsync(output);
    }
}
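From the asker's snippet, the call would then look something like this (assuming it is awaited from an async method):

await DecompressGZip("input.gz", "output.xls");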

Trying to improve multi-page TIFF file splitting

I am trying to improve the speed at which I am able to split a multi-page TIFF file into its individual pages, stored as a list of byte arrays. I have this TiffSplitter class that I'm working on, to try to improve the speed of the Paginate method.
I have heard of LibTiff.net, and wonder if it would be any faster than this process? Currently, it takes about 1333 ms to call the Paginate method on a 7-page multipage TIFF file.
Does anyone know what would be the most efficient way to retrieve the individual pages of a multipage TIFF as byte arrays? Or possibly have any suggestions as to how I can improve the speed of the process I'm currently using?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

namespace TiffSplitter
{
    public class TiffPaginator
    {
        private List<byte[]> paginatedData;

        public List<byte[]> Pages
        {
            get
            {
                return paginatedData;
            }
        }

        public TiffPaginator()
        {
            paginatedData = new List<byte[]>();
        }

        public void Paginate(string Filename)
        {
            using (Image img = Image.FromFile(Filename))
            {
                paginatedData.Clear();
                int frameCount = img.GetFrameCount(FrameDimension.Page);
                for (int i = 0; i < frameCount; i++)
                {
                    img.SelectActiveFrame(new FrameDimension(img.FrameDimensionsList[0]), i);
                    using (MemoryStream memstr = new MemoryStream())
                    {
                        img.Save(memstr, ImageFormat.Tiff);
                        paginatedData.Add(memstr.ToArray());
                    }
                }
            }
        }
    }
}
I tried using LibTiff.net, and for me it was quite slow; the time to split a single 2-page TIFF was measured in seconds.
In the end, I decided to reference PresentationCore and go with this:
(It splits the image into multiple files, but it should be simple to switch the output to byte arrays; a sketch of that variant follows the code below.)
Stream imageStreamSource = new FileStream("filename", FileMode.Open, FileAccess.Read, FileShare.Read);
TiffBitmapDecoder decoder = new TiffBitmapDecoder(imageStreamSource, BitmapCreateOptions.PreservePixelFormat, BitmapCacheOption.Default);
int pagecount = decoder.Frames.Count;

if (pagecount > 1)
{
    string fNameBase = Path.GetFileNameWithoutExtension("filename");
    string filePath = Path.GetDirectoryName("filename");

    for (int i = 0; i < pagecount; i++)
    {
        string outputName = string.Format(@"{0}\SplitImages\{1}-{2}.tif", filePath, fNameBase, i.ToString());
        FileStream stream = new FileStream(outputName, FileMode.Create, FileAccess.Write);
        TiffBitmapEncoder encoder = new TiffBitmapEncoder();
        encoder.Frames.Add(decoder.Frames[i]);
        encoder.Save(stream);
        stream.Dispose();
    }

    imageStreamSource.Dispose();
}
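A minimal sketch of the byte-array variant mentioned above, encoding each frame into a MemoryStream instead of a file (the method name is mine):

using System.Collections.Generic;
using System.IO;
using System.Windows.Media.Imaging;

public static List<byte[]> SplitTiffToByteArrays(string filename)
{
    var pages = new List<byte[]>();
    using (var imageStreamSource = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        var decoder = new TiffBitmapDecoder(imageStreamSource,
            BitmapCreateOptions.PreservePixelFormat, BitmapCacheOption.Default);
        for (int i = 0; i < decoder.Frames.Count; i++)
        {
            var encoder = new TiffBitmapEncoder();
            encoder.Frames.Add(decoder.Frames[i]);
            using (var memoryStream = new MemoryStream())
            {
                // Encode the single frame and capture its bytes.
                encoder.Save(memoryStream);
                pages.Add(memoryStream.ToArray());
            }
        }
    }
    return pages;
}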
