I'm exporting data from SQL Server using BCP. The output file is written to disk, but it needs to be re-encoded. Using C#, I re-encode the file to UTF-8 and save it to disk again. As it stands, the process saves the output, re-encodes it, and then saves it a second time, which seems inefficient.
I'm wondering if there's a way to eliminate saving the data twice. For example, direct the output to memory or intercept the BCP output file in memory before it's written to disk?
Any advice is greatly appreciated.
Load the data in your C# application and write it to disk using any format you like. BCP has no inherent efficiency advantage over a custom app, although you might use more CPU in ADO.NET + C# than the C++-based BCP does; that would need to be tested.
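For example, a minimal sketch (the connection string, query, column types and output path are placeholders, not anything from your setup):

```csharp
using System.Data.SqlClient;
using System.IO;
using System.Text;

class DirectUtf8Export
{
    static void Main()
    {
        // Placeholder connection string and query; adjust to your environment.
        const string connectionString = "Server=.;Database=MyDb;Integrated Security=true";
        const string query = "SELECT Id, Name FROM dbo.MyTable";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(query, connection))
        // Write UTF-8 directly, so there is no intermediate file and no second pass.
        using (var writer = new StreamWriter(@"C:\export\mytable.txt", false, new UTF8Encoding(false)))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Tab-delimited output, roughly mirroring bcp's default character format.
                    writer.WriteLine("{0}\t{1}", reader.GetInt32(0), reader.GetString(1));
                }
            }
        }
    }
}
```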
You should look into Windows named pipes. Similar to the Unix concept but not as easily created.
This blog post provides a tool that sets up a pipe and automatically compresses the "file" data that is written to it. We use this to compress bcp output without writing an uncompressed file to disk.
The code is provided, so you "should" be able to modify it to convert the encoding instead of compressing the data.
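To illustrate the idea only (this is not the tool from the blog post): a NamedPipeServerStream in C# can stand in for the output "file", and the reader on the other end re-encodes as it writes to disk. The pipe name, the assumption that bcp was run with -w (UTF-16 output), and the output path are all mine; whether bcp will open the pipe path directly depends on how the pipe is set up, which is the plumbing the blog post's tool takes care of.

```csharp
using System.IO;
using System.IO.Pipes;
using System.Text;

class PipeReEncoder
{
    static void Main()
    {
        // Hypothetical pipe name; bcp would be pointed at \\.\pipe\bcp_out as its output file.
        using (var pipe = new NamedPipeServerStream("bcp_out", PipeDirection.In))
        {
            pipe.WaitForConnection();

            // Assumes bcp ran with -w (UTF-16 output); change the source encoding to match yours.
            using (var reader = new StreamReader(pipe, Encoding.Unicode))
            using (var writer = new StreamWriter(@"C:\export\data_utf8.txt", false, new UTF8Encoding(false)))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    writer.WriteLine(line); // re-encoded to UTF-8 on the way to disk
                }
            }
        }
    }
}
```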
If I have a relatively small file (<1 MB), what is the better option for a program: first compress the file to disk and then send the .zip file, or just send the file as-is? I'm never sure about network and disk speed, and the file size varies, but not by much. I think compression is clearly better for larger files, but when the file is only a few KB, do I gain anything by compressing it, or do I lose more because of the time needed to write it to and read it from the HDD?
Thanks
The best option is to read the file from disk, compress it, and then send it without re-writing the compressed file to disk. The receiver can then decompress it in memory. This is essentially how a web server serves compressed web pages to compatible browsers.
C# is not a language I'm familiar with, but you are probably looking for something like System.IO.Compression.GZipStream.
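Something along these lines (the file path is a placeholder, and what you do with the returned bytes, e.g. writing them to a network stream, is up to you):

```csharp
using System.IO;
using System.IO.Compression;

static byte[] CompressFileInMemory(string path)
{
    using (var output = new MemoryStream())
    {
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        using (var input = File.OpenRead(path))
        {
            // Compress straight from disk into memory; nothing compressed touches the disk.
            input.CopyTo(gzip);
        }
        // The receiver wraps the bytes in a GZipStream with CompressionMode.Decompress.
        return output.ToArray();
    }
}
```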
I want to extract labels, abstracts, categories and the relevant dates for each article from a DBpedia dump file.
I'm using dotnetrdf and I want to save the extracted data to MS SQL database (I don't want to use triple stores like Virtuoso).
Due to the size of the dump file, I can't load it into memory.
Is there any way to extract the statements without doing that? The only approach I can think of is to split the dump into smaller chunk files; is that the only solution?
Actually, everything in dotNetRDF is designed to support streaming parsing. The most common use case happens to be loading data into our in-memory structures, but even that uses the streaming parser subsystem under the hood.
See the Advanced Parsing section of the Reading RDF documentation, which introduces the Handlers API. This API gives you complete control over what happens to the data as the parser produces it, so you can write a custom handler that receives the data as it streams through and puts it into your database.
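As a rough sketch only (the exact base class and method names may differ between dotNetRDF versions, so check the Handlers documentation), a handler that pushes each triple to the database as it is parsed might look like this:

```csharp
using VDS.RDF;
using VDS.RDF.Parsing;
using VDS.RDF.Parsing.Handlers;

// Receives each triple as the parser produces it; no in-memory graph is built.
public class SqlWriterHandler : BaseRdfHandler
{
    protected override bool HandleTripleInternal(Triple t)
    {
        // Inspect t.Predicate and write labels/abstracts/categories/dates
        // to your MS SQL database here (e.g. via SqlCommand); omitted for brevity.
        // Return true to keep parsing, false to stop early.
        return true;
    }

    public override bool AcceptsAll
    {
        get { return true; }
    }
}

// Usage: DBpedia dumps are N-Triples, so stream the file through the parser.
// var handler = new SqlWriterHandler();
// new NTriplesParser().Load(handler, @"C:\dumps\labels_en.nt");
```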
Say you have a method or property on a third party sealed component that expects a file name. You do not have the source for the third party component. You want that component to do what it's supposed to do (read only) on a file you have, but your file is encrypted on disk and you only want the decrypted version in memory so it cannot be easily copied in its plain form.
Is it possible to create a wrapper or some other approach to trick the component to think it's reading from a file when it's actually reading from a MemoryStream? Or is it totally impossible? Can it be done outside .NET in native Windows code?
Thanks
You can't do that the way that you are proposing, no. My recommendation would be to use the Encrypting File System functionality built into Windows. That way the file is stored in encrypted form on disk, but it is available via the normal IO methods to the application (provided that the account running the application has access to the file).
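For example (Windows/NTFS only; File.Encrypt uses EFS with the credentials of the calling account, and the path is a placeholder):

```csharp
using System.IO;

// Marks the file as EFS-encrypted on an NTFS volume. The OS decrypts it
// transparently for accounts with access, so the third-party component
// can keep opening it by file name without knowing it is encrypted.
File.Encrypt(@"C:\data\secret.dat");
```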
Can it read from "CON" as input (like many text utilities: grep/findstr, more, ...)? In that case you can try to redirect the input/output streams and feed it the results that way.
Is it possible to create a wrapper or some other approach to trick the component to think it's reading from a file when it's actually reading from a MemoryStream?
No, sorry. You will have to decrypt the file into a temporary file and then pass that temporary file to the plugin. Once it finishes its work, delete the temporary file.
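Something like this, where DecryptToBytes and component.Load are stand-ins for your own decryption code and the third-party API:

```csharp
using System.IO;

// Decrypt to a temporary file, hand its name to the component,
// and make sure the plaintext copy is deleted afterwards.
string tempPath = Path.GetTempFileName();
try
{
    File.WriteAllBytes(tempPath, DecryptToBytes(@"C:\data\secret.enc")); // your decryption
    component.Load(tempPath); // the sealed component only ever sees a file name
}
finally
{
    File.Delete(tempPath);
}
```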
The short answer is that if a component expects a filename, i.e. a string, you cannot pass it a memory stream.
However, if the file is encrypted with the Encrypting File System (EFS) or something else native to Windows, the component may be able to read the file without ever knowing it is encrypted, because the OS decrypts it transparently.
These might help:
http://en.wikipedia.org/wiki/Encrypting_File_System
http://en.wikipedia.org/wiki/BitLocker_Drive_Encryption
You could have a look at Dokan. I haven't tried it, but it's a way of creating a virtual file system in .Net.
You can create an in-memory disk drive (either in code or by using a third-party application) and put a file there. Another approach is to implement a virtual file system and handle all file requests yourself. Both approaches are possible, for example, using our Virtual Storage products.
Also, I don't know about .NET in particular, but in Windows you can hook API functions and substitute certain operations (including file operations). There are even components for this, but, again, I don't know whether they expose their functionality to .NET.
I have a C# console application in which I want to read a blob stored in SQL Server and write it to the Windows file system at a specified path.
How do I convert the blob to an image or PDF using C# and write it to the file system?
Read the blob from the database into a byte[] and write this buffer to a file.
Take a look here: http://www.akadia.com/services/dotnet_read_write_blob.html
Using a DataAdapter / DataSet would most likely prove even easier - if you can afford to have the entire BLOB content loaded into memory for the number of rows you're processing.
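A minimal sketch of the byte[] approach (the connection string, table and column names are placeholders; this assumes the blob fits in memory):

```csharp
using System.Data.SqlClient;
using System.IO;

const string connectionString = "Server=.;Database=MyDb;Integrated Security=true";

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "SELECT FileData FROM dbo.Documents WHERE Id = @id", connection))
{
    command.Parameters.AddWithValue("@id", 42);
    connection.Open();

    // Pull the whole blob into a byte[] and write it out in one call.
    var buffer = (byte[])command.ExecuteScalar();
    File.WriteAllBytes(@"C:\output\document.pdf", buffer);
}
```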
What kind of blob are we talking about here? Assuming you are using .NET, check out SqlDataReader or Entity Framework. There are other methods as well, but those two are pretty popular. For converting to PDF you will probably need a third-party tool; check out PDFsharp. Finally, saving to the filesystem is pretty straightforward; look at the .NET System.IO.File class.
I have a large raw data file (up to 1GB) which contains raw samples from a USB data logger.
I need to store extra information relating to the file (sample rate, description, trigger point, last seek position, etc.) and was looking into adding this as some sort of header.
The header should ideally be human readable and flexible, so I have so far ruled out some sort of binary serialization for it.
I also want to avoid two separate files, as they could end up separated when copied or backed up. I remember somebody telling me that the newer Microsoft Office documents (.docx, .xlsx, etc.) are actually a number of files in a zip. Is there a simple way to achieve this? Could I still keep quick seek times into the raw file?
Update
I started using the binary serializer and found it to be a pain, so I ended up using the XML serializer, which I'm more comfortable with.
I reserve some space at the start of the file for the XML. Simple.
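Roughly like this (the 4 KB header size and the FileHeader type are just illustrative, not my exact code):

```csharp
using System.IO;
using System.Xml.Serialization;

public class FileHeader
{
    public int SampleRate;
    public string Description;
    public long TriggerPoint;
    public long LastSeekPosition;
}

static class LoggerFile
{
    public const int HeaderSize = 4096; // fixed region reserved for the XML

    public static void WriteHeader(string path, FileHeader header)
    {
        using (var stream = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        {
            var buffer = new byte[HeaderSize]; // zero-padded to the fixed size
            using (var ms = new MemoryStream(buffer))
            {
                // Throws if the XML outgrows the reserved region, which is the point.
                new XmlSerializer(typeof(FileHeader)).Serialize(ms, header);
            }
            stream.Write(buffer, 0, HeaderSize); // raw samples always start at offset 4096
        }
    }
}
```

The XML stays human readable at the top of the file, and the raw samples always begin at a known offset, so seek times are unchanged.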
When you say you want to make the header human readable, this suggests opening the file in a text editor. Do you really want to do this, considering the file size and (I'm assuming) that the remainder of the file is non-human-readable binary data? If so, just write the text header data to the start of the binary file; it will be visible when the file is opened but, of course, the remainder of the file will look like garbage.
You could create an uncompressed ZIP archive, which may allow you to seek directly to the binary data. See this for information on creating a ZIP archive: http://weblogs.asp.net/jgalloway/archive/2007/10/25/creating-zip-archives-in-net-without-an-external-library-like-sharpziplib.aspx
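Newer frameworks offer another route: from .NET 4.5 onwards, System.IO.Compression.ZipFile can build such an archive directly (this is a different API from the one in the linked article, and the file names below are placeholders). Storing the raw data entry with CompressionLevel.NoCompression keeps its bytes contiguous inside the archive, which is what makes quick seeking plausible.

```csharp
using System.IO.Compression; // .NET 4.5+; reference System.IO.Compression.FileSystem

// Bundle the XML metadata and the raw sample data into one container file.
using (var zip = ZipFile.Open(@"C:\capture\session.logx", ZipArchiveMode.Create))
{
    zip.CreateEntryFromFile(@"C:\capture\header.xml", "header.xml",
                            CompressionLevel.Optimal);
    zip.CreateEntryFromFile(@"C:\capture\samples.raw", "samples.raw",
                            CompressionLevel.NoCompression); // stored, not compressed
}
```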