I am exporting a file via a http get response, using ASP.NET Web API.
For that, I am returning a FileContentResult object, as in:
return File(Encoding.UTF8.GetBytes(fileContents.ToString()), "text/plain; charset=UTF-8");
After several minutes stucked with encoding issues, I use google's Advanced REST Client to perform the get to the web api controller's action, and the file is being download just ok.
Well, not exactly. I originally wanted it to be sent/downloaded as a .csv file.
If I set the http request content-type to "text/csv" and the File() call sets the response's content type to "text/csv" just as well, Advanced REST Client will show the contents properly, but excel will open it as gibberish data.
If I simply change the content-type to "text/plain", save it as a .txt file (have to rename it after saving, don't know why it is being saved as _.text-plain, while as a csv it is being saved with .csv extension), and finally perform an import in Excel like described here Excel Import Text Wizard, then then excel opens the file correctly.
Why is the .csv being opened as gibberish, while as a .txt it is not ? For opening a .csv, there is no import wizard like with a .txt file (not that I am aware of).
Providing a bit of the source below:
StringBuilder fileContents = new StringBuilder();
//csv header
fileContents.AppendLine(String.Join(CultureInfo.CurrentCulture.TextInfo.ListSeparator, fileData.Select(fileRecord => fileRecord.Name)));
//csv records
foreach (ExportFileField fileField in fileData)
fileContents.AppendLine(fileField.Value);
return File(Encoding.UTF8.GetBytes(fileContents.ToString()), "text/plain; charset=UTF-8");
As requested, the binary contents of both files.
The text-plain (.txt) version (the one that will open in excel, using import):
and the .csv one (the one that excel will open with junk data):
The (files are the same, the cropping of the screen shots was not the same...)
I was able to reproduce the issue by saving a file containing Greek characters with BOM. Double clicking attempts to import the file using the system's locale (Greek). When manually importing, Excel detects the codepage and offers to use the 65001 (UTF8) codepage.
This behavior is strange but not a bug. Text files contain no indication that would help detect their codepage, nor is it possible to guess. An ASCII file containing only A-Z characters saved as 1252 is identical to one saved using 1253. That's why Windows uses the system codepage, which is the local used for all non-Unicode programs and files.
When you double click on a text file, Excel can't ask you for the correct encoding - this could get tedious very quickly. Instead, it opens the file using your regional settings and the system codepage. ASCII files created on your machine are saved using your system's codepage so this behaviour is logical. Files given to you by non-programmers will probably be saved using your country's codepage as well. Programmers typically switch everything to US English and that's how problems start. Your REST client may have saved the text as ASCII using the Latin encoding used by most programmers.
When you import the text file to an empty sheet though, Excel can ask you what to do. It tries to detect the codepage by checking for a BOM or a codepage that may be matching the file's contents and presents the guess in the import dialog box, together with a preview. The decimal and column separators are still those provided by your regional settings (can't guess those). UTF8 is generally easy to guess - the file starts with a BOM or contains NUL entries.
ASCII codepages are harder though. Saving my Greek file as ASCII results in a Japanese guess. That's English humour for you I guess.
To my surprise, trying to perform the request via a browser instead of using google's Advanced REST Client, clicking on the the file that is downloaded just works! Excel opens it correctly. So the problem must be with ARC.
In any case, since the process is not going to be done using an http client other than a browser... my problem is gone. Again, in ARC's output screen the file is displayed correctly. I do not know why upon clicking it to be opened in Excel it "gets corrupted".
Strange.
The binary contents of the file show a correctly utf-8 encoded CSV file with hebrew characters. If,a s you state in the comments, Excel does not allow you to change it's guessed file encoding when opening a CSV file, that is rather a misbehavior in Excel itself (call it a bug if you want).
Your options are: use LibreOffice (http://www.libreoffice.org/) which spreadsheet component does allow you to customize the settings for opening a CSV file.
Another one is to write a small program to explicitely convert your file to the encoding excel is expecting - if you have a Python3 interpreter installed, you could for example type:
python -c "open('correct.csv', 'wt', encoding='cp1255').write(open('utf8.csv', encoding='utf8').read())"
However, if your default Windows encoding is not cp1255 for handling Hebrew, as I suppose above, that won't help excel, but to give you different gibberish :-) In that case, you should resort to use programs that can correctly deal with different encodings.
(NB. there is a Python call to return the default system encoding in Windows, but I forgot which it is, and it is not easily googleable)
I need a help in developing a Windows Appl using C#.NET VS2010. The functionality is very simple, the user will input a text file and my program is supposed to extract the relevant data from the text file and output it to either csv or text or whatever.
My biggest problem whenever I deal with text files is the format. Even though if you open the input text file in a Notepad or Wordpad it looks perfect, the layout etc. But once we start programming it I realize that what I am seeing is not the way the data is stored inside the file. I read many articles on Unicode/UTF etc.. etc.. but I dont have a definite solution to know exactly what my file format is. So the end result is that I end up getting many exceptions.
In Unix Shell Scripting it used to be simple. There is some good Unix command like less which is similar to more but it also display any formatting characters inside the file. Also there are some useful commands like unix2dos and dos2unix.
Nevertheless, is there some program/code or professional method which can find the exact file formatting of my input file and then reformat it to "plain text" so that the data extraction becomes easy and bug-free.
Thanks
I have already seen solution for hiding text files or messages within Image or audio files..
but i want solution for hiding text file within another text files (.txt, .doc, .pdf).
can somebody help for this??
Steganography is based on slightly changing data to "hide" some other set of data within these changes. That's why an image with steganography is slightly different than the original. You can't notice if if you don't know it's there, but the fact is you saved the data as changes within color information of pixels.
.txt file is nothing else than a big hunk of characters. If you tried to somehow change the data to hide something in it, it would result in unreadable text. If you change the color of a pixel from 215 Red to 217 Red, you won't really notice. But changing A to F or Ł is quite noticable.
So no, I don't believe it can be done. At least not with .txt files.
While I agree with #stonehead that at the end of the day if you put something in the file someone can find it, but there are a few tricks out their that may prove to be viable options.
Since most users are not living in their command prompt the most straight forward approach is to misrepresent the file to the GUI. This is a pretty handy trick for this.
http://www.howtogeek.com/howto/windows-vista/stupid-geek-tricks-hide-data-in-a-secret-text-file-compartment/
If you are storing data in a pdf you should have very little problems. I would use PdfClown. Not to get too into it but you will want to read up about the structure of a pdf. With clown pdf you could store an asset inside with no connection to the presentation layer. Given the complexity of pdf files i will almost bet no one will be looking in, i would base64encode the chunk to have it blend in with images and other data it would be difficult for someone to find it by just opening up the file.
Be For Warned ClownPDF C# library is not for the faint of heart and it will help to have some java experience because a lot of their docs are for java.
Hope these options help.
I am currently working on a project to hide data inside an mp3 file...What I did was, I replaced the last BYTE of every mp3 frame with the bytes from message file(the file to be hidden)... It works fine... I could hide the file in it and also successfully extract it... But some noise are present in the resulting mp3 file, due to addition of external data which is, definitely, not desired... Please help me in where to store the data in mp3 so as to reduce the noise...
PS: There is already a tool to use mp3 for hiding data -Mp3Stego. But it takes uncompressed wav file as input. But I need to have mp3 as input.
Such kind of tools do not replace whole byte. They are replace only last BIT. Try to replace only a BIT instead of BYTE. This will reduce the noise, but also reduce size of an information which you can put in file.
I have a large raw data file (up to 1GB) which contains raw samples from a USB data logger.
I need to store extra information relating to the file (sample rate, description, trigger point, last seek position etc) and was looking into adding this as a some sort of header.
The header file should ideally be human readable and flexible so I've so far ruled out some sort of binary serialization into a header.
I also want to avoid two separate files as they could end up separated when copied or backed up. I remembered somebody telling me that newer *.*x Microsoft Office documents are actually a number of files in a zip. Is there a simple way to achieve this? Could I still keep the quick seek times to the raw file?
Update
I started using the binary serializer and found it to be a pain. I ended up using the xml serializer as I'm more comfortable using it.
I reserve some space at the start of the files for the xml. Simple
When you say you want to make the header human readable, this suggests opening the file in a text editor. Do you really want to do this considering the file size and (I'm assuming), the remainder of the file being non-human readable binary data? If it is, just write the text header data to the start of the binary file - it will be visible when the file is opened but, of course, the remainder of the file will look like garbage.
You could create an uncompressed ZIP archive, which may allow you to seek directly to the binary data. See this for information on creating a ZIP archive: http://weblogs.asp.net/jgalloway/archive/2007/10/25/creating-zip-archives-in-net-without-an-external-library-like-sharpziplib.aspx