Convert .sav(spss) file to .csv file using c#


Can I convert an SPSS (.sav) file to a .csv file using C#? I want to browse to a .sav file and generate a .csv file from it. Can anyone provide a link or code for the conversion?
I have about 100 SPSS files, so I need to create a console app that takes each .sav file from the parent folder and generates the corresponding .csv file.

There are several possibilities:
1) Use a library
There seems to be a library to read SPSS files.
You can install the NuGet package SpssLib and create a SpssReader object from a FileStream
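using System.IO;
using SpssLib.DataReader; // namespace of SpssReader in the SpssLib NuGet package

using (var fileStream = File.OpenRead("spss-file.sav"))
{
    var spssReader = new SpssReader(fileStream); // reads the file header
    // read variable names from spssReader.Variables and write them as the CSV header line
    // read records from spssReader.Records and write one CSV line per record
}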
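The comments above leave the actual CSV writing open. One detail that is easy to get wrong is quoting: SPSS string values can contain commas, quotes or line breaks. Below is a minimal sketch of RFC 4180-style escaping; the class and method names (Csv, Escape, Line) are hypothetical helpers, not part of SpssLib. Numbers are formatted with the invariant culture so a German or French machine still writes "1.5" rather than "1,5".

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper for writing CSV lines (not part of SpssLib).
public static class Csv
{
    private static readonly char[] SpecialChars = { ',', '"', '\r', '\n' };

    // Quote a field if it contains a comma, quote or line break; double any embedded quotes.
    public static string Escape(object value)
    {
        string text = value switch
        {
            null => "",
            IFormattable f => f.ToString(null, CultureInfo.InvariantCulture), // 1.5 -> "1.5" everywhere
            _ => value.ToString() ?? ""
        };
        bool needsQuotes = text.IndexOfAny(SpecialChars) >= 0;
        return needsQuotes ? "\"" + text.Replace("\"", "\"\"") + "\"" : text;
    }

    // Join already-escaped fields into one CSV line (without the line terminator).
    public static string Line(IEnumerable<object> values) => string.Join(",", values.Select(Escape));
}
```

With SpssLib you could then write the header as Csv.Line(spssReader.Variables.Select(v => v.Name)) and each row as Csv.Line(spssReader.Variables.Select(v => record.GetValue(v))) for every record in spssReader.Records. This assumes SpssLib's Variable.Name and Record.GetValue(Variable) members, so check them against the package version you install.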
2) Hand-code the solution
If you can't use the library for whatever reason, you may hand-code a solution.
2.1) Have a look at PSPP
If (and only if) you plan to release your code under the GPL (or are at least fine with doing so), you can have a look at the PSPP source code. If you can't GPL your code, don't. Do. It. Do a clean-room implementation instead, since otherwise you'll always be on slippery ground.
2.2) Have a look at the spec
The SAV file format is documented online. Working through the specification takes some time, but you should eventually be able to write a converter to CSV from it.
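As a starting point for working with the spec, the sketch below reads only the fixed file header record. It assumes the layout described in the PSPP documentation of the system file format (a 4-byte "$FL2" or "$FL3" magic, a 60-byte product name, then int32 layout code, nominal case size, compression, weight index and case count) and a little-endian file. SavHeader and SavHeaderReader are hypothetical names, and the variable records, value labels and (possibly compressed) data that follow the header are not handled.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical type holding a few fields of the SAV file header record.
public sealed record SavHeader(string Magic, string Product, int LayoutCode, int Compression, int CaseCount);

public static class SavHeaderReader
{
    public static SavHeader Read(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
        string magic = Encoding.ASCII.GetString(reader.ReadBytes(4));
        if (magic != "$FL2" && magic != "$FL3") // "$FL3" marks a ZLIB-compressed (.zsav) file
            throw new InvalidDataException($"Not an SPSS system file (magic '{magic}')");

        string product = Encoding.ASCII.GetString(reader.ReadBytes(60)).TrimEnd();
        int layoutCode = reader.ReadInt32();   // BinaryReader always reads little-endian
        if (layoutCode != 2 && layoutCode != 3)
            throw new NotSupportedException("Big-endian file or unknown layout; byte swapping is not implemented");

        reader.ReadInt32();                    // nominal_case_size (not needed here)
        int compression = reader.ReadInt32();  // 0 = none, 1 = bytecode, 2 = ZLIB
        reader.ReadInt32();                    // weight_index (not needed here)
        int caseCount = reader.ReadInt32();    // -1 if the writer did not know the count
        return new SavHeader(magic, product, layoutCode, compression, caseCount);
    }
}
```

Checking the layout code first is what tells you whether the file was written on a machine with a different byte order; a full reader would swap bytes in that case instead of throwing.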
3) Use PSPP
If you have no problem shipping GPL-licensed software (or can download it on demand somehow), you can use PSPP's console application. PSPP is a GNU replacement for SPSS that aims to provide (but does not yet fully provide) the functionality of SPSS. Conveniently, it comes with a handy little CLI tool, pspp-convert (see the documentation). You can invoke it from the command line with
pspp-convert input.sav output.csv
With the Process class you can start another process (in this case, another program). If pspp-convert is located in the current directory or in any directory on the PATH, converting a file to CSV is as easy as
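using System.Diagnostics;
using System.IO;

public static void ConvertSpssFile(string inputFile)
{
    var outputFile = Path.ChangeExtension(inputFile, "csv");
    // Quote both paths so file names containing spaces are passed as single arguments
    using var process = Process.Start("pspp-convert", $"\"{inputFile}\" \"{outputFile}\"");
    process.WaitForExit(); // wait, so a loop over many files doesn't start them all at once
}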
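For the roughly 100 files mentioned in the question, a console app only needs to find every .sav file in the folder and run the conversion once per file, waiting for each to finish. Here is a sketch, assuming .NET Core 2.1 or later (for ProcessStartInfo.ArgumentList) and pspp-convert on the PATH; SavBatch, PlanConversions and RunPsppConvert are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

public static class SavBatch
{
    // Pair every .sav file directly inside `folder` with a .csv path next to it.
    // Filtering on the extension (instead of a "*.sav" pattern) behaves the same on every OS.
    public static List<(string Input, string Output)> PlanConversions(string folder) =>
        Directory.EnumerateFiles(folder)
            .Where(f => string.Equals(Path.GetExtension(f), ".sav", StringComparison.OrdinalIgnoreCase))
            .OrderBy(f => f, StringComparer.OrdinalIgnoreCase)
            .Select(f => (Input: f, Output: Path.ChangeExtension(f, ".csv")))
            .ToList();

    // Run pspp-convert for one file and wait for it; throw if it reports failure.
    public static void RunPsppConvert(string input, string output)
    {
        var startInfo = new ProcessStartInfo("pspp-convert") { UseShellExecute = false };
        startInfo.ArgumentList.Add(input);   // each argument is quoted for us, so spaces are fine
        startInfo.ArgumentList.Add(output);

        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException("pspp-convert could not be started");
        process.WaitForExit();
        if (process.ExitCode != 0)
            throw new InvalidOperationException($"pspp-convert failed for '{input}' (exit code {process.ExitCode})");
    }
}
```

A console Main could then just loop: foreach (var (input, output) in SavBatch.PlanConversions(args[0])) SavBatch.RunPsppConvert(input, output); Running the conversions one at a time keeps the machine responsive and makes a failing file easy to identify.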

Related

c# uwp read csv from web

I am trying to make an app that uses open data.
The data I want to read is in CSV format (and is about 40 MB).
I have two problems I can't solve.
First, I am having difficulty reading the file from the web.
I have already read on MSDN how to read files asynchronously, but it's all about local files. I want to build a list of objects; each line (except the first) contains all the properties for one object.
Second, once I have managed to read the file, is there a way to save its data and read it the next time? 40 MB is a lot to re-download every time the app opens, and it takes a long time.
I was also wondering whether, when I read the file from the web again, it is possible to read and add only the new lines.
I am a newbie in UWP (C#) applications, so my apologies for the questions.
Thanks in advance.

There are two APIs you can use to download a file. One is HttpClient, described in the MSDN documentation and in a UWP sample. This class is usually recommended for smaller files and data, but it can easily handle larger files as well. Its disadvantage is that when the user closes the app, the download stops.
The alternative is BackgroundDownloader, again documented on MSDN and in the UWP samples. This class is usually recommended for downloading larger files and data, as it automatically performs the download in the background, so the download continues even when the app is closed.
To store your files, you can use ApplicationData.Current.LocalFolder. This is a special folder provided by the system for storing application files. You have read/write access to this folder, and you can not only store your files there but also create a subfolder structure using the UWP StorageFile and StorageFolder APIs. More about this is on MSDN.
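To avoid re-downloading the 40 MB file on every start, one option is to download it once with HttpClient and keep the copy. The sketch below uses System.Net.Http.HttpClient and takes a plain file path so it also runs outside UWP; in a UWP app you would point cachePath into ApplicationData.Current.LocalFolder.Path. CsvCache, EnsureDownloadedAsync and ReadRows are hypothetical names, and ReadRows splits on commas naively, so it assumes fields never contain quoted commas.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class CsvCache
{
    // Download `url` to `cachePath` only if no cached copy exists yet.
    public static async Task<string> EnsureDownloadedAsync(HttpClient client, Uri url, string cachePath)
    {
        if (File.Exists(cachePath))
            return cachePath; // cache hit: no network traffic at all

        // Stream the body straight to a temporary file so the 40 MB never sits in memory,
        // then move it into place so a half-finished download is never mistaken for the cache.
        string tempPath = cachePath + ".part";
        using (var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            using var body = await response.Content.ReadAsStreamAsync();
            using var file = File.Create(tempPath);
            await body.CopyToAsync(file);
        }
        File.Move(tempPath, cachePath);
        return cachePath;
    }

    // Yield one string[] per data line, skipping the header line and blank lines.
    public static IEnumerable<string[]> ReadRows(string csvPath)
    {
        bool isHeader = true;
        foreach (var line in File.ReadLines(csvPath)) // reads lazily, line by line
        {
            if (isHeader) { isHeader = false; continue; }
            if (line.Length == 0) continue;
            yield return line.Split(','); // naive: no support for quoted fields
        }
    }
}
```

Each string[] from ReadRows can then be mapped to one of your objects. Deleting the cached file forces a fresh download; a real app would also want some expiry rule, which this sketch leaves out.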

Create a "directory" in memory?

I'm working in C# and looking for a way to create a path to a directory that maps to an IO.Stream instead of to the actual file system.
I want to be able to "save" files to that path, manipulate the content or file names, and then save them from that path to a regular file in the file system.
I know I can use a temporary file, but I would rather use memory for both security and performance.
According to this answer, this kind of thing exists in Java, using the FileSystemProvider class. I'm looking for a way to do it in C#.
I've tried every search I could think of and only came up with the Java answer and suggestions to use temporary files.
Is it even possible using .NET?
Basically, I'm looking for a way to save files directly to memory as if they were saved to the file system.
So, for instance, if I had a third-party class that exposes a save method (save(string fullPath)), or something like SmtpServer.Send(MyMsg) in this question, I could choose that path and save into a memory stream instead of onto the drive. (The main thing here is that I want to provide a path that leads directly to a memory stream.)

.NET doesn't have an abstraction layer over the host OS's file system. So, apart from building your own abstraction for use in your own code, there are just two workable options if you need third-party libraries covered as well:
Use streams and avoid any APIs that work with file names.
Build a virtual file system plugged into your host OS's storage architecture; however, the effort required versus the benefit is highly questionable.
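The first option can be as simple as keeping named byte arrays in memory and handing out streams. The hypothetical InMemoryDirectory class below sketches this: Create returns a writable stream whose bytes are stored when it is disposed, files can be renamed, and SaveTo writes one of them out to disk at the end. As the answer says, this only helps with APIs that accept a Stream; a third-party save(string fullPath) method still needs a real path.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: a flat "directory" of named files kept entirely in memory.
public sealed class InMemoryDirectory
{
    private readonly Dictionary<string, byte[]> _files = new(StringComparer.OrdinalIgnoreCase);

    // Hand out a writable stream; its contents are stored under `name` when it is disposed.
    public Stream Create(string name) => new CommitOnDispose(bytes => _files[name] = bytes);

    public Stream OpenRead(string name) => new MemoryStream(_files[name], writable: false);

    public void Rename(string from, string to)
    {
        byte[] data = _files[from];
        _files.Remove(from);
        _files[to] = data;
    }

    public IEnumerable<string> Names => _files.Keys;

    // Copy one in-memory file out to the real file system.
    public void SaveTo(string name, string path) => File.WriteAllBytes(path, _files[name]);

    private sealed class CommitOnDispose : MemoryStream
    {
        private readonly Action<byte[]> _commit;
        private bool _committed;

        public CommitOnDispose(Action<byte[]> commit) => _commit = commit;

        protected override void Dispose(bool disposing)
        {
            if (disposing && !_committed)
            {
                _committed = true;
                _commit(ToArray()); // the buffer is still available before the base class closes it
            }
            base.Dispose(disposing);
        }
    }
}
```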

I ran into a similar situation recently. There is no out-of-the-box solution in .NET for this, but I used a workaround that was efficient and safe for me.
Using the Ionic.Zip NuGet package (DotNetZip), you can create a whole directory with a complex structure as a stream in memory. Although it is created as a zip file, you can extract its entries as streams or send the whole zip file as a stream.
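var zipStream = new MemoryStream();
using (var zip = new Ionic.Zip.ZipFile())
{
    zip.AddEntry("file1.json", new MemoryStream(Encoding.UTF8.GetBytes(someJsonContent)));
    for (int i = 0; i < 4; i++)
    {
        // A "/" in the entry name creates a directory inside the archive
        zip.AddEntry($"{myDir}/{i}.json", new MemoryStream(Encoding.UTF8.GetBytes(anotherJsonContent)));
    }
    zip.Save(zipStream); // nothing is written until Save is called
}
zipStream.Position = 0; // rewind before reading or sending the archive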
And here is how to extract a zip file as a stream using Ionic.Zip
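The same in-memory zip can also be built without a third-party package by swapping in the built-in System.IO.Compression.ZipArchive (available since .NET Framework 4.5). This is a different library from Ionic.Zip; InMemoryZip, Build and ReadEntry are hypothetical names. Reading an entry back goes through ZipArchiveEntry.Open, so nothing touches the disk.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class InMemoryZip
{
    private static readonly Encoding Utf8NoBom = new UTF8Encoding(false);

    // Build a zip holding the given (path -> text) files entirely in memory.
    public static byte[] Build(IReadOnlyDictionary<string, string> files)
    {
        using var buffer = new MemoryStream();
        using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var file in files)
            {
                // A "/" in the entry name (e.g. "myDir/0.json") creates a folder inside the zip.
                ZipArchiveEntry entry = archive.CreateEntry(file.Key);
                using var writer = new StreamWriter(entry.Open(), Utf8NoBom);
                writer.Write(file.Value);
            }
        } // disposing the archive writes the zip's central directory into `buffer`
        return buffer.ToArray();
    }

    // Read one entry back as text without extracting anything to disk.
    public static string ReadEntry(byte[] zipBytes, string path)
    {
        using var archive = new ZipArchive(new MemoryStream(zipBytes), ZipArchiveMode.Read);
        ZipArchiveEntry entry = archive.GetEntry(path) ?? throw new FileNotFoundException(path);
        using var reader = new StreamReader(entry.Open(), Utf8NoBom);
        return reader.ReadToEnd();
    }
}
```

Only one entry can be open for writing at a time in Create mode, which is why each StreamWriter is disposed at the end of its loop iteration.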

Storing additional metadata about a file

For a small project, I would like to store additional information about a file and keep that information with the file even when it is moved.
The additional information will be stored in an XML file. To keep the file and its description together, I thought about using ZIP archives without any compression, but I would like these ZIP archives to behave just like the original files (i.e. if the original file was a video file, double-clicking the archive should open the file in the media player). This requires me to write a small program that handles this 'new' file format.
However, I have not found a solution that lets me open the file without first extracting it from the archive (even without compression), which takes time and is not what I want.
My questions are: Is there a library (for C# or C/C++) that lets me open a zip file and directly play/open a file inside it without extracting the archive? Or is there an easier way to implement what I need (maybe I am thinking in the wrong direction)?

Windows already allows you to store additional metadata about a shell item (including files) through the Windows Property System.
The Windows API Code Pack includes samples and documentation on how to work with many of the native OS capabilities, including the Property System.
The following excerpts come from the PropertyEdit sample.
To get a file's property by name:
var myObject = ShellObject.FromParsingName(fileName);
IShellProperty prop = myObject.Properties.GetProperty(propertyName);
To set a string property:
if (prop.ValueType == typeof(string))
{
    (prop as ShellProperty<string>).Value = value;
}
If you don't want to use the Property System, you can use NTFS alternate data streams (ADS) to store additional information about a file. There is no direct support for ADS in .NET, but a simple search turns up multiple wrappers, libraries and SO questions about them, e.g. "NTFS - Alternate Data Streams".

Alternatives to ZIP for combining many files into one on Windows using .NET

I'm looking for ways to combine files, including their names and relative paths, into one single file: a folder disguised as a file. I don't need any compression or encryption, just the file data plus some binary metadata attached to each file.
It would be great if this file could be opened/inspected/unpacked with a standard file browser in Windows, as with regular zip files.
Yes, I could use zip. But I'm researching alternatives, and I would prefer a simple method I could implement myself in C#/.NET.
UPDATE
I've researched this some more and came across Microsoft's Structured Storage format. It looked promising at first, but it seems to be an obsolete format, replaced by the Open Packaging Conventions. Then I found out about the TAR format. It seems to be the most basic format, but I'm not sure yet whether I can add custom metadata to TAR entries.
UPDATE
I ended up going with DotNetZip anyway...

Why not use zip? You can use a third-party library like DotNetZip to make the code easy to write. And, as you mentioned, Windows handles zip files well.

If you have a specific reason to look for an alternative to ZIP, take a look at virtual file systems, e.g. CodeBase File System or our Solid File System. Solid File System lets you add alternate data streams (like in NTFS) or tags (small chunks of binary or text data) to each file or directory. And with the OS edition of SolFS you can make the file system visible to Windows (including Explorer and third-party applications).
I must admit that while virtual file systems are easy to use (easier than ZIP), they are commercial products (I haven't seen any free virtual file system implementations yet).
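For the "simple method I could implement myself" route from the question, a length-prefixed container written with BinaryWriter is enough to store relative paths, per-file binary metadata and file data. The "PAK1" layout below is a made-up format for illustration, not a standard, and unlike ZIP it cannot be browsed in Windows Explorer. PackEntry and SimplePack are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical entry: relative path, arbitrary binary metadata, file contents.
public sealed record PackEntry(string Path, byte[] Metadata, byte[] Data);

// Made-up layout: "PAK1", int32 entry count, then per entry:
// length-prefixed UTF-8 path, int32 + metadata bytes, int32 + data bytes.
public static class SimplePack
{
    public static void Write(Stream output, IReadOnlyList<PackEntry> entries)
    {
        using var w = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true);
        w.Write(Encoding.ASCII.GetBytes("PAK1"));
        w.Write(entries.Count);
        foreach (var e in entries)
        {
            w.Write(e.Path.Replace('\\', '/')); // BinaryWriter length-prefixes strings itself
            w.Write(e.Metadata.Length);
            w.Write(e.Metadata);
            w.Write(e.Data.Length);
            w.Write(e.Data);
        }
    }

    public static List<PackEntry> Read(Stream input)
    {
        using var r = new BinaryReader(input, Encoding.UTF8, leaveOpen: true);
        if (Encoding.ASCII.GetString(r.ReadBytes(4)) != "PAK1")
            throw new InvalidDataException("Not a PAK1 file");

        // Read an int32 length followed by exactly that many bytes.
        byte[] ReadBlock()
        {
            int length = r.ReadInt32();
            byte[] block = r.ReadBytes(length);
            if (block.Length != length) throw new EndOfStreamException("Truncated PAK1 file");
            return block;
        }

        int count = r.ReadInt32();
        var entries = new List<PackEntry>(count);
        for (int i = 0; i < count; i++)
        {
            string path = r.ReadString();
            byte[] metadata = ReadBlock();
            byte[] data = ReadBlock();
            entries.Add(new PackEntry(path, metadata, data));
        }
        return entries;
    }
}
```

Reading everything into byte arrays keeps the sketch short; for large files you would store offsets and stream each entry instead.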

My brother's computer got a virus that renamed almost all the files on it, changing the file extensions as well. So a file that might have been named picture.jpg was renamed to kjfks.doc, for example.
So here is what I have done to solve this problem:
Removed all file extensions from the files. (I use a recursive method to search for all files in a directory and remove the extension from each file as I go.)
Now the files have no extension. They now look like:
I think these file names are stored in a local database created by the virus, and if I purchase the antivirus they will be renamed back to their original names.
Since my brother had made a backup, I selected the files with a creation date later than the backup and placed them in a directory.
I am not interested in getting the exact extension as long as I can see the content of the file. For example, I will scan each file, and if it contains text I know it can get a .txt extension. It might originally have been a .html or .css file; I know I will not be able to tell that.
I believe all PDF files should have something in common, and so should doc files. How can I figure out what files of the most common types (pdf, doc, docx, png, jpg, etc.) have in common?
Edit:
I know it would probably take less time to go through these 200 files and test each one than to write this program. I am just curious to see whether it is possible to recover the file extension.

On Unix, you can use the file command to determine a file's type. There is also a port for Windows, and you can of course write a script (batch, PowerShell, etc.) or a C# program to automate this.

First, congratulate your brother on doing a backup. Many people don't, and are absolutely wiped out by these problems.
You're going to have to do a lot of research, I'm afraid, but you're on the right track.
Open each file with a TextReader or a BinaryReader and examine the headers. Most of them are detectable.
For instance: Every PDF starts with "%PDF-" and then its version number. Just look at those first 5 characters. If it's "%PDF-", then put a PDF on the filename and move on.
Similarly: "ÿØÿà..JFIF" for JPEGs, "[InternetShortcut]" for URL shortcuts, and "L...........À......Fƒ" for regular shortcuts (each "." stands for a non-printable byte, such as a zero/null, the way a hex editor displays it).
ZIPs / compressed folders start with {0x50}{0x4B}{0x03}{0x04} (often followed by {0x14}, which is a version field and can vary), and you should be aware that Office 2007/2010 documents are really ZIPs with XML files inside them.
You'll have to do some digging as you find each type, but you should be able to write something to establish most of the file types.
You'll have to write some recursion to work through directories, but you can eliminate any file with no extension.
BTW, a great tool to help with this is HxD (http://www.mh-nexus.de/). It's what I used to put this answer together!
Good luck!
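The header checks described above can be sketched as a small table of signatures matched against the first bytes of each file. The PDF, JPEG, ZIP, URL-shortcut and .lnk signatures follow the answer; the PNG and OLE2 (legacy .doc/.xls/.ppt) signatures are additions. MagicSniffer is a hypothetical name, and a ZIP match only tells you "zip, docx, xlsx, pptx or similar" until you look inside the archive.

```csharp
using System.IO;
using System.Linq;
using System.Text;

public static class MagicSniffer
{
    // (signature bytes at offset 0, suggested extension); checked in order, first match wins.
    private static readonly (byte[] Magic, string Extension)[] Signatures =
    {
        (Encoding.ASCII.GetBytes("%PDF-"), ".pdf"),
        (new byte[] { 0xFF, 0xD8, 0xFF }, ".jpg"),                               // JPEG start-of-image marker
        (new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, ".png"),
        (new byte[] { 0x50, 0x4B, 0x03, 0x04 }, ".zip"),                         // also docx/xlsx/pptx, ODF, JAR
        (new byte[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 }, ".doc"), // OLE2: doc, xls or ppt
        (new byte[] { 0x4C, 0x00, 0x00, 0x00, 0x01, 0x14, 0x02, 0x00 }, ".lnk"), // Windows shortcut
        (Encoding.ASCII.GetBytes("[InternetShortcut]"), ".url"),
    };

    // Returns a suggested extension for the given leading bytes, or null if nothing matches.
    public static string Guess(byte[] header)
    {
        foreach (var (magic, extension) in Signatures)
        {
            if (header.Length >= magic.Length && header.Take(magic.Length).SequenceEqual(magic))
                return extension;
        }
        return null;
    }

    // Read up to the first 32 bytes of a file and guess its type.
    public static string GuessFile(string path)
    {
        var buffer = new byte[32];
        int read = 0, n;
        using (var stream = File.OpenRead(path))
        {
            while (read < buffer.Length && (n = stream.Read(buffer, read, buffer.Length - read)) > 0)
                read += n;
        }
        return Guess(buffer.Take(read).ToArray());
    }
}
```

Walking the recovered folder with Directory.EnumerateFiles and calling GuessFile on each file gives a first pass; anything that returns null can then be checked by hand, for example in HxD.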

The "most common types" each have their own format, and most of them have some magic bytes at a fixed position near the beginning of the file, so you can detect most formats quite easily. Even HTML, XML, CSS and similar text files can be detected by analyzing their beginning. But it will take some time to write an application that guesses the format. For some types (such as ODF or JAR, which are built on top of regular ZIPs) you will also be able to detect the specific format.
But... could such an application already exist on the market? I guess you can find something if you search, because the task is not as tricky as it initially seems.
