I need to process a TransXchange data dump to reduce the size of the data, as some of the XML files can be up to 400 MB. I have the following options:
Sqlite database
CSV files
Binary Serialization
Something else?
What is the best method of reducing the file sizes so that the data is feasible for use in a Windows Phone 7 application?
EDIT: I am going to create a journey-planning application that will allow users to specify a source and a destination. The application will then present the available services. Down under we have spotty mobile broadband coverage, so I am aiming for an offline application.
This analysis is superb for showing you the timing of serialisation: http://www.eugenedotnet.com/2010/12/windows-phone-7-serialization-comparison/
For size... it's quite easy to guess that binary is smaller than SQLite (or Sterling), which in turn is smaller than CSV.
However, if you are looking at processing 400MB of data on the phone... then I'd say you are doing the wrong thing - farm the processing out to a server (to the cloud?) and just view the summary results on the phone - think "thin client".
(Off to wash my mouth out now after all those jargon words!)
The main question is what are you going to do with that data.
If you just need to store the data, and files are fine, then binary serialization plus compression (zlib, LZO, ...) will work best.
CSV won't do you any good; it will probably occupy more space than the XML.
A database (for example, SQLite) is the most expensive in terms of storage, but you can manage and search the data more easily.
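For the binary-plus-compression route, here is a minimal sketch using BinaryWriter with GZipStream from System.IO.Compression. GZipStream is a stand-in for zlib/LZO here, and on the phone itself you may need a third-party library such as SharpZipLib; the StopTime type and its fields are purely illustrative.

    using System.Collections.Generic;
    using System.IO;
    using System.IO.Compression;

    // Illustrative record type; a real model would mirror whichever
    // TransXchange fields the journey planner actually needs.
    public class StopTime
    {
        public int StopId;
        public int ServiceId;
        public ushort DepartureMinutes; // minutes since midnight fits in 2 bytes

        public void WriteTo(BinaryWriter writer)
        {
            writer.Write(StopId);
            writer.Write(ServiceId);
            writer.Write(DepartureMinutes);
        }
    }

    public static class TimetableStore
    {
        // Serialize all records as compact binary and gzip them in one pass.
        public static void Save(string path, IEnumerable<StopTime> records)
        {
            using (FileStream file = File.Create(path))
            using (var gzip = new GZipStream(file, CompressionMode.Compress))
            using (var writer = new BinaryWriter(gzip))
            {
                foreach (StopTime record in records)
                    record.WriteTo(writer);
            }
        }
    }

Fixed-width binary fields like these typically shrink a verbose XML dump dramatically even before compression is applied.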
This 100 GB of data is from an electronics device and will be in the form of a file.
Currently, our application produces 4 - 6 GB of data, which we are storing in Google Drive.
What we are thinking is to set up servers and manually download the file from the tool.
I think you might investigate a number of possibilities:
1 - Is it possible to compress the data before trying to upload it, especially if the file format allows for high compression rates?
2 - Is it possible to clean redundant information out of the data file before compression and uploading? If the file contains significantly repeated data fields, it might be possible to keep only the data changes. An example could be a device reporting whether a motor is running or not at 10 ms intervals. If the motor only turns on/off once every hour, then a vast amount of information could be removed before uploading, without losing information (see the sketch after this list).
3 - Would it be possible to stream the data constantly, or maybe in smaller continuous chunks of, say, 10 or 100 KB? Streaming might make the end-to-end process more responsive, as well as more resilient to network/internet dropouts.
4 - Maybe read a bit about how some IoT device patterns manage to upload the significant data to the cloud in both online and offline situations.
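As a minimal sketch of point 2 (the names, and the motor example, are illustrative), a logger that records a sample only when the value changes:

    using System;
    using System.IO;

    // Logs a sampled boolean (e.g. "motor running", sampled every 10 ms)
    // only when its value changes, instead of on every sample.
    public class ChangeOnlyLogger
    {
        private readonly TextWriter _output;
        private bool? _lastState;

        public ChangeOnlyLogger(TextWriter output)
        {
            _output = output;
        }

        public void Sample(DateTime timestamp, bool motorRunning)
        {
            if (_lastState == motorRunning)
                return; // unchanged: dropping the sample loses nothing

            _output.WriteLine("{0:o}\t{1}", timestamp, motorRunning ? "ON" : "OFF");
            _lastState = motorRunning;
        }
    }

A motor that toggles once an hour then produces two lines per hour instead of the 360,000 samples a 10 ms interval would generate, without losing information.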
If you provide more details about your setup, data file format, etc., it might be possible to provide more specific suggestions.
For the past 3 days I've been trying to create an upload system for multiple files, possibly large, with progress bars.
I've been roaming the web relentlessly for the past few days, and I can say, I am now familiar with most difficulties.
Sadly, all the solutions I've found online are not written in C# or VBScript; in fact, most of them are written in PHP.
I wouldn't mind switching to another language, but the entire website is written in VB.NET, and for the sake of coherence I thought it might be best to stick with it.
File uploads:
Problem 1 - progress bar:
I understand file uploads will not work with AJAX, since the AJAX response will only occur after the file has completed its upload.
I understand there is a solution using iframes, but I cannot seem to find any online examples (preferably in VB.NET or C#).
I understand there is another alternative using Flash. How???
I also understand people are mostly against using iframes, but I can't find what the reason might be.
Problem 2 - Multiple Files:
I can have multiple file support with HTML5. Great, but IE doesn't support it? Well... IE users will just have to upload one file at a time.
Problem 3 - Large files:
How?
I heard something about chunking and blobs, but these are still just random gibberish words to me. Can somebody explain the meaning and the implementation?
References to reading material are much appreciated, even though, if it's on the web, I've probably already read it in my search for a solution.
#DevlshOne has a decent thread with some good information.
Here are the three basic requirements for what I did:
Create a Silverlight app for client-side access and upload control (use the app of your choice).
Create an HttpHandler to receive the data in chunks and manage requests.
Create the database backend to handle the files.
Silverlight worked well because I was already in VB (ASP.NET). When used in-browser, as opposed to out-of-browser, the ASP.NET session was shared with Silverlight, so there was no need to have additional security/login measures. Silverlight also allowed me to limit what file types could be selected and allow the user to select multiple files from the same folder.
The Silverlight app grabs the files selected by the user, displays them for editing of certain properties, and then begins the upload when the user clicks the 'upload' button. This sets off a number of threads that each upload chunks of data to the httphandler. The HttpHandler and Silverlight app send and receive in chunks, with the HttpHandler always sending an OK or ERROR message when the request has been processed for the uploaded chunk.
Our specific implementation of file uploading also required some database properties (fields) to be filled out by the user, so we also had inputs for those properties and uploaded them to the server with the file data.
An in-browser Silverlight app can also have parameters passed into it through the HTML, so I do this with settings like 'max chunk size' or 'max thread count'. I can change the setting in the database and have it apply to all users.
The database backend is basically a few stored procedures (insert your data management preference here) that control the flow of the logic. One table holds completed files (no file data), and a second holds the temp files that are in progress of being uploaded. One stored procedure initiates a new file record in the temp table and processes additional chunk uploads, and another controls the migration of the completely uploaded file from the temp table to the completed table. (A piece of VB code in the HttpHandler migrates the actual binary file data from the temp table to a physical file.)
This seems pretty complex, but the most difficult part would be the interaction with the handler and passing the chunks around (response/requests, uploading successive chunks, etc.). I left out a lot of information, but this is the basic implementation.
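As a rough sketch of the handler side (not our exact code: the fileId/offset parameter names and the temp-file storage are illustrative, since our implementation kept chunks in the database as described):

    using System.IO;
    using System.Web;

    // Receives one chunk per request and writes it into a per-upload temp
    // file at the given offset. The client keeps sending chunks until done.
    public class ChunkUploadHandler : IHttpHandler
    {
        public bool IsReusable { get { return true; } }

        public void ProcessRequest(HttpContext context)
        {
            // Real code must validate fileId (path traversal) and the offset;
            // Path.GetFileName at least strips any directory components.
            string fileId = Path.GetFileName(context.Request.QueryString["fileId"]);
            long offset = long.Parse(context.Request.QueryString["offset"]);
            string tempPath = Path.Combine(
                context.Server.MapPath("~/App_Data/uploads"), fileId + ".part");

            using (var file = new FileStream(tempPath, FileMode.OpenOrCreate,
                                             FileAccess.Write, FileShare.None))
            {
                file.Seek(offset, SeekOrigin.Begin);
                byte[] buffer = new byte[8192];
                int read;
                while ((read = context.Request.InputStream.Read(
                           buffer, 0, buffer.Length)) > 0)
                {
                    file.Write(buffer, 0, read);
                }
            }

            context.Response.Write("OK"); // client waits for OK/ERROR per chunk
        }
    }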
I'm trying to use Indexing Service for searching DICOM files. I wanted to know whether Indexing Service supports the .dcm and .dicom file extensions. I did read about IFilters but was unable to find any for DICOM files. Thanks!
The Windows Indexing Service was superseded by "Windows Search" (available as an add-on for XP, and built in from Vista onwards).
I've never come across anyone using it for DICOM, but I imagine it wouldn't be too hard to use the SDK to write a filter to pull DICOM tags out of files and index them.
However, I suspect it's just going to result in massive information overload; a DICOM study typically contains several series plus captures/derived series, and each series can be thousands of individual files (OK, the latter is less of a problem with modern enhanced DICOM and one volume per file, but there's still a massive amount of per-slice files out there and being produced). File-level indexing alone doesn't sound that useful. If Windows Search could be made aware of the hierarchy and let you search for studies/series, that might be more interesting. But most people just keep their DICOM in a PACS of some sort, which will generally have excellent DICOM-oriented search/navigation tools.
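That said, recognising a DICOM Part 10 file, which is the first thing any such filter would do, is easy: there is a 128-byte preamble followed by the magic characters "DICM". A minimal sketch:

    using System.IO;
    using System.Text;

    public static class DicomProbe
    {
        // A DICOM Part 10 file starts with a 128-byte preamble
        // followed by the four magic characters "DICM".
        public static bool IsDicomFile(string path)
        {
            using (FileStream stream = File.OpenRead(path))
            {
                if (stream.Length < 132)
                    return false;

                stream.Seek(128, SeekOrigin.Begin);
                byte[] magic = new byte[4];
                return stream.Read(magic, 0, 4) == 4
                    && Encoding.ASCII.GetString(magic, 0, 4) == "DICM";
            }
        }
    }

Extracting the actual tags for indexing is considerably more work (implicit vs. explicit VR, transfer syntaxes), so in practice you would lean on an existing DICOM library rather than hand-rolling the parser.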
In my web application I am working with files. Some files are very large. I use Response.Write() to write the file to the browser. This works well for the smaller files, but large files can take a while and the bandwidth is fully used.
Is it possible to split large documents and send it piece by piece to the browser? Are there other ways to send the document quicker to the browser?
I hold the document as a property of an object.
Why don't you compress the file, store it in the DB, and decompress it when extracting it?
You can do a lot of things depending on these questions:
How often does the file change?
Do I really need the files in the DB?
Why not store the file path in the DB and the file on disk?
Anyhow, since your files consume a lot of bandwidth and you would want your app to respond appropriately, you might want to use AJAX to load the files asynchronously. You can have a WebHandler (.ashx) for this.
Here are a few examples:
http://www.dotnetcurry.com/ShowArticle.aspx?ID=193&AspxAutoDetectCookieSupport=1
http://www.viawindowslive.com/Articles/VirtualEarth/InvokingserversidecodeusingAJAX.aspx
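As a minimal sketch of such a handler (the query parameter and folder are illustrative), TransmitFile streams the file without buffering the whole document in server memory:

    using System.IO;
    using System.Web;

    // Minimal .ashx handler: streams a file to the browser without
    // buffering the whole document in server memory.
    public class FileHandler : IHttpHandler
    {
        public bool IsReusable { get { return true; } }

        public void ProcessRequest(HttpContext context)
        {
            // Path.GetFileName strips directories to block path traversal.
            string name = Path.GetFileName(context.Request.QueryString["doc"]);
            string path = context.Server.MapPath("~/App_Data/docs/" + name);

            context.Response.ContentType = "application/octet-stream";
            context.Response.AppendHeader("Content-Disposition",
                "attachment; filename=" + name);
            context.Response.TransmitFile(path);
        }
    }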
"My question is, is it possible to split large documents and send it piece by piece to the browser?"
It depends on the file type, but in general no. If you are sending something like an Excel file or a Word doc, the receiving application will need all of the information (bytes) to fully form the document. You could physically separate the document into multiple ones, and that would allow you to do so.
If the bandwidth is fully used, then there is nothing you can do to "speed it up" short of compressing the document prior to sending. In other words, zip it up.
Depending on the document (I know you said .mht, but we're talking content here) you will see the size go down by some amount. Maybe it's enough, maybe not.
Either way, this is entirely a function of the amount of content you want to send versus the size of the pipe available to send it. One of those is more difficult to change than the other.
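If you want to do the zipping in the application rather than by hand, one common ASP.NET pattern is to wrap Response.Filter in a GZipStream when the client advertises gzip support. A sketch (the helper name is mine, not a framework API):

    using System.IO.Compression;
    using System.Web;

    public static class ResponseCompression
    {
        // Call this before writing the document to the response.
        public static void TryEnableGzip(HttpContext context)
        {
            string acceptEncoding = context.Request.Headers["Accept-Encoding"];
            if (acceptEncoding == null || !acceptEncoding.Contains("gzip"))
                return; // client can't handle gzip, send uncompressed

            context.Response.Filter = new GZipStream(
                context.Response.Filter, CompressionMode.Compress);
            context.Response.AppendHeader("Content-Encoding", "gzip");
        }
    }

Text-heavy formats like .mht tend to compress well, so this can buy a lot for little effort.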
Try setting IIS's dynamic compression. By default, it's set fairly low, but you can try setting it for a higher compression level and see how much that helps.
I'm not up to speed with ASP.NET but you might be able to buffer from a FileStream to some sort of output stream.
You can use the Flush method to send the currently buffered data to the client (the browser).
Note that this has some implications, as is described aptly here.
I've considered using it myself; a project of mine sent documents that became fairly large, and I was cautious about storing the whole data in memory. In the end I decided the data was not large enough to be a problem, though.
Sadly the MSDN documentation is very, very vague on what Flush implies, and you will probably have to use Google to troubleshoot.
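Putting that together, here is a sketch of what the FileStream-to-output buffering with Flush might look like (the 64 KB buffer size is an arbitrary choice):

    using System.IO;
    using System.Web;

    public static class DocumentStreamer
    {
        // Read the file in 64 KB blocks and flush each one to the client,
        // so the whole document never sits in server memory at once.
        public static void Stream(HttpResponse response, string path)
        {
            response.BufferOutput = false; // don't accumulate the full response

            byte[] buffer = new byte[64 * 1024];
            using (FileStream file = File.OpenRead(path))
            {
                int read;
                while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
                {
                    response.OutputStream.Write(buffer, 0, read);
                    response.Flush(); // push this block to the browser now
                }
            }
        }
    }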
I'm using C# and the Compact Framework on an embedded device to log data to a CompactFlash card. Because data integrity is of the utmost importance, I'm wondering how to write the data to the flash disk. Will files get lost/damaged if power is lost during a write/flush or while the file is open? What's the best way to do this?
By the way, the card uses FAT32 as file system if that's important.
greetings,
Korexio
If performance is not an issue, I personally would prefer the first method, which is OPEN-WRITE-CLOSE. FAT32 is really vulnerable to data damage during write operations.
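A sketch of that OPEN-WRITE-CLOSE pattern, one record per call, so the file sits closed (with the FAT consistent) almost all of the time. Whether extras like FileOptions.WriteThrough exist on the Compact Framework varies, so this sticks to the basics:

    using System.IO;
    using System.Text;

    public static class SafeLogger
    {
        // Open, append one record, flush and close on every call. A power
        // cut can then only lose the record currently being written,
        // not the whole file.
        public static void Append(string path, string record)
        {
            using (var stream = new FileStream(path, FileMode.Append,
                                               FileAccess.Write))
            using (var writer = new StreamWriter(stream, Encoding.ASCII))
            {
                writer.WriteLine(record);
                writer.Flush(); // push buffered bytes to the card before closing
            }
        }
    }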