Combining Multiple Files Into Single Archive (Silverlight/C#)


In Silverlight one does not have access to the entire .NET Library and therefore I am considering the best way to get the functionality I would have courtesy of System.IO.Packaging.
I have multiple text files and I want to combine them into a single archive. Compression is not important but could wind up being valuable.
By instinct I would select obscure characters as BOF/EOF tokens and then write all the files to a single stream, using the BOF/EOF tokens to mark off each file. I'd probably also come up with a format for storing the original file name after the BOF so that it can be retrieved.
But before I act on a poor man's instinctive approach, is there a canonical approach to this? Or has anyone done this before and can offer some words of advice based on experience?

SharpZipLib works on the Compact Framework, so there is no obvious reason it wouldn't work in Silverlight.
As for licensing (from their page):
"In plain English this means you can use this library in commercial closed-source applications."
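For comparison, here is a minimal sketch of the hand-rolled container the question describes. Note that it swaps the BOF/EOF tokens for length prefixes, so no character has to be reserved as a delimiter. The class name SimpleArchive and the record layout (count, then name, byte length and bytes per file) are assumptions made for illustration, not an established format, and there is no compression.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical container format: [file count] then, per file,
// [name][content length in bytes][content bytes].
public static class SimpleArchive
{
    public static void Write(Stream output, IDictionary<string, string> files)
    {
        // The writer is deliberately not disposed so the caller keeps ownership of the stream.
        var writer = new BinaryWriter(output, Encoding.UTF8);
        writer.Write(files.Count);
        foreach (KeyValuePair<string, string> file in files)
        {
            byte[] content = Encoding.UTF8.GetBytes(file.Value);
            writer.Write(file.Key);       // BinaryWriter length-prefixes strings itself
            writer.Write(content.Length);
            writer.Write(content);
        }
        writer.Flush();
    }

    public static Dictionary<string, string> Read(Stream input)
    {
        var reader = new BinaryReader(input, Encoding.UTF8);
        int count = reader.ReadInt32();
        var files = new Dictionary<string, string>();
        for (int i = 0; i < count; i++)
        {
            string name = reader.ReadString();
            int length = reader.ReadInt32();
            byte[] content = reader.ReadBytes(length);
            files[name] = Encoding.UTF8.GetString(content, 0, content.Length);
        }
        return files;
    }
}
```

Because every entry carries its own length, the file contents can contain any characters, which is the main weakness of the token-based idea.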

Related

PDF Creating Server

I've been tasked to create (or seek something that is already working) a centralized server with an API that has the ability to return a PDF file passing some data, and the name of the template, it has to be a robust solution, enterprise ready. The goal is as follows:
A series of templates for different company things. (Invoices, Orders, Order Plannings, etc)
A way of returning a PDF from external software (Websites, ERP, etc)
Can be an already ready enterprise solution, but they are pressing for a custom one.
Can be any language, but we don't have any dedicated Java programmers in-house. We are PHP / .NET, some of us dabble, but the learning curve could be a little steep.
So, I've been reading. One way we've thought it may be possible is installing a JasperReports server, creating the templates in Jaspersoft Studio, and then using the API to return the PDF files. A colleague favors this option because it's mostly done, but first, it is Java, and second, I think it's like using a hammer to crack a nut.
The other option we've been toying with is to use C# with iTextSharp to create a server, and create our own API that returns exactly the PDF with the data we need. Doing this would have some benefits, like using the database connector we have already made and extracting most of the data from the database instead of having to pass around a big chunk of data, but as it is bare, it doesn't really have a templating system. We'd have to create something ourselves with XMLWorker or with C# classes, and that is not really as "easy" as drag and drop. For this case I've been reading about XFA too, but the documentation on the iText site is misleading and unclear.
I've also been reading about some other alternatives, like PrinceXML, PDFBox, FOP, etc., but the concept would be the same as with iText: we'd have to do it ourselves.
My vote, even if it's more work, is to go the iText route and use HTML / CSS for the templates, but my colleagues claim that the templates should be easy to change and able to be changed every other week (I doubt it), and that HTML / CSS would be too much work.
So the real question is, how do other businesses approach this? Did I leave anything out of my search? Is there an easier way to achieve this?
PS: I didn't know if SO would be the correct place for this question, but I'm mostly lost and risking a "too broad question" or "off topic" tag doesn't seem that bad.
EDIT:
Input should be sent with the same request. If we decide the C# route, we can get ~70% of the data from the ERP directly, but anyway, it should accept a post request with some data (template, and data needed for that template, like an invoice data, or the invoice ID if we have access to the ERP).
Output should be a PDF (not interested in other formats, just PDF).
Templates will be updated only by IT. (Mostly us, the development team).
Performance wise, I don't know how much muscle we'll need, but right now, without any increase, we are looking at ~500/1000 PDFs daily, mostly printed from 10 to 10.30 and from 12 to 13h. Then maybe 100 more the rest of the day.
Peak load should not be more than ~10000 daily when the planets align and it is sales season (twice a year). That should be our ceiling for the years to come.
The templates have some requirements:
Have repeating blocks (invoice lines, for example).
Have images as background, as watermark and as blocks.
Have to be multi language (translatable, with the same data).
Have some blocks that are only shown on a condition.
Blocks dependent on the page (PDF header / page header / page footer / PDF footer)
Templates may have to do calculations over some of the data. I don't think we'll ever need this, but it's something the company may ask for in the future.
The PDFs don't need to be stored, as we have a document management system, maybe in the future we could link them.
Extra data: Right now we are using "Fast-Reports v2 VCL"
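To make the HTML / CSS option more concrete, here is a rough sketch of how two of the template requirements above (a repeating block and a conditional block) might look when the HTML is built in C# before being handed to whatever HTML-to-PDF converter is chosen. The InvoiceLine and InvoiceHtml names and the markup are hypothetical, and the conversion step itself is intentionally left out.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Net;
using System.Text;

// Hypothetical data type for one repeating block (an invoice line).
public class InvoiceLine
{
    public string Description;
    public int Quantity;
    public decimal UnitPrice;
}

public static class InvoiceHtml
{
    // Builds the HTML body for one invoice. 'note' is a conditional block:
    // it is only rendered when a non-empty value is supplied.
    public static string Render(string customer, IEnumerable<InvoiceLine> lines, string note)
    {
        var html = new StringBuilder();
        html.Append("<h1>").Append(WebUtility.HtmlEncode(customer)).Append("</h1>");
        html.Append("<table>");
        decimal total = 0m;
        foreach (InvoiceLine line in lines)   // repeating block
        {
            decimal amount = line.Quantity * line.UnitPrice;
            total += amount;
            html.Append("<tr><td>").Append(WebUtility.HtmlEncode(line.Description)).Append("</td>")
                .Append("<td>").Append(line.Quantity).Append("</td>")
                .Append("<td>").Append(amount.ToString("0.00", CultureInfo.InvariantCulture)).Append("</td></tr>");
        }
        html.Append("</table>");
        html.Append("<p>Total: ").Append(total.ToString("0.00", CultureInfo.InvariantCulture)).Append("</p>");
        if (!string.IsNullOrEmpty(note))      // conditional block
            html.Append("<p class=\"note\">").Append(WebUtility.HtmlEncode(note)).Append("</p>");
        return html.ToString();
    }
}
```

Translations, page headers/footers and background images are not covered here; how those are handled depends heavily on the converter.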

Your question shows you've been considering the problem in detail before asking for help so I'm sure SO will be friendly.
Certainly one thing you haven't detailed much in your description is the broader functional requirements. You mentioned cracking a nut with a hammer, but I think you are focused mostly on the technology/interfacing. If you consider your broader requirements for the documents you need to create and the variables involved, it might be a bigger nut than you think.
The approach I would suggest is to prototype solutions, assuming you have some room to do so. From your research, pick maybe the best 3 to try which may well include the custom build you have in mind. Put them through some real use-cases end to end - rough as possible but realistic. One or two key documents you need to output should be used across all solutions. Make sure you are covering the most important or most common requirements in terms of:
Input Format(s) - who can/should be updating templates. What is the ideal requirement and what is the minimum requirement?
Output Requirement(s) - who are you delivering to and what formats are essential/desirable
Data Requirement(s) - what are your sources of data and how hard/easy is it to get data from your sources to the reporting system in the format needed?
Template feature(s) - if you are using templates, what features do the templates need? This includes input format(s), but I was mostly thinking of features of the engine like repeating/conditional content, image insertion, table manipulation, etc. In other words, are your invoices, orders and planning documents plain or complex?
API requirements - do you have any broader API requirements. You mentioned you use PHP so a PHP library or Web/Web Service is likely to be a good starting point.
Performance - you haven't mentioned any performance characteristics but certainly if you are working at scale (enterprise) it would be worth even rough-measuring the throughput.
iText and Jasper are certainly enterprise grade engines you can rely on. You may wish to look at Docmosis (please note I work for the company) and probably do some searches for PDF libraries that use templates.
A web service interface is possibly a key feature you might want to look at. A REST API is easy to call from PHP and virtually any technology stack. It means you will likely have options about how you can architect a solution, and it's typically easy to prototype against. If you decide to go down the prototyping path and try Docmosis, start with the cloud service since you can prototype/integrate very quickly.
I hope that helps.

From my years of experience in working with PDF I think you should pay attention to the following points:
The performance: API-based PDF generation will usually be faster than HTML-to-PDF or XML-to-PDF generation, because the latter involve an additional conversion layer. Considering the peaks in load, you may want to calculate the cost of scaling up generation by adding more servers (estimate the cost of the additional servers or resources required per additional PDF file per day).
Ease of iterations and changes: how often will you need to adjust templates? If you are going to create the templates once (with some iterations) and no further changes are required, then you should be OK just coding them using the API. Otherwise you should strongly consider using HTML or XML for templates to simplify changes and reduce their complexity.
Search and indexing: if you may need to search among the created documents, then you should consider storing indexes of the generated documents, or storing the source data in XML along with each generated PDF file.
Long-term preservation: you should conform to the PDF/A sub-format if you are looking for long-term digital preservation of your documents. See the veraPDF open source initiative, which you can use to validate generated and incoming PDF documents against the PDF/A requirements.
Preserving source files: the PDF format itself was not designed to be edited (though there are some PDF editors), so you may want to preserve the source data to be able to regenerate PDF documents later and possibly introduce additional output formats.

How to parse LDAP Data Interchange Format string in .NET? [duplicate]

I am looking for an LDIF parser for C#. I am trying to parse an LDIF file so that I can check that objects don't exist before adding them. Adding them when they already exist (using ntdsSchemaAdd) causes an entry in the error logs.

A quick web search revealed: http://wiki.github.com/skradel/Zetetic.Ldap/. They provide a .NET API.
From the page:
Zetetic.Ldap is a .NET library for .NET 2 and above, which makes it easier to work with directory servers (like Active Directory, ADAM, Red Hat Directory Server, and others). Some of the key features of Zetetic.Ldap are:
1. LDIF file parsing and generation – Read and write the file format used for moving data around between directory systems
2. LDAP Entry-oriented API with change tracking – Create and modify directory objects in a more natural way
3. LDAP Schema interrogation – Quick programmatic access to the kinds of objects and fields your directory server understands. Learn if an attribute is a string, a number, a date, etc., without lots of manual research and re-parsing
4. LDIF Pivoter – Turn an LDIF file into a (comma or tab-delimited) flat file for analysis or loading into systems that don’t speak LDIF
We built the Zetetic.Ldap library to make directory projects and programming faster and easier, and release it here in the hopes that others will find it useful too. As far as we know, this is the only .NET library that really understands the LDIF specification.
Download link: http://github.com/downloads/skradel/Zetetic.Ldap/Zetetic.Ldap_20090831.zip

I would parse it myself.
If you look at the LDIF RFC for the EBNF, you'll see that it's not a very complex grammar.
I've parsed a large amount of LDIF before using Regexes reliably. Though your mileage may vary.
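To give an idea of how small a hand-written parser can be, here is a minimal sketch. It is not the regex approach mentioned above and it only covers part of the grammar: line folding (continuation lines starting with a single space), "#" comments, blank-line-separated records, plain "attr: value" lines and base64 "attr:: value" lines. URL values ("attr:< ...") and change records are not interpreted. The class name LdifSketch is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LdifSketch
{
    // Returns one dictionary per record, mapping attribute name to its values.
    public static List<Dictionary<string, List<string>>> Parse(TextReader reader)
    {
        // Pass 1: unfold continuation lines (a line starting with one space
        // continues the previous line).
        var logical = new List<string>();
        string raw;
        while ((raw = reader.ReadLine()) != null)
        {
            bool continuation = raw.Length > 0 && raw[0] == ' '
                && logical.Count > 0 && logical[logical.Count - 1].Length > 0;
            if (continuation) logical[logical.Count - 1] += raw.Substring(1);
            else logical.Add(raw);
        }
        logical.Add(""); // sentinel blank line so the last record is flushed

        // Pass 2: split into records and attributes.
        var records = new List<Dictionary<string, List<string>>>();
        var current = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in logical)
        {
            if (line.Length == 0)
            {
                if (current.Count > 0)
                {
                    records.Add(current);
                    current = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
                }
                continue;
            }
            if (line[0] == '#') continue;            // comment
            int colon = line.IndexOf(':');
            if (colon <= 0) continue;                // sketch: ignore malformed lines
            string name = line.Substring(0, colon);
            string rest = line.Substring(colon + 1);
            if (current.Count == 0 && name.Equals("version", StringComparison.OrdinalIgnoreCase))
                continue;                            // leading "version: 1" line
            string value = rest.StartsWith(":", StringComparison.Ordinal)
                ? Encoding.UTF8.GetString(Convert.FromBase64String(rest.Substring(1).Trim()))
                : rest.TrimStart(' ');
            List<string> values;
            if (!current.TryGetValue(name, out values))
                current[name] = values = new List<string>();
            values.Add(value);
        }
        return records;
    }
}
```

For the use case in the question, the "dn" value of each record is what you would look up in the directory before deciding whether to add the entry.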

Alternative localization with extension methods

I am about to start a localization project for my employer. It concerns a pre-existing project with many Windows Forms and an established code base, programmed in C# and ASP.NET. I have done research into how to localize an application in Visual Studio and found resources.
While these are an adequate solution to the problem, I am not entirely happy with the downsides of using resources. Namely, they have a rather large footprint, requiring changes in each of the form files. Furthermore, the resource files are only editable from within Visual Studio. I would prefer to enable external translators without programming knowledge to do the translation.
So I came up with an alternative solution:
Build a static localization utility class with an extension method on String:
public static String Localize(this String s)
The utility class loads localization strings from file on startup. When the program needs a string somewhere, it is called as
"foo".Localize();
And the program would use the string itself as the key in the table to find the translation.
It seems a safe and effective solution, and I'm happy with the small footprint that it leaves on the existing codebase.
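A minimal sketch of the utility class described above could look like this. The tab-separated "source text, tab, translation" file format and the Load method are assumptions for illustration only; the question leaves the file format open.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class Localizer
{
    private static readonly Dictionary<string, string> Table =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Loads translations from lines of the (hypothetical) form "source<TAB>translation".
    public static void Load(TextReader reader)
    {
        Table.Clear();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            int tab = line.IndexOf('\t');
            if (tab <= 0) continue; // skip blank or malformed lines
            Table[line.Substring(0, tab)] = line.Substring(tab + 1);
        }
    }

    // Uses the string itself as the lookup key and falls back to it
    // when no translation is known.
    public static string Localize(this string s)
    {
        if (s == null) return null;
        string translated;
        return Table.TryGetValue(s, out translated) ? translated : s;
    }
}
```

One consequence of using the source text as the key is visible here: editing the wording of "foo" in code silently orphans its translation, and the fallback hides that.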
Basically I want to ask:
Are there downsides to my solution that I've missed?
Which file formats for the localization data should I look into (I've already encountered the .po file format)?
Is it a good enough reason to deviate from the resource files solution?
Any advice and/or considerations you may have will be appreciated.

You are trying to reinvent the wheel that MS invented long ago. You can use plenty of tools available for resources or even write your own Resources provider.
Some tools available: What tools are available for adding Localization to an ASP.NET project?
If you want to use a database for translators: Data Driven Resource provider from Rick Strahl

Are there downsides to my solution that I've missed?
I can point out some. What about the texts in the aspx files? Are you going to make your extension method available to them as well? That would be tough, I guess.
e.g. <asp:Label Text="Title"> - how are you going to translate that?
Further, some of your claims are not entirely true.
the resource files are only editable from within Visual Studio
They are XML files, so you can use any editor to edit them or write a custom utility to do that.
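As a rough illustration of such a custom utility, string entries in a .resx file are stored as data elements with a name attribute and a value child, so they can be read with LINQ to XML. The class name ResxSketch is made up, and the sketch assumes that entries carrying a type or mimetype attribute are non-string resources to be skipped.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ResxSketch
{
    // Extracts name -> text for plain string entries of a .resx document.
    public static Dictionary<string, string> ReadStrings(string resxXml)
    {
        XDocument doc = XDocument.Parse(resxXml);
        return doc.Root.Elements("data")
            .Where(d => d.Attribute("name") != null
                     && d.Element("value") != null
                     && d.Attribute("type") == null       // assumed: typed entries are not plain strings
                     && d.Attribute("mimetype") == null)  // assumed: embedded binary data
            .ToDictionary(d => (string)d.Attribute("name"),
                          d => (string)d.Element("value"));
    }
}
```

A translator-facing tool could export this dictionary to a spreadsheet and write the translated values back into a culture-specific copy of the file.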

Are there downsides to my solution that I've missed?
The standard resource files go beyond changing the text.
You might need to resize certain elements to fit the new text (if you don't use the existing layout management mechanism). And for some languages you will need to change the fonts/font sizes (think Chinese, Japanese, Korean) or the alignment (think right-to-left languages like Arabic and Hebrew).
Also, translating standard files means that using an editor that is aware of the format one can see the dialog "as is", so it gives more context than stand-alone strings, which results in better translation quality.

How to cut, edit and merge OGG files in C#?

I have an ogg vorbis file and I have to do two operations with it:
Cutting a part of a file from one position to another
Merging another file with existing one
How can I do these two operations in C#?

You can do this with libzplay http://libzplay.sourceforge.net/
The steps needed to do what is being asked about:
OpenFile
Seek
SetWaveOutFile (this supports .ogg exporting as well as other formats)
StartPlayback
StopPlayback(at time needed)
Everything is extremely well documented on the linked site for multiple languages, including c#.
This answer is for all the other people that spent hours searching and weren't helped by the previous answers. This isn't a very efficient solution to the problem here, but while searching this question came up many times, and this might be helpful to others. :)

I'd look into the C documentation for libogg and figure out how to do this in C, and then write almost the same code in C# using a wrapper over libogg.
I've created a low level wrapper over libogg and libvorbis using the interop assistant:
https://github.com/CodesInChaos/Xiph/blob/master/LowLevel.cs
That project also contains some higher level constructs, but I don't think they'll be useful for what you're doing.
BTW if the stream IDs between the files differ, you can simply append a file to another creating a valid file that plays both streams in sequence.
You probably need to read the input files packet wise using the decoding API, and then write the combined data out packet wise. Possibly replacing the stream ID and granulepos in between.
StreamID is an integer that identifies substreams in an ogg file. To append multiple such substreams you can simply ensure that they have a different ID and then write the data.
Splitting is a bit more annoying, since granulepos is a codec dependent timestamp, and I don't remember how it is defined for vorbis. Another problem here is that you can't simply split in the middle of a packet without reencoding.
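To show where the stream ID and granulepos live, here is a small sketch that reads a single Ogg page header directly, without libogg. It follows the page layout as I understand it from the Ogg specification ("OggS" capture pattern, version, header type, 64-bit granule position, 32-bit serial number, page sequence number, CRC, segment table, all little-endian). The type and method names are made up, and it only reads: rewriting the serial number would also require recomputing the page CRC.

```csharp
using System;
using System.IO;

// Hypothetical holder for the fields of one Ogg page header.
public struct OggPageHeader
{
    public byte HeaderType;        // flags: continuation, first page, last page
    public long GranulePosition;   // codec-dependent timestamp
    public uint SerialNumber;      // the stream ID discussed above
    public uint SequenceNumber;
    public int BodyLength;         // sum of the segment table entries
}

public static class OggSketch
{
    // Reads one page header; the reader is left positioned at the page body.
    public static OggPageHeader ReadPageHeader(BinaryReader reader)
    {
        byte[] magic = reader.ReadBytes(4);
        if (magic.Length != 4 || magic[0] != 'O' || magic[1] != 'g' || magic[2] != 'g' || magic[3] != 'S')
            throw new InvalidDataException("Not positioned at an Ogg page.");
        reader.ReadByte();                           // stream structure version
        var header = new OggPageHeader();
        header.HeaderType = reader.ReadByte();
        header.GranulePosition = reader.ReadInt64(); // BinaryReader is little-endian
        header.SerialNumber = reader.ReadUInt32();
        header.SequenceNumber = reader.ReadUInt32();
        reader.ReadUInt32();                         // CRC checksum, not verified here
        int segmentCount = reader.ReadByte();
        byte[] segmentTable = reader.ReadBytes(segmentCount);
        int bodyLength = 0;
        foreach (byte lacingValue in segmentTable) bodyLength += lacingValue;
        header.BodyLength = bodyLength;
        return header;
    }
}
```

Skipping BodyLength bytes after each header walks the file page by page, which is enough to check whether two files use the same serial number before appending one to the other.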

How to get all file attributes including author, title, mp3 tags, etc, in one sweep

I would like to write all meta data (including advanced summary properties) for my files in a windows folder to a csv file. Is there a way to collect all the attributes? I see mp3 files have a different set of attributes compared to jpg files. (c#)
This can also be a script (vb, perl)
Update: by looking at libextractor (thank you) I can see this can be achieved by writing different plugins for different type of files. I gather this meta data is not a simple collection...

In Perl, you can use MP3::Tag or MP3::Info

If you can cope w/ VB.Net: http://www.codeproject.com/KB/vb/mp3id3v1.aspx
If you can cope w/ C++/.Net: http://www.codeproject.com/KB/audio-video/mp3fileinfo.aspx
For either (assuming the C++ one is compiled to .NET), you can use Reflector to disassemble the binary and convert it to C#. Check with the respective authors about their licenses first (usually Code Project articles are under an open license like CPOL).
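If only ID3v1 tags are needed, reading them directly in C# is short enough that porting may not be necessary. The sketch below assumes the usual ID3v1 layout: the last 128 bytes of the file start with "TAG", followed by 30-byte title, artist and album fields and a 4-byte year. The class names are made up, ID3v2 tags are not handled, and the text is decoded as ISO-8859-1, which is an assumption about how the tag was written.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical holder for the ID3v1 fields read below.
public class Id3v1Tag
{
    public string Title, Artist, Album, Year;
}

public static class Id3v1Sketch
{
    // Returns null when the stream has no ID3v1 tag in its last 128 bytes.
    public static Id3v1Tag Read(Stream stream)
    {
        if (!stream.CanSeek || stream.Length < 128) return null;
        stream.Seek(-128, SeekOrigin.End);
        byte[] block = new byte[128];
        int read = 0;
        while (read < block.Length)
        {
            int n = stream.Read(block, read, block.Length - read);
            if (n == 0) return null;
            read += n;
        }
        if (block[0] != 'T' || block[1] != 'A' || block[2] != 'G') return null;
        return new Id3v1Tag
        {
            Title = Field(block, 3, 30),
            Artist = Field(block, 33, 30),
            Album = Field(block, 63, 30),
            Year = Field(block, 93, 4)
        };
    }

    // Fields are fixed width and padded with zero bytes or spaces.
    private static string Field(byte[] block, int offset, int length)
    {
        string text = Encoding.GetEncoding("iso-8859-1").GetString(block, offset, length);
        int terminator = text.IndexOf('\0');
        if (terminator >= 0) text = text.Substring(0, terminator);
        return text.TrimEnd(' ');
    }
}
```

Opening each .mp3 in the folder with File.OpenRead and passing the stream to Read yields the values that would go into the CSV columns.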

In a library? Try libextractor if your software is GPL.

Ok, after the clarification edits, I would suggest looking at the introspection available in .Net. I will warn you however that I think you will get more satisfying results if you forgo introspection and define the specific properties that you want for the file types that you expect to see.
Since scripting is valid, then if this were my problem to solve I would use Powershell since the .net introspection is baked in.

It may not be worth it to add all of the data from a jpeg file (exif data). I would hand pick what attributes I wanted from those files.
