How to parse LDAP Data Interchange Format string in .NET? [duplicate] - c#

I am looking for an LDIF parser for C#. I am trying to parse an LDIF file so that I can check that objects don't exist before adding them. Adding them when they already exist (using ntdsSchemaAdd) causes an entry in the error logs.

A quick websearch revealed: http://wiki.github.com/skradel/Zetetic.Ldap/. They have provided a .net API.
From the page:
Zetetic.Ldap is a .NET library for .NET 2 and above, which makes it easier to work with directory servers (like Active Directory, ADAM, Red Hat Directory Server, and others). Some of the key features of Zetetic.Ldap are:
1. LDIF file parsing and generation – Read and write the file format used for moving data around between directory systems
2. LDAP Entry-oriented API with change tracking – Create and modify directory objects in a more natural way
3. LDAP Schema interrogation – Quick programmatic access to the kinds of objects and fields your directory server understands. Learn if an attribute is a string, a number, a date, etc., without lots of manual research and re-parsing
4. LDIF Pivoter – Turn an LDIF file into a (comma or tab-delimited) flat file for analysis or loading into systems that don’t speak LDIF
We built the Zetetic.Ldap library to make directory projects and programming faster and easier, and release it here in the hopes that others will find it useful too. As far as we know, this is the only .NET library that really understands the LDIF specification.
Download link: http://github.com/downloads/skradel/Zetetic.Ldap/Zetetic.Ldap_20090831.zip

I would parse it myself.
If you look at the LDIF RFC for the EBNF, you'll see that it's not a very complex grammar.
I've reliably parsed large amounts of LDIF using regexes before, though your mileage may vary.
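To give an idea of how small a hand-rolled parser can be, here is a minimal sketch that uses plain string handling rather than regexes. It assumes only the content-record subset of the LDIF RFC (RFC 2849): records separated by blank lines, "attr: value" lines, "attr:: base64" values, continuation lines that start with a single space, and "#" comments. The names LdifSketch and ParseRecords are made up for illustration; change records and "attr:< url" values are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any library: parses LDIF content records only.
public static class LdifSketch
{
    // Each record is a list of (attribute, value) pairs in file order.
    public static List<List<KeyValuePair<string, string>>> ParseRecords(TextReader reader)
    {
        var records = new List<List<KeyValuePair<string, string>>>();
        var current = new List<KeyValuePair<string, string>>();
        var logical = new StringBuilder();   // one unfolded logical line
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length > 0 && line[0] == ' ')
            {
                // Continuation line: drop exactly one leading space and append.
                logical.Append(line, 1, line.Length - 1);
                continue;
            }

            Flush(logical, current);

            if (line.Length == 0)
            {
                // A blank line ends the current record.
                if (current.Count > 0)
                {
                    records.Add(current);
                    current = new List<KeyValuePair<string, string>>();
                }
                continue;
            }

            logical.Append(line);
        }

        Flush(logical, current);
        if (current.Count > 0) records.Add(current);
        return records;
    }

    // Turns the buffered logical line into an attribute/value pair.
    private static void Flush(StringBuilder logical, List<KeyValuePair<string, string>> record)
    {
        if (logical.Length == 0) return;
        string text = logical.ToString();
        logical.Length = 0;

        if (text[0] == '#') return;                           // comment line

        int colon = text.IndexOf(':');
        if (colon < 0) throw new FormatException("Missing ':' in LDIF line: " + text);

        string name = text.Substring(0, colon);
        if (name == "version" && record.Count == 0) return;   // skip the file header

        string rest = text.Substring(colon + 1);
        string value;
        if (rest.Length > 0 && rest[0] == ':')
        {
            // "attr:: ..." carries a base64 value; assumed here to be UTF-8 text.
            value = Encoding.UTF8.GetString(Convert.FromBase64String(rest.Substring(1).Trim()));
        }
        else
        {
            value = rest.TrimStart(' ');
        }
        record.Add(new KeyValuePair<string, string>(name, value));
    }
}
```

The "dn" value of each record could then be looked up in the directory before deciding whether to add the object.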

Related

Should an application settings file work with the same program written in different languages

I am writing a program in C# that I plan to have rewritten in different programming languages, for other platforms (namely Mac OSX) in the future. The application needs to save some user-defined settings. My question is, should I save the settings file in such a format that can be read by any programming language, or should I use the app.config built into .NET?
Also what are the benefits of each?
Well, it depends on how you're planning on implementing it in different platforms. If you use Mono, for example, app.config would be fine. But otherwise, I'd consider using a more neutral file format. XML is the obvious option - particularly if it doesn't need to be hand edited. Or you could use JSON, or a simple key = value one-entry-per-line format.
Of course app.config is XML, but it's an XML format which is specific to .NET. The format used by app.config isn't one you'd normally come up with if you were creating your own settings file in general. I'm sure you'd be able to read an app.config file from other platforms without too much hassle, but it wouldn't be the most natural way of going about it.
Choice of format would partly be dictated by what the settings are - is a single unstructured list good enough, or would you benefit from the tree hierarchy given by XML, for example?
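As an illustration of how little code the "key = value" option needs, here is a minimal sketch of a reader and writer. The class name SimpleSettings, the "#" comment convention and the case-insensitive keys are assumptions made for this example, not an established format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for a one-entry-per-line "key = value" settings file.
public static class SimpleSettings
{
    public static Dictionary<string, string> Load(TextReader reader)
    {
        var settings = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0 || line[0] == '#') continue;   // blank line or comment (assumed syntax)

            int eq = line.IndexOf('=');
            if (eq <= 0) continue;                              // skip lines without a key

            // Everything after the first '=' is the value, so values may contain '='.
            settings[line.Substring(0, eq).Trim()] = line.Substring(eq + 1).Trim();
        }
        return settings;
    }

    public static void Save(TextWriter writer, IDictionary<string, string> settings)
    {
        foreach (KeyValuePair<string, string> pair in settings)
        {
            writer.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

A format this simple is easy to re-implement on another platform, at the cost of having no nesting and no multi-line values.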

How to make use of SharpNlp in my C# application

I require POS tagging for the files in my corpus.
I have successfully followed the installation instructions for SharpNlp.
I am using the binary version.
I created a new C# project in: E:\sharp\sharpapp
The Models folder is located at: E:\sharp\sharpapp\bin\Models
The SharpNlp binary is located at: E:\sharp\SharpNLP-1.0.2529-Bin
I have also followed the instructions to modify both .config files, "ParseTree.Exe" and "ToolsExamples.Exe".
Now in my C# project I have a class called tagging.cs, where I need to read my corpus text files and POS-tag them. Can anybody explain how I can use SharpNlp to do so?
Please provide the steps.
In a nutshell, SharpNLP is:
a port to C# of OpenNLP Tools and OpenNLP MaxEnt
a connector to WordNet
a set of pre-computed models, mostly for the English language
utility modules such as integration with SQLite
It should be noted that the port of the OpenNLP libraries is relatively informal, with various class and property name changes, possibly loose preservation of features and semantics and no apparent connection with the original Java projects' lifecycle. This situation will likely ensure that in time the OpenNLP portion of SharpNLP will be more akin to distant cousins than twin sisters...
Nevertheless, it is possible to use examples and documentation from OpenNLP to complement the relatively thin support material available with SharpNLP. Between the source code of SharpNLP and resources like the OpenNLP API reference and the OpenNLP wiki, one can generally map things and adapt accordingly.
A rough guide could be to study this particular source file, which makes use of OpenNLP in a way that seems close to what you may need. Note the name changes between OpenNLP and SharpNLP: for example, the POSTaggerME class becomes MaximumEntropyPosTagger, and the Parse() method and its overload turn into TagSentence() and such.
A more general hint is to understand...
...the sequence of steps typically necessary to perform POS Tagging.
This is a very high-level approximate description but, I think, useful.
get the text to be tagged = string(s) of text
Initialize a text parser
parse it = an "array" (or other container) with individual tokens i.e. words and punctuation characters.
initialize the POS Tagger, in particular tell it which model it should use
feed the [ordered] sequence of tokens to the POS Tagger
Ta dah! Use the POS tags for the eventual purpose of your NLP application.
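The shape of that sequence can be sketched in a few lines. Everything below is a toy stand-in written for illustration: ITokenizer, IPosTagger, SimpleTokenizer and LookupTagger are hypothetical names, not SharpNLP types, and the "model" is just a word-to-tag dictionary instead of a trained maximum entropy model. With SharpNLP, the real tokenizer and tagger classes would take the place of the two stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical interfaces mirroring the two stages of the pipeline.
public interface ITokenizer { string[] Tokenize(string text); }
public interface IPosTagger { string[] Tag(string[] tokens); }

// Toy tokenizer: splits on whitespace and emits punctuation as separate tokens.
public sealed class SimpleTokenizer : ITokenizer
{
    public string[] Tokenize(string text)
    {
        var tokens = new List<string>();
        var word = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c)) { word.Append(c); continue; }
            if (word.Length > 0) { tokens.Add(word.ToString()); word.Length = 0; }
            if (!char.IsWhiteSpace(c)) tokens.Add(c.ToString());
        }
        if (word.Length > 0) tokens.Add(word.ToString());
        return tokens.ToArray();
    }
}

// Toy tagger: the "model" is a lookup table; unknown words default to NN.
public sealed class LookupTagger : IPosTagger
{
    private readonly IDictionary<string, string> model;

    public LookupTagger(IDictionary<string, string> model) { this.model = model; }

    public string[] Tag(string[] tokens)
    {
        var tags = new string[tokens.Length];
        for (int i = 0; i < tokens.Length; i++)
        {
            string tag;
            tags[i] = model.TryGetValue(tokens[i].ToLowerInvariant(), out tag) ? tag : "NN";
        }
        return tags;
    }
}
```

Usage follows the steps one to one: create the tokenizer, call Tokenize on the text, create the tagger with its model, then pass the token array to Tag and pair each token with the tag at the same index.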
Note how the above sequence assumes that the model is readily available.
The model is a representation of the statistical "profile" of text in general, obtained from training the Tagger with a set of text readily tagged.
SharpNLP comes with a model for generic English language, but in order to tag other languages or if the specific corpora to be tagged belongs to a particular domain (say medical reports or Tweets or...) it may be preferable to re-train the tagger to improve its precision.
Open/SharpNLP, like most POS Taggers (whether stand-alone tools or APIs), typically includes features to train the tagger (= to produce a model from a sample set of already-tagged text) and also to verify the quality of the model/tagger so produced (= to compare the tags produced on a test set with the tags expected for this set).
Kindly read through the article that I have written for this. It will give you a detailed step by step method with sample code snippets.
Easy way of Integrating SharpNLP into your project in Visual Studio
I hope this was useful.

Writing BibTex parser

How should I start writing a parser for BibTex files? As the initial design I see the following steps:
List down the grammar
Build a tokenizer
Parse the token stream against the grammar
We also need some error mechanism, so that users uploading BibTex files can see the line numbers of the errors in their files. I am looking for community opinion on how to approach this problem.
(Please point out any existing open source C# or VB.NET BibTex parsers.)
There are many tools available to assist you with this, such as ANTLR or the GOLD Parsing System. I usually use the latter to create my parser grammars.
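If you go the hand-written route instead, the tokenizer step with line tracking might look like the sketch below. It is an illustration under simplifying assumptions, not a complete BibTex lexer: the token kinds and the names BibTokenKind, BibToken and BibTokenizer are invented here, a "{...}" or "..." group is treated as a value only when it directly follows "=", and "#" concatenation is not supported.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public enum BibTokenKind { At, LBrace, RBrace, Comma, Equals, Name, Value }

public struct BibToken
{
    public BibTokenKind Kind;
    public string Text;
    public int Line;   // 1-based line where the token starts

    public BibToken(BibTokenKind kind, string text, int line)
    {
        Kind = kind; Text = text; Line = line;
    }
}

// Hypothetical sketch: every token (and every error) carries its line number.
public static class BibTokenizer
{
    public static List<BibToken> Tokenize(string input)
    {
        var tokens = new List<BibToken>();
        int line = 1, i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c == '\n') { line++; i++; continue; }
            if (char.IsWhiteSpace(c)) { i++; continue; }

            bool afterEquals = tokens.Count > 0 && tokens[tokens.Count - 1].Kind == BibTokenKind.Equals;
            if (afterEquals && (c == '{' || c == '"'))
            {
                int startLine = line;
                string value = ReadDelimited(input, ref i, ref line);
                tokens.Add(new BibToken(BibTokenKind.Value, value, startLine));
                continue;
            }

            switch (c)
            {
                case '@': tokens.Add(new BibToken(BibTokenKind.At, "@", line)); i++; continue;
                case '{': tokens.Add(new BibToken(BibTokenKind.LBrace, "{", line)); i++; continue;
                case '}': tokens.Add(new BibToken(BibTokenKind.RBrace, "}", line)); i++; continue;
                case ',': tokens.Add(new BibToken(BibTokenKind.Comma, ",", line)); i++; continue;
                case '=': tokens.Add(new BibToken(BibTokenKind.Equals, "=", line)); i++; continue;
            }

            if (IsNameChar(c))
            {
                // Entry types, citation keys, field names and bare numbers.
                int start = i;
                while (i < input.Length && IsNameChar(input[i])) i++;
                tokens.Add(new BibToken(BibTokenKind.Name, input.Substring(start, i - start), line));
                continue;
            }

            throw new FormatException("Line " + line + ": unexpected character '" + c + "'");
        }
        return tokens;
    }

    private static bool IsNameChar(char c)
    {
        return char.IsLetterOrDigit(c) || "_-:.+/".IndexOf(c) >= 0;
    }

    // Reads a {...} or "..." value, keeping nested braces as part of the text.
    private static string ReadDelimited(string input, ref int i, ref int line)
    {
        int startLine = line;
        char open = input[i++];
        int depth = 0;
        var text = new StringBuilder();
        while (i < input.Length)
        {
            char c = input[i++];
            if (c == '\n') line++;
            if (c == '{') depth++;
            else if (c == '}')
            {
                if (depth == 0 && open == '{') return text.ToString();
                if (depth > 0) depth--;
            }
            else if (c == '"' && open == '"' && depth == 0) return text.ToString();
            text.Append(c);
        }
        throw new FormatException("Line " + startLine + ": unterminated value");
    }
}
```

Because each token records its line, the parsing step can report grammar errors (for example a missing "=" after a field name) with the same "Line N: ..." style of message.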
I've published an open source library for BibTex format (load/save/export to Excel), allowing both non-typed (Key/Value dictionary) and strong typed access to the BibTex entries.
It might not fit your purpose well, as it is weak on validation (it has none :) ), but it might help anyway:
Nuget Package
GitHub repository
About the package on my web site

Combining Multiple Files Into Single Archive (Silverlight/C#)

In Silverlight one does not have access to the entire .NET Library and therefore I am considering the best way to get the functionality I would have courtesy of System.IO.Packaging.
I have multiple text files and I want to combine them into a single archive. Compression is not important but could wind up being valuable.
By instinct I would select some obscure characters as BOF/EOF tokens and then use a single stream to hold the multiple files, with each BOF/EOF pair marking off a single file. I'd probably come up with a format to retrieve the original file name after the BOF as well.
But before I operate on a poor man's instinctive approach, is there a canonical approach to this? Or anyone who has done this before with some words of advice based on experience?
SharpZipLib works on the Compact Framework, so there is no reason it won't work in Silverlight.
As for licensing (From their page):
"In plain English this means you can use this library in commercial closed-source applications."
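If you do end up rolling your own container, a common alternative to the BOF/EOF sentinel characters from the question is to prefix each entry with its name and length, so file contents can never collide with a marker. The sketch below is an uncompressed illustration only: SimpleArchive is a made-up name, the layout (entry count, then name, byte length and bytes per file) is invented for this example, and it assumes BinaryWriter and BinaryReader are available in your Silverlight profile.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical length-prefixed container for several text files in one stream.
public static class SimpleArchive
{
    public static void Write(Stream output, IDictionary<string, string> files)
    {
        // The writer is not disposed, so the caller keeps ownership of the stream.
        var writer = new BinaryWriter(output, Encoding.UTF8);
        writer.Write(files.Count);
        foreach (KeyValuePair<string, string> file in files)
        {
            byte[] content = Encoding.UTF8.GetBytes(file.Value);
            writer.Write(file.Key);         // length-prefixed file name
            writer.Write(content.Length);   // number of content bytes that follow
            writer.Write(content);
        }
        writer.Flush();
    }

    public static Dictionary<string, string> Read(Stream input)
    {
        var reader = new BinaryReader(input, Encoding.UTF8);
        int count = reader.ReadInt32();
        var files = new Dictionary<string, string>();
        for (int n = 0; n < count; n++)
        {
            string name = reader.ReadString();
            int length = reader.ReadInt32();
            byte[] content = reader.ReadBytes(length);
            if (content.Length != length)
            {
                throw new EndOfStreamException("Archive truncated inside entry: " + name);
            }
            files[name] = Encoding.UTF8.GetString(content, 0, content.Length);
        }
        return files;
    }
}
```

The length prefix also lets a reader skip over entries it does not need, which sentinel-based formats cannot do without scanning the content.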

How to get all file attributes including author, title, mp3 tags, etc, in one sweep

I would like to write all meta data (including advanced summary properties) for my files in a windows folder to a csv file. Is there a way to collect all the attributes? I see mp3 files have a different set of attributes compared to jpg files. (c#)
This can also be a script (vb, perl)
Update: by looking at libextractor (thank you) I can see this can be achieved by writing different plugins for different type of files. I gather this meta data is not a simple collection...
In Perl, you can use MP3::Tag or MP3::Info
If you can cope w/ VB.Net: http://www.codeproject.com/KB/vb/mp3id3v1.aspx
If you can cope w/ C++/.Net: http://www.codeproject.com/KB/audio-video/mp3fileinfo.aspx
For either (assuming the C++ is compiled to .Net), you can use Reflector to disassemble the binary and convert it to C#. Check w/ the respective authors about their licenses first (usually Code Project articles are under an open license like CPOL).
In a library? Try libextractor if your software is GPL.
Ok, after the clarification edits, I would suggest looking at the introspection available in .Net. I will warn you however that I think you will get more satisfying results if you forgo introspection and define the specific properties that you want for the file types that you expect to see.
Since scripting is an option, if this were my problem to solve I would use PowerShell, since the .NET introspection is baked in.
It may not be worth it to add all of the data from a jpeg file (exif data). I would hand pick what attributes I wanted from those files.
