My application takes input from text/Excel files, performs validations, and then runs database operations. Similarly, it reads data from the database and writes it out to text/Excel files. I am currently using the StreamReader and StreamWriter classes from the System.IO namespace for text-file input/output, but I believe current frameworks offer other options that will perform better than my current approach. I would like to know what other approaches can be used for such tasks. Please share links to books or tutorials on the subject.
What you're describing is typically referred to as ETL, which is short for Extract, Transform, and Load.
The default ETL tool for the C# programmer is SQL Server Integration Services (SSIS) because of its .NET integration. Note that it doesn't require a SQL database as either the source or the target; it just acts as the broker.
CSV Reader is a C#-only solution that comes at a fairly reasonable price. This means the ETL runs in the context of your application. If you're writing an application where a user picks a file and loads it manually, this is a nice option. If you need automation, you'll have to write a Windows service or use a scheduler.
In the open-source space there's Rhino ETL.
I'd suggest using GemBox to perform I/O on Excel files:
http://www.gemboxsoftware.com/spreadsheet/overview
For text files I believe System.IO is the best approach; see the sketch below.
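A minimal sketch of that approach, assuming simple line-oriented files (the file names are placeholders). File.ReadLines streams the input lazily instead of loading the whole file into memory, which is usually all the performance you need:

```csharp
using System.IO;

// Minimal sketch: stream a text file line by line and write the result.
// "input.txt" / "output.txt" are placeholder names.
class TextCopy
{
    static void Main()
    {
        using (var writer = new StreamWriter("output.txt"))
        {
            // ReadLines streams the file lazily, unlike File.ReadAllLines,
            // so memory use stays flat even for very large files.
            foreach (var line in File.ReadLines("input.txt"))
            {
                // Validation/transformation of each record would go here.
                writer.WriteLine(line);
            }
        }
    }
}
```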
Maybe I did not fully understand how complex Hadoop really is; if anything below is incorrect, please help me out. What I got so far is this:
Hadoop is a great thing for handling large amounts of data, mostly for data analysis and mining. I can write my own MapReduce functions or use Pig or Hive. I can even use existing functions, word count and the like; I don't even have to write code.
OK, but what if I would like to use the power of Hadoop for non-analysis/mining things? For example, I have a .NET application written in C# that reads files and generates PDFs with some barcodes. This application runs on one server, but because one server cannot handle the large number of files, I need more power. Why not add some Hadoop nodes/clusters to handle this job?
Question: can I take my .NET application and tell Hadoop "run this on every node in your cluster"? In other words, can I run these jobs without writing new code?
If not, do I have to throw away the .NET application and rewrite everything in Pig/Hive/Java MapReduce? Or how do people solve this kind of problem in my situation?
PS: The important thing here is not the PDF generator, and maybe not even .NET/C#. The question is: given an application in whatever language, can I hand it to Hadoop just like that? Or do we have to rewrite everything as MapReduce functions?
@Mongo: I'm not sure if I understood correctly, but I'll try to share what I know. First of all, Hadoop is a framework, not an extension or a plugin.
If you want to process files or perform a task in Hadoop, you need to make sure your requirements are expressed in a way the framework understands. To put it simply, consider the same word-count example. If you want to perform a word count on a file, you can do it in any language. Let's say we have done it in Java and want to scale it to larger files: dumping the same code onto a Hadoop cluster would not help. Although the Java logic remains the same, you have to write MapReduce code that the Hadoop framework understands.
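To make that concrete, here is a minimal sketch of the mapper half of word count, written in C# against the streaming contract (records in on stdin, key/value pairs out on stdout). It is illustrative of the shape MapReduce code takes, not the Java MapReduce API itself:

```csharp
using System;

// Sketch of a word-count mapper in streaming form: one input line in,
// one "word<TAB>1" pair out per word. A reducer would then sum the
// counts for each distinct word.
class WordCountMapper
{
    static void Main()
    {
        string line;
        while ((line = Console.ReadLine()) != null)
        {
            var words = line.Split(new[] { ' ', '\t' },
                                   StringSplitOptions.RemoveEmptyEntries);
            foreach (var word in words)
                Console.WriteLine(word + "\t1");
        }
    }
}
```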
Here's an example of a C# MapReduce program for Hadoop processing.
Here's another example of MapReduce Without Hadoop Using the ASP.NET Pipeline
Hope this is helpful; I'm assuming my post adds some value to your question. I'm sure you'll get better thoughts/suggestions/answers from the wonderful people here...
P.S.: You can do pretty much anything related to file processing or data analysis in Hadoop. It all depends on how you do it :)
Cheers!
Any application that can run on Linux can be run on Hadoop using Hadoop Streaming. And a C# application can run on Linux using Mono.
So you can run your C# application using both Hadoop Streaming and Mono. But you still need to adapt your logic to the map-reduce paradigm.
However, that should not be a big deal in your case. For instance, you could:
create a Hadoop Streaming job with mappers only (no reducers)
process exactly one file per mapper
have each mapper run "mono yourApp.exe", reading the input file from stdin and writing the output to stdout (see the sketch after this list)
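A minimal sketch of that streaming contract, assuming the app can be reshaped to take its input from stdin. GeneratePdf and the output location are hypothetical stand-ins for the real application logic:

```csharp
using System;
using System.IO;

// Sketch of the mapper described above. Hadoop Streaming pipes the
// input file into stdin; whatever the process writes to stdout becomes
// the job output.
class PdfMapper
{
    static void Main()
    {
        string input = Console.In.ReadToEnd();    // the whole input file
        byte[] pdf = GeneratePdf(input);          // existing app logic goes here

        // Emit something Hadoop can collect, e.g. where the PDF landed.
        string outputPath = "/pdfs/" + Guid.NewGuid() + ".pdf";
        File.WriteAllBytes(outputPath, pdf);
        Console.WriteLine(outputPath);
    }

    static byte[] GeneratePdf(string content)
    {
        // Placeholder for the real barcode/PDF generation.
        return System.Text.Encoding.UTF8.GetBytes(content);
    }
}
```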
Also, Mono must be available on the Hadoop cluster. If it isn't, you'll need admin privileges to install and deploy it yourself.
I'm looking for options for connecting to UNIX/AIX/Business Basic systems (primarily to read data) from Windows.
I program in C# mostly so would need a .NET solution.
Solutions or comments are welcome.
It depends on which Business Basic compiler you are using. The most common is BBx, so I will answer based on that particular compiler. BBx runs seamlessly on Windows and Unix platforms; you can move programs between them without recompiling. But you must have the compiler to run them. Remember that BBx, ProvideX, Thoroughbred Basic, etc. are all M-code compilers, not P-code compilers.
Most people use the utility program that comes with BBx to exchange data between the BBx environment and a web site. Find the BBx manuals that come with the compiler. You can also use an optional ODBC driver; call Basis International in New Mexico for it. Later versions of BBx can also read/write SQL databases and other types of file systems, but most BBx programmers use the keyed file system that comes with it. You can also read/write ASCII files in BBx.
Please note: BBx and all other Business Basic compilers do NOT use flat ASCII files. They can write flat ASCII files, but in my 30 years I have never seen anyone use them as a file structure; they are only used to import/export data to/from BBx. The keyed files have a SIT area, a keyed area, and a data area. You NEED to read these files in BBx. If you use C or some other language to alter the data in the file, you corrupt the checksum and the file becomes useless. And you will have one very pissed-off customer.
You might also consider getting a BBx compiler for your Windows environment to help you; it's a pretty cheap option if you don't have source code on the BBx side. Remember, this ISN'T Basic... it's Business Basic.
You should also get the data structures of the file system on the BBx side; it is very hard to work with the system without them. Some systems have a DBMS on board, and you can just print out the record layouts of the hundreds of files on the system.
It all depends a lot on the format used to store your data. If the data is just flat files, you could use something like rcp or ftp. A number of .NET components, both commercial and open source, are available for this kind of access.
If not, you can look for ODBC drivers. Some vendors sell drivers for BBx, C-ISAM, D-ISAM, etc.
I agree with Mike; the easiest way would be to use ODBC. You can find a driver on the Basis web site, www.basis.com (they are the creators of BBx).
If you want fast, on-the-fly access to the data, you would be better off writing your own back-end in BBx and having it talk to your C# program over sockets.
I have written an Internet banking app (ASP.NET) that talks to a BBx host this way; a rough sketch of the client side follows.
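For illustration only, here is what the C# side of such a socket conversation could look like. The host, port, and line-based request/response protocol are all assumptions; the real protocol is whatever you define on the BBx side:

```csharp
using System;
using System.IO;
using System.Net.Sockets;

// Hypothetical sketch: querying a custom BBx back-end over a socket.
class BbxClient
{
    static void Main()
    {
        using (var client = new TcpClient("bbx-host", 9000)) // placeholder host/port
        using (var stream = client.GetStream())
        using (var writer = new StreamWriter(stream) { AutoFlush = true })
        using (var reader = new StreamReader(stream))
        {
            writer.WriteLine("GET CUSTOMER 1001");  // hypothetical request format
            string response = reader.ReadLine();    // assume one-line replies
            Console.WriteLine(response);
        }
    }
}
```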
If you need further help feel free to ask.
BBx does have an MS-DOS version, with which you can fully access all the databases from MS-DOS. With a tool like DOSBox you can emulate MS-DOS on Windows 7, 8, and 10.
Within Windows you have to assign a drive letter to the network location of your Unix database.
Within BBx it's possible to mount this drive.
We need to have documents shared between clients (CRM-like functionality). Users need to be able to:
Edit the documents and save them again
Attach new documents
Our application is coded in WPF, with WCF for data transport and NHibernate/SQL for data on the server.
What we're thinking is to use SVN and have the app create a local checkout of parts of the repository: when a user clicks a document, it is checked out by SVN in the background and opened from the local path. When saved, it will silently (by monitoring the path) be committed back to the repository; see the sketch below.
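A minimal sketch of the monitoring half, assuming a local working copy and the svn command line available on the client (the path and commit message are placeholders):

```csharp
using System.Diagnostics;
using System.IO;
using System.Threading;

// Sketch: watch the local checkout and silently commit on save.
class CheckoutMonitor
{
    static void Main()
    {
        var watcher = new FileSystemWatcher(@"C:\local-checkout") // placeholder path
        {
            IncludeSubdirectories = true,
            EnableRaisingEvents = true
        };

        watcher.Changed += (sender, e) =>
        {
            // Commit the saved document back to the repository.
            Process.Start("svn", "commit -m \"auto-save\" \"" + e.FullPath + "\"")
                   .WaitForExit();
        };

        Thread.Sleep(Timeout.Infinite); // keep the monitor alive
    }
}
```

In practice the Changed event fires several times per save, so you would want to debounce it before committing.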
Question: Is this feasible - or are there better solutions to this?
EDIT 1:
Summary so far:
I'll look into using Git/Mercurial instead of SVN
Document size (revisions) might be prohibitive pending tests
SharePoint is an option (although not viable in my case, as the cost alone is prohibitive) - I will look into the SharePoint alternatives, though.
Not much experience out there with using repositories for many users, although it works for small teams.
Wiki software might be an alternative to SVN.
Thanks for all the feedback - I'll keep it open a bit longer.
EDIT 2:
Summary after a few days of work - I have a client working - see my progress here.
Based on the heavy .NET references, are you all set up with MSDN? Perhaps you can make use of SharePoint, which may already be included with your MSDN account.
You might also want to consider using a Wiki for document management - I've seen this done and do it myself for my own organisation. We're using Atlassian's Confluence Wiki. Confluence provides for the versioning and general management of documents.
I wouldn't use SVN for this; SVN is not very efficient with binary files. By using SVN as a back channel for some of your application's content, you just complicate things by adding another technology and dependency, and you won't use much of its real potential.
I would store the documents as blobs in the database and get/store them through WCF; a sketch of such a contract follows.
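For illustration, a minimal sketch of what such a WCF contract could look like. The names and shapes are assumptions, not part of the original design:

```csharp
using System.ServiceModel;

// Hypothetical WCF contract for fetching/storing documents as blobs.
[ServiceContract]
public interface IDocumentService
{
    [OperationContract]
    byte[] GetDocument(int documentId);      // returns the stored blob

    [OperationContract]
    int SaveDocument(string fileName, byte[] content); // returns the new id
}
```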
Generally I don't think SVN, or any version control system, is a good choice for sharing documents. The main disadvantage is the diff system on binary files: your SVN repo will grow rapidly.
Maybe you should try one of the commercial tools designed for document sharing (e.g. Microsoft SharePoint), or some open-source alternative. Perhaps you should read this post...
It depends on the kind of documents you are using. If you have lots of frequently changing, compressed binary files, then don't use version control for them.
However, if the documents are in an open format like a Wiki language, (X)HTML, LaTeX or uncompressed ODF, then using a version control system makes absolutely sense. Also, a bunch of compressed ODF files or PDF files are handled very well, especially if the files are mostly smaller than 5 MB or so.
In addition, make sure to look at more recent version control systems like Mercurial and Git before sticking with the conceptually outdated SVN. In your scenario you won't profit much from the "distributed" part of Mercurial and Git, but they are nevertheless easier to set up, at least in my experience. And they provide very advanced version control features which can save the day in the rare cases when you need them.
In case you stick to SVN, and if your client software runs under a modern Unix system, you can also try SVN-FS. This is a filesystem that uses a remote SVN server. Each read goes to the latest revision. Each write creates a new commit. This seems to be exactly what you wanted to build around SVN.
I think that using ready-made and proven tech is a great idea. I would like to see its progress if you really go that way.
I would strongly advise AGAINST SharePoint - you'll tie yourself to Microsoft in ways that are hard to describe here. From my point of view, SharePoint is a technology that needs looking after just for its own sake.
I've accessed Excel files from desktop applications using OleDbReader, interop, and the latest (and my favorite), LINQ to SQL. However, this time I need to do it from a web application, using ASP.NET with C# code-behind.
I don't need to create excel files, only read them. Is it possible to do this with a .xls(x), or should I be shooting for .csv?
Looks like they have lots of different solutions posted here: Reading Excel files from C#
Looks like a combination of people recommending ADO.NET, as long as the Excel file is pretty straightforward (there can be "quirks" depending on what type of data you are storing), or various third-party tools; a sketch of the ADO.NET route follows.
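For reference, a minimal sketch of that ADO.NET route, assuming the ACE OLE DB provider is installed on the web server and a worksheet named Sheet1 (both are assumptions):

```csharp
using System.Data;
using System.Data.OleDb;

// Sketch: read a worksheet into a DataTable via ADO.NET/OleDb.
class ExcelReader
{
    public static DataTable ReadSheet(string path)
    {
        string connStr =
            "Provider=Microsoft.ACE.OLEDB.12.0;" +
            "Data Source=" + path + ";" +
            "Extended Properties=\"Excel 12.0 Xml;HDR=YES\"";

        using (var conn = new OleDbConnection(connStr))
        using (var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", conn))
        {
            var table = new DataTable();
            adapter.Fill(table); // Fill opens and closes the connection itself
            return table;
        }
    }
}
```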
I'd recommend going with CSV. We have a lot of difficulty working with .xls files on web servers, mainly because of Microsoft's restrictive licensing: you'd have to have an Excel license for everyone accessing the file... well, maybe. That's how it is for us anyway; it might be different under other circumstances. Anyhow, it's not really practical.
Does anyone know of a third-party data import wizard that can be embedded into applications? It should import from Excel, Access, SQL Server, CSV, tab-separated flat files, XML, Oracle, etc. We have a fixed data structure within our application, and the user should be able to configure the wizard to map his/her import fields to our own data structure.
The wizard should be a library of sorts, preferably a .NET library. We may want it both web-based and desktop-based (hence we may need an ASP.NET controls version and a WinForms version). We may also want WPF and Silverlight integration.
If there's no UI wizard available, does anyone know of a non-UI library that supports easily configurable import from many, many different data sources?
A possible solution would be to use SQL Server Integration Services (SSIS).
The client can develop a custom package, and the system can run that package to import the data; a sketch of running one programmatically follows.
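For illustration, a minimal sketch of executing a user-supplied package from C# via the SSIS runtime API. The package path is a placeholder, and the project needs a reference to Microsoft.SqlServer.ManagedDTS:

```csharp
using System;
using Microsoft.SqlServer.Dts.Runtime;

// Sketch: load and run an SSIS package from within the application.
class PackageRunner
{
    static void Main()
    {
        var app = new Application();
        Package package = app.LoadPackage(@"C:\packages\CustomImport.dtsx", null);
        DTSExecResult result = package.Execute(); // Success or Failure
        Console.WriteLine("Import result: " + result);
    }
}
```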
The primary problem here is that the requirements are nearly impossible to fulfill in a generic way that is still easy enough for an end user to use.
Any import tool would have to be programmed to know a great deal about your data structures, relationships, and business logic.
The actual act of performing the import, or having a few screens to step through it, is so minor in comparison (usually less than 5% of the work) that there's almost no point in building a "generic" tool for other coders to use.
Now, if you don't mind exposing a lot of complexity to your end users AND allowing them to "forget" about your business logic and potentially screw up the imports, then by all means hand them something like SSIS.
However, if you need to control what goes in, validate it, and generally make sure it's not going to crater your system, then you'll have to code this one yourself; see the sketch below.
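If you do code it yourself, the core is usually just a user-editable column mapping plus a validation hook before anything touches the database. A minimal, entirely hypothetical sketch:

```csharp
using System;
using System.Collections.Generic;

// Sketch: map user-supplied source columns onto the application's
// fixed fields, failing fast when a mapped column is missing.
class FieldMappingImporter
{
    // User-configured, e.g. {"Cust Name" -> "CustomerName", "Tel." -> "Phone"}.
    public Dictionary<string, string> ColumnMap { get; } =
        new Dictionary<string, string>();

    public Dictionary<string, string> MapRow(IDictionary<string, string> sourceRow)
    {
        var target = new Dictionary<string, string>();
        foreach (var pair in ColumnMap)
        {
            if (!sourceRow.TryGetValue(pair.Key, out var value))
                throw new InvalidOperationException(
                    "Missing source column: " + pair.Key);

            // Per-field validation/conversion and business rules go here,
            // before the row is ever handed to the database layer.
            target[pair.Value] = value;
        }
        return target;
    }
}
```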