How to connect to Hadoop/Hive from .NET - c#

I am working on a solution where I will have a Hadoop cluster running Hive, and I want to send jobs and Hive queries from a .NET application to be processed and get notified when they are done. I can't find any way to interface with Hadoop other than directly from a Java app. Is there an API I can access that I am just not finding?

Apparently it is possible to connect to Hadoop with non-Java solutions; see "Do I have to write my application in Java?"

With Hadoop, there is no straightforward way to connect from C#, because the Hadoop communication layer works only with Java and is not cross-platform. It is probably possible, but only in very non-trivial ways.
I know there is a patch to add Protocol Buffers support to Hadoop, but at the time of writing (Aug 2011) it has not been released yet.
With Hive the situation is better, because Hive has a Thrift interface and Thrift supports C#. You can download the Hive Thrift interface definitions and generate a C# client on your own, but beware that it requires some hacking of the generated code. Instead, I would recommend downloading the DLL from https://bitbucket.org/vadim/hive-sharp/downloads/hive-sharp-lib.dll or using the NuGet package manager and searching for "hive": http://nuget.org/List/Packages/Hive.Sharp.Lib
Disclaimer: I'm the author.

There is the Hortonworks ODBC driver. I haven't used it personally, but it should let you work with Hive as with any other ODBC data source. Once the ODBC driver is installed, you can use the OdbcConnection class to connect to Hive.
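A minimal sketch of the ODBC route, assuming the Hive ODBC driver is installed and a DSN has already been configured in the ODBC Data Source Administrator (the DSN name in the usage line is an assumption; substitute your own). It uses only the standard `System.Data.Odbc` types (`OdbcConnection`, `OdbcCommand`, `OdbcDataReader`); on .NET Core/.NET 5+ these come from the `System.Data.Odbc` NuGet package.

```csharp
using System.Collections.Generic;
using System.Data.Odbc;

public static class HiveOdbcQuery
{
    // DSN-based connection string; host, port and authentication live in the DSN.
    public static string DsnConnectionString(string dsnName) => $"DSN={dsnName};";

    // Runs a HiveQL statement through the ODBC driver and returns each row
    // as an array of column values.
    public static List<object[]> Query(string connectionString, string hiveQl)
    {
        var rows = new List<object[]>();
        using (var connection = new OdbcConnection(connectionString))
        using (var command = new OdbcCommand(hiveQl, connection))
        {
            connection.Open();
            using (OdbcDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    var values = new object[reader.FieldCount];
                    reader.GetValues(values);
                    rows.Add(values);
                }
            }
        }
        return rows;
    }
}
```

Usage might look like `HiveOdbcQuery.Query(HiveOdbcQuery.DsnConnectionString("Sample Microsoft Hive DSN"), "SELECT * FROM my_table LIMIT 10")`, where `my_table` is a placeholder. Hive queries often run far longer than the default 30-second `CommandTimeout`, so you may need to raise it on the command.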
As noted in other answers, you can use the Thrift API. For that you need to generate C# classes from the interface definition files, which you can download from the Hive source repository. This approach works for me.
You can use IKVM to convert the Hadoop client Java libraries into .NET assemblies that you can use from C#. I haven't used IKVM with the Hive client, but I have converted another Hadoop client library with IKVM and, surprisingly, it worked.
EDIT:
There's also Apache Templeton, which allows submitting Hive jobs (and Pig and MapReduce jobs) through a REST interface. The problem with it is that it spawns an additional map task to submit the Hive job, which makes it slower.
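A sketch of submitting a Hive query through Templeton (later renamed WebHCat) with `HttpClient`, assuming the server listens at a base URL such as `http://host:50111` and the cluster uses simple (non-Kerberos) authentication via the `user.name` parameter. The form fields `execute`, `statusdir` and `callback` and the `/templeton/v1/hive` and `/templeton/v1/jobs/{id}` paths follow the WebHCat REST documentation as I understand it (older releases used `/queue/` instead of `/jobs/`); please verify them against your version.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class TempletonHiveClient
{
    // Form fields for POST /templeton/v1/hive. "statusdir" is an HDFS
    // directory where Templeton writes the job's output and logs.
    public static Dictionary<string, string> BuildHiveForm(
        string userName, string hiveQl, string statusDir, string callbackUrl = null)
    {
        var form = new Dictionary<string, string>
        {
            ["user.name"] = userName,
            ["execute"] = hiveQl,
            ["statusdir"] = statusDir,
        };
        // Optional completion notification: Templeton calls this URL when the
        // job finishes, substituting the job id for "$jobId".
        if (callbackUrl != null) form["callback"] = callbackUrl;
        return form;
    }

    // Submits the query and returns the job id from a response like {"id":"job_..."}.
    public static async Task<string> SubmitAsync(HttpClient http, string baseUrl, Dictionary<string, string> form)
    {
        using (var content = new FormUrlEncodedContent(form))
        using (HttpResponseMessage response = await http.PostAsync(baseUrl.TrimEnd('/') + "/templeton/v1/hive", content))
        {
            response.EnsureSuccessStatusCode();
            string json = await response.Content.ReadAsStringAsync();
            using (JsonDocument doc = JsonDocument.Parse(json))
                return doc.RootElement.GetProperty("id").GetString();
        }
    }

    // Polls the job status; returns the raw JSON status document.
    public static Task<string> GetStatusJsonAsync(HttpClient http, string baseUrl, string jobId, string userName) =>
        http.GetStringAsync($"{baseUrl.TrimEnd('/')}/templeton/v1/jobs/{Uri.EscapeDataString(jobId)}" +
                            $"?user.name={Uri.EscapeDataString(userName)}");
}
```

The `callback` field is one way to meet the "notify me when done" requirement without polling; polling the status endpoint works as a fallback when the cluster cannot reach your application.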

It is possible to access Hive from C# using Microsoft's ODBC connector. Download the NuGet package "Microsoft.Hadoop.Hive" and follow the example provided at http://msdn.microsoft.com/en-us/library/dn749834.aspx
The trick lies in building the connection string. The best way I came up with was to download and install the Microsoft Hive ODBC Driver (http://www.microsoft.com/en-us/download/details.aspx?id=40886), then use Server Explorer inside Visual Studio to add a new connection and let it build the connection string for me. To do this, I used the following steps:
1. Change the data source to "Microsoft ODBC Data Source" and make sure the data provider is ".NET Framework Data Provider for ODBC".
2. Under "Data source specification", check "Use connection string", then click the "Build" button.
3. On the "Machine Data Source" tab, select the "Sample Microsoft Hive DSN" data source name, then click "OK".
4. A window titled "Microsoft Hive ODBC Driver Connection Dialog" opens. Enter an optional description, then type in the address of your Hive server, the port to use, and the database to connect to. Select the Hive Server Type, choose an authentication mechanism, and fill out the corresponding fields.
5. Click the "Test" button at the bottom to make sure you can connect. If it succeeds, click "OK" to return to the "Modify Connection" window, and enter the login information for your Hive service there.
6. Either use this data source directly or copy the connection string it built for you into your application.
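If you prefer not to depend on a machine DSN, the string the dialog produces can also be assembled in code. This is a sketch only: the driver name `Microsoft Hive ODBC Driver` and the keys `Host`, `Port`, `Schema`, `HiveServerType`, `AuthMech`, `UID` and `PWD` are my assumptions based on the Simba-based driver's documentation (for example, `AuthMech=3` meaning user name and password), so compare them with the string Server Explorer generates for you. Values are brace-quoted following the usual ODBC convention.

```csharp
using System.Globalization;
using System.Text;

public static class HiveConnectionString
{
    // ODBC rule: values containing ';', '{', '}' or leading/trailing spaces
    // are wrapped in braces, and any '}' inside is doubled.
    public static string QuoteOdbcValue(string value)
    {
        bool needsBraces = value.IndexOfAny(new[] { ';', '{', '}' }) >= 0 || value.Trim() != value;
        return needsBraces ? "{" + value.Replace("}", "}}") + "}" : value;
    }

    // Builds a DSN-less connection string (key names are assumptions, see above).
    public static string Build(string host, int port, string schema, int authMech, string user, string password)
    {
        var pairs = new (string Key, string Value)[]
        {
            ("Host", host),
            ("Port", port.ToString(CultureInfo.InvariantCulture)),
            ("Schema", schema),
            ("HiveServerType", "2"), // 2 = HiveServer2
            ("AuthMech", authMech.ToString(CultureInfo.InvariantCulture)),
            ("UID", user),
            ("PWD", password),
        };
        var sb = new StringBuilder("Driver={Microsoft Hive ODBC Driver}");
        foreach (var (key, value) in pairs)
            sb.Append(';').Append(key).Append('=').Append(QuoteOdbcValue(value));
        return sb.ToString();
    }
}
```

The resulting string can be passed straight to `new OdbcConnection(...)`.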

The Thrift API is another way for other languages to access HDFS and Hive.

See if this helps; I have tried connecting to Hadoop via C#:
How to communicate to Hadoop via Hive using .NET/C#

Use the Hbase.Net library from https://hbasenet.codeplex.com/
Then you can connect to HBase/Hive as shown below:
Client c = new Client("10.20.14.179", 9090, 1000000);
var cli = c.TotalClients;
var tableList = c.GetTableNames();
FYI, we are using the Hortonworks sandbox and it connects fine.
In the example above, 10.20.14.179 is the host and 9090 is the port.
Also, the following answer from https://community.hortonworks.com/questions/25101/is-there-a-way-to-connect-to-hbase-using-c.html might help:
There is no native C# HBase client. However, there are several options for interacting with HBase from C#.
C# HBase Thrift client - Thrift allows for defining service endpoints and data models in a common format and using code generators to create language-specific bindings. HBase provides a Thrift server and definitions. There are many examples online for creating a C# HBase Thrift client.
Marlin - Marlin is a C# client for interacting with Stargate (the HBase REST API) that ultimately became hbase-sdk-for-net. I have not personally tested this against HBase 1.x+, but considering it uses Stargate, I expect it should work. If you are planning to use Stargate and implement your own client, which I would recommend over Thrift, make sure to use protobufs to avoid the JSON serialization overhead. Using an HTTP-based approach also makes it much easier to load-balance requests over multiple gateways.
Phoenix Query Server - Phoenix is a SQL skin on HBase. Phoenix Query Server is a REST API for submitting SQL queries to Phoenix. Here is some example code; however, I have not yet tested it.
Simba HBase ODBC Driver - Using ODBC to connect to HBase. I've heard positive feedback on this approach, especially from tools like Tableau. This is not open source and requires purchasing a license.
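For the "implement your own Stargate client" option, here is a minimal sketch that reads one row from the HBase REST gateway with `HttpClient`. It assumes the gateway's base URL (often on port 8080) and uses JSON rather than the protobuf encoding recommended above, purely for readability. In the JSON representation, row keys, column names (`family:qualifier`) and cell values (the `$` field) are Base64-encoded; decoding values as UTF-8 text is an assumption that only holds if you stored strings.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class HBaseRestReader
{
    // GET {baseUrl}/{table}/{rowKey} with "Accept: application/json".
    public static async Task<Dictionary<string, string>> GetRowAsync(
        HttpClient http, string baseUrl, string table, string rowKey)
    {
        string url = $"{baseUrl.TrimEnd('/')}/{Uri.EscapeDataString(table)}/{Uri.EscapeDataString(rowKey)}";
        using (var request = new HttpRequestMessage(HttpMethod.Get, url))
        {
            request.Headers.Accept.ParseAdd("application/json");
            using (HttpResponseMessage response = await http.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();
                return ParseCells(await response.Content.ReadAsStringAsync());
            }
        }
    }

    // Maps "family:qualifier" -> value for a single-row response.
    public static Dictionary<string, string> ParseCells(string json)
    {
        var cells = new Dictionary<string, string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement row in doc.RootElement.GetProperty("Row").EnumerateArray())
                foreach (JsonElement cell in row.GetProperty("Cell").EnumerateArray())
                    cells[Decode(cell.GetProperty("column").GetString())] =
                        Decode(cell.GetProperty("$").GetString());
        }
        return cells;
    }

    private static string Decode(string base64) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(base64));
}
```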

Related

Visual FoxPro data in .NET

We are in the process of migrating an old VFP application into a .NET WPF application with SQL server.
During the process we still need to read/write to the DBF files to keep our business working properly.
To do this, we use the standard OLEDB adapter that is available. However, our sysadmin is asking if we have an alternative way to access the DBF files.
Having each user connect to the files is not the best option from a network/security perspective, especially when connecting from home through a VPN.
I've already tried to move the connection to a single server by exposing the data through an API, but that slowed the application down too much. In some situations we synchronise the data through background jobs (a Hangfire implementation), but this can be time-consuming to implement.
Has anybody used any other techniques to do something similar while migrating a VFP application?
OLEDB is still the best option. Within the application, you could impersonate a specific user that has access to the files.
Sybase Advantage Server can also connect to and work with VFP data files. Local mode is (was) free and server mode is paid. You might try checking that too.
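A sketch of the OLEDB-plus-impersonation suggestion, assuming the DBF files sit on a network share that only a dedicated service account can access, and that the Visual FoxPro OLE DB provider (`VFPOLEDB.1`) is installed. It calls the Win32 `LogonUser` API with `LOGON32_LOGON_NEW_CREDENTIALS`, so the service account's credentials are used only for outgoing network access, then runs the query inside `WindowsIdentity.RunImpersonated`. The VFP provider is 32-bit only, so the process must run as x86; `System.Data.OleDb` is a NuGet package on .NET Core.

```csharp
using System.ComponentModel;
using System.Data;
using System.Data.OleDb;
using System.Runtime.InteropServices;
using System.Security.Principal;
using Microsoft.Win32.SafeHandles;

public static class VfpImpersonatedReader
{
    private const int LOGON32_LOGON_NEW_CREDENTIALS = 9;
    private const int LOGON32_PROVIDER_WINNT50 = 3;

    [DllImport("advapi32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    private static extern bool LogonUser(string userName, string domain, string password,
        int logonType, int logonProvider, out SafeAccessTokenHandle token);

    // Data Source is the folder holding the free tables (or a .dbc path).
    public static string BuildConnectionString(string dataFolder) =>
        $"Provider=VFPOLEDB.1;Data Source={dataFolder};";

    // Reads a whole table as the service account. The table name must come
    // from trusted code, not user input, because it is concatenated into SQL.
    public static DataTable ReadTable(string dataFolder, string table,
        string domain, string user, string password)
    {
        if (!LogonUser(user, domain, password, LOGON32_LOGON_NEW_CREDENTIALS,
                       LOGON32_PROVIDER_WINNT50, out SafeAccessTokenHandle token))
            throw new Win32Exception(Marshal.GetLastWin32Error());

        using (token)
        {
            return WindowsIdentity.RunImpersonated(token, () =>
            {
                var result = new DataTable(table);
                using (var connection = new OleDbConnection(BuildConnectionString(dataFolder)))
                using (var adapter = new OleDbDataAdapter($"SELECT * FROM {table}", connection))
                {
                    adapter.Fill(result); // Fill opens and closes the connection itself
                }
                return result;
            });
        }
    }
}
```

This keeps end users off the share's ACL, although the service account's password still has to be stored somewhere the application can read it.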
Keep the data on a single PC acting as a server and access it via RDP; workarounds (kludges) are available to support multiple connections. If you need more security, connect over a VPN first and then use RDP.

How to connect to GoodData ADS using Microsoft .NET

We have an emerging need to modify the schema in our Agile Data Warehouse including adding new tables. We've been able to manually connect to the ADS database using Squirrel SQL and CloudConnect.
However, we would like a way to automate this process so that we can ensure that the schema remains consistent between our development, test and production ADS instances.
We're a .NET shop and most of our code is in C#. Has anyone had any success connecting directly to ADS using .NET (C# or VB)?
I've looked at trying to use the GoodData JDBC driver, but referencing a JDBC driver from .NET does not look particularly straightforward, and there is no GoodData ODBC or ADO.NET driver available.
I'd rather use something like the Vertica driver for ADO.NET available at https://my.vertica.com/vertica-client-drivers/, but I'm not sure what to use for all of the connection properties. I've attempted to connect using the Host and Port returned from the DW connection endpoints API, but I receive the following error when using the Vertica ADO.NET driver: "SSL Startup Failed."
Is there a way to connect to the GoodData ADS database using .NET, or any better approach to modifying the ADS schema using a CloudConnect graph or the REST API?
Any advice would be appreciated.
GoodData currently provides only a custom JDBC driver for connecting to ADS. Standard Vertica drivers cannot be used. See https://help.gooddata.com/display/doc/Data+Warehouse+Technology
It is not possible to use the JDBC driver directly in .NET, because JDBC drivers are Java libraries that need a JVM. Theoretically it could be possible to use an ODBC-JDBC gateway, but I have not tested this solution. There is also a JVM implementation for .NET, IKVM (http://www.ikvm.net/), but I have not tested it either.
So the easiest way really is to use CloudConnect or a SQL client that supports JDBC drivers. For automating the process, the easiest options are probably Java or JRuby.

What are the requirements to setup an SFTP connection in Windows C#?

I am using VS 2012 and I would like to securely transfer files with a server after setting up an SFTP connection. I have heard that setting this up on Windows is a big task compared to Linux. Can anyone tell me the exact procedure to follow?
Developments till now
As far as I know, there are no .NET assemblies that let you do SFTP out of the box. We can use FtpWebRequest from the System.Net assembly, but that handles FTP, not SFTP.
But I wish to use SFTP. I found an application called freeSSHd, which implements an SSH server. I have also heard about SFTP Blackbox and Rebex (both of which are paid products).
Expected answer pattern
A step-by-step walkthrough, from setting up SFTP on a server to successfully connecting to that server from a local machine.
Suggestions for tools, assemblies or third-party libraries to use for this task, preferably with links.
A detailed walkthrough of the client-side and server-side changes needed to achieve this.
I appreciate any kind of help on this one. Thanks in advance.
NOTE: This is for a Windows Form application and not a web application.
It's actually pretty easy; there's not much to it. You can download the compiled Renci SSH.NET client, which supports SFTP, from https://sshnet.codeplex.com/releases/view/120565 (builds for both .NET 3.5 and 4.0).
On the server side, you already mentioned freeFTPd, which will work fine for testing, though I wouldn't recommend it for production. Add a user with a password and a home directory, then start the server listening.
On the client side, using Renci, create a new SftpClient object that connects to 127.0.0.1 on port 22 with the username and password. To let the client authenticate the server, handle the client.HostKeyReceived event.
The only pitfall I've found with Renci is when manually opening the stream with FileMode.Create. I expected it to overwrite the existing file, but it always appended to the file when uploading. Internally it maintains its own flags, and those flags don't line up with the expected behavior of the System.IO.FileMode enum. I was able to truncate the file by uploading an empty MemoryStream and then using Renci's Open method. The Delete and DeleteFile methods didn't seem to work either, which could be a bug in freeFTPd; I haven't tested with another SFTP server.
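The client-side steps above can be sketched with SSH.NET as follows, assuming you obtained the server's host-key fingerprint out of band (for example from the SSH server's configuration screen) as colon-separated hex. `HostKeyEventArgs.FingerPrint` is the MD5 fingerprint as raw bytes. Instead of opening a stream manually, this uses `UploadFile` with `canOverride: true`, which in the SSH.NET versions I know of opens the remote file with truncation and so sidesteps the append behavior described above; please confirm this on your version.

```csharp
using System;
using System.IO;
using System.Linq;
using Renci.SshNet;

public static class SftpUploadExample
{
    // Compares raw fingerprint bytes with an expected "1a:2b:..." string.
    public static bool FingerprintMatches(byte[] fingerprint, string expectedHex)
    {
        string actual = string.Join(":", fingerprint.Select(b => b.ToString("x2")));
        return string.Equals(actual, expectedHex.Trim(), StringComparison.OrdinalIgnoreCase);
    }

    public static void Upload(string host, int port, string user, string password,
                              string expectedFingerprint, string localPath, string remotePath)
    {
        using (var client = new SftpClient(host, port, user, password))
        {
            // Trust the server only if its host key matches the known fingerprint;
            // otherwise Connect() fails.
            client.HostKeyReceived += (sender, e) =>
                e.CanTrust = FingerprintMatches(e.FingerPrint, expectedFingerprint);

            client.Connect();
            using (FileStream input = File.OpenRead(localPath))
            {
                // canOverride: true replaces an existing remote file.
                client.UploadFile(input, remotePath, true);
            }
            client.Disconnect();
        }
    }
}
```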

Interfacing with MySQL Database via HTTP

I'm working on a project that involves linking my C# application to a MySQL server. The server is running cPanel, and the company does not allow all IPs to connect to it (although you can add host IPs to the remote list). The application will run on Windows 8 tablets over a 3G connection, so I will not know the devices' IPs in advance to add them to the list.
A colleague of mine has told me about a system or API that would let me interface with the database via HTTP, effectively bypassing this restriction.
I can provide more information if needed.
Does anyone know of anything similar to this or any way around this?
If you are using MySQL 5.7 or higher, there is an experimental "lab" release that allows direct access to MySQL via a REST-over-HTTP interface, eliminating the need for a middle-tier server or database-specific drivers.
You can download the plugin from the MySQL Labs site: from the dropdown, select "MySQL HTTP Plugin".
Keep in mind that it's part of the MySQL Labs project, which means it's experimental, probably buggy, and should be used at your own risk. Think twice before using it on any kind of production server.
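A rough client-side sketch for the HTTP plugin, with assumptions stated plainly: it assumes the plugin's SQL endpoint takes the form `/sql/<database>/<url-encoded statement>` on its configured port (8080 in the examples I have seen) and uses HTTP Basic authentication. Both the URL shape and the auth scheme should be checked against the plugin's own README before relying on this.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class MySqlHttpPluginClient
{
    // Assumed endpoint shape: {baseUrl}/sql/{database}/{statement}.
    public static string BuildSqlUrl(string baseUrl, string database, string sql) =>
        $"{baseUrl.TrimEnd('/')}/sql/{Uri.EscapeDataString(database)}/{Uri.EscapeDataString(sql)}";

    // Sends the statement with Basic auth and returns the raw JSON response.
    public static async Task<string> QueryAsync(HttpClient http, string baseUrl, string database,
                                                string sql, string user, string password)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, BuildSqlUrl(baseUrl, database, sql)))
        {
            string token = Convert.ToBase64String(Encoding.UTF8.GetBytes($"{user}:{password}"));
            request.Headers.Authorization = new AuthenticationHeaderValue("Basic", token);
            using (HttpResponseMessage response = await http.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }
        }
    }
}
```

The tablets would then only need to reach the plugin's HTTP port rather than MySQL's own port, though that port still has to be open to them.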

Open data access

I am writing a plugin for an application in C#. The plugin allows me full access to the internal information model for the application.
I would like to create a mechanism to allow external applications to be able to connect to the information so they can report on it etc.
In the old days this was done via ODBC links; is that still the way to go?
I assume creating an ODBC driver for this is a significant task. Are there any easier recommendations, or example C# code for creating a driver?
Looking back, I was not very clear in the original question. The requirement is to allow two applications on the same PC to share data. The "host" application uses a proprietary storage format, so the data cannot be accessed without going through the "host" application. The "host" application allows the development of plugins (in C#), and the plugins have access to all of the data within the application. On that basis, I was exploring whether a plugin could expose an interface to another external application and thereby act as a "Data Access Layer".
My reference to ODBC is probably a "red herring"; it just shows how out of touch I am in this area.
You are probably looking for something like Remoting and/or Web Services and/or the more modern WCF (Windows Communication Foundation).
You can write your own services and access them from any language you want.
C# support for WCF, Remoting and Web Services is very good and lets you write your client-server infrastructure in a very clean, object-oriented and easy way.
Use HTTP: each service call is carried as a serialized object sent as XML through an HTTP server, for example IIS.
Clients can be written in any language you want, from PHP to C# to C++ to Java to whatever; they only need to connect over HTTP and parse/deserialize/serialize XML.
You can choose your architecture. If both clients and servers are written in C#, all of this is transparent to you: XML serialization and deserialization, remote procedure calls and IIS integration are all ready to use. You only need to write your applications.
You can export services instead of exposing tables as a relational DBMS does; this way you separate the logic of your system from the data layer and the presentation layer.
In this way you obtain scalability and support for multiple platforms and systems.
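Since both applications run on the same PC, a WCF service over named pipes is one way to apply this. The sketch below assumes .NET Framework (`System.ServiceModel`); the contract `IHostDataService`, its `GetItemNames` operation and the pipe address are hypothetical, and the `Func` passed in stands for whatever accessor the plugin has into the host's model. A singleton instance (`InstanceContextMode.Single`) lets the plugin hand an already-constructed service object to `ServiceHost`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.ServiceModel;

// Contract that the external application programs against (hypothetical).
[ServiceContract]
public interface IHostDataService
{
    [OperationContract]
    string[] GetItemNames();
}

[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single)]
public class HostDataService : IHostDataService
{
    // Hypothetical accessor into the host application's live model.
    private readonly Func<IEnumerable<string>> _readItemNames;

    public HostDataService(Func<IEnumerable<string>> readItemNames) => _readItemNames = readItemNames;

    public string[] GetItemNames() => _readItemNames().ToArray();
}

public static class HostDataEndpoint
{
    public const string Address = "net.pipe://localhost/HostData";

    // Called by the plugin when it loads; close the host when the plugin unloads.
    public static ServiceHost Start(HostDataService service)
    {
        var host = new ServiceHost(service);
        host.AddServiceEndpoint(typeof(IHostDataService), new NetNamedPipeBinding(), Address);
        host.Open();
        return host;
    }

    // Called by the external application to obtain a proxy.
    public static IHostDataService Connect()
    {
        var factory = new ChannelFactory<IHostDataService>(new NetNamedPipeBinding(), new EndpointAddress(Address));
        return factory.CreateChannel();
    }
}
```

Calls arrive on WCF worker threads, so if the host application's object model is not thread-safe, the plugin may need to marshal each call onto the host's own thread.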
Some links to read:
http://en.wikipedia.org/wiki/Windows_Communication_Foundation
http://www.codeproject.com/KB/webservices/myservice.aspx
http://msdn.microsoft.com/en-us/library/aa730857(v=vs.80).aspx
http://msdn.microsoft.com/en-us/library/kwdt6w2k(v=vs.71).aspx
http://blogs.microsoft.co.il/blogs/bursteg/archive/2008/02/10/how-to-build-an-n-tier-application-with-wcf-and-datasets-in-visual-studio-2008.aspx
http://msmvps.com/blogs/williamryan/archive/2008/05/16/doing-tiers-with-wcf.aspx
If, instead, you are on an intranet or a single computer and just want to share a database service, you can simply use SQL Server, MySQL or PostgreSQL and connect to it via TCP/IP.
However, it is not safe to expose a database service on the internet, or on an intranet where security may be a problem.
Note also that SQL Server Express is free and may be suitable if you don't have many users/connections and your database is no larger than 4 GB (10 GB from SQL Server 2008 R2 Express onward).
MySQL and PostgreSQL are free and open source.
