Submit a Spark job from C# and get results

As the title says, I would like to submit a calculation to a Spark cluster (local, or HDInsight in Azure) and get the results back in a C# application.
I am aware of Livy, which I understand is a REST API service that sits on top of Spark and lets you submit work to it, but I have not found a standard C# client package for it. Is Livy the right tool for the job? Or is there a well-known C# API I am missing?
The Spark cluster needs to access Azure Cosmos DB, so I need to be able to submit a job that includes the connector jar (or its path on the cluster driver) so that Spark can read data from Cosmos DB.

As a .NET Spark connector for querying data did not seem to exist, I wrote one:
https://github.com/UnoSD/SparkSharp
It is just a quick implementation (essentially a C# client for Livy), but it should be more than enough, and it also includes a way of querying Cosmos DB using Spark SQL.
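// HdInsightClient, CreateSessionAsync and ExecuteCosmosDbSparkSqlQueryAsync come from SparkSharp.
// `config` (the session configuration) and `Result` (the row type) are defined elsewhere.
using (var client = new HdInsightClient("clusterName", "admin", "password"))
using (var session = await client.CreateSessionAsync(config))
{
    // Run arbitrary Scala code in the Livy session
    var sum = await session.ExecuteStatementAsync<int>("val res = 1 + 1\nprintln(res)");

    // Run a Spark SQL query against a Cosmos DB collection
    const string sql = "SELECT id, SUM(json.total) AS total FROM cosmos GROUP BY id";
    var cosmos = await session.ExecuteCosmosDbSparkSqlQueryAsync<IEnumerable<Result>>
    (
        "cosmosName",
        "cosmosKey",
        "cosmosDatabase",
        "cosmosCollection",
        "cosmosPreferredRegions",
        sql
    );
}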
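If you would rather not take a dependency, the same thing can be done by calling Livy's REST API directly. A rough sketch, assuming the standard Livy endpoints (POST /sessions to start an interactive session, POST /sessions/{id}/statements to run code in it): the helper below only builds the JSON request bodies, including the "jars" list that ships extra jars such as the Cosmos DB connector with the session. The class name LivyRequests and the jar path used in the test are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: builds request bodies for the Livy REST API.
public static class LivyRequests
{
    // Body for POST /sessions. "kind": "spark" starts a Scala session;
    // "jars" lists extra jars (e.g. the Cosmos DB connector) to load into it.
    public static string CreateSessionBody(IEnumerable<string> jars) =>
        JsonSerializer.Serialize(new Dictionary<string, object>
        {
            ["kind"] = "spark",
            ["jars"] = jars
        });

    // Body for POST /sessions/{sessionId}/statements: the code to execute.
    public static string StatementBody(string code) =>
        JsonSerializer.Serialize(new Dictionary<string, object> { ["code"] = code });

    // Relative URL for submitting statements to an existing session.
    public static string StatementsPath(int sessionId) => $"/sessions/{sessionId}/statements";
}
```

Each body would be posted with HttpClient.PostAsync(url, new StringContent(body, Encoding.UTF8, "application/json")). Wait until the session's state is "idle" before sending statements, then poll GET /sessions/{id}/statements/{statementId} until the statement's state is "available" and read its "output".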

If you're just looking for a way to query your Spark cluster using Spark SQL, this is one way to do it from C#:
https://github.com/Azure-Samples/hdinsight-dotnet-odbc-spark-sql/blob/master/Program.cs
The console app requires the Spark ODBC driver to be installed. You can find it here:
https://www.microsoft.com/en-us/download/details.aspx?id=49883
Also, the console app has a bug. Immediately after the line where the connection string is generated:
connectionString = GetDefaultConnectionString();
add this line:
connectionString = connectionString + "DSN=Sample Microsoft Spark DSN";
If you give the DSN a different name when you install the Spark ODBC driver, change the name in the line above to match.
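Appending "DSN=..." with + relies on the generated string already ending in a ";". A slightly safer sketch, assuming you are free to change that line, uses DbConnectionStringBuilder (System.Data.Common) with ODBC parsing rules, which handles separators and any required quoting. WithDsn is a hypothetical helper name; OdbcConnectionStringBuilder from System.Data.Odbc, which exposes a Dsn property, would work as well.

```csharp
using System.Data.Common;

public static class OdbcConnectionStrings
{
    // Returns connectionString with DSN set to dsn (added, or replaced if already present).
    public static string WithDsn(string connectionString, string dsn)
    {
        // useOdbcRules: true parses and quotes values the way ODBC expects ({...} rather than "...").
        var builder = new DbConnectionStringBuilder(useOdbcRules: true)
        {
            ConnectionString = connectionString
        };
        builder["DSN"] = dsn;
        return builder.ConnectionString;
    }
}
```

In the sample app this would become connectionString = OdbcConnectionStrings.WithDsn(connectionString, "Sample Microsoft Spark DSN");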
Since you need to access data from Cosmos DB, you could open a Jupyter Notebook on your cluster, ingest the data into Spark there (creating a permanent table from it), and then use this console app or your own C# app to query that data.
If you have a Spark job written in Scala or Python and need to submit it from a C# app, then I guess Livy is the best way to go. I am unsure whether Mobius supports that.

Microsoft has just released DataFrame-based .NET support for Apache Spark as an open-source project under the .NET Foundation. See http://dot.net/spark and http://github.com/dotnet/spark for more details. It is now available in HDInsight by default if you select the right HDP/Spark version (currently 3.6 and 2.3, with others coming soon).

UPDATE:
Some time ago I gave a clear no to this question. However, times have changed and Microsoft has made an effort. Please check out:
https://dotnet.microsoft.com/apps/data/spark
https://github.com/dotnet/spark
using Microsoft.Spark.Sql;

// Create a Spark session
var spark = SparkSession
    .Builder()
    .AppName("word_count_sample")
    .GetOrCreate();
Writing Spark applications in C# is now that easy!
OUTDATED:
No, C# is not the tool you should choose if you want to work with Spark! However, if you really want to do the job with it, try Mobius, as mentioned above:
https://github.com/Microsoft/Mobius
Spark has four main languages, each with its own API: Scala, Java, Python and R.
If you are choosing a language for production, I would not suggest the R API; the other three work well.
For connecting to Cosmos DB, I would suggest: https://github.com/Azure/azure-cosmosdb-spark

Related

How to connect C# and Cassandra with

I am new to .NET, C# and Cassandra, and I am not able to connect them to each other. I have searched a lot and haven't found a clear explanation of how it works.
I downloaded Cassandra and installed it with Python 2.7; I can run the server and I can run cqlsh. Then I open Visual Studio, create a new .NET Core project and install the Cassandra C# driver package.
That's as far as I get: I don't know how to create keyspaces and tables in Cassandra from C#.
Can anyone give a simple explanation, with code, of how to create a simple keyspace and tables with columns, so that I can see how it works?
It's five years old, but I wrote up an article on how to use Cassandra as a backend for an ASP.NET MVC project: http://www.aaronstechcenter.com/aspnet_mvc_cassandra.php
My Git repo for the article is still out there, too: https://github.com/aploetz/ShipCrew
The meat of it is in CassandraDAO.cs:
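using Cassandra;   // DataStax C# driver

// getAppSetting is a helper from the article's project (presumably reads named application settings).
private Cluster Connect() {
    string user = getAppSetting("cassandraUser");
    string pwd = getAppSetting("cassandraPassword");
    string[] nodes = getAppSetting("cassandraNodes").Split(',');   // comma-separated contact points

    // Default consistency level for queries issued through this cluster
    QueryOptions queryOptions = new QueryOptions().SetConsistencyLevel(ConsistencyLevel.One);

    Cluster cluster = Cluster.Builder()
        .AddContactPoints(nodes)
        .WithCredentials(user, pwd)
        .WithQueryOptions(queryOptions)
        .Build();
    return cluster;
}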
I'm sure the driver versions are way out of date, but it should be enough to get you started.
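For the keyspace/table part of the question: once you have a Cluster, cluster.Connect() returns an ISession, and schema is created by executing ordinary CQL with session.Execute(...). The sketch below only builds those CQL statements; the CqlSchema class and the shipcrew/crew names are hypothetical, and it uses SimpleStrategy replication, which is fine for a single local node but not for a multi-datacenter cluster.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper that builds CQL schema statements to pass to session.Execute(...).
public static class CqlSchema
{
    // Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
    private static readonly Regex Identifier = new Regex(@"^[A-Za-z][A-Za-z0-9_]*\z");

    private static string Check(string name) =>
        Identifier.IsMatch(name) ? name : throw new ArgumentException($"Invalid CQL identifier: {name}");

    // SimpleStrategy places replicas without regard to datacenters: fine for one local node.
    public static string CreateKeyspace(string keyspace, int replicationFactor) =>
        $"CREATE KEYSPACE IF NOT EXISTS {Check(keyspace)} " +
        $"WITH replication = {{'class': 'SimpleStrategy', 'replication_factor': {replicationFactor}}}";

    // columns: (name, CQL type) pairs; the first column becomes the primary key.
    // Column types are inserted as given, so they must come from trusted code, not user input.
    public static string CreateTable(string keyspace, string table, IList<(string Name, string Type)> columns)
    {
        if (columns.Count == 0)
            throw new ArgumentException("At least one column is required.", nameof(columns));

        string definitions = string.Join(", ", columns.Select(c => $"{Check(c.Name)} {c.Type}"));
        return $"CREATE TABLE IF NOT EXISTS {Check(keyspace)}.{Check(table)} " +
               $"({definitions}, PRIMARY KEY ({columns[0].Name}))";
    }
}
```

With the Connect() method above, usage would look like: ISession session = Connect().Connect(); session.Execute(CqlSchema.CreateKeyspace("shipcrew", 1)); and then the same for CreateTable. Row values are better inserted through a prepared statement (session.Prepare(...) followed by Bind(...)) than by string concatenation.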

How to set up a local Mongo server using the MongoDB .NET Driver

I want to add an option to my application to save data locally using the MongoDB database tools, and I want to configure all the server information from within the application.
I have two questions.
The following code works only after I manually set up a MongoDB database on localhost; on a computer where the database has not been set up, it does not work.
My code is:
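using MongoDB.Driver;

public void createDB()
{
    // The parameterless constructor connects to mongodb://localhost:27017
    MongoClient client = new MongoClient();
    var db = client.GetDatabase("TVDB");
    var coll = db.GetCollection<Media>("Movies");   // Media is the application's own class
    Media video = new Media("", "");
    video.Name = "split";
    coll.InsertOne(video);   // first call that actually needs a running server
}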
Without that manual setup, the last line throws a timeout exception.
How can I configure the server from within my application to make it work?
Does the user have to install the MongoDB software on their PC, or is the API package enough to use the database?
Many thanks!
By using that command you're not "configuring the database", you're running it.
If you don't want to run it manually but want it to be always running, you should install it as a Windows service, as explained in "How to run MongoDB as Windows service?".
You need to install and/or run a MongoDB server in order to use it. Using the API alone is not enough; it's not like SQLite, which runs inside your application's process.
The code you are using looks for a MongoDB server on the local machine.
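Since the driver just waits and then times out when nothing is listening, the application can check for a running server up front and show a clearer message (for example, asking the user to install or start MongoDB). A minimal sketch using a plain TCP connection attempt; ServerProbe is a hypothetical helper, and 27017 is MongoDB's default port, which new MongoClient() uses when no connection string is given. An open port only shows that something is listening, not that it is mongod.

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class ServerProbe
{
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    public static bool IsListening(string host, int port, TimeSpan timeout)
    {
        using (var client = new TcpClient())
        {
            try
            {
                Task connect = client.ConnectAsync(host, port);
                // Wait returns false on timeout and throws if the connection attempt failed.
                return connect.Wait(timeout) && client.Connected;
            }
            catch (AggregateException)
            {
                return false;   // e.g. connection refused because no server is running
            }
            catch (SocketException)
            {
                return false;   // e.g. the host name could not be resolved
            }
        }
    }
}
```

For example, call ServerProbe.IsListening("127.0.0.1", 27017, TimeSpan.FromSeconds(2)) before createDB() and tell the user to start MongoDB if it returns false.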

ODBC connection with Universe 9.6

We have a legacy system running on Universe DBMS 9.6. We are trying to export data from it, and we have enabled the RPC daemon so that we can connect via ODBC.
We are now able to connect to the server, but we cannot run any queries. We get the following error:
Query: SELECT * FROM DEBTOR
Exception: UniVerse/SQL: syntax error. Unexpected symbol. Token was ";". Scanned command was SELECT
There are two types of database in Universe, i.e. table-based and file-based. We are able to query the table-based database but not the file-based one, and we understand there are some configuration settings that enable querying the file-based ones. That is where we are stuck.
We are using the u2Client library in C# to access the database. Any help is appreciated.
Code used to connect to Universe:
using U2.Data.Client;   // U2 Toolkit for .NET

U2ConnectionStringBuilder conn_str = new U2ConnectionStringBuilder();
conn_str.UserID = "id";
conn_str.Password = "pwd";
conn_str.Server = "serverIP";
conn_str.Database = "DBNAME";
conn_str.ServerType = "UNIVERSE";
conn_str.Pooling = false;
conn_str.AccessMode = "Uci";
conn_str.RpcServiceType = "uvserver";
string s = conn_str.ToString();
U2Connection con = new U2Connection();
con.ConnectionString = s;
con.Open();
Console.WriteLine("Connected.........................");
U2Command xmd = new U2Command("SELECT * FROM TABLE_NAME", con);
var op = xmd.ExecuteReader();   // throws the syntax error shown above
The exception is thrown when the last statement executes.
We tried a driver from Rocket Software. The Universe side works and sends the data to the client, but the client cannot understand the protocol, or some other error causes the exception. By inspecting the TCP packets we confirmed that the server responded to the query with data, but we were out of luck.
So we decided to write our own software in Universe Pick BASIC that connects to the external system via SSH, using a new protocol that both client and server understand. It worked: we can now export data from Universe and import data into it.
Officially Universe 9.6 isn't supported for use with the U2 Toolkit for .NET. Per the documentation (page 6):
Supported versions of UniData and UniVerse
UniData 7.1 or later
UniVerse 10.3 or later
You can still use straight ODBC or UniObjects to extract data from your database. If you're planning to use ODBC, in addition to enabling RPC, make sure you have configured your Universe account for ODBC per Rocket's ODBC documentation. Before writing C# code, I have often verified my ODBC setup using Excel's external data access tools.

Limiting records synchronized to mobile device

Similar questions have been asked before but after a day of going through the answers I'm still very confused.
I'm using Microsoft's Sync Framework with SQL2008 on the server and SQL CE on Windows Mobile devices. I would have thought this was a VERY common requirement. I don't want to replicate large tables onto the mobile device. I only want the records that are needed. For example, each user will need their "jobs" out of the jobs table. They don't need any other user's jobs. So I need something like "where jobId = 3" for one device and "where jobId=4" for another etc.
This looked promising: http://jtabadero.spaces.live.com/blog/cns!BF49A449953D0591!1203.entry
but unfortunately it doesn't work with my code. This code from the sample seems to be trying to get the code that contains the SQL:
var remoteProvider = (LocalDataCache1ServerSyncProvider)syncAgent.RemoteProvider;
var selectIncrementalInsertsCommand = remoteProvider.SalesLT_CustomerSyncAdapter.SelectIncrementalInsertsCommand;
BUT the code containing the SQL (generated by VS) is on the server-side and only a proxy is available in the client-side code. This is how the proxy is added:
// The WCF Service
var webSvcProxy = new MicronetCacheSyncService();
// The Remote Server Provider Proxy
var serverProvider = new ServerSyncProviderProxy(webSvcProxy);
// The Sync Agent
var syncAgent = new MicronetCacheSyncAgent();
syncAgent.RemoteProvider = serverProvider;
So how can I get to the server-side code that contains the SQL from the client side? Sorry I'm not explaining this very well, but I guess it's unlikely anyone will have an answer. The short version: does anyone know a SIMPLE way to limit the records that are synced to a mobile device in this type of app? I think the example was meant for desktop apps.
It looks to me like this sync framework is another one of Microsoft's half-baked releases that is really just a beta. It's starting to remind me of some previous horrible experiences with Entity Framework 1.0 :(
The tutorial at http://msdn.microsoft.com/en-us/library/dd918848%28SQL.105%29.aspx contains everything you need to provision filtering for a scope.
FYI, that tutorial is for Sync Framework 2.0, whereas from your code above it appears you're using Sync Framework 1.0 -- a legacy product.

Local database, I need some examples

I'm making an app that will be installed and run on multiple computers. My goal is to ship an empty local database file with the app, and to fill that database with data from the app as the user uses it.
Can you provide me with the following examples:
1) What do I need to do so that my app can connect to its local database?
2) How do I execute a query with variables from the app? For example, how would you add the following to the database:
String abc = "ABC";
String BBB = "Something longer than abc";
and so on.
Edit:
I am using a "Local Database" created via Add > New Item > Local Database, so how would I connect to that? Sorry for the basic question; I have never used databases in .NET.
Depending on your needs, you could also consider SQL CE. I'm sure that if you specified the database you're thinking of using (or your requirements, if you're unsure), you would get proper, concrete examples of connection strings etc.
Edit: here's code for SQL CE / SQL Compact:
using System.Data;
using System.Data.SqlServerCe;
using System.IO;
using System.Reflection;

public void ConnectListAndSaveSQLCompactExample()
{
    // Create a connection to the file datafile.sdf in the program folder
    string dbfile = Path.Combine(
        Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location), "datafile.sdf");
    using (SqlCeConnection connection = new SqlCeConnection("Data Source=" + dbfile))
    {
        // Read all rows from test_table into a DataSet (the adapter opens and closes the connection itself)
        SqlCeDataAdapter adapter = new SqlCeDataAdapter("select * from test_table", connection);
        // Generates the INSERT command that adapter.Update needs; without it Update throws
        SqlCeCommandBuilder builder = new SqlCeCommandBuilder(adapter);
        DataSet data = new DataSet();
        adapter.Fill(data);

        // Add a row to test_table (assumes the table consists of a single text column)
        data.Tables[0].Rows.Add(new object[] { "New row added by code" });

        // Save the changes back to the database file
        adapter.Update(data);
    }
}
Remember to add a reference to System.Data.SqlServerCe
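The question also asks how to insert values held in variables. The usual answer is a parameterized command rather than concatenating the values into the SQL text. A provider-neutral sketch: it only uses the System.Data interfaces, so the same code should work with a SqlCeConnection or an SQLite connection. The table test_table and its columns abc and bbb are hypothetical, and the @name parameter syntax assumes a provider that supports named parameters (SQL CE and the SQLite ADO.NET providers do).

```csharp
using System;
using System.Data;

public static class LocalDbExample
{
    // Hypothetical table and column names; adjust to your schema.
    public const string InsertSql = "INSERT INTO test_table (abc, bbb) VALUES (@abc, @bbb)";

    // Inserts one row using named parameters; returns the number of rows affected.
    public static int InsertRow(IDbConnection connection, string abc, string bbb)
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText = InsertSql;
            AddParameter(command, "@abc", abc);
            AddParameter(command, "@bbb", bbb);

            if (connection.State != ConnectionState.Open)
                connection.Open();
            return command.ExecuteNonQuery();
        }
    }

    // The values travel as parameters, never as part of the SQL text.
    private static void AddParameter(IDbCommand command, string name, object value)
    {
        IDbDataParameter parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.Value = value ?? DBNull.Value;   // send NULL for null strings
        command.Parameters.Add(parameter);
    }
}
```

With the variables from the question, that would be LocalDbExample.InsertRow(connection, abc, BBB), where connection is, for example, the SqlCeConnection opened on datafile.sdf.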
I'm not seeing anybody suggesting SQL Compact; it's similar to SQLite in that it doesn't require installation and caters to low-end database needs. It grew out of SQL Mobile and as such has a small footprint and a limited feature set, but if you're familiar with Microsoft's SQL offerings it should feel somewhat familiar.
SQL Express is another option, but be aware that it requires a standalone installation and is a bit beefier than you might need for an application's local cache. That said, it's also quite a bit more powerful than SQL Compact or SQLite.
It seems you are:
- making a C# app that will be installed and run on multiple computers
- that needs a local database (I'm assuming an RDBMS)
- generating a blank database at installation
- then connecting to the database and populating it when the app runs.
In general, it seems that you need to read up on using a small database engine in applications. I'd start by checking out SQLite, especially if you need multi-OS capability (e.g., your C# program will run on both Microsoft's .NET Framework and Novell's Mono). There are C# wrappers for accessing SQLite databases.
I believe this question is about the "Local Database" item template in Visual Studio.
What are you considering as a database? From what little you've provided in your question, I'd suggest SQLite.
You can get sample code from their site, Sqlite.NET.
Not sure I fully understand what you're asking, but SQLite is a good option for lightweight, locally deployed database persistence. Have a look here:
http://www.sqlite.org/
and here for an ADO.NET provider:
http://sqlite.phxsoftware.com/
For 1)
The easiest way to provide this functionality is through SQL Server Express User Instances. SQL Server Express is free, so your users do not have to pay for an additional SQL Server license, and a User Instance is file-based, which suits your requirement.
For 2)
This is a big topic. You may want to go through some of Microsoft's tutorials to get a feel for how to connect to a database, etc.
