Serialize Parquet data with C#

Is there a way to serialize data in Apache Parquet format using C#? I can't find any implementation of that. The official Parquet docs say that "Thrift can be also code-genned into any other thrift-supported language", but I'm not sure what this actually means.
Thanks

I have started an open-source project for a .NET implementation of Apache Parquet, so anyone is welcome to join: https://github.com/aloneguid/parquet-dotnet

We've just open sourced our .NET wrapper around Apache Parquet C++. It's a different approach compared to Parquet.NET, the latter being a pure .NET implementation.
You're welcome to give it a try and share your feedback:
https://github.com/G-Research/ParquetSharp

Here is another one for the list: Cinchoo ETL, an open-source library, can read and write Parquet files.
Method 1: POCO Method
Define POCO class
public partial class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}
Serialization code
// The list type must match the POCO class defined above.
List<Employee> objs = new List<Employee>();
Employee rec1 = new Employee();
rec1.Id = 1;
rec1.Name = "Mark";
objs.Add(rec1);
Employee rec2 = new Employee();
rec2.Id = 2;
rec2.Name = "Jason";
objs.Add(rec2);
using (var parser = new ChoParquetWriter<Employee>("emp.parquet"))
{
    parser.Write(objs);
}
Method 2: Dynamic Method
List<ExpandoObject> objs = new List<ExpandoObject>();
dynamic rec1 = new ExpandoObject();
rec1.Id = 1;
rec1.Name = "Mark";
objs.Add(rec1);
dynamic rec2 = new ExpandoObject();
rec2.Id = 2;
rec2.Name = "Jason";
objs.Add(rec2);
using (var parser = new ChoParquetWriter("emp.parquet"))
{
    parser.Write(objs);
}
Disclaimer: I'm the author of this library.

No, there isn't. I spent a week trying to write my own Parquet writer for .NET and it's just too complicated, i.e. it needs much more time. I ended up using Python and the fastparquet library to do any processing outside of Hadoop clusters. I must say fastparquet is an amazing piece of work and very easy to work with, but a lot of functionality is missing, such as nested columns and the ability to append to a file efficiently. Not to mention the dependency on Python 3, which can be a headache to deploy.
You can generate the Thrift definitions as C# code, but that doesn't get you far: it only means your output will be compatible with the Parquet specification.
I'm still keen to create an open-source Parquet library for .NET Core/.NET 4.5, so if anyone would like to cooperate, please let me know.

Related

What is the best way to load a model in Tensorflow.NET

I saved a tensorflow.keras model in Python and need to use it in C# / Tensorflow.NET 0.15.
var net = tf.keras.models.load_model(net_name) does not seem to be implemented.
var session = tf.Session.LoadFromSavedModel(net_name);
var graph = session.graph;
seems to work, but I then have a session / graph, not a Keras model.
I would ideally like to call something like net.predict(x). How can I get there from a graph/session?

Yes, I did. The best way is to convert your model to the ONNX format. ONNX is an open-source format that is supposed to run on any framework (TensorFlow, Torch...).
In Python, add the packages onnx and keras2onnx:
import onnx
import keras2onnx
import onnxruntime
net_onnx = keras2onnx.convert_keras(net_keras)
onnx.save_model(net_onnx, onnx_name)
Then in C# .NET, install the NuGet packages Microsoft.ML (for MLContext) and Microsoft.ML.OnnxRuntime (for InferenceSession, DenseTensor and NamedOnnxValue).
var context = new MLContext();
var session = new InferenceSession(filename);
// sample_size is the number of input features; fill sample with your input values.
float[] sample = new float[sample_size];
int[] dims = new int[] { 1, sample_size };
var tensor = new DenseTensor<float>(sample, dims);
var xs = new List<NamedOnnxValue>()
{
    NamedOnnxValue.CreateFromTensor<float>("dense_input", tensor),
};
using (var results = session.Run(xs))
{
    // manipulate the results
}
Note that you need to refer explicitly to the first layer (the input layer) of your network by name to pass in the sample ("dense_input" above). It is best to give it a clear name in Keras. You can check the name in Python by running net_keras.summary().

The right way to send generic data types with protobuf3 in C#/.NET

I'm developing an application using a plugins architecture and I want to send objects between client and server without knowing the type of the object being sent.
Is there a way to send generic data type ?
According to Microsoft pages, the Any field could be an answer to this problem, instead of using a string and a custom serialization/deserialization implementation to send these objects. However, I didn't find the provided c# examples understandable. I tried to solve the problem this way:
ClassTest myClassTest = new ClassTest();
Any packToSend = Any.Pack(myClassTest);
return Task.FromResult(new UnknownTEST
{
Pathm = hai
}); ;
But it seems that I need to implement the IMessage interface in my class and I don't know how to do this.
If anyone could provide a basic example to help me understand how to do this, that would be great.
Thanks !
You need to create protobuf messages which represent the data you're sending. You don't need to create your own classes as you did with your "ClassTest" class.
Here's an example:
point.proto:
syntax = "proto3";
option csharp_namespace = "MyProject.Namespace";
message Point {
    int32 x = 1;
    int32 y = 2;
}
generic_dto_message.proto:
syntax = "proto3";
import "google/protobuf/any.proto";
option csharp_namespace = "MyProject.Namespace";
message GenericDtoMessage {
    google.protobuf.Any data = 1;
}
C# code:
// packing
var point = new Point
{
    X = 1,
    Y = 22
};
var genericDtoMessage = new GenericDtoMessage();
genericDtoMessage.Data = Any.Pack(point);
// unpacking
var unpackedData = genericDtoMessage.Data.Unpack<Point>();
Console.WriteLine($"X: {unpackedData.X}{Environment.NewLine}Y: {unpackedData.Y}");
Console.WriteLine($"Press any key to continue...");
Console.ReadKey();
If you are using the Grpc.Tools NuGet package to generate C# code for the .proto files above, don't forget to add this ItemGroup section to your .csproj file:
<ItemGroup>
    <Protobuf Include="point.proto" Link="point.proto" />
    <Protobuf Include="generic_dto_message.proto" Link="generic_dto_message.proto" />
</ItemGroup>
Hope it helps!
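
Since the question is about receiving an object without knowing its type in advance, the receiving side can check which message type an Any holds before unpacking it. The following is only a minimal sketch of that idea. To stay self-contained it uses the wrapper messages StringValue, Int32Value and BoolValue that ship with the Google.Protobuf package instead of the generated Point class, and the names AnyDemo and Describe are hypothetical.

```csharp
using System;
using Google.Protobuf.WellKnownTypes;

public static class AnyDemo
{
    // Receiver side: check the packed type before unpacking.
    private static string Describe(Any payload)
    {
        if (payload.Is(StringValue.Descriptor))
        {
            return "string: " + payload.Unpack<StringValue>().Value;
        }
        if (payload.Is(Int32Value.Descriptor))
        {
            return "int32: " + payload.Unpack<Int32Value>().Value;
        }
        // TypeUrl identifies the packed message type even when it is not handled here.
        return "unhandled type: " + payload.TypeUrl;
    }

    public static void Main()
    {
        // Sender side: any generated message can be packed the same way.
        Console.WriteLine(Describe(Any.Pack(new StringValue { Value = "hello" })));
        Console.WriteLine(Describe(Any.Pack(new Int32Value { Value = 42 })));
        Console.WriteLine(Describe(Any.Pack(new BoolValue { Value = true })));
    }
}
```

With generated messages such as Point, the same check would read payload.Is(Point.Descriptor). In a plugin architecture each plugin could register the descriptors it understands, though that design is an assumption and not part of the answer above.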

parsing and display xml data in asp.net core

I'm still working on my first ASP.NET Core project and now I want to display "a quote of the day".
I have the quotes in an XML file stored in a folder called File under wwwroot.
I'm planning on making this a View Component.
I'm used to working with Web Forms, so it seems like I'm spending a lot of time on small issues, but I guess it's the only way to learn.
I've created a folder named Custom where I plan to hold all my custom classes. The QuoteController.cs is located in the Controllers folder.
So yeah, I think I know how to create the View Component. "I think" is an important factor here.
I'm also used to using XmlDocument, so I'm trying my best to get XmlReader to work. But any hints or tips would be highly appreciated.
This is what I've got so far. QuoteController.cs
public class QuoteController : Controller
{
    public Custom.Quote Index()
    {
        Custom.Quote result = new Custom.Quote();
        XmlReader rdr = XmlReader.Create(@"\File\qoutes.xml");
        Random rnd = new Random(DateTime.Now.Millisecond);
        int tmp = rdr.AttributeCount;
        int count = rnd.Next(0, tmp);
        int i = 0;
        while (rdr.Read())
        {
            if (count.Equals(i))
            {
                result = new Custom.Quote(rdr.GetAttribute("q"), rdr.GetAttribute("author"));
                break;
            }
            i++;
        }
        rdr.Dispose();
        rdr = null;
        rnd = null;
        return result;
    }
}
I guess the next step will be to add some visuals, but I can't imagine that my code actually works. Does anybody know how to easily parse an XML file in ASP.NET Core? Should I go for async?
I guess it doesn't matter, but the XML file is formatted like:
<quotes>
    <q>Be Strong</q>
    <author>Stein The Ruler</author>
</quotes>
Again, I will be very happy if you take the time to look at this :)
Thank you!
My way to implement this:
1) Convert the XML document to look like this:
<quotes>
    <quote Content="Be Strong" Author="Stein..."/>
</quotes>
2) Change the Custom.Quote class so it has these two public string properties with getters and setters: Content and Author.
3) Use this code to turn the XML into a list (XDocument.Load reads from a file path, whereas XDocument.Parse expects the XML text itself):
XDocument quotesDoc = XDocument.Load("your path");
List<Custom.Quote> quotes = quotesDoc.Root
    .Elements("quote")
    .Select(x => new Custom.Quote
    {
        Content = (string)x.Attribute("Content"),
        Author = (string)x.Attribute("Author")
    })
    .ToList();
Hope this helps!
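
To tie this back to the "quote of the day" goal, here is a minimal end-to-end sketch of the attribute-based approach, including the random pick. It is an illustration under assumptions rather than the answerer's exact code: the Quote class is a hypothetical stand-in for Custom.Quote, the second quote is made-up sample data, and the XML is inlined so the example runs without a file (for a file on disk, XDocument.Load(path) would take the place of XDocument.Parse).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the asker's Custom.Quote class.
public class Quote
{
    public string Content { get; set; }
    public string Author { get; set; }
}

public static class QuoteDemo
{
    public static void Main()
    {
        // Inline sample XML; the second entry is invented for illustration.
        const string xml =
            "<quotes>" +
            "<quote Content=\"Be Strong\" Author=\"Stein The Ruler\" />" +
            "<quote Content=\"Stay Curious\" Author=\"Unknown\" />" +
            "</quotes>";

        // Map every <quote> element to a Quote object.
        var quotes = XDocument.Parse(xml).Root
            .Elements("quote")
            .Select(x => new Quote
            {
                Content = (string)x.Attribute("Content"),
                Author = (string)x.Attribute("Author")
            })
            .ToList();

        // Pick one quote at random.
        var random = new Random();
        Quote pick = quotes[random.Next(quotes.Count)];
        Console.WriteLine($"{pick.Content} ({pick.Author})");
    }
}
```

In a View Component, the selected Quote would be passed to the view instead of being written to the console.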

How to cache reading .csv files in C#

This may be a noob question, but I need some help. I have written two simple methods in C#: ReadCsv_IT and GetTranslation. The ReadCsv_IT method reads from a csv file. The GetTranslation method calls the ReadCsv_IT method and returns the translated input (string key).
My problem is that in the future I will have to call GetTranslation many times, but I obviously don't want to read the .csv file every time. So I was thinking about ways to use an in-memory cache so that I don't have to read the .csv file on every request. But I am not sure how to do it or what else I could do to optimize my program. Can anyone please help?
public string ReadCsv_IT(string key)
{
    string newKey = "";
    using (var streamReader = new StreamReader(@"MyResource.csv"))
    {
        CsvReader csv = new CsvReader(streamReader);
        csv.Configuration.Delimiter = ";";
        List<DataRecord> rec = csv.GetRecords<DataRecord>().ToList();
        DataRecord record = rec.FirstOrDefault(a => a.ORIGTITLE == key);
        if (record != null)
        {
            //DOES THE LOCALIZATION with the help of the .csv file.
        }
    }
    return newKey;
}
Here is the GetTranslation Method:
public string GetTranslation(string key, string culture = null)
{
    string result = "";
    if (culture == null)
    {
        culture = Thread.CurrentThread.CurrentCulture.Name;
    }
    if (culture == "it-IT")
    {
        result = ReadCsv_IT(key);
    }
    return result;
}
Here is also the class DataRecord.
class DataRecord
{
    public string ORIGTITLE { get; set; }
    public string REPLACETITLE { get; set; }
    public string ORIGTOOLTIP { get; set; }
}
Two options IMO:
First: turn your stream into an object. In other words, make a class that wraps the stream so you can refer to an instance of that class.
Second: initialize your stream in the scope that calls GetTranslation, and pass it as a parameter to GetTranslation and ReadCsv_IT.
Brecht C and Thom Hubers have already given you good advice. I would like to add one more point, though: using csv files for localization in .NET is not really a good idea. Microsoft recommends using a resource-based approach (this article is a good starting point). It seems to me that you are trying to write code for something that is already built into .NET.
From a translation point of view, csv files are not the best possible format either. First of all, they are not really standardized: many tools have slightly different ways of handling commas, quotes, and line breaks that are part of the translated text. Besides, translators will be tempted to open them in Excel and, unless handled with caution, Excel will write out translations in whatever encoding it deems best.
If the project you are working on is for learning, please feel free to go ahead with it, but if you are developing software that will be used by customers, updated, translated into several target languages, and redeployed, I would recommend reconsidering your internationalization approach.
@Brecht C is right, use that answer to start. When a value has to be cached for use by multiple threads or instances, take a look at an in-memory cache such as MemoryCache, or at Redis when performance and distribution over several clients become an issue.
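
The caching idea discussed in this thread can be sketched with a dictionary that is loaded once and then reused. This is a minimal sketch under assumptions, not the original poster's code: it replaces CsvHelper with a simplified hand-written parser (header row, ';' delimiter, no quoted fields), assumes ORIGTITLE and REPLACETITLE are the first two columns, and the class name TranslationCache and the sample row are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public sealed class TranslationCache
{
    // Lazy<T> runs the loader once, on first access, and is thread-safe by default.
    private readonly Lazy<Dictionary<string, string>> _titles;

    public TranslationCache(string csvPath)
    {
        _titles = new Lazy<Dictionary<string, string>>(() => Load(csvPath));
    }

    // Simplified parser: assumes a header row, ';' as delimiter and no quoted fields.
    private static Dictionary<string, string> Load(string path)
    {
        var map = new Dictionary<string, string>();
        foreach (var line in File.ReadLines(path).Skip(1))
        {
            var parts = line.Split(';');
            if (parts.Length >= 2)
            {
                map[parts[0]] = parts[1]; // ORIGTITLE -> REPLACETITLE (assumed column order)
            }
        }
        return map;
    }

    // Returns an empty string when there is no translation, as the original method does.
    public string GetTranslation(string key)
    {
        return _titles.Value.TryGetValue(key, out var value) ? value : "";
    }
}

public static class Program
{
    public static void Main()
    {
        // Write a tiny sample file so the sketch runs on its own.
        var path = Path.Combine(Path.GetTempPath(), "MyResource.csv");
        File.WriteAllLines(path, new[]
        {
            "ORIGTITLE;REPLACETITLE;ORIGTOOLTIP",
            "Save;Salva;Save the file"
        });

        var cache = new TranslationCache(path);
        Console.WriteLine(cache.GetTranslation("Save")); // the file is read here, once
        Console.WriteLine(cache.GetTranslation("Save")); // served from memory
    }
}
```

A dictionary lookup also avoids scanning the full record list on every call. If the file can change while the application is running, the cache would need some invalidation strategy, which this sketch does not cover.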

API for working with classes as OOP?

I'm writing a 3rd party app that needs to read in .cs files and be able to manipulate classes, then ultimately save back to file.
The type of code I am looking at would be something like:
var classManager = new classManager();
var classes = classManager.LoadFromFile(filePath);
var class = classes[0]; // Just illustrating more than 1 class can exist in a file
var prop = new ClassProperty {Type=MyType.GetType() };
prop.AddGet("return x+y < 50");
//stuff like prop.ReadOnly = true;
class.AddProperty(prop);
var method = new ClassMethod {signature="int id, string name"};
method.MethodBody = GetMethodBodyAsString(); //not writing out an entire method body here
class.AddMethod(method);
class.SaveToFile(true); //Format code
Does such a library exist?
The .NET Compiler Platform (Roslyn) is what you're looking for. It supports parsing and editing .cs files. Check out this post for an example.
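
As a rough illustration of what this looks like in practice, the sketch below parses source text, adds a read-only property to the first class, and prints the formatted result. It assumes the Microsoft.CodeAnalysis.CSharp NuGet package; the sample source, the Person class and the IsSmall property are hypothetical, and Roslyn's syntax API differs from the ClassProperty/ClassMethod style imagined in the question.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class RoslynDemo
{
    public static void Main()
    {
        // Hypothetical source text; File.ReadAllText(filePath) would supply real input.
        const string source = "public class Person { public int X; public int Y; }";

        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        SyntaxNode root = tree.GetRoot();

        // A file may declare several classes; take the first one.
        ClassDeclarationSyntax classDecl = root.DescendantNodes()
            .OfType<ClassDeclarationSyntax>()
            .First();

        // Parse a new read-only property from text and append it to the class.
        MemberDeclarationSyntax property = SyntaxFactory.ParseMemberDeclaration(
            "public bool IsSmall { get { return X + Y < 50; } }");

        // Syntax nodes are immutable, so each edit returns a new node.
        ClassDeclarationSyntax updated = classDecl.AddMembers(property);
        SyntaxNode newRoot = root.ReplaceNode(classDecl, updated);

        // NormalizeWhitespace reformats the tree before it is turned back into text.
        Console.WriteLine(newRoot.NormalizeWhitespace().ToFullString());
    }
}
```

Writing the result back would be a matter of passing the final string to File.WriteAllText.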
