How to read numpy .npz file directly into C#

I have a load of data files in numpy .npz format written from python.
I want to read them directly into C# for a few reasons.
The data files contain a number of 1D arrays of different types: some will be byte arrays, and others double arrays.
Can anyone give me some advice on how to achieve this? Or otherwise what I might be doing wrong below?
I have tried using Accord.NET.NPZFormat but can't figure out how to make it work. I think probably because you have to give it a type to return, and because the arrays are of different types it fails.
Here is a link to it:
http://accord-framework.net/docs/html/M_Accord_IO_NpzFormat_Load__1.htm
I am struggling with the syntax here and am unsure what to use as "T". The closest I have got is the following, but the result doesn't seem to contain any data. Accord.IO has no example code.
public static void LoadNPZ(string zip_file, string npz_file)
{
    byte[] ret = new byte[0];
    using (ZipArchive zip = ZipFile.OpenRead(zip_file))
    {
        foreach (ZipArchiveEntry entry in zip.Entries)
        {
            if (entry.Name == npz_file + ".npz")
            {
                Stream fs = entry.Open();
                ret = new byte[fs.Length];
                fs.Read(ret, 0, (int)fs.Length);
            }
        }
    }
    if (ret.Length==0)
    {
        return;
    }
    var ret2 = NpzFormat.Load<object[]>(ret);
}

You can use the NumSharp library.
Let's say you have this data created in Python.
import numpy as np
arr = np.array([1,2,3,4])
single = arr.astype(np.single)
double = arr.astype(np.double)
np.savez('single.npz', data=single)
np.savez('double.npz', data=double)
The C# code to read them is below.
using NumSharp;
var singleContent = np.Load_Npz<float[]>("single.npz"); // type is NpzDictionary
var singleArray = singleContent["data.npy"]; // type is float[]
var doubleContent = np.Load_Npz<double[]>("double.npz"); // type is NpzDictionary
var doubleArray = doubleContent["data.npy"]; // type is double[]
If you don't specify a name for your array, the default name is arr_0, and the C# code would look like this.
var singleArray = singleContent["arr_0.npy"];
var doubleArray = doubleContent["arr_0.npy"];
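The ".npy" suffix in the keys above comes from the container itself: an .npz file is a zip archive holding one .npy member per array, and each member starts with a small header recording its own dtype and shape. The sketch below uses only the Python standard library to build a tiny archive by hand and read those headers back. The helper names build_npy and read_npy_header and the member names are made up for illustration, and only version 1.0 headers of 1-D arrays are handled.

```python
import ast
import io
import struct
import zipfile

MAGIC = b"\x93NUMPY"

def build_npy(values, fmt_char, descr):
    """Serialize a 1-D list as a version 1.0 .npy payload (illustration only)."""
    header = "{'descr': '%s', 'fortran_order': False, 'shape': (%d,), }" % (descr, len(values))
    # Pad so that magic (6) + version (2) + length field (2) + header is a multiple of 64.
    pad = 64 - (len(MAGIC) + 2 + 2 + len(header) + 1) % 64
    header = header + " " * pad + "\n"
    body = struct.pack("<%d%s" % (len(values), fmt_char), *values)
    return MAGIC + bytes([1, 0]) + struct.pack("<H", len(header)) + header.encode("latin1") + body

def read_npy_header(raw):
    """Return (metadata dict, raw data bytes) for a version 1.0 .npy payload."""
    if raw[:6] != MAGIC or raw[6] != 1:
        raise ValueError("not a version 1.0 .npy payload")
    (header_len,) = struct.unpack("<H", raw[8:10])
    meta = ast.literal_eval(raw[10:10 + header_len].decode("latin1"))
    return meta, raw[10 + header_len:]

# Build an in-memory archive with two arrays of different element types.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("flags.npy", build_npy([1, 2, 3], "B", "|u1"))
    zf.writestr("samples.npy", build_npy([0.5, 1.5], "d", "<f8"))

# Each member reports its own dtype, so mixed-type archives can be
# inspected member by member before choosing how to decode the data.
summary = {}
with zipfile.ZipFile(buf) as zf:
    for name in zf.namelist():
        meta, body = read_npy_header(zf.read(name))
        summary[name] = (meta["descr"], meta["shape"])

print(summary)
```

Because the element type is stored per member, an archive that mixes byte and double arrays can be handled one member at a time rather than with a single type for the whole file.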
Note that NumSharp has the following limitations.
The size of each dimension must be smaller than 2,147,483,591 bytes. Example: for integers (4 bytes each), each dimension must have fewer than 536,870,898 elements.
If you are using .NET Framework, the maximum array size is 2 GB (all dimensions considered). On 64-bit platforms this limit can be avoided by enabling the gcAllowVeryLargeObjects flag.
More information can be found in this answer and this blog post (Disclaimer: I'm the author of both of them).

I work with C# and Python quite a bit, and my recommendation is to create a COM server:
http://timgolden.me.uk/pywin32-docs/html/com/win32com/HTML/QuickStartServerCom.html
then in python you could simply have something like
import numpy as np

class NPtoCSharp:
    _reg_clsid_ = "{7CC9F362-486D-11D1-BB48-0000E838A65F}"
    _public_methods_ = ['load_file']
    _public_attrs_ = ['arr', 'the_file']
    _reg_desc_ = "Python NPZ Loader"
    _reg_progid_ = "NPtoCSharp"

    def __init__(self):
        self.arr = None
        self.the_file = None

    def load_file(self):
        self.arr = np.load(self.the_file)
        return self.arr
Then in C#
public void init_python()
{
    Type NPtoCSharp = Type.GetTypeFromProgID("NPtoCSharp");
    // dynamic lets the COM members be resolved at run time
    dynamic NPtoCSharpInst = Activator.CreateInstance(NPtoCSharp);
    NPtoCSharpInst.the_file = "myfile.npz";
}
Not complete but I hope you get the idea.

Related

What is the .NET System.Numerics.BigInteger Equivalent of Org.BouncyCastle.Math.BigInteger.ToByteArrayUnsigned?

I am currently working with the .NET port of BouncyCastle and I am having some trouble converting a big integer into a System.Guid using the native .NET BigInteger.
For some context, I am using BouncyCastle in one ("source") application to convert a System.Guid to a Org.BouncyCastle.Math.BigInteger. This value is then saved as a string in the format 3A2B847A960F0E4A8D49BD62DDB6EB38.
Then, in another ("destination") application, I am taking this saved string value of a BigInteger and am trying to convert it back into a System.Guid.
Please note that in the destination application I do not want BouncyCastle as a reference and would like to use core .NET libraries for the conversion. However, since I am running into problems converting with core .NET classes, I am using BouncyCastle and the following code does exactly what I would like:
var data = "3A2B847A960F0E4A8D49BD62DDB6EB38";
var integer = new Org.BouncyCastle.Math.BigInteger( data, 16 );
var bytes = integer.ToByteArrayUnsigned();
var guid = new Guid( bytes ); // holds expected value: (7A842B3A-0F96-4A0E-8D49-BD62DDB6EB38)
As you can see, there is a ToByteArrayUnsigned method on the Org.BouncyCastle.Math.BigInteger that makes this work. If I use the ToByteArray on the System.Numerics.BigInteger (even when resizing the array as discussed in this question) it does not work and I get a different System.Guid than expected.
So, what is the best way to perform the equivalent to the above operation using native .NET classes?
Solution
Thanks to @John-Tasler's suggestion, it turns out this was due to endianness... darn you, endianness... will your endianness ever end? :P
var parsed = System.Numerics.BigInteger.Parse( "3A2B847A960F0E4A8D49BD62DDB6EB38", NumberStyles.HexNumber ).ToByteArray();
Array.Resize( ref parsed, 16 );
var guid = new Guid( parsed.Reverse().ToArray() ); // Hoorrrrayyyy!

What's the actual value of the resulting Guid when using .NET's BigInteger?
It could be that the two implementations are just storing the bytes differently (endianness, etc.).
To see what comes back from each implementation's array of bytes:
var sb = new StringBuilder();
foreach (var b in bytes)
{
sb.AppendFormat("{0:X2} ", b);
}
sb.ToString();
It'd be an interesting comparison.

I would avoid BigInteger here entirely. Your data bytes are already in the correct order so you can convert to byte[] directly.
var data = "3A2B847A960F0E4A8D49BD62DDB6EB38";
var bytes = new byte[16];
for (var i = 0; i < 16; i++)
bytes[i] = byte.Parse(data.Substring(i * 2, 2), NumberStyles.HexNumber);
var guid = new Guid(bytes); // 7a842b3a-0f96-4a0e-8d49-bd62ddb6eb38
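The byte-order point made in these answers can be checked outside .NET. The sketch below uses Python purely as a neutral calculator: the hex string, read left to right, is the big-endian byte sequence, and reversing a little-endian dump of the same number gives the same bytes. Python's uuid.UUID(bytes_le=...) interprets the first three fields as little-endian, which is the same layout the Guid(byte[]) constructor expects. The variable names are illustrative only.

```python
import uuid

data = "3A2B847A960F0E4A8D49BD62DDB6EB38"

# The hex string read left to right is the big-endian ("unsigned") byte array.
big_endian = bytes.fromhex(data)

# A little-endian dump of the same integer is the exact reverse, which is why
# the accepted solution has to reverse the array before building the Guid.
little_endian = int(data, 16).to_bytes(16, "little")

# bytes_le treats the first three fields as little-endian, like Guid(byte[]).
guid = uuid.UUID(bytes_le=big_endian)
print(guid)
```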

How to write numbers to a file and make them readable between Java and C#

I've run into a "compatibility" issue between two versions of the same program: the first is written in Java, the second is a port to C#.
My goal is to write some data to a file (for example, in Java), like a sequence of numbers, then to have the ability to read it in C#. Obviously, the operation should work in the reversed order.
For example, I want to write 3 numbers in sequence, represented with the following schema:
first number as one 'byte' (8 bit)
second number as one 'integer' (32 bit)
third number as one 'integer' (32 bit)
So, I can put on a new file the following sequence: 2 (as byte), 120 (as int32), 180 (as int32)
In Java, the writing procedure is more or less this one:
FileOutputStream outputStream;
byte[] byteToWrite;
// ... initialization....
// first byte
outputStream.write(first_byte);
// integers
byteToWrite = ByteBuffer.allocate(4).putInt(first_integer).array();
outputStream.write(byteToWrite);
byteToWrite = ByteBuffer.allocate(4).putInt(second_integer).array();
outputStream.write(byteToWrite);
outputStream.close();
The reading part is the following:
FileInputStream inputStream;
ByteBuffer byteToRead;
// ... initialization....
// first byte
first_byte = inputStream.read();
// integers
byteToRead = ByteBuffer.allocate(4);
inputStream.read(byteToRead.array());
first_integer = byteToRead.getInt();
byteToRead = ByteBuffer.allocate(4);
inputStream.read(byteToRead.array());
second_integer = byteToRead.getInt();
inputStream.close();
C# code is the following. Writing:
FileStream fs;
byte[] byteToWrite;
// ... initialization....
// first byte
byteToWrite = new byte[1];
byteToWrite[0] = first_byte;
fs.Write(byteToWrite, 0, byteToWrite.Length);
// integers
byteToWrite = BitConverter.GetBytes(first_integer);
fs.Write(byteToWrite, 0, byteToWrite.Length);
byteToWrite = BitConverter.GetBytes(second_integer);
fs.Write(byteToWrite, 0, byteToWrite.Length);
Reading:
FileStream fs;
byte[] byteToRead;
// ... initialization....
// first byte
byte[] firstByteBuff = new byte[1];
fs.Read(firstByteBuff, 0, firstByteBuff.Length);
first_byte = firstByteBuff[0];
// integers
byteToRead = new byte[4 * 2];
fs.Read(byteToRead, 0, byteToRead.Length);
first_integer = BitConverter.ToInt32(byteToRead, 0);
second_integer = BitConverter.ToInt32(byteToRead, 4);
Please note that both procedures work when the same (Java or C#) version of the program writes and reads the file. The problem arises when I try to read a file written by the Java program from the C# version, and vice versa. The integers read are always "strange" numbers (like -1451020...).
There's surely a compatibility issue regarding the way Java stores and reads 32bit integer values (always signed, right?), in contrast to C#. How to handle this?

It's just an endian-ness issue. You can use my MiscUtil library to read big-endian data from .NET.
However, I would strongly advise a simpler approach to both your Java and your .NET:
In Java, use DataInputStream and DataOutputStream. There's no need to get complicated with ByteBuffer etc.
In .NET, use EndianBinaryReader from MiscUtil, which extends BinaryReader (and likewise EndianBinaryWriter for BinaryWriter)
Alternatively, consider just using text instead.
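To make the endianness diagnosis concrete, here is a small sketch using Python's struct module as a neutral byte inspector. It assumes the usual situation: Java's ByteBuffer writes big-endian by default, while BitConverter on a typical x86/x64 machine reads little-endian. The values 2, 120 and 180 are the ones from the question.

```python
import struct

# What the Java side writes: one byte followed by two big-endian 32-bit ints.
java_bytes = struct.pack(">bii", 2, 120, 180)

# Reading the same bytes with the wrong (little-endian) order gives the
# kind of "strange" numbers described in the question.
misread = struct.unpack("<bii", java_bytes)

# Reading them with the order they were written in restores the values.
correct = struct.unpack(">bii", java_bytes)

print(java_bytes.hex(), misread, correct)
```

The single byte is unaffected because byte order only matters for multi-byte values, which matches the symptom that only the integers come out wrong.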

I'd consider using a standard format like XML or JSON to store your data. Then you can use standard serializers in both Java and C# to read/write the file. This sort of approach lets you easily name the data fields, read it from many languages, be easily understandable if someone opens the file in a text editor, and more easily add data to be serialized.
E.g. you can read/write JSON with Gson in Java and Json.NET in C#. The class might look like this in C#:
public class MyData
{
public byte FirstValue { get; set; }
public int SecondValue { get; set; }
public int ThirdValue { get; set; }
}
// serialize to string example
var myData = new MyData { FirstValue = 2, SecondValue = 5, ThirdValue = -1 };
string serialized = JsonConvert.SerializeObject(myData);
It would serialize to
{"FirstValue":2,"SecondValue":5,"ThirdValue":-1}
The Java would, similarly, be quite simple. You can find examples of how to read/write files in each library.
Or if an array would be a better model for your data:
string serialized = JsonConvert.SerializeObject(new[] { 2, 5, -1 }); // [2,5,-1]
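As a quick check of the "read it from many languages" point, the sketch below parses the serialized text shown above with Python's standard json module. Python stands in here for any third reader; the answer itself targets Json.NET and Gson.

```python
import json

# The exact text produced by the C# example above.
text = '{"FirstValue":2,"SecondValue":5,"ThirdValue":-1}'

# Any JSON library can turn it back into named fields.
record = json.loads(text)

# Compact separators reproduce the same layout when writing it out again.
round_trip = json.dumps(record, separators=(",", ":"))

# The array form works the same way.
values = json.loads("[2,5,-1]")
print(record, values)
```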

Can't use the .length property with IronJS and ArrayObject

I am using IronJs's latest version (0.2.0.1) and my js scripts do not properly retrieve the length of an array that has been set to the js engine using an IronJs.Runtime.ArrayObject. However, my variable is well recognized as an array, as shown in C# code below.
var jsCode = @"myArray.length;";
var javascriptEngine = new IronJS.Hosting.CSharp.Context();
var array = new ArrayObject(javascriptEngine.Environment, 2);//array of size 2
array.Put(0, 12.0);//mock values
array.Put(1, 45.1);
javascriptEngine.SetGlobal<ArrayObject>("myArray", array);
var result = javascriptEngine.Execute(jsCode);
Console.WriteLine(result);
var jsCode2 = @"myArray instanceof Array;";
var result2 = javascriptEngine.Execute<bool>(jsCode2);
Console.WriteLine(result2);
We get the following output
undefined
True

This is a bug in the IronJS runtime. You should open an issue in the appropriate GitHub repository: https://github.com/fholm/IronJS/
A workaround is to force a reallocation of the whole array. In that case, the .length property seems to be correctly set. A hackish way to accomplish that is to create a smaller than needed ArrayObject (e.g. a 0-sized ArrayObject), then put some values in it. The following test passes:
[TestMethod]
public void TestWithZeroSizedArray()
{
string jsCode = @"myArray.length;";
var javascriptEngine = new IronJS.Hosting.CSharp.Context();
var array = new ArrayObject(javascriptEngine.Environment, 0); // Creates a 0-sized Array
array.Put(0, 12.0);
array.Put(1, 45.1);
javascriptEngine.SetGlobal<ArrayObject>("myArray", array);
var result = javascriptEngine.Execute(jsCode);
Assert.AreEqual(2, result);
}
Keep in mind that the multiple copy/reallocations of the underlying .NET arrays will lead to performance issues.

C# & MATLAB interoperability for non-matrix datatypes

I'm writing a C# program that needs to call MATLAB processing routines. I've been looking at MATLAB's COM interface. Unfortunately, the COM interface appears to be rather limited in terms of the types of data that can be exchanged. Matrices and character arrays are supported, but there doesn't seem to be support for exchanging struct data or cell arrays between C# and MATLAB using the COM interface. For example, in the following code (assuming that a DICOM image named IM000000 is present in the appropriate file folder), the MATLAB variables 'img' and 'header' are a 256x256 int16 matrix and a struct, respectively. The GetWorkspaceData call works fine for 'img', but returns null for 'header' because 'header' is a struct.
public class MatlabDataBridge
{
    MLApp.MLAppClass matlab;

    public MatlabDataBridge()
    {
        matlab = new MLApp.MLAppClass();
    }

    public void ExchangeData()
    {
        // Receivers for the out parameters of GetWorkspaceData
        object theImg = null;
        object theHeader = null;
        matlab.Execute(@"cd 'F:\Research Data\'");
        matlab.Execute(@"img = dicomread('IM000000');");
        matlab.Execute(@"header = dicominfo('IM000000');");
        matlab.GetWorkspaceData(@"img", "base", out theImg); // correctly returns a 2D array
        matlab.GetWorkspaceData(@"header", "base", out theHeader); // fails, theHeader is still null
    }
}
Is there a suitable workaround for marshalling struct data to/from MATLAB using the COM interface? If not, is this functionality well supported by the MATLAB Builder NE add-on?

I ended up using the MATLAB Builder NE add-on to solve the problem. The code ends up looking something like this:
using MathWorks.MATLAB.NET.Arrays;
using MathWorks.MATLAB.NET.Utility;
using MyCompiledMatlabPackage; // wrapper class named MyMatlabWrapper is here
...
matlab = new MyMatlabWrapper();
MWStructArray foo = new MWStructArray(1, 1, new string[] { "field1", "field2" });
foo["field1", 1] = "some data";
foo["field2", 1] = 5.7389;
MWCellArray bar = new MWCellArray(1, 3);
bar[1, 1] = foo;
bar[1, 2] = "The quick brown fox jumped over the lazy dog.";
bar[1, 3] = 7.9;
MWArray result[];
result = matlab.MyFunction(foo, bar);
// Test the result to figure out what kind of data it is and then cast
// it to the appropriate MWArray subclass to extract and use the data

Consider having a look at LabSharp (a wrapper around the MATLAB engine API). You can then exchange a struct like this:
var engine = Engine.Open(false);
var array = MxArray.CreateStruct();
array.SetField("MyField1", "toto");
array.SetField("MyField2", 12.67);
engine.SetVariable("val", array);
NB: This LGPL wrapper is not mine; please have a look at its API for more details.

turn javascript array into c# array

Hey. I have this javascript file that I'm getting off the web and it consists of basically several large javascript arrays. Since I'm a .net developer I'd like for this array to be accessible through c# so I'm wondering if there are any codeplex contributions or any other methods that I could use to turn the javascript array into a c# array that I could work with from my c# code.
like:
var roomarray = new Array(194);
var modulearray = new Array(2055);
var progarray = new Array(160);
var staffarray = new Array(3040);
var studsetarray = new Array(3221);
function PopulateFilter(strZoneOrDept, cbxFilter) {
var deptarray = new Array(111);
for (var i=0; i<deptarray.length; i++) {
deptarray[i] = new Array(1);
}
deptarray[0] [0] = "a/MPG - Master of Public Governance";
deptarray[0] [1] = "a/MPG - Master of Public Governance";
deptarray[1] [0] = "a/MBA_Flex MBA 1";
deptarray[1] [1] = "a/MBA_Flex MBA 1";
deptarray[2] [0] = "a/MBA_Flex MBA 2";
deptarray[2] [1] = "a/MBA_Flex MBA 2";
deptarray[3] [0] = "a/cand.oecon";
deptarray[3] [1] = "a/cand.oecon";
and so forth
This is what I'm thinking after looking over the suggestions:
Retrieve the JavaScript file in my C# code by making an HTTP request for it.
Paste it together with some code I wrote myself.
From C#, execute a self-made JavaScript function that turns the JavaScript array into JSON (with help from json.org/json2.js) and outputs it to a new file.
Retrieve the new file in C# and parse the JSON with the DataContractJsonSerializer, hopefully resulting in a C# array.
Does that sound doable to you guys?

I'm not in front of a computer with c# right now so I'm not able to fully try this.
What you're going to need to do, @Jakob, is the following:
1. Write a parser that will download the file and store it in memory.
2. For each section that you want to "parse" into a C# array (for example zonearray), you need to set up bounds at which to begin and end searching the file. Example: we know that zonearray starts building the array two lines after zonearray[i] = new Array(1); and ends on zonearray.sort().
3. With these bounds we can then zip through each line in between and build a C# array. I think this is simple enough for you to figure out. Remember that you'll need to keep track of the sub-index as well.
4. Repeat steps 2-3 for each array you want to parse (zonearray, roomarray, etc.).
If you can't quite figure out how to code the bounds or how to parse the line and dump them into arrays, I might be able to write something tomorrow (even though it's a holiday here in Canada).
EDIT: It should be noted that you can't use some JSON parser for this; you have to write your own. It's not really that difficult to do, you just need to break it into small steps (first figure out how to zip through each line and find the right "bounds").
HTH
EDIT: I just spent ~20 minutes writing this up for you. It should parse the file and load each array into a List<string[]>. I've heavily commented it so you can see what's going on. If you have any questions, don't hesitate to ask. Cheers!
private class SearchBound
{
public string ArrayName { get; set; }
public int SubArrayLength { get; set; }
public string StartBound { get; set; }
public int StartOffset { get; set; }
public string EndBound { get; set; }
}
public static void Main(string[] args)
{
//
// NOTE: I used FireFox to determine the encoding that was used.
//
List<string> lines = new List<string>();
// Step 1 - Download the file and dump all the lines of the file to the list.
var request = WebRequest.Create("http://skema.ku.dk/life1011/js/filter.js");
using (var response = request.GetResponse())
using(var stream = response.GetResponseStream())
using(var reader = new StreamReader(stream, Encoding.GetEncoding("ISO-8859-1")))
{
string line = null;
while ((line = reader.ReadLine()) != null)
{
lines.Add(line.Trim());
}
Console.WriteLine("Download Complete.");
}
var deptArrayBounds = new SearchBound
{
ArrayName = "deptarray", // The name of the JS array.
SubArrayLength = 2, // In the JS, the sub array is defined as "new Array(X)" and should always be X+1 here.
StartBound = "deptarray[i] = new Array(1);",// The line that should *start* searching for the array values.
StartOffset = 1, // The StartBound + some number line to start searching the array values.
// For example: the next line might be a '}' so we'd want to skip that line.
EndBound = "deptarray.sort();" // The line to stop searching.
};
var zoneArrayBounds = new SearchBound
{
ArrayName = "zonearray",
SubArrayLength = 2,
StartBound = "zonearray[i] = new Array(1);",
StartOffset = 1,
EndBound = "zonearray.sort();"
};
var staffArrayBounds = new SearchBound
{
ArrayName = "staffarray",
SubArrayLength = 3,
StartBound = "staffarray[i] = new Array(2);",
StartOffset = 1,
EndBound = "staffarray.sort();"
};
List<string[]> deptArray = GetArrayValues(lines, deptArrayBounds);
List<string[]> zoneArray = GetArrayValues(lines, zoneArrayBounds);
List<string[]> staffArray = GetArrayValues(lines, staffArrayBounds);
// ... and so on ...
// You can then use deptArray, zoneArray etc where you want...
Console.WriteLine("Depts: " + deptArray.Count);
Console.WriteLine("Zones: " + zoneArray.Count);
Console.WriteLine("Staff: " + staffArray.Count);
Console.ReadKey();
}
private static List<string[]> GetArrayValues(List<string> lines, SearchBound bound)
{
List<string[]> values = new List<string[]>();
// Get the enumerator for the lines.
var enumerator = lines.GetEnumerator();
string line = null;
// Step 1 - Find the starting bound line.
while (enumerator.MoveNext() && (line = enumerator.Current) != bound.StartBound)
{
// Continue looping until we've found the start bound.
}
// Step 2 - Skip to the right offset (maybe skip a line that has a '}' ).
for (int i = 0; i <= bound.StartOffset; i++)
{
enumerator.MoveNext();
}
// Step 3 - Read each line of the array.
while ((line = enumerator.Current) != bound.EndBound)
{
string[] subArray = new string[bound.SubArrayLength];
// Read each sub-array value.
for (int i = 0; i < bound.SubArrayLength; i++)
{
// Matches everything that is between an equal sign then the value
// wrapped in quotes ending with a semi-colon.
var m = Regex.Matches(line, "^(.* = \")(.*)(\";)$");
// Get the matched value.
subArray[i] = m[0].Groups[2].Value;
// Move to the next sub-item if not the last sub-item.
if (i < bound.SubArrayLength - 1)
{
enumerator.MoveNext();
line = enumerator.Current;
}
}
// Add the sub-array to the list of values.
values.Add(subArray);
// Move to the next line.
if (!enumerator.MoveNext())
{
break;
}
}
return values;
}
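The same line-based idea can be sketched more compactly by matching each assignment line directly instead of tracking start and end bounds. The version below is in Python and is only an illustration under stated assumptions: every value sits on one line in the form name[i] [j] = "value"; and values contain no escaped quotes. The function name parse_js_arrays and the sample lines (taken from the question) are not part of the original answer.

```python
import re

# Matches lines such as: deptarray[0] [1] = "a/MBA_Flex MBA 1";
ASSIGNMENT = re.compile(r'^(\w+)\[(\d+)\]\s*\[(\d+)\]\s*=\s*"(.*)";$')

def parse_js_arrays(lines):
    """Collect two-level JS array assignments into {name: [[col0, col1, ...], ...]}."""
    arrays = {}
    for line in lines:
        match = ASSIGNMENT.match(line.strip())
        if match is None:
            continue  # declarations, braces, loops and so on are skipped
        name, row, col, value = match.groups()
        rows = arrays.setdefault(name, {})
        rows.setdefault(int(row), {})[int(col)] = value
    # Convert the index-keyed dictionaries into lists ordered by index.
    return {
        name: [[cols[c] for c in sorted(cols)] for _, cols in sorted(rows.items())]
        for name, rows in arrays.items()
    }

sample = [
    'var deptarray = new Array(111);',
    'deptarray[0] [0] = "a/MPG - Master of Public Governance";',
    'deptarray[0] [1] = "a/MPG - Master of Public Governance";',
    'deptarray[1] [0] = "a/MBA_Flex MBA 1";',
    'deptarray[1] [1] = "a/MBA_Flex MBA 1";',
]
parsed = parse_js_arrays(sample)
print(parsed)
```

Keying on the explicit indices rather than on line position means the parser does not depend on the sub-array lines appearing in a fixed order, at the cost of ignoring anything that does not fit the single-line pattern.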

If I understand your question right, you are asking whether you can execute JavaScript code from C#, and then pass the result (which in your example would be a JavaScript Array object) into C# code.
The answer is: Of course it’s theoretically possible, but you would need to have an actual JavaScript interpreter to execute the JavaScript. You’ll have to find one or write your own, but given that JavaScript is a full-blown programming language, and writing interpreters for such a large and full-featured programming language is quite an undertaking, I suspect that you won’t find a complete ready-made solution, nor will you be able to write one unless your dedication exceeds that of all other die-hard C#-and-JavaScript fans worldwide.
However, with a bit of trickery, you might be able to coerce an existing JavaScript interpreter to do what you want. For obvious reasons, all browsers have such an interpreter, including Internet Explorer, which you can access using the WinForms WebBrowser control. Thus, you could try the following:
Have your C# code generate an HTML file containing the JavaScript you downloaded plus some JavaScript that turns it into JSON (you appear to have already found something that does this) and outputs it in the browser.
Open that HTML file in the WebBrowser control, have it execute the JavaScript, and then read the contents of the website back, now that it contains the result of the executed JavaScript.
Turn the JSON into a C# array using DataContractJsonSerializer as you suggested.
This is a pretty roundabout way to do it, but it is the best I can think of.
I have to wonder, though, why you are retrieving a JavaScript file from the web in the first place. What generates this JavaScript file? Whatever generates it, surely could generate some properly readable stuff instead (e.g. an XML file)? If it is not generated but written by humans, then why is it written in JavaScript instead of XML, CSV, or some other data format? Hopefully with these thoughts you might be able to find a solution that doesn’t require JavaScript trickery like the above.

The easiest solution is to just execute the JavaScript function that builds the array. Include a function that converts it to JSON (http://www.json.org/js.html). After that, make an XMLHttpRequest (AJAX) to the server and from there deserialize the JSON into a custom class.
If I may use jQuery, here's an example of the needed Javascript:
var myJSONText = JSON.stringify(deptarray);
(function($){
$.ajax({
type: "POST",
url: "some.aspx",
data: myJSONText,
success: function(msg){
alert( "Data Saved: " + msg );
}
});
})(jQuery);
Now you only need some code to turn the JSON string into a C# array.
EDIT:
After looking around a bit, I found Json.NET: http://json.codeplex.com/
There are also a lot of the same questions on Stackoverflow that ask the same.
