I'm looking to transform my in-memory plain old C# classes (POCOs) into a Neo4j database.
(Class types are node types, and nodes have a List for "linkedTo".)
Rather than write a long series of Cypher queries to create nodes and properties and then link them with relationships, I am wondering if there is anything more clever I can do.
For example, can I serialize them to JSON and then import that directly into Neo4j?
I understand that the Unwind function in the C# Neo4j driver may help here, but I don't see good examples of its use, and the relationships then need to be matched and created separately.
Is there an optimal method for doing this? I expect to have around 50k nodes.
OK, first off, I'm using Neo4jClient for this and I've added an INDEX to the DB using:
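CREATE INDEX ON :MyClass(Id)
(That is the Neo4j 3.x syntax; on Neo4j 4.x and later the equivalent is CREATE INDEX FOR (n:MyClass) ON (n.Id).)
This is important for the way this works: every MERGE below looks a node up by Id, so the index makes inserting the data a lot quicker.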
I have a class:
public class MyClass
{
public int Id {get;set;}
public string AValue {get;set;}
public ICollection<int> LinkToIds {get;set;} = new List<int>();
}
It has an Id, which I'll be keying off, and a string property, just because. The LinkToIds property is a collection of the Ids that this instance is linked to.
To generate my MyClass instances I'm using this method to randomly generate them:
private static ICollection<MyClass> GenerateMyClass(int number = 50000){
var output = new List<MyClass>();
Random r = new Random((int) DateTime.Now.Ticks);
for (int i = 0; i < number; i++)
{
var mc = new MyClass { Id = i, AValue = $"Value_{i}" };
var numberOfLinks = r.Next(1, 10); // 1 to 9 links
for(int j = 0; j < numberOfLinks; j++){
// Random.Next's upper bound is exclusive, so pass number (not number - 1) to allow links to the last Id
var link = r.Next(0, number);
if(!mc.LinkToIds.Contains(link) && link != mc.Id)
mc.LinkToIds.Add(link);
}
output.Add(mc);
}
return output;
}
Then I use another method to split this into smaller 'batches':
private static ICollection<ICollection<MyClass>> GetBatches(ICollection<MyClass> toBatch, int sizeOfBatch)
{
var output = new List<ICollection<MyClass>>();
if(toBatch.Count == 0) return output;
if(sizeOfBatch > toBatch.Count) sizeOfBatch = toBatch.Count;
// Round up so a final, partially filled batch is not dropped
var numBatches = (toBatch.Count + sizeOfBatch - 1) / sizeOfBatch;
for(int i = 0; i < numBatches; i++){
output.Add(toBatch.Skip(i * sizeOfBatch).Take(sizeOfBatch).ToList());
}
return output;
}
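On .NET 6 or later (an assumption about your target framework), Enumerable.Chunk does the same splitting and keeps the final, partially filled batch, so the helper can shrink to a one-liner. A minimal sketch; the Batching class name is just for this example:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Batching
{
    // Splits items into lists of at most batchSize elements; the last list may be shorter.
    // Enumerable.Chunk requires .NET 6 or later and throws if batchSize < 1.
    public static List<List<T>> GetBatches<T>(IEnumerable<T> items, int batchSize) =>
        items.Chunk(batchSize).Select(chunk => chunk.ToList()).ToList();
}
```

For the default 50,000 generated items, Batching.GetBatches(GenerateMyClass(), 5000) gives ten batches of 5,000, and each one can be passed to Unwind as before.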
Then, to actually add them to the DB:
void Main()
{
var gc = new GraphClient(new Uri("http://localhost:7474/db/data"), "neo4j", "neo");
gc.Connect();
var batches = GetBatches(GenerateMyClass(), 5000);
var now = DateTime.Now;
foreach (var batch in batches)
{
DateTime bstart = DateTime.Now;
var query = gc.Cypher
.Unwind(batch, "node")
.Merge($"(n:{nameof(MyClass)} {{Id: node.Id}})")
.Set("n = node")
.With("n, node")
.Unwind("node.LinkToIds", "linkTo")
.Merge($"(n1:{nameof(MyClass)} {{Id: linkTo}})")
.With("n, n1")
.Merge("(n)-[:LINKED_TO]->(n1)");
query.ExecuteWithoutResults();
Console.WriteLine($"Batch took: {(DateTime.Now - bstart).TotalMilliseconds} ms");
}
Console.WriteLine($"Total took: {(DateTime.Now - now).TotalMilliseconds} ms");
}
On my aging (5-6 years old now) machine, it takes about 20 s to insert 50,000 nodes and around 500,000 relationships.
Let's break down that important call to Neo4j above. The key thing, as you rightly suggested, is UNWIND. Here I UNWIND a batch and give each 'row' in that collection the identifier node. I can then access its properties (node.Id) and use them to MERGE a node. In the first UNWIND I always SET the merged node (n) to node, so all the properties (in this case just AValue) are set.
So, up to the first With, we have a node with the MyClass label and all its properties set. Note that this also includes the LinkToIds array, which, if you are a tidy person, you might want to remove. I'll leave that to you.
In the second UNWIND we take advantage of the fact that the LinkToIds property is an array, and use each entry to MERGE a 'placeholder' node that will be filled in later; then we create a relationship between n and the placeholder n1. NB: if we've already created a node with the same Id as n1, MERGE uses that node, and when the first UNWIND later reaches that Id it sets all the properties on the placeholder.
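If the placeholder behaviour is hard to picture, this plain C# analogy may help. It is not Neo4j code, and FakeNode and MergeAnalogy are hypothetical names: Merge is get-or-create by Id, like MERGE (n:MyClass {Id: ...}), and a set of (from, to) pairs stands in for the LINKED_TO relationships.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a :MyClass node; not a Neo4j type.
public sealed class FakeNode
{
    public int Id;
    public string AValue; // stays null while the node is only a placeholder
}

public static class MergeAnalogy
{
    // Get-or-create by Id, like MERGE (n:MyClass {Id: id}).
    public static FakeNode Merge(Dictionary<int, FakeNode> nodes, int id)
    {
        if (!nodes.TryGetValue(id, out FakeNode node))
        {
            node = new FakeNode { Id = id };
            nodes[id] = node;
        }
        return node;
    }

    // Mirrors the two UNWINDs: rows are the batch, LinkToIds is the inner list.
    public static (Dictionary<int, FakeNode> Nodes, HashSet<(int From, int To)> Links) Load(
        IEnumerable<(int Id, string AValue, int[] LinkToIds)> rows)
    {
        var nodes = new Dictionary<int, FakeNode>();
        var links = new HashSet<(int From, int To)>(); // a set, so repeats collapse as with MERGE
        foreach (var row in rows)                       // UNWIND batch AS node
        {
            FakeNode n = Merge(nodes, row.Id);
            n.AValue = row.AValue;                      // SET n = node (also fills a placeholder)
            foreach (int linkTo in row.LinkToIds)       // UNWIND node.LinkToIds AS linkTo
            {
                Merge(nodes, linkTo);                   // placeholder if this Id has not been seen yet
                links.Add((n.Id, linkTo));              // MERGE (n)-[:LINKED_TO]->(n1)
            }
        }
        return (nodes, links);
    }
}
```

Loading only the row for Id 0 (linked to 1) leaves node 1 as a placeholder with a null AValue; once the row for Id 1 arrives, the same object is found and filled in, so the order of the rows does not matter.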
It's not the easiest thing to explain, but the best things to look at are MERGE and UNWIND in the Neo4j documentation.
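On the JSON part of the question: what UNWIND iterates over is just a list of maps, which is what a batch of MyClass looks like once serialized. The sketch below uses System.Text.Json (.NET Core 3.0+) only to show that shape; Neo4jClient uses its own serializer when it sends the parameter, so treat this as an approximation of what goes over the wire. BatchJson is a name made up for the example.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public class MyClass
{
    public int Id { get; set; }
    public string AValue { get; set; }
    public ICollection<int> LinkToIds { get; set; } = new List<int>();
}

public static class BatchJson
{
    // Serializes a batch to a JSON array of objects, one object per UNWIND row.
    // Default options keep the C# property names (no camelCasing).
    public static string ToJson(IEnumerable<MyClass> batch) => JsonSerializer.Serialize(batch);
}
```

A batch holding one MyClass with Id 0, AValue "Value_0" and links 1 and 2 serializes to roughly [{"Id":0,"AValue":"Value_0","LinkToIds":[1,2]}], which is why node.Id, node.AValue and node.LinkToIds are available inside the query.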
I am trying to read and store data from an XML file. I have been reading about various methods to read the data, such as XmlReader, XmlTextReader, LINQ, etc.
My XML file is
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
<circuit name="local">
<Device>device1</Device>
<Point>point1></Point>
</circuit>
<circuit name ="remote">
<Device>device2</Device>
<Point>point2</Point>
</circuit>
</configuration>
I am trying to extract Device and Point set so I can pass those along to be used in a database query. I used this code and the foreach loop to verify the contents, but it only gets the first set.
XDocument msrDoc = XDocument.Load("BNOC MSR.config");
var data = from item in msrDoc.Descendants("circuit")
select new
{
device = item.Element("Device").Value,
point = item.Element("Point").Value
};
foreach (var p in data)
Console.WriteLine(p.ToString());
I have also tried this, but my arrays were all null:
String[] deviceList = new String[1];
String[] pointList = new String[1];
int n = 0;
XmlDocument msrDoc = new XmlDocument();
msrDoc.Load("BNOC MSR.config");
var itemNodes = msrDoc.SelectNodes("circuit");
foreach (XmlNode node in itemNodes)
{
var circuit = node.SelectNodes("circuit");
foreach (XmlNode cir in circuit)
{
deviceList[n] = cir.SelectSingleNode("Device").InnerText;
pointList[n] = cir.SelectSingleNode("Point").InnerText;
}
}
Any help would be greatly appreciated.
Are you sure you don't want to use the built-in Properties.Settings for this?
Circuit local = Properties.Settings.Default.localCircuit;
Circuit remote = Properties.Settings.Default.remoteCircuit;
https://learn.microsoft.com/en-us/dotnet/framework/winforms/advanced/using-application-settings-and-user-settings
I believe there is something wrong with the way you are testing the result. The code:
void Main()
{
var fileLocation = @"C:\BrianTemp\input.txt";
XDocument msrDoc = XDocument.Load(fileLocation);
var data = from item in msrDoc.Descendants("circuit")
select new
{
device = item.Element("Device").Value,
point = item.Element("Point").Value
};
foreach (var p in data)
{
//It is best practice to use statement blocks {} to prevent silly errors.
//Sometimes you want to execute multiple statements, especially as code changes later
Console.WriteLine($"{p}");
}
}
Produces the expected output:
{ device = device1, point = point1 }
{ device = device2, point = point2 }
You said:
I used this code and the foreach loop to verify the contents, but it
only gets the first set.
As you can see the code produces 2 results as it should.
Note: I corrected the XML file to remove the extra >
<Point>point1></Point><==
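I see three problems in your code (I only tried the second method you posted):
Your string arrays only have room for one element each, and n is never incremented, so even with larger arrays every circuit would overwrite index 0. Use lists instead:
List<string> deviceList = new List<string>();
List<string> pointList = new List<string>();
and call deviceList.Add(...) and pointList.Add(...) inside the inner loop.
The line var itemNodes = msrDoc.SelectNodes("circuit"); should be
var itemNodes = msrDoc.SelectNodes("configuration");
because circuit is not a child of the document itself; configuration is, and your inner node.SelectNodes("circuit") then finds the circuits beneath it.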
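Putting those fixes together, here is a minimal sketch that uses an absolute XPath and List<string> instead of fixed-size arrays. CircuitReader and Read are just names for this example; load the document from your file as before and pass it in.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class CircuitReader
{
    // Reads every <circuit> under <configuration> and collects its Device and Point values.
    public static (List<string> Devices, List<string> Points) Read(XmlDocument msrDoc)
    {
        var deviceList = new List<string>();
        var pointList = new List<string>();

        // An absolute XPath reaches all circuits in one step, so no outer loop is needed.
        foreach (XmlNode cir in msrDoc.SelectNodes("/configuration/circuit"))
        {
            // ?. guards against a circuit that is missing one of the child elements.
            deviceList.Add(cir.SelectSingleNode("Device")?.InnerText);
            pointList.Add(cir.SelectSingleNode("Point")?.InnerText);
        }
        return (deviceList, pointList);
    }
}
```

With your file, call msrDoc.Load("BNOC MSR.config") first and then CircuitReader.Read(msrDoc); the lists grow as needed, so any number of circuits works.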
I am using the following code snippet to parse and convert some XML data to CSV. I can convert the entire XML data and dump it into a file, however my requirements have changed and now I'm confused.
public void xmlToCSVfiltered(string p, int e)
{
string all_lines1 = File.ReadAllText(p);
all_lines1 = "<Root>" + all_lines1 + "</Root>";
XmlDocument doc_all = new XmlDocument();
doc_all.LoadXml(all_lines1);
StreamWriter write_all = new StreamWriter(FILENAME2);
XmlNodeList rows_all = doc_all.GetElementsByTagName("XML");
List<string[]> filtered = new List<string[]>();
foreach (XmlNode rowtemp in rows_all)
{
List<string> children_all = new List<string>();
foreach (XmlNode childtemp in rowtemp.ChildNodes)
{
children_all.Add(Regex.Replace(childtemp.InnerText, "\\s+", " ")); // <------- Fixed the Bug , Advisories dont span
}
string.Join(",", children_all.ToArray());
//write_all.WriteLine(string.Join(",", children_all.ToArray()));
if (children_all.Contains(e.ToString()))
{
filtered.Add(children_all.ToArray());
write_all.WriteLine(children_all);
}
}
write_all.Flush();
write_all.Close();
foreach (var res in filtered)
{
Console.WriteLine(string.Join(",", res));
}
}
My input looks something like the following. My objective now is to convert to CSV only those "events" that have a certain number. Let's say, for example, I only want to convert the events whose second data value under the <EVENT> element is 4627. Only those events would be converted; with the input below, that is both of them.
<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
tham ALL out. For some reason
that is not the case
please press the on button
when trying to activate
device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>
<XML><HEADER>2.0,773162,20121009133435,3,</HEADER>20121004133435,761,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,18735166156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
tham ALL out. For some reason
that is not the case
please press the on button
when trying to activate
device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>
... and so on
My approach so far has been to convert everything to CSV, store it in some sort of data structure, then query that data structure line by line to see whether the number exists and, if so, write the line to a file. My function takes the path of the XML file and the number we are looking for in the XML data as parameters. I'm new to C# and I can't work out how to change my function above. Any help will be appreciated!
EDIT:
Sample Input:
<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
tham ALL out. For some reason
that is not the case
please press the on button
when trying to activate
device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-
<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4623,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
tham ALL out. For some reason
that is not the case
please press the on button
when trying to activate
device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-
Required Output:
1.0,770162,20121009133435,3,,20121009133435,721,5,1,0,0,0,00:00,00:00,,00032134 26064957,4627,1,,1872161156,7,0,10000,1,0,5000000,0,10000000,0,1 ,,Keep it simple or spell
tham ALL out. For some reason
that is not the case
please press the on button
when trying to activate
device codes also available on
list,,,20121009133435,00-1d-71-0a-71-80,-66,,,0,50
The above will be the case if I call xmlToCSVfiltered(file, 4627);
Also note that the output will be a single line, as in a CSV file, but I can't format it that way here.
I changed XmlDocument to XDocument so I can use LINQ to XML. For testing I also used a StringReader to read the string instead of reading from a file; you can change the code back to your original File.ReadAllText.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME2 = @"c:\temp\test.txt";
static void Main(string[] args)
{
string input =
"<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell\n" +
"tham ALL out. For some reason \n" +
"that is not the case\n" +
"please press the on button\n" +
"when trying to activate\n" +
"device codes also available on\n" +
"list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>\n" +
"<XML><HEADER>2.0,773162,20121009133435,3,</HEADER>20121004133435,761,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,18735166156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell\n" +
"tham ALL out. For some reason\n" +
"that is not the case\n" +
"please press the on button\n" +
"when trying to activate\n" +
"device codes also available on\n" +
"list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>\n";
xmlToCSVfiltered(input, 4627);
}
static public void xmlToCSVfiltered(string p, int e)
{
//string all_lines1 = File.ReadAllText(p);
StringReader reader = new StringReader(p);
string all_lines1 = reader.ReadToEnd();
all_lines1 = "<Root>" + all_lines1 + "</Root>";
XDocument doc_all = XDocument.Parse(all_lines1);
// Keep only the <XML> records whose <EVENT> has e as its second comma-separated value
List<XElement> rows_all = doc_all.Descendants("XML").Where(x => ((string)x.Element("EVENT") ?? "").Split(',').Skip(1).FirstOrDefault() == e.ToString()).ToList();
List<string[]> filtered = new List<string[]>();
using (StreamWriter write_all = new StreamWriter(FILENAME2))
{
foreach (XElement rowtemp in rows_all)
{
List<string> children_all = new List<string>();
// Nodes() rather than Elements(), so the bare text after </HEADER> is kept, as in the required output
foreach (XNode childtemp in rowtemp.Nodes())
{
string text = childtemp is XElement el ? el.Value : (childtemp as XText)?.Value ?? "";
children_all.Add(Regex.Replace(text, "\\s+", " ")); // collapse whitespace so advisories don't span lines
}
// The Where clause has already filtered the rows, so every row here is written
filtered.Add(children_all.ToArray());
write_all.WriteLine(string.Join(",", children_all));
}
}
foreach (var res in filtered)
{
Console.WriteLine(string.Join(",", res));
}
}
}
}
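If you want to test the filter without writing a file, the same logic can be pulled into a function that returns the CSV lines. This is a sketch, not a drop-in replacement (EventCsv and ToCsvLines are names made up here). Like the Where clause above, it reads the event code from the second comma-separated field of <EVENT>, and it also joins the bare text between </HEADER> and <EVENT>, which your required output includes.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class EventCsv
{
    // One CSV line per <XML> record whose <EVENT> has `code` as its second comma-separated field.
    public static List<string> ToCsvLines(string fragments, int code)
    {
        // The input is a run of <XML> records with no single root element, so wrap it.
        XDocument doc = XDocument.Parse("<Root>" + fragments + "</Root>");
        string wanted = code.ToString();

        return doc.Root.Elements("XML")
            .Where(x => ((string)x.Element("EVENT") ?? "").Split(',').ElementAtOrDefault(1) == wanted)
            .Select(x => string.Join(",", x.Nodes().Select(NodeText)))
            .ToList();
    }

    // Text of a child element or a bare text node, with whitespace runs collapsed to one space.
    private static string NodeText(XNode node)
    {
        string text = node is XElement el ? el.Value : node is XText t ? t.Value : "";
        return Regex.Replace(text, @"\s+", " ");
    }
}
```

In your program, the body of xmlToCSVfiltered then reduces to File.WriteAllLines(FILENAME2, EventCsv.ToCsvLines(File.ReadAllText(p), e)).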
I have made some assumptions, since they were not clear to me from the question.
Assumptions
1. You know which node to check (here, EVENT) and that the value you want is in the second position within it.
2. You know the delimiter between the values in that node, e.g. ',' in EVENT.
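public void xmlToCSVfiltered(string p, int e, string nodeName, char delimiter)
{
//the file holds several <XML> records with no single root, so wrap it before parsing
XDocument xml = XDocument.Parse("<Root>" + File.ReadAllText(p) + "</Root>");
//get the required nodes. I am assuming you would know which one, e.g. EVENT
var requiredNode = xml.Descendants(nodeName);
foreach (var node in requiredNode)
{
//Also here, I am assuming you know the delimiter.
var valueSplit = node.Value.Split(delimiter);
//per assumption 1, compare only the value in the second position
if (valueSplit.Length > 1 && valueSplit[1] == e.ToString())
{
//AddToCSV is a hypothetical placeholder for your own code that writes the whole record (node.Parent) to the CSV
AddToCSV(node.Parent);
}
}
}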
I have a Shapefile that contains several thousand polygons.
I need to read from this file in C# and output a list of WKT formatted strings.
I looked at DotSpatial and the "CatFood" ESRI Shapefile Reader. I can get either to load the shapefile just fine, but I cannot figure out how to then export as WKT.
In DotSpatial, the only examples I could find use a WktWriter which takes a Geometry. I couldn't figure out how to get a Geometry from a Shape.
Is there a library that's more appropriate for this?
Update
Thanks to mdm20's answer, I was able to write the following:
using (var fs = FeatureSet.Open(path))
{
var writer = new WktWriter();
var numRows = fs.NumRows();
for (int i = 0; i < numRows; i++)
{
var shape = fs.GetShape(i, true);
var geometry = shape.ToGeometry();
var wkt = writer.Write((Geometry) geometry);
Debug.WriteLine(wkt);
}
}
I missed it originally because I was following this sample, which uses fs.ShapeIndices instead of fs.GetShape(). That returns a ShapeRange rather than a Shape, and I couldn't convert a ShapeRange to a geometry.
New Questions
Should I be setting fs.IndexMode = true? Why or why not? It doesn't seem to have any performance or results impact.
fs.GetShape() takes a boolean called getAttributes. I do have attributes on my shapes, and they seem to come through whether this is set true or false. Again, there is no noticeable performance impact either way. Is that expected?
By getting them in this way, does the WKT represent the actual values stored in the shapefile? Or are they transformed in any way? Is it taking any default settings from dotSpatial into account, and should I be concerned about changing them?
The shapefile I am importing is the world timezone map. It does contain a .prj file. Does dotSpatial take this into account, and if not - do I need to do anything extra?
Many Thanks!
In DotSpatial, the Shape class has a ToGeometry method.
/// <summary>
/// Converts this shape into a Geometry using the default factory.
/// </summary>
/// <returns>The geometry version of this shape.</returns>
public IGeometry ToGeometry()
{
return ToGeometry(Geometry.DefaultFactory);
}
Edit
I've only used the dotspatial stuff for projections, so I can't really help you too much.
1-2: Not sure. The code is open source if you want to look and see what it does.
3: WKT is a human-readable text representation of the geometry. I would assume it holds the same values as the file, but I don't know. Again, check out the DotSpatial source code.
4: The .prj file tells you which projection the geometry is in. Depending on what you want to do with it, you might have to reproject it. Web maps such as Bing Maps and Google Maps, for instance, use a (Web) Mercator projection. The DotSpatial projections library is good and makes it easy to transform geometry from one projection to another.
I've done quite a bit of work with shapefiles.. let me know if you have more questions.
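On point 3, it may help to see what WKT actually contains. It is plain text: a geometry keyword followed by coordinate pairs, with no projection information, so the numbers are in whatever units the shapefile stores (which is what the .prj describes). A hand-rolled sketch for a single-ring polygon; WktSketch is a made-up name, and this illustrates the format rather than DotSpatial's WktWriter.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class WktSketch
{
    // Formats one ring as WKT: POLYGON ((x1 y1, x2 y2, ...)).
    // The caller must close the ring (repeat the first point last); this sketch does not check.
    // InvariantCulture keeps '.' as the decimal separator regardless of the machine's locale.
    public static string Polygon(IEnumerable<(double X, double Y)> ring) =>
        "POLYGON ((" +
        string.Join(", ", ring.Select(p =>
            p.X.ToString(CultureInfo.InvariantCulture) + " " + p.Y.ToString(CultureInfo.InvariantCulture))) +
        "))";
}
```

For example, the ring (0, 0), (1.5, 0), (1.5, 2), (0, 0) becomes POLYGON ((0 0, 1.5 0, 1.5 2, 0 0)).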
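Try this. It loads the shapefile into a SpatiaLite database through System.Data.SQLite and uses ST_AsText to produce the WKT: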
private void button1_Click(object sender, EventArgs e)
{
String result = "";
OpenFileDialog openfile = new OpenFileDialog();
openfile.Filter = "Shapefile (*.shp)|*.shp|All files (*.*)|*.*";
if (openfile.ShowDialog() != DialogResult.OK) return;
String filePath = openfile.FileName.Replace(".shp", "").Replace(@"\", @"\\");
String[] a = filePath.Split('\\');
String shpName = a[a.Length-1];
try
{
SQLiteConnection.CreateFile(openfile.FileName.Replace(".shp", "")+".sqlite");
System.Data.SQLite.SQLiteConnection connection = new SQLiteConnection(@"Data Source=" + openfile.FileName.Replace(".shp", "") + ".sqlite");
connection.Open();
// extension loading is off by default in System.Data.SQLite and may need enabling first
connection.EnableExtensions(true);
object returnvalue = new SQLiteCommand("SELECT load_extension('libspatialite-2.dll')", connection).ExecuteScalar();
System.Data.SQLite.SQLiteCommand commande = new SQLiteCommand(connection);
commande.CommandText = "CREATE virtual TABLE "+shpName+"VT USING VirtualShape('" + filePath + "', 'CP1252', 4326);";
commande.ExecuteScalar();
commande.CommandText = "CREATE TABLE geom AS SELECT * FROM " + shpName + "VT;";
commande.ExecuteScalar();
commande.CommandText = "drop table " + shpName + "VT";
commande.ExecuteScalar();
commande.CommandText = "ALTER TABLE geom ADD COLUMN WKT TEXT;";
commande.ExecuteScalar();
commande.CommandText = " UPDATE geom set WKT= ST_AsText(Geometry);";
commande.ExecuteScalar();
// the test command: ExecuteScalar returns only the WKT of the first row
commande.CommandText = "SELECT WKT FROM geom;";
result = (string)commande.ExecuteScalar();
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
MessageBox.Show(result);
}
First open the shapefile, then get the basic geometry of each feature:
IFeatureSet fb = FeatureSet.Open("F:\\Test_value\\test.shp");
List<string> str = new List<string>();
WktWriter wktWriter = new WktWriter();
foreach (IFeature ff in fb.Features)
{
Geometry geometry = ff.BasicGeometry as Geometry;
str.Add(wktWriter.Write(geometry));
}