Exporting MongoDB Documents to CSV in C#

I want to export a CSV table from the items of an IMongoCollection from MongoDB.Driver using C#.
How would I be able to do this efficiently? I was thinking of retrieving the documents from the collection and either converting them to a JSON-like format or using a StringBuilder to create the CSV file, with an array of PropertyInfo to access the fields of the retrieved object.
Can someone come up with an example of how I would be able to do this?

The obvious way is to collect all the header fields first (see further below), then iterate through the collection and build each CSV line by hand with a StringBuilder (which people generally discourage), writing to the file in batches if your collection is large.
HashSet<string> fields = new HashSet<string>();
// filter is assumed to be a JSON query string supplied by the caller
BsonDocument query = BsonDocument.Parse(filter);
var result = database.GetCollection<BsonDocument>(collection).Find(query);
// Populate fields with all unique fields, see below for examples how.
var csv = new StringBuilder();
csv.AppendLine(string.Join(",", fields));
foreach (var element in result.ToList())
{
    var cells = new List<string>();
    foreach (var field in fields)
    {
        // GetNestedField is not shown here; it is assumed to walk a dotted path such as "address.street".
        BsonValue value = field.Contains(".")
            ? GetNestedField(element, field)
            : element.GetValue(field, BsonNull.Value); // a missing field becomes an empty cell
        // Example deserialize to string
        switch (value.BsonType)
        {
            case BsonType.ObjectId:
            case BsonType.String:
                cells.Add(value.ToString());
                break;
            case BsonType.Int32:
                cells.Add(value.AsInt32.ToString());
                break;
            default:
                cells.Add(""); // other types are left empty in this example
                break;
        }
    }
    // Joining the cells avoids the trailing comma that appending "," after each value would leave
    csv.AppendLine(string.Join(",", cells));
}
File.WriteAllText("D:\\temp.csv", csv.ToString());
In the case of your own objects you'd have to use your own deserializer.
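One thing the hand-built lines above do not handle is a value that itself contains a comma, a double quote, or a line break. A minimal sketch of the usual quoting rule (as described in RFC 4180) follows; `CsvField`, `Escape`, and `JoinLine` are hypothetical names introduced for illustration and are not part of MongoDB.Driver.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvField
{
    // Hypothetical helper: wraps a value in quotes only when it needs quoting,
    // and doubles any embedded double quotes.
    public static string Escape(string value)
    {
        if (value == null) return "";
        bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes) return value;
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Escapes each value and joins them into one CSV line (no trailing comma).
    public static string JoinLine(IEnumerable<string> values)
    {
        return string.Join(",", values.Select(Escape));
    }
}
```

In the loop above, each cell could be passed through `CsvField.Escape` before it is joined into the line.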
HOWEVER I'd recommend using the mongoexport tool if you can.
You could simply run the exe from your application, feeding in arguments as required. Keep in mind though, that it requires explicit fields.
ProcessStartInfo startInfo = new ProcessStartInfo();
startInfo.FileName = @"C:\mongodb\bin\mongoexport.exe";
startInfo.Arguments = @"-d testDB -c testCollection --type csv --fields name,address.street,address.zipCode --out .\output.csv";
startInfo.UseShellExecute = false;
Process exportProcess= new Process();
exportProcess.StartInfo = startInfo;
exportProcess.Start();
exportProcess.WaitForExit();
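Because mongoexport needs the field list spelled out, the set of field names collected in code can be turned into the argument string. The sketch below uses only the flags already shown above; `MongoExportArgs.Build` is a hypothetical helper name, and it assumes the database, collection, and field names contain no spaces.

```csharp
using System;
using System.Collections.Generic;

public static class MongoExportArgs
{
    // Hypothetical helper: builds the mongoexport argument string from a field list.
    // Only the output path is quoted; the other values are assumed to contain no spaces.
    public static string Build(string database, string collection,
                               IEnumerable<string> fields, string outputPath)
    {
        string fieldList = string.Join(",", fields);
        return $"-d {database} -c {collection} --type csv " +
               $"--fields {fieldList} --out \"{outputPath}\"";
    }
}
```

The result would be assigned to `startInfo.Arguments` in place of the hard-coded string.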
More on mongoexport such as paging, additional queries and field file:
https://docs.mongodb.com/manual/reference/program/mongoexport/
Getting Unique Field Names
In order to find ALL field names you could do this a number of ways. Using BsonDocument as a generic data example.
Recursively traverse through your IMongoCollection results. This is going to have to be through the entire collection, so performance may not be great.
Example:
HashSet<string> fields = new HashSet<string>();
var result = database.GetCollection<BsonDocument>(collection).Find(new BsonDocument());
foreach (var element in result.ToListAsync().Result)
{
ProcessTree(fields, element, "");
}
private void ProcessTree(HashSet<string> fields, BsonDocument tree, string parentField)
{
foreach (var field in tree)
{
string fieldName = field.Name;
if (parentField != "")
{
fieldName = parentField + "." + fieldName;
}
if (field.Value.IsBsonDocument)
{
ProcessTree(fields, field.Value.ToBsonDocument(), fieldName);
}
else
{
fields.Add(fieldName);
}
}
}
Perform a MapReduce operation to return all fields. Scanning nested fields becomes more complex with this method, however.
Example:
string map = @"function() {
for (var key in this) { emit(key, null); }
}";
string reduce = @"function(key, stuff) { return null; }";
string finalize = @"function(key, value){
return key;
}";
MapReduceOptions<BsonDocument, BsonValue> options = new MapReduceOptions<BsonDocument, BsonValue>();
options.Finalize = new BsonJavaScript(finalize);
var results = database.GetCollection<BsonDocument>(collection).MapReduceAsync(
new BsonJavaScript(map),
new BsonJavaScript(reduce),
options).Result.ToListAsync().Result;
foreach (BsonValue result in results.Select(item => item["_id"]))
{
Debug.WriteLine(result.AsString);
}
Perform an Aggregation operation. You'd need to unwind as many times as required to get all nested fields.
Example:
string[] pipeline = new string[3];
pipeline[0] = "{ '$project':{ 'arrayofkeyvalue':{ '$objectToArray':'$$ROOT'}}}";
pipeline[1] = "{ '$unwind':'$arrayofkeyvalue'}";
pipeline[2] = "{ '$group':{'_id':null,'fieldKeys':{'$addToSet':'$arrayofkeyvalue.k'}}}";
var stages = pipeline.Select(s => BsonDocument.Parse(s)).ToList();
var result = await database.GetCollection<BsonDocument>(collection).AggregateAsync<BsonDocument>(stages);
foreach (BsonValue fieldName in result.Single().GetElement("fieldKeys").Value.AsBsonArray)
{
Debug.WriteLine(fieldName.AsString);
}
Nothing perfect here and I couldn't tell you which is the most efficient but hopefully something to help.

Related

C# how to write JSON with enumerated identifiers

I am writing a code to convert Excel to JSON (so far it works).
But I got a problem, I need to number each line that I am writing after the word Match_ (Aka Match_1, Match_2, Match_3).
If you look towards the end of the code, I tried using a for loop, but then every entry comes out as Match_i.
How can I use Replace so that the corresponding number is put after the word Match_?
IP = another string I am adding to the sentence. Ignore it.
row[0] = the text taken as is from the row in the Excel file.
Match_ is not a variable, it is literal text; I could also write Oded_ there and then it would write Oded_ = (IP string) + (Excel text in row[0]).
Match_ is the text I am trying to replace within the JSON, as I cannot use a for loop inside the LINQ query.
using (var conn = new OleDbConnection(connectionString))
{
conn.Open();
var cmd = conn.CreateCommand();
cmd.CommandText = $"SELECT * FROM [{sheetName}$]";
using (var rdr = cmd.ExecuteReader())
{
if (rdr != null)
{
//LINQ query - when executed will create anonymous objects for each row
var query = rdr.Cast<DbDataRecord>().Select(row => new
{
Match_ = IP + row[0]
});
//Generates JSON from the LINQ query
var json = JsonConvert.SerializeObject(query);
//Write the file to the destination path
for (int i = 1; i<200; i++)
{
json = json.Replace("match_", "match_" + i );
}
File.WriteAllText(destinationPath, json);
}
}
So, after it is assigned, query is an IEnumerable<> of your anonymous type that will have zero to many rows. Those rows are not actually evaluated yet. The important thing to remember is that you are making an anonymous type, not an anonymous object, so all enumerations of your result must be of that type; you can't switch one by one.
There are many ways to achieve what you want, but possibly the most expedient is to include the index in your Select, then return a JObject, something like this:
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
...
var query = rdr.Cast<DbDataRecord>().Select((row, i) => {
var result = new JObject();
result.Add( $"match_{i + 1}", IP + row[0]); // i is zero-based, so add 1 to start at match_1
return result;
});
Then you won't have to do any error prone and costly string manipulation on your JSON, it will already be formatted correctly.
Here is a full working example of this in action,
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System.Linq;
public class Program
{
public static void Main()
{
var query = Enumerable
.Range(1,5)
.Select( (n, i) =>
{
var result = new JObject();
result.Add($"match_{i + 1}", n); // i is zero-based, so add 1 to start at match_1
return result;
});
Console.WriteLine(
JsonConvert.SerializeObject(
query,
Formatting.Indented));
}
}
It is possible to do this with the more modern System.Text.Json but you'll have to embed the work in a writer.
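As a rough illustration of what "embedding the work in a writer" could look like, here is a sketch with `Utf8JsonWriter`. The class and method names (`NumberedJson.Write`) are made up for this example, and the input is a plain list of strings standing in for `IP + row[0]`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json;

public static class NumberedJson
{
    // Writes an array of single-property objects named match_1, match_2, ...
    public static string Write(IEnumerable<string> values)
    {
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream))
        {
            writer.WriteStartArray();
            int i = 1; // numbering starts at 1
            foreach (string value in values)
            {
                writer.WriteStartObject();
                writer.WriteString($"match_{i}", value);
                writer.WriteEndObject();
                i++;
            }
            writer.WriteEndArray();
        } // disposing the writer flushes pending output to the stream
        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```

The property name is computed per item, so no string replacement on the finished JSON is needed.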
Try regex.
class Program
{
int i = 0;
static void Main(string[] args)
{
string json = "match_ abc match_ def match_ hijmatch_";
string pattern = "match_";
Program p = new Program();
MatchEvaluator myEvaluator = new MatchEvaluator(p.ReplaceCC);
Regex r = new Regex(pattern);
string output = r.Replace(json, myEvaluator);
}
public string ReplaceCC(Match m)
// Replace each Regex cc match with the number of the occurrence.
{
i++;
return m.Value + i.ToString();
}
}

How to change the value of a StringBuilder Parsing through JArray

I have struggled to finish this task, please if anyone can give me a hint I would be so thankful.
My main task is to get data from database using (FOR JSON AUTO) which is working :)
select field1, field2, field3 from table FOR JSON AUTO;
And then after connecting to Data base I use the StringBuilder() to build a Json Array of objects which is working :)
var jsonResult = new StringBuilder();
if(!r.HasRows)
{
jsonResult.Append("[]");
}
else
{
while(r.Read())
{
jsonResult.Append(r.GetValue(0).ToString());
}
// JArray array = JArray...
}
After that I am trying to change the value of field1 for each object inside the JSON array
JArray array = JArray.Parse(jsonResult.ToString());
foreach (JObject obj in array.Children<JObject>())
{
foreach (JProperty singleProp in obj.Properties())
{
string name = singleProp.Name;
string value = singleProp.Value.ToString();
if(name.ToString() == "field1")
{
Int64 newID = 1234;
value = newID.ToString();
}
}
}
This is working but My BIG QUESTION is how can I get it changed inside the jsonResult?
You simply have to replace the value that you want to update. Since StringBuilder has a built-in .Replace method, you can use it.
JArray arr = JArray.Parse(jsonResult.ToString());
foreach (JObject obj in arr.Children<JObject>())
{
    foreach (JProperty singleProp in obj.Properties())
    {
        string name = singleProp.Name;
        string value = singleProp.Value.ToString();
        if (name.Equals("field1")) //good practice
        {
            Int64 newID = 1234;
            // Replaces the old value with the new one directly in jsonResult.
            // Note: StringBuilder.Replace changes every occurrence of the old value, not only the one in field1.
            jsonResult.Replace(value, newID.ToString());
        }
        //not necessary, explanation is given below
        var jsonElement = JsonSerializer.Deserialize<JsonElement>(jsonResult.ToString());
        result = JsonSerializer.Serialize(jsonElement, options);
    }
}
And for better formatting, I used JsonSerializer so that the output is indented like a JSON object rather than one long string without line breaks.
var options = new JsonSerializerOptions()
{
    WriteIndented = true
};
var result = "";
while loop{
jsonResult.Append(r.GetValue(0).ToString());
(Above code)
}
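An alternative that avoids text replacement altogether is to change the parsed tree and serialize it again. The sketch below uses `System.Text.Json.Nodes` (assuming .NET 6 or later) instead of the `JArray` from the question; `JsonFieldUpdater.SetField1` is a hypothetical name. With Newtonsoft the same idea would be to assign `obj["field1"]` and call `ToString()` on the array.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

public static class JsonFieldUpdater
{
    // Parses a JSON array, sets "field1" on every object that has it,
    // and returns the updated JSON text.
    public static string SetField1(string json, long newId, bool indented = false)
    {
        JsonArray array = JsonNode.Parse(json).AsArray();
        foreach (var item in array)
        {
            if (item is JsonObject obj && obj.ContainsKey("field1"))
            {
                obj["field1"] = newId; // only this property changes
            }
        }
        return array.ToJsonString(new JsonSerializerOptions { WriteIndented = indented });
    }
}
```

The returned string can then replace the contents of jsonResult, for example with `jsonResult.Clear().Append(updated)`.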

UWP - Compare data on JSON and database

I have a database called ebookstore.db as below:
and JSON as below:
When a slug in the JSON does not match a title in the database, I want ukomikText to display the number of JSON items whose slug has no matching title in the database.
Code:
string judulbuku;
try
{
string urlPath1 = "https://...";
var httpClient1 = new HttpClient(new HttpClientHandler());
httpClient1.DefaultRequestHeaders.TryAddWithoutValidation("KIAT-API-KEY", "....");
var values1 = new List<KeyValuePair<string, string>>
{
new KeyValuePair<string, string>("halaman", "1"),
new KeyValuePair<string, string>("limit", "100"),
};
var response1 = await httpClient1.PostAsync(urlPath1, new FormUrlEncodedContent(values1));
response1.EnsureSuccessStatusCode();
if (!response1.IsSuccessStatusCode)
{
MessageDialog messageDialog = new MessageDialog("Memeriksa update Komik gagal", "Gangguan Server");
await messageDialog.ShowAsync();
}
string jsonText1 = await response1.Content.ReadAsStringAsync();
JsonObject jsonObject1 = JsonObject.Parse(jsonText1);
JsonArray jsonData1 = jsonObject1["data"].GetArray();
foreach (JsonValue groupValue in jsonData1)
{
JsonObject groupObject = groupValue.GetObject();
string id = groupObject["id"].GetString();
string judul = groupObject["judul"].GetString();
string slug = groupObject["slug"].GetString();
BukuUpdate file1 = new BukuUpdate();
file1.ID = id;
file1.Judul = judul;
file1.Slug = slug;
List<String> title = sqlhelp.GetKomikData();
foreach (string juduldb in title)
{
judulbuku = juduldb.Substring(juduldb.IndexOf('.') + 1);
if (judulbuku != file1.Slug.Replace("-", "_") + ".pdf")
{
BukuData.Add(file1);
ListBuku.ItemsSource = BukuData;
}
else
{
ukomikText.Text = "belum tersedia komik yang baru";
ukomikText.Visibility = Visibility.Visible;
}
}
}
if (ListBuku.Items.Count > 0)
{
ukomikText.Text = BukuData.Count + " komik baru";
ukomikText.Visibility = Visibility.Visible;
jumlahbuku = BukuData.Count;
}
else
{
ukomikText.Text = "belum tersedia komik yang baru";
ukomikText.Visibility = Visibility.Visible;
}
public static List<String> GetKomikData()
{
List<String> entries = new List<string>();
using (SqliteConnection db =
new SqliteConnection("Filename=ebookstore.db"))
{
db.Open();
SqliteCommand selectCommand = new SqliteCommand
("SELECT title FROM books where folder_id = 67", db);
SqliteDataReader query = selectCommand.ExecuteReader();
while (query.Read())
{
entries.Add(query.GetString(0));
}
db.Close();
}
return entries;
}
BukuUpdate.cs:
public string ID { get; set; }
public string Judul { get; set; }
public string Slug { get; set; }
My problem is that when the slugs in the JSON are checked, the first slug is displayed once for every row in the database, then the second slug is displayed once for every row, and so on, as below:
How can I solve this so that the slugs are not displayed repeatedly (only as many entries as there are items in the JSON)?
The problem is that you have two nested foreach loops. What the code does in simplified pseudocode:
For each item in JSON
Load all rows from DB
And for each loaded row
Check if the current JSON item matches the row from DB and if not, output
As you can see, if you have N items in the JSON and M rows in the database, this inevitably leads to N*M lines of output except for those rare ones where the JSON item matches a specific row in database.
If I understand it correctly, I assume that you instead want to check if there is a row that matches the JSON item and if not, output it. You could do this the following way:
List<String> title = sqlhelp.GetKomikData();
HashSet<string> dbItems = new HashSet<string>();
foreach (string juduldb in title)
{
judulbuku = juduldb.Substring(juduldb.IndexOf('.') + 1);
dbItems.Add( judulbuku );
}
...
foreach ( JsonValue groupValue in jsonData1 )
{
...
//instead of the second foreach
if ( !dbItems.Contains( file1.Slug.Replace("-", "_") + ".pdf" ) )
{
//item is not in database
}
else
{
//item is in database
}
}
Additional tips
Avoid calling GetKomikData inside the foreach. This method does not have any arguments and that means you are just accessing the database again and again without a reason, which takes time and slows down the execution significantly. Instead, call GetKomikData only once before the first foreach and then just use title variable.
Don't assign ItemsSource every time the collection changes. This will unnecessarily slow down the UI thread, as it will have to reload all the items with each loop. Instead, assign the property only once after the outer foreach.
Write your code in one language. When you start mixing variable names in English with Indonesian, the code becomes confusing and less readable and adds cognitive overhead.
Avoid non-descriptive variable names like file1 or jsonObject1. The variable name should be clear and tell you what it contains. When there is a number at the end, it usually means it could be named more clearly.
Use plurals for list variable names: instead of title, use titles.

Creating a csv file from json with different header values per record

I have a massive JSON file that is very nested. I need to write multiple CSV files depending on the name of a certain field: if the file exists, add the values under the headers I've created; if it does not, create a new one. This is working just fine. However, I have run into a problem where the values do not line up with the headers because a particular header doesn't exist for that record. Example:
Header: Dog Cat Mouse Horse
Record1: yes yes yes yes
// above is an example of a file with all values
Adding record Two where a header value is not listed at all
Header: Dog Cat Mouse Horse
Record1: yes yes yes yes
Record2: yes yes yes ***
Record2 above does not have a mouse on the record, but because it doesn't line up, the yes shifted left. I need to write a null under that header before spitting out the values to the file. Below is my code; if you could help, that would be great, as I'm lost at this point:
static List<string> headList = new List<string>();
static List<string> newHeaderList = new List<string>();
static List<string> valueList = new List<string>();
static List<string> oldHeadList = new List<string>();
static void Main()
{
var data = JsonConvert.DeserializeObject<dynamic>(File.ReadAllText(
@"C:\Users\nphillips\workspace\2016R23\UITestAutomation\SeedDataGenerator\src\staticresources\seeddata.resource"));
string fileName = "";
var bundles = data.RecordSetBundles;
foreach (var bundle in bundles)
{
var records = bundle.Records;
foreach (var record in records)
{
var test = record.attributes;
foreach (var testagain in test)
{
// Getting the object Name Ex. Location, Item, etc.
var jprop = testagain as JProperty;
if (jprop != null)
{
fileName = jprop.First.ToString().Split('_')[2] + ".csv";
}
break;
}
string header = "";
string value = "";
foreach (var child in record)
{
var theChild = child as JProperty;
if (theChild != null && !theChild.Name.Equals("attributes"))
{
// adding the name and values to list
headList.Add(child.Name);
valueList.Add(child.Value.ToString());
}
}
// calling method to write columns and values
writeCSV(headList, valueList, fileName);
valueList.Clear();
headList.Clear();
}
}
}
public static void writeCSV(List<string> headList, List<string> valList, string fileName)
{
string headerString = "";
string value = "";
if (!File.Exists(fileName))
{
foreach (var header in headList)
{
foreach (var val in valList)
{
value += val + ",";
}
oldHeadList.Add(header);
headerString += header + ',';
}
headerString += "+" + Environment.NewLine;
File.WriteAllText(fileName, headerString);
}
else
{
foreach (var header in headList)
{
foreach (var oldHeader in oldHeadList)
{
foreach (var val in valList)
{
if (header != oldHeader)
{
value += "null,";
}
else
{
value += val + ",";
}
}
}
}
}
File.AppendAllText(fileName, value);
value += Environment.NewLine;
}
}
My horrific json file that I cannot change as its used by my company: https://codeshare.io/rGL6K5
Is there some kind of pattern?
The reason I am asking is that maybe you could create a service that deserializes the JSON into one or more complex objects. Once that is done, have a service that serializes those objects to CSV. The service would know to write multiple CSV files as needed.
I'd stay away from using dynamic if there is a reliable pattern to the JSON. I'd get a sample JSON file, copy it to the clipboard, and use the Paste JSON as Classes feature in Visual Studio.
https://blogs.msdn.microsoft.com/webdev/2012/12/18/paste-json-as-classes-in-asp-net-and-web-tools-2012-2-rc/
After that, deserialize it with Newtonsoft.JSON into a nice reliable object from which to build your CSV.
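Whichever way the records are deserialized, the column shift described in the question goes away once every record is written against the same header list. The sketch below assumes each record has already been reduced to a name/value dictionary; `CsvAligner.Align` is a hypothetical name, and splitting the records by target file name is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvAligner
{
    // Builds CSV lines where every record is aligned to the union of all headers.
    // A record that lacks a header gets the literal text "null" in that column.
    public static List<string> Align(IEnumerable<Dictionary<string, string>> records)
    {
        var recordList = records.ToList();

        // First pass: collect each header once, in the order it is first seen.
        var headers = new List<string>();
        var seen = new HashSet<string>();
        foreach (var record in recordList)
            foreach (var key in record.Keys)
                if (seen.Add(key)) headers.Add(key);

        // Second pass: look up every header in every record.
        var lines = new List<string> { string.Join(",", headers) };
        foreach (var record in recordList)
        {
            var cells = headers.Select(h => record.TryGetValue(h, out var v) ? v : "null");
            lines.Add(string.Join(",", cells));
        }
        return lines;
    }
}
```

This needs two passes because the full header set must be known before the first line is written; the lines for one file can then be written with File.WriteAllLines.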

C# compare fields from different lines in csv

I am trying to compare the value in the 0 index of an array on one line and the 0 index on the following line. Imagine a CSV where I have a unique identifier in the first column, a corresponding value in the second column.
USER1, 1P
USER1, 3G
USER2, 1P
USER3, 1V
I would like to check the value of [0] the next line (or previous if that's easier) to compare and if they are the same (as they are in the example) concatenate it to index 1. That is, the data should read as
USER1, 1P, 3G
USER2, 1P
USER3, 1V
before it gets passed onto the next function. So far I have
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null)
{
break;
}
contact.ContactId = parts[0];
long nextLine;
nextLine = parser.LineNumber+1;
//if line1 parts[0] == line2 parts[0] etc.
}
}
}
Does anyone have any suggestions? Thank you.
How about saving the array into a variable:
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
string[] oldParts = new string[] { string.Empty };
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null || parts.Length < 1)
{
break;
}
if (oldParts[0] == parts[0])
{
// concat logic goes here
}
else
{
contact.ContactId = parts[0];
}
long nextLine;
nextLine = parser.LineNumber+1;
oldParts = parts;
//if line1 parts[0] == line2 parts[0] etc.
}
}
}
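The "concat logic" placeholder above could be filled in along these lines. This is only a sketch: it works on rows that have already been split into fields (for example by `ReadFields`), it assumes rows with the same key are adjacent as in the sample data, and `ConsecutiveMerger.Merge` is a made-up name.

```csharp
using System;
using System.Collections.Generic;

public static class ConsecutiveMerger
{
    // Merges adjacent rows that share the same first field into one line.
    public static List<string> Merge(IEnumerable<string[]> rows)
    {
        var result = new List<string>();
        string currentKey = null;
        var values = new List<string>();

        foreach (string[] parts in rows)
        {
            if (parts == null || parts.Length < 2) continue; // skip malformed rows
            string key = parts[0].Trim();
            if (currentKey != null && key != currentKey)
            {
                // The key changed, so the previous group is complete.
                result.Add(currentKey + ", " + string.Join(", ", values));
                values.Clear();
            }
            currentKey = key;
            values.Add(parts[1].Trim());
        }
        // Emit the last group, if there was any input at all.
        if (currentKey != null)
            result.Add(currentKey + ", " + string.Join(", ", values));
        return result;
    }
}
```

Unlike a dictionary or GroupBy, this keeps only one group in memory at a time, at the cost of requiring the input to be ordered by key.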
If I understand you correctly, what you are asking is essentially "how do I group the values in the second column based on the values in the first column?".
A quick and quite succinct way of doing this would be to Group By using LINQ:
var linesGroupedByUser =
    from line in File.ReadAllLines(path)
    let elements = line.Split(',')
    // Trim so the space after each comma in the input does not end up in the values
    let user = new { Name = elements[0].Trim(), Value = elements[1].Trim() }
    group user by user.Name into users
    select users;
foreach (var user in linesGroupedByUser)
{
    string valuesAsString = String.Join(", ", user.Select(x => x.Value));
    Console.WriteLine(user.Key + ", " + valuesAsString);
}
I have left out the use of your TextFieldParser class, but you can easily use that instead. This approach does, however, require that you can afford to load all of the data into memory. You don't mention whether this is viable.
The easiest way to do something like this is to convert each line to an object. You can use CsvHelper, https://www.nuget.org/packages/CsvHelper/, to do the work for you or you can iterate each line and parse to an object. It is a great tool and it knows how to properly parse CSV files into a collection of objects. Then, whether you create the collection yourself or use CsvHelper, you can use Linq to GroupBy, https://msdn.microsoft.com/en-us/library/bb534304(v=vs.100).aspx, your "key" (in this case UserId) and Aggregate, https://msdn.microsoft.com/en-us/library/bb549218(v=vs.110).aspx, the other property into a string. Then, you can use the new, grouped by, collection for your end goal (write it to file or use it for whatever you need).
You're basically finding all the unique entries so put them into a dictionary with the contact id as the key. As follows:
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
Dictionary<string, List<string>> uniqueContacts = new Dictionary<string, List<string>>();
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null || parts.Count() != 2)
{
break;
}
//if contact id not present in dictionary add
if (!uniqueContacts.ContainsKey(parts[0]))
uniqueContacts.Add(parts[0],new List<string>());
//now there's definitely an existing contact in dic (the one
//we've just added or a previously added one) so add to the
//list of strings for that contact
uniqueContacts[parts[0]].Add(parts[1]);
}
//now do something with that dictionary of unique user names and
// lists of strings, for example dump them to console in the
//format you specify:
foreach (var contactId in uniqueContacts.Keys)
{
var sb = new StringBuilder();
sb.Append($"{contactId}, ");
// Join puts the separator only between values, even when a value repeats
sb.Append(string.Join(", ", uniqueContacts[contactId]));
Console.WriteLine(sb);
}
}
}
