Import CSV into MongoDB - C#

I have to import my Excel CSV file into MongoDB, and I want to do it in C# code.
public class AnimalRetriever : IAnimalRetriever
{
private readonly MongoClient _mongoClient;
public AnimalRetriever()
{
_mongoClient = new MongoClient("mongodb://localhost:27017");
}
private List<Animal> GetByContinent(string continent)
{
_mongoClient.GetDatabase("local")
.GetCollection<Animal>("Animal")
.ReplaceOne(
filter: new BsonDocument("Continent", continent),
options: new UpdateOptions { IsUpsert = true },
replacement: animal.csv); // text file to read instead of newDoc (CSV extension)
return _mongoClient.GetDatabase("local")
.GetCollection<Animal>("Continent")
.Find("{\"Continent\":\"" + continent + "\"}")
.ToList();
}

using System;
using System.Collections.Generic;
using System.IO;
using FactoryExample.Continent;
using MongoDB.Bson;
using MongoDB.Driver;
using CsvHelper;
namespace FactoryExample
{
class MainApp
{
public static void Main()
{ try
{
var client = new MongoClient("mongodb://localhost:27017");
var db = client.GetDatabase("local");
var coll = db.GetCollection<BsonDocument>("Animal");
var reader = new StreamReader("animal.csv");
var csv = new CsvReader(reader);
csv.Configuration.HasHeaderRecord = true;
var records = csv.GetRecords<Animals>();
var dizionario = new Dictionary<string,Animal>();
foreach (var animal in records)
{
if (dizionario.ContainsKey(animal.Continent))
{
dizionario[animal.Continent].Carnivor.Add(animal.Carnivor);
dizionario[animal.Continent].Herbivor.Add(animal.Herbivor);
}
else
{
var newanimal = new Animal
{
Continent = animal.Continent,
Carnivor = new List<string>(),
Herbivor = new List<string>()
};
newanimal.Carnivor.Add(animal.Carnivor);
newanimal.Herbivor.Add(animal.Herbivor);
dizionario.Add(newanimal.Continent,newanimal);
}
}
}
catch (Exception err)
{
Console.WriteLine("Error!!");
Console.WriteLine(err.Message);
}
Console.ReadKey();
//Console.Read();
var continentFactory = ContinentFactory.Get(ContinentType.AMERICA);
var carnivore = continentFactory.GetCarnivore();
var herbivore = continentFactory.GetHerbivore();
foreach (var h in herbivore)
{
Console.WriteLine(h);
}
foreach (var c in carnivore)
{
Console.WriteLine(c);
}
Console.ReadKey();
}
}
}
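
A minimal sketch of the grouping step attempted above, using only the base class library. It assumes the CSV has a header row and three columns in the order Continent, Carnivor, Herbivor (the column layout is an assumption, as are the names ContinentAnimals and AnimalCsv). It uses a naive Split(','), so quoted fields are not handled; a real import would keep CsvHelper for that.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical document shape: one entry per continent.
public class ContinentAnimals
{
    public string Continent { get; set; }
    public List<string> Carnivores { get; } = new List<string>();
    public List<string> Herbivores { get; } = new List<string>();
}

public static class AnimalCsv
{
    // Groups rows of the assumed form "Continent,Carnivor,Herbivor" by continent.
    public static Dictionary<string, ContinentAnimals> Group(TextReader reader)
    {
        var byContinent = new Dictionary<string, ContinentAnimals>();
        reader.ReadLine(); // skip the header record
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var fields = line.Split(',');
            if (fields.Length < 3) continue; // skip blank or malformed rows
            var continent = fields[0].Trim();
            if (!byContinent.TryGetValue(continent, out var entry))
            {
                entry = new ContinentAnimals { Continent = continent };
                byContinent.Add(continent, entry);
            }
            entry.Carnivores.Add(fields[1].Trim());
            entry.Herbivores.Add(fields[2].Trim());
        }
        return byContinent;
    }
}

public static class Program
{
    public static void Main()
    {
        var csv = "Continent,Carnivor,Herbivor\nAfrica,Lion,Zebra\nAfrica,Hyena,Giraffe\nAmerica,Puma,Bison\n";
        var grouped = AnimalCsv.Group(new StringReader(csv));
        foreach (var entry in grouped.Values)
        {
            Console.WriteLine($"{entry.Continent}: {string.Join("/", entry.Carnivores)} | {string.Join("/", entry.Herbivores)}");
        }
        // Not shown here: with the MongoDB driver, the grouped values could then be
        // passed to InsertMany on an IMongoCollection<ContinentAnimals>.
    }
}
```

Note that the code in the question builds the dictionary but never writes it to the collection, which may be why nothing shows up in MongoDB.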

Related

After serialization only the latest Customer is saved

I need to build a console app that allows a user to input (via terminal) details of a new user. This then needs to be written into XML through serialization (required). I constructed my customer class and have a method for building the new user - but it will always forget the last entry and only write one instance of user into my list.
Here is the method I built for adding the user:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml.Serialization;
namespace Hausarbeit_Autovermietung_Gierow
{
class Addcustomer
{
public void AddCustomer()
{
var customer = new Customer();
List<Customer> customers = new List<Customer>() { customer };
//DeserializeFromXML(customer);
DeserializeFromXML(customers);
var listCount = customers.Count;
int maxID = FindMaxValue(customers, x => x.ID);
Console.WriteLine("Vorname eingeben");
customer.Firstname = Console.ReadLine();
Console.WriteLine("Nachname eingeben");
customer.Lastname = Console.ReadLine();
// int ID;
customer.ID = maxID + 1;
//List<Customer> customers = new List<Customer>() { customer };
SerializeToXML(customer);
SerializeToXML(customers);
foreach (var cust in customers)
{
Console.WriteLine($"Vorname: {cust.ID} {cust.Firstname} {cust.Lastname}");
Console.ReadKey();
}
}
static public void SerializeToXML(Customer customer)
{
XmlSerializer serializer = new XmlSerializer(typeof(Customer));
using (TextWriter textWriter = new StreamWriter(@"customer.xml"))
{
serializer.Serialize(textWriter, customer);
}
}
public static void SerializeToXML(List<Customer> customers)
{
//var customer = new Customer("Vorname", "Nachname");
XmlSerializer serializer = new XmlSerializer(typeof(List<Customer>));
using (System.IO.TextWriter textWriter = new System.IO.StreamWriter(@"List.xml"))
{
serializer.Serialize(textWriter, customers);
}
}
static List<Customer> DeserializeFromXML(List<Customer> customers)
{
XmlSerializer deserializer = new XmlSerializer(typeof(List<Customer>));
List<Customer> customerslist;
using (TextReader textReader = new StreamReader(@"List.xml"))
{
customerslist = (List<Customer>)deserializer.Deserialize(textReader);
return customerslist;
}
}
public int FindMaxValue<T>(List<T> list, Converter<T, int> projection)
{
if (list.Count == 0)
{
throw new InvalidOperationException("Empty list");
}
int maxValue = int.MinValue;
foreach (T item in list)
{
int value = projection(item);
if (value > maxValue)
{
maxValue = value;
}
}
return maxValue;
}
}
}
Here you create a list containing only your new customer:
List<Customer> customers = new List<Customer>() { customer };
Then you deserialize the XML, but the list that the method returns is discarded, so customers still holds only the new customer:
DeserializeFromXML(customers);
Assign the result instead, and then add the new customer to it:
customers = DeserializeFromXML(customers);
customers.Add(customer);
So when you serialize, the new customer will be part of the list:
SerializeToXML(customers);
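
The load, append, save cycle can be sketched as follows. This is a minimal illustration, not the asker's full program: the CustomerStore class, the Load/Save/Add method names and the temp-file path are assumptions, and a missing file is treated as an empty list.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

public class Customer
{
    public int ID { get; set; }
    public string Firstname { get; set; }
    public string Lastname { get; set; }
}

public static class CustomerStore
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(List<Customer>));

    // Returns the stored customers, or an empty list if the file does not exist yet.
    public static List<Customer> Load(string path)
    {
        if (!File.Exists(path)) return new List<Customer>();
        using (var reader = new StreamReader(path))
        {
            return (List<Customer>)Serializer.Deserialize(reader);
        }
    }

    public static void Save(string path, List<Customer> customers)
    {
        using (var writer = new StreamWriter(path))
        {
            Serializer.Serialize(writer, customers);
        }
    }

    // Loads the existing list first, appends to it, then writes the whole list back.
    public static Customer Add(string path, string firstname, string lastname)
    {
        var customers = Load(path);
        var nextId = customers.Count == 0 ? 1 : customers.Max(c => c.ID) + 1;
        var customer = new Customer { ID = nextId, Firstname = firstname, Lastname = lastname };
        customers.Add(customer);
        Save(path, customers);
        return customer;
    }
}

public static class Program
{
    public static void Main()
    {
        var path = Path.Combine(Path.GetTempPath(), "customers-sketch.xml");
        File.Delete(path); // start from a clean file for the demonstration
        CustomerStore.Add(path, "Ada", "Lovelace");
        CustomerStore.Add(path, "Alan", "Turing");
        foreach (var c in CustomerStore.Load(path))
        {
            Console.WriteLine($"{c.ID} {c.Firstname} {c.Lastname}");
        }
    }
}
```

The key point is that the list written to disk is the deserialized one plus the new entry, never a freshly created list.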

How to read a CSV file from SFTP and use CSVHelper to parse the content without saving CSV locally

How to read a CSV file from SFTP and use CSVHelper to parse the content without saving CSV locally?
Is this possible, or do we have to save it locally, parse and delete the file?
I am using SSH.Net and CSVHelper.
This is possible by processing the remote file as a stream:
public async Task ProcessRemoteFilesAsync()
{
var credentials = new Credentials("host", "username", "password");
var filePaths = new List<string>();
// initializing filePaths ..
var tasks = filePaths
.Select(f => ParseRemoteFileAsync(credentials, f))
.ToArray();
var results = await Task.WhenAll(tasks).ConfigureAwait(false);
// traverse through results..
}
public async Task<FileContent> ParseRemoteFileAsync(Credentials credentials, string filePath)
{
using (var sftp = new SftpClient(credentials.host, credentials.username, credentials.password))
{
sftp.Connect();
try
{
using (var remoteFileStream = sftp.OpenRead(filePath))
{
using (var reader = new StreamReader(remoteFileStream))
{
using (var csv = new CsvReader(reader))
{
/*
// Example of CSV parsing:
var records = new List<Foo>();
csv.Read();
csv.ReadHeader();
while (csv.Read())
{
var record = new Foo
{
Id = csv.GetField<int>("Id"),
Name = csv.GetField("Name")
};
records.Add(record);
}
*/
return new FileContent(); // placeholder: build the result from the parsed records
}
}
}
}
finally {
sftp.Disconnect();
}
}
}
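
The reason no local copy is needed is that the parser only requires a readable Stream. A minimal sketch of that idea, with a MemoryStream standing in for the stream returned by sftp.OpenRead(filePath): the StreamCsv class and the two-column "Id,Name" layout are assumptions for illustration, and the naive Split(',') stands in for CsvHelper.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

public static class StreamCsv
{
    // Reads "Id,Name" rows from any readable stream; nothing is written to disk.
    public static List<KeyValuePair<int, string>> Parse(Stream source)
    {
        var rows = new List<KeyValuePair<int, string>>();
        using (var reader = new StreamReader(source, Encoding.UTF8))
        {
            reader.ReadLine(); // skip the header record
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var fields = line.Split(',');
                if (fields.Length < 2) continue; // skip blank or malformed rows
                var id = int.Parse(fields[0].Trim(), CultureInfo.InvariantCulture);
                rows.Add(new KeyValuePair<int, string>(id, fields[1].Trim()));
            }
        }
        return rows;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for a remote file: the bytes never touch the local file system.
        var bytes = Encoding.UTF8.GetBytes("Id,Name\n1,Alpha\n2,Beta\n");
        using (var remoteLike = new MemoryStream(bytes))
        {
            foreach (var row in StreamCsv.Parse(remoteLike))
            {
                Console.WriteLine($"{row.Key}: {row.Value}");
            }
        }
    }
}
```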
Modified version that uses a pool of SftpClient instances
See C# Object Pooling Pattern implementation.
Implementation of pool borrowed from How to: Create an Object Pool by Using a ConcurrentBag:
/// <summary>
/// Implementation borrowed from [How to: Create an Object Pool by Using a
/// ConcurrentBag](https://learn.microsoft.com/en-us/dotnet/standard/collections/thread-safe/how-to-create-an-object-pool).
/// </summary>
/// <typeparam name="T"></typeparam>
public class ObjectPool<T> : IDisposable
where T : IDisposable
{
private readonly Func<T> _objectGenerator;
private readonly ConcurrentBag<T> _objects;
public ObjectPool(Func<T> objectGenerator)
{
_objectGenerator = objectGenerator ?? throw new ArgumentNullException(nameof(objectGenerator));
_objects = new ConcurrentBag<T>();
}
public void Dispose()
{
while (_objects.TryTake(out var item))
{
item.Dispose();
}
}
public T GetObject()
{
return _objects.TryTake(out var item) ? item : _objectGenerator();
}
public void PutObject(T item)
{
_objects.Add(item);
}
}
The simplest pool-based implementation (it does not handle exceptions or retry policies):
internal class SftpclientTest
{
private readonly ObjectPool<SftpClient> _objectPool;
public SftpclientTest(Credentials credentials)
{
_objectPool = new ObjectPool<SftpClient>(() =>
{
var client = new SftpClient(credentials.host, credentials.username, credentials.password);
client.Connect();
return client;
});
}
public void GetDirectoryList()
{
var client = _objectPool.GetObject();
try
{
// client.ListDirectory() ..
}
finally
{
if (client.IsConnected)
{
_objectPool.PutObject(client);
}
}
}
public async Task ProcessRemoteFilesAsync()
{
var filePaths = new List<string>();
// initializing filePaths ..
var tasks = filePaths
.Select(f => ParseRemoteFileAsync(f))
.ToArray();
var results = await Task.WhenAll(tasks).ConfigureAwait(false);
// traverse through results..
}
public Task<FileContent> ParseRemoteFileAsync(string filePath)
{
var client = _objectPool.GetObject();
try
{
using (var remoteFileStream = client.OpenRead(filePath))
{
using (var reader = new StreamReader(remoteFileStream))
{
using (var csv = new CsvReader(reader))
{
// ..
}
}
return Task.FromResult(new FileContent());
}
}
finally
{
if (client.IsConnected)
{
_objectPool.PutObject(client);
}
}
}
}

C# Duplicates <k, v>

I am new to learning C# and have a question.
I have a txt file with tests and scores like below
ACT
21.0
SAT
478.9
CLEP
69.1
ACT 32.0
How do I parse this text file into a dictionary and display it as below (removing any duplicates)?
ACT 21.0
SAT 478.9
CLEP 69.1
Here is what I have attempted
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Text.RegularExpressions;
namespace Generate
{
class generateInputStream
{
static void Main(string[] args)
{
FileManager objfileManager = new FileManager();
FileStream fs = null;
Console.Write("Enter the file path: ");
while (fs == null)
{
string Path = Console.ReadLine();
fs = objfileManager.OpenFile(Path);
}
int
}
}
public class FileManager
{
public FileStream OpenFile(string Path)
{
try
{
return new FileStream(Path, FileMode.Open, FileAccess.Read);
}
catch (Exception e)
{
Console.Write("Problem opening file {0}, please enter a valid path: ", Path);
}
return null;
}
public List<string> ReadLines(FileStream File)
{
List<string> text = new List<string>();
try
{
var streamReader = new StreamReader(File);
}
catch (Exception e)
{
Console.Write(e.Message);
}
return text;
}
static readFileIntoDictionary()
{
StreamReader generateInputStream;
var streamReader = new StreamReader();
SortedDictionary<string, double> dic = new SortedDictionary<string, double>();
string Key = string.Empty;
double Value = 0.0;
while ((Key = streamReader.ReadLine())!= null)
{
Value = Convert.ToInt32(streamReader.ReadLine());
dic.Add(Key, Value);
}
streamReader.Close();
return dic;
}
static displayScoreData()
{
readFileIntoDictionary();
foreach (KeyValuePair<string, double> pair in dic)
{
Console.WriteLine(pair.Key, "-",pair.Value);
}
}
}
}
This method should help you a bit. There is no error handling here, so it assumes your file is always in the correct format.
public static KeyValuePair<string, decimal>? ReadPair (StreamReader sr)
{
if (sr.EndOfStream) return null;
string key = sr.ReadLine ();
decimal value = decimal.Parse (sr.ReadLine ());
return new KeyValuePair<string, decimal> (key, value);
}
I don't know what counts as a duplicate for you: the same key, or the same key and value?
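
If a duplicate means "the same key", one way to get the requested output is to keep the first score seen for each test name. A minimal sketch under that assumption: it also assumes the file strictly alternates name line and score line, and that scores use a dot as the decimal separator. ScoreReader and ReadScores are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class ScoreReader
{
    // Reads alternating "name" / "score" lines and keeps the first score per name.
    public static Dictionary<string, decimal> ReadScores(TextReader reader)
    {
        var scores = new Dictionary<string, decimal>();
        string key;
        while ((key = reader.ReadLine()) != null)
        {
            string valueLine = reader.ReadLine();
            if (valueLine == null) break; // a trailing name without a score is ignored
            decimal value = decimal.Parse(valueLine.Trim(), CultureInfo.InvariantCulture);
            key = key.Trim();
            if (!scores.ContainsKey(key))
            {
                scores.Add(key, value); // later duplicates of the same name are dropped
            }
        }
        return scores;
    }
}

public static class Program
{
    public static void Main()
    {
        var text = "ACT\n21.0\nSAT\n478.9\nCLEP\n69.1\nACT\n32.0\n";
        var scores = ScoreReader.ReadScores(new StringReader(text));
        foreach (var pair in scores)
        {
            Console.WriteLine($"{pair.Key} {pair.Value.ToString(CultureInfo.InvariantCulture)}");
        }
    }
}
```

To read from a real file, pass a StreamReader opened on the path instead of the StringReader.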

Turn BsonArray into List<T>

I've been working at this and have managed to get the JSON parsed into a C# object. The next step was to parse a JSON array into a List<T>. I have managed to get the job done, but I'm pretty sure there is a better way to convert from BsonArray to a List<T>:
using (StreamReader file = File.OpenText(filename))
{
try
{
//first up convert to bson
var jsonSampleData = file.ReadToEnd();
//var bsonSampleData = BsonDocument.Parse(jsonSampleData);
//this would be for a single BSOnDocument
var bsonSampleData = BsonSerializer.Deserialize<BsonArray>(jsonSampleData);
var x = bsonSampleData.ToList();
List<ThePlan> lst = new List<ThePlan>();
foreach (var doc in x)
{
var t = BsonSerializer.Deserialize<ThePlan>(doc.AsBsonDocument);
lst.Add(t);
}
}
catch (Exception ex)
{
throw;
}
}
Edit-Additional Information
To be clear, what I need to accomplish is to take the given JSON document and rehydrate it into a List<T>. This is further complicated by my being new to Mongo; T is a Mongo entity representation.
As Andrei pointed out it works fine:
using (StreamReader file = File.OpenText(filename))
{
var jsonSampleData = file.ReadToEnd();
_thePlan = BsonSerializer.Deserialize<List<ThePlan>>(jsonSampleData);
}
Thinking about my struggles yesterday, I think the problem actually had to do with my JSON, which in my early attempts looked like this:
{
"_id": "57509afbc6b48d3f33b2dfcd",
...
}
In the process of figuring it all out my json matured to:
{
"_id": { "$oid": "57509afbc6b48d3f33b2dfcd" },
.....
}
The troubles I was having with BsonSerializer were likely caused by my bad JSON, and once that was worked out I wasn't astute enough to go back to BsonSerializer and try again.
Either go strongly typed all the way or not typed at all.
strongly typed
Assuming these are your types:
public class BaseObject {
[BsonId] public ObjectId id { get; set; }
[BsonElement("plans")] public List<ThePlan> Plans { get; set; }
}
public class ThePlan {
[BsonElement("i")] public int Integer { get; set; }
[BsonElement("s")] public string String { get; set; }
}
and these test utilities:
void ToJsonTyped(BaseObject bo)
{
var sb = new StringBuilder();
using (TextWriter tw = new StringWriter(sb))
using (BsonWriter bw = new JsonWriter(tw))
{
BsonSerializer.Serialize<BaseObject>(bw, bo);
}
string jsonObject = sb.ToString();
BaseObject bo2 = BsonSerializer.Deserialize<BaseObject>(jsonObject);
Assert.AreEqual(bo, bo2);
}
void ToBsonTyped(BaseObject bo)
{
byte[] bsonObject = null;
using (var ms = new MemoryStream())
using (BsonWriter bw = new BsonBinaryWriter(ms))
{
BsonSerializer.Serialize<BaseObject>(bw, bo);
bsonObject = ms.ToArray();
}
BaseObject bo1 = BsonSerializer.Deserialize<BaseObject>(bsonObject);
Assert.AreEqual (bo, bo1);
}
you can test:
BaseObject bo = new BaseObject() {
Plans = new List<ThePlan>() {
new ThePlan() {Integer=1, String="one" },
new ThePlan() {Integer=2, String="two" },
new ThePlan() {Integer=3, String="three" } } };
ToBsonTyped(bo);
ToJsonTyped(bo);
not typed at all, combo of BsonDocument and BsonArray
test:
BsonDocument doc = new BsonDocument();
var bsonArray = new BsonArray();
bsonArray.Add(new BsonDocument("one", 1));
bsonArray.Add(new BsonDocument("two", 2));
bsonArray.Add(new BsonDocument("three", 3));
doc.Add( new BsonElement("plans", bsonArray));
ToBsonUnTyped(doc);
ToJsonUnTyped(doc);
test utils:
void ToBsonUnTyped(BsonDocument doc) {
byte[] bsonObject = null;
using (var ms = new MemoryStream())
using (BsonWriter bw = new BsonBinaryWriter(ms))
{
BsonSerializer.Serialize<BsonDocument>(bw, doc);
bsonObject = ms.ToArray();
}
BsonDocument docActual = BsonSerializer.Deserialize<BsonDocument>(bsonObject);
Assert.AreEqual (doc, docActual);
}
void ToJsonUnTyped(BsonDocument doc)
{
var sb = new StringBuilder();
using (TextWriter tw = new StringWriter(sb))
using (BsonWriter bw = new JsonWriter(tw))
{
BsonSerializer.Serialize<BsonDocument>(bw, doc);
}
string jsonObject = sb.ToString();
BsonDocument doc2 = BsonSerializer.Deserialize<BsonDocument>(jsonObject);
Assert.AreEqual(doc, doc2);
}

Elastic Search to search for words that starts with phrase

I'm trying to create a search function for my website using Elastic Search and NEST. You can see my code below; I get results if I search for complete (and almost complete) words.
I.e., if I search for "Buttermilk" or "Buttermil" I get a hit on my document containing the word "Buttermilk".
However, what I am trying to accomplish is that a search for "Butter" returns all three documents, since each has a word that starts with "Butter". I thought this was solved by using FuzzyLikeThis?
Can anyone see what I'm doing wrong and point me in the right direction?
I created a console app; the complete code is here:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Nest;
using Newtonsoft.Json;
namespace ElasticSearchTest
{
class Program
{
static void Main(string[] args)
{
var indexSettings = new IndexSettings();
indexSettings.Analysis.Analyzers["text-en"] = new SnowballAnalyzer { Language = "English" };
ElasticClient.CreateIndex("elastictesting", indexSettings);
var testItem1 = new TestItem {
Id = 1,
Name = "Buttermilk"
};
ElasticClient.Index(testItem1, "elastictesting", "TestItem", testItem1.Id);
var testItem2 = new TestItem {
Id = 2,
Name = "Buttercream"
};
ElasticClient.Index(testItem2, "elastictesting", "TestItem", testItem2.Id);
var testItem3 = new TestItem {
Id = 3,
Name = "Butternut"
};
ElasticClient.Index(testItem3, "elastictesting", "TestItem", testItem3.Id);
Console.WriteLine("Write search phrase:");
var searchPhrase = Console.ReadLine();
var searchResults = Search(searchPhrase);
Console.WriteLine("Number of search results: " + searchResults.Count());
foreach (var item in searchResults) {
Console.WriteLine(item.Name);
}
Console.WriteLine("Press any key to exit");
Console.ReadKey();
}
private static List<TestItem> Search(string searchPhrase)
{
var query = BuildQuery(searchPhrase);
var result = ElasticClient
.Search(query)
.Documents
.Select(d => d)
.Distinct()
.ToList();
return result;
}
public static ElasticClient ElasticClient
{
get
{
var localhost = new Uri("http://localhost:9200");
var setting = new ConnectionSettings(localhost);
setting.SetDefaultIndex("elastictesting");
return new ElasticClient(setting);
}
}
private static SearchDescriptor<TestItem> BuildQuery(string searchPhrase)
{
var querifiedKeywords = string.Join(" AND ", searchPhrase.Split(' '));
var filters = new BaseFilter[1];
filters[0] = Filter<TestItem>.Bool(b => b.Should(m => m.Query(q =>
q.FuzzyLikeThis(flt =>
flt.OnFields(new[] {
"name"
}).LikeText(querifiedKeywords)
.PrefixLength(2)
.MaxQueryTerms(1)
.Boost(2))
)));
var searchDescriptor = new SearchDescriptor<TestItem>()
.Filter(f => f.Bool(b => b.Must(filters)))
.Index("elastictesting")
.Type("TestItem")
.Size(500);
var jsons = JsonConvert.SerializeObject(searchDescriptor, new JsonSerializerSettings { NullValueHandling = NullValueHandling.Ignore });
return searchDescriptor;
}
}
class TestItem {
public int Id { get; set; }
[ElasticProperty(Analyzer = "text-en", Index = FieldIndexOption.analyzed)]
public string Name { get; set; }
}
}
Edited 2014-04-01 11:18
Well, I ended up using MultiMatch and QueryString, so this is how my code looks now. I hope it may help someone in the future. Also, I added a Description property to my TestItem to illustrate MultiMatch.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Nest;
using Newtonsoft.Json;
namespace ElasticSearchTest
{
class Program
{
static void Main(string[] args)
{
var indexSettings = new IndexSettings();
ElasticClient.CreateIndex("elastictesting", indexSettings);
var testItem1 = new TestItem {
Id = 1,
Name = "Buttermilk",
Description = "butter with milk"
};
ElasticClient.Index(testItem1, "elastictesting", "TestItem", testItem1.Id);
var testItem2 = new TestItem {
Id = 2,
Name = "Buttercream",
Description = "Butter with cream"
};
ElasticClient.Index(testItem2, "elastictesting", "TestItem", testItem2.Id);
var testItem3 = new TestItem {
Id = 3,
Name = "Butternut",
Description = "Butter with nut"
};
ElasticClient.Index(testItem3, "elastictesting", "TestItem", testItem3.Id);
Console.WriteLine("Write search phrase:");
var searchPhrase = Console.ReadLine();
var searchResults = Search(searchPhrase);
Console.WriteLine("Number of search results: " + searchResults.Count());
foreach (var item in searchResults) {
Console.WriteLine(item.Name);
Console.WriteLine(item.Description);
}
Console.WriteLine("Press any key to exit");
Console.ReadKey();
}
private static List<TestItem> Search(string searchPhrase)
{
var query = BuildQuery(searchPhrase);
var result = ElasticClient
.Search(query)
.Documents
.Select(d => d)
.Distinct()
.ToList();
return result;
}
public static ElasticClient ElasticClient
{
get
{
var localhost = new Uri("http://localhost:9200");
var setting = new ConnectionSettings(localhost);
setting.SetDefaultIndex("elastictesting");
return new ElasticClient(setting);
}
}
private static SearchDescriptor<TestItem> BuildQuery(string searchPhrase)
{
var searchDescriptor = new SearchDescriptor<TestItem>()
.Query(q => q
.MultiMatch(m =>
m.OnFields(new[] {
"name",
"description"
}).QueryString(searchPhrase).Type(TextQueryType.PHRASE_PREFIX)
)
)
.Index("elastictesting")
.Type("TestItem")
.Size(500);
var jsons = JsonConvert.SerializeObject(searchDescriptor, new JsonSerializerSettings { NullValueHandling = NullValueHandling.Ignore });
return searchDescriptor;
}
}
class TestItem {
public int Id { get; set; }
public string Name { get; set; }
public string Description { get; set; }
}
}
Instead of using a FuzzyLikeThis query, use a prefix query; it is faster and more accurate. For example:
curl -XPOST "http://localhost:9200/try/indextype/_search" -d'
{
"query": {
"prefix": {
"field": {
"value": "Butter"
}
}
}
}'
Create the above query in NEST and try again.
This has nothing to do with FuzzyLikeThis.
You can use a prefix query, as suggested by @BlackPOP, out of the box.
You could also opt for EdgeNGrams, which tokenize your input at index time. The result is faster performance compared to a prefix query, offset by increased index size.
One thing to keep in mind is that the prefix query does not analyze the search text, so it works most predictably on non-analyzed fields; if you want to do any analyzing at index time, you're probably better off using EdgeNGrams.
Please read up on analyzers etc. if you don't know what they are.
Some refs:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
See How can I do a prefix search in ElasticSearch in addition to a generic query string? for a similar question.
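
To illustrate the trade-off described above, here is a small sketch of what edge n-gram tokenization does conceptually. This is plain C# and not Elasticsearch or NEST code: the EdgeNGramDemo class, the in-memory dictionary index and the gram sizes 2 to 10 are assumptions chosen for the example. Every prefix of each name is stored at index time, so a prefix search becomes a single exact lookup, at the cost of many more index entries.

```csharp
using System;
using System.Collections.Generic;

public static class EdgeNGramDemo
{
    // Yields the lowercased prefixes of a token with lengths minGram..maxGram.
    public static IEnumerable<string> EdgeNGrams(string token, int minGram, int maxGram)
    {
        var lower = token.ToLowerInvariant();
        var upper = Math.Min(maxGram, lower.Length);
        for (int length = minGram; length <= upper; length++)
        {
            yield return lower.Substring(0, length);
        }
    }

    // Builds a toy index: each prefix maps to the names that start with it.
    public static Dictionary<string, List<string>> BuildIndex(IEnumerable<string> names)
    {
        var index = new Dictionary<string, List<string>>();
        foreach (var name in names)
        {
            foreach (var gram in EdgeNGrams(name, 2, 10))
            {
                if (!index.TryGetValue(gram, out var matches))
                {
                    matches = new List<string>();
                    index[gram] = matches;
                }
                matches.Add(name);
            }
        }
        return index;
    }
}

public static class Program
{
    public static void Main()
    {
        var index = EdgeNGramDemo.BuildIndex(new[] { "Buttermilk", "Buttercream", "Butternut" });
        // The search text is lowercased the same way the indexed grams were.
        var term = "Butter".ToLowerInvariant();
        if (index.TryGetValue(term, out var hits))
        {
            Console.WriteLine(string.Join(", ", hits));
        }
        Console.WriteLine($"Index entries: {index.Count}");
    }
}
```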
