Handling data quality issues in a medium-sized CSV report: best practices (C#)

I need help with a best-practices question.
I have an Azure Function that pulls data from several APIs and matches the records together to create a final CSV report. The data set has 60k-100k rows and 30 columns.
For the sake of explanation, I'm going to use a small school example.
public class Student {
    public string Grade {get; set;}
    public Name LegName {get; set;}
    public string FatherName {get; set;}
    public string TeacherId {get; set;}
    public string SchoolId {get; set;}
}
public class Name {
    public string FirstName {get; set;}
    public string LastName {get; set;}
}
Before constructing the report, I build two dictionaries of <Id, Name> from two APIs that expose school and teacher information, plus a list of Student objects from the student API. I have no control over these three APIs: not their design, not their data quality, nothing.
Once I have all the data, I start building the report.
string GenerateTXT(Dictionary<string, string> schools, Dictionary<string, string> teachers, IEnumerable<Student> students){
    StringBuilder content = new StringBuilder();
    foreach(var student in students){
        content.Append($"{student.Grade}\t");
        content.Append($"{student.LegName.FirstName}\t");
        content.Append($"{student.LegName.LastName}\t");
        // The indexer throws if the id is null or missing from the dictionary.
        content.Append($"{schools[student.SchoolId]}\t");
        content.Append($"{teachers[student.TeacherId]}\t");
        content.Append($"{student.FatherName}\t");
        content.AppendLine();
    }
    return content.ToString();
}
Now here comes the problem. I started noticing data quality issues, so the function started throwing exceptions: for example, students who do not have a valid school or teacher, or a student who does not have a name. I tried to handle the expected scenarios explicitly and added exception handling for the rest.
string GenerateTXT(Dictionary<string, string> schools, Dictionary<string, string> teachers, IEnumerable<Student> students){
    StringBuilder content = new StringBuilder();
    foreach(var student in students){
        try {
            content.Append($"{student.Grade}\t");
            content.Append($"{student.LegName.FirstName}\t");
            content.Append($"{student.LegName.LastName}\t");
            // TryGetValue returns false for an unknown id, but still throws if the id is null.
            if(teachers.TryGetValue(student.TeacherId, out var teacherName))
                content.Append($"{teacherName}\t");
            else
                content.Append("\t");
            if(schools.TryGetValue(student.SchoolId, out var schoolName))
                content.Append($"{schoolName}\t");
            else
                content.Append("\t");
            content.Append($"{student.FatherName}\t");
            content.AppendLine();
        }
        catch(Exception ex) {
            // Anything unexpected abandons the rest of this row.
            log.Error($"Error reading student {student.LegName?.FirstName}: {ex.Message}");
        }
    }
    return content.ToString();
}
The problem is that when an unexpected error happens, I stop reading the remaining columns for that row, even though I may have their values, and jump to the next student. So if a student for some reason has no name, that row in the report contains only the grade and nothing else, although the rest of the values were available. I could put a try/catch around each column, but my real scenario has about 30 columns and could grow, so that seems like a really bad solution. Is there a pattern that solves this in a better way?
Thanks in advance!

So the first bit of advice I am going to give you is to use CsvHelper. This is a tried and true library as it handles all those edge cases you will never think of. So, saying that, give this a shot:
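using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using CsvHelper;
using CsvHelper.Configuration;
public class Student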
{
public string Grade { get; set; }
public Name LegName { get; set; }
public string FatherName { get; set; }
public string TeacherId { get; set; }
public string SchoolId { get; set; }
}
public class Name
{
public string FirstName { get; set; }
public string LastName { get; set; }
}
public class NormalizedData
{
public string Grade { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
public string School { get; set; }
public string Teacher { get; set; }
public string FatherName { get; set; }
}
static void GenerateCSVData(CsvHelper.CsvWriter csv, Dictionary<string, string> schools,
Dictionary<string, string> teachers, Student[] students)
{
var normalizedData = students.Select(x => new NormalizedData
{
Grade = x.Grade,
FatherName = x.FatherName,
FirstName = x.LegName?.FirstName, // sanity check in case LegName is null
LastName = x.LegName?.LastName, // ...
School = schools.ContainsKey(x.SchoolId ?? string.Empty) ? schools[x.SchoolId] : null,
Teacher = teachers.ContainsKey(x.TeacherId ?? string.Empty) ? teachers[x.TeacherId] : null
});
csv.WriteRecords(normalizedData);
}
private static string GenerateStringCSVData(Dictionary<string, string> schools,
Dictionary<string, string> teachers, Student[] students)
{
using(var ms = new MemoryStream())
{
using(var sr = new StreamWriter(ms, leaveOpen: true))
using (var csv = new CsvHelper.CsvWriter(sr,
new CsvConfiguration(CultureInfo.InvariantCulture)
{
Delimiter = ",", // change this to "\t" if you want to use tabs
Encoding = Encoding.UTF8
}))
{
GenerateCSVData(csv, schools, teachers, students);
}
ms.Position = 0;
return Encoding.UTF8.GetString(ms.GetBuffer(), 0, (int)ms.Length);
}
}
private static int Main(string[] args)
{
var teachers = new Dictionary<string, string>
{
{ "j123", "Jimmy Carter" },
{ "r334", "Ronald Reagan" },
{ "g477", "George Bush" }
};
var schools = new Dictionary<string, string>
{
{ "s123", "Jimmy Carter University" },
{ "s334", "Ronald Reagan University" },
{ "s477", "George Bush University" }
};
var students = new Student[]
{
new Student
{
FatherName = "Bob Jimmy",
SchoolId = "s477",
Grade = "5",
LegName = new Name{ FirstName = "Apple", LastName = "Jimmy" },
TeacherId = "r334"
},
new Student
{
FatherName = "Jim Bobby",
SchoolId = null, // intentional
Grade = "", // intentional
LegName = null, // intentional
TeacherId = "invalid id" // intentional
},
new Student
{
FatherName = "Mike Michael",
SchoolId = "s123",
Grade = "12",
LegName = new Name{ FirstName = "Peach", LastName = "Michael" },
TeacherId = "g477"
},
};
var stringData = GenerateStringCSVData(schools, teachers, students);
return 0;
}
This outputs:
Grade,FirstName,LastName,School,Teacher,FatherName
5,Apple,Jimmy,George Bush University,Ronald Reagan,Bob Jimmy
,,,,,Jim Bobby
12,Peach,Michael,Jimmy Carter University,George Bush,Mike Michael
So, you can see, one of the students has invalid data in it, but it recovers just fine by placing blank data instead of crashing or throwing exceptions.
Now I haven't seen your original data, so there may be more tweaks you have to make to this to cover all edge cases, but it will be a lot easier to tweak this when using CsvHelper as your writer.
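If you would rather keep the hand-rolled writer, one possible pattern (a sketch, not something CsvHelper provides) is to describe each column as a delegate and run every delegate through a single shared try/catch, so a failure in one column leaves the other columns intact. The names `ReportBuilder`, `SafeGet` and the `log` callback are made up for this illustration; `Student` and `Name` are repeated only so the snippet stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public class Name
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class Student
{
    public string Grade { get; set; }
    public Name LegName { get; set; }
    public string FatherName { get; set; }
    public string TeacherId { get; set; }
    public string SchoolId { get; set; }
}

// Hypothetical helper: one try/catch shared by every column.
public static class ReportBuilder
{
    // Runs a single column extractor; any exception becomes an empty cell and is logged.
    private static string SafeGet(Student student, string header,
        Func<Student, string> getter, Action<string> log)
    {
        try
        {
            return getter(student) ?? string.Empty;
        }
        catch (Exception ex)
        {
            log($"Column '{header}' failed: {ex.GetType().Name}");
            return string.Empty;
        }
    }

    public static string Build(
        IEnumerable<Student> students,
        IReadOnlyDictionary<string, string> schools,
        IReadOnlyDictionary<string, string> teachers,
        Action<string> log)
    {
        // One entry per column, in output order. Adding a column is one more line here.
        var columns = new (string Header, Func<Student, string> Getter)[]
        {
            ("Grade", s => s.Grade),
            ("FirstName", s => s.LegName.FirstName),  // throws if LegName is null
            ("LastName", s => s.LegName.LastName),
            ("School", s => schools[s.SchoolId]),     // throws on a null or unknown id
            ("Teacher", s => teachers[s.TeacherId]),
            ("FatherName", s => s.FatherName),
        };

        var content = new StringBuilder();
        foreach (var student in students)
        {
            var cells = new List<string>();
            foreach (var (header, getter) in columns)
            {
                cells.Add(SafeGet(student, header, getter, log));
            }
            content.AppendLine(string.Join("\t", cells));
        }
        return content.ToString();
    }
}
```

Calling `ReportBuilder.Build(students, schools, teachers, Console.WriteLine)` for a student with a null `LegName` should still produce a row with the grade and father name filled in. Exceptions are slow, so with 100k rows it is probably still worth making the common cases null-safe (`s.LegName?.FirstName`, `TryGetValue`) and keeping the catch as a safety net only.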

Related

Seeding with ASP.NET Core with one to many relationships

I am trying to create a seed for my database in ASP.NET Core but I am having trouble with the relationships between the models. I have 3 models with 2 relationships. I have the following models:
public enum Grade
{
A, B, C, D, F
}
public class Enrollment
{
public Guid ID { get; set; } = Guid.NewGuid();
public Course Course { get; set; }
public Student Student { get; set; }
public Grade Grade { get; set; }
}
public class Course
{
public Guid ID { get; set; } = Guid.NewGuid();
public string Title { get; set; }
public int Credits { get; set; }
public List<Enrollment>? Enrollments { get; set; }
}
public class Student
{
public Guid ID { get; set; } = Guid.NewGuid();
public string LastName { get; set; }
public string FirstName { get; set; }
public DateTime EnrollmentDate { get; set; } = DateTime.Now;
public List<Enrollment>? Enrollments { get; set; }
}
On my DBContext I try to create the seed:
List<Student> students = new List<Student>()
{
new Student {FirstName = "Jaimie", LastName = "Vos", EnrollmentDate = DateTime.Now },
new Student {FirstName = "Bas", LastName = "Milius", EnrollmentDate = DateTime.Now },
new Student {FirstName = "Rien", LastName = "Bijl", EnrollmentDate = DateTime.Now },
new Student {FirstName = "Rajeck", LastName = "Massa", EnrollmentDate = DateTime.Now }
};
modelBuilder.Entity<Student>().HasData(students);
List<Course> courses = new List<Course>()
{
new Course {Title = "Wiskunde", Credits = 20},
new Course {Title = "Nederlands", Credits = 15},
new Course {Title = "Frans", Credits = 10},
};
modelBuilder.Entity<Course>().HasData(courses);
Enrollment test = new Enrollment();
test.Grade = Grade.A;
test.Course = courses[0];
test.Student = students[1];
modelBuilder.Entity<Enrollment>().HasData(test);
But when I run this I get the error:
The seed entity for entity type 'Enrollment' cannot be added because no value was provided for the required property 'CourseID'.
I followed the documentation for relations, does someone know a way to fix this issue?
I've found that you can just create a new object (not specifying the type) and give it a property specifying the related ID. So you should be able to do something like the following:
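modelBuilder.Entity<Course>().HasData(new [] {
    new { Title = "Frans", Credits = 10, ID = <courseGUID1> },
    new { Title = "Nederlands", Credits = 15, ID = <courseGUID2> }
});
modelBuilder.Entity<Enrollment>().HasData(new [] {
    // Property names must match the model: the key is ID, and CourseID is the foreign key named in the error message.
    // Every seeded row also needs an explicit key value, hence the extra <enrollmentGUID1> placeholder.
    new { ID = <enrollmentGUID1>, Grade = Grade.A, CourseID = <courseGUID1>, StudentID = <studentGUID1> }
});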
In conclusion, I hate seed data, but have used it a fair amount and the above solution seems to get me by. It's not intuitive and would be nice if the code you had worked!
I would assume that the IDs should be generated when created, but obviously that doesn't work.
PS: If using auto-incrementing integer IDs, I usually just use negative integers to avoid conflict with the generated IDs.

Adding an Array of Objects to another Class c#

I am attempting to read a text file in the following format.
(The number at the end is just the number of classes they're in, but I don't save the course name with the Faculty/Student classes.)
Course Biology
Faculty Taylor Nate 0
Student Doe John 3
Student Sean Big 0
Course Art
Faculty Leasure Dan 1
The first input should be a course, followed by the faculty and students of the specific course. The Course class should contain a collection of faculty members and a collection of students.
I have been able to put each course/student/faculty into their respective class, but I am having trouble visualizing a way to add the students/faculty to the course.
My current idea for putting the data into their respective classes is to keep track of the current course index, so I store each course with
courses[currentCourse++]
so that when I parse the next line (a faculty member or student), I already know what the course index should be.
using (StreamReader reader = new StreamReader(fileName))
{
while (!reader.EndOfStream)
{
lineCounter++;
line = reader.ReadLine();
string[] words = line.Split(' ');
Console.WriteLine(words[0]);
if (words[0] == "Course")
{
string nameOfCourse = words[1];
courses[currentCourse++] = new Course
{
Name = nameOfCourse
};
}
if (words[0] == "Faculty")
{
string firstName = words[1];
string lastName = words[2];
string numOfClasses = words[3];
faculty[currentFaculty++] = new Faculty
{
FirstName = firstName,
LastName = lastName,
NumOfClasses = numOfClasses,
};
}
if (words[0] == "Student")
{
string firstName = words[1];
string lastName = words[2];
string numOfClasses = words[3];
students[currentStudent++] = new Student
{
FirstName = firstName,
LastName = lastName,
NumOfClasses = numOfClasses,
};
}
}
}
I know the problem lies in the Course class itself, but I'm not sure of the terminology for adding a class to another class.
public class Course
{
public override string ToString()
{
return $"{Name}";
}
public string Name { get; set; }
}
public class Student
{
public override string ToString()
{
return $"{FirstName} {LastName} {NumOfClasses}";
}
public string FirstName { get; set; } = string.Empty;
public string LastName { get; set; } = string.Empty;
public string NumOfClasses { get; set; } = string.Empty;
}
Thanks for reading!
You want to add a collection of Student and Faculty to the Course class, correct? You can do so by adding a List<T> property for each to your Course class and initializing them in a constructor.
public class Course
{
public override string ToString()
{
return $"{Name}";
}
public string Name { get; set; }
public List<Student> Students { get; set; }
public List<Faculty> FacultyMems { get; set; }
public Course()
{
Students = new List<Student>();
FacultyMems = new List<Faculty>();
}
}
And in your using block, you can add each student or faculty member to the current course like so:
if (words[0] == "Course")
{
string nameOfCourse = words[1];
// Pre-increment: currentCourse must start at -1 so the first course lands at index 0.
currentCourse++;
courses[currentCourse] = new Course
{
Name = nameOfCourse
};
}
if (words[0] == "Faculty")
{
string firstName = words[1];
string lastName = words[2];
string numOfClasses = words[3];
courses[currentCourse].FacultyMems.Add(new Faculty
{
FirstName = firstName,
LastName = lastName,
NumOfClasses = numOfClasses,
});
}
if (words[0] == "Student")
{
string firstName = words[1];
string lastName = words[2];
string numOfClasses = words[3];
courses[currentCourse].Students.Add(new Student
{
FirstName = firstName,
LastName = lastName,
NumOfClasses = numOfClasses,
});
}
With this, each time you encounter "Course" your course list will add a new item and then you can append students/faculty/etc when those values occur.
This can be simplified even further but the concept is there for you to follow. Hope this helps.
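As one way the simplification could look, here is a sketch that drops the index bookkeeping and keeps a reference to the most recently seen course instead. `CourseParser` and `Parse` are names invented for this example, and the classes are trimmed copies of the ones above so the snippet stands alone; lines that appear before any "Course" line are simply skipped, which is an assumption about what you want.

```csharp
using System;
using System.Collections.Generic;

public class Student
{
    public string FirstName { get; set; } = string.Empty;
    public string LastName { get; set; } = string.Empty;
    public string NumOfClasses { get; set; } = string.Empty;
}

public class Faculty
{
    public string FirstName { get; set; } = string.Empty;
    public string LastName { get; set; } = string.Empty;
    public string NumOfClasses { get; set; } = string.Empty;
}

public class Course
{
    public string Name { get; set; } = string.Empty;
    // Initialized inline so Add never hits a null list.
    public List<Student> Students { get; } = new List<Student>();
    public List<Faculty> FacultyMems { get; } = new List<Faculty>();
}

// Hypothetical helper, not part of the original answer.
public static class CourseParser
{
    public static List<Course> Parse(IEnumerable<string> lines)
    {
        var courses = new List<Course>();
        Course current = null; // the course that following Faculty/Student lines belong to

        foreach (var line in lines)
        {
            var words = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (words.Length == 0)
            {
                continue; // blank line
            }

            if (words[0] == "Course" && words.Length >= 2)
            {
                current = new Course { Name = words[1] };
                courses.Add(current);
            }
            else if (words[0] == "Faculty" && current != null && words.Length >= 4)
            {
                current.FacultyMems.Add(new Faculty
                {
                    FirstName = words[1],
                    LastName = words[2],
                    NumOfClasses = words[3]
                });
            }
            else if (words[0] == "Student" && current != null && words.Length >= 4)
            {
                current.Students.Add(new Student
                {
                    FirstName = words[1],
                    LastName = words[2],
                    NumOfClasses = words[3]
                });
            }
        }
        return courses;
    }
}
```

You could feed it the file with `CourseParser.Parse(File.ReadLines(fileName))`. Using a `List<Course>` rather than an array also means you don't need to know the number of courses up front.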
If I'm understanding you correctly, you want your courses to have a list of faculty and students?
public class Course
{
public override string ToString()
{
return $"{Name}";
}
public string Name { get; set; }
public List<Student> Students { get; set; }
public List<Faculty> FacultyMembers {get; set;}
}
Just be sure to initialize the Lists before trying to add things to them otherwise you'll get a null ref exception.

Inner join with two select clauses LinQ MVC

Query result from search
Greetings, I am new to LINQ syntax and I need help translating the query in the picture into C# to get the needed result. I have two questions. First, how do I do inner joins using LINQ syntax in C# to get the result shown in the image? Second, to display the data returned by the query, do I need to create a ViewModel that contains three ViewModels, one for each table used in the query?
Thank you so very much for your help.
As levelonehuman said, LINQ is designed to query data. Let's say you have a couple of classes:
public class Person
{
public static class Factory
{
private static int currentId = 0;
public static Person Create(string firstName, string lastName, string phoneNumber, int companyId)
{
return new Person()
{
Id = ++currentId,
FirstName = firstName,
LastName = lastName,
PhoneNumber = phoneNumber,
CompanyId = companyId
};
}
}
public int Id { get; private set; }
public string FirstName { get; private set; }
public string LastName { get; private set; }
public string PhoneNumber { get; private set; }
public int CompanyId { get; private set; }
}
public class Company
{
public static class Factory
{
private static int companyId=0;
public static Company Create(string name, string city, string state, string phoneNumber)
{
return new Company()
{
Id = ++ companyId,
City = city,
State = state,
Name = name,
PhoneNumber = phoneNumber
};
}
}
public int Id { get; set; }
public string Name { get; set; }
public string City { get; set; }
public string State { get; set; }
public string PhoneNumber { get; set; }
}
And then, if you want to see only people from a certain area code, you could do something like this:
class Program
{
static void Main(string[] args)
{
var companies = new[]
{
Company.Factory.Create("ABC", "Indianapolis", "In", "(317) 333 5555"),
Company.Factory.Create("Def", "Bloominton", "In", "(812) 333 5555"),
};
var people = new[]
{
Person.Factory.Create("Jane", "Doe", "(317) 555 7565", 1),
Person.Factory.Create("Paul", "Smith", "(812) 555 7565", 2),
Person.Factory.Create("Sean", "Jackson", "(317) 555 7565", 2),
Person.Factory.Create("Jenny", "Gump", "(812) 555 7565", 1)
};
var peopleFromIndianapolis =
(
from company in companies
join person in people on company.Id equals person.CompanyId
where person.PhoneNumber.StartsWith("(317)")
orderby person.LastName, person.FirstName
select new
{
person.FirstName,
person.LastName,
company.Name
}
).ToList();
foreach (var person in peopleFromIndianapolis)
{
Console.WriteLine($"PersonName: {person.LastName}, {person.FirstName} - Company:{person.Name}");
}
}
}
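On the ViewModel part of the question, one option (a sketch, not the only way) is a single flat class holding just the fields the view needs, filled directly from the join, rather than a ViewModel wrapping one ViewModel per table. The example below writes the same inner join in method syntax and projects into such a class. `PersonCompanyRow`, `JoinExample` and `PeopleInAreaCode` are hypothetical names, and `Person` and `Company` are cut down to the properties the query touches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string PhoneNumber { get; set; }
    public int CompanyId { get; set; }
}

public class Company
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical flat view model: one row of the joined result.
public class PersonCompanyRow
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string CompanyName { get; set; }
}

public static class JoinExample
{
    public static List<PersonCompanyRow> PeopleInAreaCode(
        IEnumerable<Company> companies,
        IEnumerable<Person> people,
        string areaCodePrefix)
    {
        return companies
            // Inner join: only pairs where company.Id == person.CompanyId survive.
            .Join(people,
                  company => company.Id,
                  person => person.CompanyId,
                  (company, person) => new { Company = company, Person = person })
            .Where(x => x.Person.PhoneNumber.StartsWith(areaCodePrefix, StringComparison.Ordinal))
            .OrderBy(x => x.Person.LastName)
            .ThenBy(x => x.Person.FirstName)
            .Select(x => new PersonCompanyRow
            {
                FirstName = x.Person.FirstName,
                LastName = x.Person.LastName,
                CompanyName = x.Company.Name
            })
            .ToList();
    }
}
```

The view can then take `List<PersonCompanyRow>` as its model. Query syntax and method syntax compile to the same calls, so pick whichever reads better to you.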
Hope this helps!

Add values to Dictionary<int,class> not all at once

I'm trying to store data in a dictionary where the key is an ID and the values are of a class type. The class properties are not all set at the same time, so I haven't used a constructor (unless there is a way to add new values using a constructor at different times?). The code below compiles, but I get a runtime error saying the key has already been added. Thanks for the help.
public class Students
{
public string FirstName { get; set; }
public string SurName { get; set; }
public int Age { get; set; }
public double Score { get; set; }
}
public void cmdTEST_Click(object sender, EventArgs e)
{
Dictionary<int, Students> Data = new Dictionary<int, Students>();
Data.Add(5, new Students { FirstName = "Bob" });
Data.Add(5, new Students { Age = 34 }); // run time error - "key already added"
Data.Add(5, new Students { Score = 62 });
// extract data
double Score5 = Data[5].Score;
double Age5 = Data[5].Age;
}
You are adding the same key multiple times, which is not allowed. You can set all the properties at once, like below:
Dictionary<int, Students> Data = new Dictionary<int, Students>();
Data.Add(5, new Students { FirstName = "Bob", Age = 34, Score = 62 });
And if you want to set values later, you can use the key to look up the existing entry and update it:
Data.Add(5, new Students { FirstName = "Bob"});
Data[5].Age = 34;
Data[5].Score = 62;
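If you don't know whether the key has been added yet when a value arrives, a small get-or-create helper avoids both the "key already added" error and a KeyNotFoundException from the indexer. This is only a sketch; `StudentStore` and `GetOrAdd` are names I made up, and `Students` is copied from the question.

```csharp
using System.Collections.Generic;

public class Students
{
    public string FirstName { get; set; }
    public string SurName { get; set; }
    public int Age { get; set; }
    public double Score { get; set; }
}

// Hypothetical helper for filling in entries piece by piece.
public static class StudentStore
{
    // Returns the entry stored under id, creating an empty one first if the key is new.
    public static Students GetOrAdd(Dictionary<int, Students> data, int id)
    {
        if (!data.TryGetValue(id, out var student))
        {
            student = new Students();
            data.Add(id, student);
        }
        return student;
    }
}
```

Usage would then be `StudentStore.GetOrAdd(Data, 5).FirstName = "Bob";` followed later by `StudentStore.GetOrAdd(Data, 5).Age = 34;`, in any order. This works because `Students` is a class, so the dictionary hands back a reference to the same object each time.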

Storing more than two values with the NameValueCollection class

I have a set of strings that I need to store together, such as:
id, firstname, lastname, city, country, language
All of the above apply to a single person (represented by the ID)
Now I have 60 - 70 of these (and growing), how could I organize them? I have looked at the NameValueCollection class - and it does exactly what I want (if I only had two fields), but since I have 6 fields, I can't use it. E.g.:
public NameValueCollection personCollection = new NameValueCollection
{
{ "harry", "townsend", "london", "UK", "english" },
{ "john", "cowen", "liverpool", "UK", "english" },
// and so on...
};
Although this does not work :( Could someone suggest another way of achieving this?
How about making a Person class with the attributes you need?
public class Person
{
public int id { get; set; }
public string firstname { get; set; }
public string lastname { get; set; }
// more attributes here
}
Then just instantiate the Person class to create new Person objects.
You can then add those objects to a List.
Person somePerson = new Person();
somePerson.firstname = "John";
somePerson.lastname = "Doe";
somePerson.id = 1;
List<Person> listOfPersons = new List<Person>();
listOfPersons.Add(somePerson);
If you absolutely don’t want to create any new classes, you could use a dictionary of lists, keyed by your ID:
IDictionary<string, IList<string>> personCollection =
new Dictionary<string, IList<string>>
{
{ "1", new [] { "harry", "townsend", "london", "UK", "english" }},
{ "2", new [] { "john", "cowen", "liverpool", "UK", "english" }},
};
…which you could then access using dictionary and list indexers:
Console.WriteLine(personCollection["1"][0]); // Output: "harry"
Console.WriteLine(personCollection["2"][2]); // Output: "liverpool"
However, the correct OOP approach would be to define a class with properties for your respective strings:
public class Person
{
public string Id { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
public string City { get; set; }
public string Country { get; set; }
public string Language { get; set; }
public Person() { }
public Person(string id, string firstName, string lastName,
string city, string country, string language)
{
this.Id = id;
this.FirstName = firstName;
this.LastName = lastName;
this.City = city;
this.Country = country;
this.Language = language;
}
}
You could then create a list of persons:
IList<Person> persons = new List<Person>()
{
new Person("1", "harry", "townsend", "london", "UK", "english"),
new Person("2", "john", "cowen", "liverpool", "UK", "english"),
};
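Since each person is identified by an ID, you may also want to look people up by it. One way, assuming the IDs are unique, is to index the list into a dictionary with LINQ's `ToDictionary`. `PersonLookup` and `IndexById` are illustrative names only, and `Person` is repeated (with the constructor) so the snippet stands alone.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
    public string Language { get; set; }

    public Person() { }

    public Person(string id, string firstName, string lastName,
        string city, string country, string language)
    {
        Id = id;
        FirstName = firstName;
        LastName = lastName;
        City = city;
        Country = country;
        Language = language;
    }
}

// Hypothetical helper: builds an Id -> Person index.
public static class PersonLookup
{
    // ToDictionary throws ArgumentException if two persons share an Id.
    public static Dictionary<string, Person> IndexById(IEnumerable<Person> persons)
    {
        return persons.ToDictionary(p => p.Id);
    }
}
```

With `var byId = PersonLookup.IndexById(persons);` you can read `byId["2"].City`, or use `byId.TryGetValue(id, out var person)` when the ID might not exist.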
