How do I find related lectures? - c#

I have a web site that shows info about lectures that are available. Each lecture has a title, an associated speaker, and (potentially) multiple categories. The database schema looks something like this (warning: this is air-code, as I don't have the database in front of me)...
create table Lectures (
ID int not null identity(1,1) primary key,
Title varchar(max) not null default '',
SpeakerID int not null foreign key references Speakers(ID)
)
create table Categories (
ID int not null identity(1,1) primary key,
Name varchar(max) not null default ''
)
create table Lectures_Categories (
ID int not null identity(1,1) primary key,
LectureID int not null foreign key references Lectures(ID),
CategoryID int not null foreign key references Categories(ID)
)
When viewing details about a lecture, I would like to be able to recommend related lectures, but am not sure how to code this. My initial thought is that the following criteria would be used to calculate relevance (most important first)...
Common categories - ie the more categories shared by the two lectures, the more likely they are to be related
Similarity in title - ie the more words shared by the two lectures, the more likely they are to be related.
Same speaker
If two lectures were equally ranked according to the above criteria, I would like to rank newer ones above older ones.
Anyone any idea how I would go about coding this? I'm doing this in C#, using an Entity Framework model against an SQL Server database if any of that is relevant.

Let me sketch out the basic idea: assuming all three criteria can be expressed as SQL queries, you should get weighted result sets which you then union together.
The first would simply be select ID, 10 as weight from lectures where ID <> ourLectureID and speakerID = ourSpeakerID
The second will be a join over Lectures and Lectures_Categories with a lesser weight, maybe 4.
Let's ignore the problems with the 3rd query for now.
Now that we have a result set of IDs and weights we do a group & sum. My SQL is rather rusty today, but I'm thinking of something like this: select ID, sum(weight) as ranking from result1 group by ID order by ranking desc. Done!
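To make that concrete, here is a minimal C#/LINQ sketch of the same union-group-sum idea against the schema from the question; ourLectureID and the weights (10 and 4, the examples from the text) are placeholders, and it assumes an EF context db with Lectures and Lectures_Categories sets. Note that Concat (UNION ALL) is needed rather than Union, so duplicate weights survive the grouping:
var current = db.Lectures.Single(l => l.ID == ourLectureID);
var currentCategoryIDs = db.Lectures_Categories
    .Where(lc => lc.LectureID == ourLectureID)
    .Select(lc => lc.CategoryID);

// Same speaker: weight 10.
var bySpeaker = db.Lectures
    .Where(l => l.ID != ourLectureID && l.SpeakerID == current.SpeakerID)
    .Select(l => new { l.ID, Weight = 10 });

// One row per shared category: weight 4 each.
var byCategory = db.Lectures_Categories
    .Where(lc => lc.LectureID != ourLectureID
              && currentCategoryIDs.Contains(lc.CategoryID))
    .Select(lc => new { ID = lc.LectureID, Weight = 4 });

// Concat keeps duplicates (UNION ALL), then group, sum and rank;
// ties go to newer lectures, whose identity IDs are larger.
var ranked = bySpeaker.Concat(byCategory)
    .GroupBy(x => x.ID)
    .Select(g => new { ID = g.Key, Ranking = g.Sum(x => x.Weight) })
    .OrderByDescending(x => x.Ranking)
    .ThenByDescending(x => x.ID);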
Now I haven't touched SQL Server in almost 20 years ;-) but I think it is not well suited for creating the 3rd query. And the db designer will only give you a funny look and tell you that querying the Title is bad, bad, bad; and 'why didn't you add a keywords table..?'
If you don't want to do that, I assume you can pull all titles into your C# application and use its string/collections/LINQ abilities to filter out interesting words and create the third query with a third ranking; maybe only capitalized words with 4 letters or more?
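For instance, a tiny sketch of one such filter, where title stands for a single lecture title (what counts as "interesting" is of course an assumption):
// Keep only capitalized words of four letters or more as keyword candidates.
var keywords = title.Split(' ')
    .Where(w => w.Length >= 4 && char.IsUpper(w[0]))
    .Distinct()
    .ToList();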
Update
Here is a tiny example of how you can find a best fitting line among a list of lines:
// Word lists: the source proverbs, their cleaned versions, and the stop words.
List<string> proverbs = new List<string>();
List<string> cleanverbs = new List<string>();
List<string> noverbs = new List<string>();

private void button1_Click(object sender, EventArgs e)
{
    // Stop words to strip out before comparing lines.
    noverbs.AddRange(new[] { "A", "a", "by", "of", "all", "the", "The",
        "it's", "it", "in", "on", "is", "not", "will", "has", "can", "under" });
    proverbs = File.ReadLines("D:\\proverbs\\proverbs.txt").ToList();
    cleanverbs = proverbs.Select(x => cleanedLine(x)).ToList();
    listBox1.Items.AddRange(proverbs.ToArray());
    listBox2.Items.AddRange(cleanverbs.ToArray());
}

// Remove the stop words from a line.
string cleanedLine(string line)
{
    var words = line.Split(' ');
    return String.Join(" ", words.Except(noverbs));
}

// Count how many of the key words occur in the line.
int countHits(string line, List<string> keys)
{
    var words = line.Split(' ').ToList();
    return keys.Count(x => words.Contains(x));
}

private void listBox2_SelectedIndexChanged(object sender, EventArgs e)
{
    string line = listBox2.SelectedItem.ToString();
    int max = 0;
    // Find the other cleaned line sharing the most words with the selection.
    foreach (string proverb in cleanverbs)
    {
        var keys = proverb.Split(' ').ToList();
        int count = countHits(line, keys);
        if (count > max && proverb != line)
        {
            max = count;
            Text = proverb + " has " + max + " hits";
        }
    }
}
It makes use of two listboxes and a text file of proverbs. Once loaded, you can click a line in the second listbox and the window title will display the line with the most hits.
You will want to make a few changes:
pull your titles from your DB, including their keys
create a more extensive and expandable file with non-verbs
decide how to handle mixed case
create not one result but an ordered set of lines
maybe optimize a few things so you don't have to split the body of titles more than once

Related

Dynamic linq backslash in where clause

I use System.Linq.Dynamic to query entities with dynamic 'where' expressions. I'm querying an object that has a property "newValue" of string type. An example value would be: "{\"ProcessId\":764, \"ProcessLength\":1000}".
I can't use == because I want to find all hits where the property contains "ProcessId:764", regardless of the rest of the string. The thing is, the stored string contains the escape sign "\" and double quotes, and I can't figure out what the expression should look like exactly.
dbContext.Processes.Where("@newValue.Contains(\"ProcessId\":764\")") brings an error, however dbContext.Processes.Where("@newValue.Contains(\":764\")") works correctly. I guess it must be something with backslashes or double quotes in my query, but I can't figure it out on my own..
There are two things to note here:
If you know at compile time the column that should be queried (i.e., newValue), just use standard Linq: var list = items.Where(i => i.NewValue.Contains("904")).ToList().
If you do want to use dynamic Linq, what you'd usually want is to apply Where on some column, e.g. Where("SomeColumn.Contains(\"something\")"), or Where("SomeColumn.Contains(@0)", new string[] {"something"}).
So, in your case, this should work: items.Where("newValue.Contains(\"904\")").
Doing Where("@newValue.Contains(\"something\")") doesn't really make sense, since @newValue would be parsed as a string literal. See also this comment on a similar question.
Here's a quick example:
public static void Main(string[] args)
{
var items = new []
{
new { Id = "1", Title = "ProcessId: 123"},
new { Id = "4", Title = "ProcessId: 456"},
new { Id = "7", Title = "ProcessId: 789"},
}.ToList();
// returns null, because the string "Title" doesn't contain the string "7"
var res1 = items.Where("@0.Contains(\"7\")", new string[] {"Title"}).FirstOrDefault();
// works - returns the 3rd element of the array
var res2a = items.Where("Title.Contains(@0)", new string[] {"ProcessId: 789"}).FirstOrDefault();
var res2b = items.Where("Title.Contains(\"ProcessId: 789\")").FirstOrDefault();
}
@HeyJude Thanks for the effort, but I still can't get it to work. Somehow it has gone even more wrong, and now I can't even fetch the correct rows when giving only a ProcessId number..
Let me give you a more detailed description of my setup. In the database there's a table with a column "NewValue"; I use this column to store a JSON string of the current (at the time of creating the row in the table) representation of some object, e.g. object Process. So the column stores, for example, the string {"ProcessId":904,"ProcessLength":1000}. To fetch this data from the db I create a collection of the table's records:
var items = (from l in db.JDE_Logs
             join u in db.JDE_Users on l.UserId equals u.UserId
             join t in db.JDE_Tenants on l.TenantId equals t.TenantId
             where l.TenantId == tenants.FirstOrDefault().TenantId && l.Timestamp >= dFrom && l.Timestamp <= dTo
             orderby l.Timestamp descending
             select new //ExtLog
             {
                 LogId = l.LogId,
                 TimeStamp = l.Timestamp,
                 TenantId = t.TenantId,
                 TenantName = t.TenantName,
                 UserId = l.UserId,
                 UserName = u.Name + " " + u.Surname,
                 Description = l.Description,
                 OldValue = l.OldValue,
                 NewValue = l.NewValue
             });
Then I query it to find matching rows for a given ProcessId number, e.g.:
query = "@NewValue.Contains(\"904,)\")";
items = items.Where(query);
This should fetch back all records where the NewValue column contains the query string, but it doesn't work. It compiles and 'works', but either no data are fetched, or only those records where 904 appears later in the string are returned. Sounds stupid, but this is what it is.
What should the query string look like to fetch all records containing "ProcessId":904?
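One way to sidestep the escaping trouble entirely is to pass the JSON fragment as a substitution value rather than embedding it in the predicate string; this is only a sketch of that idea, and the fragment shown is an assumption about the stored JSON:
// With the parameterized overload no quotes need escaping in the predicate.
string fragment = "\"ProcessId\":904"; // matches {"ProcessId":904,...}
var matching = items.Where("NewValue.Contains(@0)", fragment);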

Splitting a large dataset into smaller groups

I'm trying to build a sidebar of search navigation filters made of check boxes and radio buttons. I'm getting the values from a database. Something like the following, but with 12 filter categories in total:
Color
[] red
[] green
[] blue
Size
[] small
[] medium
[] large
Shape
[] square
[] circle
[] triangle
It is working for me using something like the code below, but it seems really inefficient to make a database call for each of the sub-categories:
public ActionResult Index ()
{
SearchBarViewModel model = new SearchBarViewModel();
model.Color = GetValuesFromDb();
model.Size = GetValuesFromDb();
model.Shape = GetValuesFromDb();
return View(model);
}
I'm guessing there is a more efficient way to do this by making a single database query, returning a large dataset that contains all the category values, and then splitting them into groups with LINQ? I'm just not sure how this would be done.
Database Schema
SearchKey            SearchValue
---------------      ----------------------
Id   Name            Id   KeyId   Value
---------------      ----------------------
1    Color           1    1       Red
2    Size            2    1       Green
3    Shape           3    1       Blue
                     4    2       Small
                     5    2       Medium
                     6    2       Large
Sql Query
SELECT sv.Id, sv.Value
FROM SearchKey sk
JOIN SearchValue sv ON sv.KeyId = sk.Id
WHERE sk.Name = @ValuePassedToSP
It might or might not be a little early in your development to be concerned about the performance of db calls. If the menu values change often or differ between contexts, it can make more sense to keep the menu structure stored in the database as you do. If the menu values are not changing often, it might be better to store them in program code or a settings file that is loaded only when your app first starts, or on demand after that.
I think the LINQ you are looking for might go something like this, where the GetALLSearchValuesFromDb() method returns an IEnumerable generated by a SQL statement like the one you already have, only without the WHERE clause:
public ActionResult Index ()
{
SearchBarViewModel model = new SearchBarViewModel();
var searchvalues = GetALLSearchValuesFromDb();
model.Color = searchvalues.Where(sv => sv.Name == "Color");
model.Size = searchvalues.Where(sv => sv.Name == "Size");
model.Shape = searchvalues.Where(sv => sv.Name == "Shape");
return View(model);
}
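If you'd rather not scan the list once per category, the same split can be done in one pass with ToLookup; a sketch, assuming each returned row carries its key's Name and the model properties accept an IEnumerable:
// One pass over the result set instead of one Where scan per category;
// indexing a missing key simply yields an empty sequence.
var byKey = searchvalues.ToLookup(sv => sv.Name);
model.Color = byKey["Color"];
model.Size = byKey["Size"];
model.Shape = byKey["Shape"];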

Splitting a string by a string and inserting into a list C#

So I'm using C# and Visual Studio. I am reading a file of students and their information. The number of students is variable, but I want to grab their information. At the moment I just want to segment each student's information based on the string "Student ID", because each student's section starts with it. I'm using ReadAllText, setting the result to a string, and then feeding that string to my function splittingStrings. The file will look like this:
student ID 1
//bunch of info
student ID 2
//bunch of info
student ID 3
//bunch of info
.
.
.
I want to split each segment into a list, since the number of students is unknown and the information for each student will vary. So I looked into both regular string splitting and Regex splitting. With regular strings I tried this:
public static List<string> StartParse = new List<string>();
public static void splittingStrings(string v)
{
string[] DiagDelimiters = new string[] {"Student ID "};
// Split returns a string[], so AddRange is used to add every segment.
StartParse.AddRange(v.Split(DiagDelimiters, StringSplitOptions.None));
}
And this is what I tried with Regex:
StartParse.AddRange(Regex.Split(v, "Student ID "));
I haven't used Lists before, but from what I've read they are dynamic and easy to use. My only trouble is that all the examples I see with split are in combination with an array, so syntactically I'm not sure how to split a string and insert the result into a list. For output, my goal is to have the student segments divided so that I can call a particular segment later if I need to.
Let me clarify that I'm after the batch of information, not the IDs alone. A lot of the answers seem to be focused on the IDs, so I felt I needed to point that out.
To those suggesting other storage bodies:
example of what list will hold:
position 0 will hold [<id> //bunch of info]
position 1 will hold [<anotherID> //bunch of info]
.
.
.
So I'm just using the List to do multiple operations on the information that I need. The information will be FAR more manageable if I can segment it into the list as shown above. I'm aware of dictionaries, but I have to store this information either in SQL tables or in text files, depending on the contents of the segments. For example, if one segment is really funky I would send an error report that one student's information is bad; otherwise I insert the necessary information into an SQL table. I'm having to work with multiple things from the segments, so I felt the List was the best way to go, since I'll also have to go back and forth within a segment to cross-check bits of information against earlier things I found in it.
There is no need to use RegEx here, and I would recommend against it. Simply splitting on whitespace will do the trick. Let's pretend you have a list which contains each of those lines (student ID 1, student ID 2, etc.); you can get a list of the IDs very simply, like so:
List<string> ids = students.Select(x => x.Split(' ')[2]).ToList();
The statement above essentially says: for each string in students, split the string and return the third token (index 2, because it's 0-indexed). I then call ToList because Select by default returns an IEnumerable<T>, but I wouldn't worry about those details just yet. If you don't have a list with each of the lines you showed, the idea stays much the same, only you would add the items to your ids list one by one as you split each string. For any given string in the form student id x, I would get x on its own with myString.Split(' ')[2]; that is the basis of the expression I pass into Select.
Based on the OP's comment here is a way to get all of the data without the Student Id part of each batch.
string[] batches = input.Split(new string[] { "student ID " }, StringSplitOptions.RemoveEmptyEntries);
If you really need a list then you can just call ToList() and change the type of batches to List<string>, but that would probably just be a waste of CPU cycles.
Here's some pseudo-code, and what I'd do:
List<int> ids = new List<int>();

void ParseStudentId(string str)
{
    var spl = str.Split(' ');
    ids.Add(int.Parse(spl[spl.Length - 1])); // fetches "1" from "Student Id 1"
}

void main()
{
    ParseStudentId("Student Id 1");
    ParseStudentId("Student Id 2");
    ParseStudentId("Student Id 3");
    foreach (int id in ids)
        Console.WriteLine(id); // will result in:
                               // 1
                               // 2
                               // 3
}
forgive me. i'm a java programmer, so i'm mixing Pascal with camel casing :)
Try this one:
StartParse = new List<string>(Regex.Split(v, @"(?<!^)(?=student ID \d+)"));
(?<!^)(?=student ID \d+) means: split the string at every position where student ID begins, except at the very beginning of the string.
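A quick demonstration of what that zero-width split produces (the sample input is invented):
using System.Text.RegularExpressions;

string v = "student ID 1 some info student ID 2 more info";
var parts = Regex.Split(v, @"(?<!^)(?=student ID \d+)");
// parts[0] == "student ID 1 some info "
// parts[1] == "student ID 2 more info"
// The delimiter text is kept at the start of each segment,
// because the lookahead pattern consumes no characters.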
Check this code:
public List<string> GetStudents(string filename)
{
    List<string> students = new List<string>();
    StringBuilder builder = new StringBuilder();
    using (StreamReader reader = new StreamReader(filename))
    {
        string line = "";
        while (!reader.EndOfStream)
        {
            line = reader.ReadLine();
            // A new "student ID" header closes the previous batch.
            if (line.StartsWith("student ID") && builder.Length > 0)
            {
                students.Add(builder.ToString());
                builder.Clear();
                builder.AppendLine(line);
                continue;
            }
            builder.AppendLine(line);
        }
        // Don't lose the final batch.
        if (builder.Length > 0)
            students.Add(builder.ToString());
    }
    return students;
}

Indexing subcategories vs finding them dynamically (performance)

I'm building a web-based store application, and I have to deal with many subcategories nested within each other. The point is, I have no idea whether my script will handle thousands of them (the new system will replace the old one, so I know what traffic to expect) - at present, the response lag from the local server is 1-2 seconds more than on other pages, with only about 30 products added in different categories.
My code is the following:
BazaArkadiaDataContext db = new BazaArkadiaDataContext();
List<A_Kategorie> Podkategorie = new List<A_Kategorie>();
public int IdKat { get; set; }
protected void Page_Load(object sender, EventArgs e)
{
if (!IsPostBack)
{
List<A_Produkty> Produkty = new List<A_Produkty>(); //list of all products within the category and remaining subcategories
if (Page.RouteData.Values["IdKategorii"] != null)
{
string tmpkat = Page.RouteData.Values["IdKategorii"].ToString();
int index = tmpkat.IndexOf("-");
if (index > 0)
tmpkat = tmpkat.Substring(0, index);
IdKat = db.A_Kategories.Where(k => k.ID == Convert.ToInt32(tmpkat)).Select(k => k.IDAllegro).FirstOrDefault();
}
else
return;
PobierzPodkategorie(IdKat);
foreach (var item in Podkategorie)
{
var x = db.A_Produkties.Where(k => k.IDKategorii == item.ID);
foreach (var itemm in x)
{
Produkty.Add(itemm);
}
}
//data binding here
}
}
List<A_Kategorie> PobierzPodkategorie(int IdKat, List<A_Kategorie> kat = null)
{
List<A_Kategorie> Kategorie = db.A_Kategories.Where(k => k.KatNadrzedna == IdKat).ToList();
if (kat != null)
Kategorie = Kategorie.Concat(kat).ToList(); // Concat returns a new sequence; assign it back
if (Kategorie.Count() > 0)
{
foreach (var item in Kategorie)
{
PobierzPodkategorie(item.IDAllegro, Kategorie);
Podkategorie.Add(item);
}
}
return Kategorie;
}
TMC;DR*
My function PobierzPodkategorie recursively seeks through subcategories (each subcategory's KatNadrzedna column holds its parent category, which is matched against IDAllegro), selects all the products with the subcategory ID and adds them to the Produkty list. The database structure is pretty wicked, as the category list is downloaded from another shop service's server, and it needed its own ID column in case the foreign server changed the structure.
There are more than 30,000 entries in the category list, some of them have 5 or more ancestors, and the website will show only main categories and subcategories ("lower" subcategories are needed by an external shop connected via SOAP).
My question is
Will adding an index table to the database (category 123 is the parent of 1234, 12738, ...) improve performance, or is it just a waste of time? (The index would have to be updated whenever the version of the API changes, and I have no idea how often that would be.) Or is there another way to do it?
I'm asking because changing the script will not be possible in production, and I don't know how the db engine handles lots of requests - I'd really appreciate any help with this.
The database is MSSQL
*Too much code; didn't read
The big efficiency gain you can get is to load all subproducts in a single query. The time saved by reducing network trips can be huge. If 1 is a root category and 12 a child category, you can query all root categories and their children like:
select *
from Categories
where len(Category) <= 2
An index on Category would not help with the above query. But it's good practice to have a primary key on any table. So I'd make Category the primary key. A primary key is unique, preventing duplicates, and it is indexed automatically.
Moving away from RBAR (row by agonizing row) has more effect than proper tuning of the database. So I'd tackle that first.
You definitely should move the recursion into the database. It can be done using a WITH statement and Common Table Expressions. Then create a view or stored procedure and map it to your application.
With that you should be able to reduce the SQL queries to two (or even one).
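For illustration only, the recursive query might look roughly like this when called through the question's LINQ to SQL DataContext; the physical table name A_Kategorie is an assumption based on the entity name:
// One round trip for the whole subtree instead of one query per level.
var subtree = db.ExecuteQuery<A_Kategorie>(@"
    WITH kat AS (
        SELECT * FROM A_Kategorie WHERE KatNadrzedna = {0}
        UNION ALL
        SELECT k.* FROM A_Kategorie k
        JOIN kat ON k.KatNadrzedna = kat.IDAllegro
    )
    SELECT * FROM kat", IdKat).ToList();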

Best way to dynamically get column names from oracle tables

We are using an extractor application that will export data from the database to csv files. Based on some condition variable it extracts data from different tables, and for some conditions we have to use UNION ALL as the data has to be extracted from more than one table. So to satisfy the UNION ALL condition we are using nulls to match the number of columns.
Right now all the queries in the system are pre-built based on the condition variable. The problem is whenever there is change in the table projection (i.e new column added, existing column modified, column dropped) we have to manually change the code in the application.
Can you please give some suggestions how to extract the column names dynamically so that any changes in the table structure do not require change in the code?
My concern is the condition that decides which table to query. The variable condition is
like
if the condition is A, then load from TableX
if the condition is B then load from TableA and TableY.
We must know from which table we need to get data. Once we know the table, it is straightforward to query the column names from the data dictionary. But there is one more condition: some columns need to be excluded, and these columns are different for each table.
I am trying to solve the problem only for dynamically generating the list of columns. But my manager told me to design a solution at the conceptual level rather than just a fix. This is a very big system, with providers and consumers constantly loading and consuming data, so he wants a solution that is general.
So what is the best way of storing the condition, table name and excluded columns? One way is storing them in the database. Are there any other ways? If yes, which is best? I have to give at least a couple of ideas before finalizing.
Thanks,
A simple query like this lists each column name of a table in Oracle:
Select COLUMN_NAME from user_tab_columns where table_name='EMP'
Use it in your code :)
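As a rough sketch of consuming that view from C# to build the projection (the connection string, table name and excluded columns are all placeholders, and the ODP.NET provider is assumed):
// Build a SELECT list from the data dictionary, skipping excluded columns.
var excluded = new HashSet<string> { "AUDIT_TS" }; // hypothetical exclusions
var columns = new List<string>();
using (var conn = new OracleConnection(connectionString))
using (var cmd = new OracleCommand(
    "SELECT column_name FROM user_tab_columns " +
    "WHERE table_name = :t ORDER BY column_id", conn))
{
    cmd.Parameters.Add(new OracleParameter("t", "EMP"));
    conn.Open();
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
        {
            var name = reader.GetString(0);
            if (!excluded.Contains(name))
                columns.Add(name);
        }
}
string select = "SELECT " + string.Join(", ", columns) + " FROM EMP";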
Ok, MNC, try this for size (paste it into a new console app):
using System;
using System.Collections.Generic;
using System.Linq;
using Test.Api;
using Test.Api.Classes;
using Test.Api.Interfaces;
using Test.Api.Models;
namespace Test.Api.Interfaces
{
public interface ITable
{
int Id { get; set; }
string Name { get; set; }
}
}
namespace Test.Api.Models
{
public class MemberTable : ITable
{
public int Id { get; set; }
public string Name { get; set; }
}
public class TableWithRelations
{
public MemberTable Member { get; set; }
// list to contain partnered tables
public IList<ITable> Partner { get; set; }
public TableWithRelations()
{
Member = new MemberTable();
Partner = new List<ITable>();
}
}
}
namespace Test.Api.Classes
{
public class MyClass
{
private readonly IList<TableWithRelations> _tables;
public MyClass()
{
// tableA stuff
var tableA = new TableWithRelations { Member = { Id = 1, Name = "A" } };
var relatedclasses = new List<ITable>
{
new MemberTable
{
Id = 2,
Name = "B"
}
};
tableA.Partner = relatedclasses;
// tableB stuff
var tableB = new TableWithRelations { Member = { Id = 2, Name = "B" } };
relatedclasses = new List<ITable>
{
new MemberTable
{
Id = 3,
Name = "C"
}
};
tableB.Partner = relatedclasses;
// tableC stuff
var tableC = new TableWithRelations { Member = { Id = 3, Name = "C" } };
relatedclasses = new List<ITable>
{
new MemberTable
{
Id = 2,
Name = "D"
}
};
tableC.Partner = relatedclasses;
// tableD stuff
var tableD = new TableWithRelations { Member = { Id = 3, Name = "D" } };
relatedclasses = new List<ITable>
{
new MemberTable
{
Id = 1,
Name = "A"
},
new MemberTable
{
Id = 2,
Name = "B"
},
};
tableD.Partner = relatedclasses;
// add tables to the base tables collection
_tables = new List<TableWithRelations> { tableA, tableB, tableC, tableD };
}
public IList<ITable> Compare(int tableId, string tableName)
{
return _tables.Where(table => table.Member.Id == tableId
&& table.Member.Name == tableName)
.SelectMany(table => table.Partner).ToList();
}
}
}
namespace Test.Api
{
public class TestClass
{
private readonly MyClass _myclass;
private readonly IList<ITable> _relatedMembers;
public IList<ITable> RelatedMembers
{
get { return _relatedMembers; }
}
public TestClass(int id, string name)
{
this._myclass = new MyClass();
// the Compare method would take your two parameters and return
// the matching set of related tables
_relatedMembers = _myclass.Compare(id, name);
// now do something with the resulting list
}
}
}
class Program
{
static void Main(string[] args)
{
// change these values to suit, along with rules in MyClass
var id = 3;
var name = "D";
var testClass = new TestClass(id, name);
Console.Write(string.Format("For Table{0} on Id{1}\r\n", name, id));
Console.Write("----------------------\r\n");
foreach (var relatedTable in testClass.RelatedMembers)
{
Console.Write(string.Format("Related Table{0} on Id{1}\r\n",
relatedTable.Name, relatedTable.Id));
}
Console.Read();
}
}
I'll get back in a bit to see if it fits or not.
So what you are really after is designing a rule engine for building dynamic queries. This is no small undertaking. The requirements you have provided are:
Store rules (what you call a "condition variable")
Each rule selects from one or more tables
Additionally some rules specify columns to be excluded from a table
Rules which select from multiple tables are satisfied with the UNION ALL operator; tables whose projections do not match must be brought into alignment with null columns.
Some possible requirements you don't mention:
Format masking e.g. including or excluding the time element of DATE columns
Changing the order of columns in the query's projection
The previous requirement is particularly significant when it comes to the multi-table rules, because the projections of the tables need to match by datatype as well as number of columns.
Following on from that, the padding NULL columns may not necessarily be tacked on to the end of the projection e.g. a three column table may be mapped to a four column table as col1, col2, null, col3.
Some multi-table queries may need to be satisfied by joins rather than set operations.
Rules for adding WHERE clauses.
A mechanism for defining default sets of excluded columns (i.e. which are applied every time a table is queried).
I would store these rules in database tables. Because they are data and storing data is what databases are for. (Unless you already have a rules engine to hand.)
Taking the first set of requirements you need three tables:
RULES
-----
RuleID
Description
primary key (RuleID)
RULE_TABLES
-----------
RuleID
Table_Name
Table_Query_Order
All_Columns_YN
No_of_padding_cols
primary key (RuleID, Table_Name)
RULE_EXCLUDED_COLUMNS
---------------------
RuleID
Table_Name
Column_Name
primary key (RuleID, Table_Name, Column_Name)
I've used compound primary keys just because it's easier to work with them in this context e.g. running impact analyses; I wouldn't recommend it for regular applications.
I think all of these are self-explanatory except the additional columns on RULE_TABLES.
Table_Query_Order specifies the order in which the tables appear in UNION ALL queries; this matters only if you want to use the column_names of the leading table as headings in the CSV file.
All_Columns_YN indicates whether the query can be written as SELECT * or whether you need to query the column names from the data dictionary and the RULE_EXCLUDED_COLUMNS table.
No_of_padding_cols is a simplistic implementation for matching projections in those UNION ALL columns, by specifying how many NULLs to add to the end of the column list.
I'm not going to tackle those requirements you didn't specify because I don't know whether you care about them. The basic thing is, what your boss is asking for is an application in its own right. Remember that as well as an application for generating queries you're going to need an interface for maintaining the rules.
MNC,
How about creating, up front, a dictionary of all the known tables involved in the application process (irrespective of the combinations - just a dictionary of the tables), keyed on table name. The members of this dictionary would be an IList<string> of the column names. This would allow you to compare two tables both on the number of columns present (dicTable[myVarTableName].Count) and by iterating round dicTable[myVarTableName] to pull out the column names.
At the end of the piece, you could write a little LINQ to determine the table with the greatest number of columns and create the structure with nulls accordingly.
Hope this gives food for thought..
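A rough sketch of that idea, with invented table and column names:
// Table name -> ordered column list, loaded once up front.
var dicTable = new Dictionary<string, IList<string>>
{
    { "TABLEX", new List<string> { "ID", "NAME", "CREATED" } },
    { "TABLEY", new List<string> { "ID", "NAME" } }
};

// The widest table decides how many padding NULLs the narrower ones need.
int widest = dicTable.Values.Max(cols => cols.Count);
foreach (var entry in dicTable)
{
    int padding = widest - entry.Value.Count;
    string projection = string.Join(", ", entry.Value)
        + string.Concat(Enumerable.Repeat(", NULL", padding));
    Console.WriteLine("SELECT " + projection + " FROM " + entry.Key);
}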
