DB2/C# Fun with queries and strings - Problems building query

I've been tasked with something that so far seems impossible, but I was hoping that someone better at SQL queries than me could figure out whether it can be done. I'm having problems querying data from a table. The source of the problem is that I'm forced to query on names and date of birth. The date of birth part is working and is out of the scope of my issue. My goal is to query using a value common to both the source (an Excel report) and the destination (the database), and that's the last name.
Fields: Name1, Name2
Table: Participant
Name2 in the database contains the last name, but if the person uses a middle name or a suffix it contains those values as well. The source (report) mostly contains only last names, but in a small number of cases a middle name is mixed into the last name. My goal is to strip the middle name and suffix from the database value and also from the report's last name string.
From the Database:
I need to strip the middle name which is to the left of the last name in the name2 field. They are separated by a space. I will also need to strip the suffix if it exists after the last name.
From the Report:
I need to strip out the middle name, which would be to the left of the last name, separated by a space. This would be done in C#.
Please let me know if I can provide any more info to help with an answer.
My first guess for the query part is a wildcard search: I would take the last name from the report and query the table using LIKE '%%'. I think this will find the record I'm looking for, but I'm not sure how well it will work.

I have a similar table with an... integrated 'last name'.
I ended up writing a UDF that did the best it could, but there are still situations I haven't coded for which pop up from time to time. I wrote my UDF in ILE RPG. Unlike most databases, DB2 for i allows me to write in an HLL and simply register it as a UDF. I mention this because it's possible that the developers on the IBM side have already written the code to split the name parts, and all they need to do now is make a UDF. Then you could
select getLast(combinedName) from ...

As John Clifford suggests, you could grab the last name using split like so (pseudocode):
// If the string contains a space, split it
string surname = Name2;
int spacePos = surname.IndexOf(" ");
if (spacePos > 0)
{
    string[] words = surname.Split(' ');
    // Take the word after the first space (the last name when Name2 is "Middle Last")
    surname = words[1];
}
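Or you could find the space, and then get a substring:
int spaceTruckin = Name2.IndexOf(" ");
// IndexOf returns -1 when there is no space; otherwise skip past the space itself
string surname = spaceTruckin >= 0 ? Name2.Substring(spaceTruckin + 1) : Name2;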
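Neither snippet above handles a trailing suffix. As a rough sketch only, the following combines the two ideas: it drops a trailing suffix and then takes the last remaining word. The class name NameParts, the method ExtractLastName and the suffix list are all hypothetical and would need adjusting to the real data.

```csharp
using System;
using System.Linq;

public static class NameParts
{
    // Hypothetical suffix list (an assumption); extend it to match the real data.
    private static readonly string[] Suffixes = { "JR", "SR", "II", "III", "IV" };

    // Returns the last word of name2 after removing one trailing suffix, if present.
    public static string ExtractLastName(string name2)
    {
        if (string.IsNullOrWhiteSpace(name2)) return string.Empty;

        var words = name2
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .ToList();

        // Drop a trailing suffix such as "Jr." but never the only remaining word.
        string lastWord = words[words.Count - 1].TrimEnd('.').ToUpperInvariant();
        if (words.Count > 1 && Suffixes.Contains(lastWord))
        {
            words.RemoveAt(words.Count - 1);
        }

        return words[words.Count - 1];
    }
}
```

The same method could be applied to both the Name2 value and the report's last name string so the two sides are compared on equal terms. Note that a multi-word surname such as "Van Buren" would be cut down to its final word, so this is only a starting point.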

Related

Using a classification algorithm for splitting a full name into first and last names?

I have a customers table with 2 columns for the name (firstname, lastname) and it contains around 100k records.
I have a scenario where I have to import new customers but their names come as a single column. Most names are simple (first and last), but some names are double names (with a space or hyphen), double surnames (with a space or hyphen) or even both.
Does using a ML.NET classification algorithm make sense to split the fullname based on a trained model from the 100k records?

I think it would be unnecessary to use machine learning methods for such a problem. You should try a rule-based method here.
Assuming the data comes in 1 column:
Example 1: After splitting the text on spaces, is the word count equal to 2? If so, the 1st word is the first name and the 2nd word is the surname.
Example 2: Does the text contain a hyphen or not? If yes, what should happen? How are the first name and surname determined?
1-) What you need to do here is create a training, validation and test set for yourself.
2-) Code the rules you extract from the data in the training set. (Here you need to make clever deductions by examining the data.)
3-) Determine the best rules using the validation data.
Finally, you should evaluate your work by getting results on the test set with the rules you find best.
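As an illustration of the first rule only, here is a minimal sketch. The names FullNameSplitter and TrySplit are hypothetical. It handles only the unambiguous two-word case and reports failure for everything else, so that further rules (or manual review) can deal with the remaining names.

```csharp
using System;

public static class FullNameSplitter
{
    // Returns true and fills both names only when the input has exactly two words.
    // Splitting is on spaces only, so a hyphenated word stays in one piece.
    public static bool TrySplit(string fullName, out string firstName, out string lastName)
    {
        firstName = string.Empty;
        lastName = string.Empty;
        if (string.IsNullOrWhiteSpace(fullName)) return false;

        string[] words = fullName.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length != 2) return false; // ambiguous: leave it for another rule

        firstName = words[0];
        lastName = words[1];
        return true;
    }
}
```

Running this over the existing 100k records and comparing the output with the stored firstname and lastname columns would show how many names the rule covers before more rules are written.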

Filter string match without regex

I want to filter a string column in a collection result using Like. Here is the logic:
I want to display every row where the characters immediately before and after my search word in the sentence are among my delimiters; otherwise I don't display the row.

What you want is hard, if not impossible, to do with LIKE and string operations in the database, and even if you get it right it will be slow. What you should do is use MySQL's full-text search feature. Neither Entity Framework nor LINQ supports this syntax, so you will have to bypass EF and write the query as a string.
_supplierDbContext.Database.SqlQuery<DTO>(
    "select something from table where match (column) against (@keywords)",
    new MySqlParameter("@keywords", keywords));
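If the rows are already in memory, the delimiter rule from the question can also be checked without regex. The sketch below is only one possible reading of that rule: the names DelimitedMatch and ContainsWord are hypothetical, the comparison is case-insensitive by assumption, and the start and end of the string are treated as valid boundaries.

```csharp
using System;

public static class DelimitedMatch
{
    // True when word occurs in text with a delimiter (or the string boundary)
    // immediately before and immediately after it.
    public static bool ContainsWord(string text, string word, char[] delimiters)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(word)) return false;

        int index = text.IndexOf(word, StringComparison.OrdinalIgnoreCase);
        while (index >= 0)
        {
            int end = index + word.Length;
            bool leftOk = index == 0 || Array.IndexOf(delimiters, text[index - 1]) >= 0;
            bool rightOk = end == text.Length || Array.IndexOf(delimiters, text[end]) >= 0;
            if (leftOk && rightOk) return true;

            // Keep looking: a later occurrence may be properly delimited.
            index = text.IndexOf(word, index + 1, StringComparison.OrdinalIgnoreCase);
        }
        return false;
    }
}
```

It could then be used as a filter, for example rows.Where(r => DelimitedMatch.ContainsWord(r.Column, search, delimiters)), where rows and Column stand in for the real collection and property.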

Design Advice: Parsing lines of text and dumping certain fields and assign to "buckets" - LINQ, RegEx,

I am seeking some advice on how best to tackle the following scenario:
There are text files that contain certain metadata about accidents from the city. The majority of the contents of these text files are dumped into a database. The main piece of each text file, i.e. the body, is entered into a single field, say ReportBody, in the Accidents table.
I have to read/parse this one column's contents into specific fields depending on what is being parsed. For example, here is a typical line from the text file that is entered into the ReportBody field in the database:
"Report for Mar 11, 2014 at 19:23 a traffic incident was submitted by officer - Badge 8394 Speeding through school zone - Employee ID:FOWSL, xxx, yyy"
In general, all of these "reports" follow this format:
"Report for [Date] at [Time] a [Incident_Type] incident was submitted by officer - [Badge_Number] [Category] - Employee ID:[ID_Number], [other_Property1], [other_Property2]"
where everything within the brackets are things I am collecting into an Accident class and submitting the Collection of Accidents into the database. Everything not enclosed within a bracket, then I am thinking I can use those strings as reference points that I have to collect something near by, e.g. I know that every time I come across the string "by officer - ", I will be assigning that [Badge_Number] string into the setter of the Accident class. Same logic with when I come across " - Employee ID:", I know I need to extract what is between the ":" and the first white space after that, " ", which in this case is the Accident's EmployeeId property.
It's been about two years since I worked heavily with LINQ, but recall I was able to do some heavy parsing and thought it was just elegant. Looking at it now, not sure what is the best path since I've been working in another framework and not sure if Lambdas, Predicates, RegEx, or so on is the best viable path.
Any advice will be appreciated!
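For a fixed format like this, one option is a single regular expression with named groups, using the literal text between the brackets as the reference points. This is a sketch under assumptions, not a definitive parser: the class ReportParser and the group names are hypothetical, the time is assumed to be HH:mm, and the badge is assumed to always look like "Badge" followed by digits, as in the sample line.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReportParser
{
    // Literal fragments of the report format act as anchors; named groups capture the fields.
    private static readonly Regex ReportPattern = new Regex(
        @"^Report for (?<date>.+?) at (?<time>\d{1,2}:\d{2}) a (?<type>.+?) " +
        @"incident was submitted by officer - (?<badge>Badge \d+) (?<category>.+?) " +
        @"- Employee ID:(?<id>[^,\s]+), (?<p1>[^,]+), (?<p2>.+)$");

    // Returns the Match; check Success before reading the groups.
    public static Match Parse(string reportBody)
    {
        return ReportPattern.Match(reportBody ?? string.Empty);
    }
}
```

Each group, for example match.Groups["badge"].Value, could then be assigned to the corresponding property of the Accident class. Lines that do not match come back with Success set to false, so they can be logged instead of being silently mis-parsed.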

Taking duplicate lines and squeezing them into 1 row on SQL Server 2008 R2?

I'm trying to write a stored procedure in SQL Server that will eliminate some logic in my C# program. What I'm doing right now is the query is in a view. Then I'm making a list with the view.
List<MyView> listOrdered = new List<MyView>();
Here's where it gets hairy. The query returns rows that are duplicates. I don't want to delete the duplicate rows; I want to combine them into 1 row. The rows are identical except for 1 column.
Example:
UID  Name    Age  Child
1    John    50   Sally
1    John    50   Steve
2    Joseph  42   Timmy
2    Joseph  42   Billy
So what I'm doing in C# is writing logic that says: (pseudo code)
foreach(item in list)
{
    if (UID != UIDCurrent)
    {
        Build Row
        AppendRow to list
    }
    else
    {
        Append Child Column to Current Child Column
    }
}
Basically it gives me:
UID  Name  Age  Children
1    John  50   Sally, Steve
But instead of doing this logic in C# I would like to do it in a stored procedure. So how can I get SQL Server to combine the child column into one row per parent instead of returning multiple rows?
If you need anything else to help you help me I will respond.
Oh guys believe me I don't want to do it this way either. The Database I'm using is huge and complex and doing this with C# was sensible and works but I've been asked to turn my function that does this in C# into a stored procedure. I just want to see if this is even possible.

This demonstrates a poor table design. Fix it at the root and then you don't have this silly logic in either your db or C# code.
instead of
people(UID, Name, Age, Child)
try
people(UID, Name, DateOfBirth)
children(Parent references people.UID, child references people.UID)
You can leave age instead of moving to date of birth but it's really a much better idea to do it this way.

This should work if you want to do this in the database, though you should think about doing it in the front-end.
SELECT a.UID,
       a.Name,
       a.Age,
       STUFF(
           (SELECT ',' + b.Child AS [text()]
            FROM parentChildren b
            WHERE a.UID = b.UID
            FOR XML PATH('')), 1, 1, '') AS [ChildConcat]
FROM parentChildren a
GROUP BY a.UID, a.Name, a.Age  -- one row per parent instead of one per child

This is 100%, without a doubt something to be handled in your application code, NOT in the database!
SQL performs fairly poorly in string manipulation operations, especially compared to iterative languages like C#. You want your database to pass you the actual DATA, and then how you display that to be handled in the application layer.
Any attempt to solve this in SQL will be slower and harder to maintain than a version in your application code.
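For comparison, the application-side version can be quite short with LINQ. This is a minimal sketch assuming the view rows are loaded into simple objects. The classes ParentChildRow and ParentRow and the method Combine are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one row returned by the view.
public class ParentChildRow
{
    public int UID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
    public string Child { get; set; }
}

// Hypothetical shape of the combined result row.
public class ParentRow
{
    public int UID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
    public string Children { get; set; }
}

public static class ChildCombiner
{
    // Groups rows by parent and joins the child names into one comma-separated string.
    public static List<ParentRow> Combine(IEnumerable<ParentChildRow> rows)
    {
        return rows
            .GroupBy(r => new { r.UID, r.Name, r.Age })
            .Select(g => new ParentRow
            {
                UID = g.Key.UID,
                Name = g.Key.Name,
                Age = g.Key.Age,
                Children = string.Join(", ", g.Select(r => r.Child))
            })
            .ToList();
    }
}
```

Unlike the foreach pseudocode in the question, grouping does not depend on the rows arriving sorted by UID.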

SQL Server FTS: possible to get information how/why rows were matched?

Is it possible to get the information why/how given row returned by FTS query was matched (or which substring caused row to match)?
For example, consider the simplest table, with id and text columns and an FTS index on the latter.
SELECT * FROM Example
WHERE CONTAINS(text, 'FORMSOF(INFLECTIONAL, jump)');
This example query could return, say, the row {1, 'Jumping Jack'}.
Now, is it possible to somehow get information that this very row was matched because of 'Jumping' word? It doesn't even have to be exact information, more of a which substring caused row to match.
Why I'm asking: I have a C# app that builds these queries based on user input (keywords to search for), and I need very basic information about why/how a row was matched, to use further in the C# code.
If it's not possible, any alternatives?
EDIT in regard to Mike Burton's and LesterDove's replies:
The above example was trivial for obvious reasons and your solutions are fine with that in mind. However, FTS queries might return results where regex or simple string matching (e.g. LIKE) won't cut it. Consider:
Search for bind returns bound (past form).
Search for extraordinary returns amazing (synonym).
Both valid matches.
I've been looking for solutions to this problem and found this: NHunspell. However, I already have FTS and valid results using SQL Server, so duplicating a similar mechanism (building extra indexes, storing additional word/thesaurus files, etc.) doesn't look good.
Lester's answer, however, gave me the idea that perhaps I could indeed split the original string into a temporary table and run the original FTS query on this split result. While it might work for my case (where the DB is fairly small and the queries are not very complex), in the general case this approach might be out of the question.

1/ Use a SPLIT function (many variations can be Googled) on your original string, which will dump the individual substrings into a temp table of some sort, with one row per substring.
2/ EDIT: You need to use CROSS APPLY to join to a table valued function:
SELECT * FROM Example E CROSS APPLY Split(E.text, ' ') AS S
WHERE CONTAINS(E.text, 'FORMSOF(INFLECTIONAL, jump)') AND S.String LIKE '%jump%';
*NOTE: You need to forage for your own user-defined Split function. I used this one and applied the first commenter's edit to allow for the space character as a delimiter.
So, E is your Example table. You're still FT searching on the text field for the word 'jump'. And now you're "joining" to a table made up of the individual substring values of your text field. Finally, you're matching that against the word 'jump' by using LIKE or CHARINDEX.

One simple post-processing method would be to generate an equivalent regular expression for each term in the WHERE clause and use it to discover, after the fact, how the found data matches the specified pattern.

You can get SQL Server to tell you how it interpreted your query, including how it transformed your input.
SELECT occurrence, special_term, display_term, expansion_type, source_term
FROM sys.dm_fts_parser('FORMSOF(INFLECTIONAL, bind)', 1033, 0, 0)
returns
occurrence  special_term  display_term  expansion_type  source_term
1           Exact Match   binds         2               bind
1           Exact Match   binding       2               bind
1           Exact Match   bound         2               bind
1           Exact Match   bind          0               bind
This isn't precisely what you asked for, but it's a start. You could search your results for anything in the display_term column and probably figure out why it matched.
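On the C# side, that search could look roughly like the sketch below. It assumes the display_term values have already been read from sys.dm_fts_parser into a collection. The names MatchExplainer and FindMatchedWords and the punctuation list are hypothetical, and splitting on simple punctuation is only an approximation of how the full-text word breaker tokenizes text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MatchExplainer
{
    // Assumed word separators; the real FTS word breaker is more sophisticated.
    private static readonly char[] Separators = { ' ', ',', '.', ';', ':', '!', '?' };

    // Returns the words of text that equal one of the expanded display terms.
    public static List<string> FindMatchedWords(string text, IEnumerable<string> displayTerms)
    {
        var terms = new HashSet<string>(displayTerms, StringComparer.OrdinalIgnoreCase);
        return (text ?? string.Empty)
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => terms.Contains(word))
            .ToList();
    }
}
```

With the four display terms from the output above, a row whose text contains the word "bound" would report "bound" as the reason it matched a search for bind.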