How to find matching URLs except for certain query strings? - c#

I am using the URLs of HTTP resources as unique identifiers for those resources (surprise).
These are all different:
http://localhost/Docs/SomeDocument?group=33&checksafety=true
http://localhost/Docs/SomeDocument?group=11&checksafety=true
http://localhost/Docs/SomeDocument?group=11&checksafety=false
However, I have a third query parameter that should not differentiate resources (on the server side, it pulls the same data from the database).
These are the same (the group and checksafety parameters are the same):
http://localhost/Docs/SomeDocument?group=11&checksafety=false&rendergroup=A
http://localhost/Docs/SomeDocument?group=11&checksafety=false&rendergroup=B
http://localhost/Docs/SomeDocument?rendergroup=C&group=11&checksafety=false
Is a regex appropriate here?
Is there a better way?
I am using C#, .NET 3.5 and ASP.NET.

You could split the URL to get the query string, then split that to get each group (by group I mean "variable=value"). Analyse each group individually and eliminate the ones that don't matter, then check that the URLs have the same number of remaining groups.
After that, the comparison is easier. Here are some ideas:
put each group into a List, sort it, and then iterate to see if the lists are the same
put each group into a Set and check whether the union is equal to each Set individually
do this process using Uri and UriBuilder, and use them for the match verification after removing the "irrelevant" groups
EDIT: included Nick Berardi's suggestion
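The ideas above can be sketched as a small normalization step: drop the parameters that should not matter, sort what is left, and compare the resulting strings. This is only a minimal sketch. The class name UrlKey and the method Normalize are hypothetical, and it assumes parameter names and values are already consistently encoded and that no parameter is repeated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class UrlKey
{
    // Hypothetical helper: builds a comparison key from a URL by dropping the
    // ignored query parameters and sorting the remaining "name=value" pairs,
    // so that parameter order no longer matters.
    public static string Normalize(string url, params string[] ignoredParameters)
    {
        var uri = new Uri(url);
        var ignored = new HashSet<string>(ignoredParameters, StringComparer.OrdinalIgnoreCase);

        string[] kept = uri.Query.TrimStart('?')
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(pair => !ignored.Contains(pair.Split('=')[0]))
            .OrderBy(pair => pair, StringComparer.Ordinal)
            .ToArray();

        // Scheme, host and path, without the query string.
        string path = uri.GetLeftPart(UriPartial.Path);
        return kept.Length == 0 ? path : path + "?" + string.Join("&", kept);
    }
}
```

Two URLs then identify the same resource when UrlKey.Normalize(a, "rendergroup") equals UrlKey.Normalize(b, "rendergroup"). Comparing normalized strings avoids a regex entirely, which tends to get fragile once parameter order can vary.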

Related

Extract unique list of fields from matching documents

I am new to Lucene, so maybe I have misunderstood something about how it works.
I have indexed a few hundred thousand documents with many string fields. For example, suppose we have 5 string fields (named A, B, C, D, E), where the first 3 (A, B, C) are indexed and the last two (D, E) are not indexed, only stored in the document. Values in each field may be duplicated; for example, assume that field A is used to store names and the name 'Richard' appears many times.
When I apply a query I look for each term in each field. Suppose, for example, that I get 3K documents that match my query.
Is it possible to get a list of unique (distinct) values of each field without scanning and grouping the results? I am particularly interested in this because I apply a limit to the documents I actually read, but I would like to get a complete list of the unique values in each field across all matching documents (even the documents I don't read).
If this is possible, can I apply this logic to unindexed fields (D, E) as well?
When you run the search, it returns all the documents that satisfy the query conditions. On that result you can apply highlighting (which will slow the process down), and you can use something like pagination to return the results in pages if you want.
The highlighter has several methods you can use (depending on which version of Lucene you are using; I am talking here about the latest version, 4.8.0), such as GetBestTextFragments(), which takes a parameter called maxNumberFragments. If you set that parameter to 1, it will return only one value from that particular field even if there are multiple values that match the query.
I am not sure if that answers your question, but I hope it helps. Regarding the unindexed fields, I don't think you can do that (although I have never tried it).

LDAP property contains too many values

I have a function which tries to remove a member from a group.
The problem is that if you try to remove a member without knowing whether it exists in the group, you can cause an exception.
So I try to enumerate the group's membership beforehand.
The problem now is that the member property stops after 3000 entries, and I don't know a way to get more, or to get the next 3000 members of that group.
Here is my code
DirectoryEntry target_group = new DirectoryEntry(LDAP_group_DN);
if (target_group.Properties["member"].Contains(LDAP_member_to_remove_DN)) {
    target_group.Properties["member"].Remove(LDAP_member_to_remove_DN);
}
target_group.CommitChanges();
target_group.Properties["member"] contains exactly 3000 entries, but in reality it is around 7500.
As a stopgap I am calling Remove inside a try/catch block without the .Contains() check, but that doesn't seem correct/beautiful/right.
Can anyone lead me to the correct way?
PS: I can not change the structure of our Directory.
This is a group of RADIUS users, which should not be split up into more groups!
Instead of getting all the group members to determine if the user is part of that list I would use the memberOf/isMemberOf attribute (assuming that your directory supports this feature). This attribute will tell you if a user belongs to a group without having to retrieve all group members.
This other answer might help.
You need to look into MaxValRange and learn how to retrieve more values using C#.
We have a very simple sample, but, alas, it is in Java
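For C#, ranged retrieval might look roughly like the sketch below. It assumes an Active Directory style server that honours the "member;range=low-high" attribute option and marks the final chunk with "-*". The class and method names are hypothetical, the chunk size of 1000 is an arbitrary assumption, and the code needs a reference to System.DirectoryServices.

```csharp
using System;
using System.Collections.Generic;
using System.DirectoryServices;

static class GroupMembership
{
    // True when the returned attribute name marks the final chunk,
    // e.g. "member;range=3000-*".
    public static bool IsFinalRange(string propertyName)
    {
        return propertyName.EndsWith("-*", StringComparison.Ordinal);
    }

    // Reads all values of "member" by requesting successive ranges.
    public static List<string> ReadAllMembers(DirectoryEntry group)
    {
        var members = new List<string>();
        const int step = 1000; // assumed chunk size; the server may return fewer
        int low = 0;

        while (true)
        {
            string requested = string.Format("member;range={0}-{1}", low, low + step - 1);
            using (var searcher = new DirectorySearcher(
                group, "(objectClass=*)", new[] { requested }, SearchScope.Base))
            {
                SearchResult result = searcher.FindOne();
                if (result == null) return members;

                // The server answers with its own range in the property name.
                string returned = null;
                foreach (string name in result.Properties.PropertyNames)
                {
                    if (name.StartsWith("member;range=", StringComparison.OrdinalIgnoreCase))
                        returned = name;
                }
                if (returned == null) return members; // no ranged values came back

                int count = 0;
                foreach (object value in result.Properties[returned])
                {
                    members.Add((string)value);
                    count++;
                }
                if (count == 0 || IsFinalRange(returned)) return members;
                low += count;
            }
        }
    }
}
```

Distinguished names are not case-sensitive, so when checking for the member to remove it is probably safer to compare with StringComparer.OrdinalIgnoreCase than with a plain List.Contains call.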

C#: implement RavenDB full-text search by part of a word

I have a grid and I need to support full-text search. I need to support searching not only by "starts with" and "ends with" but also by part of a word. For example, if I have "MyWord", the search should find it by the fragment "wor". If I try to use string.Contains() I get the following error:
Contains is not supported, doing a substring match over a text field is a very slow operation, and is not allowed using the Linq API.
The recommended method is to use full text search (mark the field as Analyzed and use the Search() method to query it.
If I build a RavenDB index and mark the field as Analyzed, Contains() still does not work. It works with StartsWith() and EndsWith(), but not with Contains(). Using .Search() I get the same results. Another option is to use Lucene syntax:
.Where("Name:*partOfWord*")
and it works fine, but I don't want to mix LINQ with Lucene syntax; I want to solve it using RavenDB indexes.
Do you have any ideas on how to implement full-text search for RavenDB using indexes?
You want to be using an NGram analyzer, as described here. It's an analyzer you can add to your RavenDB server by dropping its DLL in the Analyzers folder.
You really don't want to do any *substr Lucene queries ("ending with" clauses, that is), because the performance is terrible. The inconsistency in coding style is a lesser problem.
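To see why an n-gram analyzer helps, here is a rough illustration of the idea. This is not RavenDB's or Lucene's analyzer, only a hypothetical NGrams helper that produces the kind of tokens such an analyzer would put in the index; the gram sizes are assumptions you would tune.

```csharp
using System;
using System.Collections.Generic;

static class NGramDemo
{
    // Returns every lower-cased substring of 'word' whose length lies
    // between minGram and maxGram (inclusive).
    public static List<string> NGrams(string word, int minGram, int maxGram)
    {
        var grams = new List<string>();
        string lower = word.ToLowerInvariant();
        for (int size = minGram; size <= maxGram; size++)
        {
            for (int start = 0; start + size <= lower.Length; start++)
            {
                grams.Add(lower.Substring(start, size));
            }
        }
        return grams;
    }
}
```

With 3-character grams, "MyWord" is indexed as "myw", "ywo", "wor" and "ord", so a query for "wor" becomes an ordinary exact term lookup instead of a wildcard scan. The trade-off is a larger index.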
I use this query to search for people's full names by typing just part of the name. It is recommended to set a minimum length for the search string.
.Search(x => x.Name, "word to search" + "*", escapeQueryOptions: EscapeQueryOptions.AllowPostfixWildcard)

When using table parameters in LoadRunner, how do I select values from individual columns?

In LoadRunner, given a parameter table stored in a file MyTable.dat and a VUGEN script written in C#:
FirstHeader,SecondHeader,ThirdHeader
1A,1B,1C
2A,2B,2C
3A,3B,3C
I can use lr.eval_string("{MyTable}"); to return a whole row:
1A,1B,1C
I can use lr.next_row("MyTable.dat"); to advance to the next row
2A,2B,2C
However, it's not clear how to select an individual column.
The function reference for scripts written in C states that you can use lr_paramarr_idx for parameter arrays - but that doesn't appear to be available in C# & it doesn't make clear if a table row counts as a parameter array.
HP VUGen version 9.52.0.0.
Define individual parameters assigned to the different columns with your defined separator. If you have commas within your data, then use a different data separator, such as a tab (tsv format file) or I commonly use a pipe '|' symbol. If you don't have individual parameters set up and assigned to the individual columns then you will need to grab the whole row and break it apart yourself.
See lr.next_row() and lr.advance_param(). You may be using one where, once the parameters are explicitly defined, you will want to use the other. lr.advance_param() would be the more common choice, keeping in mind that when you iterate you are going to pick up some of this advancement naturally, depending on how your parameters are defined.
Given your questions you will want to take a look at two sections of the LoadRunner documentation, (1) the documentation on the parameterization engine for LoadRunner and (2) the section in the VUGEN manual dealing with advanced concepts and building virtual users in Visual Studio (there is some reinforcement on the parameterization concepts here).
This is a bad answer:
private string[] GetRowCells(string parameter)
{
    string row = lr.eval_string("{" + parameter + "}");
    return row.Split(',');
}
This is bad because:
If LoadRunner provides the facility for table parameters, there must be the capability for querying individual columns.
The above doesn't take account of columns that may include comma in their body:
For example, the following won't be parsed correctly:
FirstHeader,SecondHeader
"1,A","1,B"
"2,A","2,B"
"3,A","3,B"
Just use the column name that you want to work with.
lr_eval_string("{FirstHeader}");

SQL Server FTS: possible to get information how/why rows were matched?

Is it possible to get the information why/how given row returned by FTS query was matched (or which substring caused row to match)?
For example, consider the simplest table with id and text columns, with an FTS index on the latter.
SELECT * FROM Example
WHERE CONTAINS(text, 'FORMSOF(INFLECTIONAL, jump)');
This example query could return, say, the row {1, 'Jumping Jack'}.
Now, is it possible to somehow get the information that this very row was matched because of the word 'Jumping'? It doesn't even have to be exact information, just which substring caused the row to match.
Why I'm asking: I have a C# app that builds these queries based on user input (keywords to search for), and I need some very basic information about why/how a row was matched, to use further in the C# code.
If it's not possible, any alternatives?
EDIT regarding Mike Burton's and LesterDove's replies:
The example above was deliberately trivial, and your solutions are fine with that in mind. However, FTS queries might return results where regex or simple string matching (e.g. LIKE) won't cut it. Consider:
Search for bind returns bound (past form).
Search for extraordinary returns amazing (synonym).
Both valid matches.
I've been looking for solutions to this problem and found this: NHunspell. However, I already get valid results from FTS in SQL Server, so duplicating a similar mechanism (building extra indexes, storing additional word/thesaurus files etc.) doesn't look good.
Lester's answer, however, gave me the idea that I could split the original string into a temporary table and run the original FTS query on the split result. That might work for my case (where the DB is fairly small and the queries are not very complex), but in the general case this approach might be out of the question.
1/ Use a SPLIT function (many variations can be Googled) on your original string, which will dump the individual substrings into a temp table of some sort, with one row per substring snippet.
2/ EDIT: You need to use CROSS APPLY to join to a table valued function:
SELECT * FROM Example E CROSS APPLY Split(E.text, ' ') AS S
WHERE CONTAINS(E.text, 'FORMSOF(INFLECTIONAL, jump)') AND S.String LIKE '%jump%';
*NOTE: You need to forage for your own user-defined Split function. I used this one and applied the first commenter's edit to allow for the space character as a delimiter.
So, E is your Example table. You're still FT searching on the text field for the word 'jump'. And now you're "joining" to a table comprised of the individual substring values of your text field. Finally, you're matching that against the word 'jump' by using LIKE or Instr.
One simple post-processing method would be to generate an equivalent Regular Expression for each WHERE clause article and use it to discover after the fact how the found data matches the specified pattern.
You can get SQL to tell you how it interpreted your query, including how it transformed your input.
SELECT occurrence, special_term, display_term, expansion_type, source_term
FROM sys.dm_fts_parser('FORMSOF(INFLECTIONAL, bind)', 1033, 0, 0)
returns
occurrence  special_term  display_term  expansion_type  source_term
1           Exact Match   binds         2               bind
1           Exact Match   binding       2               bind
1           Exact Match   bound         2               bind
1           Exact Match   bind          0               bind
This isn't precisely what you asked for, but it's a start. You could search your results for anything in the display_term column and probably figure out why it matched.
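On the C# side, that last step could be sketched as follows. It assumes you have already read the display_term values from sys.dm_fts_parser into a collection. The class and method names are hypothetical, and the word splitting is a crude approximation of what the SQL Server word breaker does.

```csharp
using System;
using System.Collections.Generic;

static class MatchExplainer
{
    // Returns the words of 'text' that equal one of the expanded terms,
    // ignoring case. 'displayTerms' is assumed to hold the display_term
    // column returned by sys.dm_fts_parser for the user's keyword.
    public static List<string> FindMatchedWords(string text, IEnumerable<string> displayTerms)
    {
        var terms = new HashSet<string>(displayTerms, StringComparer.OrdinalIgnoreCase);
        var matched = new List<string>();
        char[] separators = { ' ', '\t', '\r', '\n', ',', '.', ';', ':', '!', '?' };

        foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (terms.Contains(word)) matched.Add(word);
        }
        return matched;
    }
}
```

For the row text "He was bound to win" and the four display terms listed above, this returns the single word "bound". Synonym matches would need the parser to be run with a thesaurus expansion in the same way.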
