Lucene.Net - Get distinct categories - c#

I have created the following document:
var document = new Document();
document.Add(new Field("category", "foo", Field.Store.YES, Field.Index.NOT_ANALYZED));
...
I have approximately 10M documents, which belong to 8 distinct categories. I would like to get all distinct categories by executing a search query (i.e., get all documents and read the value of the category field). Is that feasible?
Another approach is to build a list of categories when the index is rebuilt and write these values to a database.
Any help would be greatly appreciated!

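Check out the IndexReader.Terms() method.
If you give it a Term that has only a field name (empty text), it returns a TermEnum positioned at the first term of that field. The enumeration does not stop at the end of that field, though; it continues into the following fields, so stop once the field name changes.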
TermEnum terms = indexReader.Terms(new Term("category"));
// enumerate the terms

To extend Beaulac's solution for future use...
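To get only the unique values, iterate through the terms like this, stopping as soon as the enumeration moves past the category field:
// Terms are sorted by field, then text, so the category terms are contiguous.
while (terms.Term != null && terms.Term.Field.Equals("category")) {
    // do something with this term, e.g. read terms.Term.Text
    if (!terms.Next()) break; // Next() returns false at the end of the index
}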
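Why the loop has to stop when the field changes: a TermEnum walks the whole term dictionary sorted by field and then by text, so after the last category term it simply carries on into the next field. The sketch below does not use Lucene.Net at all; it models the term dictionary as a hypothetical sorted set of (field, text) pairs to illustrate the "seek to the field, then stop at the first foreign field" pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical term dictionary: (field, text) pairs sorted by field first,
// then by text, which is the order in which a TermEnum walks the index.
var termDictionary = new SortedSet<(string Field, string Text)>(
    new[]
    {
        ("title", "hello"),
        ("category", "foo"),
        ("author", "smith"),
        ("category", "bar"),
        ("category", "baz"),
    },
    Comparer<(string Field, string Text)>.Create((a, b) =>
    {
        int byField = string.CompareOrdinal(a.Field, b.Field);
        return byField != 0 ? byField : string.CompareOrdinal(a.Text, b.Text);
    }));

// Seek to the first term of the field (roughly what Terms(new Term(field)) does),
// then stop at the first term that belongs to a different field.
List<string> DistinctValues(string field)
{
    var values = new List<string>();
    foreach (var term in termDictionary.SkipWhile(t => string.CompareOrdinal(t.Field, field) < 0))
    {
        if (term.Field != field)
            break; // later fields follow immediately; without this we'd read "title" terms too
        values.Add(term.Text);
    }
    return values;
}

Console.WriteLine(string.Join(", ", DistinctValues("category"))); // bar, baz, foo
```

Each distinct category appears exactly once in the term dictionary, which is why this is far cheaper than loading 10M documents and reading their stored fields.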

Related

Selenium C# Find Element with class and text

I'm very new to testing and have no training in automated tests, so please bear with me if I say stupid things, but I'll try the best I can.
Basically, I am trying to assert that a specific employee in the employee list has the status of 'leaver'.
This is what I have tried (along with other variations using the different classes):
Assert.Equal("image-tile__badge background-color--status-leaver ng-star-inserted", Driver.FindElement(By.XPath("//*[contains(@class,'image-tile__content-header') and contains(text(),'End Date, Contract') and contains(@class, 'image-tile__badge')]")).GetAttribute("Class"));
Assert.Equal("image-tile__badge background-color--status-leaver ng-star-inserted", Driver.FindElement(By.XPath("//*[contains(@class,'image-tile__content-header') and contains(text(),'End Date, Contract')]")).FindElement(By.XPath("//*[contains(@class, 'image-tile__badge')]")).GetAttribute("Class"));
The last one finds the element when the status is 'new', but when I change the employee status to 'leaver', it still returns 'new', so it is possibly looking at another employee with a 'new' status.
Hopefully this is enough info; let me know if more is needed (this is my first ever post!).
The HTML code is in the image below:
[HTML code on Chrome]: https://i.stack.imgur.com/kUxkf.png
Summary: I'm trying to assert that the employee "End Date, Contract" has the status of leaver (i.e., has the leaver class "image-tile__badge background-color--status-leaver ng-star-inserted").
Thanks everyone for their help!
One of my devs managed to take @noldor's example and modify it a bit, so here's what ended up working for me:
var newElmList1 = Driver.FindElements(By.CssSelector("div.background-color--status-leaver")).ToList();
List<string> newNames = new List<string>();
foreach (var newElm in newElmList1)
{
    // Step up to the badge's parent, then read the header text of that same tile.
    var newName1 = newElm.FindElement(By.XPath(".."))
        .FindElement(By.CssSelector("div.image-tile__content-header")).Text;
    newNames.Add(newName1);
}
if (!newNames.Contains("End Date, Contract"))
{
    throw new Exception("Exception Error on leaver Person");
}
Based on your screenshot, I feel it's better to try using XPath:
var elmList = Driver.FindElements(By.XPath("//div[contains(text(),'leaver')]")).ToList();
I hope it helps.
Thank you.
According to your screenshot, you can find all elements with the leaver-specific class like this:
var leaverElmList = Driver.FindElements(By.CssSelector("div.background-color--status-leaver")).ToList();
List<string> leaverNames = new List<string>();
foreach (var leaverElm in leaverElmList) {
    // Text is a property on IWebElement, not a method.
    var leaverName = leaverElm.FindElement(By.XPath(".."))
        .FindElement(By.CssSelector("div.image-tile__content-header"))
        .Text;
    leaverNames.Add(leaverName);
}
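Note that the "End Date, Contract" div is not inside the div that contains Leaver; its direct parent is the image-tile div. That is why the code first steps up with ".." and then searches for the header from there.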
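The screenshot with the real markup isn't reproduced here, so the sketch below uses hypothetical HTML shaped after the class names in the question and runs the XPath with LINQ to XML (System.Xml.XPath) instead of Selenium. It is only meant to illustrate the navigation: the ".." step used by the answers, and a likely reason the question's second attempt kept reporting 'new' (an XPath starting with "//" is absolute, so it ignores the element it is called on).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical markup: the real page is only visible in the screenshot.
var doc = XDocument.Parse(@"<div class='employee-list'>
  <div class='image-tile'>
    <div class='image-tile__content-header'>Someone New</div>
    <div class='image-tile__badge background-color--status-new'>New</div>
  </div>
  <div class='image-tile'>
    <div class='image-tile__content-header'>End Date, Contract</div>
    <div class='image-tile__badge background-color--status-leaver'>Leaver</div>
  </div>
</div>");

// Same idea as the working answer: find each leaver badge, step up to its
// parent with "..", then read the header inside that same tile.
var leaverNames = new List<string>();
foreach (var badge in doc.XPathSelectElements("//div[contains(@class,'background-color--status-leaver')]"))
{
    var header = badge.XPathSelectElement("..")
        .XPathSelectElement(".//div[contains(@class,'image-tile__content-header')]");
    leaverNames.Add(header.Value);
}
Console.WriteLine(string.Join("; ", leaverNames)); // End Date, Contract

// "//" starts from the document root, so it returns the first badge on the page;
// ".//" searches only below the tile the call is made on.
var tile = doc.XPathSelectElement("//div[div='End Date, Contract']");
var absolute = tile.XPathSelectElement("//div[contains(@class,'image-tile__badge')]");
var relative = tile.XPathSelectElement(".//div[contains(@class,'image-tile__badge')]");
Console.WriteLine(absolute.Attribute("class").Value); // image-tile__badge background-color--status-new
Console.WriteLine(relative.Attribute("class").Value); // image-tile__badge background-color--status-leaver
```

With Selenium the same rule applies: when calling FindElement(By.XPath(...)) on an element, start the expression with "./" or ".//" to keep the search inside that element.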

Find within large list using Contains within Linq

I have two large Excel files. I am able to get the rows of these Excel files into a list using LinqToExcel. The issue is that I need to use a string from an object in the first list to find whether it is part of, or contained in, a string within an object of the second list. I was trying the following, but the process is taking too long, as each list has over 70,000 items.
I have tried using an Any statement but have not been able to get results. If you have any ideas, please share.
List<ExcelOne> exOne = new List<ExcelOne>();
List<ExcelTwo> exTwo = new List<ExcelTwo>();
I am able to build the first and second lists and can verify there are objects in them. Here is how I thought I would work through the lists to find matches. Note that once I have found a match, I want to create a new NewFormRow object and add it to a new list.
List<NewFormRow> rows = new List<NewFormRow>();
foreach (var item in exOne)
{
    // I am going through each item in list one
    foreach (var thing in exTwo)
    {
        // I now want to check if exTwo.importantRow has or
        // contains any part of the string from item.id
        if (thing.importantRow.Contains(item.id))
        {
            NewFormRow adding = new NewFormRow()
            {
                Idfound = item.id,
                ImportantRow = thing.importantRow
            };
            rows.Add(adding);
            Console.WriteLine("added one");
        }
    }
}
If you know a quicker way around this please share. Thank you.
It's hard to improve this substring approach. The question is whether you have to do it here. Can't you do it where you fill the lists? Then you don't need this additional step.
However, maybe you find this LINQ query more readable:
List<NewFormRow> rows = exOne
    .SelectMany(x => exTwo
        .Where(x2 => x2.importantRow.Contains(x.id))
        .Select(x2 => new NewFormRow
        {
            Idfound = x.id,
            ImportantRow = x2.importantRow
        }))
    .ToList();
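If the substring test itself can't be avoided, one option not mentioned in the answer is to spread the same work over several cores with PLINQ (AsParallel). It does not reduce the number of Contains calls (still roughly 70,000 × 70,000), so any gain is bounded by the core count. The sketch uses hypothetical string lists and a tuple in place of ExcelOne, ExcelTwo and NewFormRow.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for ExcelOne.id and ExcelTwo.importantRow.
var ids = new List<string> { "A100", "B200", "C300" };
var importantRows = new List<string> { "x-A100-y", "nothing here", "C300 and A100" };

// Same nested substring search as the LINQ query above, but PLINQ splits the
// outer list across cores; AsOrdered keeps results in the sequential order.
var rows = ids
    .AsParallel()
    .AsOrdered()
    .SelectMany(id => importantRows
        .Where(row => row.Contains(id))
        .Select(row => (Idfound: id, ImportantRow: row)))
    .ToList();

foreach (var r in rows)
    Console.WriteLine($"{r.Idfound} -> {r.ImportantRow}");
```

Dropping AsOrdered can be slightly faster if the order of the resulting rows does not matter. Writing to the console for every match, as in the question's loop, also costs time at this scale.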

C#: Generating a table from a .CSV that counts name (string) occurrences in another table

I am importing a data table from a .csv file with headers, and this is no problem.
Let's call the file dt.csv.
One column header is named companyName.
But I need to create a new table where I first list all the companies from the first data table and count how many times each companyName appears in the first table.
The first table can have anything from 500 to 5000 lines, but the number of different companies appearing will only be 15-50. The challenge is that I do not know the company names to expect in advance, so I cannot make a positive list to count against. I need the list to count against to be generated based on the content of column companyName (so that I do not get duplicates of the same name).
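This code is C#. It reads the file with a naive comma split, so replace the reading/writing with your own CSV parser if your file has quoted fields containing commas:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// The first line is the header row; find the companyName column in it.
var lines = File.ReadLines("dt.csv").ToList();
int companyCol = Array.IndexOf(lines[0].Split(','), "companyName");

var seenCompanies = new List<string>();
foreach (var line in lines.Skip(1))
{
    seenCompanies.Add(line.Split(',')[companyCol]);
}

var companiesAndCounts =
    seenCompanies
        .GroupBy(s => s)
        .Select(group => new { Name = group.Key, Count = group.Count() })
        .ToList();

// "counts.csv" is an example output path; WriteLine puts each company on its own row.
using (var outputFile = new StreamWriter("counts.csv"))
{
    foreach (var group in companiesAndCounts)
    {
        outputFile.WriteLine(group.Name + "," + group.Count);
    }
}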
This is conceptually standard: all you're really doing is counting the occurrences of each distinct company name and then writing them out.
You can adapt this to better suit your needs, but it should be enough to show how it can be approached.
You can also use System.Collections.Generic.Dictionary<TKey, TValue>:
// I used this list to test; replace "companies" with the list from your CSV file
List<string> companies = new List<string>() { "c1", "c2", "c1", "c4", "c3", "c3", "c3", "c2" };
Dictionary<string, int> numberOfAppearance = new Dictionary<string, int>();
foreach (string company in companies)
{
    if (numberOfAppearance.ContainsKey(company))
        numberOfAppearance[company]++;
    else
        numberOfAppearance.Add(company, 1);
}
// Now numberOfAppearance["companyName"] holds the number of appearances of the company named companyName in the list
I created a List to hold all the company names because I don't know how you store them from your CSV file, but it should be easy to adapt.
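A variant of the Dictionary approach, offered as a sketch rather than the answerer's code: TryGetValue replaces the ContainsKey/Add branch, and a case-insensitive comparer plus Trim() merges names that differ only in case or stray spaces. That merging is an assumption about what counts as "the same name"; remove it if your data is already clean.

```csharp
using System;
using System.Collections.Generic;

// Same test list as above; replace it with the companyName column of your CSV.
var companies = new List<string> { "c1", "c2", "c1", "c4", "c3", "c3", "c3", "c2" };

// Assumption: names that differ only in case or surrounding spaces
// ("Acme", "ACME ", ...) are the same company. Drop the comparer/Trim if not.
var numberOfAppearance = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
foreach (var company in companies)
{
    var name = company.Trim();
    // TryGetValue sets count to 0 for a new name, so no ContainsKey branch is needed.
    numberOfAppearance.TryGetValue(name, out int count);
    numberOfAppearance[name] = count + 1;
}

foreach (var pair in numberOfAppearance)
    Console.WriteLine($"{pair.Key}: {pair.Value}");
```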

Ordering bookmarks by page using microsoft interop c#

I have a Word template file with 2 pages, each with a bookmark. The bookmark on the first page is named A4 and the one on the second page is named A3. When I read all bookmarks from the Word document, I get them in alphabetical order, but I want them in page order. How can I do this?
foreach (Bookmark bookMark in MergeResultDoc.Bookmarks)
{
    // IMPORTANT: THE BOOKMARK NAME MUST BE THE PAPER TYPE
    pagInizio = Convert.ToInt32(pagNum);
    pagNum = bookMark.Range.Information[WdInformation.wdActiveEndPageNumber].ToString();
    addData(pagInizio, pagNum, bookMark.Name);
    iteration++;
}
You can read the bookMark.Start value.
This returns the start position of the Bookmark in the document.
So you can iterate over all the bookmarks and sort them by their start position.
Here is code to do that:
// List to store all bookmarks sorted by position.
List<Bookmark> bmList = new List<Bookmark>();
// Iterate over all the Bookmarks and add them to the list (unordered).
foreach (Bookmark curBookmark in MergeResultDoc.Bookmarks)
{
    bmList.Add(curBookmark);
}
// Sort the List by the Start member of each Bookmark.
// After this line the bmList will be ordered.
bmList.Sort(delegate(Bookmark bm1, Bookmark bm2)
{
    return bm1.Start.CompareTo(bm2.Start);
});
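Word interop only runs where Office is installed, so this sketch stands in a hypothetical (Name, Start) tuple for each Bookmark to show the effect of ordering by Start on the question's two bookmarks. The Start values are made up; only their relative order matters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for Word Bookmark objects: only Name and Start
// (character position in the document) matter for the ordering.
var bookmarks = new List<(string Name, int Start)>
{
    ("A3", 1850), // bookmark on page 2
    ("A4", 0),    // bookmark on page 1
};

// Alphabetical order (what the question observes) would be A3, A4.
// Sorting by Start restores document (page) order instead.
var byPosition = bookmarks.OrderBy(b => b.Start).Select(b => b.Name).ToList();
Console.WriteLine(string.Join(", ", byPosition)); // A4, A3
```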
Use LINQ OrderBy (the interop Bookmarks collection is a non-generic IEnumerable, so cast it first):
var orderedResults = MergeResultDoc.Bookmarks.Cast<Bookmark>().OrderBy(d => d.Start).ToList();
Document.Bookmarks should return the bookmarks in alphabetical order.
Document.Content.Bookmarks should return the bookmarks in the order they appear in the document. However, VBA collection documentation does not typically guarantee a particular order for anything, so it's safer to read Start (as suggested by etaiso) and sort by that.

linq to xml query based on multiple statements in order to serve a previous / next button

I am new to LINQ and having difficulty solving an easy problem, as I've never done this before.
The scenario is a single XML table of books, like this:
<?xml version="1.0" encoding="utf-8"?>
<dbproject>
  <books_dataset>
    <book>
      <id>23</id>
      <isbn>075221912X</isbn>
      <title>Big Brother: The Unseen Story</title>
      <author>Jean Ritchie</author>
      <publicationYr>2000</publicationYr>
      <publisher>Pan Macmillan</publisher>
      <pages>169</pages>
      <imageBigLink>/images/P/075221912X.01.LZZZZZZZ.jpg</imageBigLink>
      <priceActual>0</priceActual>
      <numberOfBids>0</numberOfBids>
      <sf>kw</sf>
      <df></df>
      <ef></ef>
      <description>Lorem ipsum dolor sit amet</description>
    </book>
  </books_dataset>
</dbproject>
I am trying to create a query that gives me the ID (next one / first one) of the next/previous book that has a "kw" string in the node.
The IDs are not contiguous and there is no index. So, for instance, a Next button looks for an ID as follows:
Next (higher) ID = Next Book
Which has a "kw" string in
I've tried many solutions but just got confused :/.
I am able to jump to the next/previous node, but to be honest I am sure it isn't the best approach to achieve the task.
I am able to list the books that have a kw string, but these two requirements do not work together :/
I use this query to get the next ID:
var btnNextEval = (from databack in xmlData.Element("dbproject").Elements(QRY).Elements(QRY_sub)
                   where databack.Element(fid1).Value == trgtCounter.ToString()
                   select databack).Single().ElementsAfterSelf().First().Element("id").Value;
trgtCounter = Convert.ToInt16(btnNextEval);
I tried using && to combine multiple where conditions, but it didn't work :/
Please help and show me possible solutions for this silly problem.
Thanks!
Try this:
var nextId = (
    // Descendants finds the book elements at any depth below the document root.
    from book in xmlData.Descendants("book")
    let id = (int)book.Element("id")
    where ((string)book.Element("sf")) == "kw"
        && id > currentId
    select id
).DefaultIfEmpty(-1).Min();
This returns the next ID (or -1 if there is none). To get the book with that ID, do the following:
var nextBook = (
    from book in xmlData.Descendants("book")
    where (int)book.Element("id") == nextId
    select book
).First();
Notes:
This assumes there is a variable currentId of type int containing the current id.
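You need DefaultIfEmpty in case there are no IDs greater than the current one. In that case, Min throws an InvalidOperationException on the empty sequence; DefaultIfEmpty(-1) turns it into a one-element sequence containing -1, so Min returns -1.
First also throws if used on an empty sequence, so check for nextId == -1 (or use FirstOrDefault) before looking up the book.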
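The question also asks for a Previous button. A minimal sketch following the same pattern flips the comparison and uses Max instead of Min. Like the answer, it assumes the "kw" flag lives in the sf element and that IDs are positive, so -1 can mean "none"; the sample XML is hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample shaped like the question's XML; IDs are not contiguous.
var xmlData = XDocument.Parse(
    "<dbproject><books_dataset>" +
    "<book><id>5</id><sf>kw</sf></book>" +
    "<book><id>12</id><sf></sf></book>" +
    "<book><id>23</id><sf>kw</sf></book>" +
    "<book><id>40</id><sf>kw</sf></book>" +
    "</books_dataset></dbproject>");

// Next: smallest "kw" id greater than currentId, or -1 if there is none.
int NextId(int currentId) =>
    xmlData.Descendants("book")
        .Where(b => (string)b.Element("sf") == "kw")
        .Select(b => (int)b.Element("id"))
        .Where(id => id > currentId)
        .DefaultIfEmpty(-1)
        .Min();

// Previous: largest "kw" id smaller than currentId, or -1 if there is none.
int PreviousId(int currentId) =>
    xmlData.Descendants("book")
        .Where(b => (string)b.Element("sf") == "kw")
        .Select(b => (int)b.Element("id"))
        .Where(id => id < currentId)
        .DefaultIfEmpty(-1)
        .Max();

Console.WriteLine(NextId(23));     // 40
Console.WriteLine(PreviousId(23)); // 5
Console.WriteLine(NextId(40));     // -1
```

Book 12 is skipped in both directions because its sf element is empty.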
