IText7 and missing GetPageN method in C#


I have this C# code working with iTextSharp 5, and I need to port it to iText 7:
public static PdfReader Fix(PdfReader pdfReader, int pagina)
{
var dic = pdfReader.GetPageN(pagina);
var resources = dic.GetAsDict(PdfName.Resources);
var fonts = resources?.GetAsDict(PdfName.Font);
if (fonts == null) return pdfReader;
foreach (var key in fonts.Keys)
{
var font = fonts.GetAsDict(key);
var firstChar = font.Get(PdfName.FirstChar);
if (firstChar == null)
font.Put(PdfName.FirstChar, new PdfNumber(32));
var lastChar = font.Get(PdfName.LastChar);
if (lastChar == null)
font.Put(PdfName.LastChar, new PdfNumber(255));
var widths = font.GetAsArray(PdfName.Widths);
if (widths != null) continue;
var array = Enumerable.Repeat(600, 256).ToArray();
font.Put(PdfName.Widths, new PdfArray(array));
}
return pdfReader;
}
The problem I have is that the method GetPageN in this line:
var dic = pdfReader.GetPageN(pagina);
has been removed.
Has anyone faced the same problem?

Indeed, the GetPage() method is now in the PdfDocument class.
There are also some small changes in how you get dictionary entries from the document; I took the liberty of adjusting your code accordingly.
public static PdfReader Fix(PdfReader pdfReader, int pagina)
{
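// PdfDocument(reader) opens the file read-only: the dictionary changes below are only written out
// if the document is opened in stamping mode instead, i.e. new PdfDocument(reader, writer)
var dic = new PdfDocument(pdfReader).GetPage(pagina);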
var resources = dic.GetPdfObject().GetAsDictionary(PdfName.Resources);
var fonts = resources?.GetAsDictionary(PdfName.Font);
if (fonts == null) return pdfReader;
foreach (var key in fonts.KeySet())
{
var font = fonts.GetAsDictionary(key);
var firstChar = font.Get(PdfName.FirstChar);
if (firstChar == null)
font.Put(PdfName.FirstChar, new PdfNumber(32));
var lastChar = font.Get(PdfName.LastChar);
if (lastChar == null)
font.Put(PdfName.LastChar, new PdfNumber(255));
var widths = font.GetAsArray(PdfName.Widths);
if (widths != null) continue;
var array = Enumerable.Repeat(600, 256).ToArray();
font.Put(PdfName.Widths, new PdfArray(array));
}
return pdfReader;
}
(I haven't checked your code, just made sure that at least what you posted now compiles)

Related

Lucene 4.8 facets usage

I have difficulty understanding this example of how to use facets:
https://lucenenet.apache.org/docs/4.8.0-beta00008/api/Lucene.Net.Demo/Lucene.Net.Demo.Facet.SimpleFacetsExample.html
My goal is to create an index in which each document field has a facet, so that at search time I can choose which facets to use to navigate the data.
What I am confused about is how to set up facets when creating the index. To summarize my question: is an index with facets compatible with ReferenceManager?
Does the DirectoryTaxonomyWriter need to be actually written and persisted on disk, or is it embedded into the index itself and just temporary? Given the example's code indexWriter.AddDocument(config.Build(taxoWriter, doc));, I expected it to be temporary and embedded into the index (but the example also shows that you need the taxonomy to drill down into a facet). So can the taxonomy be tied to the index in some way, so that both are handled together by ReferenceManager?
If not, can I just use the same folder I use for storing the index?
Here is a more detailed list of the points that confuse me:
In my scenario I am indexing the documents asynchronously (in a background process) and then fetching the index as soon as possible through ReferenceManager in an ASP.NET application. I hope this way of fetching the index is compatible with the DirectoryTaxonomyWriter needed by facets.
Then I modified my code, introducing the taxonomy writer as indicated in the example, but I am a bit confused: it seems I can't store the DirectoryTaxonomyWriter in the same folder as the index, because the folder is locked. Do I need to persist it, or will it be embedded into the index (so that a RAMDirectory is enough)? If I need to persist it in a different directory, can I safely persist it in a subdirectory?
Here is the code I am actually using:
private static void BuildIndex (IndexEntry entry)
{
string targetFolder = ConfigurationManager.AppSettings["IndexFolder"] ?? string.Empty;
//** LOG
if (System.IO.Directory.Exists(targetFolder) == false)
{
string message = @"Index folder not found";
_fileLogger.Error(message);
_consoleLogger.Error(message);
return;
}
var metadata = JsonConvert.DeserializeObject<IndexMetadata>(File.ReadAllText(entry.MetdataPath) ?? "{}");
string[] header = new string[0];
List<dynamic> csvRecords = new List<dynamic>();
using (var reader = new StreamReader(entry.DataPath))
{
CsvConfiguration csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture);
csvConfiguration.AllowComments = false;
csvConfiguration.CountBytes = false;
csvConfiguration.Delimiter = ",";
csvConfiguration.DetectColumnCountChanges = false;
csvConfiguration.Encoding = Encoding.UTF8;
csvConfiguration.HasHeaderRecord = true;
csvConfiguration.IgnoreBlankLines = true;
csvConfiguration.HeaderValidated = null;
csvConfiguration.MissingFieldFound = null;
csvConfiguration.TrimOptions = CsvHelper.Configuration.TrimOptions.None;
csvConfiguration.BadDataFound = null;
using (var csvReader = new CsvReader(reader, csvConfiguration))
{
csvReader.Read();
csvReader.ReadHeader();
csvReader.Read();
header = csvReader.HeaderRecord;
csvRecords = csvReader.GetRecords<dynamic>().ToList();
}
}
string targetDirectory = Path.Combine(targetFolder, "Index__" + metadata.Boundle + "__" + DateTime.Now.ToString("yyyyMMdd_HHmmss") + "__" + Path.GetRandomFileName().Substring(0, 6));
System.IO.Directory.CreateDirectory(targetDirectory);
//** LOG
{
string message = @"..creating index : {0}";
_fileLogger.Information(message, targetDirectory);
_consoleLogger.Information(message, targetDirectory);
}
using (var dir = FSDirectory.Open(targetDirectory))
{
using (DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(dir))
{
Analyzer analyzer = metadata.GetAnalyzer();
var indexConfig = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
using (IndexWriter writer = new IndexWriter(dir, indexConfig))
{
long entryNumber = csvRecords.Count();
long index = 0;
long lastPercentage = 0;
foreach (dynamic csvEntry in csvRecords)
{
Document doc = new Document();
IDictionary<string, object> dynamicCsvEntry = (IDictionary<string, object>)csvEntry;
var indexedMetadataFiled = metadata.IdexedFields;
foreach (string headField in header)
{
if (indexedMetadataFiled.ContainsKey(headField) == false || (indexedMetadataFiled[headField].NeedToBeIndexed == false && indexedMetadataFiled[headField].NeedToBeStored == false))
continue;
var field = new Field(headField,
((string)dynamicCsvEntry[headField] ?? string.Empty).ToLower(),
indexedMetadataFiled[headField].NeedToBeStored ? Field.Store.YES : Field.Store.NO,
indexedMetadataFiled[headField].NeedToBeIndexed ? Field.Index.ANALYZED : Field.Index.NO
);
doc.Add(field);
var facetField = new FacetField(headField, (string)dynamicCsvEntry[headField]);
doc.Add(facetField);
}
long percentage = (long)(((decimal)index / (decimal)entryNumber) * 100m);
if (percentage > lastPercentage && percentage % 10 == 0)
{
_consoleLogger.Information($"..indexing {percentage}%..");
lastPercentage = percentage;
}
writer.AddDocument(doc);
index++;
}
writer.Commit();
}
}
}
//** LOG
{
string message = @"Index Created : {0}";
_fileLogger.Information(message, targetDirectory);
_consoleLogger.Information(message, targetDirectory);
}
}

c# How to cast from 'iTextSharp.text.pdf.PdfArray' to 'iTextSharp.text.pdf.PRIndirectReference'

I was using this piece of code until today and it was working fine:
for (int page = 1; page <= reader.NumberOfPages; page++)
{
var cpage = reader.GetPageN(page);
var content = cpage.Get(PdfName.CONTENTS);
var ir = (PRIndirectReference)content;
var value = reader.GetPdfObject(ir.Number);
if (value.IsStream())
{
PRStream stream = (PRStream)value;
var streamBytes = PdfReader.GetStreamBytes(stream);
var tokenizer = new PRTokeniser(new RandomAccessFileOrArray(streamBytes));
try
{
while (tokenizer.NextToken())
{
if (tokenizer.TokenType == PRTokeniser.TK_STRING)
{
string strs = tokenizer.StringValue;
if (!(br = excludeList.Any(st => strs.Contains(st))))
{
//strfor += tokenizer.StringValue;
if (!string.IsNullOrWhiteSpace(strs) &&
!stringsList.Any(i => i == strs && excludeHeaders.Contains(strs)))
stringsList.Add(strs);
}
}
}
}
finally
{
tokenizer.Close();
}
}
}
But today I got an exception for a particular PDF file: Unable to cast object of type 'iTextSharp.text.pdf.PdfArray' to type 'iTextSharp.text.pdf.PRIndirectReference'.
On debugging I found that the error is at this line: var ir = (PRIndirectReference)content;. That's because, for this PDF, the page content I'm extracting comes back as an array (a PdfArray) rather than a single indirect reference.
I would be really grateful if anyone can help me with this. Thanks in advance.
EDIT :
The PDF contents are paragraphs, tables, headers and footers, and in a few cases images. I'm not concerned about the images, as I'm skipping them.
As you can see from the code, I'm trying to add the words to a string list, so I expect plain text as output (words, to be specific).

That was really easy! I don't know why I couldn't figure it out.
PdfReader reader = new PdfReader(name);
List<string> stringsList = new List<string>();
for (int page = 1; page <= reader.NumberOfPages; page++)
{
//directly get the contents into a byte stream
var streamByte = reader.GetPageContent(page);
var tokenizer = new PRTokeniser(new RandomAccessFileOrArray(streamByte));
var sb = new StringBuilder(); //use a string builder instead
try
{
while (tokenizer.NextToken())
{
if (tokenizer.TokenType == PRTokeniser.TK_STRING)
{
// StringValue is the raw string operand from the content stream (in the font's encoding);
// the previous Encoding.Default/UTF8 round trip was discarded anyway, so it is dropped here
var currentText = tokenizer.StringValue;
sb.Append(currentText);
}
}
}
finally
{
//add appended strings into a string list
if(sb != null)
stringsList.Add(sb.ToString());
tokenizer.Close();
}
}

What is the most efficient way to substring specific portions of a text to a list of objects

I have the following vCard text; my goal is to parse it into a list of vCard objects:
BEGIN:VCARD
VERSION:2.1
N:Kleit;Ali;;;
FN:Ali Kleit
TEL;CELL:70101010
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:Kleit;Saeed;;;
FN:Saeed Kleit
TEL;CELL:03494949
END:VCARD
The following is my code to do that:
List<string> cards = new List<string>();
if (text != null)
{
while (text.Length != 0)
{
int idx_begin = text.IndexOf("BEGIN:VCARD");
if (idx_begin == -1)
break;
string endToken = "END:VCARD";
int idx_end = text.IndexOf(endToken);
if (idx_end == -1)
break;
// Substring takes (startIndex, length), so the length must be measured from idx_begin
string card = text.Substring(idx_begin, idx_end + endToken.Length - idx_begin);
text = text.Substring(idx_end + endToken.Length);
cards.Add(card);
}
}
Next, I use the parser from the Thought.vCards .NET library (Thought.vCards.vCard) to parse each vCard text found:
List<Thought.vCards.vCard> vCards = new List<Thought.vCards.vCard>();
List<string> failedStrings = new List<string>();
foreach (string card in cards)
{
using (TextReader sr = new StringReader(card))
{
var vCard = new Thought.vCards.vCard(sr);
if (vCard == null)
{
failedStrings.Add(card);
continue;
}
vCards.Add(vCard);
}
}
Is there a more efficient way to accomplish this, given that the text might be in an incorrect format?

Something like this?
// Util.CurrentQueryPath and .Dump() are LINQPad helpers
var vcards = File.ReadAllText(Path.Combine(Path.GetDirectoryName(Util.CurrentQueryPath), "Contacts.vcf"));
var vcardRe = new Regex(@"BEGIN:VCARD\s+(.+?)\s+END:VCARD", RegexOptions.Compiled | RegexOptions.Singleline);
// each match is one complete card, from BEGIN:VCARD through END:VCARD
var res = vcardRe.Matches(vcards)
.Cast<Match>()
.Select(x => x.Value)
;
List<Thought.vCards.vCard> vCards = new List<Thought.vCards.vCard>();
List<string> failedStrings = new List<string>();
foreach(string card in res)
{
using (TextReader sr = new StringReader(card))
{
var vCard = new Thought.vCards.vCard(sr);
if (vCard == null)
{
failedStrings.Add(card);
continue;
}
vCards.Add(vCard);
}
}
vCards.Dump();
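The splitting loop in the question copies the whole remainder of the text with Substring on every pass. A minimal, stdlib-only sketch of the same step (the class and method names are hypothetical) searches from a running start index with IndexOf(..., StringComparison.Ordinal), so each card is copied exactly once, and it reports a trailing BEGIN:VCARD that has no matching END:VCARD instead of silently dropping it. It only splits the text; each resulting string would still be handed to the Thought.vCards parser as above.

```csharp
using System;
using System.Collections.Generic;

public static class VCardSplitter
{
    private const string BeginToken = "BEGIN:VCARD";
    private const string EndToken = "END:VCARD";

    // Splits text into complete "BEGIN:VCARD ... END:VCARD" blocks.
    // Text outside the blocks is ignored; a final BEGIN:VCARD with no
    // matching END:VCARD is returned through 'incomplete' (null otherwise).
    public static List<string> Split(string text, out string incomplete)
    {
        var cards = new List<string>();
        incomplete = null;
        if (string.IsNullOrEmpty(text))
            return cards;

        int pos = 0;
        while (pos < text.Length)
        {
            int begin = text.IndexOf(BeginToken, pos, StringComparison.Ordinal);
            if (begin < 0)
                break;

            // Look for the end marker only after this card's BEGIN marker.
            int end = text.IndexOf(EndToken, begin + BeginToken.Length, StringComparison.Ordinal);
            if (end < 0)
            {
                incomplete = text.Substring(begin);
                break;
            }

            int stop = end + EndToken.Length;
            // Substring takes (startIndex, length), so the length is stop - begin.
            cards.Add(text.Substring(begin, stop - begin));
            pos = stop;
        }
        return cards;
    }
}
```

Ordinal comparison also avoids the culture-sensitive matching that string.IndexOf(string) performs by default.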

File not being released by IIS when processing

In an ASP.Net MVC4 application, I'm using the following code to process a Go To Webinar Attendees report (CSV format).
For some reason, the file that is being loaded is not being released by IIS and it is causing issues when attempting to process another file.
Do you see anything out of the ordinary here?
The CSVHelper (CsvReader) is from https://joshclose.github.io/CsvHelper/
public AttendeesData GetRecords(string filename, string webinarKey)
{
StreamReader sr = new StreamReader(Server.MapPath(filename));
CsvReader csvread = new CsvReader(sr);
csvread.Configuration.HasHeaderRecord = false;
List<AttendeeRecord> record = csvread.GetRecords<AttendeeRecord>().ToList();
record.RemoveRange(0, 7);
AttendeesData attdata = new AttendeesData();
attdata.Attendees = new List<Attendee>();
foreach (var rec in record)
{
Attendee aa = new Attendee();
aa.Webinarkey = webinarKey;
aa.FullName = String.Concat(rec.First_Name, " ", rec.Last_Name);
aa.AttendedWebinar = 0;
aa.Email = rec.Email_Address;
aa.JoinTime = rec.Join_Time.Replace(" CST", "");
aa.LeaveTime = rec.Leave_Time.Replace(" CST", "");
aa.TimeInSession = rec.Time_in_Session.Replace("hour", "hr").Replace("minute", "min");
aa.Makeup = 0;
aa.RegistrantKey = Registrants.Where(x => x.email == rec.Email_Address).FirstOrDefault().registrantKey;
List<string> firstPolls = new List<string>()
{
rec.Poll_1.Trim(), rec.Poll_2.Trim(),rec.Poll_3.Trim(),rec.Poll_4.Trim()
};
int pass1 = firstPolls.Count(x => x != "");
List<string> secondPolls = new List<string>()
{
rec.Poll_5.Trim(), rec.Poll_6.Trim(),rec.Poll_7.Trim(),rec.Poll_8.Trim()
};
int pass2 = secondPolls.Count(x => x != "");
aa.FirstPollCount = pass1;
aa.SecondPollCount = pass2;
if (aa.TimeInSession != "")
{
aa.AttendedWebinar = 1;
}
if (aa.FirstPollCount == 0 || aa.SecondPollCount == 0)
{
aa.AttendedWebinar = 0;
}
attdata.Attendees.Add(aa);
attendeeToDB(aa); // adds to Oracle DB using EF6.
}
// Should I call csvread.Dispose() here?
sr.Close();
return attdata;
}

Yes. You have to dispose of the objects too:
sr.Close();
csvread.Dispose();
sr.Dispose();
A better strategy is to use the using keyword.
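A minimal, stdlib-only sketch of that pattern (the class name is hypothetical): the using block disposes the StreamReader, and with it the file handle, as soon as the block exits, even if an exception is thrown inside it. In the question's method, the same idea means wrapping both the StreamReader and the CsvReader in using blocks instead of calling Close()/Dispose() by hand.

```csharp
using System.Collections.Generic;
using System.IO;

public static class CsvFileReading
{
    // Reads all lines of a file. The StreamReader (and the underlying
    // file handle) is disposed when the using block ends, on success
    // or on exception, so the file is not left locked.
    public static List<string> ReadLines(string path)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```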

You should use using statements for your StreamReaders and writers.
You should follow some naming conventions (a List always contains multiple entries, so rename record to records).
You should use clear names (not aa).

In TFS API, how do I get the full class name for a given test?

I have an ITestCaseResult object in hand and I can't figure out how to extract the Test Class information from it. The object contains the test method's name in the TestCaseTitle property but there are a lot of duplicate titles across our code base and I would like more information.
Assuming I have Foo.Bar assembly with class Baz and method ThisIsATestMethod, I currently only have access to the ThisIsATestMethod information from the title, but I would like to obtain Foo.Bar.Baz.ThisIsATestMethod.
How can I do that using the TFS API?
Here's some stripped down code:
var def = buildServer.CreateBuildDetailSpec(teamProject.Name);
def.MaxBuildsPerDefinition = 1;
def.QueryOrder = BuildQueryOrder.FinishTimeDescending;
def.DefinitionSpec.Name = buildDefinition.Name;
def.Status = BuildStatus.Failed | BuildStatus.PartiallySucceeded | BuildStatus.Succeeded;
var build = buildServer.QueryBuilds(def).Builds.SingleOrDefault();
if (build == null)
return;
var testRun = tms.GetTeamProject(teamProject.Name).TestRuns.ByBuild(build.Uri).SingleOrDefault();
if (testRun == null)
return;
foreach (var outcome in new[] { TestOutcome.Error, TestOutcome.Failed, TestOutcome.Inconclusive, TestOutcome.Timeout, TestOutcome.Warning })
ProcessTestResults(bd, testRun, outcome);
...
private void ProcessTestResults(ADBM.BuildDefinition bd, ITestRun testRun, TestOutcome outcome)
{
var results = testRun.QueryResultsByOutcome(outcome);
if (results.Count == 0)
return;
var testResults = from r in results // The "r" in here is an ITestCaseResult. r.GetTestCase() is always null.
select new ADBM.Test() { Title = r.TestCaseTitle, Outcome = outcome.ToString(), ErrorMessage = r.ErrorMessage };
}

You can do this by downloading the TRX file from TFS and parsing it manually. To download the TRX file for a test run, do this:
TfsTeamProjectCollection tpc = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri("http://my-tfs:8080/tfs/DefaultCollection"));
ITestManagementService tms = tpc.GetService<ITestManagementService>();
ITestManagementTeamProject tmtp = tms.GetTeamProject("My Project");
ITestRunHelper testRunHelper = tmtp.TestRuns;
IEnumerable<ITestRun> testRuns = testRunHelper.ByBuild(new Uri("vstfs:///Build/Build/123456"));
var failedRuns = testRuns.Where(run => run.QueryResultsByOutcome(TestOutcome.Failed).Any()).ToList();
failedRuns.First().Attachments[0].DownloadToFile(@"D:\temp\myfile.trx");
Then parse the TRX file (which is XML), looking for the <TestMethod> element, which contains the fully-qualified class name in the "className" attribute:
<TestMethod codeBase="C:/Builds/My.Test.AssemblyName.DLL" adapterTypeName="Microsoft.VisualStudio.TestTools.TestTypes.Unit.UnitTestAdapter, Microsoft.VisualStudio.QualityTools.Tips.UnitTest.Adapter, Version=11.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a" className="My.Test.ClassName, My.Test.AssemblyName, Version=2.0.0.0, Culture=neutral, PublicKeyToken=null" name="Test_Method" />
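A minimal sketch of that parsing step with LINQ to XML, assuming the TRX elements are in the http://microsoft.com/schemas/VisualStudio/TeamTest/2010 namespace (the one used in the XPath query later in this thread); the class and method names are hypothetical. It keeps the part of className before the first comma and appends the method name, producing Foo.Bar.Baz.ThisIsATestMethod-style names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TrxParser
{
    // Assumed TRX namespace (VS 2010+ format).
    private static readonly XNamespace Ns = "http://microsoft.com/schemas/VisualStudio/TeamTest/2010";

    // Returns "Namespace.Class.Method" for every <TestMethod> element in the TRX XML.
    public static List<string> GetFullyQualifiedTestNames(string trxXml)
    {
        XDocument doc = XDocument.Parse(trxXml);
        return doc.Descendants(Ns + "TestMethod")
            .Select(m =>
            {
                // className looks like "My.Test.ClassName, My.Test.AssemblyName, Version=...";
                // the class name is the part before the first comma.
                string className = ((string)m.Attribute("className") ?? string.Empty).Split(',')[0].Trim();
                string method = (string)m.Attribute("name") ?? string.Empty;
                return className + "." + method;
            })
            .ToList();
    }
}
```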

Since the details of the test case are stored in its work item, you can fetch the data by accessing the work item for the test case:
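ITestCaseResult result; // e.g. one of the results returned by testRun.QueryResultsByOutcome(...)
var testCase = result.GetTestCase();
testCase.WorkItem["Automated Test Name"]; // fully qualified name of the test method
testCase.WorkItem["Automated Test Storage"]; // test assembly (DLL)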

Here is a way to get the assembly name:
foreach (ITestCaseResult testCaseResult in failures)
{
string testName = testCaseResult.TestCaseTitle;
ITmiTestImplementation testImplementation = testCaseResult.Implementation as ITmiTestImplementation;
string assembly = testImplementation.Storage;
}
Unfortunately, ITestCaseResult and ITmiTestImplementation don’t seem to contain the namespace of the test case.
Good Luck!
EDIT:
This is based on Charles Crain's answer, but it gets the class name without having to download the TRX file to disk:
var className = GetTestClassName(testResult.Attachments);
And the method itself:
private static string GetTestClassName(IAttachmentCollection attachmentCol)
{
if (attachmentCol == null || attachmentCol.Count == 0)
{
return string.Empty;
}
var attachment = attachmentCol.First(att => att.AttachmentType == "TmiTestResultDetail");
var content = new byte[attachment.Length];
attachment.DownloadToArray(content, 0);
var strContent = Encoding.UTF8.GetString(content);
var reader = XmlReader.Create(new StringReader(RemoveTroublesomeCharacters(strContent)));
var root = XElement.Load(reader);
var nameTable = reader.NameTable;
if (nameTable != null)
{
var namespaceManager = new XmlNamespaceManager(nameTable);
namespaceManager.AddNamespace("ns", "http://microsoft.com/schemas/VisualStudio/TeamTest/2010");
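var testMethod = root.XPathSelectElement("./ns:TestDefinitions/ns:UnitTest[1]/ns:TestMethod[1]", namespaceManager);
var classNameAtt = testMethod?.Attribute("className");
// className looks like "My.Test.ClassName, My.Test.AssemblyName, Version=..."; the class name is the first part
if (classNameAtt != null) return classNameAtt.Value.Split(',')[0].Trim();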
}
return string.Empty;
}
internal static string RemoveTroublesomeCharacters(string inString)
{
if (inString == null) return null;
var newString = new StringBuilder();
foreach (var ch in inString)
{
// keep only characters from U+0020 to U+00FC plus tab, newline and carriage return;
// control characters and everything above U+00FC are dropped
if ((ch < 0x00FD && ch > 0x001F) || ch == '\t' || ch == '\n' || ch == '\r')
{
newString.Append(ch);
}
}
return newString.ToString();
}

public string GetFullyQualifiedName()
{
var collection = new TfsTeamProjectCollection(new Uri("http://tfstest:8080/tfs/DefaultCollection"));
var service = collection.GetService<ITestManagementService>();
var tmProject = service.GetTeamProject(project.TeamProjectName);
var testRuns = tmProject.TestRuns.Query("select * From TestRun").OrderByDescending(x => x.DateCompleted);
var run = testRuns.First();
var client = collection.GetClient<TestResultsHttpClient>();
var Tests = client.GetTestResultsAsync(run.ProjectName, run.Id).Result;
var FullyQualifiedName = Tests.First().AutomatedTestName;
return FullyQualifiedName;
}
