I'm trying to write code that, given a path to an item in the TFS repository and two revisions, computes the difference between the contents the file had at those two revisions. For now the code looks like this:
using (var projectCollection = new TfsTeamProjectCollection(new Uri(repositoryUrl)))
{
projectCollection.EnsureAuthenticated();
var versionControlServer = (VersionControlServer)projectCollection.GetService(typeof(VersionControlServer));
string path = "$/MyProject/path/to/file.xml";
var before = new DiffItemVersionedFile(versionControlServer, path, VersionSpec.ParseSingleSpec(minRevision.ToString(), null));
var after = new DiffItemVersionedFile(versionControlServer, path, VersionSpec.ParseSingleSpec(maxRevision.ToString(), null));
using (var stream = new MemoryStream())
using (var writer = new StreamWriter(stream))
{
var options = new DiffOptions();
options.Flags = DiffOptionFlags.EnablePreambleHandling;
options.OutputType = DiffOutputType.Unified;
options.TargetEncoding = Encoding.UTF8;
options.SourceEncoding = Encoding.UTF8;
options.StreamWriter = writer;
Difference.DiffFiles(versionControlServer, before, after, options, path, true);
writer.Flush();
var reader = new StreamReader(stream);
var diff = reader.ReadToEnd();
}
}
But once this code is executed, the variable diff is an empty string even though I know for sure the file has been modified between minRevision and maxRevision.
This code will also throw an exception if the file didn't exist at minRevision or was deleted in maxRevision, but this seems to be a problem to solve later, once I get this thing working with files which were only edited.
EDIT
Having checked temp files, I'm sure both versions of the file are downloaded correctly. Something is wrong with the computation of the diff or with writing the diff to a stream or with copying the diff to a string.
Solved. The problem was the reader. After I changed the last two lines to
var diff = Encoding.UTF8.GetString(stream.ToArray());
I got some diff at last.
I know you accepted your own answer, and this was asked in 2012, but I recently had to do the same thing and much prefer using a StreamReader over .ToArray().
The answer is that you have to reset the MemoryStream position before you start reading from it.
Add this
stream.Position = 0;
right after you flush the writer.
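A minimal, self-contained sketch of the rewind fix. The TFS diff call is replaced by a plain writer.Write (an assumption for illustration, since Difference.DiffFiles needs a server), and the code uses C# 9 top-level statements:

```csharp
using System;
using System.IO;
using System.Text;

using var stream = new MemoryStream();
using var writer = new StreamWriter(stream, Encoding.UTF8);

// Stand-in for Difference.DiffFiles(...), which writes through options.StreamWriter.
writer.Write("-old line\n+new line\n");
writer.Flush();

// Without a rewind the position sits at the end of the stream,
// so the reader has nothing left to read.
string withoutRewind =
    new StreamReader(stream, Encoding.UTF8, true, 1024, leaveOpen: true).ReadToEnd();

// Rewind, then read the whole buffer.
stream.Position = 0;
string diff =
    new StreamReader(stream, Encoding.UTF8, true, 1024, leaveOpen: true).ReadToEnd();

Console.WriteLine(withoutRewind.Length); // 0
Console.WriteLine(diff);
```

Both readers are created with leaveOpen so that disposing them would not close the underlying MemoryStream.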
I'm transforming XML using an XSLT stylesheet. The stylesheet consists of several files, which are included like this:
<xsl:include href="tokens.xsl"/>
<xsl:include href="glayout.xsl"/>
<xsl:include href="scripts.xsl"/>
<xsl:include href="tables.xsl"/>
<xsl:include href="entities.xsl"/>
<xsl:include href="cmarkup.xsl"/>
The transform code looks like the following:
// Load text
var reader = XmlReader.Create(new StringReader(text));
// Load transform
XslCompiledTransform myXslTrans = new XslCompiledTransform();
using (var fs = new FileStream(result.FileName, FileMode.Open, FileAccess.Read))
{
var xmlReader = XmlReader.Create(fs);
myXslTrans.Load(xmlReader);
}
// Perform transformation
MemoryStream ms = new MemoryStream();
XmlTextWriter writer = new XmlTextWriter(ms, Encoding.UTF8);
myXslTrans.Transform(reader, null, writer);
// Recover result to string
ms.Seek(0, SeekOrigin.Begin);
var textReader = new StreamReader(ms);
string transformed = textReader.ReadToEnd();
The transform fails on the include lines. I found out that I can supply my own resolver to provide the missing documents, but since their URLs are relative, they get resolved against the current application's folder, for example:
D:\Dokumenty\Dev\VS\Dev.Editor\Dev.Editor\bin\Debug\tokens.xsl
There are two dirty solutions:
Cut off the application path to retrieve only the file name, then search for the file in the original sheet's folder (but what if the file is in a subfolder, like Include/tokens.xsl?).
Temporarily set current directory to the one in which main sheet resides:
var dir = System.IO.Directory.GetCurrentDirectory();
try
{
System.IO.Directory.SetCurrentDirectory(System.IO.Path.GetDirectoryName(result.FileName));
myXslTrans.Load(xmlReader, null, resolver);
}
finally
{
System.IO.Directory.SetCurrentDirectory(dir);
}
But I don't like this solution either. Is there a way to force XslCompiledTransform to pass the original URLs to the resolver? Or possibly another, more generic solution to this problem?
If you have a file name or URI for the main stylesheet module, use the overload of the Load method that takes a string (https://learn.microsoft.com/en-us/dotnet/api/system.xml.xsl.xslcompiledtransform.load?view=netframework-4.8#System_Xml_Xsl_XslCompiledTransform_Load_System_String_), e.g. myXslTrans.Load(result.FileName). The stylesheet then has a base URI, and relative xsl:include references are resolved against it instead of the current directory.
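A small sketch of loading by path, using a hypothetical two-file stylesheet (main.xsl including tokens.xsl) written to a temp folder. It uses the three-argument Load overload with an explicit XmlUrlResolver; that is an assumption made so the sketch does not depend on the runtime's default resolver, which may be more restrictive on newer .NET versions than on .NET Framework:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical stylesheet files created only for this demo.
string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(dir);
File.WriteAllText(Path.Combine(dir, "tokens.xsl"),
    "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
    "<xsl:template match='/'><out><xsl:value-of select='/root'/></out></xsl:template>" +
    "</xsl:stylesheet>");
string mainPath = Path.Combine(dir, "main.xsl");
File.WriteAllText(mainPath,
    "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
    "<xsl:include href='tokens.xsl'/>" +
    "</xsl:stylesheet>");

var transform = new XslCompiledTransform();
// Loading by path gives the stylesheet a base URI, so href='tokens.xsl'
// is looked up next to main.xsl rather than in the current directory.
transform.Load(mainPath, XsltSettings.Default, new XmlUrlResolver());

var output = new StringWriter();
using (var input = XmlReader.Create(new StringReader("<root>hello</root>")))
using (var xmlWriter = XmlWriter.Create(output, transform.OutputSettings))
{
    transform.Transform(input, null, xmlWriter);
}
string transformed = output.ToString();
Console.WriteLine(transformed);

Directory.Delete(dir, true);
```

The current directory is never touched, so a subfolder reference such as Include/tokens.xsl would be resolved the same way.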
I have a C# class that takes HTML and converts it to PDF using wkhtmltopdf.
As you will see below, I am generating 3 PDFs: landscape, portrait, and a combination of the two.
The properties object contains the HTML as a string, and the argument for landscape/portrait.
System.IO.MemoryStream PDF = new WkHtmlToPdfConverter().GetPdfStream(properties);
System.IO.FileStream file = new System.IO.FileStream("abc_landscape.pdf", System.IO.FileMode.Create);
PDF.Position = 0;
properties.IsHorizontalOrientation = false;
System.IO.MemoryStream PDF_portrait = new WkHtmlToPdfConverter().GetPdfStream(properties);
System.IO.FileStream file_portrait = new System.IO.FileStream("abc_portrait.pdf", System.IO.FileMode.Create);
PDF_portrait.Position = 0;
System.IO.MemoryStream finalStream = new System.IO.MemoryStream();
PDF.CopyTo(finalStream);
PDF_portrait.CopyTo(finalStream);
System.IO.FileStream file_combined = new System.IO.FileStream("abc_combined.pdf", System.IO.FileMode.Create);
try
{
PDF.WriteTo(file);
PDF.Flush();
PDF_portrait.WriteTo(file_portrait);
PDF_portrait.Flush();
finalStream.WriteTo(file_combined);
finalStream.Flush();
}
catch (Exception)
{
throw;
}
finally
{
PDF.Close();
file.Close();
PDF_portrait.Close();
file_portrait.Close();
finalStream.Close();
file_combined.Close();
}
The PDFs "abc_landscape.pdf" and "abc_portrait.pdf" generate correctly, as expected, but the operation fails when I try to combine the two in a third pdf (abc_combined.pdf).
I am using MemoryStream to perform the merge, and while debugging I can see that finalStream.Length is equal to the sum of the lengths of the previous two PDFs. But when I try to open the PDF, I see the content of just one of the two PDFs.
The same can be seen below:
Additionally, when I try to close the "abc_combined.pdf", I am prompted to save it, which does not happen with the other 2 PDFs.
Below are a few things that I have tried out already, to no avail:
Change CopyTo() to WriteTo()
Merge the same PDF (either Landscape or Portrait one) with itself
In case it is required, below is the elaboration of the GetPdfStream() method.
var htmlStream = new MemoryStream();
var writer = new StreamWriter(htmlStream);
writer.Write(htmlString);
writer.Flush();
htmlStream.Position = 0;
return htmlStream;
Process process = Process.Start(psi);
process.EnableRaisingEvents = true;
try
{
process.Start();
process.BeginErrorReadLine();
var inputTask = Task.Run(() =>
{
htmlStream.CopyTo(process.StandardInput.BaseStream);
process.StandardInput.Close();
});
// Copy the output to a memorystream
MemoryStream pdf = new MemoryStream();
var outputTask = Task.Run(() =>
{
process.StandardOutput.BaseStream.CopyTo(pdf);
});
Task.WaitAll(inputTask, outputTask);
process.WaitForExit();
// Reset memorystream read position
pdf.Position = 0;
return pdf;
}
catch (Exception ex)
{
throw ex;
}
finally
{
process.Dispose();
}
Merging PDFs in C#, or in any other language, is not straightforward without a third-party library.
I assume you want to avoid a library because most free libraries and NuGet packages have limitations and/or cost money for commercial use.
I did some research and found an open-source library called PdfClown, available as a NuGet package and also for Java. It is free and without limitations (donate if you like). The library has a lot of features; one of them is merging two or more documents into one.
Here is my example, which takes a folder with multiple PDF files, merges them, and saves the result to the same or another folder. It is also possible to use a MemoryStream, but I did not find it necessary in this case.
The code is self-explanatory; the key point is using SerializationModeEnum.Incremental:
public static void MergePdf(string srcPath, string destFile)
{
var list = Directory.GetFiles(Path.GetFullPath(srcPath));
if (string.IsNullOrWhiteSpace(srcPath) || string.IsNullOrWhiteSpace(destFile) || list.Length <= 1)
return;
var files = list.Select(File.ReadAllBytes).ToList();
using (var dest = new org.pdfclown.files.File(new org.pdfclown.bytes.Buffer(files[0])))
{
var document = dest.Document;
var builder = new org.pdfclown.tools.PageManager(document);
foreach (var file in files.Skip(1))
{
using (var src = new org.pdfclown.files.File(new org.pdfclown.bytes.Buffer(file)))
{ builder.Add(src.Document); }
}
dest.Save(destFile, SerializationModeEnum.Incremental);
}
}
To test it
var srcPath = @"C:\temp\pdf\input";
var destFile = @"c:\temp\pdf\output\merged.pdf";
MergePdf(srcPath, destFile);
Input examples
PDF doc A and PDF doc B
Output example
Links to my research:
https://csharp-source.net/open-source/pdf-libraries
https://sourceforge.net/projects/clown/
https://www.oipapio.com/question-3526089
Disclaimer: Part of this answer is taken from my personal web site https://itbackyard.com/merge-multiple-pdf-files-to-one-pdf-file-in-c/ with source code on GitHub.
This answer from Stack Overflow (Combine two (or more) PDF's) by Andrew Burns works for me:
using (PdfDocument one = PdfReader.Open("pdf 1.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument two = PdfReader.Open("pdf 2.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument outPdf = new PdfDocument())
{
CopyPages(one, outPdf);
CopyPages(two, outPdf);
outPdf.Save("file1and2.pdf");
}
void CopyPages(PdfDocument from, PdfDocument to)
{
for (int i = 0; i < from.PageCount; i++)
{
to.AddPage(from.Pages[i]);
}
}
That's not quite how PDFs work. PDFs are structured files in a specific format.
You can't just append the bytes of one to the other and expect the result to be a valid document.
You're going to have to use a library that understands the format and can do the operation for you, or develop your own solution.
PDF files aren't just text and images. Behind the scenes there is a strict file format that describes things like PDF version, the objects contained in the file and where to find them.
In order to merge 2 PDFs you'll need to manipulate the streams.
First you'll need to conserve the header from only one of the files. This is pretty easy since it's just the first line.
Then you can write the body of the first page, and then the second.
Now the hard part, and likely the part that will convince you to use a library, is that you have to rebuild the xref table. The xref table is a cross-reference table that describes the content of the document and, more importantly, where to find each element. You'd have to calculate the byte offset of the second page, shift all of the elements in its xref table by that much, and then add its xref table to the first. You'll also need to ensure you create objects in the xref table for the page break.
Once that's done, you need to re-build the document trailer which tells an application where the various sections of the document are among other things.
See https://resources.infosecinstitute.com/pdf-file-format-basic-structure/
This is not trivial and you'll end up re-writing lots of code that already exists.
A major part of my job is automating engineering processes, so I have to create a simple program that compares two versions of one drawn element by overlaying the drawings in order to review the differences. The drawings are single-sheet PDF files.
I'm using .NET Framework 4.5 and C#;
the iTextSharp library for editing PDF files.
Initially, I take two files, read them, and create a third one that contains the result:
var file1 = "file1.pdf";
var file2 = "file2.pdf";
var result = "result.pdf";
using (Stream f1Stream = new FileStream(file1, FileMode.Open))
using (Stream f2Stream = new FileStream(file2, FileMode.Open))
using (Stream resultStream = new FileStream(result, FileMode.Create, FileAccess.ReadWrite))
using (PdfReader f2Reader = new PdfReader(f2Stream))
using (PdfReader f1Reader = new PdfReader(f1Stream))
{
PdfStamper pdfStamper = new PdfStamper(f1Reader, resultStream);
PdfContentByte pdfContentByte = pdfStamper.GetOverContent(1);
var page = pdfStamper.GetImportedPage(f2Reader, 1);
pdfContentByte.AddTemplate(page,2,2);
pdfStamper.Close();
}
The code above does just that, but a few follow-up questions arise:
I want to change the color of elements in the result file, i.e. elements that come from the 1st drawing in green and those from the 2nd one in red. Maybe I have to change the color of the entities in the two initial PDFs and then merge them;
The initial files have layers, and because they are two sequential revisions of the same construction element with very few differences between them, their layers are identical. I want to have "layerFoo" and "layerFoo#" in the result PDF. Maybe I have to rename all the layers in one of the two initial PDFs and then merge them.
All suggestions are welcome, including the use of another library :)
--> Edit1
Big thanks to Chris Haas! You are absolutely right about the token type and string value! iTextRUPS is a great tool for understanding the structure of PDF files.
The following code is taken from the post that you pointed me to.
The following statement:
stream.SetData(System.Text.Encoding.ASCII.GetBytes(String.Join("\n", newBuf.ToArray())));
updates the stream of the file and then with
using (var fs = new FileStream(file2, FileMode.Create, FileAccess.Write, FileShare.None))
{
var stamper = new PdfStamper(reader, fs);
reader.SetPageContent(1,reader.GetPageContent(1));
stamper.Close();
}
the new file is created with the updated stream.
I made one simple test file with only 2 lines, changed their color, and saved it to a new file.
No problem!
After that, I tried the same simple operation with a real file that represents a real drawing of a construction element; the result file was less than half the size of the original and was broken.
What comes to mind is that the updated stream is saved to the new file but the information in the other containers is not, only the stream.
Because I'm stuck on that, I moved on to the next step of the investigation: layers.
I wrote this code to get the available layers in a PDF file. I will try to insert more records into the layers dictionary to see what happens.
var resourcesReference = page.Get(PdfName.RESOURCES) as PdfIndirectReference;
var resources = PdfReader.GetPdfObject(resourcesReference) as PdfDictionary;
var propertiesObjhectReferences = resources.Get(PdfName.PROPERTIES);
var properties = PdfReader.GetPdfObject(propertiesObjhectReferences) as PdfDictionary;
foreach (var property in properties.Keys)
{
var layerReference = properties.Get(property);
var layerObject = PdfReader.GetPdfObject(layerReference) as PdfDictionary;
foreach (var key in layerObject.Keys)
{
if (key.ToString()!=PdfName.TYPE.ToString())
{
var layerName = layerObject.GetAsString(key).ToUnicodeString();
}
}
}
Coming back to my main goal from the top of the post, I intend to insert the stream and layers from the first file into the second in order to obtain a result file that contains the objects from both, painted in different colors, plus the layers from both.
Feel free to suggest another, simpler and more elegant solution! I will be happy if you review my code and correct it! Thank you very much!
EDIT 2
Because of lack of time, I will simplify the work: just change the color of the entities inside one PDF and put it in the background of the other.
const string Pdf = "file1.pdf";
var reader = new PdfReader(Pdf);
var page = reader.GetPageN(1);
var objectReference = page.Get(PdfName.CONTENTS) as PdfIndirectReference;
var stream = (PRStream)PdfReader.GetPdfObject(objectReference);
var streamBytes = PdfReader.GetStreamBytes(stream);
var tokenizer = new PRTokeniser(new RandomAccessFileOrArray(streamBytes));
var newBuf = new List<string>();
while (tokenizer.NextToken())
{
var token = tokenizer.StringValue;
newBuf.Add(token);
if (tokenizer.TokenType == PRTokeniser.TokType.OTHER
&& newBuf[newBuf.Count - 1].Equals("S", StringComparison.CurrentCultureIgnoreCase))
{
newBuf.Insert(newBuf.Count - 1, "0");
newBuf.Insert(newBuf.Count - 1, "1");
newBuf.Insert(newBuf.Count - 1, "1");
newBuf.Insert(newBuf.Count - 1, "RG");
}
}
var resultStream = String.Join("\n", newBuf.ToArray());
stream.SetData(System.Text.Encoding.ASCII.GetBytes(resultStream));
var file2 = Pdf.Insert(Pdf.Length - 4, "Result");
using (var fs = new FileStream(file2, FileMode.Create, FileAccess.Write, FileShare.None))
{
var stamper = new PdfStamper(reader, fs);
reader.SetPageContent(1, reader.GetPageContent(1));
stamper.Close();
}
The resulting PDF is broken, and iTextRUPS throws an exception when it tries to get the stream data from the page.
I have a few multimillion-line text files located in a directory. I want to read them line by line, replace “|” with “\”, and then write each line out to a new file. This code might work just fine, but I’m not seeing any resulting text file, or it might be that I’m just being impatient.
{
string startingdir = @"K:\qload";
string dest = @"K:\D\ho\jlg\load\dest";
string[] files = Directory.GetFiles(startingdir, "*.txt");
foreach (string file in files)
{
StringBuilder sb = new StringBuilder();
using (FileStream fs = new FileStream(file, FileMode.Open))
using (StreamReader rdr = new StreamReader(fs))
{
while (!rdr.EndOfStream)
{
string begdocfile = rdr.ReadLine();
string replacementwork = docfile.Replace("|", "\\");
sb.AppendLine(replacementwork);
FileInfo file_info = new FileInfo(file);
string outputfilename = file_info.Name;
using (FileStream fs2 = new FileStream(dest + outputfilename, FileMode.Append))
using (StreamWriter writer = new StreamWriter(fs2))
{
writer.WriteLine(replacementwork);
}
}
}
}
}
DUHHHHH Thanks to everyone.
Id10t error.
Get rid of the StringBuilder, and do not reopen the output file for each line:
string startingdir = @"K:\qload";
string dest = @"K:\D\ho\jlg\load\dest";
string[] files = Directory.GetFiles(startingdir, "*.txt");
foreach (string file in files)
{
var outfile = Path.Combine(dest, Path.GetFileName(file));
using (StreamReader reader = new StreamReader(file))
using (StreamWriter writer = new StreamWriter(outfile))
{
string line = reader.ReadLine();
while (line != null)
{
writer.WriteLine(line.Replace("|", "\\"));
line = reader.ReadLine();
}
}
}
Why are you using a StringBuilder - you are just filling up your memory without doing anything with it.
You should also move the FileStream and StreamWriter using statements to outside of your loop - you are re-creating your output streams for every line, causing unneeded IO in the form of opening and closing the file.
Use Path.Combine(dest, outputfilename); from your code it looks like you're writing to the file K:\D\ho\jlg\load\destoutputfilename.txt
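To illustrate the difference, here is a small sketch with a hypothetical relative folder and file name (not the paths from the question):

```csharp
using System;
using System.IO;

string dest = Path.Combine("load", "dest");   // hypothetical folder
string name = "input.txt";                    // hypothetical file name

// Plain concatenation glues the file name onto the last folder name.
string glued = dest + name;
// Path.Combine inserts the platform's directory separator between them.
string combined = Path.Combine(dest, name);

Console.WriteLine(Path.GetFileName(glued));     // destinput.txt
Console.WriteLine(Path.GetFileName(combined));  // input.txt
```

With concatenation the output lands one level up, in the parent of the intended folder, under a name prefixed with "dest".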
This code might work just fine, but I’m not seeing any resulting text file, or it might be that I’m just being impatient.
Have you considered having a Console.WriteLine in there to check the progress? Sure, it's going to slow down performance a tiny bit, but you'll know what's going on.
It looks like you might want to do a Path.Combine, so that instead of new FileStream(dest + outputfilename), you have new FileStream(Path.Combine(dest, outputfilename)), which will create the files in the directory that you expect, rather than creating them in K:\D\ho\jlg\load.
However, I'm not sure why you're writing to a StringBuilder that you're not using, or why you're opening and closing the file stream and stream writer on each line that you're writing. Is that to force the writer to flush its output? If so, it might be easier to just flush the writer/stream on each write.
You're opening and closing the output stream for each line in the output, so you'll have to be very patient!
Open it once outside the loop.
I guess the problem is here:
string begdocfile = rdr.ReadLine();
string replacementwork = docfile.Replace("|", "\\");
You're reading into the begdocfile variable but replacing chars in docfile, which I guess is empty.
string replacementwork = docfile.Replace("|", "\\");
I believe the above line in your code is incorrect: it should be "begdocfile.Replace ..."?
I suggest you focus on getting as much of the declaration and "name manufacture" out of the inner loop as possible: right now you are creating new FileInfo objects and path names for every single line you read in every file, and that's got to be hugely expensive.
Make a single pass over the list of target files first and create the destination files all at once, perhaps storing them in a List for easy access later. Or a Dictionary<FileInfo, string>, where the string is the new file path associated with that FileInfo? Another strategy: just copy the whole directory once, then change the copied files directly, and afterwards rename them, rename the directory, whatever.
Move every variable declaration you can out of that inner loop and out of the using blocks.
I suspect you are going to hear from someone here at more of a "guru level" shortly who might suggest a different strategy based on a more profound knowledge of streams than I have, but that's a guess.
Good luck !
I am trying to write some text to a file using StreamWriter, taking the path for the file from the folder selected in a FolderDialog. My code works fine if the file does not already exist, but if the file already exists it throws an exception saying that the file is in use by another process.
using (StreamWriter sw = new StreamWriter(FolderDialog.SelectedPath + @"\my_file.txt"))
{
sw.WriteLine("blablabla");
}
Now if I write like this:
using (StreamWriter sw = new StreamWriter(@"C:\some_folder\my_file.txt"))
it works fine with an existing file.
It may have to do with the way you are combining your path and filename. Give this a try:
using (StreamWriter sw = new StreamWriter(
Path.Combine(FolderDialog.SelectedPath, "my_file.txt")))
{
sw.WriteLine("blablabla");
}
Also, check to make sure the FolderDialog.SelectedPath value isn't blank. :)
The file is already in use, so it cannot be overwritten. However, note that this message isn't always entirely accurate - the file may in fact be in use by your own process. Check your usage patterns.
This is a cheap answer, but have you tried this workaround?
string sFileName = FolderDialog.SelectedPath + @"\my_file.txt";
using (StreamWriter sw = new StreamWriter(sFileName))
{
sw.WriteLine("blablabla");
}
The other thing I would suggest is verifying that FolderDialog.SelectedPath + "\my_file.txt" is equal to the hard coded path of "C:\some_folder\my_file.txt".
Check whether the file is in fact in use by some other process.
To do that, run Process Explorer, press Ctrl+F, type the filename, and click Find.
As an aside, the best way to accomplish this task is like this:
using (StreamWriter sw = File.AppendText(Path.Combine(FolderDialog.SelectedPath, @"my_file.txt")))
EDIT: Do NOT put a slash in the second argument to Path.Combine.
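A short sketch of why the slash matters, using a hypothetical folder under the temp directory: when the second argument is rooted, Path.Combine returns it alone and discards the first argument.

```csharp
using System;
using System.IO;

// Hypothetical folder; FolderDialog.SelectedPath would play this role.
string folder = Path.Combine(Path.GetTempPath(), "some_folder");

string good = Path.Combine(folder, "my_file.txt");
// The leading slash makes the second argument rooted, so the folder is dropped.
string bad = Path.Combine(folder, "/my_file.txt");

Console.WriteLine(good);
Console.WriteLine(bad);   // /my_file.txt
```

The forward slash is used here because it counts as a separator on both Windows and Unix-like systems; on Windows a leading backslash has the same effect.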
Try this
using (StreamWriter sw = File.AppendText(@"C:\some_folder\my_file.txt"))
{
sw.WriteLine("blablabla");
}
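File.AppendText appends to an existing file and creates the file if it does not exist. If you need to treat a new file differently (for example, to write column headers first), check whether it exists and do something like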
string path = @"C:\some_folder\my_file.txt";
if (!File.Exists(path))
{
// Create a file to write to.
using (StreamWriter sw = File.CreateText(path))
{
//once file was created insert the text or the columns
sw.WriteLine("blbalbala");
}
}
// if already exists just write
using (StreamWriter sw = File.AppendText(path))
{
sw.WriteLine("blablabla");
}
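As a quick check of how File.AppendText behaves for new and existing files, here is a minimal sketch that writes to a randomly named temp file (the path is an assumption for the demo, not the one from the question):

```csharp
using System;
using System.IO;

string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".txt");

// First call: the file does not exist yet, so AppendText creates it.
using (StreamWriter sw = File.AppendText(path))
{
    sw.WriteLine("first");
}
// Second call: the file exists, so the line is added at the end.
using (StreamWriter sw = File.AppendText(path))
{
    sw.WriteLine("second");
}

string[] lines = File.ReadAllLines(path);
Console.WriteLine(string.Join(",", lines));   // first,second
File.Delete(path);
```

Each writer is disposed before the next one opens the file, which also avoids the "file is in use" error when the same process writes twice.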