Building a large file text-editor - c#

So I'm building my own custom-drawn large text edit control. I want it to be able to open a large text file but only read and render the small chunk that is visible. I am able to do that successfully, but the problem is long lines. Every time I read a char (which shouldn't be too many, as it's limited to the size of the screen), I measure the width of the line so far. If it's greater than the width of the control, I stop rendering that line. However, I still have to find the position of the next line, which means reading the rest of the current line (which could be hundreds of GB long). Here is my code so far:
if (encoding == Encoding.ASCII || encoding == Encoding.Default)
{
fileStream.Seek(visibleLinesRanges[0].Start, SeekOrigin.Begin);
int maxNumberLines = MaxLinesOnScreen;
int linesRead = 0;
int read;
byte[] byteBuffer = new byte[bufferSize];
StringBuilder text = new StringBuilder();
string lineText = string.Empty;
bool dontAddLine = false;
void AddLine()
{
if (dontAddLine)
{
lineText = string.Empty;
return;
}
linesRead++;
bool visible = linesRead >= maxNumberLines;
if (vScrollBar.Visible != visible)
{
vScrollBar.Visible = visible;
}
text.Append(lineText);
lineText = string.Empty;
}
while ((read = fileStream.Read(byteBuffer, 0, bufferSize)) > 0)
{
char[] charBuffer = encoding.GetChars(byteBuffer, 0, read);
for (int i = 0; i < read; i++)
{
char c = charBuffer[i];
if (!dontAddLine)
{
string newLineText = lineText + c;
if (TextRenderer.MeasureText(newLineText, Font).Width > Width)
{
lineText += "\n";
AddLine();
if (linesRead > maxNumberLines)
{
goto Drawing;
}
dontAddLine = true;
}
else
{
lineText = newLineText;
}
}
if (c == '\r')
{
char c1 = ReadCharSingle(charBuffer, i, read);
if (c1 != '\n')
{
AddLine();
if (linesRead > maxNumberLines)
{
goto Drawing;
}
dontAddLine = false;
}
}
else if (c == '\n')
{
AddLine();
if (linesRead > maxNumberLines)
{
goto Drawing;
}
dontAddLine = false;
}
}
}
Drawing:
TextRenderer.DrawText(g, text.ToString(), Font, Point.Empty, Color.Gainsboro);
}
This works and uses barely any memory, but it's very slow when opening a file with a single line that is about 120 MB in size. Also, whenever the window is resized, this code runs again (it is in the Paint event) so that the text is updated.
Is there any way to speed this up? Thanks.
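One direction that may be worth trying is to skip the invisible tail of a long line by scanning raw bytes for the next line feed, without decoding or measuring anything. The following is only a sketch under assumptions: `LineScanner` and `FindNextLineStart` are hypothetical names, the encoding is single-byte (as in the ASCII branch above), and lines end in "\n" or "\r\n" (a lone "\r" is not treated as a line end here).

```csharp
using System;
using System.IO;

public static class LineScanner
{
    // Returns the byte offset just past the next '\n' at or after 'from',
    // or -1 if the stream ends first. Assumes a single-byte encoding such as ASCII.
    public static long FindNextLineStart(Stream stream, long from, int bufferSize = 64 * 1024)
    {
        byte[] buffer = new byte[bufferSize];
        stream.Seek(from, SeekOrigin.Begin);
        long offset = from;   // file offset of buffer[0] for the current chunk
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Search only the bytes that were actually read.
            int hit = Array.IndexOf(buffer, (byte)'\n', 0, read);
            if (hit >= 0)
                return offset + hit + 1;
            offset += read;
        }
        return -1;
    }
}
```

The bytes of the long line still have to be read once, so this does not make a 120 MB line free. Caching the offsets it returns, instead of rescanning on every Paint, would probably matter at least as much.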

Related

C# - Getting all of data from device via Serial Port and Detect the Control Character (ACK, SOH, ...)

I can easily receive the response data from the device and show it in a TextBox with Serial.ReadExisting(). The problem is that some of the control characters (ACK, SOH, ETX, ...) are unprintable.
I tried to detect the control characters in the serial response with the code below, but something is wrong in the comparison.
The code is:
public bool read_port(ref string rs232data, int timeout)
{
try
{
int index_read = 0;
int total_read = 0;
int i = 0;
int delay_ms = 1;
byte[] buffer = new byte[serialPort1.ReadBufferSize];
rs232data = "";
string buffer_str = "";
if ((serialPort1.IsOpen == false))
{
#if DEBUG
MessageBox.Show("COM port is not opening");
#endif
return false;
}
do
{
Application.DoEvents();
PauseForMilliSeconds(delay_ms); //System.Threading.Thread.Sleep(delay_ms);
i++;
//serial port receive data
if (serialPort1.BytesToRead <= 0) continue;
if (i > 1) i--;
index_read = serialPort1.Read(buffer, total_read, serialPort1.BytesToRead);
total_read += index_read;
buffer_str = ToStringLsb(buffer);
//// Find a Start byte
int ack_index = buffer_str.LastIndexOf(ToStringLsb(ACK));
int nak_index = buffer_str.LastIndexOf(ToStringLsb(NAK));
//Find a Stop byte
int stop_index_etx = buffer_str.LastIndexOf(ToStringLsb(ETX));
int stop_index_lf = buffer_str.LastIndexOf(ToStringLsb(LF));
if ((stop_index_etx < 0) && (stop_index_lf < 0) && (ack_index < 0)) continue; // Can't find a byte Stop
if (stop_index_etx > 0)
{
if ((stop_index_etx + 1) < total_read)
{
stop_index_etx++;
}
else
{
if (buffer[total_read - 2] == ETX)
{
stop_index_etx--;
}
else continue;
}
}
rs232data = buffer_str.Substring(0, total_read);
return true;
}
while (i < timeout);
return false;
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
return false;
}
}
public static string ToStringLsb(byte bytevalue)
{
try
{
byte[] bytearray = { bytevalue };
return System.Text.Encoding.GetEncoding("iso-8859-1").GetString(bytearray);
}
catch (System.Exception ex)
{
MessageBox.Show(ex.Message);
return "";
}
}
public static string ToStringLsb(byte[] bytearray)
{
return System.Text.Encoding.GetEncoding("iso-8859-1").GetString(bytearray);
}
You need to look at an ASCII table and determine which values correspond to the control characters you are looking for.
For example, I would define the following control characters in my code; then, in your DataReceived event, you can look for those characters.
char ACK = (char)6;
char NAK = (char)21;
char SOH = (char)1;
char LF = (char)10;
StringBuilder sb = new StringBuilder();
string CurrentLine = string.Empty;
private void serialPort1_DataReceived(object sender, System.IO.Ports.SerialDataReceivedEventArgs e)
{
    string Data = serialPort1.ReadExisting();
    foreach (char c in Data)
    {
        if (c == LF)
        {
            sb.Append(c);
            CurrentLine = sb.ToString();
            sb.Clear();
            //parse CurrentLine here or print it to textbox
        }
        else if (c == ACK) // else-if, so the LF is not appended again after Clear()
        {
            sb.Append("<ACK>"); //or whatever you want to print
        }
        else
        {
            sb.Append(c);
        }
    }
}
edit:
To answer some of your questions in the comments below: the DataReceived event fires when characters arrive in its buffer, but it can fire when only half of your message has been received. So when using this event, you have to build up a string until you know you have the whole message. In my example above, I'm assuming an LF (line feed) indicates that I have the whole message. You would use whatever character marks the end of your message (maybe ACK in your case?). You can choose whether or not to append that character to your string, depending on your requirements. In my example I append the LF, but you could easily take that line out.
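The accumulation described above can be pulled out of the event handler so it is easy to test without a serial port. This is a minimal sketch, not the code from the answer: `MessageAssembler` and `Feed` are hypothetical names, LF is assumed to be the terminator, and the terminator is dropped rather than appended.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: collects incoming chunks and emits complete messages.
public sealed class MessageAssembler
{
    private const char ACK = (char)6;
    private const char LF = (char)10;
    private readonly StringBuilder pending = new StringBuilder();

    // Feed one chunk (for example the result of ReadExisting()). Returns every
    // message completed by this chunk, without the terminating LF.
    public List<string> Feed(string chunk)
    {
        var complete = new List<string>();
        foreach (char c in chunk)
        {
            if (c == LF)
            {
                complete.Add(pending.ToString());
                pending.Clear();
            }
            else if (c == ACK)
            {
                pending.Append("<ACK>"); // printable stand-in for the control character
            }
            else
            {
                pending.Append(c);
            }
        }
        return complete;
    }
}
```

In the DataReceived handler this would be called as `assembler.Feed(serialPort1.ReadExisting())`. A message that arrives split over two events is returned only once its LF has been seen.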

C# TcpClient streamreader with eventhandler not all messages are processed

I'm reading continuously from a TcpClient StreamReader.
The data coming from the stream is raw XML. There is no message framing, so there is no reliable way to know when a message is finished. The server only sends 3 kinds of XML messages, but when they arrive is unknown, and I can't configure or program the server.
This is my code so far.
public void Start()
{
StreamReader reader = new StreamReader(_tcpClient.GetStream());
char[] chars = new char[Int16.MaxValue];
while (!_requestStop)
{
try
{
while ((reader.Read(chars, 0, chars.Length)) != 0)
{
string s = new string(chars);
s = removeEmptyChars(s);
if (s.IndexOf("<foo", StringComparison.OrdinalIgnoreCase) > 0 &&
s.IndexOf("</foo>", StringComparison.OrdinalIgnoreCase) > 0)
{
Console.WriteLine(s);
OnAlarmResponseComplete(new CustomEventArgs(s));
}
if (s.IndexOf("<bar", StringComparison.OrdinalIgnoreCase) > 0 &&
s.IndexOf("</bar>", StringComparison.OrdinalIgnoreCase) > 0)
{
Console.WriteLine(s);
OnAckComplete(new CustomEventArgs(s));
}
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
//break;
}
}
reader.Close();
Console.WriteLine("Stopping TcpReader thread!");
}
Then, in my main thread, I handle the events by adding the messages to a list, which I process afterwards.
When I'm debugging my application, I receive 10 foo and 10 bar messages, but my lists contain only 1 foo and 1 bar message.
Are the event handlers too slow to process this? Or am I missing something?
Here is code that covers the various input cases (foo or bar received partially, foo and bar received together, etc.).
I can't say I approve of using string parsing to handle XML content, but anyway.
private static string ProcessAndTrimFooBar(string s, out bool foundAny)
{
foundAny = false;
int fooStart = s.IndexOf("<foo", StringComparison.OrdinalIgnoreCase);
int fooEnd = s.IndexOf("</foo>", StringComparison.OrdinalIgnoreCase);
int barStart = s.IndexOf("<bar", StringComparison.OrdinalIgnoreCase);
int barEnd = s.IndexOf("</bar>", StringComparison.OrdinalIgnoreCase);
bool fooExists = fooStart >= 0 && fooEnd >= 0;
bool barExists = barStart >= 0 && barEnd >= 0;
if ((fooExists && !barExists) || (fooExists && barExists && fooStart < barStart))
{
string fooNodeContent = s.Substring(fooStart, fooEnd - fooStart + 6);
s = s.Substring(fooEnd + 6);
Console.WriteLine("Received <foo>: {0}", fooNodeContent);
OnAlarmResponseComplete(new CustomEventArgs(fooNodeContent));
foundAny = true;
}
if ((barExists && !fooExists) || (barExists && fooExists && barStart < fooStart))
{
string barNodeContent = s.Substring(barStart, barEnd - barStart + 6);
s = s.Substring(barEnd + 6);
Console.WriteLine("Received <bar>: {0}", barNodeContent);
OnAckComplete(new CustomEventArgs(barNodeContent));
foundAny = true;
}
return s;
}
public static void Start()
{
StreamReader reader = new StreamReader(_tcpClient.GetStream());
char[] chars = new char[Int16.MaxValue];
while (!_requestStop)
{
try
{
int currentOffset = 0;
while ((reader.Read(chars, currentOffset, chars.Length - currentOffset)) != 0)
{
string s = new string(chars).TrimEnd('\0');
bool foundAny;
do
{
s = ProcessAndTrimFooBar(s, out foundAny);
} while (foundAny);
chars = s.PadRight(Int16.MaxValue, '\0').ToCharArray();
currentOffset = s.Length;
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
//break;
}
}
reader.Close();
Console.WriteLine("Stopping TcpReader thread!");
}

Streamreader with custom LineBreak - Performance optimisation

Edit: See my Solution below...
I had the following problem to solve:
We receive files (mostly address information) from different sources. These can use the Windows standard CR/LF ('\r''\n') as the line break, or the UNIX LF ('\n').
When reading text with the StreamReader.ReadLine() method, this is no problem, because it handles both cases equally.
The problem occurs when you have a CR or an LF somewhere in the file that is not supposed to be there.
This happens, for example, if you export an Excel file with cells that contain line breaks to .CSV or other flat files.
Now you have a file that, for example, has the following structure:
FirstName;LastName;Street;HouseNumber;PostalCode;City;Country'\r''\n'
Jane;Doe;co James Doe'\n'TestStreet;5;TestCity;TestCountry'\r''\n'
John;Hancock;Teststreet;1;4586;TestCity;TestCounty'\r''\n'
Now the StreamReader.ReadLine() method reads the first line as:
FirstName;LastName;Street;HouseNumber;PostalCode;City;Country
Which is fine, but the second line will be:
Jane;Doe;co James Doe
This will either break your code or give you false results, as the following line will be:
TestStreet;5;TestCity;TestCountry
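The splitting described above can be reproduced in a few lines. This is a minimal sketch: `ReadLineDemo` and `SplitWithReadLine` are hypothetical names, and a StringReader stands in for the file, since its ReadLine() treats "\r", "\n" and "\r\n" as line ends just like StreamReader.ReadLine().

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Splits text with ReadLine(), which ends a line at "\r", "\n" or "\r\n".
    public static List<string> SplitWithReadLine(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }
}
```

Passing `"Jane;Doe;co James Doe\nTestStreet;5\r\n"` yields two entries instead of one, because the loose '\n' inside the record is treated as a line end.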
So we usually ran the file through a tool that checks for loose '\n' or '\r' characters and deletes them.
But this step is easy to forget, so I tried to implement a ReadLine() method of my own. The requirement was that it should be able to use one or two line-break characters, and that those characters could be defined freely by the consuming logic.
This is the class that I came up with:
public class ReadFile
{
private FileStream file;
private StreamReader reader;
private string fileLocation;
private Encoding fileEncoding;
private char lineBreak1;
private char lineBreak2;
private bool useSeccondLineBreak;
private bool streamCreated = false;
private bool endOfStream;
public bool EndOfStream
{
get { return endOfStream; }
set { endOfStream = value; }
}
public ReadFile(string FileLocation, Encoding FileEncoding, char LineBreak1, char LineBreak2, bool UseSeccondLineBreak)
{
fileLocation = FileLocation;
fileEncoding = FileEncoding;
lineBreak1 = LineBreak1;
lineBreak2 = LineBreak2;
useSeccondLineBreak = UseSeccondLineBreak;
}
public string ReadLine()
{
if (streamCreated == false)
{
file = new FileStream(fileLocation, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
reader = new StreamReader(file, fileEncoding);
streamCreated = true;
}
StringBuilder builder = new StringBuilder();
char[] buffer = new char[1];
char lastChar = new char();
char currentChar = new char();
bool first = true;
while (reader.EndOfStream != true)
{
if (useSeccondLineBreak == true)
{
reader.Read(buffer, 0, 1);
lastChar = currentChar;
if (currentChar == lineBreak1 && buffer[0] == lineBreak2)
{
break;
}
else
{
currentChar = buffer[0];
}
if (first == false)
{
builder.Append(lastChar);
}
else
{
first = false;
}
}
else
{
reader.Read(buffer, 0, 1);
if (buffer[0] == lineBreak1)
{
break;
}
else
{
currentChar = buffer[0];
}
builder.Append(currentChar);
}
}
if (reader.EndOfStream == true)
{
EndOfStream = true;
}
return builder.ToString();
}
public void Close()
{
if (streamCreated == true)
{
reader.Close();
file.Close();
}
}
}
This code works fine and does what it is supposed to do, but compared to the original StreamReader.ReadLine() method it is ~3 times slower. As we work with large row counts, the difference is not only measurable but also shows up in real-world performance.
(For 700'000 rows, it takes ~5 seconds to read all lines, extract a chunk and write it to a new file; with my method it takes ~15 seconds on my system.)
I tried different approaches with bigger buffers, but so far I wasn't able to increase performance.
What I would be interested in:
Any suggestions on how I could improve the performance of this code to get closer to the original performance of StreamReader.ReadLine()?
Solution:
This now takes ~6 Seconds (compared to ~5 Sec using the Default 'StreamReader.ReadLine()' ) for 700'000 Rows to do the same things as the code above does.
Thanks Jim Mischel for pointing me in the right direction!
public class ReadFile
{
private FileStream file;
private StreamReader reader;
private string fileLocation;
private Encoding fileEncoding;
private char lineBreak1;
private char lineBreak2;
private bool useSeccondLineBreak;
const int BufferSize = 8192;
int bufferedCount;
char[] rest = new char[BufferSize];
int position = 0;
char lastChar;
bool useLastChar;
private bool streamCreated = false;
private bool endOfStream;
public bool EndOfStream
{
get { return endOfStream; }
set { endOfStream = value; }
}
public ReadFile(string FileLocation, Encoding FileEncoding, char LineBreak1, char LineBreak2, bool UseSeccondLineBreak)
{
fileLocation = FileLocation;
fileEncoding = FileEncoding;
lineBreak1 = LineBreak1;
lineBreak2 = LineBreak2;
useSeccondLineBreak = UseSeccondLineBreak;
}
private int readInBuffer()
{
return reader.Read(rest, 0, BufferSize);
}
public string ReadLine()
{
StringBuilder builder = new StringBuilder();
bool lineFound = false;
if (streamCreated == false)
{
file = new FileStream(fileLocation, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 8192);
reader = new StreamReader(file, fileEncoding);
streamCreated = true;
bufferedCount = readInBuffer();
}
while (lineFound == false && EndOfStream != true)
{
if (position < bufferedCount)
{
for (int i = position; i < bufferedCount; i++) // only scan chars that were actually read
{
if (useLastChar == true)
{
useLastChar = false;
if (rest[i] == lineBreak2)
{
position = i + 1;
lineFound = true;
break;
}
else
{
builder.Append(lastChar);
}
}
if (rest[i] == lineBreak1)
{
if (useSeccondLineBreak == true)
{
if (i + 1 <= bufferedCount - 1) // the next char must be inside the data that was read
{
if (rest[i + 1] == lineBreak2)
{
position = i + 2;
lineFound = true;
break;
}
else
{
builder.Append(rest[i]);
}
}
else
{
useLastChar = true;
lastChar = rest[i];
}
}
else
{
position = i + 1;
lineFound = true;
break;
}
}
else
{
builder.Append(rest[i]);
}
position = i + 1;
}
}
else
{
bufferedCount = readInBuffer();
position = 0;
}
}
if (reader.EndOfStream == true && position == bufferedCount)
{
EndOfStream = true;
}
return builder.ToString();
}
public void Close()
{
if (streamCreated == true)
{
reader.Close();
file.Close();
}
}
}
The way to speed this up would be to have it read more than one character at a time. For example, create a 4 kilobyte buffer, read data into that buffer, and then go character-by-character. If you copy character-by-character to a StringBuilder, it's pretty easy.
The code below shows how to parse out lines in a loop. You'd have to split this up so that it can maintain state between calls, but it should give you the idea.
const int BufferSize = 4096;
const string newline = "\r\n";
using (var strm = new StreamReader(....))
{
int newlineIndex = 0;
var buffer = new char[BufferSize];
StringBuilder sb = new StringBuilder();
int charsInBuffer = 0;
int bufferIndex = 0;
char lastChar = char.MaxValue; // (char)-1 does not compile outside an unchecked context
while (!(strm.EndOfStream && bufferIndex >= charsInBuffer))
{
if (bufferIndex >= charsInBuffer) // refill once the buffer is consumed (also on the first pass)
{
charsInBuffer = strm.Read(buffer, 0, buffer.Length);
if (charsInBuffer == 0)
{
// nothing read. Must be at end of stream.
break;
}
bufferIndex = 0;
}
if (buffer[bufferIndex] == newline[newlineIndex])
{
++newlineIndex;
if (newlineIndex == newline.Length)
{
// found a line
Console.WriteLine(sb.ToString());
newlineIndex = 0;
sb = new StringBuilder();
}
}
else
{
if (newlineIndex > 0)
{
// copy matched newline characters
sb.Append(newline.Substring(0, newlineIndex));
newlineIndex = 0;
}
sb.Append(buffer[bufferIndex]);
}
++bufferIndex;
}
// Might be a line left, without a newline
if (newlineIndex > 0)
{
sb.Append(newline.Substring(0, newlineIndex));
}
if (sb.Length > 0)
{
Console.WriteLine(sb.ToString());
}
}
You could optimize this a bit by keeping track of the starting position so that when you find a line you create a string from buffer[start] to buffer[current], without creating a StringBuilder. Instead you call the String(char[], int32, int32) constructor. That's a little tricky to handle when you cross a buffer boundary. Probably would want to handle crossing the buffer boundary as a special case and use a StringBuilder for temporary storage in that case.
I wouldn't bother with that optimization, though, until after I got this first version working.
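For reference, the optimization described above might look roughly like the following. It is a sketch under assumptions, not a tuned implementation: `DelimitedLineReader` is a hypothetical name, the delimiter is limited to one or two characters (as in the question), and a StringBuilder is allocated only when a line crosses a buffer boundary.

```csharp
using System;
using System.IO;
using System.Text;

public sealed class DelimitedLineReader
{
    private readonly TextReader reader;
    private readonly string newline;   // one or two characters
    private readonly char[] buffer;
    private int start;                 // first unread index in buffer
    private int count;                 // number of valid chars in buffer

    public DelimitedLineReader(TextReader reader, string newline, int bufferSize = 4096)
    {
        if (newline.Length < 1 || newline.Length > 2)
            throw new ArgumentException("newline must be one or two characters");
        this.reader = reader;
        this.newline = newline;
        buffer = new char[bufferSize];
    }

    // Returns the next line without its delimiter, or null at end of input.
    public string ReadLine()
    {
        StringBuilder carry = null;    // only allocated when a line spans buffers
        while (true)
        {
            if (start >= count)
            {
                count = reader.Read(buffer, 0, buffer.Length);
                start = 0;
                if (count == 0)
                    return carry != null ? carry.ToString() : null;
                // A two-character delimiter may be split across two buffers.
                if (newline.Length == 2 && carry != null
                    && carry[carry.Length - 1] == newline[0] && buffer[0] == newline[1])
                {
                    start = 1;
                    return carry.ToString(0, carry.Length - 1);
                }
            }
            int hit = IndexOfNewline(start);
            if (hit >= 0)
            {
                int lineStart = start;
                start = hit + newline.Length;
                if (carry == null)     // common case: the line lies inside one buffer
                    return new string(buffer, lineStart, hit - lineStart);
                return carry.Append(buffer, lineStart, hit - lineStart).ToString();
            }
            // No delimiter in the rest of this buffer: stash it and refill.
            if (carry == null) carry = new StringBuilder();
            carry.Append(buffer, start, count - start);
            start = count;
        }
    }

    private int IndexOfNewline(int from)
    {
        for (int i = from; i < count; i++)
        {
            if (buffer[i] != newline[0]) continue;
            if (newline.Length == 1) return i;
            if (i + 1 < count && buffer[i + 1] == newline[1]) return i;
        }
        return -1;
    }
}
```

Usage would be `var r = new DelimitedLineReader(new StreamReader(path), "\r\n");` followed by calling `r.ReadLine()` until it returns null. A loose '\n' inside a record is then left in the returned line instead of splitting it.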

How to save a MJPEG Stream to disk (C# .NET)?

I have an application that reads the stream from a camera (MJPEG) and shows it on the form in real time (in a PictureBox). This is working. The stream reading starts when the user clicks the "Start" button.
What I want is that when the user clicks the "Stop" button, the stream received between "Start" and "Stop" is saved to disk as an .mpg file.
Right now it writes something to the disk, but I can't open it in Windows Media Player.
Here is the code that writes the stream:
private void ReadWriteStream(byte[] buffer, int start, int lenght, Stream writeStream)
{
Stream readStream = new MemoryStream(buffer, start, lenght);
int bytesRead = readStream.Read(buffer, 0, m_readSize);
// write the required bytes
while (bytesRead > 0 && !m_bStopLecture)
{
writeStream.Write(buffer, 0, bytesRead);
bytesRead = readStream.Read(buffer, 0, m_readSize);
}
readStream.Close();
}
Here is the place that calls the function. It is in a loop and, as I said, the video is playing in the PictureBox.
// image at stop
Stream towrite = new MemoryStream(buffer, start, stop - start);
Image img = Image.FromStream(towrite);
imgSnapshot.Image = img;
// write to the stream
ReadWriteStream(buffer, start, stop - start, writeStream);
Thanks a lot!
You need to set the content type on the stream and include the frame boundary data. I would start by looking at the question MJPG VLC and HTTP Streaming.
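As an illustration of what "including the frame boundary data" can mean, here is a minimal sketch that wraps each JPEG frame in a multipart part, the layout HTTP MJPEG streams use. `MjpegFileWriter`, `WriteFrame` and the boundary name are assumptions made for this example, and whether a given player accepts a file written this way is not something the sketch guarantees.

```csharp
using System;
using System.IO;
using System.Text;

public static class MjpegFileWriter
{
    // Hypothetical boundary name; any token works as long as the reader is told the same one.
    public const string Boundary = "myboundary";

    // Appends one complete JPEG frame to the output as a multipart part.
    public static void WriteFrame(Stream output, byte[] jpeg, int offset, int length)
    {
        string header =
            "--" + Boundary + "\r\n" +
            "Content-Type: image/jpeg\r\n" +
            "Content-Length: " + length + "\r\n\r\n";
        byte[] headerBytes = Encoding.ASCII.GetBytes(header);
        output.Write(headerBytes, 0, headerBytes.Length);
        output.Write(jpeg, offset, length);       // the frame itself, unmodified
        byte[] crlf = Encoding.ASCII.GetBytes("\r\n");
        output.Write(crlf, 0, crlf.Length);       // part terminator
    }
}
```

In the question's loop this would replace the raw copy, for example `MjpegFileWriter.WriteFrame(writeStream, buffer, start, stop - start);`. When the same data is served over HTTP, the response content type would be `multipart/x-mixed-replace; boundary=myboundary`.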
There is an implementation at https://net7mma.codeplex.com/SourceControl/latest, specifically https://net7mma.codeplex.com/SourceControl/latest#Rtsp/Server/Streams/MJPEGSourceStream.cs
Something like this:
{
// buffer to read stream
byte[] buffer = new byte[bufSize];
// JPEG magic number
byte[] jpegMagic = new byte[] { 0xFF, 0xD8, 0xFF };
int jpegMagicLength = 3;
ASCIIEncoding encoding = new ASCIIEncoding();
while (!stopEvent.WaitOne(0, false))
{
// reset reload event
reloadEvent.Reset();
// HTTP web request
HttpWebRequest request = null;
// web responce
WebResponse response = null;
// stream for MJPEG downloading
Stream stream = null;
// boundary betweeen images (string and binary versions)
byte[] boundary = null;
string boudaryStr = null;
// length of boundary
int boundaryLen;
// flag signaling if boundary was checked or not
bool boundaryIsChecked = false;
// read amounts and positions
int read, todo = 0, total = 0, pos = 0, align = 1;
int start = 0, stop = 0;
// align
// 1 = searching for image start
// 2 = searching for image end
try
{
// create request
request = (HttpWebRequest)WebRequest.Create(m_Source);
// set user agent
if (userAgent != null)
{
request.UserAgent = userAgent;
}
// set proxy
if (proxy != null)
{
request.Proxy = proxy;
}
// set timeout value for the request
request.Timeout = requestTimeout;
// set login and password
if ((login != null) && (password != null) && (login != string.Empty))
request.Credentials = new NetworkCredential(login, password);
// set connection group name
if (useSeparateConnectionGroup)
request.ConnectionGroupName = GetHashCode().ToString();
// force basic authentication through extra headers if required
if (forceBasicAuthentication)
{
string authInfo = string.Format("{0}:{1}", login, password);
authInfo = Convert.ToBase64String(Encoding.Default.GetBytes(authInfo));
request.Headers["Authorization"] = "Basic " + authInfo;
}
// get response
response = request.GetResponse();
// check content type
string contentType = response.ContentType;
string[] contentTypeArray = contentType.Split('/');
// "application/octet-stream"
if ((contentTypeArray[0] == "application") && (contentTypeArray[1] == "octet-stream"))
{
boundaryLen = 0;
boundary = new byte[0];
}
else if ((contentTypeArray[0] == "multipart") && (contentType.Contains("mixed")))
{
// get boundary
int boundaryIndex = contentType.IndexOf("boundary", 0);
if (boundaryIndex != -1)
{
boundaryIndex = contentType.IndexOf("=", boundaryIndex + 8);
}
if (boundaryIndex == -1)
{
// try same scenario as with octet-stream, i.e. without boundaries
boundaryLen = 0;
boundary = new byte[0];
}
else
{
boudaryStr = contentType.Substring(boundaryIndex + 1);
// remove spaces and double quotes, which may be added by some IP cameras
boudaryStr = boudaryStr.Trim(' ', '"');
boundary = encoding.GetBytes(boudaryStr);
boundaryLen = boundary.Length;
boundaryIsChecked = false;
}
}
else
{
throw new Exception("Invalid content type.");
}
// get response stream
stream = response.GetResponseStream();
stream.ReadTimeout = requestTimeout;
// loop
while ((!stopEvent.WaitOne(0, false)) && (!reloadEvent.WaitOne(0, false)))
{
// check total read
if (total > bufSize - readSize)
{
total = pos = todo = 0;
}
// read next portion from stream
if ((read = stream.Read(buffer, total, readSize)) == 0)
throw new ApplicationException();
total += read;
todo += read;
// increment received bytes counter
bytesReceived += read;
// do we need to check boundary ?
if ((boundaryLen != 0) && (!boundaryIsChecked))
{
// some IP cameras, like AirLink, claim that boundary is "myboundary",
// when it is really "--myboundary". this needs to be corrected.
pos = Utility.ContainsBytes(buffer, ref start, ref read, boundary, 0, boundary.Length);
// continue reading if boudary was not found
if (pos == -1)
continue;
for (int i = pos - 1; i >= 0; i--)
{
byte ch = buffer[i];
if ((ch == (byte)'\n') || (ch == (byte)'\r'))
{
break;
}
boudaryStr = (char)ch + boudaryStr;
}
boundary = encoding.GetBytes(boudaryStr);
boundaryLen = boundary.Length;
boundaryIsChecked = true;
}
// search for image start
if ((align == 1) && (todo >= jpegMagicLength))
{
start = Utility.ContainsBytes(buffer, ref pos, ref todo, jpegMagic, 0, jpegMagicLength);
if (start != -1)
{
// found JPEG start
pos = start + jpegMagicLength;
todo = total - pos;
align = 2;
}
else
{
// delimiter not found
todo = jpegMagicLength - 1;
pos = total - todo;
}
}
// search for image end ( boundaryLen can be 0, so need extra check )
while ((align == 2) && (todo != 0) && (todo >= boundaryLen))
{
stop = Utility.ContainsBytes(buffer, ref start, ref read,
(boundaryLen != 0) ? boundary : jpegMagic,
pos, todo);
if (stop != -1)
{
pos = stop;
todo = total - pos;
// increment frames counter
framesReceived++;
// image at stop
using (Bitmap bitmap = (Bitmap)Bitmap.FromStream(new MemoryStream(buffer, start, stop - start)))
{
// notify client
Packetize(bitmap);
}
// shift array
pos = stop + boundaryLen;
todo = total - pos;
Array.Copy(buffer, pos, buffer, 0, todo);
total = todo;
pos = 0;
align = 1;
}
else
{
// boundary not found
if (boundaryLen != 0)
{
todo = boundaryLen - 1;
pos = total - todo;
}
else
{
todo = 0;
pos = total;
}
}
}
}
}
catch (ApplicationException)
{
// do nothing for Application Exception, which we raised on our own
// wait for a while before the next try
Thread.Sleep(250);
}
catch (ThreadAbortException)
{
break;
}
catch (Exception exception)
{
// wait for a while before the next try
Thread.Sleep(250);
}
finally
{
// abort request
if (request != null)
{
request.Abort();
request = null;
}
// close response stream
if (stream != null)
{
stream.Close();
stream = null;
}
// close response
if (response != null)
{
response.Close();
response = null;
}
}
// need to stop ?
if (stopEvent.WaitOne(0, false))
break;
}
}
}

Using itextsharp (or any c# pdf library), how to open a PDF, replace some text, and save it again?

Using iTextSharp (or any C# PDF library), I need to open a PDF, replace some placeholder text with actual values, and return it as a byte[].
Can someone suggest how to do this? I've had a look at the iText docs and can't figure out where to get started. So far I'm stuck on how to get the source PDF from a PdfReader into a Document object; I presume I'm approaching this the wrong way.
Thanks a lot
In the end, I used PDFescape to open my existing PDF file and place form fields where I need to put my values, then saved it again to create my PDF template.
http://www.pdfescape.com
Then I found this blog entry about how to replace form fields:
http://www.johnnycode.com/blog/2010/03/05/using-a-template-to-programmatically-create-pdfs-with-c-and-itextsharp/
All works nicely! Here's the code:
public static byte[] Generate()
{
var templatePath = HttpContext.Current.Server.MapPath("~/my_template.pdf");
// Based on:
// http://www.johnnycode.com/blog/2010/03/05/using-a-template-to-programmatically-create-pdfs-with-c-and-itextsharp/
var reader = new PdfReader(templatePath);
var outStream = new MemoryStream();
var stamper = new PdfStamper(reader, outStream);
var form = stamper.AcroFields;
var fieldKeys = form.Fields.Keys;
foreach (string fieldKey in fieldKeys)
{
if (form.GetField(fieldKey) == "MyTemplatesOriginalTextFieldA")
form.SetField(fieldKey, "1234");
if (form.GetField(fieldKey) == "MyTemplatesOriginalTextFieldB")
form.SetField(fieldKey, "5678");
}
// "Flatten" the form so it wont be editable/usable anymore
stamper.FormFlattening = true;
stamper.Close();
reader.Close();
return outStream.ToArray();
}
Unfortunately I was looking for something similar and could not figure it out. Below was about as far as I got, maybe you can use this as a starting point. The problem is that PDF does not actually save text, but instead uses lookup tables and some other arcane wizardry. This method reads the byte-values for the page and attempts to convert to string, but as far as I can tell it can only do English and misses on some special characters, so I gave up my project and moved on.
string contents = string.Empty;
Document doc = new Document();
PdfReader reader = new PdfReader("pathToPdf.pdf");
using (MemoryStream memoryStream = new MemoryStream())
{
PdfWriter writer = PdfWriter.GetInstance(doc, memoryStream);
doc.Open();
PdfContentByte cb = writer.DirectContent;
for (int p = 1; p <= reader.NumberOfPages; p++)
{
// add page from reader
doc.SetPageSize(reader.GetPageSize(p));
doc.NewPage();
// pickup here something like this:
byte[] bt = reader.GetPageContent(p);
contents = ExtractTextFromPDFBytes(bt);
if (contents.IndexOf("something")!=-1)
{
// make your own pdf page and add to cb (contentbyte)
}
else
{
PdfImportedPage page = writer.GetImportedPage(reader, p);
int rot = reader.GetPageRotation(p);
if (rot == 90 || rot == 270)
cb.AddTemplate(page, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(p).Height);
else
cb.AddTemplate(page, 1.0F, 0, 0, 1.0F, 0, 0);
}
}
reader.Close();
doc.Close();
File.WriteAllBytes("pathToOutputOrSamePathToOverwrite.pdf", memoryStream.ToArray());
} // closes the using block opened above
This is taken from this site.
private string ExtractTextFromPDFBytes(byte[] input)
{
if (input == null || input.Length == 0) return "";
try
{
string resultString = "";
// Flag showing if we are we currently inside a text object
bool inTextObject = false;
// Flag showing if the next character is literal
// e.g. '\\' to get a '\' character or '\(' to get '('
bool nextLiteral = false;
// () Bracket nesting level. Text appears inside ()
int bracketDepth = 0;
// Keep previous chars to get extract numbers etc.:
char[] previousCharacters = new char[_numberOfCharsToKeep];
for (int j = 0; j < _numberOfCharsToKeep; j++) previousCharacters[j] = ' ';
for (int i = 0; i < input.Length; i++)
{
char c = (char)input[i];
if (inTextObject)
{
// Position the text
if (bracketDepth == 0)
{
if (CheckToken(new string[] { "TD", "Td" }, previousCharacters))
{
resultString += "\n\r";
}
else
{
if (CheckToken(new string[] { "'", "T*", "\"" }, previousCharacters))
{
resultString += "\n";
}
else
{
if (CheckToken(new string[] { "Tj" }, previousCharacters))
{
resultString += " ";
}
}
}
}
// End of a text object, also go to a new line.
if (bracketDepth == 0 &&
CheckToken(new string[] { "ET" }, previousCharacters))
{
inTextObject = false;
resultString += " ";
}
else
{
// Start outputting text
if ((c == '(') && (bracketDepth == 0) && (!nextLiteral))
{
bracketDepth = 1;
}
else
{
// Stop outputting text
if ((c == ')') && (bracketDepth == 1) && (!nextLiteral))
{
bracketDepth = 0;
}
else
{
// Just a normal text character:
if (bracketDepth == 1)
{
// Only print out next character no matter what.
// Do not interpret.
if (c == '\\' && !nextLiteral)
{
nextLiteral = true;
}
else
{
if (((c >= ' ') && (c <= '~')) ||
((c >= 128) && (c < 255)))
{
resultString += c.ToString();
}
nextLiteral = false;
}
}
}
}
}
}
// Store the recent characters for
// when we have to go back for a checking
for (int j = 0; j < _numberOfCharsToKeep - 1; j++)
{
previousCharacters[j] = previousCharacters[j + 1];
}
previousCharacters[_numberOfCharsToKeep - 1] = c;
// Start of a text object
if (!inTextObject && CheckToken(new string[] { "BT" }, previousCharacters))
{
inTextObject = true;
}
}
return resultString;
}
catch
{
return "";
}
}
private bool CheckToken(string[] tokens, char[] recent)
{
foreach (string token in tokens)
{
if ((recent[_numberOfCharsToKeep - 3] == token[0]) &&
(recent[_numberOfCharsToKeep - 2] == token[1]) &&
((recent[_numberOfCharsToKeep - 1] == ' ') ||
(recent[_numberOfCharsToKeep - 1] == 0x0d) ||
(recent[_numberOfCharsToKeep - 1] == 0x0a)) &&
((recent[_numberOfCharsToKeep - 4] == ' ') ||
(recent[_numberOfCharsToKeep - 4] == 0x0d) ||
(recent[_numberOfCharsToKeep - 4] == 0x0a)))
{
return true;
}
}
return false;
}
I have a Python script here that replaces some text in a PDF:
import re
import sys
import zlib
# Module to find and replace text in PDF files
#
# Usage:
# python pdf_replace.py <input_filename> <text_to_find> <text_to_replace> <output_filename>
#
# @author Ionox0
input_filename = sys.argv[1]
# PDF streams are bytes, so encode the search and replacement text
text_to_find = sys.argv[2].encode('latin-1')
text_to_replace = sys.argv[3].encode('latin-1')
output_filename = sys.argv[4]
with open(input_filename, "rb") as f:
    pdf = f.read()
# Create a copy of the PDF content to make edits to
pdf_copy = pdf[0:]
# Search for stream objects with text to replace
stream = re.compile(rb'.*?FlateDecode.*?stream(.*?)endstream', re.S)
for s in stream.findall(pdf):
    s = s.strip(b'\r\n')
    try:
        text = zlib.decompress(s)
    except zlib.error:
        # Not a valid zlib stream; skip it
        continue
    if text_to_find in text:
        print('Found match:')
        print(text)
        text = text.replace(text_to_find, text_to_replace)
        pdf_copy = pdf_copy.replace(s, zlib.compress(text))
with open(output_filename, 'wb') as out:
    out.write(pdf_copy)
