How Can I Parse a Pcapng File in C#?


I'm new to Pcapng files. I've read the 40+ page whitepaper and I'm still scratching my head and sweating. I understand that the Pcapng file is:
Made up of a Section Header Block - This is the start of every Pcapng file.
Question 1: How large is this?
It appears that it's Block Type (4 bytes) + Block Total Length (4 bytes) + Byte-Order Magic (4 bytes) + Major and Minor Version (4 bytes total, 2 bytes each) + Section Length (8 bytes) + Options (variable) + Block Total Length (again, 4 bytes).
If I'm building a parser, how would I know how many bytes I need to skip to arrive at my first data frame block?
Question 2: Where is the data stored? By data I mean the entire frame that contains Ethernet, IP, and TCP Data, as shown in the picture below (Figure 1).
The documentation states that:
A section includes data delimited by two section header blocks.
When doing a manual inspection (yes, I went byte by byte over a file to see how many bytes lie in between two frames :'( ), I noticed there were 35 bytes in between each message (each message shown in Wireshark had 35 bytes in between). Are these bytes related to a pcapng block?
Once I understand how to get to the first TCP frame, and how many bytes I need to skip to get to the next, I can build my parser.
I'm willing to send Bitcoin/Monero to anyone who can help me understand how I can best parse these pcapng messages. Thanks!

If I'm building a parser, how would I know how many bytes I need to skip to arrive at my first data frame block?
That's not how you do it.
If you're building a parser, note that a parser must look at more than just the first data frame block.
First of all, it must look at the Section Header Block (SHB), to determine the byte order of the data in all the subsequent blocks by looking at the Byte-Order Magic field.
After that, you need to look at all subsequent blocks, looking for Interface Description Blocks (IDBs) and Enhanced Packet Blocks (EPBs), Simple Packet Blocks (SPBs), and possibly Packet Blocks (PBs) (those are obsolete, so no program should write them, but programs should be prepared to read them). Each EPB or PB has an interface ID that refers to an IDB, which must have appeared before the EPB or PB in question; an SPB implicitly refers to the first IDB, which, again, must have appeared before the SPB in question.
The format of the packet data in an EPB, SPB, or PB depends on the link-layer type specified by the IDB to which it refers, so you need to have read the IDB in question.
And, as the above indicates, there is no fixed number of bytes between the SHB and the first EPB, SPB, or PB, so there is no simple fixed number of bytes to skip to get to the first data frame block. For one thing, there's a variable number of bytes, which you can only determine by reading all the blocks before the first EPB, SPB, or PB. For another thing, you can't skip those blocks: you have to read them to get enough information to interpret the packet data in the packet blocks that follow.
Where is the data stored? By data I mean the entire frame that contains Ethernet, IP, and TCP Data, as shown in the picture below (Figure 1).
It's stored in EPBs, SPBs, or PBs. See the descriptions of those three block types; frames are in the "Packet Data" fields of those blocks.
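The block walk described above can be sketched roughly as follows. This is a minimal sketch, not a complete parser: the class and method names (PcapngWalker, ReadEnhancedPackets) are made up for illustration, it handles only SHBs, IDBs and EPBs, and it skips every other block type using the Block Total Length. It assumes the layout from the pcapng specification: an 8-byte block header, a body, and a trailing copy of the total length.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;

public static class PcapngWalker
{
    const uint SectionHeaderBlock = 0x0A0D0D0A;        // palindromic: same in either byte order
    const uint InterfaceDescriptionBlock = 0x00000001;
    const uint EnhancedPacketBlock = 0x00000006;

    // Walks the stream block by block and returns, for every EPB, the link type
    // of the IDB it refers to together with its Packet Data bytes.
    public static List<(ushort LinkType, byte[] Data)> ReadEnhancedPackets(Stream s)
    {
        var packets = new List<(ushort, byte[])>();
        var linkTypes = new List<ushort>();            // index = interface ID within the section
        bool little = true;
        var head = new byte[8];                        // Block Type + Block Total Length
        while (Fill(s, head, 0))
        {
            bool isShb = U32(head, 0, true) == SectionHeaderBlock;
            var magic = new byte[4];
            if (isShb)
            {
                // The byte-order magic decides how every field in this section is read.
                if (!Fill(s, magic, 0)) throw new EndOfStreamException();
                uint m = U32(magic, 0, true);
                if (m == 0x1A2B3C4D) little = true;
                else if (m == 0x4D3C2B1A) little = false;
                else throw new InvalidDataException("Bad byte-order magic.");
                linkTypes.Clear();                     // interface IDs restart in each section
            }

            uint type = U32(head, 0, little);
            uint total = U32(head, 4, little);
            if (total < (isShb ? 16u : 12u) || total % 4 != 0)
                throw new InvalidDataException("Bad block length.");

            // Body = everything between the 8-byte header and the trailing length.
            var body = new byte[total - 12];
            int have = 0;
            if (isShb) { magic.CopyTo(body, 0); have = 4; }
            var trailer = new byte[4];
            if (!Fill(s, body, have) || !Fill(s, trailer, 0)) throw new EndOfStreamException();
            if (U32(trailer, 0, little) != total) throw new InvalidDataException("Length mismatch.");

            if (type == InterfaceDescriptionBlock)
            {
                if (body.Length < 8) throw new InvalidDataException("Short IDB.");
                linkTypes.Add(U16(body, 0, little));   // LinkType is the first IDB field
            }
            else if (type == EnhancedPacketBlock)
            {
                if (body.Length < 20) throw new InvalidDataException("Short EPB.");
                uint interfaceId = U32(body, 0, little);
                uint captured = U32(body, 12, little); // Captured Packet Length
                if (interfaceId >= linkTypes.Count || captured > body.Length - 20)
                    throw new InvalidDataException("Bad EPB.");
                packets.Add((linkTypes[(int)interfaceId], body.AsSpan(20, (int)captured).ToArray()));
            }
            // Any other block type is skipped: its body has already been consumed.
        }
        return packets;
    }

    // Reads until buf is full starting at offset; returns false on end of stream.
    static bool Fill(Stream s, byte[] buf, int offset)
    {
        while (offset < buf.Length)
        {
            int n = s.Read(buf, offset, buf.Length - offset);
            if (n == 0) return false;
            offset += n;
        }
        return true;
    }

    static uint U32(byte[] b, int off, bool little) => little
        ? BinaryPrimitives.ReadUInt32LittleEndian(b.AsSpan(off, 4))
        : BinaryPrimitives.ReadUInt32BigEndian(b.AsSpan(off, 4));

    static ushort U16(byte[] b, int off, bool little) => little
        ? BinaryPrimitives.ReadUInt16LittleEndian(b.AsSpan(off, 2))
        : BinaryPrimitives.ReadUInt16BigEndian(b.AsSpan(off, 2));
}
```

Calling PcapngWalker.ReadEnhancedPackets(File.OpenRead(path)) would return one entry per EPB. The link type travels with each packet because, as noted above, the packet bytes cannot be interpreted without it.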
So I'm at my Interface Description Block and the 64 bit number that contains both a Timestamp Resolution of 9 (10^-9, Nanoseconds?) and 6 (10^-6, Microseconds).
As Christopher Maynard indicated, the 9 isn't a timestamp resolution, it's an option type. Pcapng blocks have both fixed information at the beginning and options; an option begins with an option type and option value length, followed by the option data. An IDB if_tsresol option has
2 bytes of option type, with the value 9;
2 bytes of option value length, with the value 1;
1 byte of option value, with the value as specified in the description of that option.
A value of 6 means the time stamp resolution is 1/10^6 of a second, which means 1 microsecond.
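The option layout just described (type, length, then value) can be walked with a short loop. The following is a sketch under some assumptions: the names IdbOptions and TimestampResolution are hypothetical, the input is the IDB body without the 8-byte block header and the trailing length, and the caller already knows the section's byte order from the SHB. Per the pcapng specification, option values are padded to a multiple of 4 bytes, a set high bit in the if_tsresol value selects a power of 2 instead of a power of 10, and a missing option means 10^-6.

```csharp
using System;
using System.Buffers.Binary;

public static class IdbOptions
{
    const ushort EndOfOptions = 0;
    const ushort IfTsResol = 9;

    // Returns (Base, Exponent): one timestamp unit is Base^-Exponent seconds.
    public static (int Base, int Exponent) TimestampResolution(ReadOnlySpan<byte> idbBody, bool littleEndian)
    {
        int pos = 8;                                   // skip LinkType (2), Reserved (2), SnapLen (4)
        while (pos + 4 <= idbBody.Length)
        {
            ushort code = U16(idbBody.Slice(pos, 2), littleEndian);
            ushort length = U16(idbBody.Slice(pos + 2, 2), littleEndian);
            pos += 4;
            if (code == EndOfOptions) break;
            if (pos + length > idbBody.Length)
                throw new FormatException("Option overruns the block.");

            if (code == IfTsResol && length == 1)
            {
                byte value = idbBody[pos];
                int exponent = value & 0x7F;
                // High bit set: power of 2; otherwise power of 10.
                return ((value & 0x80) != 0 ? 2 : 10, exponent);
            }
            pos += (length + 3) & ~3;                  // values are padded to 32 bits
        }
        return (10, 6);                                // default when if_tsresol is absent
    }

    static ushort U16(ReadOnlySpan<byte> b, bool little) => little
        ? BinaryPrimitives.ReadUInt16LittleEndian(b)
        : BinaryPrimitives.ReadUInt16BigEndian(b);
}
```

For the option discussed here (type 9, length 1, value 6) this returns (10, 6), that is, microseconds.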

I think @tee-zad-awk found an answer that helped over at https://ask.wireshark.org/question/15159/how-can-i-display-as-much-pcapng-information-as-possible/, but for the benefit of anyone else looking for an answer to this question, I've linked it here and have provided my answer below, just in case the link is ever broken someday.
It seems that, after reading the 40 page whitepaper on Pcapng ...
The current PCAP Next Generation (pcapng) Capture File Format draft document is 52 pages, so perhaps you're not looking at the most recent version? Other versions do exist, such as those at https://datatracker.ietf.org/doc/html/draft-tuexen-opswg-pcapng-00, https://pcapng.github.io/pcapng/ or https://www.tcpdump.org/pcap/pcap.html and probably others, but they're all obsolete.
If you're looking for a pcapng parser to help you decipher the file, then look no further than Wireshark itself. If you've loaded a pcapng file into Wireshark, you can use "View -> Reload as File Format/Capture" (Ctrl+Shift+F) to cause Wireshark to load and display the raw file contents itself rather than to load and display the packets from the file. This should cause you to be able to see the various pcapng blocks and be able to drill down into them. For example:
Frame 1: 184 bytes on wire (1472 bits), 184 bytes captured (1472 bits)
MIME file
PCAPNG File Format
    Block: Section Header Block 1
    Block: Interface Description Block 0
    Block: Enhanced Packet Block 1
You can also have a look at the Wireshark source code, such as the epan/dissectors/file-pcapng.c and wiretap/pcapng.c files.
By the way, if you're looking to support all extensions, the Wireshark PcapNg wiki page (https://wiki.wireshark.org/Development/PcapNg) has a link to the Augmented PCAP Next Generation Dump File Format page that you might also want to take a look at. I don't know how many other extensions may have been implemented but not included in the main pcapng file format specification, but hopefully not many, as this could quickly become problematic with different projects possibly using the same block type for different blocks. That practice should be highly discouraged.

To find this out, it helps to read the specifications of the link-layer protocol of the network device and of the packets that were sent. For example, we need to know the layout of an Ethernet frame and of a TCP/IP packet in order to understand the raw data. Having studied this, record some traffic in Wireshark and select a packet in the upper pane. The middle pane shows in clear text what Wireshark has decoded. When you click any line in the middle pane, Wireshark highlights the bytes of the raw data in the lower pane that carry that line's information. You can also click on the raw data, and the corresponding clear-text line is highlighted. The status bar shows the same information. This is very helpful for understanding the data.
I needed to read the TCP/IPv4 packets of Ethernet traffic. Each packet sits in an Enhanced Packet Block, which starts with the block type 0x00000006 followed by the total length of the block. The device was Ethernet, so the link type was LINKTYPE_ETHERNET. The captured packet length can be taken from bytes 20-23 of the block (counting from 0). The other entries of the block header can be taken from here.
After the 28-byte block header, the Ethernet frame follows with these entries (see here for a description):
destination MAC address, 6 bytes
source MAC address, 6 bytes
type, 2 bytes: 0x0800 for IPv4, 0x0806 for ARP, 0x86DD for IPv6, 0x8100 for the presence of an IEEE 802.1Q tag.
For an IPv4 packet (type = 0x0800), the following bytes are the IPv4 header (see here for a description):
IP version and header length, 1 byte
differentiated services field, 1 byte
total length, 2 bytes
identification, 2 bytes
flags (3 bits) and fragment offset (13 bits), 2 bytes together
time to live, 1 byte
protocol with 0x06 for TCP, 1 byte
header checksum, 2 bytes
source IP address, 4 bytes
destination IP address, 4 bytes
options, variable length (present only when the header length is more than 20 bytes)
The total length is very important: the first byte after the IPv4 + TCP packet lies total length bytes after the IP version and header length entry. However, this entry can be tricky. I had an entry with total length 0 even though the IP header alone already takes 20 bytes. In this case, Wireshark was helpful. It reported
[Total Length: 1547 bytes (reported as 0, presumed to be because of "TCP segmentation offload" (TSO))]
A detailed description of this phenomenon can be taken from here. In this case I could compute the payload length as the captured packet length of the Enhanced Packet Block minus the length of the Ethernet header (14 bytes) minus the length of the IP header minus the length of the TCP header. However, padding can get in the way, though I did not run into it: pcapng pads the packet data to a multiple of 4 bytes inside the block (the captured packet length does not count this padding), and short Ethernet frames are themselves padded to the minimum frame size.
If the protocol field of the IPv4 header is 0x06, a TCP segment follows. The details of a TCP segment can be taken from here. Of course, Wireshark also helps you interpret the TCP segment: just click on the lines in the middle window that belong to the TCP segment or click on the raw data.
As outlined here, the interpretation of a pcapng file has many ifs and whens.
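The header walk described in this answer can be sketched as follows. This is only an illustration under stated assumptions: FrameParser and TcpPayload are made-up names, the input is the packet data of one block with link type LINKTYPE_ETHERNET, and only untagged IPv4 + TCP frames are handled (no 802.1Q tag, no IPv6). When the IPv4 total length is 0, as in the TSO case above, the sketch falls back to the number of captured bytes, so any Ethernet padding would then be counted as payload.

```csharp
using System;
using System.Buffers.Binary;

public static class FrameParser
{
    // Returns offset and length of the TCP payload inside an Ethernet frame,
    // or null when the frame is not untagged IPv4 + TCP.
    public static (int Offset, int Length)? TcpPayload(ReadOnlySpan<byte> frame)
    {
        const int EthernetHeader = 14;                 // 6 + 6 + 2 bytes
        if (frame.Length < EthernetHeader + 20) return null;
        ushort etherType = BinaryPrimitives.ReadUInt16BigEndian(frame.Slice(12, 2));
        if (etherType != 0x0800) return null;          // not IPv4

        ReadOnlySpan<byte> ip = frame.Slice(EthernetHeader);
        if ((ip[0] >> 4) != 4) return null;            // IP version
        int ipHeaderLength = (ip[0] & 0x0F) * 4;       // header length is in 32-bit words
        if (ipHeaderLength < 20 || ip.Length < ipHeaderLength + 20) return null;
        if (ip[9] != 0x06) return null;                // protocol: not TCP

        int totalLength = BinaryPrimitives.ReadUInt16BigEndian(ip.Slice(2, 2));
        // 0 (seen with TCP segmentation offload) or a truncated capture:
        // fall back to the bytes that were actually captured.
        if (totalLength == 0 || totalLength > ip.Length) totalLength = ip.Length;

        ReadOnlySpan<byte> tcp = ip.Slice(ipHeaderLength);
        int tcpHeaderLength = (tcp[12] >> 4) * 4;      // data offset, in 32-bit words
        if (tcpHeaderLength < 20) return null;

        int payloadLength = totalLength - ipHeaderLength - tcpHeaderLength;
        if (payloadLength < 0) return null;
        return (EthernetHeader + ipHeaderLength + tcpHeaderLength, payloadLength);
    }
}
```

Using the IPv4 total length where it is available, rather than the captured length, is what keeps trailing Ethernet padding out of the payload.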

Related

How can I determine the size of a UDP payload in Windows?

I'm trying to figure out how to work around a problem in Windows. I'm using C# (net5.0), but if you know the answer in C or C++ that shouldn't be a real problem, because I can call functions in DLLs without an issue.
While testing UDP handling with multicast on Windows 10, I found a problem: when my buffer is too short for the payload coming in, System.Net.Sockets.Socket.ReceiveMessageFrom fills the buffer to its maximum capacity, doesn't set the SocketOptions.Fragmented or SocketOptions.Multicast flags (and setting either before the call is made leads to a "not supported" exception), and the IPPacketInformation.Address field is null. ReceiveMessageFrom's return value is the size of my buffer, not the size of the packet. I cannot seem to get the necessary allocation size (even with SocketOptions.Peek) no matter what I do.
When my buffer is long enough for the payload coming in, (System.Net.Sockets.Socket)s.ReceiveMessageFrom fills the buffer, sets SocketOptions.Multicast, and sets the IPPacketInformation.Address field to the multicast group IP that it received from. The return value in that case is the amount of data actually received, when my buffer is larger than the data received.
On Linux, I can set SocketOptions.Fragmented and it will work correctly: the too-small buffer is filled, but the return value of ReceiveMessageFrom is set to the actual size of the incoming data. Combined with SocketOptions.Peek, this allows me to allocate a new buffer large enough to hold it, and retrieve the full data. (This is very much akin to calling e.g. Windows Registry functions with a 0-length buffer, and being told how big of a buffer you'll actually need.)
The alternative to Linux's way would be to try to allocate a buffer as large as the interface will allow, but System.Net.NetworkInformation.IPInterfaceProperties doesn't have a .Mtu member, while System.Net.NetworkInformation.IPv6InterfaceProperties does. I can't figure out how to get the maximum size of a frame, which is necessary because some drivers support a feature called "jumbo frames" that can be upwards of 64kb.
To wit: My multicastsender program sends packets that are 26 bytes long, containing the entire uppercase US alphabet from A to Z.
My multicastreceiver program is where I have been making the changes. When I set my receive buffer in that program to less than 26 bytes long, I get the problematic behavior. When I set it to 26 bytes or more long, I get the correct behavior.
This question is not about OS-level or Winsock-level buffers. I will tune those separately, if they need to be. I am specifically trying to ensure that the data that I retrieve with ReceiveMessageFrom is not truncated. (When the OS level doesn't have enough buffer space for the data to be queued, it simply drops the entire packet. It does not write a partial packet to the queue. My application is receiving partial data from the call to ReceiveMessageFrom, and it's not indicating that there is anything that was truncated. I need to figure out how to work around this.)
I am not okay with losing packet space by encoding the size of the data in the area reserved for the data itself, as that will take at least 2 bytes, and I already need to squeeze a lot in here. The UDP header already has a Length field, and that field contains what I need, but I have no access to it.
Thanks for your help!

Client-Server with TCP with compression

I have a program that works like a chat.
Client and server are connected with 2 TCP sockets, one for incoming messages and another for outgoing messages.
Sometimes the messages can be very big (e.g. 2 MB of text), so I want to compress them before sending them over the channel.
The problem is that I don't know how to find the start and end of a compressed message.
Right now I use two special characters to mark the start and end of a message, but with compression those bytes can appear inside the data and cause errors.
Is there a type of compression that doesn't use some specific bytes?
I use C# to open and manage sockets, so I need a compression that works under Windows.

Prepend the length to the start of each message. The receiver then just reads the length and, after that, reads exactly that many bytes.
It will look like:
|length|data|..|..|length|data|..|..|..|
And more precisely:
|3|26|125|36|4|12|45|16|34|
where 3 and 4 are the lengths.
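A minimal sketch of this length-prefix framing in C# might look like the following. The class and method names are made up, and the 4-byte big-endian prefix is an assumption (the example above uses single-byte lengths, which would cap a message at 255 bytes). The payload would be the already-compressed bytes.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

public static class LengthPrefixedFraming
{
    // Writes a 4-byte big-endian length followed by the payload itself.
    public static void WriteMessage(Stream stream, byte[] payload)
    {
        var prefix = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(prefix, payload.Length);
        stream.Write(prefix, 0, prefix.Length);
        stream.Write(payload, 0, payload.Length);
    }

    // Reads one message. Returns false if the stream ended before a new message started.
    public static bool TryReadMessage(Stream stream, out byte[] payload)
    {
        payload = Array.Empty<byte>();
        var prefix = new byte[4];
        if (!Fill(stream, prefix)) return false;

        int length = BinaryPrimitives.ReadInt32BigEndian(prefix);
        if (length < 0) throw new InvalidDataException("Negative message length.");

        payload = new byte[length];
        if (!Fill(stream, payload)) throw new EndOfStreamException("Stream ended inside a message.");
        return true;
    }

    // A single Read may return fewer bytes than requested, so loop until the buffer is full.
    static bool Fill(Stream stream, byte[] buffer)
    {
        int offset = 0;
        while (offset < buffer.Length)
        {
            int n = stream.Read(buffer, offset, buffer.Length - offset);
            if (n == 0) return false;
            offset += n;
        }
        return true;
    }
}
```

On a socket, the stream would be the NetworkStream of the connection; the receiver never has to scan the data for delimiters, so any byte value may appear in the payload.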

You just need an escaping scheme.
Send STX (Start of Text) at the start.
Send ETX (End of Text) at the end.
If an STX or ETX appears in the data, prefix it with ESC (escape).
If an ESC appears in the data, prefix it with ESC.
At the receiver:
The first byte should be STX, otherwise you have a bug. Discard it.
After that, if a byte is ESC, discard it and accept the next byte, whatever it is.
Otherwise, if the byte is ETX, discard it and stop reading.
Otherwise, accept the byte.
The problem with the length-word prefix suggested in another answer is that you can't know the length without doing the compression first, which costs time and space.
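The escaping scheme above could be sketched like this. The names are hypothetical, and the concrete byte values are an assumption: the ASCII control codes STX (0x02), ETX (0x03) and ESC (0x1B). Any three distinct byte values would work the same way. For brevity the sketch works on whole arrays; a real receiver would apply the same rules byte by byte as data arrives.

```csharp
using System;
using System.Collections.Generic;

public static class ByteStuffing
{
    const byte STX = 0x02, ETX = 0x03, ESC = 0x1B;

    // Wraps data in STX ... ETX and prefixes every STX, ETX or ESC in the data with ESC.
    public static byte[] Encode(byte[] data)
    {
        var output = new List<byte>(data.Length + 2) { STX };
        foreach (byte b in data)
        {
            if (b == STX || b == ETX || b == ESC) output.Add(ESC);
            output.Add(b);
        }
        output.Add(ETX);
        return output.ToArray();
    }

    // Reverses Encode: stops at the first unescaped ETX.
    public static byte[] Decode(byte[] framed)
    {
        if (framed.Length == 0 || framed[0] != STX)
            throw new FormatException("Frame does not start with STX.");

        var output = new List<byte>();
        for (int i = 1; i < framed.Length; i++)
        {
            byte b = framed[i];
            if (b == ESC)
            {
                // Discard the ESC and accept the next byte, whatever it is.
                if (++i >= framed.Length) throw new FormatException("Dangling ESC.");
                output.Add(framed[i]);
            }
            else if (b == ETX)
            {
                return output.ToArray();
            }
            else
            {
                output.Add(b);
            }
        }
        throw new FormatException("Frame does not end with ETX.");
    }
}
```

The cost is that each escaped byte takes two bytes on the wire, so data full of control bytes can grow to roughly twice its size in the worst case.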

Networkstream.ReadAsync behaviour. Checksums not matching

I'm encountering a pretty difficult problem in an asynchronous RSS feed aggregator application I am creating.
The system is built from a communicating client and server. The server collects RSS feed items from different sources and then distributes them to clients based on their subscriptions.
The issue that I am having is this: my specification states that I must implement a pre-defined byte-based protocol to communicate between the client and server. Part of this protocol is that when the server sends a payload to the client, a checksum must be created and sent with it; when the payload is received by the client, the checksum is recalculated from the received payload. We then compare the two checksums, and if they do not match then we have an issue.
The thing that is baffling me is that, sometimes the checksums will match perfectly and other times they will not, using the same algorithm for all sending and receiving operations.
Here is the method that checks incoming ->
    private bool CheckChecksum(uint receivedChecksum, IEnumerable<byte> bytes) {
        uint ourChecksum = 0;
        foreach (var b in bytes)
            ourChecksum += b;
        if (ourChecksum != receivedChecksum) {
            Debug.WriteLine("received {0}, calculated {1}", receivedChecksum, ourChecksum);
            _writeOutToFile = true;
        }
        return receivedChecksum == ourChecksum;
    }
and when calculating it before sending ->
    uint checksum = payloadBytes.Aggregate<byte, uint>(0,
        (currentChecksum, currentByte) => currentChecksum + currentByte);
Since the behaviour seems to occur on updates that have very large checksums (generally 2 million+) then the only thing I can think that would be causing it is these large byte[] sizes.
As you can see above, I write out to a file the contents of the payload when the checksums do not match. What I found was that the byte[] just ends early (despite the fact that the reading/writing lengths match on both the client and server). The end of the byte[] is just filled with empty spaces.
I am using NetworkStream.ReadAsync and WriteAsync to do all of my I/O operations. I thought that it may be a fault with the way I am using or understanding these.
I realise that this is a very difficult and vague problem (because I'm not sure what is going wrong myself) but if you need any further information I will provide it.
Here is some extra information:
The checksums are of type uint, and their endianness is encoded correctly at both ends of the system.
Payloads are strings which are encoded into ASCII bytes and decoded on the client-side.
All messages are sent with a checksum and a payload length. The payload length states how many bytes to read off the stream. I read that many bytes from the NetworkStream; when the checksums do not match, the payload turns into white space after a varying length (but is correct before this point).
Sometimes (rarely) the checksums will match even when they are large.
I am running both the client and server locally on one machine.
I have tried having a Thread.Sleep(5000) before the read and this makes no difference.
Here is a sample of sent and received data that fails.
Sent from server - http://pastebin.com/jvbCbQmJ
Received by client - http://pastebin.com/eNkWymwi

Your receiving code tries to read before all the bytes have arrived.
When you send a large chunk of data in one write, ReadAsync completes as soon as some data is available and returns the number of bytes it actually read, which can be fewer than the length you asked for. If you ignore that return value, the rest of your buffer is never filled and keeps its initial zeros.
You can either split the message on the server and read it in parts on the client, or keep reading what is available until you have received everything or a timeout has expired.
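The second option, reading until everything has arrived, can be sketched as a small loop. StreamReading and ReadExactAsync are made-up names for this illustration; the only API it relies on is Stream.ReadAsync, whose return value is the number of bytes actually read (0 means the other side closed the connection). It has no timeout; that would have to be added separately.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class StreamReading
{
    // Keeps calling ReadAsync until exactly 'count' bytes have been received.
    public static async Task<byte[]> ReadExactAsync(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            // May return fewer bytes than requested; continue where the last read stopped.
            int n = await stream.ReadAsync(buffer, offset, count - offset);
            if (n == 0)
                throw new EndOfStreamException(
                    $"Stream ended after {offset} of {count} bytes.");
            offset += n;
        }
        return buffer;
    }
}
```

With the protocol from the question, the client would first read the payload length and then call ReadExactAsync(networkStream, payloadLength) before computing the checksum. On .NET 7 and later, the built-in Stream.ReadExactlyAsync does the same job.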

How to read bytes from SerialPort.BaseStream without Length

I want to use the Stream class to read/write data to/from a serial port. I use the BaseStream to get the stream (link below) but the Length property doesn't work. Does anyone know how I can read the full buffer without knowing how many bytes there are?
http://msdn.microsoft.com/en-us/library/system.io.ports.serialport.basestream.aspx

You can't. That is, you can't guarantee that you've received everything if all you have is the BaseStream.
There are two ways you can know if you've received everything:
Send a length word as the first 2 or 4 bytes of the packet. That says how many bytes will follow. Your reader then reads that length word, reads that many bytes, and knows it's done.
Agree on a record separator. That works great for text. For example you might decide that a null byte or an end-of-line character signals the end of the data. This is somewhat more difficult to do with arbitrary binary data, but possible. See comment.
Or, depending on your application, you can do some kind of timing. That is, if you haven't received anything new for X number of seconds (or milliseconds?), you assume that you've received everything. That has the obvious drawback of not working well if the sender is especially slow.

Maybe you can try SerialPort.BytesToRead property.

H.225 User Information Packet Parsing

I'm writing some code using PacketDotNet and SharpPCap to parse H.225 packets for a VOIP phone system. I've been using Wireshark to look at the structure, but I'm stuck. I've been using This as a reference.
Most of the H.225 packets I see are user information type with an empty message body and the actual information apparently shows up as a list of NonStandardControls in Wireshark. I thought I'd just extract out these controls and parse them later, but I don't really know where they start.
In almost all cases, the items start at the 10th byte of the H.225 data. Each item appears to begin with the length which is recorded as 2 bytes. However, I am getting a packet that has items starting at the 11th byte.
The only difference I see in this packet is something in the message body supposedly called open type length which has a value of 1, whereas the rest all appear to be 0. Would the items start at 10 + open type length? Is there some document that explains what this open type length is for?
Thanks.

H.225 doesn't use a fixed-length encoding; it uses ASN.1 PER encoding (not BER).
You probably won't find a C# library. OPAL is adding a C API if you are able to use that.
