Wireshark: From Reading Data to Protocol Dissection


Reading the Data
1. Input from a Network Interface
2. Offline Capture File Input
Preprocessing Before Dissection
Jumping Between Dissectors in Wireshark
Reference

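Reading the Data

Wireshark reads packet data in two ways: live, directly from a network interface, or from packets that were previously saved to a file. Let's look at how the two differ.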

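Input from a Network Interface

Data coming in from a network interface is captured by dumpcap, which calls libpcap or WinPcap. Once captured, the data is passed over a pipe to the main process for further handling. The process looks roughly like this:

[Figure: live protocol analysis]

As the figure above shows, when Wireshark captures, the packets are actually captured by a separate process and handed to the main process through a pipe. According to the official documentation, the reason for this design is to keep Wireshark from running with more privileges than it needs: capturing packets requires root privileges, and with this split only dumpcap has to be granted them, while protocol analysis runs with ordinary user privileges.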
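The split between a capturing process and an analysing process can be sketched in a few lines of POSIX C. This is only an illustration of the idea, not dumpcap's code: the function names and the length-prefixed record format are made up for this example, and dumpcap's real pipe protocol is different and not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes. Returns 0 on success. */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes. Returns 1 on success, 0 on clean EOF, -1 on error. */
static int read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n < 0)
            return -1;
        if (n == 0)
            return got == 0 ? 0 : -1;  /* EOF inside a record is an error */
        got += (size_t)n;
    }
    return 1;
}

/* Child role (stands in for dumpcap): emit fake packets, each written as a
   32-bit length followed by that many payload bytes. Never returns. */
static void run_capture_child(int out_fd, int n_packets)
{
    for (int i = 0; i < n_packets; i++) {
        unsigned char payload[64];
        uint32_t len = (uint32_t)(16 + i);  /* made-up packet length */
        for (uint32_t j = 0; j < len; j++)
            payload[j] = (unsigned char)i;
        if (write_all(out_fd, &len, sizeof len) != 0 ||
            write_all(out_fd, payload, len) != 0)
            _exit(1);
    }
    _exit(0);
}

/* Parent role (stands in for the unprivileged main process): fork the
   capture child, then read packets from the pipe until EOF.
   Returns the number of packets received, or -1 on error. */
int capture_via_child(int n_packets)
{
    int fds[2];
    assert(n_packets >= 0 && n_packets <= 48);  /* keeps 16 + i within payload[64] */
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        /* Only this process would need elevated privileges to open a device. */
        run_capture_child(fds[1], n_packets);
    }

    close(fds[1]);  /* the parent only reads */
    int received = 0;
    for (;;) {
        uint32_t len;
        unsigned char payload[64];
        int rc = read_all(fds[0], &len, sizeof len);
        if (rc == 0)
            break;  /* child closed the pipe: capture finished */
        if (rc < 0 || len > sizeof payload ||
            read_all(fds[0], payload, len) != 1) {
            received = -1;
            break;
        }
        received++;  /* dissection would happen here, without privileges */
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return received;
}
```

Calling capture_via_child(3) forks one child and returns the number of records the parent read back from the pipe. The point of the design is that everything after fork() in the parent runs without the rights the child would need.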

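Offline Capture File Input

First, look at the following flowchart:

[Figure: offline protocol analysis]

As the figure above shows, a local capture file is read directly by the Wiretap module.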
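To give a feel for what reading a capture file involves, here is a minimal sketch that walks a classic pcap image held in memory. It is not Wiretap code: Wiretap supports many file formats, while this example only assumes the classic pcap layout (a 24-byte file header followed by records with 16-byte headers) written in the host's byte order. The names pcap_summary and pcap_scan are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCAP_MAGIC_USEC 0xa1b2c3d4u  /* classic pcap, microsecond timestamps */
#define PCAP_FILE_HDR_LEN 24
#define PCAP_REC_HDR_LEN  16

typedef struct {
    uint32_t snaplen;    /* maximum captured bytes per packet */
    uint32_t linktype;   /* link-layer type, 1 means Ethernet */
    size_t   n_records;  /* number of packet records found */
    size_t   n_bytes;    /* sum of the captured lengths */
} pcap_summary;

/* Read a 32-bit value stored in the host's byte order. */
static uint32_t rd32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Walk every record of a pcap image. Returns 0 on success and -1 if the
   image is malformed, truncated, or not in host byte order. */
int pcap_scan(const unsigned char *buf, size_t len, pcap_summary *out)
{
    assert(buf != NULL && out != NULL);
    if (len < PCAP_FILE_HDR_LEN || rd32(buf) != PCAP_MAGIC_USEC)
        return -1;

    memset(out, 0, sizeof *out);
    out->snaplen  = rd32(buf + 16);  /* after magic, version, thiszone, sigfigs */
    out->linktype = rd32(buf + 20);

    size_t off = PCAP_FILE_HDR_LEN;
    while (off < len) {
        if (len - off < PCAP_REC_HDR_LEN)
            return -1;                          /* truncated record header */
        uint32_t incl_len = rd32(buf + off + 8);  /* after ts_sec and ts_usec */
        off += PCAP_REC_HDR_LEN;
        if (incl_len > len - off)
            return -1;                          /* truncated packet data */
        off += incl_len;                        /* skip the packet bytes */
        out->n_records++;
        out->n_bytes += incl_len;
    }
    return 0;
}
```

In Wireshark this work is hidden behind the Wiretap calls that appear later in this article, such as wtap_open_offline and wtap_read, so the rest of the program never has to know which file format it is reading.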

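Preprocessing Before Dissection

How packets arrive through the pipe and how Wtap reads them will not be covered in detail here; both are easy to follow in the code. The questions that matter when reading the Wireshark source are how a packet is handed to Epan for analysis and what has to be in place before that happens.

First, here is the code fragment that reads the data and starts processing it: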

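capture_file *cf = xxx;   /* an already opened capture file */
...
/* One dissection session, reused for every packet in the loop */
edt = epan_dissect_new(cf->epan, create_proto_tree, print_packet_info && print_details);
while (to_read-- && cf->provider.wth) {
    wtap_cleareof(cf->provider.wth);   /* clear EOF so newly written packets can be read */
    /* Read the next record; on success data_offset holds its offset in the file */
    ret = wtap_read(cf->provider.wth, &err, &err_info, &data_offset);
    reset_epan_mem(cf, edt, create_proto_tree, print_packet_info && print_details);
    /* Handling of a failed read (ret == FALSE) is omitted in this excerpt */
    ret = process_packet_single_pass(cf, edt, data_offset,
                                     wtap_phdr(cf->provider.wth),
                                     wtap_buf_ptr(cf->provider.wth), tap_flags);
}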

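The corresponding flowchart is shown below. Everything from the initialization of the frame data onward happens inside process_packet_single_pass, and "data input" means the actual read from the network interface or from a file:

[Figure: protocol analysis flow]

The steps above show that two things must be prepared before protocol analysis can run:

An instance of the Epan dissect object
The data to be analyzed

How are they prepared? The code fragment above already answers that: every argument can be obtained, directly or indirectly, from capture_file. The capture_file structure is defined as follows: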

typedef struct _capture_file {
  epan_t      *epan;
  file_state   state;                /* Current state of capture file */
  gchar       *filename;             /* Name of capture file */
  ...
  gboolean     is_tempfile;          /* Is capture file a temporary file? */
  ...
  gint64       f_datalen;            /* Size of capture file data (uncompressed) */
  guint16      cd_t;                 /* File type of capture file */
  unsigned int open_type;            /* open_routine index+1 used, if selected, or WTAP_TYPE_AUTO */
  ...
  guint32      count;                /* Total number of frames */
  ...
  gboolean     drops_known;          /* TRUE if we know how many packets were dropped */
  guint32      drops;                /* Dropped packets */
  nstime_t     elapsed_time;         /* Elapsed time */
  int          snap;                 /* Maximum captured packet length; 0 if unknown */
  ...
  struct wtap_pkthdr phdr;           /* Packet header */
  Buffer       buf;                  /* Packet data */
  /* packet provider */
  struct packet_provider_data provider;
  /* frames */
  ...
  column_info  cinfo;                /* Column formatting information */
  ...
} capture_file;

The following fragment shows how a capture_file object is set up:

cf_status_t
cf_open(capture_file *cf, const char *fname, unsigned int type, gboolean is_tempfile, int *err)
{
  wtap  *wth;
  gchar *err_info;

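  wth = wtap_open_offline(fname, type, err, &err_info, perform_two_pass_analysis);
  if (wth == NULL)
    goto fail;   /* open failed; without this jump the fail label below is unreachable */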
  epan_free(cf->epan);
  cf->epan = tshark_epan_new(cf);

  cf->provider.wth = wth;
  cf->f_datalen = 0; /* not used, but set it anyway */

  cf->filename = g_strdup(fname);

  cf->is_tempfile = is_tempfile;

  cf->unsaved_changes = FALSE;

  cf->cd_t      = wtap_file_type_subtype(cf->provider.wth);
  cf->open_type = type;
  cf->count     = 0;
  cf->drops_known = FALSE;
  cf->drops     = 0;
  cf->snap      = wtap_snapshot_length(cf->provider.wth);
  nstime_set_zero(&cf->elapsed_time);
  cf->provider.ref = NULL;
  cf->provider.prev_dis = NULL;
  cf->provider.prev_cap = NULL;

  cf->state = FILE_READ_IN_PROGRESS;

  wtap_set_cb_new_ipv4(cf->provider.wth, add_ipv4_name);
  wtap_set_cb_new_ipv6(cf->provider.wth, (wtap_new_ipv6_callback_t) add_ipv6_name);

  return CF_OK;

fail:
  return CF_ERROR;
}

Now let's see what the entry points to Wireshark's protocol dissection look like.

1. void epan_dissect_run(epan_dissect_t *edt, int file_type_subtype,
    struct wtap_pkthdr *phdr, tvbuff_t *tvb, frame_data *fd,
    column_info *cinfo);

2. void epan_dissect_run_with_taps(epan_dissect_t *edt, int file_type_subtype,
    struct wtap_pkthdr *phdr, tvbuff_t *tvb, frame_data *fd,
    column_info *cinfo);

3. void epan_dissect_file_run(epan_dissect_t *edt, struct wtap_pkthdr *phdr,
    tvbuff_t *tvb, frame_data *fd, column_info *cinfo);

4. void epan_dissect_file_run_with_taps(epan_dissect_t *edt, struct wtap_pkthdr *phdr,
    tvbuff_t *tvb, frame_data *fd, column_info *cinfo);

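Entry points 1 and 2 process frame data, while 3 and 4 process file data; each pair offers one variant with taps and one without. We use epan_dissect_run as the example here.

epan_dissect_run takes the following six parameters: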

edt
file_type_subtype
phdr
tvb
fd
cinfo

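edt is an epan_dissect_t object. It is obtained from epan_dissect_new, which in turn needs the epan field of capture_file.

file_type_subtype is the file type recorded in capture_file, that is, its cd_t field.

phdr is obtained with wtap_phdr(capture_file->provider.wth); it is effectively capture_file->provider.wth->phdr.

tvb is a pointer to a buffer structure that Wireshark defines for itself. A new one must be created before each call into the dissector to hold the current frame, and when it is created, the pointer to the frame data held by capture_file->provider is assigned to the buffer's data field.

fd is a pointer to a frame_data structure, which describes the frame. It is enough to declare such a variable before the call, but it must be initialized with frame_data_init.

cinfo is simply a pointer to the capture_file->cinfo field.

To sum up: obtain a capture_file object, then pick the right fields or build the right objects as each parameter requires, and pass them to the dissection entry point. And where does the data come from? It is stored in capture_file->provider.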
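The per-packet preparation described above can be modelled with a toy example. None of the types or functions below are Wireshark's: every mini_ name is a hypothetical stand-in chosen only to mirror the role of phdr, tvb, fd and edt, and the "dissection" just counts packets and bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real arguments; not Wireshark types. */
typedef struct { uint32_t caplen; uint32_t len; } mini_pkthdr;     /* role of phdr */
typedef struct { const unsigned char *data; uint32_t length; } mini_tvb;  /* role of tvb */
typedef struct { uint32_t num; uint32_t cap_len; uint64_t cum_bytes; } mini_frame;  /* role of fd */
typedef struct { uint32_t packets; uint64_t bytes; } mini_session;  /* role of edt */

/* One record as the reader hands it over: header plus raw bytes. */
typedef struct {
    const unsigned char *data;
    mini_pkthdr hdr;
} mini_record;

/* Fill in the per-frame metadata before dissection (compare frame_data_init). */
static void mini_frame_init(mini_frame *fd, uint32_t num,
                            const mini_pkthdr *hdr, uint64_t prev_cum)
{
    fd->num = num;
    fd->cap_len = hdr->caplen;
    fd->cum_bytes = prev_cum + hdr->len;
}

/* Toy "dissection": only records that a packet of this size was seen. */
static void mini_dissect_run(mini_session *s, const mini_pkthdr *hdr,
                             const mini_tvb *tvb, const mini_frame *fd)
{
    assert(tvb->length == hdr->caplen);  /* the buffer wraps exactly the captured bytes */
    (void)fd;
    s->packets++;
    s->bytes += tvb->length;
}

/* For every record: build the frame metadata, wrap the raw bytes in a
   buffer view, then call the entry point. Returns the packet count. */
uint32_t mini_process_all(mini_session *s, const mini_record *recs, size_t n)
{
    uint64_t cum = 0;
    for (size_t i = 0; i < n; i++) {
        mini_frame fd;
        mini_frame_init(&fd, (uint32_t)(i + 1), &recs[i].hdr, cum);
        cum = fd.cum_bytes;

        mini_tvb tvb = { recs[i].data, recs[i].hdr.caplen };
        mini_dissect_run(s, &recs[i].hdr, &tvb, &fd);
    }
    return s->packets;
}
```

The session object is created once and reused, while the frame metadata and the buffer view are rebuilt for each packet, which matches the order of steps described in the text.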

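Jumping Between Dissectors in Wireshark

As described above, protocol dissection in Wireshark relies mainly on the epan_dissect_t object. Each packet is associated with an epan_dissect_t object, which stores all the information produced during dissection; the most important part is the Protocol Tree. In the Wireshark UI the Protocol Tree appears as in the figure below:

[Figure: Protocol Tree structure]

The figure above is the Protocol Tree of one packet. To pick up where we left off: calling epan_dissect_run starts the dissection, so let's look at how epan_dissect_run is implemented.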

void epan_dissect_run(epan_dissect_t *edt, int file_type_subtype,
    struct wtap_pkthdr *phdr, tvbuff_t *tvb, frame_data *fd,
    column_info *cinfo)
{
    wmem_enter_packet_scope();
    dissect_record(edt, file_type_subtype, phdr, tvb, fd, cinfo);

    /* At this point, all memory allocated during dissection through the
       wmem_*[malloc|realloc|alloc0] style interfaces is freed */
    wmem_leave_packet_scope();
}

The code above calls dissect_record, one of the two entry points implemented in packet.c. The other is dissect_file, which is called when the epan_dissect_file* interfaces are used.
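As a side note on the wmem_enter_packet_scope and wmem_leave_packet_scope calls in epan_dissect_run: the idea of a scope whose allocations are all released together can be sketched as follows. This is a simplified illustration, not wmem's implementation; packet_scope, scope_enter, scope_alloc and scope_leave are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation. The union keeps the user
   memory that follows it suitably aligned for any type. */
typedef union scope_block {
    union scope_block *next;
    max_align_t align;
} scope_block;

typedef struct {
    scope_block *head;  /* singly linked list of live allocations */
    size_t live;        /* number of allocations made in this scope */
    int active;         /* non-zero between enter and leave */
} packet_scope;

/* Start a scope: nothing has been allocated yet. */
void scope_enter(packet_scope *s)
{
    assert(!s->active);
    s->head = NULL;
    s->live = 0;
    s->active = 1;
}

/* Allocate memory owned by the scope; the caller never frees it directly. */
void *scope_alloc(packet_scope *s, size_t size)
{
    assert(s->active);
    scope_block *b = malloc(sizeof *b + size);
    if (b == NULL)
        return NULL;
    b->next = s->head;
    s->head = b;
    s->live++;
    return b + 1;  /* user memory starts right after the header */
}

/* End the scope: release every allocation made since scope_enter.
   Returns the number of blocks freed. */
size_t scope_leave(packet_scope *s)
{
    assert(s->active);
    size_t freed = 0;
    while (s->head != NULL) {
        scope_block *next = s->head->next;
        free(s->head);
        s->head = next;
        freed++;
    }
    s->live = 0;
    s->active = 0;
    return freed;
}
```

The benefit of this pattern for a dissector is that temporary buffers created while analysing one packet need no individual free calls; leaving the scope cleans up all of them before the next packet is processed.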

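After some necessary initialization of the current epan_dissect_t object, dissect_record jumps to the module that handles frame data, which is implemented in epan/dissectors/packet-frame.c.

At this point the biggest open question is how dissectors jump from one to the next. The figure below illustrates the mechanism.

[Figure: jump table structure]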

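Wireshark uses several global hash tables to jump between dissectors; the figure above shows the two most basic and most important ones. As mentioned in Wireshark (1): Initialization, these tables are set up at program startup, which is also when each protocol registers its handles in them. As the figure shows, Registered Dissectors holds the handle of every dissector, one dissector per protocol. The figure shows dissectors registered for frame, file, data, ethernet, ip, tcp and so on; frame, file and data are not protocols, of course, but they are treated the same way here. With this table, once dissection knows which protocol comes next, it can look up that protocol's handle directly.

The role of the Dissector Table should now be easy to see as well. During initialization some protocols register fields that are used for jumping; ip.proto, for example, is registered by the IP protocol. When TCP, UDP and similar protocols initialize, they register their own protocol number and handle as a key/value pair in the hash table of the Dissector Table that belongs to ip.proto. When dissection reaches the IP layer, it only has to look up the value of ip.proto in that hash table to find the handle.

One might ask: if a handle can be found through either table, why add complexity by having several different tables? From the code's design there are a few considerations.

With a Dissector Table, the jump does not need to care about the specific value of the field; it just does a hash lookup, which keeps the code simpler.
Relying more on Dissector Tables lowers the collision rate of lookups, or can eliminate collisions entirely, because it effectively partitions the protocols into groups and each lookup searches only the relevant group.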
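The two kinds of table can be sketched in plain C. This is a simplified model under stated assumptions, not Wireshark's implementation: every type and function name below is invented for the example, the name registry uses a linear scan rather than a hash table, the per-field table is a small fixed-size chained hash, and the stub dissectors simply return their protocol number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int (*dissect_fn)(const unsigned char *data, size_t len);

/* One entry of the "registered dissectors" table: name -> handler. */
typedef struct { const char *name; dissect_fn fn; } dissector_entry;

#define MAX_DISSECTORS 16
#define TABLE_BUCKETS  8
#define MAX_BINDINGS   16

static dissector_entry registered[MAX_DISSECTORS];
static size_t n_registered;

/* A "dissector table" for one field (for example ip.proto): value -> handle. */
typedef struct binding {
    uint32_t key;
    const dissector_entry *handle;
    struct binding *next;
} binding;

typedef struct {
    const char *name;
    binding *buckets[TABLE_BUCKETS];
    binding pool[MAX_BINDINGS];
    size_t used;
} dissector_table;

/* Add a dissector to the global registry; returns its handle or NULL if full. */
const dissector_entry *register_dissector_sketch(const char *name, dissect_fn fn)
{
    assert(name != NULL && fn != NULL);
    if (n_registered == MAX_DISSECTORS)
        return NULL;
    registered[n_registered].name = name;
    registered[n_registered].fn = fn;
    return &registered[n_registered++];
}

/* Look a dissector up by protocol name. */
const dissector_entry *find_dissector_sketch(const char *name)
{
    for (size_t i = 0; i < n_registered; i++)
        if (strcmp(registered[i].name, name) == 0)
            return &registered[i];
    return NULL;
}

/* Bind a field value to a handle, as TCP does for ip.proto == 6. */
int table_add_uint(dissector_table *t, uint32_t key, const dissector_entry *h)
{
    assert(t != NULL && h != NULL);
    if (t->used == MAX_BINDINGS)
        return -1;
    binding *b = &t->pool[t->used++];
    b->key = key;
    b->handle = h;
    b->next = t->buckets[key % TABLE_BUCKETS];
    t->buckets[key % TABLE_BUCKETS] = b;
    return 0;
}

/* Find the handle bound to a field value, or NULL if none is bound. */
const dissector_entry *table_lookup_uint(const dissector_table *t, uint32_t key)
{
    for (const binding *b = t->buckets[key % TABLE_BUCKETS]; b != NULL; b = b->next)
        if (b->key == key)
            return b->handle;
    return NULL;
}

/* Stub payload dissectors: each just reports its protocol number. */
int dissect_tcp_stub(const unsigned char *d, size_t n) { (void)d; (void)n; return 6; }
int dissect_udp_stub(const unsigned char *d, size_t n) { (void)d; (void)n; return 17; }

/* IP-layer step: read the protocol byte of the IPv4 header and jump to
   whatever is bound to it. Returns the callee's result, 0 if nothing is
   bound to that protocol number, and -1 for a malformed header. */
int dissect_ip_sketch(const dissector_table *ip_proto,
                      const unsigned char *pkt, size_t len)
{
    if (len < 20)
        return -1;
    size_t hdr_len = (size_t)(pkt[0] & 0x0f) * 4;  /* IHL is in 32-bit words */
    if (hdr_len < 20 || hdr_len > len)
        return -1;
    const dissector_entry *next = table_lookup_uint(ip_proto, pkt[9]);
    if (next == NULL)
        return 0;
    return next->fn(pkt + hdr_len, len - hdr_len);
}
```

Note that dissect_ip_sketch contains no switch over protocol numbers. Adding a new payload protocol only takes another table_add_uint call at startup, which is the simplification the first point above describes, and each lookup only searches the bindings of one field, which is the grouping the second point describes.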

Reference

WireShark Source Code
