<html>
<head>
<title>pcre2serialize specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2serialize man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a>
<li><a name="TOC2" href="#SEC2">SECURITY CONCERNS</a>
<li><a name="TOC3" href="#SEC3">SAVING COMPILED PATTERNS</a>
<li><a name="TOC4" href="#SEC4">RE-USING PRECOMPILED PATTERNS</a>
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
<li><a name="TOC6" href="#SEC6">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a><br>
<P>
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
<b> int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
<b> pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_encode(const pcre2_code **<i>codes</i>,</b>
<b> int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
<b> PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
<br>
<br>
If you are running an application that uses a large number of regular
expression patterns, it may be useful to store them in a precompiled form
instead of having to compile them every time the application is run. However,
if you are using the just-in-time optimization feature, it is not possible to
save and reload the JIT data, because it is position-dependent. The host on
which the patterns are reloaded must be running the same version of PCRE2, with
the same code unit width, and must also have the same endianness, pointer width
and PCRE2_SIZE type. For example, patterns compiled on a 32-bit system using
PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor can they be
reloaded using the 8-bit library.
</P>
<P>
Note that "serialization" in PCRE2 does not convert compiled patterns to an
abstract format like Java or .NET serialization. The serialized output is
really just a bytecode dump, which is why it can only be reloaded in the same
environment as the one that created it. Hence the restrictions mentioned above.
Applications that are not statically linked with a fixed version of PCRE2 must
be prepared to recompile patterns from their sources, in order to be immune to
PCRE2 upgrades.
</P>
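<P>
One way an application might detect an environment mismatch before attempting
a reload is to store a small fingerprint of the saving environment next to the
serialized bytes. The following is a minimal sketch under stated assumptions:
the type <b>env_fingerprint</b> and the functions <b>make_fingerprint()</b> and
<b>fingerprint_matches()</b> are hypothetical application code, not part of
PCRE2, and the version string is assumed to be supplied by the caller (for
example, from <b>pcre2_config()</b> with PCRE2_CONFIG_VERSION).
</P>

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical application-level header; not part of the PCRE2 API. */
typedef struct {
    char    version[32];      /* PCRE2 version string supplied by the caller */
    uint8_t code_unit_width;  /* 8, 16 or 32 */
    uint8_t pointer_bytes;    /* sizeof(void *) */
    uint8_t size_bytes;       /* sizeof(size_t), standing in for PCRE2_SIZE */
    uint8_t little_endian;    /* 1 on little-endian hosts, 0 otherwise */
} env_fingerprint;

/* Describe the environment this code is running in. */
env_fingerprint make_fingerprint(const char *version, int code_unit_width)
{
    env_fingerprint fp;
    const uint16_t probe = 1;

    memset(&fp, 0, sizeof fp);  /* zero all bytes so memcmp() is meaningful */
    snprintf(fp.version, sizeof fp.version, "%s", version);
    fp.code_unit_width = (uint8_t)code_unit_width;
    fp.pointer_bytes = (uint8_t)sizeof(void *);
    fp.size_bytes = (uint8_t)sizeof(size_t);
    fp.little_endian = *(const uint8_t *)&probe;  /* first byte is 1 if LE */
    return fp;
}

/* Returns 1 if the saved fingerprint equals the current one, 0 otherwise. */
int fingerprint_matches(const env_fingerprint *saved,
                        const env_fingerprint *current)
{
    return memcmp(saved, current, sizeof *saved) == 0;
}
```

<P>
If the fingerprints differ, the application would fall back to recompiling the
patterns from their sources instead of decoding the saved data.
</P>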
<br><a name="SEC2" href="#TOC1">SECURITY CONCERNS</a><br>
<P>
The facility for saving and restoring compiled patterns is intended for use
within individual applications. As such, the data supplied to
<b>pcre2_serialize_decode()</b> is expected to be trusted data, not data from
arbitrary external sources. There is only some simple consistency checking, not
complete validation of what is being re-loaded. Corrupted data may cause
undefined results. For example, if the length field of a pattern in the
serialized data is corrupted, the deserializing code may read beyond the end of
the byte stream that is passed to it.
</P>
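<P>
Because the decoder does not fully validate its input, an application may want
its own check that a saved byte stream has not been damaged in storage. The
sketch below uses a 64-bit FNV-1a hash purely as an illustration; the function
names <b>fnv1a64()</b> and <b>stream_is_intact()</b> are hypothetical and not
part of PCRE2. A plain checksum like this catches accidental corruption only
and does not make untrusted data safe to decode.
</P>

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a hash over a byte buffer. This detects accidental
   corruption only; it gives no protection against deliberate tampering. */
uint64_t fnv1a64(const uint8_t *data, size_t length)
{
    uint64_t hash = UINT64_C(0xcbf29ce484222325);  /* FNV offset basis */
    for (size_t i = 0; i < length; i++) {
        hash ^= data[i];
        hash *= UINT64_C(0x100000001b3);           /* FNV prime */
    }
    return hash;
}

/* Returns 1 if the buffer still hashes to the value recorded at save time. */
int stream_is_intact(const uint8_t *data, size_t length, uint64_t saved_hash)
{
    return fnv1a64(data, length) == saved_hash;
}
```

<P>
The hash would be computed over the serialized bytes when they are saved,
stored alongside them, and compared before the bytes are handed to the decoder.
</P>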
<br><a name="SEC3" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
<P>
Before compiled patterns can be saved they must be serialized, which in PCRE2
means converting the pattern to a stream of bytes. A single byte stream may
contain any number of compiled patterns, but they must all use the same
character tables. A single copy of the tables is included in the byte stream
(its size is 1088 bytes). For more details of character tables, see the
<a href="pcre2api.html#localesupport">section on locale support</a>
in the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation.
</P>
<P>
The function <b>pcre2_serialize_encode()</b> creates a serialized byte stream
from a list of compiled patterns. Its first two arguments specify the list,
being a pointer to a vector of pointers to compiled patterns, and the length of
the vector. The third and fourth arguments point to variables which are set to
point to the created byte stream and its length, respectively. The final
argument is a pointer to a general context, which can be used to specify custom
memory management functions. If this argument is NULL, <b>malloc()</b> is used
to obtain memory for the byte stream. The yield of the function is the number
of serialized patterns, or one of the following negative error codes:
<pre>
  PCRE2_ERROR_BADDATA      the number of patterns is zero or less
  PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
  PCRE2_ERROR_NOMEMORY     memory allocation failed
  PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
  PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL
</pre>
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
that a slot in the vector does not point to a compiled pattern.
</P>
<P>
Once a set of patterns has been serialized you can save the data in any
appropriate manner. Here is sample code that compiles two patterns and writes
them to a file. It assumes that the variable <i>fd</i> is a <b>FILE *</b>
stream that is open for output. The error checking that should be present in a
real application has been omitted for simplicity.
<pre>
  int errorcode;
  uint8_t *bytes;
  PCRE2_SIZE erroroffset;
  PCRE2_SIZE bytescount;
  pcre2_code *list_of_codes[2];

  /* Compile the two patterns that are to be saved. */
  list_of_codes[0] = pcre2_compile((PCRE2_SPTR)"first pattern",
    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
  list_of_codes[1] = pcre2_compile((PCRE2_SPTR)"second pattern",
    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);

  /* Serialize both patterns into one byte stream. */
  errorcode = pcre2_serialize_encode((const pcre2_code **)list_of_codes, 2,
    &bytes, &bytescount, NULL);

  /* Write the stream; the result should equal bytescount. */
  size_t written = fwrite(bytes, 1, bytescount, fd);
</pre>
Note that the serialized data is binary data that may contain any of the 256
possible byte values. On systems that make a distinction between binary and
non-binary data, be sure that the file is opened for binary output.
</P>
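<P>
The write step is where a real application most needs the error checking that
the sample leaves out. The following sketch shows one possible shape for it,
using only the standard C library. The function name <b>save_stream()</b> is
hypothetical, and the caller is assumed to have opened the stream in binary
mode.
</P>

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes `length` bytes to a stream that the caller
   opened in binary mode (for example with fopen(path, "wb")).
   Returns 0 on success and -1 on a short write or flush failure. */
int save_stream(FILE *out, const uint8_t *bytes, size_t length)
{
    if (out == NULL || bytes == NULL) return -1;
    if (fwrite(bytes, 1, length, out) != length) return -1;
    if (fflush(out) != 0) return -1;
    return 0;
}
```

<P>
With the variables from the sample above, the call would be
save_stream(fd, bytes, bytescount), made only after checking that
<b>pcre2_serialize_encode()</b> returned a positive count.
</P>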
<P>
Serializing a set of patterns leaves the original data untouched, so they can
still be used for matching. Their memory must eventually be freed in the usual
way by calling <b>pcre2_code_free()</b>. When you have finished with the byte
stream, it too must be freed by calling <b>pcre2_serialize_free()</b>. If this
function is called with a NULL argument, it returns immediately without doing
anything.
</P>
<br><a name="SEC4" href="#TOC1">RE-USING PRECOMPILED PATTERNS</a><br>
<P>
In order to re-use a set of saved patterns you must first make the serialized
byte stream available in main memory (for example, by reading from a file). The
management of this memory block is up to the application. You can use the
<b>pcre2_serialize_get_number_of_codes()</b> function to find out how many
compiled patterns are in the serialized data without actually decoding the
patterns:
<pre>
  uint8_t *bytes = &#60;serialized data&#62;;
  int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
</pre>
The <b>pcre2_serialize_decode()</b> function reads a byte stream and recreates
the compiled patterns in new memory blocks, setting pointers to them in a
vector. The first two arguments are a pointer to a suitable vector and its
length, and the third argument points to a byte stream. The final argument is a
pointer to a general context, which can be used to specify custom memory
management functions for the decoded patterns. If this argument is NULL,
<b>malloc()</b> and <b>free()</b> are used. After deserialization, the byte
stream is no longer needed and can be discarded.
<pre>
  pcre2_code *list_of_codes[2];
  uint8_t *bytes = &#60;serialized data&#62;;
  int32_t number_of_codes =
    pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
</pre>
If the vector is not large enough for all the patterns in the byte stream, it
is filled with those that fit, and the remainder are ignored. The yield of the
function is the number of decoded patterns, or one of the following negative
error codes:
<pre>
  PCRE2_ERROR_BADDATA            second argument is zero or less
  PCRE2_ERROR_BADMAGIC           mismatch of id bytes in the data
  PCRE2_ERROR_BADMODE            mismatch of code unit size or PCRE2 version
  PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
  PCRE2_ERROR_NOMEMORY           memory allocation failed
  PCRE2_ERROR_NULL               first or third argument is NULL
</pre>
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
on a system with different endianness.
</P>
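<P>
Making the byte stream "available in main memory" usually means reading a whole
file into a buffer. The sketch below shows one way this might be done with the
standard C library. The function name <b>load_stream()</b> is hypothetical, and
the stream is assumed to have been opened in binary mode (for example with
fopen(path, "rb")).
</P>

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the rest of a binary stream into a malloc()ed
   buffer. On success returns the buffer (the caller frees it) and stores
   the byte count in *length; on failure returns NULL. */
uint8_t *load_stream(FILE *in, size_t *length)
{
    size_t capacity = 4096, used = 0;
    uint8_t *buffer = malloc(capacity);
    if (in == NULL || length == NULL || buffer == NULL) {
        free(buffer);
        return NULL;
    }

    for (;;) {
        used += fread(buffer + used, 1, capacity - used, in);
        if (used < capacity) break;       /* short read: EOF or error */

        /* Buffer is full, so double it and keep reading. */
        uint8_t *grown = realloc(buffer, capacity * 2);
        if (grown == NULL) {
            free(buffer);
            return NULL;
        }
        buffer = grown;
        capacity *= 2;
    }

    if (ferror(in)) {
        free(buffer);
        return NULL;
    }
    *length = used;
    return buffer;
}
```

<P>
The returned buffer can be passed to
<b>pcre2_serialize_get_number_of_codes()</b> and
<b>pcre2_serialize_decode()</b>, and released with <b>free()</b> once decoding
has finished.
</P>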
<P>
Decoded patterns can be used for matching in the usual way, and must be freed
by calling <b>pcre2_code_free()</b>. However, be aware that there is a potential
race issue if you are using multiple patterns that were decoded from a single
byte stream in a multithreaded application. A single copy of the character
tables is used by all the decoded patterns and a reference count is used to
arrange for its memory to be automatically freed when the last pattern is
freed, but there is no locking on this reference count. Therefore, if you want
to call <b>pcre2_code_free()</b> for these patterns in different threads, you
must arrange your own locking, and ensure that <b>pcre2_code_free()</b> cannot
be called by two threads at the same time.
</P>
<P>
If a pattern was processed by <b>pcre2_jit_compile()</b> before being
serialized, the JIT data is discarded and so is no longer available after a
save/restore cycle. You can, however, process a restored pattern with
<b>pcre2_jit_compile()</b> if you wish.
</P>
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P>
Last updated: 27 June 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>