hl7parse
find unicode bom
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
Classes
struct | bom_t |
Byte Order Mark (BOM) information of a file. This struct is created by detect_bom().
Typedefs
typedef struct bom_t | bom_t |
Byte Order Mark (BOM) information of a file. This struct is created by detect_bom().
Enumerations
enum | bom_endianness_t { UNKNOWN, LITTLE, BIG, SIGNATURE } |
endianness detected in bom
Functions
char * | bom_to_string (int length, unsigned char *bom, bom_endianness_t endianness) |
hex representation of the bom
void | print_bom (bom_t *bom) |
debug function to print bom
bom_t * | detect_bom (FILE *fd) |
check if the file has a bom
find unicode bom
When parsing an HL7 file, the opened file pointer should be at the beginning of the data (typically right at the beginning of MSH).
If the file contains a Unicode BOM and the file pointer points at the beginning of the file, the parser will fail. Therefore the BOM bytes must be skipped first.
This is a crude method of detecting whether the file has a BOM. Alternatively, you may deploy your own method and just skip ahead until you know the file pointer is at the first character of data (at the beginning of MSH) before parsing the file.
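The "skip ahead" alternative could look roughly like the sketch below. It is not part of hl7parse: the helper name `seek_to_msh` and its `max_scan` limit are hypothetical, and it assumes the file was opened in binary mode so that offsets from `ftell()` can be passed back to `fseek()`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of hl7parse.
 * Scans forward from the current position until the literal "MSH" is found
 * and leaves the file pointer on its 'M'. Returns the offset of "MSH", or -1
 * if it was not found within max_scan bytes; in that case the file pointer
 * is restored to where the scan started. */
long seek_to_msh(FILE *fd, long max_scan)
{
    assert(fd != NULL);

    long start = ftell(fd);
    if (start < 0)
        return -1;

    const char *needle = "MSH";
    int matched = 0; /* number of needle bytes matched so far */
    int c;

    for (long i = 0; i < max_scan && (c = fgetc(fd)) != EOF; i++) {
        if (c == needle[matched]) {
            matched++;
            if (matched == 3) {
                long pos = start + i - 2; /* offset of the 'M' */
                if (fseek(fd, pos, SEEK_SET) != 0)
                    return -1;
                return pos;
            }
        } else {
            /* a mismatching 'M' may itself start a new match */
            matched = (c == 'M') ? 1 : 0;
        }
    }

    fseek(fd, start, SEEK_SET);
    return -1;
}
```

A caller would open the file with `fopen(path, "rb")`, call `seek_to_msh(fd, 16)` and only hand `fd` to the parser if the result is not -1. The limit keeps the scan from running through a large file that is not HL7 at all.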
We try to detect known BOM patterns and then place the file pointer just after the matched pattern. Known patterns:
2 Bytes
0xFF 0xFE
0xFE 0xFF
3 Bytes
0xEF 0xBB 0xBF
0xF7 0x64 0x4C
0x0E 0xFE 0xFF
0xFB 0xEE 0xFF
4 Bytes
0x2B 0x2F 0x76 (followed by 0x38, 0x39, 0x2B, or 0x2F, i.e. ASCII 8, 9, + or /, depending on what the next character is)
0x00 0x00 0xFF 0xFF
0xFF 0xFE 0x00 0x00
0xDD 0x73 0x66 0x73
0x84 0x31 0x95 0x33
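Matching against such a pattern list can be sketched as a table lookup. The code below is an illustration only and not the implementation behind detect_bom(): `struct signature`, `match_signature` and `skip_signature` are hypothetical names, the byte values are copied from the list above, and the 0x2B 0x2F 0x76 pattern is left out because its fourth byte varies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry; not the library's internal representation. */
struct signature {
    size_t length;
    unsigned char bytes[4];
};

/* Longer patterns come first so that 0xFF 0xFE 0x00 0x00 is tried before
 * its 2-byte prefix 0xFF 0xFE. */
static const struct signature SIGNATURES[] = {
    {4, {0x00, 0x00, 0xFF, 0xFF}},
    {4, {0xFF, 0xFE, 0x00, 0x00}},
    {4, {0xDD, 0x73, 0x66, 0x73}},
    {4, {0x84, 0x31, 0x95, 0x33}},
    {3, {0xEF, 0xBB, 0xBF}},
    {3, {0xF7, 0x64, 0x4C}},
    {3, {0x0E, 0xFE, 0xFF}},
    {3, {0xFB, 0xEE, 0xFF}},
    {2, {0xFF, 0xFE}},
    {2, {0xFE, 0xFF}},
};

/* Returns the length of the pattern found at the start of buf, or 0. */
size_t match_signature(const unsigned char *buf, size_t available)
{
    assert(buf != NULL);
    for (size_t i = 0; i < sizeof SIGNATURES / sizeof SIGNATURES[0]; i++) {
        const struct signature *s = &SIGNATURES[i];
        if (s->length <= available && memcmp(buf, s->bytes, s->length) == 0)
            return s->length;
    }
    return 0;
}

/* Reads the first bytes of fd and positions fd just after a matched
 * pattern, or at offset 0 if nothing matched. Returns the number of
 * bytes skipped, or -1 on a seek error. */
long skip_signature(FILE *fd)
{
    assert(fd != NULL);
    unsigned char buf[4];

    if (fseek(fd, 0L, SEEK_SET) != 0)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, fd);
    size_t len = match_signature(buf, n);
    if (fseek(fd, (long)len, SEEK_SET) != 0)
        return -1;
    return (long)len;
}
```

Ordering the table from longest to shortest is the one design choice that matters here: with the 2-byte patterns first, a file starting with 0xFF 0xFE 0x00 0x00 would be cut after two bytes.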
enum bom_endianness_t
char* bom_to_string(int length, unsigned char *bom, bom_endianness_t endianness)
hex representation of the bom
length | length of input buffer |
bom | byte array with the bom |
endianness | endianness to display |
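The exact output format of bom_to_string() and how it renders the endianness are not documented here. As a rough illustration of what a hex representation of a byte array involves, the following sketch uses a hypothetical helper `bytes_to_hex` with an assumed "0xNN 0xNN" format; the caller owns the returned string and must free it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of hl7parse.
 * Formats length bytes as "0xEF 0xBB 0xBF" (assumed format).
 * Returns a heap-allocated string, or NULL if allocation fails. */
char *bytes_to_hex(int length, const unsigned char *bytes)
{
    assert(length >= 0 && (bytes != NULL || length == 0));

    /* each byte takes "0xNN" (4 chars) plus a separator or the final NUL */
    size_t size = (length > 0) ? (size_t)length * 5 : 1;
    char *out = malloc(size);
    if (out == NULL)
        return NULL;

    out[0] = '\0';
    char *p = out;
    for (int i = 0; i < length; i++) {
        p += snprintf(p, size - (size_t)(p - out), "%s0x%02X",
                      i ? " " : "", (unsigned)bytes[i]);
    }
    return out;
}
```

Called with the three bytes 0xEF 0xBB 0xBF, this helper yields the string `0xEF 0xBB 0xBF`; an empty input yields an empty string.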
bom_t* detect_bom(FILE *fd)
check if the file has a bom
If there is a BOM, it is copied to bom->bom and the file pointer is set to the first character after the BOM.
To check whether a BOM has been detected, test whether bom->length is greater than 0. The length is the number of bytes that bom->bom contains.
fd | file handle to read data from |
Returns the BOM in bom->bom; its length is indicated by bom->length.
void print_bom(bom_t *bom)
debug function to print bom
bom |