hl7parse
bom.h File Reference

find unicode bom

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

Classes

struct  bom_t
 Byte Order Mark (BOM) information of a file. This struct is created by detect_bom().
 

Typedefs

typedef struct bom_t bom_t
 Byte Order Mark (BOM) information of a file. This struct is created by detect_bom().
 

Enumerations

enum  bom_endianness_t { UNKNOWN, LITTLE, BIG, SIGNATURE }
 endianness detected in bom
 

Functions

char * bom_to_string (int length, unsigned char *bom, bom_endianness_t endianness)
 hex representation of the bom
 
void print_bom (bom_t *bom)
 debug function to print bom
 
bom_t * detect_bom (FILE *fd)
 check if the file has a bom
 

Detailed Description

find unicode bom

When parsing an HL7 file, the opened file pointer should be at the beginning of data (typically just at the beginning of MSH).

If the file contains a unicode BOM, and the file pointer points at the beginning of the file, the parser will fail. Therefore we first must skip the BOM bytes.

This is a crude method of detecting whether the file has a BOM. Alternatively, you may use your own method and skip ahead until you know the file pointer is at the first character of data (the beginning of MSH) before parsing the file.

how it's done:

we try to detect known BOM patterns and then place the pointer just after it. known patterns:

2 Bytes

3 Bytes

4 Bytes
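
As an illustration of matching patterns by length, here is a minimal sketch of a prefix matcher. The byte sequences are the standard Unicode marks for UTF-8, UTF-16 and UTF-32; whether bom.h checks exactly this set is an assumption. The names bom_pattern, KNOWN_BOMS and match_bom are hypothetical and not part of bom.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry for this sketch; this is not the library's bom_t. */
typedef struct {
    const char *name;
    size_t length;
    unsigned char bytes[4];
} bom_pattern;

/* Longest patterns first: the UTF-32 LE mark begins with the UTF-16 LE mark,
 * so the 4-byte patterns must be tried before the 2-byte ones. */
static const bom_pattern KNOWN_BOMS[] = {
    { "UTF-32 BE", 4, { 0x00, 0x00, 0xFE, 0xFF } },
    { "UTF-32 LE", 4, { 0xFF, 0xFE, 0x00, 0x00 } },
    { "UTF-8",     3, { 0xEF, 0xBB, 0xBF } },
    { "UTF-16 BE", 2, { 0xFE, 0xFF } },
    { "UTF-16 LE", 2, { 0xFF, 0xFE } },
};

/* Return the first pattern that buf starts with, or NULL if none matches. */
static const bom_pattern *match_bom(const unsigned char *buf, size_t n)
{
    size_t count = sizeof KNOWN_BOMS / sizeof KNOWN_BOMS[0];
    for (size_t i = 0; i < count; i++) {
        const bom_pattern *p = &KNOWN_BOMS[i];
        if (n >= p->length && memcmp(buf, p->bytes, p->length) == 0)
            return p;
    }
    return NULL;
}
```

Ordering the table from longest to shortest keeps a UTF-32 LE file from being reported as UTF-16 LE.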

usage

#include "bom.h"
FILE *fd = fopen("some/file", "rb");
rewind(fd); // make sure the file pointer is at the beginning of the file
bom_t *bom = detect_bom(fd); // fd now points at the first character after the bom, if any

Enumeration Type Documentation

◆ bom_endianness_t

endianness detected in bom

Enumerator
UNKNOWN 

undetected

LITTLE 

little endian

BIG 

big endian

SIGNATURE 

code units smaller than 16 bits (e.g. UTF-8), so byte order does not matter

Function Documentation

◆ bom_to_string()

char* bom_to_string ( int  length,
unsigned char *  bom,
bom_endianness_t  endianness 
)

hex representation of the bom

Parameters
length      length of input buffer
bom         byte array with the bom
endianness  endianness to display
Returns
printable string
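
The exact output format of bom_to_string(), and how the endianness argument changes it, is not stated here. As a rough sketch of what a hex representation of a byte array can look like, the hypothetical helper bytes_to_hex below (not part of bom.h) formats bytes as space-separated uppercase hex and ignores endianness entirely.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not the library's bom_to_string().
 * Returns a heap-allocated string such as "EF BB BF"; the caller frees it.
 * Returns NULL on a negative length or allocation failure. */
static char *bytes_to_hex(const unsigned char *bytes, int length)
{
    if (length < 0)
        return NULL;

    /* Two hex digits plus one separator per byte, plus the terminator. */
    char *out = malloc((size_t)length * 3 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (int i = 0; i < length; i++)
        p += sprintf(p, i ? " %02X" : "%02X", bytes[i]);
    *p = '\0';
    return out;
}
```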

◆ detect_bom()

bom_t* detect_bom ( FILE *  fd)

check if the file has a bom

if there is a bom, it will be copied to bom->bom. The file pointer will be set to the first character after the bom.

To check whether a bom has been detected, test that bom->length is greater than 0. The length is the number of bytes stored in bom->bom.

Note
The file pointer must be at the beginning of the file or this will fail. Either run detect_bom() right after opening a file or rewind before using.
See also
https://en.wikipedia.org/wiki/Byte_order_mark
Parameters
fd  file handle to read data from
Returns
pointer to a bom_t; the bom bytes are stored in bom->bom and their count in bom->length
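
For readers who prefer their own method, the file-pointer handling described above can be sketched as follows. This is not the implementation of detect_bom(); skip_bom is a hypothetical helper, and the set of marks it checks (the standard UTF-8, UTF-16 and UTF-32 BOMs) is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of bom.h. Expects fd at the start of the file.
 * Returns the number of BOM bytes skipped (0 if none), or -1 on I/O error. */
static int skip_bom(FILE *fd)
{
    static const struct { int length; unsigned char bytes[4]; } marks[] = {
        { 4, { 0x00, 0x00, 0xFE, 0xFF } },  /* UTF-32 BE */
        { 4, { 0xFF, 0xFE, 0x00, 0x00 } },  /* UTF-32 LE */
        { 3, { 0xEF, 0xBB, 0xBF } },        /* UTF-8 */
        { 2, { 0xFE, 0xFF } },              /* UTF-16 BE */
        { 2, { 0xFF, 0xFE } },              /* UTF-16 LE */
    };
    unsigned char head[4];
    size_t got = fread(head, 1, sizeof head, fd);
    int skipped = 0;

    if (ferror(fd))
        return -1;

    /* Longest marks are listed first so UTF-32 LE wins over UTF-16 LE. */
    for (size_t i = 0; i < sizeof marks / sizeof marks[0]; i++) {
        if (got >= (size_t)marks[i].length &&
            memcmp(head, marks[i].bytes, (size_t)marks[i].length) == 0) {
            skipped = marks[i].length;
            break;
        }
    }

    /* Reposition just after the BOM, or back at offset 0 if there is none. */
    if (fseek(fd, (long)skipped, SEEK_SET) != 0)
        return -1;
    return skipped;
}
```

The file should be opened in binary mode ("rb") so that fseek() with a byte offset is well defined.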

◆ print_bom()

void print_bom ( bom_t *  bom)

debug function to print bom

Parameters
bom  bom_t to print, as created by detect_bom()