Most UNIX commands emit text output aimed at humans, designed to be read and understood by a user. Humans are gifted at extracting details and matching patterns in such output, but programmers often need to extract information from it programmatically. They use tools like grep, awk, and regular expressions to ferret out the pieces of information they need. Such solutions are fragile: they need testing and validation, and they require maintenance whenever the output contents change or evolve.
Modern tool developers favor encoding schemes like XML and JSON, which allow trivial parsing and extraction of data. Such formats are simple, well understood, hierarchical, easily parsed, and often integrate more easily with common tools and environments. Changes to content can be made in ways that do not break existing users of the data, which can reduce maintenance costs and increase feature velocity.
In addition, modern reality means that more output ends up in web browsers than in terminals, making HTML output valuable.
libxo allows a single set of function calls in source code to generate traditional text output, as well as XML and JSON formatted data. HTML can also be generated; "<div>" elements surround the traditional text output, with attributes that detail how to render the data.
There are four encoding styles supported by libxo:
- TEXT output can be displayed on a terminal session, allowing compatibility with traditional command line usage.
- XML output is suitable for tools like XPath and protocols like NETCONF.
- JSON output can be used for RESTful APIs and integration with languages like JavaScript and Python.
- HTML can be matched with a small CSS file to permit rendering in any HTML5 browser.
In general, XML and JSON are suitable for encoding data, while TEXT is suited for terminal output and HTML is suited for display in a web browser (see Section 8).
Most traditional programs generate text output on standard output, with contents like:
36 ./src
40 ./bin
90 .
In this example (taken from du source code), the code to generate this data might look like:
printf("%d\t%s\n", num_blocks, path);
Simple, direct, obvious. But it's only making text output. Imagine using a single code path to make TEXT, XML, JSON or HTML, deciding at run time which to generate.
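To see what the hand-written alternative would involve, the following sketch picks an output style at run time with a plain switch statement. It is not libxo code: the "style" enum and the "render_item" function are hypothetical names introduced only for illustration, and no escaping of special characters is attempted.

```c
#include <stdio.h>

/* Hypothetical output styles; these names are not part of libxo. */
enum style { STYLE_TEXT, STYLE_XML, STYLE_JSON };

/*
 * Render one du-like record into buf using the style chosen at run time.
 * Returns the value of snprintf(). Simplification: "path" is copied as-is,
 * with no XML or JSON escaping.
 */
static int render_item(char *buf, size_t size, enum style style,
                       int blocks, const char *path)
{
    switch (style) {
    case STYLE_XML:
        return snprintf(buf, size,
                        "<item><blocks>%d</blocks><path>%s</path></item>\n",
                        blocks, path);
    case STYLE_JSON:
        return snprintf(buf, size,
                        "{ \"blocks\": %d, \"path\": \"%s\" }\n",
                        blocks, path);
    case STYLE_TEXT:
    default:
        return snprintf(buf, size, "%d\t%s\n", blocks, path);
    }
}
```

Every new field or output style means touching each branch of such a function, which is the duplication a single format string is meant to avoid.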
libxo expands on the idea of printf format strings to make a single format containing instructions for creating multiple output styles:
xo_emit("{:blocks/%d}\t{:path/%s}\n", num_blocks, path);
This line will generate the same text output as the earlier printf call, but also has enough information to generate XML, JSON, and HTML.
The following sections introduce the other formats.
XML output consists of a hierarchical set of elements, each encoded with a start tag and an end tag. The element should be named for the data value that it is encoding:
<item>
<blocks>36</blocks>
<path>./src</path>
</item>
<item>
<blocks>40</blocks>
<path>./bin</path>
</item>
<item>
<blocks>90</blocks>
<path>.</path>
</item>
XML is a W3C standard for encoding data. See w3c.org/TR/xml for additional information.
JSON output consists of a hierarchical set of objects and lists, each encoded with a quoted name, a colon, and a value. If the value is a string, it must be quoted, but numbers are not quoted. Objects are encoded using braces; lists are encoded using square brackets. Data inside objects and lists is separated using commas:
"items": [
  { "blocks": 36, "path" : "./src" },
  { "blocks": 40, "path" : "./bin" },
  { "blocks": 90, "path" : "." }
]
HTML output is designed to allow the output to be rendered in a web browser with minimal effort. Each piece of output data is rendered inside a <div> element, with a class name related to the role of the data. By using a small set of class attribute values, a CSS stylesheet can render the HTML into rich text that mirrors the traditional text content.
Additional attributes can be enabled to provide more details about the data, including data type, description, and an XPath location.
<div class="line">
<div class="data" data-tag="blocks">36</div>
<div class="padding"> </div>
<div class="data" data-tag="path">./src</div>
</div>
<div class="line">
<div class="data" data-tag="blocks">40</div>
<div class="padding"> </div>
<div class="data" data-tag="path">./bin</div>
</div>
<div class="line">
<div class="data" data-tag="blocks">90</div>
<div class="padding"> </div>
<div class="data" data-tag="path">.</div>
</div>
libxo uses format strings to control the rendering of data into the various output styles. Each format string contains a set of zero or more field descriptions, which describe independent data fields. Each field description contains a set of modifiers, a content string, and zero, one, or two format descriptors. The modifiers tell libxo what the field is and how to treat it, while the format descriptors are formatting instructions using printf-style format strings, telling libxo how to format the field. The field description is placed inside a set of braces, with a colon (":") after the modifiers and a slash ("/") before each format descriptor. Text may be intermixed with field descriptions within the format string.
The field description is given as follows:
'{' [ role | modifier ]* [',' long-names ]* ':' [ content ]
[ '/' field-format [ '/' encoding-format ]] '}'
The role describes the function of the field, while the modifiers enable optional behaviors. The contents, field-format, and encoding-format are used in varying ways, based on the role. These are described in the following sections.
In the following example, three field descriptors appear. The first is a padding field containing three spaces of padding, the second is a label ("In stock"), and the third is a value field ("in-stock"). The in-stock field has a "%u" format that will parse the next argument passed to the xo_emit function as an unsigned integer.
xo_emit("{P:   }{Lwc:In stock}{:in-stock/%u}\n", 65);
This single line of code can generate text ("   In stock: 65\n"), XML ("<in-stock>65</in-stock>"), JSON ('"in-stock": 65'), or HTML (too lengthy to be listed here).
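The division of a field description into modifiers, content, and format can be illustrated with a small standalone sketch. It is an assumption-laden simplification, not libxo's parser: "struct field_parts", "copy_span", and "split_field" are hypothetical names, the input is the text between the braces, escapes are ignored, and everything after the first slash is kept as a single format string rather than being split into field-format and encoding-format.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the parts of one field description. */
struct field_parts {
    char modifiers[32];   /* role and modifier text before the ':' */
    char content[64];     /* content between ':' and the first '/' */
    char format[32];      /* text after the first '/', or "" if absent */
};

/* Copy len bytes from src into dst, truncating to fit, and terminate. */
static void copy_span(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (len >= dst_size)
        len = dst_size - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Returns 0 on success, -1 if the description has no ':' separator. */
static int split_field(const char *desc, struct field_parts *out)
{
    const char *colon = strchr(desc, ':');
    if (colon == NULL)
        return -1;

    const char *body = colon + 1;
    const char *slash = strchr(body, '/');
    size_t content_len = slash ? (size_t)(slash - body) : strlen(body);

    copy_span(out->modifiers, sizeof(out->modifiers), desc,
              (size_t)(colon - desc));
    copy_span(out->content, sizeof(out->content), body, content_len);
    if (slash != NULL)
        copy_span(out->format, sizeof(out->format), slash + 1,
                  strlen(slash + 1));
    else
        out->format[0] = '\0';
    return 0;
}
```

Under these assumptions, "Lwc:In stock" splits into modifiers "Lwc" and content "In stock" with no format, while ":in-stock/%u" has no modifiers, content "in-stock", and format "%u".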
While roles and modifiers typically use a single character for brevity, there are alternative names for each which allow more verbose formatting strings. These names must be preceded by a comma, and may follow any single-character values:
xo_emit("{L,white,colon:In stock}{,key:in-stock/%u}\n", 65);
Field roles are optional, and indicate the role and formatting of the content. The roles are listed below; only one role is permitted:
R | Name         | Description
C | color        | Field has color and effect controls
D | decoration   | Field is non-text (e.g., colon, comma)
E | error        | Field is an error message
G | gettext      | Call gettext(3) on the format string
L | label        | Field is text that prefixes a value
N | note         | Field is text that follows a value
P | padding      | Field is spaces needed for vertical alignment
T | title        | Field is a title value for headings
U | units        | Field is the units for the previous value field
V | value        | Field is the name of field (the default)
W | warning      | Field is a warning message
[ | start-anchor | Begin a section of anchored variable-width text
] | stop-anchor  | End a section of anchored variable-width text
EXAMPLE:
xo_emit("{L:Free}{D::}{P: }{:free/%u} {U:Blocks}\n",
free_blocks);
When a role is not provided, the "value" role is used as the default.
Roles and modifiers can also use more verbose names, when preceded by a comma:
EXAMPLE:
xo_emit("{,label:Free}{,decoration::}{,padding: }"
"{,value:free/%u} {,units:Blocks}\n",
free_blocks);
Colors and effects control how text values are displayed; they are used for display styles (TEXT and HTML).
xo_emit("{C:bold}{:value}{C:no-bold}\n", value);
Colors and effects remain in effect until modified by other "C"-role fields.
xo_emit("{C:bold}{C:inverse}both{C:no-bold}only inverse\n");
If the content is empty, the "reset" action is performed.
xo_emit("{C:bold,underline}{:value}{C:}\n", value);
The content should be a comma-separated list of zero or more colors or display effects.
xo_emit("{C:bold,inverse}Ugly{C:no-bold,no-inverse}\n");
The color content can be either static, when placed directly within the field descriptor, or a printf-style format descriptor can be used, if preceded by a slash ("/"):
xo_emit("{C:/%s%s}{:value}{C:}", need_bold ? "bold" : "",
need_underline ? "underline" : "", value);
Color names are prefixed with either "fg-" or "bg-" to change the foreground and background colors, respectively.
xo_emit("{C:/fg-%s,bg-%s}{Lwc:Cost}{:cost/%u}{C:reset}\n",
fg_color, bg_color, cost);
The following table lists the supported effects:
Name         | Description
bg-XXXXX     | Change background color
bold         | Start bold text effect
fg-XXXXX     | Change foreground color
inverse      | Start inverse (aka reverse) text effect
no-bold      | Stop bold text effect
no-inverse   | Stop inverse (aka reverse) text effect
no-underline | Stop underline text effect
normal       | Reset effects (only)
reset        | Reset colors and effects (restore defaults)
underline    | Start underline text effect
The following color names are supported:
Name    | Description
black   |
blue    |
cyan    |
default | Default color for foreground or background
green   |
magenta |
red     |
white   |
yellow  |
When using colors, the developer should remember that users will change the foreground and background colors of their terminal sessions according to their own tastes, so assuming that "blue" looks nice is never safe, and is a constant annoyance to your dear author. In addition, a significant percentage of users (1 in 12) will be color blind. Depending on color to convey critical information is not a good idea. Color should enhance output, but should not be used as the sole means of encoding information.
Decorations are typically punctuation marks such as colons, semi-colons, and commas used to decorate the text and make it simpler for human readers. By marking these distinctly, HTML usage scenarios can use CSS to direct their display parameters.
xo_emit("{D:((}{:name}{D:))}\n", name);
libxo supports internationalization (i18n) through its use of gettext(3). Use the "{G:}" role to request that the remaining part of the format string, following the "{G:}" field, be handled using gettext().
Since gettext() uses the string as the key into the message catalog, libxo uses a simplified version of the format string that removes unimportant field formatting and modifiers, stopping minor formatting changes from impacting the expensive translation process. A developer change such as changing "/%06d" to "/%08d" should not force hand inspection of all .po files.
The simplified version can be generated for a single message using the "xopo -s <text>" command, or an entire .pot can be translated using the "xopo -f <input> -o <output>" command.
xo_emit("{G:}Invalid token\n");
The {G:} role allows a domain name to be set. gettext calls will continue to use that domain name until the current format string processing is complete, enabling a library function to emit strings using its own catalog. The domain name can be either static as the content of the field, or a format can be used to get the domain name from the arguments.
xo_emit("{G:libc}Service unavailable in restricted mode\n");
See Section 11.5 for additional details.
Labels are text that appears before a value.
xo_emit("{Lwc:Cost}{:cost/%u}\n", cost);
Notes are text that appears after a value.
xo_emit("{:cost/%u} {N:per year}\n", cost);
Padding represents whitespace used before and between fields.
The padding content can be either static, when placed directly within the field descriptor, or a printf-style format descriptor can be used, if preceded by a slash ("/"):
xo_emit("{P: }{Lwc:Cost}{:cost/%u}\n", cost);
xo_emit("{P:/%30s}{Lwc:Cost}{:cost/%u}\n", "", cost);
Titles are headings or column headers that are meant to be displayed to the user. The title can be either static, when placed directly within the field descriptor, or a printf-style format descriptor can be used, if preceded by a slash ("/"):
xo_emit("{T:Interface Statistics}\n");
xo_emit("{T:/%20.20s}{T:/%6.6s}\n", "Item Name", "Cost");
Title fields have an extra convenience feature; if both content and format are specified, instead of looking to the argument list for a value, the content is used, allowing a mixture of format and content within the field descriptor:
xo_emit("{T:Name/%20s}{T:Count/%6s}\n");
Since the incoming argument is a string, the format must be "%s" or something suitable.
Units are the dimension by which values are measured, such as degrees, miles, bytes, and decibels. The units field carries this information for the previous value field.
xo_emit("{Lwc:Distance}{:distance/%u}{Uw:miles}\n", miles);
Note that the sense of the 'w' modifier is reversed for units; a blank is added before the contents, rather than after it.
When the XOF_UNITS flag is set, units are rendered in XML as the "units" attribute:
<distance units="miles">50</distance>
Units can also be rendered in HTML as the "data-units" attribute:
<div class="data" data-tag="distance" data-units="miles"
data-xpath="/top/data/distance">50</div>
The value role is used to represent a data value that is interesting for the non-display output styles (XML and JSON). Value is the default role; if no other role designation is given, the field is a value. The field name must appear within the field descriptor, followed by one or two format descriptors. The first format descriptor is used for display styles (TEXT and HTML), while the second one is used for encoding styles (XML and JSON). If no second format is given, the encoding format defaults to the first format, with any minimum width removed. If no first format is given, both format descriptors default to "%s".
xo_emit("{:length/%02u}x{:width/%02u}x{:height/%02u}\n",
length, width, height);
xo_emit("{:author} wrote \"{:poem}\" in {:year/%4d}\n",
author, poem, year);
The anchor roles allow a set of strings to be padded as a group, but still be visible to xo_emit as distinct fields. Either the start or stop anchor can give a field width and it can be either directly in the descriptor or passed as an argument. Any fields between the start and stop anchor are padded to meet the minimum width given.
To give a width directly, encode it as the content of the anchor tag:
xo_emit("({[:10}{:min/%d}/{:max/%d}{]:})\n", min, max);
To pass a width as an argument, use "%d" as the format, which must appear after the "/". Note that only "%d" is supported for widths. Using any other value could ruin your day.
xo_emit("({[:/%d}{:min/%d}/{:max/%d}{]:})\n", width, min, max);
If the width is negative, padding will be added on the right, suitable for left justification. Otherwise the padding will be added to the left of the fields between the start and stop anchors, suitable for right justification. If the width is zero, nothing happens. If the number of columns of output between the start and stop anchors is already greater than or equal to the absolute value of the given width, nothing happens.
Widths over 8k are considered probable errors and not supported. If XOF_WARN is set, a warning will be generated.
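The justification rules above can be summarized in a few lines of C. This is only a sketch of the described behavior, not libxo's code: "struct anchor_pad" and "anchor_padding" are hypothetical names, "width" is the anchor width, and "columns" is the number of columns already produced by the fields between the anchors.

```c
#include <stdlib.h>

/* Hypothetical result type: pad columns to add on each side. */
struct anchor_pad {
    int left;    /* padding before the anchored fields */
    int right;   /* padding after the anchored fields */
};

/*
 * Compute the padding needed to bring "columns" of output up to the
 * minimum width |width|. A negative width pads on the right (left
 * justification); a positive width pads on the left (right justification).
 */
static struct anchor_pad anchor_padding(int width, int columns)
{
    struct anchor_pad pad = { 0, 0 };
    int missing = abs(width) - columns;

    if (width == 0 || missing <= 0)
        return pad;             /* nothing to add */

    if (width < 0)
        pad.right = missing;
    else
        pad.left = missing;
    return pad;
}
```

For example, a width of 10 with five columns of content yields five columns of padding on the left, while a width of -10 places the same five columns on the right.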
Field modifiers are flags which modify the way content is emitted for particular output styles:
M | Name          | Description
a | argument      | The content appears as a 'const char *' argument
c | colon         | A colon (":") is appended after the label
d | display       | Only emit field for display styles (text/HTML)
e | encoding      | Only emit for encoding styles (XML/JSON)
g | gettext       | Call gettext on field's render content
h | humanize (hn) | Format large numbers in human-readable style
  | hn-space      | Humanize: Place space between numeric and unit
  | hn-decimal    | Humanize: Add a decimal digit, if number < 10
  | hn-1000       | Humanize: Use 1000 as divisor instead of 1024
k | key           | Field is a key, suitable for XPath predicates
l | leaf-list     | Field is a leaf-list
n | no-quotes     | Do not quote the field when using JSON style
p | plural        | Gettext: Use comma-separated plural form
q | quotes        | Quote the field when using JSON style
t | trim          | Trim leading and trailing whitespace
w | white         | A blank (" ") is appended after the label
Roles and modifiers can also use more verbose names, when preceded by a comma. For example, the modifier string "Lwc" (or "L,white,colon") means the field has a label role (text that describes the next field) and should be followed by a colon ('c') and a space ('w'). The modifier string "Vkq" (or ",key,quotes") means the field has a value role (the default role), that it is a key for the current instance, and that the value should be quoted when encoded for JSON.
The argument modifier indicates that the content of the field descriptor will be placed as a UTF-8 string (const char *) argument within the xo_emit parameters.
EXAMPLE:
xo_emit("{La:} {a:}\n", "Label text", "label", "value");
TEXT:
Label text value
JSON:
"label": "value"
XML:
<label>value</label>
The argument modifier allows field names for value fields to be passed on the stack, avoiding the need to build a field descriptor using snprintf. For many field roles, the argument modifier is not needed, since those roles have specific mechanisms for arguments, such as "{C:/fg-%s}".
The colon modifier appends a single colon to the data value:
EXAMPLE:
xo_emit("{Lc:Name}{:name}\n", "phil");
TEXT:
Name:phil
The colon modifier is only used for the TEXT and HTML output styles. It is commonly combined with the space modifier ('{w:}'). It is purely a convenience feature.
The display modifier indicates that the field should only be generated for the display output styles, TEXT and HTML.
EXAMPLE:
xo_emit("{Lcw:Name}{d:name} {:id/%d}\n", "phil", 1);
TEXT:
Name: phil 1
XML:
<id>1</id>
The display modifier is the opposite of the encoding modifier, and they are often used to give two distinct views of the underlying data.
The encoding modifier indicates that the field should only be generated for the encoding output styles, such as XML and JSON.
EXAMPLE:
xo_emit("{Lcw:Name}{:name} {e:id/%d}\n", "phil", 1);
TEXT:
Name: phil
XML:
<name>phil</name><id>1</id>
The encoding modifier is the opposite of the display modifier, and they are often used to give two distinct views of the underlying data.
The gettext modifier is used to translate individual fields using the gettext domain (typically set using the "{G:}" role) and current language settings. Once libxo renders the field value, it is passed to gettext(3), where it is used as a key to find the native language translation.
In the following example, the strings "State" and "full" are passed to gettext() to find locale-based translated strings.
xo_emit("{Lgwc:State}{g:state}\n", "full");
See Section 3.2.1.3, Section 3.2.2.10, and Section 11.5 for additional details.
The humanize modifier is used to render large numbers in a human-readable format. While numbers like "44470272" are completely readable to computers and savants, humans will generally find "44M" more meaningful.
"hn" can be used as an alias for "humanize".
The humanize modifier only affects display styles (TEXT and HTML). The "no-humanize" option (see Section 4) will block the function of the humanize modifier.
There are a number of modifiers that affect details of humanization. These are only available as full names, not single characters. The "hn-space" modifier places a space between the number and any multiplier symbol, such as "M" or "K" (ex: "44 K"). The "hn-decimal" modifier will add a decimal point and a single tenths digit when the number is less than 10 (ex: "4.4K"). The "hn-1000" modifier will use 1000 as divisor instead of 1024, following the SI decimal convention instead of the binary powers-of-two tradition.
EXAMPLE:
xo_emit("{h:input/%u}, {h,hn-space:output/%u}, "
"{h,hn-decimal:errors/%u}, {h,hn-1000:capacity/%u}, "
"{h,hn-decimal:remaining/%u}\n",
input, output, errors, capacity, remaining);
TEXT:
21, 57 K, 96M, 44M, 1.2G
In the HTML style, the original numeric value is rendered in the "data-number" attribute on the <div> element:
<div class="data" data-tag="errors"
data-number="100663296">96M</div>
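The effect of the divisor and spacing options can be approximated with a short standalone function. This is a simplified sketch rather than the routine libxo uses: "humanize_sketch" is a hypothetical name, its "use_1000" and "space" parameters only mimic the idea behind "hn-1000" and "hn-space", the scaled value is truncated instead of rounded, and "hn-decimal" is not modeled.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "value" into buf with a K/M/G/... suffix.
 * use_1000 selects 1000 as the divisor instead of 1024; space inserts a
 * blank between the number and the suffix. The result is truncated.
 */
static void humanize_sketch(uint64_t value, int use_1000, int space,
                            char *buf, size_t size)
{
    static const char suffixes[] = "KMGTPE";
    const uint64_t divisor = use_1000 ? 1000 : 1024;
    int idx = -1;               /* -1 means no suffix is needed */

    /* Divide until the value fits below the divisor or suffixes run out. */
    while (value >= divisor && idx < 5) {
        value /= divisor;
        idx++;
    }

    if (idx < 0)
        snprintf(buf, size, "%llu", (unsigned long long)value);
    else
        snprintf(buf, size, "%llu%s%c", (unsigned long long)value,
                 space ? " " : "", suffixes[idx]);
}
```

With these assumptions, 100663296 becomes "96M" with the 1024 divisor, and 44470272 becomes "44M" with the 1000 divisor.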
The key modifier is used to indicate that a particular field helps uniquely identify an instance of list data.
EXAMPLE:
xo_open_list("user");
for (i = 0; i < num_users; i++) {
    xo_open_instance("user");
    xo_emit("User {k:name} has {:count} tickets\n",
            user[i].u_name, user[i].u_tickets);
    xo_close_instance("user");
}
xo_close_list("user");
Currently the key modifier is only used when generating XPath values for the HTML output style when XOF_XPATH is set, but other uses are likely in the near future.
The leaf-list modifier is used to distinguish lists where each instance consists of only a single value. In XML, these are rendered as single elements, whereas JSON renders them as arrays.
EXAMPLE:
for (i = 0; i < num_users; i++) {
    xo_emit("Member {l:user}\n", user[i].u_name);
}
XML:
<user>phil</user>
<user>pallavi</user>
JSON:
"user": [ "phil", "pallavi" ]
The name of the field must match the name of the leaf list.
The no-quotes modifier (and its twin, the 'quotes' modifier) affects the quoting of values in the JSON output style. JSON uses quotes for string values, but no quotes for numeric, boolean, and null data. xo_emit applies a simple heuristic to determine whether quotes are needed, but often this needs to be controlled by the caller.
EXAMPLE:
const char *flag = is_true ? "true" : "false";
xo_emit("{n:fancy/%s}", flag);
JSON:
"fancy": true
The plural modifier selects the appropriate plural form of an expression based on the most recent number emitted and the current language settings. The contents of the field should be the singular and plural English values, separated by a comma:
xo_emit("{:bytes} {Ngp:byte,bytes}\n", bytes);
The plural modifier is meant to work with the gettext modifier ({g:}) but can work independently. See Section 3.2.2.5.
When used without the gettext modifier or when the message does not appear in the message catalog, the first token is chosen when the last numeric value is equal to 1; otherwise the second value is used, mimicking the simple pluralization rules of English.
When used with the gettext modifier, the ngettext(3) function is called to handle the heavy lifting, using the message catalog to convert the singular and plural forms into the native language.
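The English fallback rule described above (first token when the count is 1, second token otherwise) is simple enough to sketch directly. The function below is a hypothetical illustration, not libxo code; "plural_pick" and its parameters are names chosen for this example, and the ngettext(3) path is not modeled.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given content such as "byte,bytes", copy the
 * singular form into buf when count is 1 and the plural form otherwise.
 * If no comma is present, the whole string is used for every count.
 */
static void plural_pick(const char *forms, long count,
                        char *buf, size_t size)
{
    const char *comma = strchr(forms, ',');
    const char *start = forms;
    size_t len = comma ? (size_t)(comma - forms) : strlen(forms);

    if (count != 1 && comma != NULL) {
        start = comma + 1;      /* second token: the plural form */
        len = strlen(start);
    }

    if (size == 0)
        return;
    if (len >= size)
        len = size - 1;         /* truncate to fit the buffer */
    memcpy(buf, start, len);
    buf[len] = '\0';
}
```

Note that a count of zero selects the plural form, matching English usage ("0 bytes").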
The quotes modifier (and its twin, the 'no-quotes' modifier) affects the quoting of values in the JSON output style. JSON uses quotes for string values, but no quotes for numeric, boolean, and null data. xo_emit applies a simple heuristic to determine whether quotes are needed, but often this needs to be controlled by the caller.
EXAMPLE:
xo_emit("{q:year/%d}", 2014);
JSON:
"year": "2014"
The heuristic is based on the format; if the format uses any of the following conversion specifiers, then no quotes are used:
d i o u x X D O U e E f F g G a A c C p
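A format-based check of this kind might look like the sketch below. It is an approximation under stated assumptions, not libxo's implementation: "json_needs_quotes" is a hypothetical name, only the first directive in the format is examined, and a format without a recognized conversion character is assumed to be quoted.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if a value rendered with "fmt" would
 * be quoted in JSON under the heuristic, false if it would be left bare.
 */
static bool json_needs_quotes(const char *fmt)
{
    /* Conversion characters that suppress quotes, per the list above. */
    static const char unquoted[] = "diouxXDOUeEfFgGaAcCp";
    /* All characters that can end a directive. */
    static const char conversions[] = "diouxXDOUeEfFgGaAcCsSp";
    const char *pct = strchr(fmt, '%');

    if (pct == NULL)
        return true;            /* assumption: no directive means a string */

    /* Skip flags, widths, and length modifiers up to the conversion. */
    for (const char *cp = pct + 1; *cp != '\0'; cp++) {
        if (strchr(conversions, *cp) != NULL)
            return strchr(unquoted, *cp) == NULL;
    }
    return true;
}
```

Under this sketch "%d" and "%02u" produce bare values, while "%s" and "%ls" produce quoted ones; the 'q' and 'n' modifiers exist for the cases where that guess is wrong.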
The trim modifier removes any leading or trailing whitespace from the value.
EXAMPLE:
xo_emit("{t:description}", " some input ");
JSON:
"description": "some input"
The white space modifier appends a single space to the data value:
EXAMPLE:
xo_emit("{Lw:Name}{:name}\n", "phil");
TEXT:
Name phil
The white space modifier is only used for the TEXT and HTML output styles. It is commonly combined with the colon modifier ('{c:}'). It is purely a convenience feature.
Note that the sense of the 'w' modifier is reversed for the units role ({Uw:}); a blank is added before the contents, rather than after it.
The field format is similar to the format string for printf(3). Its use varies based on the role of the field, but generally is used to format the field's contents.
If the format string is not provided for a value field, it defaults to "%s".
Note that a field definition can contain zero or more printf-style 'directives', which are sequences that start with a '%' and end with one of the following characters: "diouxXDOUeEfFgGaAcCsSp". Each directive is matched by one or more arguments to the xo_emit function.
The format string has the form:
'%' format-modifier * format-character
The format-modifier can be:
- a '#' character, indicating the output value should be prefixed with '0x', typically to indicate a base 16 (hex) value.
- a minus sign ('-'), indicating the output value should be padded on the right instead of the left.
- a leading zero ('0') indicating the output value should be padded on the left with zeroes instead of spaces (' ').
- one or more digits ('0' - '9') indicating the minimum width of the argument. If the width in columns of the output value is less than the minimum width, the value will be padded to reach the minimum.
- a period followed by one or more digits indicating the maximum number of bytes which will be examined for a string argument, or the maximum width for a non-string argument. When handling ASCII strings this functions as the field width but for multi-byte characters, a single character may be composed of multiple bytes. xo_emit will never dereference memory beyond the given number of bytes.
- a second period followed by one or more digits indicating the maximum width for a string argument. This modifier cannot be given for non-string arguments.
- one or more 'h' characters, indicating shorter input data.
- one or more 'l' characters, indicating longer input data.
- a 'z' character, indicating a 'size_t' argument.
- a 't' character, indicating a 'ptrdiff_t' argument.
- a ' ' character, indicating a space should be emitted before positive numbers.
- a '+' character, indicating a sign should be emitted before any number.
Note that 'q', 'D', 'O', and 'U' are considered deprecated and will be removed eventually.
The format character is described in the following table:
Ltr | Argument Type | Format
d   | int           | base 10 (decimal)
i   | int           | base 10 (decimal)
o   | int           | base 8 (octal)
u   | unsigned      | base 10 (decimal)
x   | unsigned      | base 16 (hex)
X   | unsigned      | base 16 (hex)
D   | long          | base 10 (decimal)
O   | unsigned long | base 8 (octal)
U   | unsigned long | base 10 (decimal)
e   | double        | [-]d.ddde+-dd
E   | double        | [-]d.dddE+-dd
f   | double        | [-]ddd.ddd
F   | double        | [-]ddd.ddd
g   | double        | as 'e' or 'f'
G   | double        | as 'E' or 'F'
a   | double        | [-]0xh.hhhp[+-]d
A   | double        | [-]0Xh.hhhp[+-]d
c   | unsigned char | a character
C   | wint_t        | a character
s   | char *        | a UTF-8 string
S   | wchar_t *     | a unicode/WCS string
p   | void *        | '%#lx'
The 'h' and 'l' modifiers affect the size and treatment of the argument:
Mod | d, i        | o, u, x, X
hh  | signed char | unsigned char
h   | short       | unsigned short
l   | long        | unsigned long
ll  | long long   | unsigned long long
j   | intmax_t    | uintmax_t
t   | ptrdiff_t   | ptrdiff_t
z   | size_t      | size_t
q   | quad_t      | u_quad_t
For strings, the 'h' and 'l' modifiers affect the interpretation of the bytes pointed to by the argument. The default '%s' string is a 'char *' pointer to a string encoded as UTF-8. Since UTF-8 is compatible with ASCII data, a normal 7-bit ASCII string can be used. '%ls' expects a 'wchar_t *' pointer to a wide-character string, encoded as 32-bit Unicode values. '%hs' expects a 'char *' pointer to a multi-byte string encoded with the current locale, as given by the LC_CTYPE, LANG, or LC_ALL environment variables. The first variable in this list that is set is used; if none of the variables are set, the locale defaults to "UTF-8".
libxo will convert these arguments as needed to either UTF-8 (for XML, JSON, and HTML styles) or locale-based strings for display in text style.
xo_emit("All strings are utf-8 content {:tag/%ls}",
L"except for wide strings");
"%S" is equivalent to "%ls".
Format | Argument Type   | Argument Contents
%s     | const char *    | UTF-8 string
%S     | const wchar_t * | Wide character UNICODE string (alias for '%ls')
%ls    | const wchar_t * | Wide character UNICODE string
%hs    | const char *    | locale-based string
For example, a function is passed a locale-based name, a hat size, and a time value. The hat size is formatted in a UTF-8 (ASCII) string, and the time value is formatted into a wchar_t string.
void print_order (const char *name, int size,
                  struct tm *timep) {
    char buf[32];
    const char *size_val = "unknown";
    /* Format the size only when it is known */
    if (size > 0) {
        snprintf(buf, sizeof(buf), "%d", size);
        size_val = buf;
    }
    wchar_t when[32];
    /* wcsftime takes the buffer size in wide characters, not bytes */
    wcsftime(when, sizeof(when) / sizeof(when[0]), L"%d%b%y", timep);
    xo_emit("The hat for {:name/%hs} is {:size/%s}.\n",
            name, size_val);
    xo_emit("It was ordered on {:order-time/%ls}.\n",
            when);
}
It is important to note that xo_emit will perform the conversion required to make appropriate output. Text style output uses the current locale (as described above), while XML, JSON, and HTML use UTF-8.
UTF-8 and locale-encoded strings can use multiple bytes to encode one column of data. The traditional "precision" (aka "max-width") value for "%s" printf formatting becomes overloaded since it specifies both the number of bytes that can be safely referenced and the maximum number of columns to emit. xo_emit uses the precision as the former, and adds a third value for specifying the maximum number of columns.
In this example, the name field is printed with a minimum of 3 columns and a maximum of 6. Up to ten bytes of data at the location given by 'name' are used in filling those columns.
xo_emit("{:name/%3.10.6s}", name);
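To make the three numbers concrete, the following sketch pulls them out of a string format of the form "%min.bytes.columns" followed by "s". It is an illustration only and does not reflect how libxo parses formats: "struct string_limits" and "parse_string_limits" are hypothetical names, and only this exact three-value form is accepted.

```c
#include <stdio.h>

/* Hypothetical holder for the three limits of a string format. */
struct string_limits {
    int min_columns;   /* minimum width in columns */
    int max_bytes;     /* maximum bytes examined at the argument */
    int max_columns;   /* maximum width in columns */
};

/*
 * Parse a format such as "%3.10.6s". Returns 0 on success, or -1 if the
 * format does not have exactly three numeric values followed by 's'.
 */
static int parse_string_limits(const char *fmt, struct string_limits *out)
{
    char conv = '\0';
    int n = sscanf(fmt, "%%%d.%d.%d%c", &out->min_columns,
                   &out->max_bytes, &out->max_columns, &conv);

    return (n == 4 && conv == 's') ? 0 : -1;
}
```

For "%3.10.6s" this yields a minimum of 3 columns, at most 10 bytes examined, and a maximum of 6 columns, matching the description above.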
Characters in the format string that are not part of a field definition are copied to the output for the TEXT style, and are ignored for the JSON and XML styles. For HTML, these characters are placed in a <div> with class "text".
EXAMPLE:
xo_emit("The hat is {:size/%s}.\n", size_val);
TEXT:
The hat is extra small.
XML:
<size>extra small</size>
JSON:
"size": "extra small"
HTML:
<div class="text">The hat is </div>
<div class="data" data-tag="size">extra small</div>
<div class="text">.</div>
libxo supports the '%m' directive, which formats the error message associated with the current value of "errno". It is the equivalent of "%s" with the argument strerror(errno).
xo_emit("{:filename} cannot be opened: {:error/%m}", filename);
xo_emit("{:filename} cannot be opened: {:error/%s}",
filename, strerror(errno));
libxo does not support the '%n' directive. It's a bad idea and we just don't do it.
The "eformat" string is the format string used when encoding the field for JSON and XML. If not provided, it defaults to the primary format with any minimum width removed. If the primary is not given, both default to "%s".
For padding and labels, the content string is considered the content, unless a format is given.
Many compilers and tool chains support validation of printf-like arguments. When the format string fails to match the argument list, a warning is generated. This is a valuable feature and while the formatting strings for libxo differ considerably from printf, many of these checks can still provide build-time protection against bugs.
libxo provides variants of functions that provide this ability, if the "--enable-printflike" option is passed to the "configure" script. These functions use the "_p" suffix, like "xo_emit_p()", "xo_emit_hp()", etc.
The following are features of libxo formatting strings that are incompatible with printf-like testing:
- implicit formats, where "{:tag}" has an implicit "%s";
- the "max" parameter for strings, where "{:tag/%4.10.6s}" means up to ten bytes of data can be inspected to fill a minimum of 4 columns and a maximum of 6;
- percent signs in strings, where "{:filled}%" makes a single, trailing percent sign;
- the "l" and "h" modifiers for strings, where "{:tag/%hs}" means locale-based string and "{:tag/%ls}" means a wide character string;
- distinct encoding formats, where "{:tag/#%s/%s}" means the display styles (text and HTML) will use "#%s" where other styles use "%s".
If none of these features are in use by your code, then using the "_p" variants might be wise.
Function         | printf-like Equivalent
xo_emit_hv       | xo_emit_hvp
xo_emit_h        | xo_emit_hp
xo_emit          | xo_emit_p
xo_emit_warn_hcv | xo_emit_warn_hcvp
xo_emit_warn_hc  | xo_emit_warn_hcp
xo_emit_warn_c   | xo_emit_warn_cp
xo_emit_warn     | xo_emit_warn_p
xo_emit_warnx    | xo_emit_warnx_p
xo_emit_err      | xo_emit_err_p
xo_emit_errx     | xo_emit_errx_p
xo_emit_errc     | xo_emit_errc_p
libxo can retain the parsed internal information related to a given format string, so that subsequent xo_emit calls reuse the retained information and avoid repeatedly parsing the format string.
SYNTAX:
int xo_emit_f(xo_emit_flags_t flags, const char *fmt, ...);
EXAMPLE:
xo_emit_f(XOEF_RETAIN, "{:some/%02d}{:thing/%-6s}{:fancy}\n",
some, thing, fancy);
To retain parsed format information, pass the XOEF_RETAIN flag to the xo_emit_f() function. A complete set of xo_emit_f functions exists to match all the xo_emit function signatures (with handles, variadic arguments, and printf-like flags):
Function | Flags Equivalent
xo_emit_hv | xo_emit_hvf
xo_emit_h | xo_emit_hf
xo_emit | xo_emit_f
xo_emit_hvp | xo_emit_hvfp
xo_emit_hp | xo_emit_hfp
xo_emit_p | xo_emit_fp
The format string must be immutable across multiple calls to xo_emit_f(), since the library retains the string. Typically this is done by using static constant strings, such as string literals. If the string is not immutable, the XOEF_RETAIN flag must not be used.
The functions xo_retain_clear() and xo_retain_clear_all() release internal information on either a single format string or all format strings, respectively. Neither is required, but the library will retain this information until it is cleared or the process exits.
const char *fmt = "{:name} {:count/%d}\n";
for (i = 0; i < 1000; i++) {
    xo_open_instance("item");
    xo_emit_f(XOEF_RETAIN, fmt, name[i], count[i]);
    xo_close_instance("item");
}
xo_retain_clear(fmt);
The retained information is kept as thread-specific data.
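The immutability requirement is easier to see with a toy cache. The sketch below is not libxo's implementation; it assumes, for illustration only, that retained data is looked up by the address of the format string, and it does not model the thread-specific storage. All names in it (retain_cache, fields_retained(), retain_clear(), parse_count) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define RETAIN_SLOTS 16

/* One cache entry: the format's address and a stand-in for parsed data */
struct retain_entry {
    const char *re_fmt;         /* key: address of the format string */
    int re_fields;              /* "parsed" result: number of '{' fields */
};

static struct retain_entry retain_cache[RETAIN_SLOTS];
static int parse_count;         /* how many times a format was really parsed */

/* Stand-in for real parsing: count the opening braces */
static int
parse_format (const char *fmt)
{
    int fields = 0;

    parse_count += 1;
    for ( ; *fmt; fmt++)
        if (*fmt == '{')
            fields += 1;
    return fields;
}

/* Return the field count, parsing only on the first call per address */
int
fields_retained (const char *fmt)
{
    int i, fields;

    assert(fmt != NULL);

    for (i = 0; i < RETAIN_SLOTS; i++)
        if (retain_cache[i].re_fmt == fmt)
            return retain_cache[i].re_fields;   /* hit: no re-parse */

    fields = parse_format(fmt);
    for (i = 0; i < RETAIN_SLOTS; i++) {
        if (retain_cache[i].re_fmt == NULL) {   /* first free slot */
            retain_cache[i].re_fmt = fmt;
            retain_cache[i].re_fields = fields;
            break;
        }
    }
    return fields;
}

/* Forget one format so that the next use parses it again */
void
retain_clear (const char *fmt)
{
    int i;

    for (i = 0; i < RETAIN_SLOTS; i++)
        if (retain_cache[i].re_fmt == fmt)
            retain_cache[i].re_fmt = NULL;
}
```

In a cache of this shape, changing the bytes behind a retained address would leave stale parsed data in place, which is why a mutable format string and retention do not mix.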
In this example, the value for the number of items in stock is emitted:
xo_emit("{P: }{Lwc:In stock}{:in-stock/%u}\n",
instock);
This call will generate the following output:
TEXT:
In stock: 144
XML:
<in-stock>144</in-stock>
JSON:
"in-stock": 144,
HTML:
<div class="line">
<div class="padding"> </div>
<div class="label">In stock</div>
<div class="decoration">:</div>
<div class="padding"> </div>
<div class="data" data-tag="in-stock">144</div>
</div>
Clearly HTML wins the verbosity award, and this output does not include XOF_XPATH or XOF_INFO data, which would expand the penultimate line to:
<div class="data" data-tag="in-stock"
data-xpath="/top/data/item/in-stock"
data-type="number"
data-help="Number of items in stock">144</div>
For XML and JSON, individual fields appear inside hierarchies which provide context and meaning to the fields. Unfortunately, these encodings have a basic disconnect in how lists of similar objects are represented.
XML encodes lists as a set of sequential elements:
<user>phil</user>
<user>pallavi</user>
<user>sjg</user>
JSON encodes lists using a single name and square brackets:
"user": [ "phil", "pallavi", "sjg" ]
This means libxo needs three distinct indications of hierarchy: one for containers, which appear only once under any specific parent; one for lists; and one for each item in a list.
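The difference between the two encodings can be made concrete with a small, self-contained sketch that renders the same names both ways. This is plain C written for illustration, not libxo code; the function names are hypothetical, and no escaping of special characters is attempted.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Append formatted text at *offp; returns 0 on success, -1 if out of room */
static int
append (char *buf, size_t bufsiz, size_t *offp, const char *fmt, ...)
{
    va_list ap;
    int rc;

    va_start(ap, fmt);
    rc = vsnprintf(buf + *offp, bufsiz - *offp, fmt, ap);
    va_end(ap);

    if (rc < 0 || (size_t) rc >= bufsiz - *offp)
        return -1;
    *offp += (size_t) rc;
    return 0;
}

/* XML: the tag is repeated once per item; nothing wraps the list */
int
render_xml_list (char *buf, size_t bufsiz, const char *tag,
                 const char *const *names, int count)
{
    size_t off = 0;
    int i;

    assert(buf != NULL && bufsiz > 0);
    buf[0] = '\0';

    for (i = 0; i < count; i++)
        if (append(buf, bufsiz, &off, "<%s>%s</%s>", tag, names[i], tag) < 0)
            return -1;
    return 0;
}

/* JSON: the name appears once, followed by a bracketed array of items */
int
render_json_list (char *buf, size_t bufsiz, const char *tag,
                  const char *const *names, int count)
{
    size_t off = 0;
    int i;

    assert(buf != NULL && bufsiz > 0);
    buf[0] = '\0';

    if (append(buf, bufsiz, &off, "\"%s\": [ ", tag) < 0)
        return -1;
    for (i = 0; i < count; i++)
        if (append(buf, bufsiz, &off, "%s\"%s\"", i ? ", " : "", names[i]) < 0)
            return -1;
    return append(buf, bufsiz, &off, " ]");
}
```

The JSON renderer has to know where the list begins and ends in order to place the brackets, while the XML renderer only needs to know about each item. A library that emits both from one set of calls therefore needs the caller to mark the list boundaries as well as the item boundaries.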
A "container" is an element of a hierarchy that appears only once under any specific parent. The container has no value, but serves to contain other nodes.
To open a container, call xo_open_container() or xo_open_container_h(). The former uses the default handle and the latter accepts a specific handle.
int xo_open_container_h (xo_handle_t *xop, const char *name);
int xo_open_container (const char *name);
To close a level, use the xo_close_container() or xo_close_container_h() functions:
int xo_close_container_h (xo_handle_t *xop, const char *name);
int xo_close_container (const char *name);
Each open call must have a matching close call. If the XOF_WARN flag is set and the name given does not match the name of the currently open container, a warning will be generated.
Example:
xo_open_container("top");
xo_open_container("system");
xo_emit("{:host-name/%s%s%s}", hostname,
        domainname ? "." : "", domainname ? domainname : "");
xo_close_container("system");
xo_close_container("top");
Sample Output:
Text:
my-host.example.org
XML:
<top>
<system>
<host-name>my-host.example.org</host-name>
</system>
</top>
JSON:
"top" : {
"system" : {
"host-name": "my-host.example.org"
}
}
HTML:
<div class="data"
data-tag="host-name">my-host.example.org</div>
A list is a set of one or more instances that appear under the same parent. The instances contain details about a specific object. One can think of instances as objects or records. A call is needed to open and close the list, while a distinct call is needed to open and close each instance of the list:
xo_open_list("item");
for (ip = list; ip->i_title; ip++) {
xo_open_instance("item");
xo_emit("{L:Item} '{:name/%s}':\n", ip->i_title);
xo_close_instance("item");
}
xo_close_list("item");
Getting the list and instance calls correct is critical to the proper generation of XML and JSON data.
Some users may find tracking the names of open containers, lists, and instances inconvenient. libxo offers a "Do The Right Thing" mode, where libxo will track the names of open containers, lists, and instances so the close function can be called without a name. To enable DTRT mode, turn on the XOF_DTRT flag prior to making any other libxo output.
xo_set_flags(NULL, XOF_DTRT);
Each open and close function has a version with the suffix "_d", which will close the open container, list, or instance:
xo_open_container("top");
...
xo_close_container_d();
This also works for lists and instances:
xo_open_list("item");
for (...) {
xo_open_instance("item");
xo_emit(...);
xo_close_instance_d();
}
xo_close_list_d();
Note that the XOF_WARN flag will also cause libxo to track open containers, lists, and instances. A warning is generated when the name given to the close function and the name recorded do not match.
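The bookkeeping behind both behaviors can be pictured as a stack of names. The sketch below is a simplified model, not libxo's implementation; track_open(), track_close(), and the warnings counter are hypothetical names. Passing NULL to track_close() models the "_d" functions, and passing a name models the check made under XOF_WARN. Closing the recorded construct even when the names differ is a choice made for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 32

static const char *open_names[MAX_DEPTH];   /* names of open constructs */
static int open_depth;                      /* number of open constructs */
static int warnings;                        /* mismatched closes seen */

/* Record the name when a container, list, or instance is opened */
int
track_open (const char *name)
{
    assert(name != NULL);

    if (open_depth >= MAX_DEPTH)
        return -1;
    open_names[open_depth++] = name;
    return 0;
}

/*
 * Close the innermost construct and return its recorded name. A NULL
 * name means "use whatever is recorded"; a non-NULL name is compared
 * with the recorded one, and a mismatch is reported and counted.
 */
const char *
track_close (const char *name)
{
    const char *recorded;

    if (open_depth == 0)
        return NULL;            /* nothing is open */

    recorded = open_names[--open_depth];
    if (name != NULL && strcmp(name, recorded) != 0) {
        warnings += 1;
        fprintf(stderr, "close name '%s' does not match open name '%s'\n",
                name, recorded);
    }
    return recorded;
}
```

Because the name is already on the stack, the caller does not need to repeat it when closing, and supplying one only adds a consistency check.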
Markers are used to protect and restore the state of open constructs. While a marker is open, no other open constructs can be closed. When a marker is closed, all constructs open since the marker was opened will be closed.
Markers use names which are not user-visible, allowing the caller to choose appropriate internal names.
In this example, the code whiffles through a list of fish, calling a function to emit details about each fish. The marker "fish-guts" is used to ensure that any constructs opened by the function are closed properly.
for (i = 0; fish[i]; i++) {
    xo_open_instance("fish");
    xo_open_marker("fish-guts");
    dump_fish_details(i);
    xo_close_marker("fish-guts");
    xo_close_instance("fish");
}
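The two marker rules described above can be modeled with a stack in which some frames are flagged as markers. This is a simplified sketch under that assumption, not libxo's implementation, and every name in it (struct frame, open_construct(), open_marker(), close_construct(), close_marker()) is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 32

struct frame {
    const char *f_name;         /* construct or marker name */
    int f_is_marker;            /* non-zero for marker frames */
};

static struct frame stack[MAX_DEPTH];
static int depth;

static int
push (const char *name, int is_marker)
{
    assert(name != NULL);

    if (depth >= MAX_DEPTH)
        return -1;
    stack[depth].f_name = name;
    stack[depth].f_is_marker = is_marker;
    depth += 1;
    return 0;
}

int open_construct (const char *name) { return push(name, 0); }
int open_marker (const char *name) { return push(name, 1); }

/* Close the innermost construct; refuse to reach past an open marker */
int
close_construct (void)
{
    if (depth == 0 || stack[depth - 1].f_is_marker)
        return -1;
    depth -= 1;
    return 0;
}

/*
 * Close the named marker along with everything opened since it.
 * Returns the number of frames dropped above the marker, or -1 if no
 * such marker is open.
 */
int
close_marker (const char *name)
{
    int i, closed;

    for (i = depth - 1; i >= 0; i--)
        if (stack[i].f_is_marker && strcmp(stack[i].f_name, name) == 0)
            break;
    if (i < 0)
        return -1;

    closed = depth - 1 - i;     /* frames sitting above the marker */
    depth = i;                  /* drop them and the marker itself */
    return closed;
}
```

In this model a callee can close only what it opened itself, and anything it forgets to close is cleaned up when the caller closes the marker.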