Generic Object Formatting
This appendix explains how arbitrary objects can be formatted into strings. It is divided into two sections. The first section, Summary, covers the basic formatting concepts; it is intended for users who want to get started quickly and are familiar with formatted printing in other programming languages (for example, printf() in C or the Formatter class in Java). The second section, Details, covers the specific implementation details; it is intended for users who want a more precise specification of formatting behavior.
Summary
This section is intended to provide a brief overview of formatting concepts. For precise behavioral details, refer to the Details section.
Format String Syntax
Every method which produces formatted output requires a format string and an argument list. The format string is a String which may contain fixed text and one or more embedded format specifiers. Consider the following example:
format("Duke's Birthday: %1$tm %1$te,%1$tY", d);
This format string is the first argument to the format method. It contains three format specifiers, %1$tm, %1$te, and %1$tY, which indicate how the arguments should be processed and where they should be inserted in the text. The remaining portions of the format string are fixed text, including "Duke's Birthday: " and any other spaces or punctuation. The argument list consists of all arguments passed to the method after the format string. In the above example, the argument list is of size one and consists of the Date object d.
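As a sketch, assuming the format method described here behaves like Java's String.format (which delegates to java.util.Formatter), the example above can be run as follows. The class name DukeBirthday is illustrative, and Locale.US is passed explicitly so the output does not depend on the JVM's default locale.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

final class DukeBirthday {
    // All three specifiers use argument index 1, so they all read the same Date.
    static String describe(Date d) {
        return String.format(Locale.US, "Duke's Birthday: %1$tm %1$te,%1$tY", d);
    }

    public static void main(String[] args) {
        // 23 May 1995 at midnight in the JVM's default time zone.
        // Calendar months are zero-based, hence the MAY constant.
        Date d = new GregorianCalendar(1995, Calendar.MAY, 23).getTime();
        System.out.println(describe(d)); // Duke's Birthday: 05 23,1995
    }
}
```

Note that %1$tm and %1$te print the month and day as numbers; the comma and spaces come from the fixed text.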
- The format specifiers for general, character, and numeric types have the following syntax:
%[argument_index$][flags][width][.precision]conversion
The optional argument_index is a decimal integer indicating the position of the argument in the argument list. The first argument is referenced by 1$, the second by 2$, etc.
The optional flags are a set of characters that modify the output format. The set of valid flags depends on the conversion.
The optional width is a non-negative decimal integer indicating the minimum number of characters to be written to the output.
The optional precision is a non-negative decimal integer usually used to restrict the number of characters. The specific behavior depends on the conversion.
The required conversion is a character indicating how the argument should be formatted. The set of valid conversions for a given argument depends on the argument's data type.
- The format specifiers for types which are used to represent dates and times have the following syntax:
%[argument_index$][flags][width]conversion
The optional argument_index, flags, and width are defined as above.
The required conversion is a two-character sequence. The first character is 't' or 'T'. The second character indicates the format to be used.
- The format specifiers which do not correspond to arguments have the following syntax:
%[flags][width]conversion
The optional flags and width are defined as above.
The required conversion is a character indicating content to be inserted in the output.
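The three syntaxes above can be captured by a single regular expression. The sketch below is hypothetical (SpecifierParser and parse are illustrative names, not part of any API): it only splits the first specifier in a string into its parts and does not check whether those parts are valid for the conversion.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpecifierParser {
    // Groups: 1 = argument index ("1$"), 2 = flags, 3 = width, 4 = precision (".3"),
    // 5 = date/time prefix ('t' or 'T'), 6 = conversion character.
    private static final Pattern SPEC =
        Pattern.compile("%(\\d+\\$)?([-#+ 0,(<]*)(\\d+)?(\\.\\d+)?([tT])?([a-zA-Z%])");

    // Returns the six parts of the first specifier in s (null for absent optional
    // parts), or null if s contains no specifier.
    static String[] parse(String s) {
        Matcher m = SPEC.matcher(s);
        if (!m.find()) {
            return null;
        }
        String[] parts = new String[6];
        for (int i = 0; i < parts.length; i++) {
            parts[i] = m.group(i + 1);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("%1$-10.3f"))); // [1$, -, 10, .3, null, f]
        System.out.println(Arrays.toString(parse("%<tY")));      // [null, <, null, null, t, Y]
    }
}
```

Because '0' is both a flag and a digit, the flags group is matched first, so "%010d" is read as flag '0' with width 10.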
Conversions
Conversions are divided into the following categories:
1. General - may be applied to any argument type.
2. Character - may be applied to basic types which represent Unicode characters.
3. Numeric:
- 3.1. Integral - may be applied to integral types: Integer or Long.
- 3.2. Floating Point - may be applied to floating-point types: Float or Double.
4. Date/Time - may be applied to Date type.
5. Percent - produces a literal '%' ('\u0025')
6. Line Separator - produces the platform-specific line separator.
The following table summarizes the supported conversions. Conversions denoted by an upper-case character (i.e. 'B', 'H', 'S', 'C', 'X', 'E', 'G', 'A', and 'T') are the same as those for the corresponding lower-case conversion characters except that the result is converted to upper case according to the rules of the prevailing locale.
| Conversion | Argument Category | Description |
| 'b', 'B' | general | If the argument arg is null, then the result is "false". If arg is a boolean or Boolean, then the result is the string form of its value ("true" or "false"). Otherwise, the result is "true". |
| 'h', 'H' | general | If the argument arg is null, then the result is "null". Otherwise, the result is obtained by converting the argument's hash code to a hexadecimal string. |
| 's', 'S' | general | If the argument arg is null, then the result is "null". Otherwise, arg is converted to a String. |
| 'c', 'C' | character | The result is a Unicode character. |
| 'd' | integral | The result is formatted as a decimal integer. |
| 'o' | integral | The result is formatted as an octal integer. |
| 'x', 'X' | integral | The result is formatted as a hexadecimal integer. |
| 'e', 'E' | floating point | The result is formatted as a decimal number in computerized scientific notation. |
| 'f' | floating point | The result is formatted as a decimal number. |
| 'g', 'G' | floating point | The result is formatted using computerized scientific notation or decimal format, depending on the precision and the value after rounding. |
| 'a', 'A' | floating point | The result is formatted as a hexadecimal floating-point number with a significand and an exponent. |
| 't', 'T' | date/time | Prefix for date and time conversion characters. See Date/Time Conversions. |
| '%' | percent | The result is a literal '%' ('\u0025'). |
| 'n' | line separator | The result is the platform-specific line separator. |
Any characters not explicitly defined as conversions are illegal and are reserved for future extensions.
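The sketch below exercises one conversion from most rows of the table, again assuming the format method follows Java's String.format semantics. ConversionTour and its fmt helper are hypothetical names; the expected outputs in the comments are what Java's Formatter produces under Locale.US.

```java
import java.util.Locale;

final class ConversionTour {
    // Formats a single argument with a single specifier under a fixed locale.
    static String fmt(String spec, Object arg) {
        return String.format(Locale.US, spec, arg);
    }

    public static void main(String[] args) {
        System.out.println(fmt("%b", null));       // false
        System.out.println(fmt("%b", "any"));      // true (non-null, not a Boolean)
        System.out.println(fmt("%h", "hello"));    // 5e918d2, i.e. the hex form of "hello".hashCode()
        System.out.println(fmt("%s", null));       // null
        System.out.println(fmt("%S", "abc"));      // ABC
        System.out.println(fmt("%c", 'A'));        // A
        System.out.println(fmt("%d", 42));         // 42
        System.out.println(fmt("%o", 8));          // 10
        System.out.println(fmt("%x", 255));        // ff
        System.out.println(fmt("%X", 255));        // FF
        System.out.println(fmt("%e", 12345.678));  // 1.234568e+04
        System.out.println(fmt("%f", 3.14159));    // 3.141590
        System.out.println(fmt("%g", 123.456));    // 123.456
        System.out.println(fmt("%a", 1.0));        // 0x1.0p0
    }
}
```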
Date/Time Conversions
The following date and time conversion suffix characters are defined for the 't' and 'T' conversions.
The following conversion characters are used for formatting times:
| 'H' | Hour of the day for the 24-hour clock, formatted as two digits with a leading zero as necessary, i.e. 00 - 23. |
| 'I' | Hour for the 12-hour clock, formatted as two digits with a leading zero as necessary, i.e. 01 - 12. |
| 'k' | Hour of the day for the 24-hour clock, i.e. 0 - 23. |
| 'l' | Hour for the 12-hour clock, i.e. 1 - 12. |
| 'M' | Minute within the hour, formatted as two digits with a leading zero as necessary, i.e. 00 - 59. |
| 'S' | Seconds within the minute, formatted as two digits with a leading zero as necessary, i.e. 00 - 60 ("60" is a special value required to support leap seconds). |
| 'L' | Millisecond within the second, formatted as three digits with leading zeros as necessary, i.e. 000 - 999. |
| 'N' | Nanosecond within the second, formatted as nine digits with leading zeros as necessary, i.e. 000000000 - 999999999. |
| 'p' | Locale-specific morning or afternoon marker in lower case, e.g. "am" or "pm". Use of the conversion prefix 'T' forces this output to upper case. |
| 'z' | RFC 822 style numeric time zone offset from GMT, e.g. -0800. |
| 'Z' | A string representing the abbreviation for the time zone. The Formatter's locale will supersede the locale of the argument (if any). |
| 's' | Seconds since the beginning of the epoch starting at 1 January 1970 00:00:00 UTC, i.e. min signed 64-bit integer divided by 1000 to max signed 64-bit integer divided by 1000. |
| 'Q' | Milliseconds since the beginning of the epoch starting at 1 January 1970 00:00:00 UTC, i.e. min signed 64-bit integer to max signed 64-bit integer. |
The following conversion characters are used for formatting dates:
| 'B' | Locale-specific full month name, e.g. "January", "February". |
| 'b' | Locale-specific abbreviated month name, e.g. "Jan", "Feb". |
| 'h' | Same as 'b'. |
| 'A' | Locale-specific full name of the day of the week, e.g. "Sunday", "Monday". |
| 'a' | Locale-specific short name of the day of the week, e.g. "Sun", "Mon". |
| 'C' | Four-digit year divided by 100, formatted as two digits with leading zero as necessary, i.e. 00 - 99. |
| 'Y' | Year, formatted as at least four digits with leading zeros as necessary, e.g. 0092 equals 92 CE for the Gregorian calendar. |
| 'y' | Last two digits of the year, formatted with leading zeros as necessary, i.e. 00 - 99. |
| 'j' | Day of year, formatted as three digits with leading zeros as necessary, e.g. 001 - 366 for the Gregorian calendar. |
| 'm' | Month, formatted as two digits with leading zeros as necessary, i.e. 01 - 13. |
| 'd' | Day of month, formatted as two digits with leading zeros as necessary, i.e. 01 - 31. |
| 'e' | Day of month, formatted without a leading zero, i.e. 1 - 31. |
The following conversion characters are used for formatting common date/time compositions.
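| 'R' | Time formatted for the 24-hour clock as "%tH:%tM". |
| 'T' | Time formatted for the 24-hour clock as "%tH:%tM:%tS". |
| 'r' | Time formatted for the 12-hour clock as "%tI:%tM:%tS %Tp". |
| 'D' | Date formatted as "%tm/%td/%ty". |
| 'F' | ISO 8601 complete date formatted as "%tY-%tm-%td". |
| 'c' | Date and time formatted as "%ta %tb %td %tT %tZ %tY". |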
Any characters not explicitly defined as date/time conversion suffixes are illegal and are reserved for future extensions.
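A sketch of several suffixes, assuming Java String.format semantics. The examples in this appendix use a Date, but Java's Formatter also accepts a Calendar, which carries its own time zone; a UTC Calendar keeps the output independent of the machine's zone. DateTimeTour and moonLanding are hypothetical names.

```java
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

final class DateTimeTour {
    // 20 July 1969, 20:17:40 UTC, with milliseconds cleared.
    static Calendar moonLanding() {
        Calendar c = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.US);
        c.clear();
        c.set(1969, Calendar.JULY, 20, 20, 17, 40); // months are zero-based
        return c;
    }

    static String fmt(String spec) {
        return String.format(Locale.US, spec, moonLanding());
    }

    public static void main(String[] args) {
        System.out.println(fmt("%tF"));            // 1969-07-20
        System.out.println(fmt("%tT"));            // 20:17:40
        System.out.println(fmt("%tr"));            // 08:17:40 PM
        System.out.println(fmt("%tA, %<tB %<te")); // Sunday, July 20
        System.out.println(fmt("%tj"));            // 201
        System.out.println(fmt("%ts"));            // -14182940
        System.out.println(fmt("%tQ"));            // -14182940000
    }
}
```

The negative 's' and 'Q' values show that instants before the epoch are counted backwards from 1 January 1970.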
Flags
The following table summarizes the supported flags. A y means the flag is supported for the indicated argument types; a - means it is not.
| Flag | General | Character | Integral | Floating Point | Date/Time | Description |
| '-' | y | y | y | y | y | The result will be left-justified. |
| '#' | y1 | - | y3 | y | - | The result should use a conversion-dependent alternate form. |
| '+' | - | - | y4 | y | - | The result will always include a sign. |
| ' ' | - | - | y4 | y | - | The result will include a leading space for positive values. |
| '0' | - | - | y | y | - | The result will be zero-padded. |
| ',' | - | - | y2 | y5 | - | The result will include locale-specific grouping separators. |
| '(' | - | - | y4 | y5 | - | The result will enclose negative numbers in parentheses. |
1 Depends on the definition of the conversion.
2 For 'd' conversion only.
3 For 'o', 'x', and 'X' conversions only.
4 For 'd' applied to Integer or Long.
5 For 'e', 'E', 'f', 'g', and 'G' conversions only.
Any characters not explicitly defined as flags are illegal and are reserved for future extensions.
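One way to read the flag table is as a lookup from each flag to the argument categories that accept it. The sketch below is a hypothetical encoding (FlagSupport, Category, and isSupported are illustrative names, and Map.of needs Java 9 or later); it treats footnoted cells as supported and does not model the per-conversion restrictions in footnotes 1 to 5.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class FlagSupport {
    enum Category { GENERAL, CHARACTER, INTEGRAL, FLOATING_POINT, DATE_TIME }

    // One entry per table row: the categories with a y (footnoted or not).
    private static final Map<Character, Set<Category>> SUPPORTED = Map.of(
        '-', EnumSet.allOf(Category.class),
        '#', EnumSet.of(Category.GENERAL, Category.INTEGRAL, Category.FLOATING_POINT),
        '+', EnumSet.of(Category.INTEGRAL, Category.FLOATING_POINT),
        ' ', EnumSet.of(Category.INTEGRAL, Category.FLOATING_POINT),
        '0', EnumSet.of(Category.INTEGRAL, Category.FLOATING_POINT),
        ',', EnumSet.of(Category.INTEGRAL, Category.FLOATING_POINT),
        '(', EnumSet.of(Category.INTEGRAL, Category.FLOATING_POINT));

    // True if the flag may appear with the given category; unknown flags are illegal.
    static boolean isSupported(char flag, Category category) {
        Set<Category> categories = SUPPORTED.get(flag);
        return categories != null && categories.contains(category);
    }

    public static void main(String[] args) {
        System.out.println(isSupported('-', Category.DATE_TIME)); // true
        System.out.println(isSupported('#', Category.DATE_TIME)); // false
        System.out.println(isSupported(',', Category.CHARACTER)); // false
        System.out.println(isSupported('!', Category.GENERAL));   // false
    }
}
```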
Width
The width is the minimum number of characters to be written to the output. For the line separator conversion, width is not applicable; if it is provided, an exception will be thrown.
Precision
For general argument types, the precision is the maximum number of characters to be written to the output.
For the floating-point conversions 'e', 'E', and 'f', the precision is the number of digits after the decimal separator. If the conversion is 'g' or 'G', then the precision is the total number of digits in the resulting magnitude after rounding. If the conversion is 'a' or 'A', then the precision must not be specified.
For character, integral, and date/time argument types and the percent and line separator conversions, the precision is not applicable; if a precision is provided, an exception will be thrown.
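The meaning of the precision depends on the conversion. This sketch, assuming Java String.format semantics, contrasts the general, 'f', and 'g' cases and shows that a precision on an integral conversion is rejected (Java reports this with a subclass of IllegalFormatException). PrecisionDemo and its helpers are hypothetical names.

```java
import java.util.IllegalFormatException;
import java.util.Locale;

final class PrecisionDemo {
    static String fmt(String spec, Object arg) {
        return String.format(Locale.US, spec, arg);
    }

    // True if the specifier is rejected for this argument.
    static boolean fails(String spec, Object arg) {
        try {
            fmt(spec, arg);
            return false;
        } catch (IllegalFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(fmt("%.3s", "abcdef"));  // abc   (maximum characters)
        System.out.println(fmt("%.3f", Math.PI));   // 3.142 (digits after the separator)
        System.out.println(fmt("%.3g", Math.PI));   // 3.14  (total significant digits)
        System.out.println(fmt("%8.3f|", Math.PI)); // "   3.142|" (width 8, padded on the left)
        System.out.println(fails("%.2d", 42));      // true
    }
}
```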
Argument Index
The argument index is a decimal integer indicating the position of the argument in the argument list. The first argument is referenced by "1$", the second by "2$", etc.
Another way to reference arguments by position is to use the '<' ('\u003c') flag, which causes the argument for the previous format specifier to be re-used. For example, the following two statements would produce identical strings:
format("Duke's Birthday: %1$tm %1$te,%1$tY", d)
format("Duke's Birthday: %1$tm %<te,%<tY", d)
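Under the same assumption (Java String.format semantics), the two statements can be checked for equality. A Calendar stands in for the Date d; RelativeIndexDemo is a hypothetical name.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

final class RelativeIndexDemo {
    static String explicit(Calendar c) {
        return String.format(Locale.US, "Duke's Birthday: %1$tm %1$te,%1$tY", c);
    }

    static String relative(Calendar c) {
        // %<te and %<tY re-use the argument of the previous specifier.
        return String.format(Locale.US, "Duke's Birthday: %1$tm %<te,%<tY", c);
    }

    public static void main(String[] args) {
        Calendar c = new GregorianCalendar(1995, Calendar.MAY, 23);
        System.out.println(explicit(c).equals(relative(c))); // true
    }
}
```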
Details
This section is intended to provide behavioral details for formatting, including conditions and exceptions, supported data types, localization, and interactions between flags, conversions, and data types. For an overview of formatting concepts, refer to the Summary.
Any characters not explicitly defined as conversions, date/time conversion suffixes, or flags are illegal and are reserved for future extensions.
If the format specifier contains a width or precision with an invalid value or which is otherwise unsupported, then formatting will fail.
If a format specifier contains a conversion character that is not applicable to the corresponding argument, then formatting will fail.
General
The following general conversions may be applied to any argument type:
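| 'b' | Produces either "true" or "false". If the argument is null, then the result is "false". If the argument is a boolean or Boolean, then the result is the string form of its value. Otherwise, the result is "true". |
| 'B' | The upper-case variant of 'b'. |
| 'h' | Produces a string representing the hash code value of the object, in hexadecimal. If the argument is null, then the result is "null". |
| 'H' | The upper-case variant of 'h'. |
| 's' | Produces a string. If the argument is null, then the result is "null". |
| 'S' | The upper-case variant of 's'. |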
The following flags apply to general conversions:
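| '-' | Left justifies the output. Spaces ('\u0020') will be added at the end of the converted value as required to fill the minimum width of the field. |
| '#' | Requires the output to use an alternate form. The definition of the form is specified by the conversion. |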
The width is the minimum number of characters to be written to the output. If the length of the converted value is less than the width, then the output will be padded by ' ' ('\u0020') until the total number of characters equals the width. The padding is on the left by default. If the '-' flag is given, then the padding will be on the right. If the width is not specified, then there is no minimum.
The precision is the maximum number of characters to be written to the output. The precision is applied before the width, thus the output will be truncated to precision characters even if the width is greater than the precision. If the precision is not specified then there is no explicit limit on the number of characters.
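A short sketch of how precision and width combine for general conversions, assuming Java String.format semantics; GeneralWidthDemo and fmt are hypothetical names. The string is truncated to the precision first, and only then padded to the width.

```java
import java.util.Locale;

final class GeneralWidthDemo {
    static String fmt(String spec, Object arg) {
        return String.format(Locale.ROOT, spec, arg);
    }

    public static void main(String[] args) {
        // "formatting" is cut to "form" (precision 4), then padded to 10 characters.
        System.out.println("[" + fmt("%10.4s", "formatting") + "]");  // [      form]
        System.out.println("[" + fmt("%-10.4s", "formatting") + "]"); // [form      ]
        // Precision also applies to 'b': "true" is cut to two characters.
        System.out.println(fmt("%.2b", true));                        // tr
    }
}
```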
Numeric
Numeric conversions are divided into the following categories:
- Integer and Long
- Float and Double
Numeric types will be formatted according to the following algorithm:
Number Localization Algorithm
After digits are obtained for the integer part, fractional part, and exponent (as appropriate for the data type), the following transformation is applied:
- Each digit character d in the string is replaced by a locale-specific digit computed relative to the current locale's zero digit z; that is, d - '0' + z.
- If a decimal separator is present, a locale-specific decimal separator is substituted.
- If the ',' ('\u002c') flag is given, then the locale-specific grouping separator is inserted by scanning the integer part of the string from least significant to most significant digits and inserting a separator at intervals defined by the locale's grouping size.
- If the '0' flag is given, then the locale-specific zero digits are inserted after the sign character, if any, and before the first non-zero digit, until the length of the string is equal to the requested field width.
- If the value is negative and the '(' flag is given, then a '(' ('\u0028') is prepended and a ')' ('\u0029') is appended.
- If the value is negative (or floating-point negative zero) and the '(' flag is not given, then a '-' ('\u002d') is prepended.
- If the '+' flag is given and the value is positive or zero (or floating-point positive zero), then a '+' ('\u002b') will be prepended.
If the value is NaN or positive infinity, the literal strings "NaN" or "Infinity", respectively, will be output. If the value is negative infinity, then the output will be "(Infinity)" if the '(' flag is given; otherwise the output will be "-Infinity". These values are not localized.
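A sketch of the localization algorithm in action, assuming Java String.format semantics with explicit locales; LocalizationDemo and fmt are hypothetical names. The same specifier yields different separators for the US and German locales, while NaN and the infinities are left unlocalized.

```java
import java.util.Locale;

final class LocalizationDemo {
    static String fmt(Locale locale, String spec, Object arg) {
        return String.format(locale, spec, arg);
    }

    public static void main(String[] args) {
        // ',' inserts the locale's grouping separator; the decimal separator is localized too.
        System.out.println(fmt(Locale.US, "%,.2f", 1234567.891));      // 1,234,567.89
        System.out.println(fmt(Locale.GERMANY, "%,.2f", 1234567.891)); // 1.234.567,89
        // '0' inserts zeros after the sign until the width (10) is reached.
        System.out.println(fmt(Locale.US, "%010.2f", -3.5));           // -000003.50
        // NaN and infinities bypass localization.
        System.out.println(fmt(Locale.GERMANY, "%f", Double.NaN));           // NaN
        System.out.println(fmt(Locale.US, "%(f", Double.NEGATIVE_INFINITY)); // (Infinity)
        System.out.println(fmt(Locale.US, "%f", Double.NEGATIVE_INFINITY));  // -Infinity
    }
}
```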
Integer and Long
The following conversions may be applied to Integer and Long values.
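| 'd' | Formats the argument as a decimal integer. The localization algorithm is applied. If the '0' flag is given and the value is negative, then the zero padding will occur after the sign. |
| 'o' | Formats the argument as an integer in base eight. No localization is applied. If the '#' flag is given, then the output will always begin with the radix indicator '0'. If the '0' flag is given, then the output will be padded with leading zeros to the field width following any indication of sign. |
| 'x' | Formats the argument as an integer in base sixteen. No localization is applied. If the '#' flag is given, then the output will always begin with the radix indicator "0x". If the '0' flag is given, then the output will be padded to the field width with leading zeros after the radix indicator or sign (if present). |
| 'X' | The upper-case variant of 'x'. |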
If the conversion is 'o', 'x', or 'X' and both the '#' and the '0' flags are given, then the result will contain the radix indicator ('0' for octal and "0x" or "0X" for hexadecimal), some number of zeros (based on the width), and the value.
If the '-' flag is not given, then the space padding will occur before the sign.
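The placement rules above (radix indicator, zero padding, sign, space padding) can be seen side by side in this sketch, assuming Java String.format semantics; RadixDemo and fmt are hypothetical names.

```java
import java.util.Locale;

final class RadixDemo {
    static String fmt(String spec, Object arg) {
        return String.format(Locale.US, spec, arg);
    }

    public static void main(String[] args) {
        System.out.println(fmt("%#o", 8));               // 010        ('#' adds the radix indicator '0')
        System.out.println(fmt("%#x", 255));             // 0xff
        System.out.println(fmt("%#010x", 255));          // 0x000000ff (zeros follow the radix indicator)
        System.out.println(fmt("%#X", 48879));           // 0XBEEF
        System.out.println(fmt("%08d", -42));            // -0000042   (zeros follow the sign)
        System.out.println("[" + fmt("%8d", -42) + "]"); // [     -42] (spaces precede the sign)
    }
}
```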
The following flags apply to numeric integral conversions:
| '+' | Requires the output to include a positive sign for all positive numbers. If this flag is not given, then only negative values will include a sign. |
| ' ' | Requires the output to include a single extra space ('\u0020') for non-negative values. |
| '0' | Requires the output to be padded with leading zeros to the minimum field width following any sign or radix indicator except when converting NaN or infinity. |
| ',' | Requires the output to include the locale-specific group separators as described in the localization algorithm. |
| '(' | Requires the output to prepend a '(' ('\u0028') and append a ')' ('\u0029') to negative values. |
If no flags are given the default formatting is as follows:
- The output is right-justified within the width
- Negative numbers begin with a '-' ('\u002d')
- Positive numbers and zero do not include a sign or extra leading space
- No grouping separators are included
The width is the minimum number of characters to be written to the output. This includes any signs, digits, grouping separators, radix indicator, and parentheses. If the length of the converted value is less than the width, then the output will be padded by spaces ('\u0020') until the total number of characters equals the width. The padding is on the left by default. If the '-' flag is given, then the padding will be on the right. If the width is not specified, then there is no minimum.
The precision is not applicable.
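A sketch of the integral flags next to the default formatting, assuming Java String.format semantics; IntegralFlagsDemo and fmt are hypothetical names. The last call uses '#' with 'd', which the flag table does not allow, so formatting fails (Java throws a subclass of IllegalFormatException).

```java
import java.util.IllegalFormatException;
import java.util.Locale;

final class IntegralFlagsDemo {
    static String fmt(String spec, Object arg) {
        return String.format(Locale.US, spec, arg);
    }

    public static void main(String[] args) {
        System.out.println(fmt("%d", 42));                 // 42 (default: no sign for positives)
        System.out.println(fmt("%+d", 42));                // +42
        System.out.println("[" + fmt("% d", 42) + "]");    // [ 42]
        System.out.println(fmt("%(d", -42));               // (42)
        System.out.println(fmt("%,d", 1234567L));          // 1,234,567
        System.out.println("[" + fmt("%-6d", 42) + "]");   // [42    ]
        try {
            fmt("%#d", 42); // '#' is only defined for 'o', 'x', and 'X'
        } catch (IllegalFormatException e) {
            System.out.println("rejected");
        }
    }
}
```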
Float and Double
The following conversions may be applied to Float and Double values.
| 'e' | Requires the output to be formatted using computerized scientific notation. The localization algorithm is applied. The formatting of the magnitude m depends upon its value. If m is NaN or infinite, the literal strings "NaN" or "Infinity", respectively, will be output. These values are not localized. If m is positive-zero or negative-zero, then the exponent will be "+00". Otherwise, the result is a string that represents the sign and magnitude (absolute value) of the argument. The formatting of the sign is described in the localization algorithm. Let n be the unique integer such that $10^{n} \le m < 10^{n+1}$, and let a be the mathematically exact quotient $a = m / 10^{n}$, so that $1 \le a < 10$. The magnitude is then represented as the integer part of a, as a single decimal digit, followed by the decimal separator, the fractional digits of a, the exponent symbol 'e', the sign of the exponent, and n as a decimal integer zero-padded to at least two digits. The number of digits in the result for the fractional part of a is equal to the precision. If the precision is not specified, then the default value is 6. If the precision is less than the number of digits which would appear after the decimal point in the string representation of the value, then the value will be rounded using the round half up algorithm. Otherwise, zeros may be appended to reach the precision. |
| 'E' | The upper-case variant of 'e'. |
| 'g' | Requires the output to be formatted in general scientific notation as described below. The localization algorithm is applied. After rounding for the precision, the formatting of the resulting magnitude m depends on its value. If m is greater than or equal to $10^{-4}$ but less than $10^{\text{precision}}$, then it is represented in decimal format. If m is less than $10^{-4}$ or greater than or equal to $10^{\text{precision}}$, then it is represented in computerized scientific notation. The total number of significant digits in m is equal to the precision. If the precision is not specified, then the default value is 6. If the precision is 0, then it is taken to be 1. |
| 'G' | The upper-case variant of 'g'. |
| 'f' | Requires the output to be formatted using decimal format. The localization algorithm is applied. The result is a string that represents the sign and magnitude (absolute value) of the argument. The formatting of the sign is described in the localization algorithm. The formatting of the magnitude m depends upon its value. If m is NaN or infinite, the literal strings "NaN" or "Infinity", respectively, will be output. These values are not localized. The magnitude is formatted as the integer part of m, with no leading zeroes, followed by the decimal separator, followed by one or more decimal digits representing the fractional part of m. The number of digits in the result for the fractional part of m is equal to the precision. If the precision is not specified, then the default value is 6. If the precision is less than the number of digits which would appear after the decimal point in the string representation of the value, then the value will be rounded using the round half up algorithm. Otherwise, zeros may be appended to reach the precision. |
| 'a' | Requires the output to be formatted in hexadecimal exponential form. No localization is applied. The result is a string that represents the sign and magnitude (absolute value) of the argument x. If x is negative or a negative-zero value, then the result will begin with '-' ('\u002d'). If x is positive or a positive-zero value and the '+' flag is given, then the result will begin with '+' ('\u002b'). The formatting of the magnitude m depends upon its value. |
| 'A' | The upper-case variant of 'a'. |
All flags defined for Integer and Long apply.
If the '#' flag is given, then the decimal separator will always be present.
If no flags are given the default formatting is as follows:
- The output is right-justified within the width
- Negative numbers begin with a '-'
- Positive numbers and positive zero do not include a sign or extra leading space
- No grouping separators are included
- The decimal separator will only appear if a digit follows it
The width is the minimum number of characters to be written to the output. This includes any signs, digits, grouping separators, decimal separators, exponential symbol, radix indicator, parentheses, and strings representing infinity and NaN as applicable. If the length of the converted value is less than the width, then the output will be padded by spaces ('\u0020') until the total number of characters equals the width. The padding is on the left by default. If the '-' flag is given, then the padding will be on the right. If the width is not specified, then there is no minimum.
If the conversion is 'e', 'E', or 'f', then the precision is the number of digits after the decimal separator. If the precision is not specified, then it is assumed to be 6.
If the conversion is 'g' or 'G', then the precision is the total number of significant digits in the resulting magnitude after rounding. If the precision is not specified, then the default value is 6. If the precision is 0, then it is taken to be 1.
If the conversion is 'a' or 'A', then the precision is the number of hexadecimal digits after the decimal separator. If the precision is not provided, then all of the digits will be output.
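A sketch of the floating-point conversions and their precision rules, assuming Java String.format semantics; FloatDemo and fmt are hypothetical names. Note the 'g' case, where a value below $10^{-4}$ switches to scientific notation, and the half-up rounding of 0.25.

```java
import java.util.Locale;

final class FloatDemo {
    static String fmt(String spec, Object arg) {
        return String.format(Locale.US, spec, arg);
    }

    public static void main(String[] args) {
        System.out.println(fmt("%.3e", 12345.678)); // 1.235e+04    (3 digits after the separator)
        System.out.println(fmt("%E", 0.0));         // 0.000000E+00 (zero has exponent +00)
        System.out.println(fmt("%g", 0.00001234));  // 1.23400e-05  (below 10^-4: scientific)
        System.out.println(fmt("%.1f", 0.25));      // 0.3          (round half up)
        System.out.println(fmt("%#.0f", 3.0));      // 3.           ('#' keeps the separator)
        System.out.println(fmt("%A", 255.0));       // 0X1.FEP7
    }
}
```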
Date/Time
This conversion may be applied to Long and Date.
| 't' | Prefix for date and time conversion characters. |
| 'T' | The upper-case variant of 't'. |
The following date and time conversion character suffixes are defined for the 't' and 'T' conversions.
The following conversion characters are used for formatting times:
| 'H' | Hour of the day for the 24-hour clock, formatted as two digits with a leading zero as necessary, i.e. 00 - 23. 00 corresponds to midnight. |
| 'I' | Hour for the 12-hour clock, formatted as two digits with a leading zero as necessary, i.e. 01 - 12. 01 corresponds to one o'clock (either morning or afternoon). |
| 'k' | Hour of the day for the 24-hour clock, i.e. 0 - 23. 0 corresponds to midnight. |
| 'l' | Hour for the 12-hour clock, i.e. 1 - 12. 1 corresponds to one o'clock (either morning or afternoon). |
| 'M' | Minute within the hour, formatted as two digits with a leading zero as necessary, i.e. 00 - 59. |
| 'S' | Seconds within the minute, formatted as two digits with a leading zero as necessary, i.e. 00 - 60 ("60" is a special value required to support leap seconds). |
| 'L' | Millisecond within the second, formatted as three digits with leading zeros as necessary, i.e. 000 - 999. |
| 'N' | Nanosecond within the second, formatted as nine digits with leading zeros as necessary, i.e. 000000000 - 999999999. The precision of this value is limited by the resolution of the underlying operating system or hardware. |
| 'p' | Locale-specific morning or afternoon marker in lower case, e.g. "am" or "pm". Use of the conversion prefix 'T' forces this output to upper case. |
| 'z' | RFC 822 style numeric time zone offset from GMT, e.g. -0800. |
| 'Z' | A string representing the abbreviation for the time zone. |
| 's' | Seconds since the beginning of the epoch starting at 1 January 1970 00:00:00 UTC, i.e. min signed 64-bit integer divided by 1000 to max signed 64-bit integer divided by 1000. |
| 'Q' | Milliseconds since the beginning of the epoch starting at 1 January 1970 00:00:00 UTC, i.e. min signed 64-bit integer to max signed 64-bit integer. The precision of this value is limited by the resolution of the underlying operating system or hardware. |
The following conversion characters are used for formatting dates:
| 'B' | Locale-specific full month name, e.g. "January", "February". |
| 'b' | Locale-specific abbreviated month name, e.g. "Jan", "Feb". |
| 'h' | Same as 'b'. |
| 'A' | Locale-specific full name of the day of the week, e.g. "Sunday", "Monday". |
| 'a' | Locale-specific short name of the day of the week, e.g. "Sun", "Mon". |
| 'C' | Four-digit year divided by 100, formatted as two digits with leading zero as necessary, i.e. 00 - 99. |
| 'Y' | Year, formatted to at least four digits with leading zeros as necessary, e.g. 0092 equals 92 CE for the Gregorian calendar. |
| 'y' | Last two digits of the year, formatted with leading zeros as necessary, i.e. 00 - 99. |
| 'j' | Day of year, formatted as three digits with leading zeros as necessary, e.g. 001 - 366 for the Gregorian calendar. 001 corresponds to the first day of the year. |
| 'm' | Month, formatted as two digits with leading zeros as necessary, i.e. 01 - 13, where "01" is the first month of the year ("13" is a special value required to support lunar calendars). |
| 'd' | Day of month, formatted as two digits with leading zeros as necessary, i.e. 01 - 31, where "01" is the first day of the month. |
| 'e' | Day of month, formatted without a leading zero, i.e. 1 - 31, where "1" is the first day of the month. |
The following conversion characters are used for formatting common date/time compositions.
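| 'R' | Time formatted for the 24-hour clock as "%tH:%tM". |
| 'T' | Time formatted for the 24-hour clock as "%tH:%tM:%tS". |
| 'r' | Time formatted for the 12-hour clock as "%tI:%tM:%tS %Tp". |
| 'D' | Date formatted as "%tm/%td/%ty". |
| 'F' | ISO 8601 complete date formatted as "%tY-%tm-%td". |
| 'c' | Date and time formatted as "%ta %tb %td %tT %tZ %tY". |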
The '-' flag defined for General conversions applies. If the '#' flag is given, then formatting will fail.
The width is the minimum number of characters to be written to the output. If the length of the converted value is less than the width, then the output will be padded by spaces ('\u0020') until the total number of characters equals the width. The padding is on the left by default. If the '-' flag is given, then the padding will be on the right. If the width is not specified, then there is no minimum.
The precision is not applicable. If the precision is specified then formatting will fail.
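A sketch of width, the '-' flag, and the two failure cases for date/time conversions, assuming Java String.format semantics (Java reports the failures with subclasses of IllegalFormatException). DateTimeFlagsDemo, july20, and fails are hypothetical names.

```java
import java.util.Calendar;
import java.util.IllegalFormatException;
import java.util.Locale;
import java.util.TimeZone;

final class DateTimeFlagsDemo {
    static Calendar july20() {
        Calendar c = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.US);
        c.clear();
        c.set(1969, Calendar.JULY, 20);
        return c;
    }

    static String fmt(String spec) {
        return String.format(Locale.US, spec, july20());
    }

    // True if the specifier is rejected.
    static boolean fails(String spec) {
        try {
            fmt(spec);
            return false;
        } catch (IllegalFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + fmt("%10tB") + "]");  // [      July]
        System.out.println("[" + fmt("%-10tB") + "]"); // [July      ]
        System.out.println(fmt("%TB"));                // JULY
        System.out.println(fails("%#tB"));             // true: '#' is rejected
        System.out.println(fails("%.2tB"));            // true: precision is rejected
    }
}
```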
Percent
The conversion does not correspond to any argument.
| '%' | The result is a literal '%' ('\u0025'). The width is the minimum number of characters to be written to the output, including the '%'. The precision is not applicable. |
Line Separator
The conversion does not correspond to any argument.
| 'n' | The platform-specific line separator. |
Flags, width, and precision are not applicable.
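A sketch of the two argument-free conversions, assuming Java String.format semantics; LiteralDemo, fmt, and fails are hypothetical names. Width applies to '%', while any width on the line separator makes formatting fail.

```java
import java.util.IllegalFormatException;

final class LiteralDemo {
    static String fmt(String spec) {
        return String.format(spec);
    }

    // True if the format string is rejected.
    static boolean fails(String spec) {
        try {
            fmt(spec);
            return false;
        } catch (IllegalFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(fmt("Progress: 42%%"));  // Progress: 42%
        System.out.println("[" + fmt("%5%") + "]"); // [    %]
        // %n expands to the platform line separator rather than a fixed "\n".
        System.out.println(fmt("a%nb").equals("a" + System.lineSeparator() + "b")); // true
        System.out.println(fails("%5n"));           // true: width is not applicable to %n
    }
}
```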
Argument Index
Format specifiers can reference arguments in three ways:
- Explicit indexing is used when the format specifier contains an argument index. The argument index is a decimal integer indicating the position of the argument in the argument list. The first argument is referenced by "1$", the second by "2$", etc. An argument may be referenced more than once.
For example:
format("%4$s %3$s %2$s %1$s %4$s %3$s %2$s %1$s", "a", "b", "c", "d")
produces:
d c b a d c b a
- Relative indexing is used when the format specifier contains a '<' ('\u003c') flag, which causes the argument for the previous format specifier to be re-used. If there is no previous argument, then formatting will fail.
format("%s %s %<s %<s", "a", "b", "c", "d")
produces:
a b b b
Parameters "c" and "d" are ignored because they are not referenced.
- Ordinary indexing is used when the format specifier contains neither an argument index nor a '<' flag. Each format specifier which uses ordinary indexing is assigned a sequential implicit index into the argument list, which is independent of the indices used by explicit or relative indexing.
format("%s %s %s %s", "a", "b", "c", "d")
produces:
a b c d
It is possible to have a format string which uses all forms of indexing, for example:
format("%2$s %s %<s %s", "a", "b", "c", "d")
produces:
b a a b
Parameters "c" and "d" are ignored because they are not referenced.
If the argument index does not correspond to an available argument, then formatting will fail.
If there are more arguments than format specifiers, the extra arguments are ignored.
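The four examples above can be reproduced with this sketch, assuming Java String.format semantics; ArgumentIndexDemo and fmt are hypothetical names. The final call references a fifth argument that does not exist, which Java reports with MissingFormatArgumentException.

```java
import java.util.MissingFormatArgumentException;

final class ArgumentIndexDemo {
    // Every example uses the same four arguments.
    static String fmt(String spec) {
        return String.format(spec, "a", "b", "c", "d");
    }

    public static void main(String[] args) {
        System.out.println(fmt("%4$s %3$s %2$s %1$s %4$s %3$s %2$s %1$s")); // d c b a d c b a
        System.out.println(fmt("%s %s %<s %<s"));  // a b b b
        System.out.println(fmt("%s %s %s %s"));    // a b c d
        System.out.println(fmt("%2$s %s %<s %s")); // b a a b
        try {
            fmt("%5$s"); // only four arguments exist
        } catch (MissingFormatArgumentException e) {
            System.out.println("no fifth argument");
        }
    }
}
```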