
unimarc.org.ua β


Appendix J: Character Sets

17,921 bytes added, 08:48, 5 August 2019
==Appendix J: Character Sets==
==J.1 Introduction==
UNIMARC records may be encoded using either 7-bit or 8-bit character code values. The specifications for identifying and using the various character sets are described in the following sections of this appendix. They conform to those contained in ISO 2022, which should also be consulted.
UNIMARC records may also be encoded using 16-bit character code values; see J.6, ISO 10646 character set.

==J.2 Framework==
A matrix for all the character codes possible with 7 bits is constructed as illustrated below. Bits 7-5 are represented by the columns and bits 4-1 by the rows. The ISO method of numbering is used, e.g. 7/15, not 7F, for DEL.

{| class="wikitable"
|+ 7-bit Code Matrix
! Columns !! Rows !! Content
|-
| 0-1 || 0-15 || 32 control functions
|-
| 2-7 || 0-15 || 94 graphic characters, with SP at 2/0 and DEL at 7/15
|}

A 7-bit code set accommodates 32 control functions, 94 graphic characters, SPACE and DELETE. The individual characters are commonly referred to by their column and row position in the matrix, using the notation 'c/r'. Thus the SPACE character is 2/0. Code values are assigned according to the following rules:
* The first two columns of a code matrix are reserved for system control functions.
* Columns 2-7 are for graphic characters.
* The two corner codes of the graphic columns are reserved for the SPACE and DELETE characters.

Data may also be encoded using 8 bits per character. In that case the number of possible codes doubles, and so does the code matrix. Bits 8-5 are represented by the columns and bits 4-1 by the rows. The 8-bit matrix has four parts, which are assigned to control functions and graphic characters as illustrated.

{| class="wikitable"
|+ 8-bit Code Matrix
! Columns !! Rows !! Content
|-
| 00-01 || 0-15 || 32 control functions
|-
| 02-07 || 0-15 || 94 graphic characters, with SP and DEL in the corner positions
|-
| 08-09 || 0-15 || 32 control functions
|-
| 10-15 || 0-15 || 94 graphic characters
|}

The additional bit is the left-most bit. It is 0 for the left-hand part and 1 for the right-hand part. Graphic sets may be represented by one 7-bit or 8-bit combination per character. Where a set contains a large number of characters, multiple 7-bit or 8-bit combinations per character may be used instead.
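The 'c/r' notation is simply the column (high-order bits) and row (low four bits) of a code value. The following Python sketch converts between byte values and the notation. The helper names `to_cr`, `from_cr` and `is_right_half` are hypothetical. The sketch uses the two-digit form (e.g. 08/08) that this appendix uses in its 8-bit tables. It is an illustration only, not part of the UNIMARC specification.

```python
def to_cr(value: int, eight_bit: bool = False) -> str:
    """Format a code value in ISO column/row notation (hypothetical helper).

    7-bit values use single-digit columns (0x7F -> '7/15'); 8-bit values use
    the two-digit form found in this appendix's 8-bit tables (0x88 -> '08/08').
    """
    limit = 0xFF if eight_bit else 0x7F
    if not 0 <= value <= limit:
        raise ValueError(f"value {value:#x} out of range for this code size")
    column, row = divmod(value, 16)  # column = high-order bits, row = bits 4-1
    return f"{column:02d}/{row:02d}" if eight_bit else f"{column}/{row}"


def from_cr(notation: str) -> int:
    """Parse '7/15' or '08/08' back into a code value (hypothetical helper)."""
    column, row = (int(part) for part in notation.split("/"))
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError(f"bad column/row notation: {notation!r}")
    return column * 16 + row


def is_right_half(value: int) -> bool:
    """True when the left-most (eighth) bit is 1, i.e. columns 08-15."""
    return bool(value & 0x80)


if __name__ == "__main__":
    print(to_cr(0x20), to_cr(0x7F), to_cr(0x88, eight_bit=True))  # 2/0 7/15 08/08
```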
Using a code set requires first designating the set and then invoking a designated set as the working set. For both 7-bit and 8-bit codes, two sets of control functions and four graphic character sets may be designated at any given time. These designated sets are called the C0 and C1 sets and the G0, G1, G2 and G3 sets. In 7 bits, two Cn sets and one Gn set may be invoked as working sets at a given time. In 8 bits, two Cn sets and two Gn sets may be invoked as working sets at a given time. The following sections of this appendix specify how code sets are designated and invoked in UNIMARC.

==J.3 Control Function Sets==
The C0 and C1 control function sets are fixed for UNIMARC, so they do not need to be designated and invoked in the record.

The C0 set is the set of 32 control functions defined in ISO 646. This set contains the basic transmission controls together with the subfield delimiter, field terminator and record terminator. The C1 set is the set of control functions defined in ISO 6630, Bibliographic Control Characters. Only four functions from that set are currently allowed in UNIMARC:
* NSB 'Non-sorting character(s) beginning'
* NSE 'Non-sorting character(s) ending'
* PLD 'Partial Line Down'
* PLU 'Partial Line Up'

In both the 7-bit and the 8-bit environment, the C0 set occupies columns 0 and 1 at all times. In a 7-bit record, the characters from the C1 set are represented by the two-character sequence 'ESC F'. ESC is the 1/11 control function in the C0 set, and F is a bit combination from columns 4 and 5. The F bit combinations associated with each of the functions defined in ISO 6630 were assigned by ISO when the set was registered; they are identified for ISO 6630 in section J.7 of this appendix. Note especially that in the 7-bit environment the 'ESC F' sequence substitutes for the code table bit combinations of the ISO 6630 functions.

In an 8-bit record, the C1 set resides in columns 08 and 09, and the functions are represented by their code table bit combinations.
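The two representations of the C1 functions can be converted with a simple rule. This Python sketch assumes the usual ISO 2022 correspondence, in which columns 08 and 09 of the 8-bit table map onto columns 4 and 5 of the 'ESC F' form. Under that assumption, the F byte is the C1 code value minus 0x40. The helper names are hypothetical. The text above notes that the F values were assigned at registration, so check them against the ISO 6630 registration before relying on this rule.

```python
ESC = 0x1B  # 1/11 in the C0 set

# C1 positions allowed in UNIMARC (8-bit code table values, as listed in J.9).
UNIMARC_C1 = {"NSB": 0x88, "NSE": 0x89, "PLD": 0x8B, "PLU": 0x8C}


def c1_to_7bit(code: int) -> bytes:
    """Rewrite an 8-bit C1 code (columns 08-09) as the 7-bit pair ESC F.

    Assumes the ISO 2022 column correspondence 08 -> 04 and 09 -> 05.
    """
    if not 0x80 <= code <= 0x9F:
        raise ValueError(f"{code:#x} is not in columns 08-09")
    return bytes([ESC, code - 0x40])


def c1_from_7bit(pair: bytes) -> int:
    """Inverse of c1_to_7bit: turn ESC F (F in columns 4-5) back into a C1 code."""
    if len(pair) != 2 or pair[0] != ESC or not 0x40 <= pair[1] <= 0x5F:
        raise ValueError(f"not an ESC F control sequence: {pair!r}")
    return pair[1] + 0x40


if __name__ == "__main__":
    for name, code in UNIMARC_C1.items():
        f_byte = c1_to_7bit(code)[1]
        print(f"{name}: 8-bit {code:#04x}, 7-bit ESC {f_byte // 16}/{f_byte % 16}")
```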
==J.4 Graphic Character Sets==
The G0 graphic set for UNIMARC is always ISO 646. All of the characters in the RECORD LABEL, the DIRECTORY and the coded fields/subfields are from ISO 646, as are the field indicators and subfield codes. A record therefore always begins with ISO 646 as the working set.

Up to three additional graphic sets may be designated as G1, G2 and G3 in field 100, subfield $a. The G1 set is designated in character positions 28-29 (Character Sets), and the G2 and G3 sets in positions 30-33 (Additional Character Sets). If no more than four sets are used in a record, the field 100 information is all that is required to designate the graphic sets, which can then be invoked as needed. Because the RECORD LABEL, DIRECTORY and coded data fields are all coded using ISO 646, the G1, G2 and G3 designations in field 100 can be read before any additional graphic sets are encountered in the record.

===J.4.1 7-Bit Environment===
In a 7-bit character record, the four designated sets are invoked using the following ISO 2022 locking shifts:

{| class="wikitable"
! Acronym !! Full name !! Bit combination(s) !! Set invoked
|-
| SI || Shift in || 0/15 || G0
|-
| SO || Shift out || 0/14 || G1
|-
| LS2 || Locking shift two || ESC 6/14 || G2
|-
| LS3 || Locking shift three || ESC 6/15 || G3
|}

These shifts are locking, so the invoked set remains the working set until another set is specified by a shift function. The record begins with the G0 (ISO 646) set as the working set. The SI shift to the G0 set is therefore used only after one of the other Gn sets has been invoked as the working set.

The G0 (ISO 646) set must be the working set at the end of each subfield and each field, because the succeeding subfield codes and directory processing require ISO 646 as the working set. This shift back to the G0 (ISO 646) set should take place before the subfield delimiter or end-of-field mark.

In 7 bits, a non-locking invocation of single characters from the designated G2 or G3 set is also possible.
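The locking-shift rules above amount to a small state machine. The following Python sketch walks a 7-bit byte string and tags each graphic byte with the Gn set that was the working set when the byte was read. It rejects data in which G0 is not the working set when an information separator arrives. The separators are IS1 (1/15, subfield delimiter), IS2 (1/14) and IS3 (1/13). The sketch handles only SI, SO, LS2 and LS3. Any other escape sequence, including the single shifts, raises `NotImplementedError`. The function name `tag_working_sets` is hypothetical.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F   # 1/11, 0/14, 0/15
SEPARATORS = {0x1D, 0x1E, 0x1F}  # IS3, IS2, IS1 (1/13, 1/14, 1/15)
LOCKING_ESC = {0x6E: 2, 0x6F: 3}  # ESC 6/14 = LS2 -> G2, ESC 6/15 = LS3 -> G3


def tag_working_sets(data: bytes) -> list[tuple[int, int]]:
    """Return (gn, byte) pairs for the graphic bytes of a 7-bit record fragment.

    Sketch only: tracks the locking shifts SI, SO, LS2 and LS3 and enforces
    that G0 is the working set before every information separator.
    """
    working = 0  # a record always starts with G0 (ISO 646) as the working set
    tagged = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == SI:
            working = 0
        elif byte == SO:
            working = 1
        elif byte == ESC:
            nxt = data[i + 1] if i + 1 < len(data) else None
            if nxt not in LOCKING_ESC:
                raise NotImplementedError(f"unsupported escape sequence at offset {i}")
            working = LOCKING_ESC[nxt]
            i += 1  # consume the final byte of the escape sequence
        elif byte in SEPARATORS:
            if working != 0:
                raise ValueError(f"G{working} still invoked at separator, offset {i}")
        elif byte == 0x20:
            tagged.append((0, byte))  # 2/0 is reserved for SPACE whatever set is invoked
        elif 0x21 <= byte <= 0x7E:
            tagged.append((working, byte))
        i += 1
    return tagged
```

Applied to a fragment shaped like 7-bit EX 3 in this section (ESC 6/14 before, and 0/15 after, each Cyrillic run), the function tags the Cyrillic bytes as G2. It raises `ValueError` if a closing 0/15 is missing before the next `$`.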
The following non-locking shifts are defined by ISO 2022:

{| class="wikitable"
! Acronym !! Full name !! Bit combinations !! Set from which a single character is invoked
|-
| SS2 || Single shift two || ESC 4/14 || G2
|-
| SS3 || Single shift three || ESC 4/15 || G3
|}

There is no need to reinvoke the working set after a single shift. The working set is automatically reinstated after one character from the G2 or G3 set.

Examples (for clarity, bit combinations are in bold):

EX 1
 500 11$aEdda S'''0/14'''æ'''0/15'''mundar.$mEnglish.$1Selections.
In this record, the ISO 5426 Extended Latin set has been designated the G1 set. The single character 'æ' is accessed by invoking that set with SO (0/14) and then returning to G0 with SI (0/15).

EX 2
 500 11$aEdda S'''1/11 4/14'''æmundar.$mEnglish.$1Selections.
If ISO 5426 had been designated the G2 set in EX 1, the single shift function SS2 (ESC 4/14) could be used to invoke the 'æ'.

EX 3
 210 ##$a'''1/11 6/14'''Москва'''0/15'''$c"'''1/11 6/14'''Правда'''0/15'''"$d1968
In this record, ISO 5426 has been designated the G1 set and the basic Cyrillic set the G2 set. This field contains Cyrillic text. Shifts into the G2 set (LS2, ESC 6/14) must therefore be made at the beginning of each subfield, with shifts back into the G0 set (SI, 0/15) at the end of each.

===J.4.2 8-bit Environment===
In an 8-bit record, the four designated sets are invoked using the following ISO 2022 locking shifts:

{| class="wikitable"
! Acronym !! Full name !! Bit combinations !! Set invoked !! Into columns
|-
| LS0 || Locking shift zero || 00/15 || G0 || 02-07
|-
| LS1 || Locking shift one || 00/14 || G1 || 02-07
|-
| LS1R || Locking shift one right || ESC 7/14 || G1 || 10-15
|-
| LS2 || Locking shift two || ESC 6/14 || G2 || 02-07
|-
| LS2R || Locking shift two right || ESC 7/13 || G2 || 10-15
|-
| LS3 || Locking shift three || ESC 6/15 || G3 || 02-07
|-
| LS3R || Locking shift three right || ESC 7/12 || G3 || 10-15
|}

These shifts are locking, so the invoked set remains the working set until another set is invoked by a shift function. The record begins with the G0 set (ISO 646) in columns 02-07 and the G1 set in columns 10-15. The shift functions to those sets are therefore used only after the G2 or G3 set has been invoked into those columns.
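The following Python sketch shows one way an 8-bit encoder might place the right-hand locking shifts when it builds a field from segments tagged with their Gn set. The function name is hypothetical. The sketch also assumes that the caller already supplies G1-G3 characters as right-hand bytes (columns 10-15) and G0 text as ISO 646 bytes. It invokes G2 or G3 into columns 10-15 with LS2R or LS3R. It puts the default G1 back with LS1R, either when G1 characters are needed again or at the end of the field. The examples in this section restore G1 sooner, immediately after the run. Either placement leaves G1 in columns 10-15 when the field ends.

```python
ESC = 0x1B

# Locking shifts into columns 10-15, taken from the J.4.2 table.
SHIFT_RIGHT = {
    1: bytes([ESC, 0x7E]),  # LS1R, ESC 7/14
    2: bytes([ESC, 0x7D]),  # LS2R, ESC 7/13
    3: bytes([ESC, 0x7C]),  # LS3R, ESC 7/12
}


def encode_field_8bit(segments: list[tuple[int, bytes]]) -> bytes:
    """Join (gn, data) segments, inserting right-hand locking shifts as needed.

    gn 0 marks ISO 646 data (columns 02-07); gn 1-3 marks data already
    expressed as right-hand bytes for that set (columns 10-15).
    """
    out = bytearray()
    current_right = 1  # the field 100 G1 set is the default occupant of columns 10-15
    for gn, data in segments:
        if gn not in (0, 1, 2, 3):
            raise ValueError(f"unknown graphic set G{gn}")
        if gn != 0 and gn != current_right:
            out += SHIFT_RIGHT[gn]  # invoke the needed set into columns 10-15
            current_right = gn
        out += data
    if current_right != 1:
        out += SHIFT_RIGHT[1]  # restore the default G1 before the field ends
    return bytes(out)
```

For example, take 0xF1 as a placeholder for the Extended Latin 'æ' (its actual ISO 5426 position is not given here). Then `encode_field_8bit([(0, b"Edda S"), (2, b"\xf1"), (0, b"mundar.")])` yields `Edda S`, ESC 7/13, the placeholder byte, `mundar.` and finally ESC 7/14.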
The G0 set must be the working set in columns 02-07 at the end of each subfield and each field. When it has been temporarily displaced, the shift back to the G0 set should occur before the subfield delimiter or end-of-field mark. The G1 set designated in field 100 is considered the default set for columns 10-15. It should therefore always be restored at the end of a field that has shifted another set into those columns. In 8 bits, non-locking single shifts are not used in UNIMARC.

Examples (for clarity, bit combinations are in bold):

EX 1
 500 11$aEdda Sæmundar.$mEnglish.$1Selections.
The ISO 5426 Extended Latin set has been designated the G1 set. No shift is required to use it in the 8-bit environment.

EX 2
 500 11$aEdda S'''1/11 7/13'''æ'''1/11 7/14'''mundar.$mEnglish.$1Selections.
The basic Cyrillic set has been designated the G1 set and the ISO 5426 Extended Latin set the G2 set. The G2 set is invoked into columns 10-15 with LS2R, displacing the default G1 set. After the G2 set has been used, the G1 set is reinvoked into columns 10-15 with LS1R.

EX 3
 210 ##$a'''1/11 7/13'''Москва$c"Правда'''1/11 7/14'''"$d1968
ISO 5426 is the default G1 set and the basic Cyrillic set has been designated the G2 set. The G2 set is invoked into columns 10-15 (LS2R) when needed. The subfield code comes from the G0 set, which is still the working set for columns 02-07 at the end of the $a subfield, so no shift needs to take place before the '$c'. The default G1 set is, however, restored to columns 10-15 (LS1R) once the Cyrillic set is no longer needed in this field.

EX 4
 305 ##$aВпервые издано в С.-Петербурге на нем. яз. в 1770-1784 в 4-х частях под заглавием "Reise durch Ru'''1/11 7/13'''ß'''1/11 7/14'''land zur Untersuchung der drey Natur-Reiche". Ч.4 на рус. яз. не переведена
Basic Latin and Basic Cyrillic are the designated G0 and G1 sets, and Extended Latin is the G2 set (100 $a/26-33 = 010203##). The Basic Latin and Cyrillic characters can be accessed without changing these settings.
The German 'ss' character (ß) is found in the Extended Latin set. That set is invoked into columns 10-15 by LS2R (ESC 7/13), temporarily displacing Basic Cyrillic, which is then restored by LS1R (ESC 7/14).

==J.5 Additional Graphic sets==
In some instances, a UNIMARC record may require more than the four graphic sets designated in field 100. Additional sets may be substituted for the sets designated in field 100 through an escape sequence of the form 'ESC I F'. 'I', which may be one or more characters long, indicates the Gn designation of the set according to the following values:

{| class="wikitable"
! Single byte per character !! Multiple bytes per character !! Gn designation
|-
| 2/8 or 2/12 || 2/4 2/8 or 2/4 2/12 || G0
|-
| 2/9 or 2/13 || 2/4 2/9 or 2/4 2/13 || G1
|-
| 2/10 or 2/14 || 2/4 2/10 or 2/4 2/14 || G2
|-
| 2/11 or 2/15 || 2/4 2/11 or 2/4 2/15 || G3
|}

'F', the Final character, indicates the graphic set being designated. It is a bit combination from columns 4 to 7 that ISO assigns when the set is registered. The Final characters for the sets approved for use with UNIMARC are listed below. Final characters for other approved sets have not yet been assigned.

{| class="wikitable"
! F !! Graphic set
|-
| 4/0 || ISO 646 (IRV), Basic Latin set
|-
| 5/0 || ISO 5426-1980, Extended Latin set
|-
| 4/14 || ISO Registration #37, Basic Cyrillic
|-
| 5/1 || ISO 5427-1984, Extended Cyrillic set
|-
| 5/3 || ISO 5428-1980, Greek set
|-
| 4/13 || ISO 6438-1983, African coded character set
|}

If a fifth (or further) graphic set is needed in a UNIMARC field, it must first be designated through the escape sequence. It may then be invoked with the shift functions specified in section J.4. When an additional set has been designated and invoked in a field, the original set specified in field 100 should be redesignated for that Gn via an escape sequence before the end of the field. When a field is exited, the designated G0, G1, G2 and G3 sets must be those specified in field 100.
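The 'ESC I F' designation can be assembled mechanically from the two tables above. This Python sketch uses hypothetical names. By default it takes the first intermediate listed for each Gn (2/8 to 2/11), and it uses the alternative column (2/12 to 2/15) only when asked. It prefixes 2/4 for multiple-byte sets and takes the Final character from the list of approved sets. It reproduces the ESC 2/9 5/3 sequence used in the example that follows.

```python
ESC = 0x1B

# Intermediates for each Gn from the J.5 table: first listed (2/8 .. 2/11)
# and the alternative listed beside it (2/12 .. 2/15).
INTERMEDIATE = {0: 0x28, 1: 0x29, 2: 0x2A, 3: 0x2B}
ALT_INTERMEDIATE = {0: 0x2C, 1: 0x2D, 2: 0x2E, 3: 0x2F}
MULTIBYTE_PREFIX = 0x24  # 2/4, used for sets with multiple bytes per character

# Final characters of the sets approved for UNIMARC (J.5 list).
FINAL = {
    "ISO 646 (IRV)": 0x40,         # 4/0
    "ISO 5426": 0x50,              # 5/0
    "ISO Registration #37": 0x4E,  # 4/14
    "ISO 5427": 0x51,              # 5/1
    "ISO 5428": 0x53,              # 5/3
    "ISO 6438": 0x4D,              # 4/13
}


def designate(gn: int, final: int, multibyte: bool = False, alternative: bool = False) -> bytes:
    """Build the escape sequence ESC I F that designates a graphic set as Gn."""
    table = ALT_INTERMEDIATE if alternative else INTERMEDIATE
    if gn not in table:
        raise ValueError(f"no G{gn} designation")
    if not 0x40 <= final <= 0x7E:  # columns 4 to 7, excluding DEL at 7/15
        raise ValueError("Final character must come from columns 4 to 7")
    intermediates = bytes([table[gn]])
    if multibyte:
        intermediates = bytes([MULTIBYTE_PREFIX]) + intermediates
    return bytes([ESC]) + intermediates + bytes([final])
```

To switch columns 10-15 to the Greek set, as in the example, an encoder would emit `designate(1, FINAL["ISO 5428"])` followed by LS1R (ESC 7/14).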
Example (for clarity, bit combinations are alternately bold and italic):
 454 #0$1700#0$aXenophon.$150010$a<b>1/11 2/9 5/3</b> <i>1/11 7/14</i>'Áπομνημονευματα<b>1/11 2/9 5/0</b> <i>1/11 7/14</i>
The record is for a Bulgarian translation of a Greek work, and the language of cataloguing is English. The agency has designated the following sets in field 100:
* G0: ISO 646, Basic Latin
* G1: ISO 5426, Extended Latin
* G2: ISO Registration #37, Basic Cyrillic
* G3: ISO DIS 5427, Extended Cyrillic
The Greek set is needed in field 454 to give the original title in Greek. It is designated as the G1 set via the sequence ESC 2/9 5/3 and then invoked into columns 10-15 via the sequence ESC 7/14 (LS1R). Before the field is exited, the Extended Latin set is restored to the G1 designation via ESC 2/9 5/0 and reinvoked into columns 10-15 via ESC 7/14.

==J.6 ISO 10646 character set==
ISO 10646, being a 16-bit character set, contains all the necessary characters. When a record is encoded in ISO 10646, this set serves as the C0 set, the C1 set and all the G sets.

==J.7 Character set tables==
Sections J.8 to J.10 contain the code tables for some of the character sets specified for use in UNIMARC records. These character sets are reproduced with the permission of the International Organization for Standardization (ISO). Copies of the complete standards can be obtained from the ISO Central Secretariat, Case postale 56, 1211 GENEVA 20, Switzerland, and from any ISO Member Body.

==J.8 Basic Control Set – ISO 646 (IRV)==
This control set is the C0 set for UNIMARC records.
The following positions are the only ones to be used in UNIMARC:

{| class="wikitable"
! Position !! Acronym !! Name
|-
| 0/14 || SO || Shift Out
|-
| 0/15 || SI || Shift In
|-
| 1/11 || ESC || Escape
|-
| 1/13 || IS3 || Information Separator Three
|-
| 1/14 || IS2 || Information Separator Two
|-
| 1/15 || IS1 || Information Separator One
|}

In this Manual, the following symbols are used for the Information Separators:
* IS1: $ (subfield delimiter)
* IS2: @ (field separator); in most examples the end-of-field mark is not shown
* IS3: % (record terminator)

==J.9 Bibliographic Control Set – ISO 6630: 1986==
This control set contains the control functions required for filing, sorting, permuting, etc. It is the C1 set for UNIMARC records. The following positions are the only ones to be used in UNIMARC:

{| class="wikitable"
! Position !! Acronym !! Name
|-
| 08/08 || NSB || Non-Sorting Character(s), Beginning
|-
| 08/09 || NSE || Non-Sorting Character(s), End
|-
| 08/11 || PLD || Partial Line Down
|-
| 08/12 || PLU || Partial Line Up
|}

In this Manual, the following symbols are used for the non-sorting characters:
* NSB: ¹NSB¹
* NSE: ¹NSE¹

PLU has two uses. It produces superscript text, and it returns subscript text created with PLD to the previous position. PLD works the same way in reverse, as the following example shows: 2³+3² is expressed as 2¹PLU¹3¹PLD¹+3¹PLU¹2¹PLD¹.

==J.10 Basic Latin Set – ISO 646 (IRV)==
This graphic set is specified in ISO 646. It is the default G0 set for UNIMARC records.
{| class="wikitable"
! Position !! Name !! Position !! Name
|-
| 2/0 || Space, Blank || 5/0 || Capital Letter P
|-
| 2/1 || Exclamation Mark || 5/1 || Capital Letter Q
|-
| 2/2 || Quotation Mark || 5/2 || Capital Letter R
|-
| 2/3 || Number Sign || 5/3 || Capital Letter S
|-
| 2/4 || Dollar Sign || 5/4 || Capital Letter T
|-
| 2/5 || Per Cent Sign || 5/5 || Capital Letter U
|-
| 2/6 || Ampersand || 5/6 || Capital Letter V
|-
| 2/7 || Apostrophe || 5/7 || Capital Letter W
|-
| 2/8 || Left Parenthesis || 5/8 || Capital Letter X
|-
| 2/9 || Right Parenthesis || 5/9 || Capital Letter Y
|-
| 2/10 || Asterisk || 5/10 || Capital Letter Z
|-
| 2/11 || Plus Sign || 5/11 || Left Square Bracket
|-
| 2/12 || Comma || 5/12 || Reverse Solidus
|-
| 2/13 || Hyphen, Minus Sign || 5/13 || Right Square Bracket
|-
| 2/14 || Full Stop, Period || 5/14 || Circumflex Accent
|-
| 2/15 || Solidus || 5/15 || Underline
|-
| 3/0 || Digit Zero || 6/0 || Grave Accent
|-
| 3/1 || Digit One || 6/1 || Small Letter a
|-
| 3/2 || Digit Two || 6/2 || Small Letter b
|-
| 3/3 || Digit Three || 6/3 || Small Letter c
|-
| 3/4 || Digit Four || 6/4 || Small Letter d
|-
| 3/5 || Digit Five || 6/5 || Small Letter e
|-
| 3/6 || Digit Six || 6/6 || Small Letter f
|-
| 3/7 || Digit Seven || 6/7 || Small Letter g
|-
| 3/8 || Digit Eight || 6/8 || Small Letter h
|-
| 3/9 || Digit Nine || 6/9 || Small Letter i
|-
| 3/10 || Colon || 6/10 || Small Letter j
|-
| 3/11 || Semi-colon || 6/11 || Small Letter k
|-
| 3/12 || Less than Sign || 6/12 || Small Letter l
|-
| 3/13 || Equals Sign || 6/13 || Small Letter m
|-
| 3/14 || Greater than Sign || 6/14 || Small Letter n
|-
| 3/15 || Question Mark || 6/15 || Small Letter o
|-
| 4/0 || Commercial At || 7/0 || Small Letter p
|-
| 4/1 || Capital Letter A || 7/1 || Small Letter q
|-
| 4/2 || Capital Letter B || 7/2 || Small Letter r
|-
| 4/3 || Capital Letter C || 7/3 || Small Letter s
|-
| 4/4 || Capital Letter D || 7/4 || Small Letter t
|-
| 4/5 || Capital Letter E || 7/5 || Small Letter u
|-
| 4/6 || Capital Letter F || 7/6 || Small Letter v
|-
| 4/7 || Capital Letter G || 7/7 || Small Letter w
|-
| 4/8 || Capital Letter H || 7/8 || Small Letter x
|-
| 4/9 || Capital Letter I || 7/9 || Small Letter y
|-
| 4/10 || Capital Letter J || 7/10 || Small Letter z
|-
| 4/11 || Capital Letter K || 7/11 || Left Curly Bracket
|-
| 4/12 || Capital Letter L || 7/12 || Vertical Line
|-
| 4/13 || Capital Letter M || 7/13 || Right Curly Bracket
|-
| 4/14 || Capital Letter N || 7/14 || Tilde
|-
| 4/15 || Capital Letter O || colspan="2" |
|}

N.B. If this set is used in combination with ISO 5426, positions 5/15, 6/0 and 7/14 in ISO 646 should not be used. Positions 5/8, 4/1 and 4/5 in ISO 5426 should be used instead.
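The N.B. above can be checked mechanically. This hypothetical Python helper reports the ISO 646 positions 5/15, 6/0 and 7/14 that appear in G0 data when the record also uses ISO 5426 (as indicated, for instance, by field 100). The note lists the ISO 5426 replacements 5/8, 4/1 and 4/5 without stating which replaces which. The sketch therefore mentions them only as a group rather than guessing a one-to-one mapping.

```python
# ISO 646 positions the N.B. says to avoid when ISO 5426 is also in use.
AVOID_WITH_ISO_5426 = {0x5F: "5/15", 0x60: "6/0", 0x7E: "7/14"}
ISO_5426_ALTERNATIVES = ("5/8", "4/1", "4/5")


def discouraged_positions(g0_data: bytes, uses_iso_5426: bool) -> list[str]:
    """List, in order of occurrence, the discouraged ISO 646 positions in g0_data."""
    if not uses_iso_5426:
        return []
    return [AVOID_WITH_ISO_5426[b] for b in g0_data if b in AVOID_WITH_ISO_5426]


if __name__ == "__main__":
    found = discouraged_positions(b"x_y`z~", uses_iso_5426=True)
    if found:
        print(f"avoid {', '.join(found)}; ISO 5426 offers {', '.join(ISO_5426_ALTERNATIVES)}")
```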
== See also ==
* [[Додатки UNIMARC|UNIMARC Appendices]]
* [https://unimarc.org.ua/ifla/biblio/1_Appendices/UNIMARC_2008_IFLA-Appendix_J_Character_sets_729-737.pdf Appendix J Character sets], pp. 729-737, UNIMARC Bibliographic, IFLA, 2008 (in English)
* Annexe J – Jeux de caractères, Bibliographic Transition in France (in French)
[[Категорія:Додатки UNIMARC]]