parse-cldr

To use this script, execute it. By default, it processes the whole of the CLDR. Because it must load files in a particular order, it is not easily parallelizable, so it is recommended to run it on just a few letters at a time. To run on languages whose codes begin with a, i, and x, use:

raku parse-cldr.raku a i x

You must ensure that a copy of the CLDR data is included in the resources folder, renamed to exclude the version information, that is, as cldr-common (such that the path to English's data is resources/cldr-common/common/main/en.xml). This data is excluded from distribution to reduce file size, although Unicode's license would technically permit including it.

The basic process for generating the data files is as follows:

  1. Read the base XML files (en, es, etc.)
  2. For each sub file (en-US, es-ES, etc.), deep copy the base data using C<from-json to-json %hash.raku>,
     and apply the new data on top of it using C<parse>
  3. Then, using C<encode>, generate the two data files: a binary tree file and a strings file.
  4. The strings file is interpreted as a giant array, so that the binary file may easily reference
     the strings (many strings are repeated, so this saves space).
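
The steps above can be sketched in a few lines of Raku. This is only an illustration under stated assumptions, not the script's actual code: overlay, intern, and the sample hashes are hypothetical names, the data is made up, and the recursive copy stands in for the script's own deep copy. The real parse and encode routines work on CLDR XML and write a binary format that is not shown here.

```raku
# Illustrative sketch only: overlay and intern are hypothetical names,
# not routines from parse-cldr.raku, and the sample data is made up.

# Steps 1-2: copy the base data, then apply the regional data on top.
sub overlay(%base, %override --> Hash) {
    my %merged;
    for %base.kv -> $key, $value {
        # Recurse so nested hashes are copied rather than shared with %base.
        %merged{$key} = $value ~~ Associative ?? overlay($value, {}) !! $value;
    }
    for %override.kv -> $key, $value {
        if $value ~~ Associative {
            my $existing = %merged{$key} ~~ Associative ?? %merged{$key} !! {};
            %merged{$key} = overlay($existing, $value);
        }
        else {
            %merged{$key} = $value;
        }
    }
    %merged
}

# Steps 3-4: every string is stored once in one big array, and the
# encoded data only needs to remember each string's position.
my @strings;      # stands in for the strings file
my %position;     # string => its index in @strings

sub intern(Str $text --> Int) {
    return %position{$text} if %position{$text}:exists;
    @strings.push: $text;
    %position{$text} = @strings.end;
}

my %base     = yes => 'yes', dates => { am => 'AM', pm => 'PM' };
my %regional = overlay(%base, { dates => { am => 'am' } });

say %regional<dates><am>;   # am  (overridden)
say %regional<dates><pm>;   # PM  (inherited from the base)
say %base<dates><am>;       # AM  (the base is left untouched)

say intern('AM');           # 0
say intern('PM');           # 1
say intern('AM');           # 0   (repeated string, stored only once)
say @strings.elems;         # 2
```

Because a repeated string costs only one more reference into the array, the copies made in step 2 add little to the size of the strings file.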

The <alias> tag only exists for root, and not for any other language. Alias tags are ignored by parse, and the fallback interpretations are generally handled in the encode methods. This means data may be duplicated (but duplicate strings are practically free, since repeated strings are shared in the strings file). The slight increase in memory is well worth the speed improvements.

There are a number of subs available to parse via Intl::CLDR::Util::XML-Helper to keep things simple:

  - elem $xml, $tag OR $xml.&elem($tag)
    Returns a single child element matching the tag when we know there will be only one element. Dies if there is more than one.
  - elems $xml, $tag OR $xml.&elems($tag)
    Returns the child elements matching the tag.
  - contents $xml
    Returns the text content of the tag.
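
To make the two calling styles concrete, here is a small self-contained sketch. It does not use Intl::CLDR::Util::XML-Helper itself: the Node class is a toy stand-in for a parsed XML element, the three subs are simplified re-implementations written to match the descriptions above, and the tag names and text are made up. The real helpers may differ in detail.

```raku
# Toy stand-in for an XML element (hypothetical, for illustration only).
class Node {
    has Str $.name is required;
    has Str $.text = '';
    has     @.children;
}

# Simplified stand-ins for the helpers described above.
sub elems(Node $xml, Str $tag) {
    $xml.children.grep({ .name eq $tag }).List
}

sub elem(Node $xml, Str $tag) {
    my @found = elems($xml, $tag);
    die "More than one $tag element" if @found > 1;
    @found.head
}

sub contents(Node $xml) { $xml.text }

my $xml = Node.new: name => 'list', children => [
    Node.new(name => 'title', text => 'Months'),
    Node.new(name => 'item',  text => 'January'),
    Node.new(name => 'item',  text => 'February'),
];

# Sub-call form and method-like form are interchangeable.
say contents elem($xml, 'title');          # Months
say $xml.&elems('item').map(&contents);    # (January February)

# elem refuses to guess when the tag is not unique.
try elem($xml, 'item');
say $!.message;                            # More than one item element
```

The $xml.&elem($tag) form simply passes $xml as the first argument to the sub, which lets lookups be chained left to right while walking down a document.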

Intl::CLDR v0.7.2

A module providing access to the Unicode Common Language Data Repository

Authors

Matthew ‘Matéu’ Stephen STUCKWISCH

License

Artistic-2.0

Dependencies

Intl::LanguageTag:auth<zef:guifa>:ver<0.12.1+>
Intl::UserLanguage:auth<zef:guifa>:ver<0.4.0+>

Test Dependencies

Provides

Intl::CLDR
Intl::CLDR::Core
Intl::CLDR::Enums
Intl::CLDR::Genders
Intl::CLDR::NumberSystems::Digits
Intl::CLDR::NumberSystems::Ge'ez
Intl::CLDR::NumberSystems::Mayan
Intl::CLDR::NumberSystems::Roman
Intl::CLDR::Types::AppendItems
Intl::CLDR::Types::AvailableFormats
Intl::CLDR::Types::Calendar
Intl::CLDR::Types::Calendars
Intl::CLDR::Types::Characters
Intl::CLDR::Types::CompoundUnitSet
Intl::CLDR::Types::CompoundUnits
Intl::CLDR::Types::ContextTransformUsage
Intl::CLDR::Types::ContextTransforms
Intl::CLDR::Types::CoordinateWidth
Intl::CLDR::Types::Coordinates
Intl::CLDR::Types::Currencies
Intl::CLDR::Types::Currency
Intl::CLDR::Types::CurrencyFormatSystem
Intl::CLDR::Types::CurrencyFormats
Intl::CLDR::Types::CyclicNameContext
Intl::CLDR::Types::CyclicNameSet
Intl::CLDR::Types::CyclicNameSets
Intl::CLDR::Types::CyclicNameWidth
Intl::CLDR::Types::Database
Intl::CLDR::Types::DateFormat
Intl::CLDR::Types::DateFormats
Intl::CLDR::Types::DateTimeFormat
Intl::CLDR::Types::DateTimeFormats
Intl::CLDR::Types::Dates
Intl::CLDR::Types::DayContext
Intl::CLDR::Types::DayPeriodContext
Intl::CLDR::Types::DayPeriodRule
Intl::CLDR::Types::DayPeriodRuleSets
Intl::CLDR::Types::DayPeriodRules
Intl::CLDR::Types::DayPeriodWidth
Intl::CLDR::Types::DayPeriods
Intl::CLDR::Types::DayWidth
Intl::CLDR::Types::Days
Intl::CLDR::Types::DecimalFormatSystem
Intl::CLDR::Types::DecimalFormats
Intl::CLDR::Types::Delimiters
Intl::CLDR::Types::Derivation
Intl::CLDR::Types::DerivationComponent
Intl::CLDR::Types::DerivationCompound
Intl::CLDR::Types::Derivations
Intl::CLDR::Types::Durations
Intl::CLDR::Types::Ellipses
Intl::CLDR::Types::EraWidth
Intl::CLDR::Types::Eras
Intl::CLDR::Types::ExemplarCharacters
Intl::CLDR::Types::ExtensionName
Intl::CLDR::Types::ExtensionNames
Intl::CLDR::Types::Field
Intl::CLDR::Types::FieldWidth
Intl::CLDR::Types::Fields
Intl::CLDR::Types::Grammar
Intl::CLDR::Types::IntervalFormat
Intl::CLDR::Types::IntervalFormats
Intl::CLDR::Types::Language
Intl::CLDR::Types::LanguageGroups
Intl::CLDR::Types::LanguageNames
Intl::CLDR::Types::Languages
Intl::CLDR::Types::Layout
Intl::CLDR::Types::ListPattern
Intl::CLDR::Types::ListPatternWidth
Intl::CLDR::Types::ListPatterns
Intl::CLDR::Types::LocaleDisplayNames
Intl::CLDR::Types::LocaleDisplayPatterns
Intl::CLDR::Types::LocaleExtensionTypes
Intl::CLDR::Types::MeasurementSystemNames
Intl::CLDR::Types::Messages
Intl::CLDR::Types::Metazones
Intl::CLDR::Types::MinimalPairs
Intl::CLDR::Types::MiscellaneousPatternSet
Intl::CLDR::Types::MiscellaneousPatterns
Intl::CLDR::Types::MonthContext
Intl::CLDR::Types::MonthPatternContext
Intl::CLDR::Types::MonthPatternWidth
Intl::CLDR::Types::MonthPatterns
Intl::CLDR::Types::MonthWidth
Intl::CLDR::Types::Months
Intl::CLDR::Types::NumberFormat
Intl::CLDR::Types::NumberFormatSet
Intl::CLDR::Types::NumberingSystems
Intl::CLDR::Types::Numbers
Intl::CLDR::Types::Orientation
Intl::CLDR::Types::PercentFormatSystem
Intl::CLDR::Types::PercentFormats
Intl::CLDR::Types::PluralRangeRuleSet
Intl::CLDR::Types::PluralRuleSet
Intl::CLDR::Types::Plurals
Intl::CLDR::Types::Posix
Intl::CLDR::Types::QuarterContext
Intl::CLDR::Types::QuarterWidth
Intl::CLDR::Types::Quarters
Intl::CLDR::Types::RegionFormat
Intl::CLDR::Types::RelativeTime
Intl::CLDR::Types::ScientificFormatSystem
Intl::CLDR::Types::ScientificFormats
Intl::CLDR::Types::ScriptNames
Intl::CLDR::Types::SimpleUnitSet
Intl::CLDR::Types::SimpleUnits
Intl::CLDR::Types::Subdivision
Intl::CLDR::Types::Subdivisions
Intl::CLDR::Types::Supplement
Intl::CLDR::Types::SymbolSet
Intl::CLDR::Types::Symbols
Intl::CLDR::Types::TerritoryNames
Intl::CLDR::Types::TimeFormat
Intl::CLDR::Types::TimeFormats
Intl::CLDR::Types::TimezoneMaps
Intl::CLDR::Types::TimezoneNames
Intl::CLDR::Types::Units
Intl::CLDR::Types::VariantNames
Intl::CLDR::Types::WindowsZoneMap
Intl::CLDR::Types::WindowsZoneMaps
Intl::CLDR::Types::Zone
Intl::CLDR::Types::ZoneWidth
Intl::CLDR::Types::Zones
Intl::CLDR::Util::StrDecode
Intl::CLDR::Util::StrEncode
Intl::CLDR::Util::XML-Helper