parse-cldr

To use this script, simply execute it. By default, it processes the whole of the CLDR. Because it must load files in a particular order, it is not easily parallelizable. It is therefore recommended to run the command on just a few letters at a time. For example, to process the languages whose codes begin with a, i, or x, use

raku parse-cldr.raku a i x

You must ensure that a copy of the CLDR data is included in the resources folder and renamed to exclude the version information, that is, as cldr-common (such that the path to English's data is resources/cldr-common/common/main/en.xml). The data is excluded from distribution to reduce file size, although technically Unicode's license would permit it.
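Before starting a long run, it may help to confirm that the data is where the script expects it. The following is a minimal sketch, assuming the script is run from the distribution root; the sub name cldr-ready is hypothetical and not part of the script.

```raku
# Path taken from the text above; adjust it if you run from another directory.
my $english = 'resources/cldr-common/common/main/en.xml'.IO;

# Hypothetical helper: True only if the probe path exists and is a regular file.
sub cldr-ready(IO::Path $probe --> Bool) {
    $probe.e && $probe.f
}

if cldr-ready($english) {
    say "CLDR data found: $english";
}
else {
    note "Missing $english; copy the CLDR data into resources/cldr-common first";
}
```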

The basic process for generating the data files is as follows:

  1. Read the base XML files (en, es, etc.).
  2. For each sub file (en-US, es-ES, etc.), deep copy the base data using C<from-json to-json %hash.raku>,
     and apply the new data on top of it using C<parse>.
  3. Then, using C<encode>, generate the two data files: a binary tree file and a strings file.
  4. The strings file is interpreted as a giant array, so that the binary file may easily reference
     the strings (many strings are repeated, so this saves space).
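The idea behind these steps can be illustrated with a small, self-contained sketch. It is not the script's actual code: the data, the subs deep-copy, overlay, and intern, and this simplified encode are all hypothetical stand-ins. The real encode writes a binary file, whereas here each leaf string is simply replaced by its index in a shared strings array.

```raku
# Hypothetical stand-ins for parsed locale data; the real script builds these from XML.
my %base  = greeting => 'hello', farewell => 'goodbye',
            days => { mon => 'Monday', tue => 'Tuesday' };
my %child = farewell => 'cheerio', days => { tue => 'Tuesday' };

# Recursively copy a nested hash so the copy shares no containers with the original.
sub deep-copy(%source) {
    my %copy;
    for %source.kv -> $key, $value {
        %copy{$key} = $value ~~ Associative ?? deep-copy($value) !! $value;
    }
    %copy
}

# Apply the more specific data on top of the copied base data.
sub overlay(%target, %updates) {
    for %updates.kv -> $key, $value {
        if $value ~~ Associative && (%target{$key} ~~ Associative) {
            overlay(%target{$key}, $value);
        }
        else {
            %target{$key} = $value;
        }
    }
}

# Shared strings table: each distinct string is stored once and referenced by index.
my @strings;
my %index-of;
sub intern(Str $s) {
    unless %index-of{$s}:exists {
        @strings.push($s);
        %index-of{$s} = @strings.end;
    }
    %index-of{$s}
}

# Simplified encoding: replace every leaf string with its index in @strings.
sub encode(%tree) {
    my %encoded;
    for %tree.sort(*.key) -> $pair {
        %encoded{$pair.key} = $pair.value ~~ Associative
            ?? encode($pair.value)
            !! intern($pair.value);
    }
    %encoded
}

my %en-GB = deep-copy(%base);
overlay(%en-GB, %child);

my %encoded-base  = encode(%base);
my %encoded-child = encode(%en-GB);

say @strings;
say %encoded-child;
```

In this sketch the child locale reuses the indices already assigned to the strings it shares with the base, so only the overridden string adds a new entry to the table.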

The <alias> tag only exists for root, and not for any other language. Aliases are ignored by parse, and the fallback interpretations are generally handled in the encode methods. This means data may be duplicated (but duplicate strings are practically free). The slight increase in memory is well worth the speed improvement.

There are a number of subs available via Intl::CLDR::Util::XML-Helper to keep parsing simple:

  - elem $xml, $tag OR $xml.&elem($tag)
    Returns a single child element matching the tag, for when we know there will be only one element. Dies if there is more than one.
  - elems $xml, $tag OR $xml.&elems($tag)
    Returns the child elements matching the tag.
  - contents $xml
    Returns the text content of the tag.
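To show how these calling conventions read in practice, here is a sketch that re-creates the described behavior over a toy node class. Everything in it is an assumption made for illustration: the Node class is hypothetical, the subs are not the module's implementations, and the behavior of elem when no child matches (returning Nil here) is not specified in the text above.

```raku
# Hypothetical stand-in for an XML element; the real helpers operate on parsed XML.
class Node {
    has Str  $.tag;
    has Str  $.text = '';
    has Node @.children;
}

# All child elements whose tag matches.
sub elems(Node $xml, Str $tag) {
    $xml.children.grep(*.tag eq $tag).List
}

# The single matching child; dies if more than one matches.
# Returning Nil when nothing matches is an assumption of this sketch.
sub elem(Node $xml, Str $tag) {
    my @found = elems($xml, $tag);
    die "More than one '$tag' child found" if @found > 1;
    @found ?? @found[0] !! Nil
}

# The text content of the element.
sub contents(Node $xml) {
    $xml.text
}

my $names = Node.new(
    tag      => 'languages',
    children => [
        Node.new(tag => 'language', text => 'English'),
        Node.new(tag => 'language', text => 'Spanish'),
    ],
);
my $root = Node.new(tag => 'localeDisplayNames', children => [$names]);

# Both calling styles from the list above: sub call and .& method-like call.
my $languages = $root.&elem('languages');
say contents($_) for $languages.&elems('language');
```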

Intl::CLDR v0.7.2

A module providing access to the Unicode Common Language Data Repository

Authors

  • Matthew ‘Matéu’ Stephen STUCKWISCH

License

Artistic-2.0

Dependencies

  • Intl::LanguageTag:auth<zef:guifa>:ver<0.12.1+>
  • Intl::UserLanguage:auth<zef:guifa>:ver<0.4.0+>

Test Dependencies

Provides

  • Intl::CLDR
  • Intl::CLDR::Core
  • Intl::CLDR::Enums
  • Intl::CLDR::Genders
  • Intl::CLDR::NumberSystems::Digits
  • Intl::CLDR::NumberSystems::Ge'ez
  • Intl::CLDR::NumberSystems::Mayan
  • Intl::CLDR::NumberSystems::Roman
  • Intl::CLDR::Types::AppendItems
  • Intl::CLDR::Types::AvailableFormats
  • Intl::CLDR::Types::Calendar
  • Intl::CLDR::Types::Calendars
  • Intl::CLDR::Types::Characters
  • Intl::CLDR::Types::CompoundUnitSet
  • Intl::CLDR::Types::CompoundUnits
  • Intl::CLDR::Types::ContextTransformUsage
  • Intl::CLDR::Types::ContextTransforms
  • Intl::CLDR::Types::CoordinateWidth
  • Intl::CLDR::Types::Coordinates
  • Intl::CLDR::Types::Currencies
  • Intl::CLDR::Types::Currency
  • Intl::CLDR::Types::CurrencyFormatSystem
  • Intl::CLDR::Types::CurrencyFormats
  • Intl::CLDR::Types::CyclicNameContext
  • Intl::CLDR::Types::CyclicNameSet
  • Intl::CLDR::Types::CyclicNameSets
  • Intl::CLDR::Types::CyclicNameWidth
  • Intl::CLDR::Types::Database
  • Intl::CLDR::Types::DateFormat
  • Intl::CLDR::Types::DateFormats
  • Intl::CLDR::Types::DateTimeFormat
  • Intl::CLDR::Types::DateTimeFormats
  • Intl::CLDR::Types::Dates
  • Intl::CLDR::Types::DayContext
  • Intl::CLDR::Types::DayPeriodContext
  • Intl::CLDR::Types::DayPeriodRule
  • Intl::CLDR::Types::DayPeriodRuleSets
  • Intl::CLDR::Types::DayPeriodRules
  • Intl::CLDR::Types::DayPeriodWidth
  • Intl::CLDR::Types::DayPeriods
  • Intl::CLDR::Types::DayWidth
  • Intl::CLDR::Types::Days
  • Intl::CLDR::Types::DecimalFormatSystem
  • Intl::CLDR::Types::DecimalFormats
  • Intl::CLDR::Types::Delimiters
  • Intl::CLDR::Types::Derivation
  • Intl::CLDR::Types::DerivationComponent
  • Intl::CLDR::Types::DerivationCompound
  • Intl::CLDR::Types::Derivations
  • Intl::CLDR::Types::Durations
  • Intl::CLDR::Types::Ellipses
  • Intl::CLDR::Types::EraWidth
  • Intl::CLDR::Types::Eras
  • Intl::CLDR::Types::ExemplarCharacters
  • Intl::CLDR::Types::ExtensionName
  • Intl::CLDR::Types::ExtensionNames
  • Intl::CLDR::Types::Field
  • Intl::CLDR::Types::FieldWidth
  • Intl::CLDR::Types::Fields
  • Intl::CLDR::Types::Grammar
  • Intl::CLDR::Types::IntervalFormat
  • Intl::CLDR::Types::IntervalFormats
  • Intl::CLDR::Types::Language
  • Intl::CLDR::Types::LanguageGroups
  • Intl::CLDR::Types::LanguageNames
  • Intl::CLDR::Types::Languages
  • Intl::CLDR::Types::Layout
  • Intl::CLDR::Types::ListPattern
  • Intl::CLDR::Types::ListPatternWidth
  • Intl::CLDR::Types::ListPatterns
  • Intl::CLDR::Types::LocaleDisplayNames
  • Intl::CLDR::Types::LocaleDisplayPatterns
  • Intl::CLDR::Types::LocaleExtensionTypes
  • Intl::CLDR::Types::MeasurementSystemNames
  • Intl::CLDR::Types::Messages
  • Intl::CLDR::Types::Metazones
  • Intl::CLDR::Types::MinimalPairs
  • Intl::CLDR::Types::MiscellaneousPatternSet
  • Intl::CLDR::Types::MiscellaneousPatterns
  • Intl::CLDR::Types::MonthContext
  • Intl::CLDR::Types::MonthPatternContext
  • Intl::CLDR::Types::MonthPatternWidth
  • Intl::CLDR::Types::MonthPatterns
  • Intl::CLDR::Types::MonthWidth
  • Intl::CLDR::Types::Months
  • Intl::CLDR::Types::NumberFormat
  • Intl::CLDR::Types::NumberFormatSet
  • Intl::CLDR::Types::NumberingSystems
  • Intl::CLDR::Types::Numbers
  • Intl::CLDR::Types::Orientation
  • Intl::CLDR::Types::PercentFormatSystem
  • Intl::CLDR::Types::PercentFormats
  • Intl::CLDR::Types::PluralRangeRuleSet
  • Intl::CLDR::Types::PluralRuleSet
  • Intl::CLDR::Types::Plurals
  • Intl::CLDR::Types::Posix
  • Intl::CLDR::Types::QuarterContext
  • Intl::CLDR::Types::QuarterWidth
  • Intl::CLDR::Types::Quarters
  • Intl::CLDR::Types::RegionFormat
  • Intl::CLDR::Types::RelativeTime
  • Intl::CLDR::Types::ScientificFormatSystem
  • Intl::CLDR::Types::ScientificFormats
  • Intl::CLDR::Types::ScriptNames
  • Intl::CLDR::Types::SimpleUnitSet
  • Intl::CLDR::Types::SimpleUnits
  • Intl::CLDR::Types::Subdivision
  • Intl::CLDR::Types::Subdivisions
  • Intl::CLDR::Types::Supplement
  • Intl::CLDR::Types::SymbolSet
  • Intl::CLDR::Types::Symbols
  • Intl::CLDR::Types::TerritoryNames
  • Intl::CLDR::Types::TimeFormat
  • Intl::CLDR::Types::TimeFormats
  • Intl::CLDR::Types::TimezoneMaps
  • Intl::CLDR::Types::TimezoneNames
  • Intl::CLDR::Types::Units
  • Intl::CLDR::Types::VariantNames
  • Intl::CLDR::Types::WindowsZoneMap
  • Intl::CLDR::Types::WindowsZoneMaps
  • Intl::CLDR::Types::Zone
  • Intl::CLDR::Types::ZoneWidth
  • Intl::CLDR::Types::Zones
  • Intl::CLDR::Util::StrDecode
  • Intl::CLDR::Util::StrEncode
  • Intl::CLDR::Util::XML-Helper
