class Lingo::Database::Source

The class Source provides a uniform interface to the different formats of dictionary source files. A source file is identified by its ID, as defined in the language configuration file de.lang under language/dictionary/databases.

Dictionaries are processed with the iterator each, which yields one array for each line of the source file in the form [ key, [val1, val2, ...] ].
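To illustrate the shape of these records, here is a minimal sketch of a source-like class whose each yields [key, [val1, val2, ...]]. The class ToySource and its "key=val1;val2" line format are hypothetical and do not correspond to any of Lingo's actual source formats.

```ruby
# Hypothetical toy source reading an in-memory list of "key=val1;val2" lines
# (an assumed format, not one of Lingo's real dictionary formats).
class ToySource
  def initialize(lines)
    @lines = lines
  end

  # Yields one [key, [val1, val2, ...]] array per line, the same shape as Source#each.
  def each
    @lines.each do |line|
      key, vals = line.chomp.split('=', 2)
      yield [key, vals.split(';')]
    end
    self
  end
end

records = []
ToySource.new(["haus=substantiv;neutrum\n", "gehen=verb\n"]).each do |key, vals|
  records << [key, vals] # the yielded array is destructured into key and vals
end
p records # prints [["haus", ["substantiv", "neutrum"]], ["gehen", ["verb"]]]
```

Because the record is a plain two-element array, a block with two parameters receives the key and the value list directly.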

Lines that are not recognized correctly are rejected and written to a reject file, which can be identified by its .rev extension.
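The reject handling can be sketched in isolation. This is a simplified, hypothetical helper (filter_with_rejects is not part of Lingo): it places the .rev file next to the source path, whereas Source#initialize derives it via Lingo.find(:store, ...), and the validity check is a caller-supplied placeholder. As in Source#each, an empty reject file is deleted afterwards.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical helper: keeps lines accepted by the block, writes the others
# to "<src>.rev", and removes the reject file again if nothing was rejected.
def filter_with_rejects(src, lines, &valid)
  rej = Pathname.new("#{src}.rev")
  accepted = []
  file = rej.open('w')
  lines.each do |line|
    if valid.call(line)
      accepted << line
    else
      file.puts(line)
    end
  end
  accepted
ensure
  if file
    file.close
    rej.delete if rej.size == 0
  end
end

accepted = rejected = nil
Dir.mktmpdir do |dir|
  src = Pathname.new(dir) + 'words.txt'
  accepted = filter_with_rejects(src, %w[haus h4us baum]) { |l| l.match?(/\A[a-z]+\z/) }
  rev = Pathname.new("#{src}.rev")
  rejected = rev.exist? ? rev.read : nil
end
p accepted # prints ["haus", "baum"]
p rejected # prints "h4us\n"
```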

Constants

PRINTABLE_CHAR
UTF8_BASLAT

Define Basic Latin printable characters for UTF-8 encoding from U+0000 to U+007f

UTF8_CHAR

Collect all UTF-8 printable characters in the Unicode range U+0000 to U+02af

UTF8_DIGIT

Define printable characters for the tokenizer in UTF-8 encoding

UTF8_IPAEXT

Define IPA Extensions printable characters for UTF-8 encoding from U+0250 to U+02af

UTF8_LAT1SP

Define Latin-1 Supplement printable characters for UTF-8 encoding from U+0080 to U+00ff

UTF8_LATEXA

Define Latin Extended-A printable characters for UTF-8 encoding from U+0100 to U+017f

UTF8_LATEXB

Define Latin Extended-B printable characters for UTF-8 encoding from U+0180 to U+024f
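The ranges above map directly onto regular-expression character classes. The following sketch builds one class per Unicode block and a union analogous to UTF8_CHAR. The constant names and values here are hypothetical and Lingo's real definitions may differ; IPA Extensions is taken as the standard Unicode block U+0250 to U+02af, and "printable" is approximated with the POSIX class [[:print:]].

```ruby
# Hypothetical character classes, one per documented Unicode block.
BASLAT = '[\u0000-\u007f]' # Basic Latin
LAT1SP = '[\u0080-\u00ff]' # Latin-1 Supplement
LATEXA = '[\u0100-\u017f]' # Latin Extended-A
LATEXB = '[\u0180-\u024f]' # Latin Extended-B
IPAEXT = '[\u0250-\u02af]' # IPA Extensions

# Union of all blocks, i.e. U+0000 to U+02af (analogous to UTF8_CHAR).
ALL_BLOCKS = /\A(?:#{[BASLAT, LAT1SP, LATEXA, LATEXB, IPAEXT].join('|')})\z/

# A character qualifies if it lies in one of the blocks and is printable.
def printable_in_range?(ch)
  ch.match?(ALL_BLOCKS) && ch.match?(/\A[[:print:]]\z/)
end

p printable_in_range?('a')      # prints true
p printable_in_range?("\u00df") # prints true  (Latin-1 Supplement)
p printable_in_range?("\u0259") # prints true  (IPA Extensions)
p printable_in_range?("\t")     # prints false (control character)
p printable_in_range?("\u20ac") # prints false (outside U+0000..U+02af)
```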

Attributes

pos[R]

Public Class Methods

get(name, *args)
# File lib/lingo/database/source.rb, line 67
def self.get(name, *args)
  Lingo.get_const(name, self).new(*args)
end
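get acts as a factory: it resolves a source class by name and instantiates it with the remaining arguments. The sketch below mimics that pattern under stated assumptions. Sketch.get_const is a hypothetical stand-in for Lingo.get_const that simply camel-cases the name and looks it up with Module#const_get, and the subclass KeyValue is illustrative only.

```ruby
module Sketch
  # Hypothetical stand-in for Lingo.get_const: "key_value" -> KeyValue,
  # looked up inside the given namespace.
  def self.get_const(name, namespace)
    namespace.const_get(name.to_s.split('_').map(&:capitalize).join)
  end

  class Source
    # Same shape as Source.get: resolve the class, then call new(*args).
    def self.get(name, *args)
      Sketch.get_const(name, self).new(*args)
    end

    # Illustrative subclass nested in the Source namespace.
    class KeyValue < Source
      attr_reader :args

      def initialize(*args)
        @args = args
      end
    end
  end
end

src = Sketch::Source.get('key_value', 'sys-dic', :lingo)
p src.class # prints Sketch::Source::KeyValue
```

Nesting the concrete formats inside the base class lets the lookup be scoped to self, so only source classes can be selected by name.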

new(id, lingo)
# File lib/lingo/database/source.rb, line 73
def initialize(id, lingo)
  @config = lingo.database_config(id)

  source_file = Lingo.find(:dict, name = @config['name'], relax: true)

  reject_file = begin
    Lingo.find(:store, source_file) << '.rev'
  rescue NoWritableStoreError, SourceFileNotFoundError
  end

  @src = Pathname.new(source_file)
  @rej = Pathname.new(reject_file) if reject_file

  raise SourceFileNotFoundError.new(name, id) unless @src.exist?

  @def = @config.fetch('def-wc', Language::LA_UNKNOWN).downcase
  @sep = @config['separator']

  @wrd = "(?:#{PRINTABLE_CHAR}|#{LEGAL_CHAR})+"
  @pat = /^#{@wrd}$/

  @pos = 0
end
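The default line pattern accepts a line made up entirely of characters from either of two classes. The sketch below repeats that construction with placeholder classes; the values of printable_char and legal_char are assumptions standing in for PRINTABLE_CHAR and LEGAL_CHAR, whose real definitions are not shown on this page.

```ruby
printable_char = '[[:alpha:]]' # hypothetical stand-in for PRINTABLE_CHAR
legal_char     = "[-'.]"       # hypothetical stand-in for LEGAL_CHAR

# Same construction as in initialize: one or more characters from either class,
# anchored with ^ and $.
wrd = "(?:#{printable_char}|#{legal_char})+"
pat = /^#{wrd}$/

p 'rock-n-roll'.match?(pat) # prints true
p 'haus 42'.match?(pat)     # prints false (space and digits are in neither class)
p "haus\n42".match?(pat)    # prints true  (^ and $ are line anchors in Ruby)
```

The last case shows that ^ and $ match at line boundaries in Ruby. This is harmless in each, which matches one chomped line at a time, but \A and \z would be needed to anchor a multi-line string as a whole.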

Public Instance Methods

each() { |convert_line(line, $1, $2)| ... }
# File lib/lingo/database/source.rb, line 101
def each
  reject_file = @rej.open('w', encoding: ENC) if @rej

  @src.each_line($/, encoding: ENC) { |line|
    @pos += length = line.bytesize

    next if line =~ /\A\s*#/ || line.strip.empty?

    line.chomp!
    line.downcase!

    if length < 4096 && line =~ @pat
      yield convert_line(line, $1, $2)
    else
      reject_file.puts(line) if reject_file
    end
  }

  self
ensure
  if reject_file
    reject_file.close
    @rej.delete if @rej.size == 0
  end
end
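The filtering rules of each can be tried without any files. The hypothetical function classify below is a simplified sketch: it skips comment and blank lines, chomps and downcases the rest, and accepts a line only if its original byte length is below 4096 and it matches the pattern. Instead of calling convert_line it returns the raw line, and rejected lines are collected in an array rather than written to the .rev file.

```ruby
# Hypothetical, file-free version of the line filtering in Source#each.
def classify(lines, pat, max_bytes = 4096)
  accepted = []
  rejected = []
  lines.each do |line|
    length = line.bytesize                          # measured before chomp, as in each
    next if line =~ /\A\s*#/ || line.strip.empty?   # comment or blank line
    line = line.chomp.downcase
    if length < max_bytes && line =~ pat
      accepted << line
    else
      rejected << line
    end
  end
  [accepted, rejected]
end

pat = /^[a-z]+$/ # placeholder pattern
acc, rej = classify(["# comment\n", "\n", "Haus\n", "Baum\n", "x y\n"], pat)
p acc # prints ["haus", "baum"]
p rej # prints ["x y"]
```

Note that an overlong line is rejected even if it would otherwise match the pattern.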

set(db, key, val)
# File lib/lingo/database/source.rb, line 127
def set(db, key, val)
  db[key] = val
end

size()
# File lib/lingo/database/source.rb, line 97
def size
  @src.size
end
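Since each advances pos by the byte size of every line read and size returns the byte size of the source file, the ratio of the two can serve as a progress indicator. The following sketch reproduces that bookkeeping by hand on a temporary file; it does not use the Source class itself, and using pos/size for progress reporting is an assumption about how these two values might be combined.

```ruby
require 'tempfile'

progress = []
Tempfile.create('dict') do |f|
  f.write("haus\nbaum\nweg\n")
  f.flush
  size = File.size(f.path) # total bytes, like Source#size
  pos = 0                  # bytes consumed so far, like Source#pos
  File.foreach(f.path) do |line|
    pos += line.bytesize
    progress << pos * 100 / size # integer percentage
  end
end
p progress # prints [35, 71, 100]
```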