File::Metadata::Libextractor

Use libextractor to read file metadata

NAME

File::Metadata::Libextractor - Use libextractor to read file metadata

SYNOPSIS

use File::Metadata::Libextractor;

#| This program extracts all the information about a file
sub MAIN($file! where { .IO.f // die "file '$file' not found" })
{
  my File::Metadata::Libextractor $e .= new;
  my @info = $e.extract($file);
  for @info -> %record {
    for %record.kv -> $k, $v {
      say "$k: $v"
    }
    say '-' x 50;
  }
}

DESCRIPTION

File::Metadata::Libextractor provides an OO interface to libextractor in order to query files' metadata.

As the libextractor site (https://www.gnu.org/software/libextractor) states, it is able to read information from the following file types:

  • HTML

  • MAN

  • PS

  • DVI

  • OLE2 (DOC, XLS, PPT)

  • OpenOffice (sxw)

  • StarOffice (sdw)

  • FLAC

  • MP3 (ID3v1 and ID3v2)

  • OGG

  • WAV

  • S3M (Scream Tracker 3)

  • XM (eXtended Module)

  • IT (Impulse Tracker)

  • NSF(E) (NES music)

  • SID (C64 music)

  • EXIV2

  • JPEG

  • GIF

  • PNG

  • TIFF

  • DEB

  • RPM

  • TAR(.GZ)

  • LZH

  • LHA

  • RAR

  • ZIP

  • CAB

  • 7-ZIP

  • AR

  • MTREE

  • PAX

  • CPIO

  • ISO9660

  • SHAR

  • RAW

  • XAR

  • FLV

  • REAL

  • RIFF (AVI)

  • MPEG

  • QT

  • ASF

Also, various additional MIME types are detected.

new(Bool :$in-process?)

Creates a File::Metadata::Libextractor object.

libextractor interfaces with several libraries in order to extract the metadata. To work safely, it starts sub-processes to perform the actual extraction work.

This might cause problems in a concurrent environment that uses locks. A possible solution is to run the extraction inside the program's own process. This is less secure, but it may avoid locking problems.

The optional named argument :$in-process runs the extraction job in the parent's process instead of a sub-process.
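The two modes can be sketched as follows. This is a minimal illustration, assuming the named argument is passed as an ordinary Raku adverb (:in-process) and that libextractor is installed; the file name photo.jpg is a hypothetical placeholder.

```raku
use File::Metadata::Libextractor;

# Default mode: libextractor does the extraction work in sub-processes.
my $isolated = File::Metadata::Libextractor.new;

# In-process mode: extraction runs inside this program's own process.
# Less secure, but it may avoid locking problems in concurrent code.
my $in-proc = File::Metadata::Libextractor.new(:in-process);

# 'photo.jpg' is a placeholder name; skip the call if the file is missing.
my $file = 'photo.jpg';
if $file.IO.f {
    my @info = $in-proc.extract($file);
    say "{@info.elems} metadata records found";
}
```

Both objects are used the same way afterwards; only the place where the extraction runs differs.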

extract($file where .IO.f // fail "file '$file' not found" --> List)

Reads all the possible information from an existing file, or fails if the file doesn't exist. The output List is actually a List of Hashes. Each hash has the following keys:

  • mime-type: The file's MIME type

  • plugin-name: The name of the plugin the library used to find the information

  • plugin-type: The plugin subtype used for the operation

  • plugin-format: The format of the plugin's output

  • data-type: The value returned by the plugin subtype

The possible values for plugin-format are:

  • EXTRACTOR_METAFORMAT_UNKNOWN

  • EXTRACTOR_METAFORMAT_UTF8

  • EXTRACTOR_METAFORMAT_BINARY

  • EXTRACTOR_METAFORMAT_C_STRING

The possible values for the plugin-type field are listed in File::Metadata::Libextractor::Constants, in the EXTRACTOR_MetaType enum (231 values as of v3.1.6).
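The record layout described above can be explored without calling the library. The sketch below keeps only the records whose plugin-format is UTF-8 and groups their values by plugin name. The helper name utf8-by-plugin and the sample data are hypothetical, and the code assumes that the plugin-format value stringifies to one of the names listed above.

```raku
# Hypothetical helper: keep UTF-8 records and group their values by plugin.
# Assumes %record<plugin-format> stringifies to the constant's name.
sub utf8-by-plugin(@records --> Hash) {
    my %by-plugin;
    for @records -> %record {
        next unless ~%record<plugin-format> eq 'EXTRACTOR_METAFORMAT_UTF8';
        %by-plugin{ %record<plugin-name> }.push: %record<data-type>;
    }
    %by-plugin
}

# Made-up sample shaped like the output of extract(), for illustration only;
# it carries just the keys the helper reads.
my @sample =
    { plugin-name   => 'mime',
      plugin-format => 'EXTRACTOR_METAFORMAT_UTF8',
      data-type     => 'image/png' },
    { plugin-name   => 'png',
      plugin-format => 'EXTRACTOR_METAFORMAT_BINARY',
      data-type     => '...' };

my %grouped = utf8-by-plugin(@sample);
say %grouped;
```

With real data, the list returned by extract would be passed in place of @sample.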

Prerequisites

This module requires the libextractor library to be installed. It has been successfully tested on the following Linux distributions:

  • Debian 9

  • Debian sid

  • Ubuntu 16.04

  • Ubuntu 18.04

It doesn't work with the version of the library that ships with Ubuntu 14.04.

To install the library on Debian or Ubuntu:

$ sudo apt-get install libextractor3

This module looks for a library called libextractor.so.3.

Installation

To install it using zef (a module management tool):

$ zef install File::Metadata::Libextractor

Testing

To run the tests:

$ prove -e "raku -Ilib"

AUTHOR

Fernando Santagata

COPYRIGHT AND LICENSE

Copyright 2018 Fernando Santagata

This library is free software; you can redistribute it and/or modify it under the Artistic License 2.0.

File::Metadata::Libextractor v0.0.3

Provides

  • File::Metadata::Libextractor
  • File::Metadata::Libextractor::Constants
  • File::Metadata::Libextractor::Raw
