Spreadsheet::Readによるデータ処理
戻る
Perl 5.8.x Unicode関連
# excel2sql1.pl
use strict;
use Spreadsheet::Read;
use Encode qw(from_to);
my(%t,$n,$ref,$cell);
$t{file} = 'makers.xls';
# ファイルをオープン
$ref = ReadData("../input/$t{file}") or die "Can't open the file $t{file}\n";
# B列のすべてのデータを読む
for $n ( 1 .. $#{ $ref->[1]{cell}[2] } ) {
$t{XY1} = 'B' . $n;
$t{XY2} = 'E' . $n;
$t{one1} = $ref->[1]{$t{XY1}};
$t{one2} = $ref->[1]{$t{XY2}};
if (length($t{one1}) < 2 ) {
$t{NO}=0;
$t{Elist}='';
next;
}
$t{one1} = Encode::encode("shiftjis",$t{one1});
$t{one2} = Encode::encode("shiftjis",$t{one2});
$t{NO}++;
if ( $t{NO} == 1 ) {
push(@{ $t{TO} },$t{one1});
if ( length($t{one2}) > 2 ) {
$t{Elist} = $t{one2};
}
} elsif ( $t{NO} == 2 ) {
push(@{ $t{ATTN} },$t{one1});
if ( length($t{one2}) > 2 ) {
$t{Elist} = $t{Elist} . ',' . $t{one2};
}
} elsif ( $t{NO} == 3 ) {
push(@{ $t{TEL} },$t{one1});
if ( length($t{one2}) > 2 ) {
$t{Elist} = $t{Elist} . ',' . $t{one2};
}
push(@{ $t{Es} }, $t{Elist});
}
}
$t{sanma}='様';
# テスト用
#open(OUT1,">../sql/makerstmp.txt");
#for $n ( 0 .. $#{ $t{TO} } ) {
# $t{Memo} = $t{Es}[$n];
# print OUT1 '"';
# print OUT1 $t{Memo},'"';
# print OUT1 "\n";
#}
#close(OUT1);
#exit;
# Write to a sql file.
open(OUT,">../sql/makers.sql");
print OUT 'DROP TABLE IF EXISTS makers;',"\n";
print OUT 'CREATE TABLE makers', "\n";
print OUT '(',"\n";
print OUT "\t",'id INT AUTO_INCREMENT,',"\n";
print OUT "\t",'company CHAR(100),',"\n";
print OUT "\t",'name CHAR(40),',"\n";
print OUT "\t",'TEL CHAR(100),',"\n";
print OUT "\t",'Memo CHAR(200),',"\n";
print OUT "\t",'PRIMARY KEY (id)',"\n";
print OUT ');',"\n\n";
for $n ( 0 .. $#{ $t{TO} } ) {
$t{company} = $t{TO}[$n];
$t{company} =~ s/TO:\s*//;
from_to($t{company},"shiftjis","utf8");
$t{name} = $t{ATTN}[$n];
$t{name} =~ s/ATTN:\s*//;
$t{name} =~ s/$t{sanma}//g;
$t{name} =~ s/\s*$//;
$t{name} =~ s/^\s*//;
from_to($t{name},"shiftjis","utf8");
$t{tel} = $t{TEL}[$n];
from_to($t{tel},"shiftjis","utf8");
$t{Memo} = $t{Es}[$n];
from_to($t{Memo},"shiftjis","utf8");
print OUT 'INSERT INTO makers ';
print OUT '(company,name,TEL,Memo) VALUES("';
print OUT $t{company},'","';
print OUT $t{name},'","';
print OUT $t{tel},'","';
print OUT $t{Memo};
print OUT '");',"\n";
}
close(OUT);
---------------------------------------------------------------
shiftjisコートファイルをutf8コートファイルに変更
---------------------------------------------------------------
C:>mysql cookbook < makers.sql -u cbuser -p
Enter password: ******
ERROR 1366 (HY000) at line 12: Incorrect string value: '\x83A\x83N\x83g...' for
column 'company' at row 1
---------------------------------------------------------------
---------------------------------------------------------------
B列が読めましたが、ほかの列だめ。Win32::OLEを使います。==>Sheetの名前に注意!
---------------------------------------------------------------
# excel2sql1.pl
use strict;
use Spreadsheet::Read;
use Encode;
my(%t,$n,$ref,$cell);
$t{file} = 'makers.xls';
# ファイルをオープン
$ref = ReadData("../input/$t{file}") or die "Can't open the file $t{file}\n";
# B列のすべてのデータを読む
for $n ( 1 .. $#{ $ref->[1]{cell}[2] } ) {
$t{XY1} = 'B' . $n;
$t{one1} = $ref->[1]{$t{XY1}};
if (length($t{one1}) < 2 ) {
$t{NO}=0;
$t{Elist}='';
next;
}
$t{one1} = Encode::encode("shiftjis",$t{one1});
$t{NO}++;
if ( $t{NO} == 1 ) {
push(@{ $t{TO} },$t{one1});
} elsif ( $t{NO} == 2 ) {
push(@{ $t{ATTN} },$t{one1});
} elsif ( $t{NO} == 3 ) {
push(@{ $t{TEL} },$t{one1});
}
}
for $n ( 0 .. $#{ $t{TO} } ) {
print "TO=$t{TO}[$n],";
print "ATTN=$t{ATTN}[$n],";
print "TEL=$t{TEL}[$n]\n";
}
__END__;
-------------------------------------------------------------------------
正しい使い方
use Encode;
$t{one1} = $ref->[1]{B1};
$t{one1} = Encode::encode("shiftjis",$t{one1});
-------------------------------------------------------------------------
間違った
use Encode qw(from_to);
from_to($t{one1}, "utf8", "Shift_JIS");
Cannot decode string with wide characters at C:/Perl/lib/Encode.pm line 186.
-------------------------------------------------------------------------
NAME
Spreadsheet::Read - Read the data from a spreadsheet
SYNOPSYS
use Spreadsheet::Read; my $xls = ReadData ("test.xls"); my $sxc =
ReadData ("test.sxc");
DESCRIPTION
Spreadsheet::Read tries to transparantly read *any* spreadsheet and
return it's content in a universal manner independant of the parsing
module that does the actual spreadsheet scanning.
For OpenOffice this module uses Spreadsheet::ReadSXC
For Excel this module uses Spreadsheet::ParseExcel
Data structure
The data is returned as an array reference:
$ref = [
# Entry 0 is the overall control hash
{ sheets => 2,
sheet => {
"Sheet 1" => 1,
"Sheet 2" => 2,
},
type => "xls",
},
# Entry 1 is the first sheet
{ label => "Sheet 1",
maxrow => 2,
maxcol => 4,
cell => [ undef,
[ undef, 1 ],
[ undef, undef, undef, undef, undef, "Nugget" ],
],
A1 => 1,
B4 => "Nugget",
},
# Entry 2 is the second sheet
{ label => "Sheet 2",
:
:
To keep as close contact to spreadsheet users, row and column 1 have
index 1 too in the "cell" element of the sheet hash, so cell "A1" is the
same as "cell" [1, 1] (column first). To switch between the two, there
are two helper functions available: "cell2cr ()" and "cr2cell ()".
The "cell" hash entry contains unformatted data, while the hash entries
with the traditional labels contain the formatted values (if
applicable).
The control hash (the first entry in the returned array ref), contains
some spreadsheet metadata. The entry "sheet" is there to be able to find
the sheets when accessing them by name:
my %sheet2 = %{$ref->[$ref->[0]{sheet}{"Sheet 2"}]};
Functions
"my $ref = ReadData ("file.xls");"
"my $ref = ReadData ("file.sxc");"
"my $ref = ReadData ("content.xml");"
"my $ref = ReadData ($content);"
Tries to convert the given file or string to the data structure
described above.
Currently ReadSXC does not preserve sheet order.
"my $cell = cr2cell (col, row)"
"cr2cell ()" converts a "(column, row)" pair (1 based) to the
traditional cell notation:
my $cell = cr2cell ( 4, 14); # $cell now "D14"
my $cell = cr2cell (28, 4); # $cell now "AB4"
"my ($col, $row) = cell2cr ($cell)"
TODO
Cell attributes
Future plans include cell attributes, available as for example:
{ label => "Sheet 1",
maxrow => 2,
maxcol => 4,
cell => [ undef,
[ undef, 1 ],
[ undef, undef, undef, undef, undef, "Nugget" ],
],
attr => [ undef,
[ undef, {
color => "Red",
font => "Arial",
size => "12",
format => "## ###.##",
align => "right",
}, ]
[ undef, undef, undef, undef, undef, {
color => "#e2e2e2",
font => "LetterGothic",
size => "15",
format => undef,
align => "left",
}, ]
A1 => 1,
B4 => "Nugget",
},
Options
Try to transparently support as many options as the encapsulated
modules support regarding (un)formatted values, (date) formats,
hidden columns rows or fields etc. These could be implemented like
"attr" above but names "meta", or just be new values in the "attr"
hashes.
Other spreadsheet formats
I consider adding CSV
Safety / flexibility
Make the different formats/modules just load if available and ignore
if not available.
OO-ify
Consider making the ref an object, though I currently don't see the
big advantage (yet). Maybe I'll make it so that it is a hybrid
functional / OO interface.
SEE ALSO
Spreadsheet::ParseExcel
http://search.cpan.org/~kwitknr/
Spreadsheet::ReadSXC
http://search.cpan.org/~terhechte/
Text::CSV_XS, Text::CSV
http://search.cpan.org/~jwied/ http://search.cpan.org/~alancitt/
AUTHOR
H.Merijn Brand,
COPYRIGHT AND LICENSE
Copyright (C) 2005-2005 H.Merijn Brand
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
戻る