Spreadsheet::Readによるデータ処理

戻る

Perl 5.8.x Unicode関連 # excel2sql1.pl use strict; use Spreadsheet::Read; use Encode qw(from_to); my(%t,$n,$ref,$cell); $t{file} = 'makers.xls'; # ファイルをオープン $ref = ReadData("../input/$t{file}") or die "Can't open the file $t{file}\n"; # B列のすべてのデータを読む for $n ( 1 .. $#{ $ref->[1]{cell}[2] } ) { $t{XY1} = 'B' . $n; $t{XY2} = 'E' . $n; $t{one1} = $ref->[1]{$t{XY1}}; $t{one2} = $ref->[1]{$t{XY2}}; if (length($t{one1}) < 2 ) { $t{NO}=0; $t{Elist}=''; next; } $t{one1} = Encode::encode("shiftjis",$t{one1}); $t{one2} = Encode::encode("shiftjis",$t{one2}); $t{NO}++; if ( $t{NO} == 1 ) { push(@{ $t{TO} },$t{one1}); if ( length($t{one2}) > 2 ) { $t{Elist} = $t{one2}; } } elsif ( $t{NO} == 2 ) { push(@{ $t{ATTN} },$t{one1}); if ( length($t{one2}) > 2 ) { $t{Elist} = $t{Elist} . ',' . $t{one2}; } } elsif ( $t{NO} == 3 ) { push(@{ $t{TEL} },$t{one1}); if ( length($t{one2}) > 2 ) { $t{Elist} = $t{Elist} . ',' . $t{one2}; } push(@{ $t{Es} }, $t{Elist}); } } $t{sanma}='様'; # テスト用 #open(OUT1,">../sql/makerstmp.txt"); #for $n ( 0 .. $#{ $t{TO} } ) { # $t{Memo} = $t{Es}[$n]; # print OUT1 '"'; # print OUT1 $t{Memo},'"'; # print OUT1 "\n"; #} #close(OUT1); #exit; # Write to a sql file. open(OUT,">../sql/makers.sql"); print OUT 'DROP TABLE IF EXISTS makers;',"\n"; print OUT 'CREATE TABLE makers', "\n"; print OUT '(',"\n"; print OUT "\t",'id INT AUTO_INCREMENT,',"\n"; print OUT "\t",'company CHAR(100),',"\n"; print OUT "\t",'name CHAR(40),',"\n"; print OUT "\t",'TEL CHAR(100),',"\n"; print OUT "\t",'Memo CHAR(200),',"\n"; print OUT "\t",'PRIMARY KEY (id)',"\n"; print OUT ');',"\n\n"; for $n ( 0 .. $#{ $t{TO} } ) { $t{company} = $t{TO}[$n]; $t{company} =~ s/TO:\s*//; from_to($t{company},"shiftjis","utf8"); $t{name} = $t{ATTN}[$n]; $t{name} =~ s/ATTN:\s*//; $t{name} =~ s/$t{sanma}//g; $t{name} =~ s/\s*$//; $t{name} =~ s/^\s*//; from_to($t{name},"shiftjis","utf8"); $t{tel} = $t{TEL}[$n]; from_to($t{tel},"shiftjis","utf8"); $t{Memo} = $t{Es}[$n]; from_to($t{Memo},"shiftjis","utf8"); print OUT 'INSERT INTO makers '; print OUT '(company,name,TEL,Memo) VALUES("'; print OUT $t{company},'","'; print OUT $t{name},'","'; print OUT $t{tel},'","'; print OUT $t{Memo}; print OUT '");',"\n"; } close(OUT); --------------------------------------------------------------- shiftjisコートファイルをutf8コートファイルに変更 --------------------------------------------------------------- C:>mysql cookbook < makers.sql -u cbuser -p Enter password: ****** ERROR 1366 (HY000) at line 12: Incorrect string value: '\x83A\x83N\x83g...' for column 'company' at row 1 --------------------------------------------------------------- --------------------------------------------------------------- B列が読めましたが、ほかの列だめ。Win32::OLEを使います。==>Sheetの名前に注意! --------------------------------------------------------------- # excel2sql1.pl use strict; use Spreadsheet::Read; use Encode; my(%t,$n,$ref,$cell); $t{file} = 'makers.xls'; # ファイルをオープン $ref = ReadData("../input/$t{file}") or die "Can't open the file $t{file}\n"; # B列のすべてのデータを読む for $n ( 1 .. $#{ $ref->[1]{cell}[2] } ) { $t{XY1} = 'B' . $n; $t{one1} = $ref->[1]{$t{XY1}}; if (length($t{one1}) < 2 ) { $t{NO}=0; $t{Elist}=''; next; } $t{one1} = Encode::encode("shiftjis",$t{one1}); $t{NO}++; if ( $t{NO} == 1 ) { push(@{ $t{TO} },$t{one1}); } elsif ( $t{NO} == 2 ) { push(@{ $t{ATTN} },$t{one1}); } elsif ( $t{NO} == 3 ) { push(@{ $t{TEL} },$t{one1}); } } for $n ( 0 .. $#{ $t{TO} } ) { print "TO=$t{TO}[$n],"; print "ATTN=$t{ATTN}[$n],"; print "TEL=$t{TEL}[$n]\n"; } __END__; ------------------------------------------------------------------------- 正しい使い方 use Encode; $t{one1} = $ref->[1]{B1}; $t{one1} = Encode::encode("shiftjis",$t{one1}); ------------------------------------------------------------------------- 間違った use Encode qw(from_to); from_to($t{one1}, "utf8", "Shift_JIS"); Cannot decode string with wide characters at C:/Perl/lib/Encode.pm line 186. -------------------------------------------------------------------------
NAME Spreadsheet::Read - Read the data from a spreadsheet SYNOPSYS use Spreadsheet::Read; my $xls = ReadData ("test.xls"); my $sxc = ReadData ("test.sxc"); DESCRIPTION Spreadsheet::Read tries to transparantly read *any* spreadsheet and return it's content in a universal manner independant of the parsing module that does the actual spreadsheet scanning. For OpenOffice this module uses Spreadsheet::ReadSXC For Excel this module uses Spreadsheet::ParseExcel Data structure The data is returned as an array reference: $ref = [ # Entry 0 is the overall control hash { sheets => 2, sheet => { "Sheet 1" => 1, "Sheet 2" => 2, }, type => "xls", }, # Entry 1 is the first sheet { label => "Sheet 1", maxrow => 2, maxcol => 4, cell => [ undef, [ undef, 1 ], [ undef, undef, undef, undef, undef, "Nugget" ], ], A1 => 1, B4 => "Nugget", }, # Entry 2 is the second sheet { label => "Sheet 2", : : To keep as close contact to spreadsheet users, row and column 1 have index 1 too in the "cell" element of the sheet hash, so cell "A1" is the same as "cell" [1, 1] (column first). To switch between the two, there are two helper functions available: "cell2cr ()" and "cr2cell ()". The "cell" hash entry contains unformatted data, while the hash entries with the traditional labels contain the formatted values (if applicable). The control hash (the first entry in the returned array ref), contains some spreadsheet metadata. The entry "sheet" is there to be able to find the sheets when accessing them by name: my %sheet2 = %{$ref->[$ref->[0]{sheet}{"Sheet 2"}]}; Functions "my $ref = ReadData ("file.xls");" "my $ref = ReadData ("file.sxc");" "my $ref = ReadData ("content.xml");" "my $ref = ReadData ($content);" Tries to convert the given file or string to the data structure described above. Currently ReadSXC does not preserve sheet order. "my $cell = cr2cell (col, row)" "cr2cell ()" converts a "(column, row)" pair (1 based) to the traditional cell notation: my $cell = cr2cell ( 4, 14); # $cell now "D14" my $cell = cr2cell (28, 4); # $cell now "AB4" "my ($col, $row) = cell2cr ($cell)" TODO Cell attributes Future plans include cell attributes, available as for example: { label => "Sheet 1", maxrow => 2, maxcol => 4, cell => [ undef, [ undef, 1 ], [ undef, undef, undef, undef, undef, "Nugget" ], ], attr => [ undef, [ undef, { color => "Red", font => "Arial", size => "12", format => "## ###.##", align => "right", }, ] [ undef, undef, undef, undef, undef, { color => "#e2e2e2", font => "LetterGothic", size => "15", format => undef, align => "left", }, ] A1 => 1, B4 => "Nugget", }, Options Try to transparently support as many options as the encapsulated modules support regarding (un)formatted values, (date) formats, hidden columns rows or fields etc. These could be implemented like "attr" above but names "meta", or just be new values in the "attr" hashes. Other spreadsheet formats I consider adding CSV Safety / flexibility Make the different formats/modules just load if available and ignore if not available. OO-ify Consider making the ref an object, though I currently don't see the big advantage (yet). Maybe I'll make it so that it is a hybrid functional / OO interface. SEE ALSO Spreadsheet::ParseExcel http://search.cpan.org/~kwitknr/ Spreadsheet::ReadSXC http://search.cpan.org/~terhechte/ Text::CSV_XS, Text::CSV http://search.cpan.org/~jwied/ http://search.cpan.org/~alancitt/ AUTHOR H.Merijn Brand, <h.m.brand@xs4all.nl> COPYRIGHT AND LICENSE Copyright (C) 2005-2005 H.Merijn Brand This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
戻る