GOAL: Convert old gallery v1 (or even v2) .dat serialized data to SQL data
Gallery v1.0: About 2001 ~ 2002, which used PHP serialized data as DB
Gallery v1.x: About 2002 ~ 2004, which I still used PHP serialized data as Database
These codes are to parse & convert legacy Gallery V1 DB *.dat file into SQL
in order to build them as static HTML pages (or Dynamic Pages Integration, yet in fact static archive).
Of course, only some of fields were retrieved/parse.
Note: Main stream Gallery v2/v3+ has integrated with MySQL, no such issues any more!
Q: What fields were reterieved from Gallery v1/v1.x album.dat?
A:
gallery_type => For my internal usage. (e.g. I got many different Gallery v1/1.x ... repositories)
albumname => Gallery v1 album folder name
imgitem => Album Caption or Album Title
imgtype => I set it as "RDIR" for root album, 'DIR' for sub-album
caption => Detail Description of this album(sub-album)
datetime => Creation date of this album
comment => Only useful for 'ROOT Album", contains all sub-album in CSV format. Empty if this is a sub-album
Q: What fields were reterieved from Gallery v1/v1.x photos.dat?
A:
gallery_type => For my internal usage. (e.g. I got many seperated/different Gallery v1/1.x ... repositories)
albumname => Gallery v1 album folder name
imgitem => Image Name
imgtype => Image Type (jpg/png/gif/...)
caption => Detail Description of this IMAGE (Image title)
datetime => Creation date of this album (== uploadDate)
comment => The comment of this Image, only include "username/datetime/comment", use some big5-utf8 translate.
Q: It looks like that you combined "photos.dat & album.dat" into single Database Table?
A: Yes, because my requirement is very simple,
make gallery v1 photo repositories into static pages.
(though the "static pages" are in fact dynamic generated from PHP + DB)
Q: Didn't see your code parsing sub-album(DIR) CSV into root album comment field?
A: To make the conversion faster, and, my root album are not that many,
so I did that __manually__.
Q: Why not using Gallery Project's Import Tool? Do you ever use official tools to import v1.0 into v2.x
A: True, I ever tried it years ago around 2007 ~ 2011;
yet, futile; the Gallery v1.x album.dat & photos.dat looked too legacy to be imported.
Q: I got lots of SPAM messages in my Gallery v1 photos.dat, what I can do?
A: I have the same issue,
but since you cloud make them into database,
SPAM-comment removal will be much easier by using something like
代碼:
SELECT * from gallery_database_table where comment
like '%some bad words%' or comment like '%another bad words%'
compared digging into thousands of photos.dat
Convert USAGE Example:
代碼:
$ find /var/www/gallery-v1/ -name "album.dat" | xargs parse_album_dat_to_database_sql.php
parse_album_dat_to_database_sql.php
代碼:
<?PHP
/*
Goal: Convert Gallery v1 album.dat format into DataBase SQL
*/
/* data */
$title = array();
$desc = array();
$dtime = array();
function is_line_still_value($line_str)
{
if (strstr($line_str, "' => "))
return false;
else
return true;
}
function get_val($str, $is_str = true, $multiple_lines = false)
{
if ($multiple_lines == false)
list($a, $b) = explode("' => ", str_replace("\r", "", $str));
else
$b = str_replace("\r", "", $str);
if ($is_str)
{
/* 字串要去掉頭尾 quote */
if (substr($b, 0, 1) == "'" && $multiple_lines == false)
{
$b = substr($b, 1, strlen($b) - 1);
}
$b = str_replace("',\n", '', $b);
}
else
{
$b = intval(str_replace(',', '', $b));
}
return $b;
}
function get_photosdat_pass_1($str, $idx)
{
global $title, $desc;
$done_1 = 0;
$done_2 = 0;
foreach(preg_split("/((\r?\n)|(\r\n?))/", $str) as $line)
{
$line .= "\n";
if ($done_1 == 0 && strstr($line, "'title' => "))
{
$title[$idx] = get_val($line);
$done_1++;
}
if ($done_2 == 0 && strstr($line, "'description' => "))
{
$desc[$idx] = get_val($line);
$done_2++;
}
if ($done_2 == 0 && strstr($line, "'description' => "))
{
$desc[$idx] = get_val($line);
$done_2++;
}
if ($done_1 && $done_2)
break;
}
/*
* Optional
*/
$title[$idx] = mb_convert_encoding($title[$idx], 'UTF-8', 'BIG-5');
$desc[$idx] = mb_convert_encoding($desc[$idx], 'UTF-8', 'BIG-5');
if (trim($desc[$idx]) == 'No description')
{
$desc[$idx] = '';
}
}
/*
* BEGIN
*/
if (isset($argv[1]) && file_exists($argv[1]))
{
echo "CHECK albums.dat Integrity and Parse it, file count=".count($argv)." ...\n";
for ($i=1; $i<(count($argv)); $i++)
{
$str = file_get_contents($argv[$i]);
$var = unserialize($str);
// echo "[$i] Checking ".$argv[$i]."\n";
if ($var === false || strlen($str) <= 3) {
echo "_____ERROR_____ @ ". $argv[$i] . "\n";
echo "ABORT: Please Fix this first! You could restore it from backup album.dat.[1~N]\n";
exit;
}
//echo "[$i] Parsing ".$argv[$i]."\n";
$str = var_export($var, true);
get_photosdat_pass_1($str, $i);
// echo $argv[$i].' => '.$title[$i].'_'.$desc[$i]."__EOF\n";
list($a, $folder, $c) = explode('/', $argv[$i]);
echo "Insert into gallery_database_table (gallery_type, albumname, imgitem, imgtype, caption, datetime, comment)
values(0, '".$folder."', '".$title[$i]."', 'DIR', '".$desc[$i]."', 0, '');\n";
// "Insert into gallery_database_table (gallery_type, albumname, imgtype, imgitem, caption)
// values(1, '$argv[$i]', '$title[$i]', 'DIR', '0', '$desc[$i]')"
}
}
else
{
echo 'find /gallery-folder/ -name "album.dat" | xargs php parse_album_dat.php';
}
exit;
parse_photos_dat_to_database_sql.php
代碼:
<?PHP
/*
Goal: Convert Gallery v1 photos.dat format into DataBase SQL
*/
/*
找法:
1. 先 parse var export 一行一行找,找到關鍵字
2. 如果下一行,沒有 '=>' '[' ']' 那麼下一行還是資料
資料欄位
auto-idx, folder, imgname, imgtype, caption, date, comment(serialized)
*/
/* data */
$imgname = array();
$imgtype = array();
$caption = array();
$dateint = array(); /* int */
$comment = array(); /* three diementional array */
function is_line_still_value($line_str)
{
if (strstr($line_str, "' => "))
return false;
else
return true;
}
function get_val($str, $is_str = true, $multiple_lines = false)
{
// 'name' => 'waidai_05',
// 'type' => 'jpg',
//
// 'uploadDate' => 123124312,
if ($multiple_lines == false)
list($a, $b) = explode("' => ", str_replace("\r", "", $str));
else
$b = str_replace("\r", "", $str);
if ($is_str)
{
/* 字串要去掉頭尾 quote */
if (substr($b, 0, 1) == "'" && $multiple_lines == false)
{
$b = substr($b, 1, strlen($b) - 1);
}
$b = str_replace("',\n", '', $b);
}
else
{
$b = intval(str_replace(',', '', $b));
}
return $b;
}
/*
Pass #1 從零開始找
*/
function get_photosdat_pass_1($str, $idx)
{
global $imgname, $imgtype, $caption, $dateint;
$done_1 = 0;
$done_2 = 0;
$done_3 = 0;
$done_4 = 0;
foreach(preg_split("/((\r?\n)|(\r\n?))/", $str) as $line)
{
$line .= "\n";
if ($done_1 == 0 && strstr($line, "'name' => "))
{
$imgname[$idx] = get_val($line);
$done_1++;
}
if ($done_2 == 0 && strstr($line, "'type' => "))
{
$imgtype[$idx] = get_val($line);
$done_2++;
}
if (($done_3 < 31 && $done_3 > 0) || strstr($line, "'caption' => "))
{
if ($done_3 == 0)
$caption[$idx] = get_val($line);
else
{
if (is_line_still_value($line) == true)
$caption[$idx] .= get_val($line, true, true);
else
$done_3 = 31;
}
$done_3++;
}
if ($done_4 == 0 && strstr($line, "'uploadDate' => "))
{
$dateint[$idx] = get_val($line, false);
$done_4++;
}
if ($done_1 && $done_2 && $done_3 >= 31 && $done_4)
break;
}
/*
* Optional
*/
$imgname[$idx] = mb_convert_encoding($imgname[$idx], 'UTF-8', 'BIG-5');
$caption[$idx] = mb_convert_encoding($caption[$idx], 'UTF-8', 'BIG-5');
}
/*
Pass #2 從 ["comments"]=> 往後找下列三個
["commentText"]=>
["datePosted"]=>
["name"]=>
*/
function get_photosdat_pass_2($str, $idx)
{
global $comment;
$done_1 = 0;
$done_2 = 0;
$done_3 = 0;
$done_4 = 0;
$comment_count = 0;
$comment[$idx] = array();
$comment[$idx][$comment_count] = array(3);
list($tmp, $str) = explode("'comments' => ", $str);
foreach(preg_split("/((\r?\n)|(\r\n?))/", $str) as $line)
{
$line .= "\n";
if ($done_1 == 0 && strstr($line, "'name' => "))
{
$comment[$idx][$comment_count][0] = get_val($line);
$done_1++;
}
if ($done_2 == 0 && strstr($line, "'datePosted' => "))
{
$comment[$idx][$comment_count][1] = get_val($line, false);
$done_2++;
}
if (($done_3 < 31 && $done_3 > 0) || strstr($line, "'commentText' => "))
{
if ($done_3 == 0)
$comment[$idx][$comment_count][2] = get_val($line);
else
{
if (is_line_still_value($line) == true)
$comment[$idx][$comment_count][2] .= get_val($line, true, true);
else
$done_3 = 31;
}
$done_3++;
}
if ($done_3 >= 31 && $done_1 && $done_2)
{
$done_1 = 0;
$done_2 = 0;
$done_3 = 0;
/*
* Clean Gallery's SPAM comments
*/
if ( (stristr($comment[$idx][$comment_count][2], "url="))
|| (stristr($comment[$idx][$comment_count][2], "url:"))
|| (substr_count($comment[$idx][$comment_count][2], 'http') >= 2)
|| (strlen(trim($comment[$idx][$comment_count][2])) <= 2)
|| ($comment[$idx][$comment_count][2] == $comment[$idx][$comment_count][0])
|| (mb_detect_encoding($comment[$idx][$comment_count][0]) == 'ASCII'
&& mb_detect_encoding($comment[$idx][$comment_count][2]) == 'ASCII')
)
{
$comment[$idx][$comment_count] = null;
continue;
}
/*
* Optional: convert BIG-5 --> UTF-8
*/
$comment[$idx][$comment_count][0] = mb_convert_encoding($comment[$idx][$comment_count][0], 'UTF-8', 'BIG-5');
$comment[$idx][$comment_count][2] = mb_convert_encoding($comment[$idx][$comment_count][2], 'UTF-8', 'BIG-5');
$comment_count++;
$comment[$idx][$comment_count] = array(3);
}
}
}
/*
* BEGIN
*/
if (isset($argv[1]) && file_exists($argv[1]))
{
echo "CHECK Photos.dat Integrity and Parse it, file count=".count($argv)." ...\n";
for ($i=1; $i<(count($argv)); $i++)
{
$str = file_get_contents($argv[$i]);
$var = unserialize($str);
// echo "[$i] Checking ".$argv[$i]."\n";
if ($var === false || strlen($str) <= 3) {
echo "_____ERROR_____ @ ". $argv[$i] . "\n";
echo "ABORT: Please Fix this first! You could restore it from backup photos.dat.[1~N]\n";
exit;
}
// echo "[$i] Parsing ".$argv[$i]."\n";
for ($j=0; $j<count($var); $j++)
{
$str = var_export($var[$j], true);
get_photosdat_pass_1($str, $j);
get_photosdat_pass_2($str, $j);
/*
var_dump($imgname[$j]);
var_dump($imgtype[$j]);
var_dump($caption[$j]);
var_dump($dateint[$j]);
var_dump($comment[$j]);
*/
$comment[$j] = serialize($comment[$j]);
list($a, $folder, $c) = explode('/', $argv[$i]);
echo "Insert into gallery_database_table (gallery_type, albumname, imgitem, imgtype, caption, datetime, comment)
values(0, '".$folder."', '".$imgname[$j]."', '".$imgtype[$j]."', '".$caption[$j]."', ".$dateint[$j].",
'".$comment[$j]."');\n";
}
}
}
else
{
echo 'find /gallery-folder/ -name "photos.dat" | xargs php parse_photos_dat.php';
}
exit;
________________
美術插畫設計案子報價系統 v0.1 Beta
爪哇禾雀
|