首頁 發表: 美術繪圖 | 美術設計 | 攝影 | 熱門標籤 美術討論: 美術工作 | 美術比賽 | 展覽活動 | 美術相關 | 一般討論 | 美術同好 CG 討論 :: Photoshop | Painter | 3D 行動 | AMP

【 立即註冊 】 : 更改個人資料 : : 登入

會員名稱: 登入密碼: 保持登入

PHP: Convert Gallery 1/1.x photos.dat & album.dat to SQL(DB)

發表新主題 回覆主題 討論區 Windows, Linux, Perl, PHP, C/C++, Driver, Web 理論、應用、硬體、軟體

| 1頁, 共1
發表人 人氣點閱:1309
Type



繪圖畫廊設計藝廊
攝影相簿留言板
最愛收藏分類標籤
暱稱: Type
註冊: 2002-11-30
發表: 9071
來自: vovo2000.com
V幣: 887257
@ 1F
PHP: Convert Gallery 1/1.x photos.dat & album.dat to SQL(DB) 2014-06-30 00:25



GOAL: Convert old gallery v1 (or even v2) .dat serialized data to SQL data

Gallery v1.0: About 2001 ~ 2002, which used PHP serialized data as DB
Gallery v1.x: About 2002 ~ 2004, which I still used PHP serialized data as Database


These codes are to parse & convert legacy Gallery V1 DB *.dat file into SQL
in order to build them as static HTML pages (or Dynamic Pages Integration, yet in fact static archive).
Of course, only some of fields were retrieved/parse.

Note: Main stream Gallery v2/v3+ has integrated with MySQL, no such issues any more!



Q: What fields were reterieved from Gallery v1/v1.x album.dat?
A:
gallery_type => For my internal usage. (e.g. I got many different Gallery v1/1.x ... repositories)
albumname => Gallery v1 album folder name
imgitem => Album Caption or Album Title
imgtype => I set it as "RDIR" for root album, 'DIR' for sub-album
caption => Detail Description of this album(sub-album)
datetime => Creation date of this album
comment => Only useful for 'ROOT Album", contains all sub-album in CSV format. Empty if this is a sub-album


Q: What fields were reterieved from Gallery v1/v1.x photos.dat?
A:
gallery_type => For my internal usage. (e.g. I got many seperated/different Gallery v1/1.x ... repositories)
albumname => Gallery v1 album folder name
imgitem => Image Name
imgtype => Image Type (jpg/png/gif/...)
caption => Detail Description of this IMAGE (Image title)
datetime => Creation date of this album (== uploadDate)
comment => The comment of this Image, only include "username/datetime/comment", use some big5-utf8 translate.


Q: It looks like that you combined "photos.dat & album.dat" into single Database Table?
A: Yes, because my requirement is very simple,
make gallery v1 photo repositories into static pages.
(though the "static pages" are in fact dynamic generated from PHP + DB)

Q: Didn't see your code parsing sub-album(DIR) CSV into root album comment field?
A: To make the conversion faster, and, my root album are not that many,
so I did that __manually__.


Q: Why not using Gallery Project's Import Tool? Do you ever use official tools to import v1.0 into v2.x
A: True, I ever tried it years ago around 2007 ~ 2011;
yet, futile; the Gallery v1.x album.dat & photos.dat looked too legacy to be imported.


Q: I got lots of SPAM messages in my Gallery v1 photos.dat, what I can do?

A: I have the same issue,
but since you cloud make them into database,
SPAM-comment removal will be much easier by using something like
代碼:

SELECT * from gallery_database_table where comment
like '%some bad words%' or comment like '%another bad words%'

compared digging into thousands of photos.dat




Convert USAGE Example:
代碼:

$ find /var/www/gallery-v1/ -name "album.dat" | xargs parse_album_dat_to_database_sql.php



parse_album_dat_to_database_sql.php


代碼:

<?PHP

/*
Goal: Convert Gallery v1 album.dat format into DataBase SQL
*/

/* data */
$title = array();
$desc = array();
$dtime = array();

function is_line_still_value($line_str)
{
   if (strstr($line_str, "' => "))
      return false;
   else
      return true;
}

function get_val($str, $is_str = true, $multiple_lines = false)
{

   if ($multiple_lines == false)
      list($a, $b) = explode("' => ", str_replace("\r", "", $str));
   else
      $b =  str_replace("\r", "", $str);

   if ($is_str)
   {
      /* 字串要去掉頭尾 quote */
      if (substr($b, 0, 1) == "'" && $multiple_lines == false)
      {
         $b = substr($b, 1, strlen($b) - 1);
      }
      
      $b = str_replace("',\n", '', $b);
   }
   else
   {
      $b = intval(str_replace(',', '', $b));
   }

   return $b;
}

function get_photosdat_pass_1($str, $idx)
{
   global $title, $desc;
   $done_1 = 0;
   $done_2 = 0;
   foreach(preg_split("/((\r?\n)|(\r\n?))/", $str) as $line)
   {
      $line .= "\n";
   
      if ($done_1 == 0 && strstr($line, "'title' => "))
      {
         $title[$idx] = get_val($line);
         $done_1++;
      }
      
      if ($done_2 == 0 && strstr($line, "'description' => "))
      {
         $desc[$idx] = get_val($line);
         $done_2++;
      }
      
      if ($done_2 == 0 && strstr($line, "'description' => "))
      {
         $desc[$idx] = get_val($line);
         $done_2++;
      }

      if ($done_1 && $done_2)
         break;
   }

   /*
    * Optional
    */
   $title[$idx] = mb_convert_encoding($title[$idx], 'UTF-8', 'BIG-5');
   $desc[$idx] = mb_convert_encoding($desc[$idx], 'UTF-8', 'BIG-5');

   if (trim($desc[$idx]) == 'No description')
   {
      $desc[$idx] = '';
   }
}


/*
* BEGIN
*/
if (isset($argv[1]) && file_exists($argv[1]))
{

   echo "CHECK albums.dat Integrity and Parse it, file count=".count($argv)." ...\n";

   for ($i=1; $i<(count($argv)); $i++)
   {
      $str = file_get_contents($argv[$i]);
      $var = unserialize($str);

      // echo "[$i] Checking ".$argv[$i]."\n";
      if ($var === false || strlen($str) <= 3) {
         echo "_____ERROR_____ @ ". $argv[$i] . "\n";
         echo "ABORT: Please Fix this first! You could restore it from backup album.dat.[1~N]\n";
         exit;
      }

      //echo "[$i] Parsing ".$argv[$i]."\n";
      
      $str = var_export($var, true);
      get_photosdat_pass_1($str, $i);
      
      // echo $argv[$i].' => '.$title[$i].'_'.$desc[$i]."__EOF\n";

      list($a, $folder, $c) = explode('/', $argv[$i]);

      echo "Insert into gallery_database_table (gallery_type, albumname, imgitem, imgtype, caption, datetime, comment)
             values(0, '".$folder."', '".$title[$i]."', 'DIR', '".$desc[$i]."', 0, '');\n";

      // "Insert into gallery_database_table (gallery_type, albumname, imgtype, imgitem, caption)
      //   values(1, '$argv[$i]', '$title[$i]', 'DIR', '0', '$desc[$i]')"
   }

}
else
{
   echo 'find /gallery-folder/ -name "album.dat" | xargs php parse_album_dat.php';
}
exit;



parse_photos_dat_to_database_sql.php


代碼:

<?PHP

/*
Goal: Convert Gallery v1 photos.dat format into DataBase SQL
*/


/*
找法:
1. 先 parse var export 一行一行找,找到關鍵字
2. 如果下一行,沒有 '=>' '[' ']'   那麼下一行還是資料

資料欄位
auto-idx, folder, imgname, imgtype, caption, date, comment(serialized)
*/


/* data */
$imgname = array();
$imgtype = array();
$caption = array();
$dateint = array(); /* int */
$comment = array(); /* three diementional array */

function is_line_still_value($line_str)
{
   if (strstr($line_str, "' => "))
      return false;
   else
      return true;
}

function get_val($str, $is_str = true, $multiple_lines = false)
{
   //   'name' => 'waidai_05',
   //  'type' => 'jpg',
   //
   //  'uploadDate' => 123124312,     

   if ($multiple_lines == false)
      list($a, $b) = explode("' => ", str_replace("\r", "", $str));
   else
      $b =  str_replace("\r", "", $str);

   if ($is_str)
   {
      /* 字串要去掉頭尾 quote */
      if (substr($b, 0, 1) == "'" && $multiple_lines == false)
      {
         $b = substr($b, 1, strlen($b) - 1);
      }
      
      $b = str_replace("',\n", '', $b);
   }
   else
   {
      $b = intval(str_replace(',', '', $b));
   }

   return $b;
}

/*
Pass #1 從零開始找
*/
function get_photosdat_pass_1($str, $idx)
{
   global $imgname, $imgtype, $caption, $dateint;
   $done_1 = 0;
   $done_2 = 0;
   $done_3 = 0;
   $done_4 = 0;
   foreach(preg_split("/((\r?\n)|(\r\n?))/", $str) as $line)
   {
      $line .= "\n";
   
      if ($done_1 == 0 && strstr($line, "'name' => "))
      {
         $imgname[$idx] = get_val($line);
         $done_1++;
      }
      
      if ($done_2 == 0 && strstr($line, "'type' => "))
      {
         $imgtype[$idx] = get_val($line);
         $done_2++;
      }
      
      if (($done_3 < 31 && $done_3 > 0) || strstr($line, "'caption' => "))
      {
         if ($done_3 == 0)
            $caption[$idx] = get_val($line);
         else
         {
            if (is_line_still_value($line) == true)
               $caption[$idx] .= get_val($line, true, true);
            else
               $done_3 = 31;
         }
         $done_3++;
      }
      
      if ($done_4 == 0 && strstr($line, "'uploadDate' => "))
      {
         $dateint[$idx] = get_val($line, false);
         $done_4++;
      }

      if ($done_1 && $done_2 && $done_3 >= 31  && $done_4)
         break;
   }

   /*
    * Optional
    */
   $imgname[$idx] = mb_convert_encoding($imgname[$idx], 'UTF-8', 'BIG-5');
   $caption[$idx] = mb_convert_encoding($caption[$idx], 'UTF-8', 'BIG-5');
}

/*
Pass #2 從 ["comments"]=> 往後找下列三個
["commentText"]=>
["datePosted"]=>
["name"]=>
*/
function get_photosdat_pass_2($str, $idx)
{
   global $comment;
   $done_1 = 0;
   $done_2 = 0;
   $done_3 = 0;
   $done_4 = 0;

   $comment_count = 0;

   $comment[$idx] = array();
   $comment[$idx][$comment_count] = array(3);

   list($tmp, $str) = explode("'comments' => ", $str);

   foreach(preg_split("/((\r?\n)|(\r\n?))/", $str) as $line)
   {
      $line .= "\n";

      if ($done_1 == 0 && strstr($line, "'name' => "))
      {
         $comment[$idx][$comment_count][0] = get_val($line);
         $done_1++;
      }
      
      if ($done_2 == 0 && strstr($line, "'datePosted' => "))
      {
         $comment[$idx][$comment_count][1] = get_val($line, false);
         $done_2++;
      }
      
      if (($done_3 < 31 && $done_3 > 0) || strstr($line, "'commentText' => "))
      {
         if ($done_3 == 0)
            $comment[$idx][$comment_count][2] = get_val($line);
         else
         {
            if (is_line_still_value($line) == true)
               $comment[$idx][$comment_count][2] .= get_val($line, true, true);
            else
               $done_3 = 31;
         }
         $done_3++;
      }
   
      if ($done_3 >= 31 && $done_1 && $done_2)
      {

         $done_1 = 0;
         $done_2 = 0;
         $done_3 = 0;

         /*
          * Clean Gallery's SPAM comments
          */
         if (  (stristr($comment[$idx][$comment_count][2], "url="))
            || (stristr($comment[$idx][$comment_count][2], "url:"))
            || (substr_count($comment[$idx][$comment_count][2], 'http') >= 2)
            || (strlen(trim($comment[$idx][$comment_count][2])) <= 2)
            || ($comment[$idx][$comment_count][2] == $comment[$idx][$comment_count][0])
            || (mb_detect_encoding($comment[$idx][$comment_count][0]) == 'ASCII'
               && mb_detect_encoding($comment[$idx][$comment_count][2]) == 'ASCII')
               )   
         {
            $comment[$idx][$comment_count] = null;
            continue;
         }
         
         /*
          * Optional: convert BIG-5 --> UTF-8
          */
         $comment[$idx][$comment_count][0] = mb_convert_encoding($comment[$idx][$comment_count][0], 'UTF-8', 'BIG-5');
         $comment[$idx][$comment_count][2] = mb_convert_encoding($comment[$idx][$comment_count][2], 'UTF-8', 'BIG-5');
         

         $comment_count++;
         $comment[$idx][$comment_count] = array(3);
      }
   }
}

/*
* BEGIN
*/
if (isset($argv[1]) && file_exists($argv[1]))
{

   echo "CHECK Photos.dat Integrity and Parse it, file count=".count($argv)." ...\n";

   for ($i=1; $i<(count($argv)); $i++)
   {
      $str = file_get_contents($argv[$i]);
      $var = unserialize($str);

//      echo "[$i] Checking ".$argv[$i]."\n";
      if ($var === false || strlen($str) <= 3) {
         echo "_____ERROR_____ @ ". $argv[$i] . "\n";
         echo "ABORT: Please Fix this first! You could restore it from backup photos.dat.[1~N]\n";
         exit;
      }

//      echo "[$i] Parsing ".$argv[$i]."\n";
      for ($j=0; $j<count($var); $j++)
      {
         $str = var_export($var[$j], true);
         get_photosdat_pass_1($str, $j);
         get_photosdat_pass_2($str, $j);

         /*
         var_dump($imgname[$j]);
         var_dump($imgtype[$j]);
         var_dump($caption[$j]);
         var_dump($dateint[$j]);
         var_dump($comment[$j]);
          */
         $comment[$j] = serialize($comment[$j]);
         list($a, $folder, $c) = explode('/', $argv[$i]);
         
         echo "Insert into gallery_database_table (gallery_type, albumname, imgitem, imgtype, caption, datetime, comment)
               values(0, '".$folder."', '".$imgname[$j]."', '".$imgtype[$j]."', '".$caption[$j]."', ".$dateint[$j].",
               '".$comment[$j]."');\n";
      }
   }
}
else
{
   echo 'find /gallery-folder/ -name "photos.dat" | xargs php parse_photos_dat.php';
}
exit;



________________

美術插畫設計案子報價系統 v0.1 Beta 版
爪哇禾雀





資訊相關理論、技術、管理、應用、產品等
發表新主題 回覆主題
一般討論
發表:美術繪圖、插畫設計
新聞、活動、評論、公告
最近 10 則討論
日本秘境賞櫻【花美*爛漫中】
By yeasonth R:0 V:88 02/08
日本秘境賞櫻【花美*花間奏】
By yeasonth R:0 V:84 02/08
日本秘境賞櫻【美花*收藏中】
By yeasonth R:0 V:97 02/07
2019 早療協會豬年賀圖
By Type R:0 V:131 02/02
勵友中心 祝福您 新年快樂
By Type R:0 V:106 02/02
最近 10 則討論
2017銀河遊俠2018
By .*(ॐ)宇宙的聲音 *. R:18 V:1250 02/19
a story
By Type R:1 V:426 02/19
practice
By Type R:2 V:418 02/19
府城摩登大遊行
By 小創作 R:2 V:98 02/18
系列作第二張:綠林
By 杜青 R:2 V:771 02/17
母雞帶小雞 陳正浩-釘線畫
By 杜青 R:3 V:2505 02/17
白荷:蓮花
By 杜青 R:2 V:46 02/17
-野棋-
By 知徐 R:0 V:47 02/16
戀與制作人-李澤言
By 文子 R:2 V:85 02/16
doodle
By Type R:1 V:57 02/16
最近 10 則討論
陳肇彤金屬創作展
By 金車文藝中心 R:0 H:28 02/16
吳文哲-油畫創作展「港都城事」
By 金車文藝中心 R:0 H:47 02/16
蘭陽藝遊-金車油畫獎歷屆首獎作品展
By 金車文藝中心 R:0 H:347 12/24
蘭陽藝遊-1-4月
By 金車文藝中心 R:0 H:297 12/24