首頁 繪圖設計 工作閒聊 比賽活動 美術討論 標籤 圖片
PHP: Convert Gallery 1/1.x photos.dat & album.dat to SQL(DB)
Type(Type) 2014/6/30 00:25

GOAL: Convert old gallery v1 (or even v2) .dat serialized data to SQL data Gallery v1.0: About 2001 ~ 2002, which used PHP serialized data as DBGallery v1.x: About 2002 ~ 2004, which I still used PHP serialized data as DatabaseThese codes are to parse & convert legacy Gallery V1 DB *.dat file into SQLin order to build them as static HTML pages (or Dynamic Pages Integration, yet in fact static archive).Of course, only some of fields were retrieved/parse. Note: Main stream Gallery v2/v3+ has integrated with MySQL, no such issues any more!Q: What fields were reterieved from Gallery v1/v1.x album.dat?A: gallery_type => For my internal usage. (e.g. I got many different Gallery v1/1.x ... repositories)albumname => Gallery v1 album folder name imgitem => Album Caption or Album Titleimgtype => I set it as "RDIR" for root album, 'DIR' for sub-albumcaption => Detail Description of this album(sub-album)datetime => Creation date of this albumcomment => Only useful for 'ROOT Album", contains all sub-album in CSV format. Empty if this is a sub-albumQ: What fields were reterieved from Gallery v1/v1.x photos.dat?A: gallery_type => For my internal usage. (e.g. I got many seperated/different Gallery v1/1.x ... repositories)albumname => Gallery v1 album folder name imgitem => Image Nameimgtype => Image Type (jpg/png/gif/...)caption => Detail Description of this IMAGE (Image title)datetime => Creation date of this album (== uploadDate)comment => The comment of this Image, only include "username/datetime/comment", use some big5-utf8 translate.Q: It looks like that you combined "photos.dat & album.dat" into single Database Table?A: Yes, because my requirement is very simple, make gallery v1 photo repositories into static pages.(though the "static pages" are in fact dynamic generated from PHP + DB)Q: Didn't see your code parsing sub-album(DIR) CSV into root album comment field?A: To make the conversion faster, and, my root album are not that many, so I did that __manually__. Q: Why not using Gallery Project's Import Tool? Do you ever use official tools to import v1.0 into v2.xA: True, I ever tried it years ago around 2007 ~ 2011;yet, futile; the Gallery v1.x album.dat & photos.dat looked too legacy to be imported.Q: I got lots of SPAM messages in my Gallery v1 photos.dat, what I can do?A: I have the same issue,but since you cloud make them into database,SPAM-comment removal will be much easier by using something like代碼:

SELECT * from gallery_database_table where comment like '%some bad words%' or comment like '%another bad words%'
compared digging into thousands of photos.datConvert USAGE Example: 代碼:
$ find /var/www/gallery-v1/ -name "album.dat" | xargs parse_album_dat_to_database_sql.php

parse_album_dat_to_database_sql.php

代碼:
<?PHP/* Goal: Convert Gallery v1 album.dat format into DataBase SQL *//* data */$title = array();$desc = array();$dtime = array();function is_line_still_value($line_str){   if (strstr($line_str, "' => "))      return false;   else      return true;}function get_val($str, $is_str = true, $multiple_lines = false){   if ($multiple_lines == false)      list($a, $b) = explode("' => ", str_replace("\r", "", $str));   else      $b =  str_replace("\r", "", $str);   if ($is_str)   {      /* 字串要去掉頭尾 quote */      if (substr($b, 0, 1) == "'" && $multiple_lines == false)      {         $b = substr($b, 1, strlen($b) - 1);      }            $b = str_replace("',\n", '', $b);   }   else   {      $b = intval(str_replace(',', '', $b));   }   return $b;}function get_photosdat_pass_1($str, $idx){   global $title, $desc;   $done_1 = 0;   $done_2 = 0;   foreach(preg_split("/((\r?\n)|(\r\n?))/", $str) as $line)   {      $line .= "\n";         if ($done_1 == 0 && strstr($line, "'title' => "))      {         $title[$idx] = get_val($line);         $done_1++;      }            if ($done_2 == 0 && strstr($line, "'description' => "))      {         $desc[$idx] = get_val($line);         $done_2++;      }            if ($done_2 == 0 && strstr($line, "'description' => "))      {         $desc[$idx] = get_val($line);         $done_2++;      }      if ($done_1 && $done_2)         break;   }   /*    * Optional    */   $title[$idx] = mb_convert_encoding($title[$idx], 'UTF-8', 'BIG-5');   $desc[$idx] = mb_convert_encoding($desc[$idx], 'UTF-8', 'BIG-5');   if (trim($desc[$idx]) == 'No description')   {      $desc[$idx] = '';   }}/* * BEGIN */if (isset($argv[1]) && file_exists($argv[1])){   echo "CHECK albums.dat Integrity and Parse it, file count=".count($argv)." ...\n";   for ($i=1; $i<(count($argv)); $i++)   {      $str = file_get_contents($argv[$i]);      $var = unserialize($str);      // echo "[$i] Checking ".$argv[$i]."\n";      if ($var === false || strlen($str) <= 3) {         echo "_____ERROR_____ @ ". $argv[$i] . "\n";         echo "ABORT: Please Fix this first! You could restore it from backup album.dat.[1~N]\n";         exit;      }      //echo "[$i] Parsing ".$argv[$i]."\n";            $str = var_export($var, true);      get_photosdat_pass_1($str, $i);            // echo $argv[$i].' => '.$title[$i].'_'.$desc[$i]."__EOF\n";      list($a, $folder, $c) = explode('/', $argv[$i]);      echo "Insert into gallery_database_table (gallery_type, albumname, imgitem, imgtype, caption, datetime, comment)             values(0, '".$folder."', '".$title[$i]."', 'DIR', '".$desc[$i]."', 0, '');\n";      // "Insert into gallery_database_table (gallery_type, albumname, imgtype, imgitem, caption)      //   values(1, '$argv[$i]', '$title[$i]', 'DIR', '0', '$desc[$i]')"   }}else{   echo 'find /gallery-folder/ -name "album.dat" | xargs php parse_album_dat.php';}exit;



parse_photos_dat_to_database_sql.php

代碼:
<?PHP/* Goal: Convert Gallery v1 photos.dat format into DataBase SQL *//*找法:1. 先 parse var export 一行一行找,找到關鍵字2. 如果下一行,沒有 '=>' '[' ']'   那麼下一行還是資料資料欄位auto-idx, folder, imgname, imgtype, caption, date, comment(serialized)*//* data */$imgname = array();$imgtype = array();$caption = array();$dateint = array(); /* int */$comment = array(); /* three diementional array */function is_line_still_value($line_str){   if (strstr($line_str, "' => "))      return false;   else      return true;}function get_val($str, $is_str = true, $multiple_lines = false){   //   'name' => 'waidai_05',   //  'type' => 'jpg',   //   //  'uploadDate' => 123124312,        if ($multiple_lines == false)      list($a, $b) = explode("' => ", str_replace("\r", "", $str));   else      $b =  str_replace("\r", "", $str);   if ($is_str)   {      /* 字串要去掉頭尾 quote */      if (substr($b, 0, 1) == "'" && $multiple_lines == false)      {         $b = substr($b, 1, strlen($b) - 1);      }            $b = str_replace("',\n", '', $b);   }   else   {      $b = intval(str_replace(',', '', $b));   }   return $b;}/*Pass #1 從零開始找*/function get_photosdat_pass_1($str, $idx){   global $imgname, $imgtype, $caption, $dateint;   $done_1 = 0;   $done_2 = 0;   $done_3 = 0;   $done_4 = 0;   foreach(preg_split("/((\r?\n)|(\r\n?))/", $str) as $line)   {      $line .= "\n";         if ($done_1 == 0 && strstr($line, "'name' => "))      {         $imgname[$idx] = get_val($line);         $done_1++;      }            if ($done_2 == 0 && strstr($line, "'type' => "))      {         $imgtype[$idx] = get_val($line);         $done_2++;      }            if (($done_3 < 31 && $done_3 > 0) || strstr($line, "'caption' => "))      {         if ($done_3 == 0)            $caption[$idx] = get_val($line);         else         {            if (is_line_still_value($line) == true)               $caption[$idx] .= get_val($line, true, true);            else               $done_3 = 31;         }         $done_3++;      }            if ($done_4 == 0 && strstr($line, "'uploadDate' => "))      {         $dateint[$idx] = get_val($line, false);         $done_4++;      }      if ($done_1 && $done_2 && $done_3 >= 31  && $done_4)         break;   }   /*    * Optional    */   $imgname[$idx] = mb_convert_encoding($imgname[$idx], 'UTF-8', 'BIG-5');   $caption[$idx] = mb_convert_encoding($caption[$idx], 'UTF-8', 'BIG-5');}/*Pass #2 從 ["comments"]=> 往後找下列三個["commentText"]=>["datePosted"]=>["name"]=> */function get_photosdat_pass_2($str, $idx){   global $comment;   $done_1 = 0;   $done_2 = 0;   $done_3 = 0;   $done_4 = 0;   $comment_count = 0;   $comment[$idx] = array();   $comment[$idx][$comment_count] = array(3);   list($tmp, $str) = explode("'comments' => ", $str);   foreach(preg_split("/((\r?\n)|(\r\n?))/", $str) as $line)   {      $line .= "\n";      if ($done_1 == 0 && strstr($line, "'name' => "))      {         $comment[$idx][$comment_count][0] = get_val($line);         $done_1++;      }            if ($done_2 == 0 && strstr($line, "'datePosted' => "))      {         $comment[$idx][$comment_count][1] = get_val($line, false);         $done_2++;      }            if (($done_3 < 31 && $done_3 > 0) || strstr($line, "'commentText' => "))      {         if ($done_3 == 0)            $comment[$idx][$comment_count][2] = get_val($line);         else         {            if (is_line_still_value($line) == true)               $comment[$idx][$comment_count][2] .= get_val($line, true, true);            else               $done_3 = 31;         }         $done_3++;      }         if ($done_3 >= 31 && $done_1 && $done_2)      {         $done_1 = 0;         $done_2 = 0;         $done_3 = 0;         /*          * Clean Gallery's SPAM comments          */         if (  (stristr($comment[$idx][$comment_count][2], "url="))            || (stristr($comment[$idx][$comment_count][2], "url:"))            || (substr_count($comment[$idx][$comment_count][2], 'http') >= 2)            || (strlen(trim($comment[$idx][$comment_count][2])) <= 2)            || ($comment[$idx][$comment_count][2] == $comment[$idx][$comment_count][0])            || (mb_detect_encoding($comment[$idx][$comment_count][0]) == 'ASCII'               && mb_detect_encoding($comment[$idx][$comment_count][2]) == 'ASCII')               )            {            $comment[$idx][$comment_count] = null;            continue;         }                  /*          * Optional: convert BIG-5 --> UTF-8          */         $comment[$idx][$comment_count][0] = mb_convert_encoding($comment[$idx][$comment_count][0], 'UTF-8', 'BIG-5');         $comment[$idx][$comment_count][2] = mb_convert_encoding($comment[$idx][$comment_count][2], 'UTF-8', 'BIG-5');                  $comment_count++;         $comment[$idx][$comment_count] = array(3);      }   }}/* * BEGIN */if (isset($argv[1]) && file_exists($argv[1])){   echo "CHECK Photos.dat Integrity and Parse it, file count=".count($argv)." ...\n";   for ($i=1; $i<(count($argv)); $i++)   {      $str = file_get_contents($argv[$i]);      $var = unserialize($str);//      echo "[$i] Checking ".$argv[$i]."\n";      if ($var === false || strlen($str) <= 3) {         echo "_____ERROR_____ @ ". $argv[$i] . "\n";         echo "ABORT: Please Fix this first! You could restore it from backup photos.dat.[1~N]\n";         exit;      }//      echo "[$i] Parsing ".$argv[$i]."\n";      for ($j=0; $j<count($var); $j++)      {         $str = var_export($var[$j], true);         get_photosdat_pass_1($str, $j);         get_photosdat_pass_2($str, $j);         /*         var_dump($imgname[$j]);         var_dump($imgtype[$j]);         var_dump($caption[$j]);         var_dump($dateint[$j]);         var_dump($comment[$j]);          */         $comment[$j] = serialize($comment[$j]);         list($a, $folder, $c) = explode('/', $argv[$i]);                  echo "Insert into gallery_database_table (gallery_type, albumname, imgitem, imgtype, caption, datetime, comment)               values(0, '".$folder."', '".$imgname[$j]."', '".$imgtype[$j]."', '".$caption[$j]."', ".$dateint[$j].",                '".$comment[$j]."');\n";      }   }}else{   echo 'find /gallery-folder/ -name "photos.dat" | xargs php parse_photos_dat.php';}exit;


(1,437 views)
[更多討論] 討論區 Windows, Linux, Perl, PHP, C/C++, Driver, Web 理論、應用、硬體、軟體


"PHP: Convert Gallery 1/1.x photos.dat & album.dat to SQL(DB)" 傳統頁面(電腦版)

首頁 繪圖設計 工作閒聊 比賽活動 美術討論 標籤 圖片
傳統桌面版
© Vovo2000.com Mobile Version 小哈手機版 2019