Just another simple script to parse contents

Post by kuroneko » Sat Mar 23, 2013 8:20 am

This is a simple script that I wrote a while ago. It was done in a rush and still needs a bit of tweaking. I reused WordPress's URL-style argument functions (WordPress is under the GNU GPL) so that we have a unified format for "fetching" data "individually".

That I wrote this a while ago is also why the function prefixes look the way they do (frost_ and level4_). An advanced generator + CMS is a good plan, but it's not viable at the moment because I'm still finalizing a CMS, which is why I'm sharing this code instead.

If you find the code useful then great.
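
For anyone unfamiliar with the URL-style arguments mentioned above, here is a minimal standalone sketch of the idea. The helper name demo_parse_args() is hypothetical and only mirrors what level4_parse_args() does; it is not the code from functions.php.

```php
<?php
// Hypothetical helper for illustration only; it mirrors the idea behind
// level4_parse_args() but is not the code from functions.php.
function demo_parse_args( $args, array $defaults )
{
    if ( is_string( $args ) )
    {
        // 'a=1&b=2' becomes array( 'a' => '1', 'b' => '2' ).
        parse_str( $args, $parsed );
        $args = $parsed;
    }
    // Later arrays win, so caller values override the defaults.
    return array_merge( $defaults, $args );
}

$defaults = array( 'page' => '', 'strip' => false );

// String form: every value comes back as a string, so 'strip' is 'true', not true.
$from_string = demo_parse_args( 'page=Main_Page&strip=true', $defaults );

// Array form: types are preserved, and values may safely contain '&'.
$from_array = demo_parse_args( array( 'page' => 'A&B', 'strip' => true ), $defaults );

var_dump( $from_string['strip'] === 'true' ); // bool(true)
var_dump( $from_array['page'] === 'A&B' );    // bool(true)
```

The string form is compact, but a literal & inside a value splits it into a new argument, so passing an array is probably the safer choice when a value may contain one.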

Index.php
Code:
<?php
   include_once ( 'functions.php' );
   $url = 'http://www.baka-tsuki.org/project/index.php?title=';
?>
<?php frost_get_doctype( 'type=xhtml_1_0_strict' ); ?>
<html lang="en" dir="ltr">
<head>
<meta charset="UTF-8" />
<title><?php level4_get_content( 'url=' . $url . '&page=Chrome_Shelled_Regios:Volume1_Chapter1&tag_start=<!-- firstHeading -->&tag_end=<!-- /firstHeading -->&strip=true' ); ?></title>
<meta name="generator" content="Level4" />
</head>
<body>
<?php
level4_get_content( 'url=' . $url . '&page=Chrome_Shelled_Regios:Volume1_Chapter1&tag_start=<!-- firstHeading -->&tag_end=<!-- /firstHeading -->' );
print level4_content( 'url=' . $url . '&page=Chrome_Shelled_Regios:Volume1_Chapter1&tag_start=<!-- bodycontent -->&tag_end=<!-- /bodycontent -->' );
?>
</body>
</html>


functions.php
Code:
<?php
//if ( ! defined( 'direct_access' ) ) { die( 'Level4' ); }

/**
 * Level4 Fetch wiki content.
 *
 * @package Level4
 */

/**
 * Navigates through an array and removes slashes from the values.
 *
 * If an array is passed, the array_map() function causes a callback to pass the
 * value back to the function. The slashes from this value will be removed.
 *
 * @source Wordpress
 * @since 2.0.0
 *
 * @param array|string $value The array or string to be stripped.
 * @return array|string Stripped array (or string in the callback).
 */
function level4_stripslashes_deep( $value )
{
   if ( is_array( $value ) )
   {
      $value = array_map( 'level4_stripslashes_deep', $value );
   }
   elseif ( is_object( $value ) )
   {
      $vars = get_object_vars( $value );
      foreach ( $vars as $key => $data )
      {
         $value->{$key} = level4_stripslashes_deep( $data );
      }
   }
   else
   {
      $value = stripslashes( $value );
   }
   return $value;
}

/**
 * Parses a string into variables to be stored in an array.
 *
 * Uses {@link http://www.php.net/parse_str parse_str()} and stripslashes if
 * {@link http://www.php.net/magic_quotes magic_quotes_gpc} is on.
 *
 * @source Wordpress
 * @since 2.2.1
 *
 * @param string $string The string to be parsed.
 * @param array $array Variables will be stored in this array.
 */
function level4_parse_str( $string, &$array )
{
   parse_str( $string, $array );
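   // get_magic_quotes_gpc() was removed in PHP 8, so only call it where it still exists.
   if ( function_exists( 'get_magic_quotes_gpc' ) && @get_magic_quotes_gpc() )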
   {
      $array = level4_stripslashes_deep( $array );
   }
}

/**
 * Merge user defined arguments into defaults array.
 *
 * This function is used throughout WordPress to allow for both string or array
 * to be merged into another array.
 *
 * @source Wordpress
 * @since 2.2.0
 *
 * @param string|array $args Value to merge with $defaults
 * @param array $defaults Array that serves as the defaults.
 * @return array Merged user defined values with defaults.
 */
function level4_parse_args( $args, $defaults = '' )
{
   if ( is_object( $args ) )
   {
      $r = get_object_vars( $args );
   }
   elseif ( is_array( $args ) )
   {
      $r =& $args;
   }
   else
   {
      level4_parse_str( $args, $r );
   }
   if ( is_array( $defaults ) )
   {
      return array_merge( $defaults, $r );
   }
   return $r;
}

/**
 * This function will sanitize any string.
 *
 * @param string $value - The value you want to modify.
 * @param string $type - The type of sanitation to process.
 *  clean - This will clean unwanted html tags.
 *
 * @note We need multi-line maybe. Look if api supports this otherwise do multi-line search.
 *
 * @return string|null The sanitized string, or null if $type is not recognized.
 */
function level4_sanitize( $value, $type )
{
   if ( $type == 'clean' )
   {
      $data = str_replace( '<p><br />', '<p>&nbsp;</p><p>', $value );
      $data = str_replace( '<h2>', '<h1>', $data );
      $data = str_replace( '</h2>', '</h1>', $data );
      $data = str_replace( '“', '&ldquo;', $data );
      $data = str_replace( '”', '&rdquo;', $data );
      $data = str_replace( '’', '&rsquo;', $data );
      $data = str_replace( '‘', '&lsquo;', $data );
   }
   else
   {
      $data = null;
   }
   return $data;
}

/**
 * The function will display the doctype declaration.
 *
 * Insert this function before the <html> tag of a template file.
 *
 * @since 10.0.1
 *
 * @param string $type - The doctype to print. A blank value prints the HTML5 doctype.
 *  html5 (Default)
 *  html_4_01_strict
 *  html_4_01_transitional
 *  html_4_01_frameset
 *  xhtml_1_0_strict
 *  xhtml_1_0_transitional
 *  xhtml_1_0_frameset
 *  xhtml_1_1
 *
 * @print data
 */
function frost_get_doctype( $args = '' )
{
   $defaults = array
   (
      'type' => 'html5'
   );
   $args_p = level4_parse_args( $args, $defaults );
   extract( $args_p, EXTR_SKIP );
   $data = null;
   if ( $type == 'html5' )
   {
      $data = '<!DOCTYPE html>' . "\n";
   }
   elseif ( $type == 'html_4_01_strict' )
   {
      $data = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">' . "\n";
   }
   elseif ( $type == 'html_4_01_transitional' )
   {
      $data = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">' . "\n";
   }
   elseif ( $type == 'html_4_01_frameset' )
   {
      $data = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">' . "\n";
   }
   elseif ( $type == 'xhtml_1_0_strict' )
   {
      $data = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">' . "\n";
   }
   elseif ( $type == 'xhtml_1_0_transitional' )
   {
      $data = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">' . "\n";
   }
   elseif ( $type == 'xhtml_1_0_frameset' )
   {
      $data = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">' . "\n";
   }
   elseif ( $type == 'xhtml_1_1' )
   {
      $data = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">' . "\n";
   }
   print $data;
}

/**
 * This function will fetch the content of any MediaWiki site by reading what sits between two marker tags.
 *
 * @note This was intended to be a class, then changed to a function for flexibility.
 *
 * @source This function is based on WikiParser by Steve Blinch, under the GNU LGPL 2.1.
 * @url http://code.blitzaffe.com
 *
 * @param string $url - The wiki's base URL (ex. http://frostproject/single/).
 * @param string $page - The page to fetch, appended to $url (ex. sample-page, giving http://frostproject/single/sample-page).
 * @param string $tag_start - Start capturing content after this tag. See the wiki's page source to determine which tag to use.
 *  '<!-- content -->' (default)
 * @param string $tag_end - Stop capturing content at this tag. See the wiki's page source to determine which tag to use.
 *  '<!-- /content -->' (default)
 * @param boolean $strip - If set to true, HTML tags are stripped from the result.
 *  false (default)
 *
 * @return data
 */
function level4_fetch_content( $args = '' )
{
   $defaults = array
   (
      'url' => '',
      'page' => '',
      'tag_start' => '<!-- content -->',
      'tag_end' => '<!-- /content -->',
      'strip' => false
   );
   $args_p = level4_parse_args( $args, $defaults );
   extract( $args_p, EXTR_SKIP );
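   // Note: fetching a URL with file_get_contents() requires allow_url_fopen to be enabled.
   $content = @file_get_contents( $url . $page );
   $data = '';
   if ( $content )
   {
      // Build absolute links from the scheme and host only, not the full index.php?title= URL.
      $parts = parse_url( $url );
      $base = isset( $parts['scheme'], $parts['host'] ) ? $parts['scheme'] . '://' . $parts['host'] : '';
      $base .= isset( $parts['port'] ) ? ':' . $parts['port'] : '';
      $content = str_replace( 'href="/', 'href="' . $base . '/', $content );
      $content = level4_sanitize( $content, 'clean' );
      // Quote the tags so they match literally; the "e" modifier is not valid for preg_match_all().
      $pattern = '#' . preg_quote( $tag_start, '#' ) . '(.*?)' . preg_quote( $tag_end, '#' ) . '#s';
      if ( preg_match_all( $pattern, $content, $array ) && isset( $array[1][0] ) )
      {
         // URL-style arguments arrive as strings, so 'false' must not count as true.
         if ( filter_var( $strip, FILTER_VALIDATE_BOOLEAN ) )
         {
            $data = strip_tags( $array[1][0] );
         }
         else
         {
            $data = $array[1][0];
         }
      }
      else
      {
         $data = 'Failed to find the requested tags!' . "\n";
      }
      unset( $content );
   }
   else
   {
      $data = 'Failed to get content!' . "\n";
   }
   return $data;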
}

/**
 * This function will fetch the content of any MediaWiki site by reading a certain tag, then print it.
 *
 * @note This was intended to be a class, then changed to a function for flexibility.
 *
 * @param string|array $args
 * @print level4_fetch_content()
 */
function level4_get_content( $args = '' ) { print level4_fetch_content( $args ); }

/**
 * This function will fetch the content of any MediaWiki site by reading a certain tag, then return it.
 *
 * @note This was intended to be a class, then changed to a function for flexibility.
 *
 * @param string|array $args
 * @return level4_fetch_content()
 */
function level4_content( $args = '' ) { return level4_fetch_content( $args ); }
?>
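
The core of level4_fetch_content() is pulling out whatever sits between two HTML comment markers. Here is a minimal standalone sketch of that step, run against a hard-coded string instead of a live page. The helper name demo_extract_between() and the sample HTML are made up for illustration; they are not part of functions.php.

```php
<?php
// Hypothetical helper for illustration; not part of the original functions.php.
function demo_extract_between( $html, $tag_start, $tag_end )
{
    // preg_quote() makes the markers match literally, whatever characters they contain.
    // The "s" modifier lets "." span line breaks; "(.*?)" stops at the first end marker.
    $pattern = '#' . preg_quote( $tag_start, '#' ) . '(.*?)' . preg_quote( $tag_end, '#' ) . '#s';
    if ( preg_match( $pattern, $html, $match ) )
    {
        return $match[1];
    }
    return null; // markers not found
}

// Made-up sample input standing in for a fetched wiki page.
$html = "<html><!-- firstHeading --><h1>Chapter 1</h1><!-- /firstHeading -->\n<p>Body</p></html>";

$heading = demo_extract_between( $html, '<!-- firstHeading -->', '<!-- /firstHeading -->' );
echo strip_tags( $heading ), "\n"; // prints "Chapter 1"
```

Returning null when the markers are missing lets the caller tell "no match" apart from an empty match, which matters if the wiki's template changes and the markers disappear.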

Re: Just another simple script to parse contents

Post by Teh_ping » Sat Mar 23, 2013 8:28 am

I don't really know what I'm supposed to do with the code, tbh... pardon my stupidity here.

Re: Just another simple script to parse contents

Post by kuroneko » Wed Apr 03, 2013 4:32 pm

Upload both files to your web server or to a local environment, then open Index.php in your browser. The easiest way to get a local environment is to download and install Wampmanager.
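
One thing that may be worth checking first: the script fetches pages with file_get_contents(), which can only open URLs when PHP's allow_url_fopen setting is on. Here is a small sketch of a check, assuming you save it as a separate file (the name check.php is just a suggestion) next to Index.php.

```php
<?php
// Quick environment check; run it once before trying Index.php.
// ini_get() returns the setting as a string, so normalize it to a boolean.
$enabled = filter_var( ini_get( 'allow_url_fopen' ), FILTER_VALIDATE_BOOLEAN );

if ( $enabled )
{
    echo "allow_url_fopen is on\n";
}
else
{
    echo "allow_url_fopen is off; file_get_contents() cannot fetch URLs\n";
}
```

If it reports "off", the setting has to be enabled in php.ini before the script can fetch anything.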