Just another simple script to parse contents

Forum for volunteer developers working on Baka-Tsuki related applications (Baka-Reader, BTprince, etc).

kuroneko
Devoted Haruhiist
Posts: 50
Joined: Tue Mar 27, 2012 1:58 am
Favourite Light Novel: Ahouka!

Post by kuroneko

This is a simple script I wrote a while ago. It was a bit rushed and still needs some tweaking. I used the URL-style (query-string) argument functions from WordPress (which is released under the GNU GPL) so that every function takes its arguments in the same unified format when "fetching" data "individually".
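For readers who haven't met the WordPress convention: `wp_parse_args()` lets a function accept either a query string or an array and merges it over a set of defaults, and `level4_parse_args()` in functions.php is adapted from it. The sketch below uses a hypothetical `demo_parse_args()` (the same logic minus the magic-quotes branch) to show both call styles. It also shows one catch: a query string can only carry strings, so `strip=true` arrives as the string `'true'`, not a boolean.

```php
<?php
// Hypothetical stand-in for level4_parse_args(), without the magic-quotes branch
// (magic quotes no longer exist in current PHP versions).
function demo_parse_args( $args, array $defaults = array() )
{
	if ( is_object( $args ) )
	{
		$r = get_object_vars( $args );
	}
	elseif ( is_array( $args ) )
	{
		$r = $args;
	}
	else
	{
		// "a=1&b=2" becomes array( 'a' => '1', 'b' => '2' ); every value is a string.
		parse_str( (string) $args, $r );
	}
	// Caller-supplied keys override the defaults; missing keys keep their default value.
	return array_merge( $defaults, $r );
}

$defaults = array( 'page' => '', 'strip' => false );

// Both calls describe the same request: one as a query string, one as an array.
$from_string = demo_parse_args( 'page=Main_Page&strip=true', $defaults );
$from_array  = demo_parse_args( array( 'page' => 'Main_Page', 'strip' => true ), $defaults );

var_dump( $from_string['strip'] ); // string(4) "true"
var_dump( $from_array['strip'] );  // bool(true)
```

Because of that, code that reads `strip` should convert it explicitly, for example with `filter_var( $strip, FILTER_VALIDATE_BOOLEAN )`, rather than comparing it with `== true` (in PHP the string `'false'` is also `== true`).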

The "a while ago" shows in the function prefixes (the code mixes `frost_` and `level4_`). An advanced generator plus CMS would be a good plan, but it isn't viable for me right now because I'm currently finalizing a CMS; that's why I'm sharing this code instead.

If you find the code useful then great.

Index.php

<?php
	include_once ( 'functions.php' );
	$url = 'http://www.baka-tsuki.org/project/index.php?title=';
?>
<?php frost_get_doctype( 'type=html5' ); // <meta charset> below is HTML5 syntax, so use the HTML5 doctype ?>
<html lang="en" dir="ltr">
<head>
<meta charset="UTF-8" />
<title><?php level4_get_content( 'url=' . $url . '&page=Chrome_Shelled_Regios:Volume1_Chapter1&tag_start=<!-- firstHeading -->&tag_end=<!-- /firstHeading -->&strip=true' ); ?></title>
<meta name="generator" content="Level4" />
</head>
<body>
<?php
level4_get_content( 'url=' . $url . '&page=Chrome_Shelled_Regios:Volume1_Chapter1&tag_start=<!-- firstHeading -->&tag_end=<!-- /firstHeading -->' );
print level4_content( 'url=' . $url . '&page=Chrome_Shelled_Regios:Volume1_Chapter1&tag_start=<!-- bodycontent -->&tag_end=<!-- /bodycontent -->' );
?>
</body>
</html>
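The calls above pass their arguments as one query string, which `level4_parse_args()` decodes with `parse_str()`. That works for these markers, but a page title containing `+` or `&` would be mangled. The sketch below uses a made-up title to show the problem and two ways around it: URL-encoding the value, or passing an array, which `level4_parse_args()` also accepts.

```php
<?php
// Sketch: how parse_str() (which level4_parse_args() relies on) decodes a query-string argument.
// The title is a made-up example containing characters that are special in query strings.
$title = 'Rock+Roll & Friends';

// Naive concatenation: "+" decodes to a space and "&" starts a new key/value pair,
// so 'page' ends up as "Rock Roll " and the rest of the title becomes a separate key.
parse_str( 'page=' . $title . '&strip=true', $naive );

// Fix 1: URL-encode the value so parse_str() decodes it back to the original text.
parse_str( 'page=' . urlencode( $title ) . '&strip=true', $encoded );

// Fix 2: skip the query string entirely and build the array yourself.
$as_array = array( 'page' => $title, 'strip' => true );

var_dump( $naive['page'], $encoded['page'], $as_array['page'] );
```

In index.php the array form would look like `level4_content( array( 'url' => $url, 'page' => 'Chrome_Shelled_Regios:Volume1_Chapter1', 'tag_start' => '<!-- bodycontent -->', 'tag_end' => '<!-- /bodycontent -->' ) )`.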

functions.php

<?php
//if ( ! defined( 'direct_access' ) ) { die( 'Level4' ); }

/**
 * Level4 Fetch wiki content.
 *
 * @package Level4
 */

/**
 * Navigates through an array and removes slashes from the values.
 *
 * If an array is passed, the array_map() function causes a callback to pass the
 * value back to the function. The slashes from this value will be removed.
 *
 * @source Wordpress
 * @since 2.0.0
 *
 * @param array|string $value The array or string to be stripped.
 * @return array|string Stripped array (or string in the callback).
 */
function level4_stripslashes_deep( $value )
{
	if ( is_array( $value ) )
	{
		$value = array_map( 'level4_stripslashes_deep', $value );
	}
	elseif ( is_object( $value ) )
	{
		$vars = get_object_vars( $value );
		foreach ( $vars as $key => $data )
		{
			$value->{$key} = level4_stripslashes_deep( $data );
		}
	}
	else
	{
		$value = stripslashes( $value );
	}
	return $value;
}

/**
 * Parses a string into variables to be stored in an array.
 *
 * Uses {@link http://www.php.net/parse_str parse_str()} and stripslashes if
 * {@link http://www.php.net/magic_quotes magic_quotes_gpc} is on.
 *
 * @source Wordpress
 * @since 2.2.1
 * @uses apply_filters() for the 'level4_parse_str' filter.
 *
 * @param string $string The string to be parsed.
 * @param array $array Variables will be stored in this array.
 */
function level4_parse_str( $string, &$array )
{
	parse_str( $string, $array );
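	// get_magic_quotes_gpc() was deprecated in PHP 7.4 and removed in PHP 8.0, so only call it on older versions.
	if ( function_exists( 'get_magic_quotes_gpc' ) && PHP_VERSION_ID < 70400 && get_magic_quotes_gpc() )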
	{
		$array = level4_stripslashes_deep( $array );
	}
}

/**
 * Merge user defined arguments into defaults array.
 *
 * This function is used throughout WordPress to allow for both string or array
 * to be merged into another array.
 *
 * @source Wordpress
 * @since 2.2.0
 *
 * @param string|array $args Value to merge with $defaults
 * @param array $defaults Array that serves as the defaults.
 * @return array Merged user defined values with defaults.
 */
function level4_parse_args( $args, $defaults = '' )
{
	if ( is_object( $args ) )
	{
		$r = get_object_vars( $args );
	}
	elseif ( is_array( $args ) )
	{
		$r =& $args;
	}
	else
	{
		level4_parse_str( $args, $r );
	}
	if ( is_array( $defaults ) )
	{
		return array_merge( $defaults, $r );
	}
	return $r;
}

/**
 * This function will sanitize any string.
 *
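 * @param string $value - The value you want to modify.
 * @param string $type - The type of sanitization to process.
 *  clean - Normalizes unwanted HTML (empty paragraphs, <h2> headings) and converts curly quotes to entities.
 *
 * @note We may need multi-line handling. Check whether the API supports it; otherwise do a multi-line search.
 *
 * @return string|null The sanitized string, or null for an unknown type.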
 */
function level4_sanitize( $value, $type )
{
	if ( $type != 'clean' )
	{
		return null;
	}
	// Replacements are applied in order: empty paragraphs, heading level, then curly quotes to entities.
	$search  = array( '<p><br />', '<h2>', '</h2>', '“', '”', '’', '‘' );
	$replace = array( '<p>&nbsp;</p><p>', '<h1>', '</h1>', '&ldquo;', '&rdquo;', '&rsquo;', '&lsquo;' );
	return str_replace( $search, $replace, $value );
}

/**
 * The function will display the doctype declaration.
 *
 * Insert this function before the <html> tag of a template file.
 *
 * @since 10.0.1
 *
 * @param string|array $args - Query string or array; the 'type' key selects the doctype. Omitting it displays the HTML5 doctype.
 *  html5 (Default)
 *  html_4_01_strict
 *  html_4_01_transitional
 *  html_4_01_frameset
 *  xhtml_1_0_strict
 *  xhtml_1_0_transitional
 *  xhtml_1_0_frameset
 *  xhtml_1_1
 *
 * @print data
 */
function frost_get_doctype( $args = '' )
{
	$defaults = array
	(
		'type' => 'html5'
	);
	$args_p = level4_parse_args( $args, $defaults );
	$type = $args_p['type'];
	// Lookup table of supported doctype declarations, keyed by type name.
	$doctypes = array
	(
		'html5' => '<!DOCTYPE html>',
		'html_4_01_strict' => '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">',
		'html_4_01_transitional' => '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">',
		'html_4_01_frameset' => '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">',
		'xhtml_1_0_strict' => '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
		'xhtml_1_0_transitional' => '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">',
		'xhtml_1_0_frameset' => '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">',
		'xhtml_1_1' => '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
	);
	// An unknown type prints nothing.
	if ( is_string( $type ) && isset( $doctypes[ $type ] ) )
	{
		print $doctypes[ $type ] . "\n";
	}
}

/**
 * Fetches the content of any MediaWiki page between two marker tags.
 *
 * @note This was intended to be a class then modified to a function instead for flexibility.
 *
 * @source This function is based on WikiParser by Steve Blinch under GNU LGPL 2.1.
 * @url http://code.blitzaffe.com
 *
 * @param string $url - The wiki's base URL (ex. http://frostproject/single/).
 * @param string $page - The page name appended to $url (ex. sample-page for http://frostproject/single/sample-page).
 * @param string $tag_start - Start capturing content after this tag. See the wiki page's source code to determine the tag to use.
 *  '<!-- content -->' (default)
 * @param string $tag_end - Stop capturing content at this tag. See the wiki page's source code to determine the tag to use.
 *  '<!-- /content -->' (default)
 * @param boolean $strip - If set to true, HTML tags are stripped from the result.
 *  false (default)
 *
 * @return string The captured content, or an error message if the page could not be fetched.
 */
function level4_fetch_content( $args = '' )
{
	$defaults = array
	(
		'url' => '',
		'page' => '',
		'tag_start' => '<!-- content -->',
		'tag_end' => '<!-- /content -->',
		'strip' => false
	);
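	$args_p = level4_parse_args( $args, $defaults );
	extract( $args_p, EXTR_SKIP );
	$data = '';
	// Reading a remote URL requires allow_url_fopen to be enabled.
	$content = @file_get_contents( $url . $page );
	if ( $content )
	{
		// Make root-relative links absolute using only the scheme and host,
		// because $url usually ends in "index.php?title=".
		$parts = parse_url( $url );
		if ( isset( $parts['scheme'], $parts['host'] ) )
		{
			$base = $parts['scheme'] . '://' . $parts['host'];
			$content = str_replace( 'href="/', 'href="' . $base . '/', $content );
		}
		$content = level4_sanitize( $content, 'clean' );
		// preg_quote() makes the markers match literally; the old "e" modifier is dropped
		// because it only applied to preg_replace() and was removed in PHP 7.
		$pattern = '#' . preg_quote( $tag_start, '#' ) . '(.*?)' . preg_quote( $tag_end, '#' ) . '#s';
		if ( preg_match( $pattern, $content, $match ) )
		{
			// "strip=false" arrives as the string 'false', so convert it to a real boolean.
			$data = filter_var( $strip, FILTER_VALIDATE_BOOLEAN ) ? strip_tags( $match[1] ) : $match[1];
		}
		unset( $content );
	}
	else
	{
		$data = 'Failed to get content!' . "\n";
	}
	return $data;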
}

/**
 * Prints the content fetched by level4_fetch_content().
 *
 * @note This was intended to be a class then modified to a function instead for flexibility.
 *
 * @param string|array $args
 * @print level4_fetch_content()
 */
function level4_get_content( $args = '' ) { print level4_fetch_content( $args ); }

/**
 * Returns the content fetched by level4_fetch_content().
 *
 * @note This was intended to be a class then modified to a function instead for flexibility.
 *
 * @param string|array $args
 * @return level4_fetch_content()
 */
function level4_content( $args = '' ) { return level4_fetch_content( $args ); }
?>
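`level4_fetch_content()` boils down to "find the first text between two marker comments". The sketch below isolates that step so it can be tried without a network connection: `demo_extract_between()` and the sample markup are hypothetical, and the markers mirror the ones used in index.php. `preg_quote()` keeps characters in the markers from being read as regex syntax, and the `s` modifier lets the match span several lines.

```php
<?php
// Hypothetical helper mirroring the extraction step of level4_fetch_content(),
// run against an inline HTML string instead of a live wiki page.
function demo_extract_between( $html, $tag_start, $tag_end, $strip = false )
{
	// Escape the markers (including the "#" delimiter); ".*?" stops at the first end marker,
	// and the "s" modifier lets "." match newlines.
	$pattern = '#' . preg_quote( $tag_start, '#' ) . '(.*?)' . preg_quote( $tag_end, '#' ) . '#s';
	if ( ! preg_match( $pattern, $html, $match ) )
	{
		return null; // Markers not found in the page.
	}
	return $strip ? strip_tags( $match[1] ) : $match[1];
}

$html = "<div>\n<!-- firstHeading --><h1>Chapter 1</h1><!-- /firstHeading -->\n"
	. "<!-- bodycontent -->\n<p>First line.</p>\n<p>Second line.</p>\n<!-- /bodycontent -->\n</div>";

echo demo_extract_between( $html, '<!-- firstHeading -->', '<!-- /firstHeading -->', true ), "\n";
echo demo_extract_between( $html, '<!-- bodycontent -->', '<!-- /bodycontent -->' ), "\n";
```

If a wiki page's source doesn't contain the chosen markers, the helper returns `null`, so check the page source (as the docblock suggests) before picking `tag_start` and `tag_end`.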
Visit my site for the highest-quality ePubs. Each one is meticulously crafted by hand, not automated, not converted.

Polyaness.com
Teh_ping
Editor-in-Assistance
Posts: 1729
Joined: Thu Sep 17, 2009 10:32 pm
Favourite Light Novel: Ahouka!
Location: Magdala

Re: Just another simple script to parse contents

Post by Teh_ping

I don't really know what I'm supposed to do with the code, tbh... pardon my stupidity here.
kuroneko
Devoted Haruhiist
Posts: 50
Joined: Tue Mar 27, 2012 1:58 am
Favourite Light Novel: Ahouka!

Re: Just another simple script to parse contents

Post by kuroneko

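Upload both files (index.php and functions.php) to your web server, or put them in a local PHP environment, and open index.php in your browser. The easiest local setup is to download and install Wampmanager.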
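Before opening index.php, it can help to confirm that the environment can actually run the script. The sketch below is a hypothetical `check.php` (not part of the original files): it checks that `allow_url_fopen` is enabled, which `file_get_contents()` needs to read `http://` URLs, and whether `get_magic_quotes_gpc()` still exists, since it was removed in PHP 8.0 and an unguarded call to it fails there.

```php
<?php
// Hypothetical pre-flight check; it only reads PHP settings and never contacts the wiki.
function demo_preflight()
{
	$problems = array();
	// file_get_contents() can only open http:// URLs when allow_url_fopen is on.
	if ( ! filter_var( ini_get( 'allow_url_fopen' ), FILTER_VALIDATE_BOOLEAN ) )
	{
		$problems[] = 'allow_url_fopen is off, so remote wiki pages cannot be fetched.';
	}
	// get_magic_quotes_gpc() was removed in PHP 8.0; calling it there without a
	// function_exists() guard is a fatal error.
	if ( ! function_exists( 'get_magic_quotes_gpc' ) )
	{
		$problems[] = 'get_magic_quotes_gpc() is unavailable; make sure functions.php guards that call.';
	}
	return $problems;
}

$problems = demo_preflight();
echo $problems ? implode( "\n", $problems ) : 'Environment looks OK.', "\n";
```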