erisco
12-18-2006, 09:15 PM
Wow, been a while since I remembered that this was here. I would like to state that I do not consider this ideal as-is for a production website. However, for simple tasks or for non-critical production websites this may be useful. At the very least I hope it is some form of example and gives someone an idea.
PHP File System Search (PFSS)
... so that I could put this in my sig :P
I assume you have used PHP before, know how to make your server parse a PHP file, know how to navigate a filesystem, and know how to use basic HTML.
The code below can be stuck wherever you want. However, this script needs PHP 5: as far as I know, PHP 4 lacks functions it relies on, such as scandir().
Generally, just follow the comments to set up your search engine. The script needs to know the base directory you will be searching from, and some other nitty-gritty details.
Note that this script only searches your filesystem, weights the results, and then orders them. The directories it searches can very well be dynamic (like searching different forums in HTMLforums), which would allow for more specific searches. You'll gain the most if you know some PHP. The rest of your page needs to be styled and written by you.
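If you want the search directory to be dynamic, one way to do it (just a sketch, not part of the script) is to map a request parameter onto a fixed list of directories instead of using the raw input as a path. The section parameter, the pickSearchDir() helper, and the subdirectory names below are all made up for illustration.

```php
<?php
// Hypothetical map from a public "section" key to a directory on disk.
// The subdirectories are examples only; use whatever exists under your base directory.
$sections = array(
    'all'    => 'bulldog',
    'news'   => 'bulldog/news',
    'guides' => 'bulldog/guides',
);

// Hypothetical helper: return the directory for a known key, or the default.
// Anything that is not a key in the list (including things like "../..") is ignored.
function pickSearchDir($requested, $sections, $default) {
    if (is_string($requested) && isset($sections[$requested])) {
        return $sections[$requested];
    }
    return $default;
}

// e.g. search.php?section=news&search=puppy
$section = isset($_GET['section']) ? $_GET['section'] : 'all';
$dir = pickSearchDir($section, $sections, 'bulldog');
```

The resulting $dir can then be used in place of the fixed value in the config. Because the visitor only ever picks a key, never a path, they cannot point the search at a directory you did not list.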
How does the script search your files? The first thing to know is that ALL readable files are opened and searched. All HTML and PHP tags are stripped before the search runs (this happens in memory and does not edit your actual files, do not worry). Your sensitive PHP code will not even show up, and files that sit entirely within PHP delimiters will not appear in the results at all. So you can assume that, for the most part, only the text that would typically be output to the browser is searched and shown in results.
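To see what the stripping step leaves behind, here is a small sketch that runs the same trim(strip_tags(...)) call the script uses on two made-up strings. The sample contents are invented for illustration; in the script the text comes from your files.

```php
<?php
// Hypothetical file contents, used only for illustration.
$mixed   = "<h1>Bulldogs</h1>\n<?php \$secret = 'hunter2'; ?>\n<p>A sturdy breed.</p>";
$phpOnly = "<?php echo 'never indexed'; ?>";

// The same call chain the script applies to every file it reads.
$searchable = trim(strip_tags($mixed));
$empty      = trim(strip_tags($phpOnly));

// The visible text survives, the PHP block does not.
var_dump(strpos($searchable, 'Bulldogs') !== false);
var_dump(strpos($searchable, 'hunter2') === false);

// A file that is nothing but PHP strips down to an empty string,
// which is why the script never adds it to the searchable list.
var_dump($empty === '');
```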
The search engine just takes all the words the user enters and searches every file for those words. There is no word filter, so words like 'a', 'the', and 'it' will return a lot of results, but they will not be relevant. This isn't Google either: no quoted phrases, no excluded words, etc... you get the idea. Just words and search. Therefore, without editing anything, I would just advise users to use relevant wording. Misspelt words are not corrected and suffixes and prefixes are not added. We are talking bare-bones: if THIS WORD is equal to THAT WORD then it shows up as a result. A word only counts when it stands on its own, separated by whitespace, so 'dog' does not match 'dogs' or 'dog.' with a full stop stuck to it. The only thing that is ignored is case. Still, this is a lightweight search engine that could fit into any ordinary site.
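The matching and weighting idea can be sketched on its own like this. This is not the script's exact code: countWordMatches() is a made-up helper, and it uses regex lookarounds for the "stands on its own" check, which is my assumption about a tidy way to write it.

```php
<?php
// Hypothetical helper: total number of whole-word, case-insensitive hits
// in $text for every word in $query. This total is the "weight".
function countWordMatches($text, $query) {
    // Split on any run of whitespace and drop empty pieces.
    $terms = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    $weight = 0;
    foreach ($terms as $term) {
        // (?<!\S) and (?!\S) mean "no non-space character right before / after",
        // so the word must be bounded by whitespace or the ends of the text.
        $pattern = '/(?<!\S)' . preg_quote($term, '/') . '(?!\S)/i';
        $weight += preg_match_all($pattern, $text, $matches);
    }
    return $weight;
}

$text = 'The bulldog is calm and the Bulldog naps while bulldogs snore';

echo countWordMatches($text, 'bulldog'), "\n";     // case is ignored, "bulldogs" is not counted
echo countWordMatches($text, 'the bulldog'), "\n"; // each word adds its own hits
```

Files with a higher total would sort first, which is all the "weight" in the results means.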
Need help? Questions? Feel free to post. Otherwise, enjoy!
(not much of a tutorial eh?)
<?php
// Config
$dir = 'bulldog'; // Base directory to run search in, DO NOT END WITH A SLASH
$form = '<form action="?" method="get">
Search: <input type="text" name="search" />
<br /><input type="submit" value="Go!" />
</form>'; // Your HTML search form, WATCH THE QUOTES!
$noresults = '<p><strong>No results found.</strong></p>'; // No results found message, will be followed by $form
$cliplength = 150; // Number of characters displayed in search results
$negativeoffset = 10; // Number of characters displayed before first match
#### Config complete! Only edit below with caution!
function BuildSearch($path) {
    // Recursively collect the tag-stripped text of every readable file under $path
    $contents = array();
    $list = scandir($path);
    if ($list === false) {
        // Directory missing or unreadable: nothing to search here
        return $contents;
    }
    foreach ($list as $entry) {
        if ($entry == '.' || $entry == '..') {
            continue;
        }
        $full = $path.'/'.$entry;
        if (is_file($full)) {
            if (filesize($full) > 0 && is_readable($full)) {
                // Strip HTML and PHP tags so only the visible text is searched
                $strip = trim(strip_tags(file_get_contents($full)));
                if (strlen($strip) > 0) {
                    $contents[$full] = $strip;
                }
            }
        }
        elseif (is_dir($full)) {
            // Descend into subdirectories and add their files to the list
            $contents = array_merge($contents, BuildSearch($full));
        }
    }
    return $contents;
}
function sortByWeight($a, $b)
{
if( $a['weight'] == $b['weight'] )
{
return 0;
}
return ($a['weight'] < $b['weight']) ? 1 : -1;
}
if (!empty($_GET['search']) && is_string($_GET['search'])) {
    $items = BuildSearch($dir);
    $found = array();
    // Split on any whitespace and drop empty pieces, so double spaces do not create blank terms
    $term = preg_split('/\s+/', trim($_GET['search']), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($term as $word) {
        // Only the search word is escaped; the file text is searched as-is.
        // The word must be bounded by whitespace or the start/end of the text, case ignored.
        $pattern = '/(?<!\S)'.preg_quote($word, '/').'(?!\S)/i';
        foreach ($items as $index => $value) {
            if ($int = preg_match_all($pattern, $value, $matches, PREG_OFFSET_CAPTURE)) {
                if (isset($found[$index])) {
                    // File already matched an earlier word: add to its weight and highlight this word too
                    $found[$index]['weight'] = $found[$index]['weight'] + $int;
                    $found[$index]['clip'] = preg_replace($pattern, '<strong>$0</strong>', $found[$index]['clip']);
                }
                else {
                    // Start the clip a few characters before the first match
                    $pos = max(0, $matches[0][0][1] - $negativeoffset);
                    $found[$index]['weight'] = $int;
                    $found[$index]['clip'] = '"'.preg_replace($pattern, '<strong>$0</strong>', substr($value, $pos, $cliplength)).'"';
                }
            }
        }
    }
if (!empty($found)) {
uasort($found, 'sortByWeight');
foreach ($found as $fname => $file) {
    // Escape the path so unusual file names cannot break the markup
    echo '<p><strong><a href="'.htmlspecialchars($fname).'">'.htmlspecialchars(basename($fname)).'</a></strong>';
    echo '<br />Word matches: '.$file['weight'];
    echo '<br />'.$file['clip'];
    echo '</p>';
}
}
else {
echo $noresults.$form;
}
}
else {
echo $form;
}
?>