Your site not database driven? Search it anyways!


  • Your site not database driven? Search it anyways!

    Edit:
    Wow, it has been a while since I remembered that this was here. I would like to state that I do not consider this ideal as-is for a production website. However, for simple tasks or for non-critical production websites it may be useful. At the very least, I hope it serves as an example and gives someone an idea.


    PHP File System Search (PFSS)
    ... so that I could put this in my sig

    I assume you have used PHP before, know how to make your server parse a PHP file, know how to navigate a filesystem, and know how to use basic HTML.

    The code below can be stuck wherever you want. However, this script is for PHP 5 only. As far as I know, the use of str_ireplace() and scandir() keeps it from running under PHP 4.
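
For readers stuck on PHP 4, a rough stand-in for scandir() can be put together from opendir() and readdir(), which predate it. This is only a sketch: the name listDirectory() is made up for illustration, and it does nothing about str_ireplace() or stripos(), which the script also uses and which are likewise PHP 5 functions.

```php
<?php
// Hypothetical stand-in for scandir(): returns the entries of $path
// (including '.' and '..') sorted as strings, or an empty array if
// $path is not a directory that can be opened.
function listDirectory($path) {
    $entries = array();
    if (!is_dir($path)) {
        return $entries;
    }
    $handle = opendir($path);
    if ($handle === false) {
        return $entries;
    }
    // readdir() returns false once there are no more entries
    while (($entry = readdir($handle)) !== false) {
        $entries[] = $entry;
    }
    closedir($handle);
    sort($entries, SORT_STRING);
    return $entries;
}
```

Swapping this in for the scandir() call would be the first step only; the highlighting and clipping code would still need PHP 4 replacements of its own.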

    Generally, just follow the comments to set up your search engine. The script needs to know the base directory you will be searching from, plus some other nitty-gritty details.

    Note that this is only going to search your filesystem, collect results, weight them, and then order them. The directories it searches can very well be dynamic (like you'd search different forums in HTMLforums), which would allow for specific searches. You'll gain the most if you know some PHP. The rest of your page needs to be styled and written yourself.

    How does the script search your files? The first thing to know is that ALL readable files are opened and searched. All HTML and PHP tags are stripped, and then the files are searched (this does not edit your actual files, do not worry). Your sensitive PHP code will not even show up, and files that sit entirely within PHP delimiters will not appear in the results at all. So, for the most part, you can assume that only the text that would typically be output to the browser will be searched and shown as results.
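
As a small illustration of the stripping step, here is what strip_tags() followed by trim() does to a made-up file body that mixes a PHP block with ordinary markup. The sample string and variable names are assumptions for the example only.

```php
<?php
// Hypothetical file contents: a PHP block followed by ordinary markup.
$raw = "<?php \$password = 'hunter2'; ?>\n<p>Welcome to <em>my</em> site</p>";

// strip_tags() drops both the PHP block and the HTML tags;
// trim() removes the leftover whitespace at the ends.
$searchable = trim(strip_tags($raw));

echo $searchable, "\n"; // prints: Welcome to my site
```

Only the tags themselves are removed, so text sitting between <style> or <script> tags is kept and remains searchable.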

    The search engine just takes all the words entered by the user and searches every file for those words. There is no word filter, so words like 'a', 'the', and 'it' will return a lot of results, but they will not be relevant. This isn't Google either: no quoted phrases, no words that results cannot contain, etc... you get the idea. Just words and search. Therefore, without editing anything, I would just advise users to use relevant wording. Misspelt words are not corrected and suffixes and prefixes are not added; we are talking bare-bones: if THIS WORD is equal to THAT WORD, then it shows up as a result. The only thing that is ignored is case. But this is simply a lightweight search engine that could fit into any ordinary site.
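
The matching rule can be tried on its own. The sketch below assumes the same kind of pattern the script uses, a word with a literal space on each side, matched case-insensitively; countWordMatches() is a hypothetical helper name, not part of the script.

```php
<?php
// Count how often $word appears in $text as a space-delimited word,
// ignoring case. countWordMatches() is a hypothetical helper name.
function countWordMatches($word, $text) {
    // preg_quote() escapes regex metacharacters and the "/" delimiter
    $pattern = '/ (' . preg_quote($word, '/') . ') /i';
    return preg_match_all($pattern, $text, $matches);
}

$text = 'I like PHP and php a lot';
echo countWordMatches('php', $text), "\n"; // 2
echo countWordMatches('lot', $text), "\n"; // 0: no space after the last word
echo countWordMatches('ph', $text), "\n";  // 0: partial words do not match
```

Because a space is required on both sides, a word at the very start or end of a file, or one followed directly by punctuation, is not counted.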

    Need help? Questions? Feel free to post. Otherwise, enjoy!

    (not much of a tutorial eh?)
    PHP Code:
    <?php

    // Config
    $dir = 'bulldog'; // Base directory to run search in, DO NOT END WITH A SLASH
    $form = '<form action="?" method="get">
        Search: <input type="text" name="search" />
        <br /><input type="submit" value="Go!" />
        </form>';
    // Your HTML search form, WATCH THE QUOTES!

    $noresults = '<p><strong>No results found.</strong></p>'; // No results found message, will be followed by $form
    $cliplength = 150; // Number of characters displayed in search results
    $negativeoffset = 10; // Number of characters displayed before first match

    #### Config complete! Only edit below with caution!

    // Recursively read every readable, non-empty file under $path.
    // Returns an array of file path => file text with HTML/PHP tags stripped.
    function BuildSearch($path) {

        $list = scandir($path);
        $contents = array();

        $i = 0;
        while ($i < sizeof($list)) {

            if ($list[$i] != '.' && $list[$i] != '..') {
                if (is_file($path.'/'.$list[$i])) {
                    $fs = filesize($path.'/'.$list[$i]);
                    if ($fs && is_readable($path.'/'.$list[$i])) {
                        $fh = fopen($path.'/'.$list[$i], 'r');
                        $read = fread($fh, $fs);
                        fclose($fh);
                        $strip = trim(strip_tags($read));
                        if (strlen($strip) > 0) {
                            $contents[$path.'/'.$list[$i]] = $strip;
                        }
                    }
                }

                elseif (is_dir($path.'/'.$list[$i])) {
                    $return = BuildSearch($path.'/'.$list[$i]);
                    if (!empty($return)) {
                        $contents = array_merge($contents, $return);
                    }
                }

            }

            $i++;

        }
        return $contents;

    }

    // uasort() callback: highest weight first
    function sortByWeight($a, $b)
    {
        if ($a['weight'] == $b['weight']) {
            return 0;
        }

        return ($a['weight'] < $b['weight']) ? 1 : -1;
    }

    if (!empty($_GET['search']) && is_string($_GET['search'])) {
        $items = BuildSearch($dir);
        $found = array();

        $term = explode(' ', $_GET['search']);
        $i = 0;
        while ($i < sizeof($term)) {
            // Skip the empty strings that repeated spaces leave behind
            if ($term[$i] === '') {
                $i++;
                continue;
            }

            // Escape regex metacharacters (and the "/" delimiter) for the pattern only
            $pattern = '/ ('.preg_quote($term[$i], '/').') /i';
            // Escape HTML so the highlighted term cannot inject markup
            $bold = ' <strong>'.htmlspecialchars($term[$i]).'</strong> ';

            foreach ($items as $index => $value) {
                if ($int = preg_match_all($pattern, $value, $matches)) {
                    if (isset($found[$index])) {
                        $found[$index]['weight'] = $found[$index]['weight'] + $int;
                        $found[$index]['clip'] = str_ireplace(' '.$term[$i].' ', $bold, $found[$index]['clip']);
                    }
                    else {
                        // Start the clip a little before the first match
                        $pos = stripos($value, ' '.$term[$i].' ');
                        if ($pos > $negativeoffset) $pos = $pos - $negativeoffset;
                        $found[$index]['weight'] = $int;
                        $found[$index]['clip'] = '"'.str_ireplace(' '.$term[$i].' ', $bold, substr($value, $pos, $cliplength)).'"';
                    }
                }
            }
            $i++;
        }

        if (!empty($found)) {
            uasort($found, 'sortByWeight');
            foreach ($found as $fname => $file) {
                echo '<p><strong><a href="'.$fname.'">'.basename($fname).'</a></strong>';
                foreach ($file as $key => $value) {
                    if ($key == 'weight') {
                        echo '<br />Word matches: '.$value;
                    }
                    elseif ($key == 'clip') {
                        echo '<br />'.$value;
                    }
                }
                echo '</p>';
            }

        }
        else {
            echo $noresults.$form;
        }
    }
    else {
        echo $form;
    }

    ?>
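
To see what the uasort() call does with the sortByWeight() callback, here is a small standalone run. The file names and weights are made up for the example; the callback mirrors the one in the script.

```php
<?php
// Same comparison as the script's callback: higher weight sorts first.
function sortByWeight($a, $b) {
    if ($a['weight'] == $b['weight']) {
        return 0;
    }
    return ($a['weight'] < $b['weight']) ? 1 : -1;
}

// Hypothetical results: file path => match data
$found = array(
    'site/about.html'   => array('weight' => 1),
    'site/index.html'   => array('weight' => 5),
    'site/contact.html' => array('weight' => 3),
);

// uasort() reorders the array but keeps the file-path keys attached
uasort($found, 'sortByWeight');

print_r(array_keys($found)); // index.html, contact.html, about.html
```

uasort() is used rather than usort() because the keys are the file paths, and usort() would replace them with 0, 1, 2 and so on.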

  • #2
    erisco,
    To implement this into a site, would you have to use something like this? Or am I way off base?

    HTML Code:
    <form method="get" action="search.php">
    <input type="text" name="search" size="31" maxlength="200" value="" />
    <input type="submit" value="Search Site" />
    </form>
    thanks for your help,

    ascskate


    Off Topic:

    700 POST!



    • #3
      You could, sure. But otherwise just copy+paste the code to wherever you want the search results to appear.



      • #4
        So I would just copy the PHP code into, for example, testpage.php, and the form that you have in the PHP would show up? And then once the user ran a search, the results would appear on the same page? I'm not exactly a PHP expert.

        Edit:
        I got it. You can see it here: www.ascskate.ueuo.com/searchpage.php. Thanks for this great tut, erisco. How would I incorporate this into a regular page, like have the search box in a div or whatnot and then have the results appear in the same div?

        I really need to learn PHP. I've been meaning to for so long, but I just never have.



        • #5
          Glad it worked for you. If you want a custom form, you can simply use something like you posted above. If you do not want the form to appear defined in the PHP source, simply delete the HTML from the $form variable (make it blank).

          If your page was like this:
          HTML Code:
          <html>
          <head>
          <title>Ascskate</title>
          </head>
          <body>
          <h1>Search my site</h1>
          <form method="get" action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>">
          <input type="text" name="search" size="31" maxlength="200" value="" />
          <input type="submit" value="Search Site" />
          </form>
          <!-- insert PHP code here -->
          </body>
          </html>
          The PHP would just be inserted right into that, and that is where the results would be displayed. Does that make sense?



          • #6
            It made perfect sense. thanks Erisco!

            If it's not too much to ask, is there a way to search only the text in the <body>? As of now, it returns results from the CSS. For example, if you search for design, it returns the <title> and shows some of the CSS in the pages.

            And one more thing, do you have any books or sites you recommend I should check out to learn PHP?

            Thanks again,


            Edit:
            I found another small issue with your search engine. When you search for photoshop on my site, the results include image files, showing the name of the program each image was created in along with the weird-looking characters that I'm guessing make up the images. You can see it here: http://ascskate.ueuo.com/searchpage....arch=photoshop That could be a problem. As you said above, it is just a bare-bones search engine, but I think that at least filtering that out and searching only the body would be a worthwhile modification to your code.

            Tyler
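
One possible way to narrow the search to visible body text is to cut the page down before strip_tags() runs. The function below is a sketch under that assumption, not part of the posted script; visibleText() is a hypothetical name, and regular expressions are only a rough way to pick HTML apart.

```php
<?php
// Hypothetical helper: reduce an HTML page to the text a visitor would
// read. Keeps only the <body> contents (when a body is present), drops
// <style> and <script> elements with their contents, then strips tags.
function visibleText($html) {
    if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $m)) {
        $html = $m[1];
    }
    $cleaned = preg_replace('/<(style|script)\b[^>]*>.*?<\/\1>/is', '', $html);
    if ($cleaned !== null) { // preg_replace() returns null on failure
        $html = $cleaned;
    }
    return trim(strip_tags($html));
}
```

In the script, this would take the place of the trim(strip_tags($read)) step inside BuildSearch(). It does not help with image files, which contain no markup to filter on.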



            • #7
              Meep. Option one is to chown or chmod your images so that PHP cannot read them; the script is already set up to ignore files it cannot read.

              Secondly, I could release a second version that allows file extension restrictions, so you could tell it not to return files ending with '.jpg' in your case. Through the exact same setup, you would be able to exclude directories as well. I think I will indeed add such a feature. Tomorrow.
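
Until such a version exists, an extension check could look roughly like the sketch below. It is not the second version mentioned above; isSearchable() and the $bannedExtensions list are hypothetical names chosen for the example.

```php
<?php
// Hypothetical filter: returns false for files whose extension
// (compared case-insensitively) is on the banned list.
function isSearchable($filename, $bannedExtensions) {
    // pathinfo() returns '' when the file has no extension
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return !in_array($ext, $bannedExtensions);
}

$bannedExtensions = array('jpg', 'png', 'gif');

var_dump(isSearchable('bulldog/index.html', $bannedExtensions)); // bool(true)
var_dump(isSearchable('bulldog/photo.JPG', $bannedExtensions));  // bool(false)
```

A check like this would sit next to the is_file() test in BuildSearch(), so banned files are skipped before they are ever opened.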



              • #8
                Wow, very nice erisco. Looking forward to the final version



                • #9
                  Sure to be out before the new year. Aside from setting banned extensions and directories... what other features are you guys looking for?



                  • #10
                    Paginated results maybe?



                    • #11
                      :S that is possible. Unfortunately, it will take the same processing power as "indexing" all of the results... as I need to know the results and their order before I start slicing the arrays. Still, I can make it an option.
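
The slicing step itself is short. The sketch below assumes the full, already sorted result array exists, which is exactly the cost described above; pageOfResults() and its parameters are hypothetical names.

```php
<?php
// Hypothetical helper: return one page of an already sorted result
// array. $page counts from 1; the file-path keys are preserved.
function pageOfResults($found, $page, $perPage) {
    $offset = ($page - 1) * $perPage;
    return array_slice($found, $offset, $perPage, true);
}

// Made-up results, already ordered by weight
$found = array(
    'a.html' => array('weight' => 9),
    'b.html' => array('weight' => 7),
    'c.html' => array('weight' => 4),
    'd.html' => array('weight' => 2),
    'e.html' => array('weight' => 1),
);

print_r(array_keys(pageOfResults($found, 2, 2))); // c.html, d.html
```

The page number would typically come from the query string next to the search term, cast to an integer and clamped to at least 1.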



                      • #12
                        Woohoo nice one fella. Was looking for an example of this function. I'll pick through it when I get some time.

                        Top man for sharing



                        • #13
                          Unfortunately, I liked your script, Erisco, but it is written for PHP 5 only and I am working with PHP 4. I basically need a script to search archived PDF files. Any suggestions would be greatly appreciated.



                          • #14
                            My script would only be capable of understanding text-based files. I do not believe a PDF would work very well, sorry. You could check php.net, they may have something for you.
