07-24-2008, 07:27 AM
#1
Lord (Level 16)
Join Date: Aug 2007
Location: Melbourne, Australia
Posts: 522
Creating Tag Clouds? The nuts and bolts...
Okay, tag clouds: how do we go from an article to a tag cloud?
Here's a great tag cloud generator, but I want to pull it apart and understand how it works. Has anyone got links or tutorials on how this is done?
http://www.tag-cloud.de/
Enter any website address, set the word count to about 6 and the size to about 100x100, then click the link at the bottom to receive the tag cloud.
__________________
Joey,
DRUNKhooligan - visit blog
07-24-2008, 08:39 AM
#2
Mod of the Underlay
Join Date: Jun 2002
Location: At a desk, hooked up and ready to rock
Posts: 17,317
that site seems to be a little broken - i'm clicking on the "recently created" links and nothing is happening...
now - your basic tag cloud stores the tags as keywords separate from the content of an article (or whatever your unit item is) - some sites let the public add tags - others restrict who can add them...
i will be describing a skeleton system - i am very tired - results may vary, and any code is under a negative warranty
now, for this basic model, all you need is a text input, and a rule that tags are separated by a specific character (comma is a good one - it allows two-word tags)...
then, in your backend, you explode/split the tag string, and now you have an array of tags for that article.
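A minimal sketch of that split step, written in Python rather than PHP. The function name `parse_tags` and the normalisation rules (trim, collapse whitespace, lowercase, drop duplicates) are assumptions for illustration, not something the thread specifies:

```python
def parse_tags(raw, sep=","):
    """Split a raw tag string into a clean, de-duplicated list of tags."""
    tags = []
    seen = set()
    for piece in raw.split(sep):
        # Collapse inner whitespace and lowercase so "Tag  Cloud" == "tag cloud".
        tag = " ".join(piece.split()).lower()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


print(parse_tags("PHP, tag cloud,  MySQL , php,,"))
```

This prints `['php', 'tag cloud', 'mysql']`: the two-word tag survives because only the comma separates tags, and the empty pieces and the repeated "php" are dropped.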
the database is really where all the action is at...
near as I can tell, the best thing is to have a table where the primary key is the article ID combined with the tag
easily done in MySQL:
Code:
CREATE TABLE tag (
article_id BIGINT UNSIGNED NOT NULL,
tag VARCHAR(255) NOT NULL,
PRIMARY KEY (article_id, tag)
);
note: i know that MyISAM allows FULLTEXT indexes, but a FULLTEXT index can't act as a primary key, so I specified VARCHAR(255) (the maximum allocation before MySQL 5.0.3)
so, take that array of tags for an article and just run through it with a REPLACE query for each tag:
Code:
REPLACE INTO tag SET article_id=$article_id, tag='$tag';
after sufficient processing and cleaning of input, obviously
MySQL REPLACE is great
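One way to handle the "cleaning of input" part is to bind the values as query parameters instead of pasting them into the SQL string. The sketch below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL, `INSERT OR REPLACE` as SQLite's closest analogue of MySQL's `REPLACE`, and a hypothetical helper called `save_tags`.

```python
import sqlite3


def save_tags(conn, article_id, tags):
    """Store each (article_id, tag) pair once, using bound parameters."""
    conn.executemany(
        "INSERT OR REPLACE INTO tag (article_id, tag) VALUES (?, ?)",
        [(article_id, t) for t in tags],
    )
    conn.commit()


# In-memory SQLite stands in for MySQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tag ("
    " article_id INTEGER NOT NULL,"
    " tag TEXT NOT NULL,"
    " PRIMARY KEY (article_id, tag))"
)
save_tags(conn, 1, ["php", "tag cloud"])
save_tags(conn, 1, ["php", "mysql"])  # "php" is replaced, not duplicated
rows = conn.execute(
    "SELECT tag FROM tag WHERE article_id = ? ORDER BY tag", (1,)
).fetchall()
print([r[0] for r in rows])
```

This prints `['mysql', 'php', 'tag cloud']`. Because the primary key covers both columns, saving the same tag twice for one article leaves a single row.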
Want the tags for a specific article?
Code:
SELECT tag FROM tag WHERE article_id=$article_id
Want a cloud?
Code:
SELECT tag, COUNT(*) AS uses FROM tag GROUP BY tag;
gets you a list of all tags and how often they're used - then some maths to turn the counts into relative font-sizes, and you have a nice looking tag cloud...
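The "maths" can be as simple as linear scaling between a smallest and a largest font size. This is a rough sketch only: the function name `font_sizes`, the 12px to 32px range, and the sample counts are all made up for illustration.

```python
def font_sizes(counts, min_px=12, max_px=32):
    """Map tag counts to font sizes by linear interpolation."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in counts.items():
        # When every tag has the same count, fall back to the smallest size.
        ratio = (n - lo) / span if span else 0.0
        sizes[tag] = round(min_px + ratio * (max_px - min_px))
    return sizes


counts = {"php": 10, "mysql": 5, "css": 1}  # made-up counts for illustration
print(font_sizes(counts))
```

This prints `{'php': 32, 'mysql': 21, 'css': 12}`. If a few tags dominate, scaling on the logarithm of the count instead of the raw count is a common variation that keeps the rare tags readable.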
changing, deleting, and adding are all just as easy with this db setup, really
possible enhancements
well, I haven't really covered presentation...
auto-tagging (from your content) - now that would be complex
tag hints - start tagging, and then suggest related tags that could also be used
07-24-2008, 07:43 PM
#3
Lord (Level 16)
Quote:
Originally Posted by Horus_Kol
Thanks Horus,
I was reading and reading and reading... got excited about the SQL... BUT! You finally tickled my funny bone!
Quote:
auto-tagging (from your content) - now that would be complex
YES! This is what I want... I already have a nice CMS in place, and I don't want to go ripping apart code and adding stuff into it, since the whole bloody thing is dynamic and would be a headache to debug. I want to be able to call a function, pass it all the content, and have it spit out a tag cloud. That would be nice.
I'll do some experimenting at lunch with your example and see what ideas I can conjure.
07-24-2008, 07:47 PM
#4
Mod of the Underlay
well, i've been thinking about the auto-tags in some backwater of my mind - but I haven't formed anything clear out of it yet...
i'll see what I come up with over the day, and post back later
07-24-2008, 10:35 PM
#5
Lord (Level 16)
I'm just going to brainstorm...
We really need something that... records a string temporarily.
Code:
$myString = "The big brown fox jumped over the fence, but the fox died on the mental gate";
Then we need to grab every word and store it with the number of times it's been used, so maybe a two-dimensional array of words and counts.
Code:
1 [ 'The' ] [ 4 ]
2 [ 'big' ] [ 1 ]
3 [ 'brown' ] [ 1 ]
4 [ 'fox' ] [ 2 ]
etc.
Loop through the string and, for each word, either add it to the array or bump its count if it's already there.
From that we can then create our tag cloud... but will it work? Or is there a better way? Storing in a file? A database? How will this go with a massive 1,000-word article? Do we want to store multi-word tags, like you said above, using comma delimiters?
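The word/count idea above can be sketched in a few lines. This is Python rather than PHP, and `word_counts` is a hypothetical name; it treats "The" and "the" as the same word, which matches the count of 4 in the table above.

```python
import re
from collections import Counter


def word_counts(text):
    """Count case-insensitive word occurrences in a string."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


text = ("The big brown fox jumped over the fence, "
        "but the fox died on the mental gate")
counts = word_counts(text)
print(counts.most_common(2))
```

This prints `[('the', 4), ('fox', 2)]`. A 1,000-word article is tiny for an in-memory count like this, so a file or database is only needed for keeping the results, not for doing the counting.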
07-25-2008, 12:39 AM
#6
Mod of the Underlay
well, we certainly want a stop word list for common words like "the", "and", etc...
but, how to pick keywords from the thing?
Maybe only select the most common words from the text (after removing the stop words).
Also, give extra weight to any words used in the title.
extra weight based on which paragraph a word appears in?
and for words in subsections and subtitles?
I'd say the storage would be the same with the linked table I created above, though.
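A rough sketch of the stop-word and title-weight ideas from this post. Everything specific here is an assumption for illustration: the tiny stop list, the weight of 3 for title words, and the names `tokenize` and `suggest_keywords`. Paragraph and subtitle weighting are left out.

```python
import re
from collections import Counter

# Tiny illustrative stop list; a real one would be much longer.
STOP_WORDS = {"the", "and", "a", "an", "of", "on", "in", "to", "but", "is", "over"}


def tokenize(text):
    """Lowercase the text and pull out the words."""
    return re.findall(r"[a-z']+", text.lower())


def suggest_keywords(title, body, limit=5, title_weight=3):
    """Rank words by frequency, skipping stop words and boosting title words."""
    scores = Counter()
    for word in tokenize(body):
        if word not in STOP_WORDS:
            scores[word] += 1
    for word in tokenize(title):
        if word not in STOP_WORDS:
            scores[word] += title_weight
    return [word for word, _ in scores.most_common(limit)]


print(suggest_keywords(
    "The fox and the gate",
    "The big brown fox jumped over the fence, but the fox died on the gate",
    limit=2,
))
```

This prints `['fox', 'gate']`: "fox" scores 2 from the body plus 3 from the title, and "gate" is lifted above the other single-use words by the title boost.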
07-25-2008, 03:45 AM
#7
Blonde Bimbo
Join Date: Jul 2004
Posts: 2,354
Something you may find interesting: wordle
¥
__________________
I may have opened the door, but you entered of your own free will
07-25-2008, 08:02 PM
#8
Mod of the Underlay
hmmm... interesting... but it highlights what I see as a deficiency with the automated tagging system...
there's a bunch of words that I just wouldn't normally tag (get, set, one) - and it doesn't do multiple words (such as "back injury" which would be more useful than just "back" since that word has multiple meanings) - and sometimes, a salient tag just isn't in the actual content (example, an item on back injury might not mention the word "health" anywhere, but it would be a useful tag to link it to similar articles).
07-26-2008, 02:10 AM
#9
Blonde Bimbo
There is mention on the FAQ page about using a tilde ( ~ ) to make multiple words/phrases ... obviously that's no good for producing tags for real pages ... it is just for making up "pretty pictures" of your text though, and not really a tag generator as such.
The blog software that I use has a tagging system, so I whipped together a tag cloud plugin ... ooops, nope, someone else whipped up the tag cloud plugin, I whipped up a search cloud plugin :p ... but the idea's the same.
I got fed up of trying to remember which tags I'd used previously, so I coded an auto-suggest tag plugin which works pretty well ... when I get chance I really need to add it to the core.
I agree that any automated tagging system is going to be flawed. About the best you can really get is a system that can tell you what keywords you've used on a page, and even that will have the same "single words only" limitations.
¥
07-26-2008, 04:02 AM
#10
Lord (Level 16)
Great stuff above.
I like the tag suggestion idea - I don't mind writing up an article and then having a script analyze the content and suggest, let's say, 10 tags that the user can correct.
What about double- and triple-word tags? Can that be done automatically?
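One possible starting point for multi-word candidates is to count adjacent word pairs (or triples) and ignore any that start or end with a stop word. This is only a sketch under assumptions: the stop list is illustrative, `phrase_counts` is a hypothetical name, and the sample text is invented. It will still suggest some junk phrases, so the user-correction step stays important.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "and", "a", "an", "of", "on", "in", "to", "is", "can"}  # illustrative


def phrase_counts(text, n=2):
    """Count n-word phrases that neither start nor end with a stop word."""
    counts = Counter()
    # Split on punctuation so phrases never span two sentences or clauses.
    for chunk in re.split(r"[.,;:!?]+", text.lower()):
        words = re.findall(r"[a-z']+", chunk)
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts


text = ("A back injury can ruin your week. "
        "Treat a back injury early, and rest the back.")
print(phrase_counts(text).most_common(1))
```

This prints `[('back injury', 2)]`. Passing `n=3` counts three-word phrases the same way.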
07-26-2008, 04:15 AM
#11
Blonde Bimbo
The plugin I coded works off previously used tags; it doesn't analyse your content in any way. If you want to have a play with it you can download it here ( AM Auto Tags V 1.0 ).
It uses javascript to ask the server for all tags beginning with whatever letter you've just typed ( it doesn't use ajax though, so it can make cross-domain requests ) and then "suggests" tags based on what you're typing in the tags field and what's been used in the past. Double/multi word tags are no problem.
Obviously the majority of the code/input elements and server side stuff etc are specific to our blog software, but it should be a doddle to convert to any other platform.
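For anyone converting the idea to another platform, the server-side lookup could be as small as the sketch below. It is not the plugin's actual code: `suggest_tags`, the limit of 10, and the sample tag history are assumptions for illustration.

```python
def suggest_tags(prefix, known_tags, limit=10):
    """Return previously used tags that start with what has been typed."""
    needle = prefix.strip().lower()
    if not needle:
        return []
    matches = [t for t in known_tags if t.lower().startswith(needle)]
    return sorted(matches, key=str.lower)[:limit]


used = ["back injury", "backup", "health", "blogging", "back pain"]  # made-up history
print(suggest_tags("ba", used))
```

This prints `['back injury', 'back pain', 'backup']`. Multi-word tags need no special handling because each stored tag is matched as one whole string.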
¥
07-26-2008, 10:10 PM
#12
Mod of the Underlay
okay, this is gonna come off as naive - but you can have JS/Server communication without using XHR?
07-27-2008, 02:55 AM
#13
Blonde Bimbo
Yep
Basically it works by appending a script tag to the <body> to make a request. The server then processes the request and replies with javascript.
Advantages :
Request goes over http, which all browsers understand
No need to do browser detection to work out which XMLHTTP connection you need to make.
Cross domain communication is no problem ( useful when you're running a multi-(sub)domain / blog system where the admin side can be on a different (sub)domain from the front end  )
The request is still asynchronous and you can detect success/failure
Disadvantages :
The reply must be javascript
You have to be aware of any security holes you may open
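This script-tag technique is commonly known as JSONP. The sketch below shows what the server side of such a reply might look like, with one of the security holes closed: the page usually tells the server which function to call, so that name has to be validated before it is echoed back. The names `script_reply` and `showTags`, and the rule that only plain identifiers are allowed, are assumptions for illustration.

```python
import json
import re

# Only plain identifiers are accepted as callback names (an assumed policy).
CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]{0,63}")


def script_reply(callback, data):
    """Build the JavaScript body a server could return to a <script> request."""
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError("unsafe callback name")
    # json.dumps escapes quotes; "</" is also neutralised so the payload
    # cannot close a surrounding <script> element if it is ever inlined.
    payload = json.dumps(data).replace("</", "<\\/")
    return f"{callback}({payload});"


print(script_reply("showTags", ["back injury", "backup"]))
```

This prints `showTags(["back injury", "backup"]);`. The browser runs that reply as soon as the appended script tag loads, so the page only needs a `showTags` function waiting to receive the data. Anything the reply contains runs with the page's privileges, which is why the page should only request scripts from servers it trusts.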
¥
07-27-2008, 04:20 AM
#14
Lord (Level 16)
Quote:
Originally Posted by ¥åßßå
That may be an issue: security.
Last month we had a fair few attempted security breaches, none of which actually got through, since we have plenty of client-side protection and a lot of server-side protection. Now that we are expanding the client base for our CMS software with new features and technologies like this auto-tagging, we need to ensure it's 110% secure.
I'll download the script at work and have a look, see if I can get a lightbulb in my head.
07-27-2008, 08:47 PM
#15
Mod of the Underlay
Quote:
Originally Posted by ¥åßßå
Basically it works by appending a script tag to the <body> to make a request. The server then processes the request and replies with javascript.
Still fuzzy on the mechanics here...
could you post a simple example (probably in a new thread)?