
Re: Tags

To: Gregory Leblanc <GLeblanc@cu-portland.edu>, "ldp-discuss@lists.linuxdoc.org" <ldp-discuss@lists.linuxdoc.org>
Subject: Re: Tags
From: Joe Cooper <joe@swelltech.com>
Date: Mon, 12 Jun 2000 21:41:07 -0500
References: <A5F46F4ED18FD211ABEE00105AC6CF070109377D@email.cu-portland.edu>
Resent-date: Mon, 12 Jun 2000 22:37:27 -0400 (EDT)
Resent-from: ldp-discuss@lists.debian.org
Resent-message-id: <ia7eNB.A.Xa.r5ZR5@murphy>
Resent-sender: ldp-discuss-request@lists.debian.org
Sender: joebuck@mail.swelltech.com

You're my hero, Greg.  Something needs doing, and I'll be damned if you
didn't just jump right in there and do it.  Since I'm the one who
suggested somebody do it, I feel obliged to actually help.

Gregory Leblanc wrote:
> 
> > > We need a subset, and
> > > the template marks out that subset pretty well.
> >
> We've got most of this already, sort of.  I think that there should be three
> lists, as you propose, although I think that your second list shouldn't ever
> be clearly defined.  The first list should be all of the tags that we use to
> search/present better tailored content.  The third list (yeah, I was a math
> major :) should be all of the tags that the LDP considers "depreciated", and
> that we'd prefer that nobody used.  The second list should be just a list of
> all of the tags in DocBook, minus our other two lists.  Since we don't have
> anything to define that first set, and we've got only a very short third
> set, we don't have much of a list right now.  Basically, we've got two sets
> until we have some viewer/searcher that speaks DocBook.

OK, what are the deprecated and to-be-deprecated tags?

<graphic> is the only one I know offhand.  Godoy and Mark know these
better than I do, for sure.

> sgmlnorm does two of three here.  The third can't be done because we don't
> have an "allowed subset", and have no reason to define one.  We can say that
> these tags are depreciated, but that takes about 8 tags out of use.  Hardly
> worthwhile, if you ask me.

I agree.  I figure if Gary wants it, he can write it.  I don't see the
point in ruling things out completely and being hard-assed about not
using them.  There should, however, be some agreed-upon standards (which
a script cannot check, because they are content-specific) for things
like indexing, FAQs, examples (code, command line, etc.), and figures.
Codifying these would be a good idea simply to make the search engine's
job easier in the future.

I will do this while I am working on my next two documentation projects
(I'll announce one of them here when I get started on it...the other is
internal to my company).  I'll submit the way I think things ought to
be, and we can discuss it in a week or two.  If someone has ideas about
it now, let them flow; I will be glad to work from others' ideas too.

> > I agree that it's a useful reference. My concern
> > was that it was being offered as a substitute for
> > the hard work of deciding on subsets.
> 
> Again, I don't think we have anything to define subsets, except to remove
> those "depreciated" tags.  Until we have some search engines that take
> advantage of the DocBook markup, there's no reason to define any more than
> "ok to use" and "don't use"

Yes and no.  If the search engine is designed for our task, then wouldn't
it be nice (and easier for the search engine) if we all agreed to mark
up our figures in the same way?  And our code examples?  And our
command line examples?  Most of these things are obvious from the
DocBook specification, really...but maybe some of it needs to be
codified.  I don't know.
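
To make that concrete, here is a rough Python sketch of what a search
tool gains when everyone marks up examples the same way.  It assumes the
document is well-formed XML (the SGML flavour would need a different
parser) and that figures, code, and command-line examples are tagged
<figure>, <programlisting>, and <screen>.  The mapping and the function
name index_examples are made up for illustration, not an agreed LDP
convention.

```python
import xml.etree.ElementTree as ET

# Assumed convention: which DocBook element marks which kind of example.
KINDS = {
    "figure": "figure",
    "programlisting": "code",
    "screen": "command line",
}

def index_examples(docbook_xml):
    """Return (kind, text) pairs for every example element, in document order."""
    root = ET.fromstring(docbook_xml)
    found = []
    for elem in root.iter():
        kind = KINDS.get(elem.tag)
        if kind:
            # Collect all text inside the element, including nested titles.
            text = "".join(elem.itertext()).strip()
            found.append((kind, text))
    return found
```

If authors use other tags for the same purpose, a lookup like this
misses them, which is the argument for codifying the markup.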

> > Yes. How about "Subsets". We need three -
> > Required, permitted, and searchable.
> 
> Hmm, that would make four, the way that I count.  However, they would
> definitely have some overlap.  Required would be the ones that you MUST
> have, in order to have a valid HOWTO document.  Permitted would be ones that
> are allowed in HOWTOs, but not required.  Searchable would be some from both
> sets, although not necessarily all of either set.  These would be the ones
> that our search engine/viewer understands.  The last set would be restricted
> tags, which would basically be any tags that we don't want people to use.

OK.  Four sets it is.  And then we work on making the 'searchable' set
match the union of the required and permitted sets?  There's no reason
not to keep improving the search engine until it can provide complete
indexing of the entire LDP.
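
As a rough illustration of the four sets, here is a small Python sketch
that checks a document's tags against them.  The set contents are
placeholders I made up (only <graphic> has actually been named as
deprecated in this thread), and the regular expression is a crude scan
for opening tags, not a real SGML parser.

```python
import re

# Hypothetical set contents; the real lists have not been agreed on.
REQUIRED = {"article", "author", "sect1"}
PERMITTED = {"para", "figure", "programlisting", "screen", "title"}
RESTRICTED = {"graphic"}
SEARCHABLE = {"author", "title", "figure", "programlisting"}

# Matches the name of an opening tag; closing tags and <!...> are skipped.
TAG_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)")

def tags_used(text):
    """Return the set of lower-cased tag names that appear in the text."""
    return {name.lower() for name in TAG_RE.findall(text)}

def check(text):
    """Compare the tags in a document against the four sets."""
    used = tags_used(text)
    return {
        "missing_required": sorted(REQUIRED - used),
        "restricted_used": sorted(used & RESTRICTED),
        "unclassified": sorted(used - REQUIRED - PERMITTED - RESTRICTED),
        # What the search engine still has to learn to cover everything.
        "not_yet_searchable": sorted((REQUIRED | PERMITTED) - SEARCHABLE),
    }
```

The "not_yet_searchable" entry is the gap between the searchable set and
the union of required and permitted; it shrinks as the search engine
improves.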

> I've put some minor thought into doing this, but it's a big enough project
> that I need to get back up to speed with programming first.

I know the feeling.  I can whip out a Perl script to do little
things...but a database-caliber query tool is a very daunting task.  I
keep hoping a Perl master will step up to the plate.  My $50 USD offer
still stands for an intelligent, context-sensitive DocBook search script
that works well for the LDP.  Others are welcome to add to that pot.  I
imagine if the pot gets big enough, someone will take the week or two
needed to make something happen.

> > We need required structure tags (like
> > <sect1>,<Article> etc.) required identification
> > tags (like <Author> and subsidiary tags), required
> > history tags (like <RevisionHistory> and
> > subsidiary tags), search tags (like keyword
> > lists), indexing tags (I'm not sure what they are,
> > but they should mark points in the text. Maybe
> > link tags.) Deprecated tags. Other tags that are
> > OK, but not special. Whichever of us gets to it
> > first.
> 
> Alrighty, I think I'll give that a shot this evening, in between Solaris
> installs.

I'll work on this as well, as best I can with my limited knowledge of
DocBook.  I'm working on two projects right now, so I'll start collecting
tags as I use them.  We can merge our thoughts in a week or two and see
what we come up with.

> > > > 3) Put together an
> > > > on-line thesaurus of keywords.
> > >
> > > Ok, I've seen a Glossary suggested, but no thesaurus
> > > suggestion so far.
> > > Why a thesaurus?
> >
> > A glossary would make a good howto. I suggested a
> > thesaurus because keywords can get out of hand. A
> > thesaurus would do two things: authors could avoid
> > new keywords if one already existed that met their
> > requirements. People doing searches could find out
> > which keywords were likely to hit their subject.
> 
> What kind of structure are you looking at for the thesaurus?   Is this for
> people to read, or for authors/maintainers to use in trying to make their
> document show up in searches more appropriately?

Why not a thesaurus that the search engine refers to?  Why make someone
look it up themselves?  Have a cross reference check on every
search...then put the results that match the expanded list (i.e. matched
terms gathered from the thesaurus) at the bottom of the results page.
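
A minimal Python sketch of that idea follows.  The thesaurus entries,
the document keywords, and the function names are all invented for
illustration; it only shows direct matches being listed first and
thesaurus-expanded matches after them.

```python
# Hypothetical thesaurus: each keyword maps to a set of related keywords.
THESAURUS = {
    "proxy": {"cache", "squid"},
    "cache": {"proxy"},
}

def expand(term):
    """Return the related keywords for a term, excluding the term itself."""
    term = term.lower()
    return THESAURUS.get(term, set()) - {term}

def search(term, docs):
    """docs maps a title to its set of keywords.

    Returns titles with direct matches first, then titles that match
    only through the thesaurus.
    """
    term = term.lower()
    direct = sorted(t for t, kws in docs.items() if term in kws)
    related = expand(term)
    expanded = sorted(
        t for t, kws in docs.items() if t not in direct and kws & related
    )
    return direct + expanded
```

Where the thesaurus entries come from is left open here.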

But how is it to be done?  Manually?  Harvested from tags in the
HOWTOs?  Sounds rather complicated.

Ok...Well I'll get back to this subject after I've actually done
something about it in a week or so.
                                 -- 
                    Joe Cooper <joe@swelltech.com>
                Affordable Web Caching Proxy Appliances
                       http://www.swelltech.com


