Talk:Naive Bayes classifier

Could someone please add an introduction that explains, comprehensibly to someone who is not a mathematician, what this thing is? --Elian 22:01 Sep 24, 2002 (UTC)

Just as soon as someone adds an introduction that explains, comprehensibly to someone who is a mathematician (well, technically physics and computing stuff; only just started school, but anyway...), what those symbols are supposed to mean...

If D is an object of type document and C is an object of type class, how can both p(D|C) and p(C|D) be meaningful (with two different values, even)? I can only guess whether "p(D and C)" is supposed to be something like boolean operations, set operations, surgical operations, or CIA operations... Cyp 19:42 Feb 10, 2003 (UTC)

I think the notation is clear to persons familiar with probability theory, but it could probably be explained more clearly for those who are not. Michael Hardy 19:44 Feb 10, 2003 (UTC)

Under the assumption that Probability axiom is right and meaningful, I've added a "(see Probability axiom)" and used that article's particular "and" symbol. There was an edit conflict: someone else LaTeXed the last two lines before I could submit the new "and" symbol... (That person had put the text "and"; I hope I was right to replace it with \cap, the ∩ symbol.) Cyp 20:12 Feb 10, 2003 (UTC)
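For readers unfamiliar with the notation: both conditional probabilities are built from the same joint probability p(D ∩ C) but are divided by different marginals, which is why p(D|C) and p(C|D) generally take different values. The standard definitions, and the Bayes' theorem that follows from them, written in the symbols used on this page:

```latex
p(C \mid D) = \frac{p(D \cap C)}{p(D)}, \qquad
p(D \mid C) = \frac{p(D \cap C)}{p(C)}
\quad\Longrightarrow\quad
p(C \mid D) = \frac{p(C)\, p(D \mid C)}{p(D)}
```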

Aaargh... Now that I know what the notation means... Either I'm going mad, or all the fractions in the entire article are upside down... Cyp 20:41 Feb 10, 2003 (UTC)

From the article:

Important: Either I'm going mad, or the following formula, along with the rest of the formulas, is upside down (D/C instead of C/D)... If I weren't considering the possibility that I'm going mad, I would correct this article myself. (I triple-checked that I didn't accidentally reverse them myself when adding LaTeX.) If I'm mad, just remove this line. If I'm not, let me know, or correct it yourself (and remove this line anyway).

Fixed the upside-down equations. Please review my changes to make sure they are right.

Seems like what I'd have done... So I guess I wasn't mad, then. Cyp 17:20 Feb 11, 2003 (UTC)
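As a numerical sanity check on which way up the Bayes fraction goes, here is a minimal Python sketch. The spam/"offer" setting and every number in it are hypothetical, chosen only for illustration. The correctly oriented fraction gives a valid probability that differs from p(D|C); inverting it (one way a formula can end up upside down) gives a value above 1, which cannot be a probability.

```python
# Hypothetical numbers for illustration only (not from the article).
p_spam = 0.2              # prior p(C), class C = "spam"
p_word_given_spam = 0.5   # p(D|C): document D contains "offer", given spam
p_word_given_ham = 0.05   # p(D|not C)

def evidence(prior, lik, lik_other):
    """p(D) by the law of total probability over the two classes."""
    return prior * lik + (1 - prior) * lik_other

def posterior(prior, lik, lik_other):
    """Bayes' theorem: p(C|D) = p(C) p(D|C) / p(D)."""
    return prior * lik / evidence(prior, lik, lik_other)

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word_given_ham)
# The same fraction turned upside down: p(D) / (p(C) p(D|C)).
flipped = evidence(p_spam, p_word_given_spam, p_word_given_ham) / (p_spam * p_word_given_spam)

print(f"p(D|C) = {p_word_given_spam:.3f}, p(C|D) = {p_spam_given_word:.3f}")
# prints "p(D|C) = 0.500, p(C|D) = 0.714"
print(f"{flipped:.2f}")  # prints "1.40"
```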

Calling this page Naive Bayesian is extremely misleading. It is more generally known as a Naive Bayes classifier. For something to be Bayesian the parameters are treated as random variables. In the Naive Bayes Classifier this doesn't happen. I strongly suggest that the name is changed. Note that Google has 22,600 hits for "Naive Bayes" and 6,400 hits for "Naive Bayesian". A naive Bayesian is a Bayesian who is naive. Naive Bayes is a simple independence assumption. --Lawrennd 16:49 Sep 20, 2004
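To illustrate the "simple independence assumption": naive Bayes replaces p(D|C) with the product of per-word probabilities p(w_i|C), and since p(D) is the same for every class it can be dropped when picking the most probable class. A minimal Python sketch follows. The word table, priors, and the names `log_score` and `classify` are hypothetical. Working in log space avoids numerical underflow on long documents. Words missing from the table would raise a KeyError here, whereas real implementations smooth such cases.

```python
import math

# Hypothetical per-class word probabilities p(w|C) and priors p(C).
word_probs = {
    "spam": {"offer": 0.30, "meeting": 0.02, "free": 0.25},
    "ham":  {"offer": 0.03, "meeting": 0.20, "free": 0.05},
}
priors = {"spam": 0.2, "ham": 0.8}

def log_score(words, cls):
    """log p(C) + sum_i log p(w_i|C): under the independence assumption,
    p(D|C) is approximated by the product of the per-word probabilities."""
    return math.log(priors[cls]) + sum(math.log(word_probs[cls][w]) for w in words)

def classify(words):
    # p(D) is common to all classes, so the argmax ignores it.
    return max(priors, key=lambda cls: log_score(words, cls))

print(classify(["free", "offer"]))  # spam
print(classify(["meeting"]))        # ham
```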

For something to be Bayesian the parameters are treated as random variables. This is simply not so. "Bayesian" has a much broader meaning than treating parameters as random variables. I agree that "naive Bayes classifier" is more commonly used (and therefore it's a more appropriate title), but the current name is not "extremely misleading". Wile E. Heresiarch 21:39, 20 Sep 2004 (UTC)
I agree that the title is somewhat inappropriate. "Naive Bayes" is clearly the more common name, which is sufficient motivation for changing the title. However, the very first paragraph of the article in fact points out that NB classification does not require any Bayesian methods. While that discussion could be improved, it is not deficient to the point of being misleading. The term "Bayesian" is often vague and can refer to something as generic as automatically trained methods: cf. "Bayesian spam filtering", which is usually not Bayesian in the sense of treating parameters (but not hyperparameters) as random variables. --MarkSweep 18:49, 21 Sep 2004 (UTC)
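Concerning the point that the usual naive Bayes classifier does not treat its parameters as random variables: the parameters are typically plugged in as point estimates, namely relative frequencies (maximum-likelihood estimates) computed from training data. A minimal Python sketch follows. The function name `fit` and the toy training set are hypothetical. The smoothing of zero counts that most practical implementations add is deliberately omitted.

```python
from collections import Counter, defaultdict

def fit(docs):
    """docs: list of (list_of_words, class_label) pairs.
    Returns p(C) and p(w|C) as plain relative frequencies; no prior
    distribution is placed on these parameters."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for words, label in docs:
        word_counts[label].update(words)
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    likelihoods = {}
    for c, counts in word_counts.items():
        total = sum(counts.values())
        likelihoods[c] = {w: k / total for w, k in counts.items()}
    return priors, likelihoods

# Hypothetical toy training data.
training = [
    (["free", "offer"], "spam"),
    (["free", "free", "money"], "spam"),
    (["meeting", "today"], "ham"),
    (["lunch", "meeting"], "ham"),
]
priors, likelihoods = fit(training)
print(priors)                       # {'spam': 0.5, 'ham': 0.5}
print(likelihoods["spam"]["free"])  # 0.6
```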
I'd prefer "Naïve Bayesian classification" to the current title. Κσυπ Cyp   23:00, 21 Sep 2004 (UTC)

"Naïve Bayesian classification" moved to "Naive Bayes classifier"

Hello. I have reverted "naïve" to "naive" in the article text, as "naive" is the usual English spelling, and occurs more often than "naïve" in texts (papers, books, web pages, etc). I have also moved naïve Bayesian classification to naive Bayes classifier. For various combinations of terms I find the following:

  • "naive Bayes classifier" yields approx 11,000 Google hits
  • "naive Bayesian classifier" yields approx 5000 Google hits
  • "naïve Bayes classifier" yields approx 1000 Google hits
  • "naïve Bayesian classifier" yields approx 500 Google hits
  • "naive bayesian classification" -wikipedia -encyclopedia yields approx 500 Google hits
  • "naïve bayesian classification" -wikipedia -encyclopedia yields approx 150 Google hits

As this classifier is very common in computer-related texts, it is reasonable to suppose Google is a reliable indication of the currency of different variations of the name. Regards & happy editing, Wile E. Heresiarch 04:51, 27 Dec 2004 (UTC)

But "naïve" is proper English (with the umlaut), so wouldn't that "overrule" the "most common" phrase? WhisperToMe 05:28, 27 Dec 2004 (UTC)

For the benefit of other readers, I'll copy here some comments I put on user talk:WhisperToMe: (1) Re: standard English. I can't find any dictionaries or other sources which state that the correct spelling is "naïve". Every source I have found shows "naive" as the primary spelling, and shows "naïve" as an acceptable variation of "naive". It is clear that both spellings are acceptable. Naïve/naive isn't mentioned at Wikipedia:Manual of Style or American and British English differences. If you have some other sources I'd like to hear about it. (2) Agreed that the Google test only shows what's more common. However, since both spellings are acceptable and "naive" is more common, and much more common in a mathematics context, a crusade to change "naive" to "naïve" seems pointless at best. Wile E. Heresiarch 06:57, 27 Dec 2004 (UTC)
Searching for "naïve Bayesian classifier" and "naive Bayesian classifier" come up with exactly the same pages (6660 pages each, in the same order). Wouldn't it be best to use the spelling "naïve", since it's easier to read? Otherwise many people will be reading it as "knave Bayes classifier" and getting confused... It's not as bad as trying to use "resume" as a noun, but I think it's better to use an ï, since it becomes easier for some people to read. Does anyone have trouble reading "naïve" but no trouble reading "naive"..? (If so, we can be nice, and make it even easier for them to read, by writing "naıve".) Κσυπ Cyp   15:55, 27 Dec 2004 (UTC)
Searching for "naïve Bayesian classifier" and "naive Bayesian classifier" come up with exactly the same pages (6660 pages each, in the same order). – Could the reason be that Google naively treats "naïve" and "naive" as interchangeable? On what basis do you assert that "naive" is harder to read than "naïve"? How can "naive" be confused with "knave" by someone who's likely to understand the article? Sure, if a child or someone learning English is completely unfamiliar with the word "naive" they may think that it's pronounced like "knave", but then the real problem is that they don't know what "naive" means in the first place. If they decide to look it up in a dictionary, they would also learn about the correct pronunciation. I would say in the case where two forms like "naive" and "naïve" exist and are equally acceptable, it's better to use the form that is easier to type. If we only had "naïve" everywhere with no redirects, someone might try to search for "naive Bayes" (because that's easy to type for lots of people, whereas "naïve" is not, even on many types of European keyboards); they would be unable to find the article, and either give up or start a duplicate article. In the context of this article, I would say that "naive Bayes" is far more frequent than "naïve Bayes", but check for yourself: do a search for "naïve Bayes" on http://scholar.google.com/ and see how many occurrences of "naïve" you actually find. --MarkSweep 01:10, 28 Dec 2004 (UTC)
Ummm, yes, it could be because Google naïvely treats "ï" and "i" as interchangeable. (Searching for either finds the same pages, regardless of whether they use the easy-to-read or the easy-to-type version.) scholar.google.com does the same thing, except it doesn't display the diaeresis until I follow the links. (I clicked on a random link that it found, and it used the "ï", not the "i".) I assert that "naive" is harder to read than "naïve" because "naive" looks rather strange and distracting to me. It's obvious what it means after spending an extra second reading it, but why make people spend an extra second reading it to understand it? I would say that in the case where two forms like "naïve" and "naive" exist and are equally acceptable, it's better to use the form that is easier to read. We did not only have "naïve" everywhere with no redirects, and problems arising from not having any redirects will remain purely hypothetical. Κσυπ Cyp   04:59, 28 Dec 2004 (UTC)
Let's not make decisions based on a single random link. Furthermore, while I don't have any evidence that "naive" won't cause any additional confusion (except for people who don't know the concept in the first place), you don't seem to have any evidence that "naïve" is easier to read either. I would say the burden of proof is on you here: can you demonstrate empirically that "naive" actually causes confusion? Significant confusion? Utterly hopeless cannot-make-heads-or-tails-of-it confusion? The other issue is with instances of "naïve" that do not occur in an article title. Does the new Mediawiki search facility treat "naive" and "naïve" as equivalent? (I don't know.) I suspect (without proof) that "naive" is a more frequent search query than "naïve", since it's easier to type for just about anyone. Unless both terms are treated as equal, searching for "naive" will miss pages that only have "naïve" in them (on second thought, this is turning into an argument in favor of inconsistent spelling, using all variants of a relevant word in an article).
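Whether Google or the MediaWiki search actually does this is not established in this discussion. One common way a search index can treat "naïve" and "naive" as equivalent is to fold diacritics before matching: decompose each character (Unicode NFD) and drop the combining marks. A minimal Python sketch using the standard `unicodedata` module follows. The function name `fold_diacritics` is hypothetical.

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Decompose characters (NFD), e.g. 'ï' -> 'i' + combining diaeresis,
    then drop the combining marks so accented and plain spellings match."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_diacritics("naïve") == "naive")  # True
print(fold_diacritics("coördinate"))        # coordinate
```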
Empirically, I find "naïve" easier to read than "naive". I do not claim, and have not claimed, that it causes significant confusion, just that "naïve" is easier for me to read. If some people find both equally easy to read, and some find "naïve" easier to read, then it seems that "naïve" is easier to read on average. (As far as I can tell, no one has claimed that they find "naive" easier to read than "naïve".) The Mediawiki search seems to be disabled at the moment, although I would guess that it wouldn't treat them the same. I also suspect (also without proof) that "naive" is a more frequent search query than "naïve", since it's easier for many/most people to type. Since (hopefully) no one is going around deleting redirects between "naïve" and "naive", searching should find both spellings. (I certainly think that searching for "naive" should find the articles, as well as searching for "naïve"...) Κσυπ Cyp   17:40, 28 Dec 2004 (UTC)
I'm sorry, but "empirically, I find" just doesn't make sense: you're not stating an empirical observation, you're only stating your own opinion, which you are certainly entitled to. But since you have a stake in the outcome, you cannot count your own preferences in an empirical study. I could claim that I find "naive" easier to read (since it has fewer dots and looks more normal to me), but I would have to discount that as my own biased opinion, which isn't empirical evidence. Regarding the issue of full text search, I was referring to articles that do not have "naive" in the title and which can only be found by a full text search. However, my argument is not particularly good: by the same token, someone might search for "colour" and not find a relevant article that mentions "color" in the body text but not in the title. So all we have now in terms of arguments is (1) your opinion that "naïve" is easier to read; (2) the fact that in a non-random sample of 16 relevant publications (see below) "naïve" occurs in 2, but "naive" in 14; and (3) opinions from several editors that "naive Bayes" is more common. For all I know, the conjunction of these three propositions is not a contradiction: it could be the case that "naïve" is in fact easier to read (though I will remain skeptical) and that "naive" is more common (for which I believe there is sufficient empirical evidence). In that case, we still need to make a decision which form we should pick, and there is precedent for choosing the more common form. --MarkSweep 19:52, 28 Dec 2004 (UTC)
After looking up "empirically" on dictionary.com, I'm not sure that I was using the word correctly. I meant that, subjectively/personally, I find "naïve" a bit easier to read than "naive". I hadn't thought of full-text searches before. If you do actually find "naive" easier for you to read, not just easier to write, then I'm fine with it being left as "naive". (I got the impression that no one here actually found "naive" easier to read, just thought it should be used because of being easier to type or more common.) (wɛn wɪl piːpl ɑːfɪʃəliː swɪtʃ tuː juzɪŋ ʌ fənɛtɪk əlfəbɛt fɔː ɪŋgɫɪʃ..?) Κσυπ Cyp   02:00, 29 Dec 2004 (UTC)
Some data points: The first two references cited in the present article both use "naive", not "naïve" (I was unable to check the third reference). Russell and Norvig use "naive", not "naïve". Among the first ten results returned by scholar.google.com, 8 use "naive" and 2 use "naïve". Added later: Mitchell's Machine Learning textbook (ISBN 0070428077), Data Mining by Han and Kamber (ISBN 1558604898), and Data Mining by Witten and Frank (ISBN 1558605525) all use "naive" exclusively. Score: "naive" 14, "naïve" 2. --MarkSweep 19:52, 28 Dec 2004 (UTC)
Finally, the insidious slippery slope argument: would you be in favor of writing "coördinate" and "reëlect" as well? How about "reärmed", since that could easily be confused with "rear med"? --MarkSweep 07:06, 28 Dec 2004 (UTC)
I wouldn't support or oppose a diaeresis on those words. The "ö" in "coördinate" seems slightly more appropriate than the "ä" in "reärmed", although I'm not sure why. Possibly because reading the "ä" as an umlaut would make the pronunciation completely wrong. (I think that pronouncing "naïve" without a diaeresis would sound much worse than pronouncing the other three words without a diaeresis.) Κσυπ Cyp   17:40, 28 Dec 2004 (UTC)
User:Cyp, I can't tell what you're talking about. Searching for "naïve Bayesian classifier" and "naive Bayesian classifier" come up with exactly the same pages (6660 pages each, in the same order). Googling for the exact phrase (with quote marks, and with -wikipedia -encyclopedia) I get 5000 for "naive" [1] (http://www.google.com/search?hl=en&lr=&q=%22naive+Bayesian+classifier%22+-wikipedia+-encyclopedia&btnG=Search) and 500 for "naïve" [2] (http://www.google.com/search?hl=en&lr=&q=%22na%C3%AFve+Bayesian+classifier%22+-wikipedia+-encyclopedia&btnG=Search) as reported above. Without quote marks (and with -wikipedia -encyclopedia) I get about 21,000 for "naive" [3] (http://www.google.com/search?hl=en&lr=&q=naive+Bayesian+classifier+-wikipedia+-encyclopedia&btnG=Search) and 6000 for "naïve" [4] (http://www.google.com/search?hl=en&lr=&q=na%C3%AFve+Bayesian+classifier+-wikipedia+-encyclopedia&btnG=Search). So on what basis are you trying to claim "naïve Bayesian classifier" and "naive Bayesian classifier" are equally common? -- In any event, if you want to claim "naïve" is easier for some people to read, you're going to have to come up with some evidence for that; "naive" looks rather strange and distracting to me simply doesn't count. -- We did not only have "naïve" everywhere with no redirects, and problems arising from not having any redirects will remain purely hypothetical. -- I'm sorry, I simply don't understand what you're getting at here. Wile E. Heresiarch 05:27, 28 Dec 2004 (UTC)
When I search with Google, it treats "ï" and "i" as completely identical. I have no idea why it behaves differently when you search. Last time I checked, I was a person, so I have already come up with evidence that some people (at least one) find "naïve" easier to read than "naive". Κσυπ Cyp   17:40, 28 Dec 2004 (UTC)