stupid fucking bot
-
“Optimizing for things people love” aka talking to you like an HR team-building seminar.
It’s frustrating (or maybe a good thing, given the tendency of some people to form weird pseudo-social relationships with LLMs) to see the evolution of ChatGPT’s language style.
Public ChatGPT only ever had the 3.5, 4, and 4o models, but you can play with earlier models like GPT-2 on Hugging Face. Those were far weirder, often robotic and stilted, but sometimes they mirrored natural colloquial English more closely depending on the input.
Rather than make something authentic and more natural to interact with, they went for the ultra-sanitized HR corporate-speak bullshit: completely bland and inoffensive, with constant encouragement and reinforcement to drive engagement, and it feels so inauthentic (unless you’re desperate for connection with anything, I guess). It’s mirrored in other models to some degree: DeepSeek, Llama, etc. (I don’t know about Grok, fuck going on Twitter).
3-5 years until it’s ruined by advertising, tops. If that
-
i see this all the time with software designed by americans. at an old job we used a tool called "officevibe" where you'd enter your current impression of your role and workplace once a month. you got some random questions to answer on a 10-point scale.
when we were presented with the results, the stats looked terrible because the scale was weighted so that everything below 7 counted as negative. we were all just answering 5 for "it's okay", 3-4 for "could use improvement", and 6-7 for "better than expected". there had never been a single 10 in the stats, and the software took that as "this place sucks".
like, of course you downvote a bad response. you're supposed to help the model get better, right?
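the weighting described above can be sketched like this. note the cutoffs are my assumption (they match the common NPS-style bucketing), not anything officevibe documents:

```python
# hypothetical sketch of "everything below 7 counts as negative"
# on a 1-10 scale; cutoffs assumed, in the spirit of NPS buckets
def bucket(score: int) -> str:
    if score <= 6:
        return "negative"
    if score <= 8:
        return "neutral"
    return "positive"

# honest mid-scale answers like the ones we were giving
answers = [5, 3, 4, 6, 5, 7, 6]
counts = {}
for a in answers:
    counts[bucket(a)] = counts.get(bucket(a), 0) + 1
print(counts)  # almost every honest answer lands in "negative"
```

with that bucketing, a team that consistently answers "it's okay" looks like a disaster in the dashboard.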
-
Recently I saw a survey that explicitly said 1-6 is "poor", 7-8 is "OK", and 9-10 is "great". Wild, not sure what the point of the scale is then.
Same with book ratings. Looking at StoryGraph, the average ratings I see are somewhere between 3.5 and 4.5, while I would rate a decent book a 3.
Born in Eastern Europe, live in the US, maybe that's why.
-
I wonder if it's like the grading system we use in school? Below 60% is an F for fail; 60% to below 70% is a D, which depending on the class can be barely passing or barely failing; 70% and above covers the A, B, and C grades, which are all usually passing, and an A in particular (90% and above) means doing extremely well. I just noticed that the rating scale kind of lines up with the typical American grading scale; maybe that's just a coincidence.
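The bands described above, as a quick sketch. The B and C cutoffs (80/70) are the common 10-point bands, but individual schools vary, so treat them as an assumption:

```python
# typical US letter-grade bands; 90/80/70/60 cutoffs are the
# common convention, not universal (schools vary)
def letter_grade(pct: float) -> str:
    if pct >= 90:
        return "A"
    if pct >= 80:
        return "B"
    if pct >= 70:
        return "C"
    if pct >= 60:
        return "D"
    return "F"

print(letter_grade(65))  # D: barely passing or barely failing, depending on the class
```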
-
most countries i know mark <50% as a failing grade
-
i was unaware most countries still use this terrible score system at all
-
i don't understand how people can find it appealing when computers speak like humans; i genuinely find HAL 9000 more appealing.
the ideal computer response style is how it works in star trek voyager