#1 |
|
That's Dr. Antar to you
Super Moderator
Join Date: Feb 2010
Posts: 2,051
DC Metro Area
|
I'll be brief.
I'd like to be able to upload each month's battle logs to some file hosting site for people to be able to pore over, either to double-check my findings or to look at new and interesting things. Compressed, we're talking about ~300MB per month. Megaupload can handle that. My concern, however, is privacy. It's one thing for trusted Smogon staff members to be allowed access to logs of every single battle by every single trainer who logs onto the server. It's another thing to give everyone the same right. It's hard for me to think of ways people would abuse the system--things might be different if there were no team preview--but I'm not very creative in thinking up these things. So? What do people think?
__________________
Codes and Hacks I Use PBR FC: 4898-8739-8815 (See here) Black FC: 4040 5386 0128 / White 2 FC: 4771 3664 7215 My Narrated PBR & Gen V Battles My Trade Thread Convert any sim team to pkms Pokemetrics: A Blog |
|
|
|
|
|
#2 |
|
maybe I just misunderstood
Join Date: Aug 2007
Posts: 3,695
|
I think something along these lines was brought up before. There are some pretty big advantages for people wanting to use the data for other fun things, and ways to mostly solve the privacy issues. A combination of anonymizing usernames to make it awkward to track down a specific foe's teams (perhaps giving a rating score rather than a username, to aid people wanting to make ranked stats) and delaying release of each month's stats for a few weeks to allow team turnover should almost entirely clear those worries. I'm in favor, so long as it's not going to be practical for someone to code something that mines the data easily and picks out or predicts exactly the foe's team based on their username and the Pokemon revealed mid-battle. Looking forward to the stats that can come out of it.
__________________
For people who like storing things: The Box Reading and LC? LCF, LC Guide, LC Analyses Good channels: #littlecup, #C&C, #1v1, others And for SCMS editors: SCMS group |
|
|
|
|
|
#3 | |||
|
That's Dr. Antar to you
Super Moderator
Join Date: Feb 2010
Posts: 2,051
DC Metro Area
|
Quote:
Quote:
PMmed to me: Quote:
Keep in mind that, unless otherwise specified, spectators are allowed to watch any battle and essentially record their own log. If the concern is access to full moveset data, those logs are now being generated, but they're stored separately, and I can keep those private.
|||
|
|
|
|
|
#4 |
|
Fast-moving, smart, sexy and alarming.
Join Date: Aug 2005
Posts: 5,152
|
If we do this, we should probably remove all commentary from the logs. People probably assume that if it's just two people talking in a battle, the conversation will remain private between those two people.
__________________
Previously obi. Technical Machine, a Pokemon AI. "Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat." - Sun Tzu |
|
|
|
|
|
#5 | |||
|
maybe I just misunderstood
Join Date: Aug 2007
Posts: 3,695
|
Quote:
Quote:
Quote:
And agreeing with david stone.
|||
|
|
|
|
|
#6 | |
|
That's Dr. Antar to you
Super Moderator
Join Date: Feb 2010
Posts: 2,051
DC Metro Area
|
Quote:
As with anonymizing, removing chats from logs will likely involve making significant changes to the PO source. Actually, I could probably come up with an anonymizing script that works on raw battle logs, but it'll be very processor- and time-consuming.
|
|
|
|
|
|
#7 |
|
Fast-moving, smart, sexy and alarming.
Join Date: Aug 2005
Posts: 5,152
|
It is available to spectators, yes, but it's one thing to have to join all battles and log it yourself (with all people in the battle knowing that there are spectators there and who those spectators are). It's quite another thing to give that information to everyone.
|
|
|
|
|
#8 |
|
That's Dr. Antar to you
Super Moderator
Join Date: Feb 2010
Posts: 2,051
DC Metro Area
|
Oh hey, I never replied to this.
Okay, based on the feedback I'm getting, I won't release the logs until I've come up with an "anonymizing" script. It's gonna be obnoxious, but it's definitely doable. Don't expect anything soon, though.
|
|
|
|
|
#9 |
|
Fast-moving, smart, sexy and alarming.
Join Date: Aug 2005
Posts: 5,152
|
Do you have an idea for how you'll anonymize, and what degree of anonymity you are trying to get?
As an example of the kind of pitfalls anonymization can have: if you give each user a unique ID, this allows looking at a player's overall performance, which could theoretically be useful (but I don't imagine it being that useful). However, it also means that if you can identify a user in any battle, you can identify all of that user's battles.
The unique ID solution also has the problem of generating that ID. You can't do anything like a hash of the user name (as I could then just hash the name of any user that I want to investigate). It seems like the best solution to this is probably to generate a long, random string that you use per data set (or per user, but that's not essential with solid encryption like AES), and have the user ID be the encryption of the user name with that long, random string as the key. You could then discard this randomly generated key and there would be no way to find out who any user is (with as high a level of certainty as you can possibly get), and there is no risk of collision (because encrypted results are guaranteed to be unique, unlike a hash).
What I would recommend for maximum anonymity is also the easiest to implement: "Player1" and "Player2". Ideally, each battle log would include both players' ratings at the start of the battle so we can generate weighted stats with it and also measure the performance of our rating system. Most rating systems (such as Elo and Glicko; I don't know what PO uses) allow you to calculate the odds that a particular player beats another player, given only their ratings. We could see how accurate this is, or if certain groups of players tend to be overrated or underrated.
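The rating check described above could be sketched roughly as follows. This assumes the standard Elo expected-score formula, which may not match whatever system PO actually uses. The function names and the `(rating1, rating2, player1_won)` tuple shape are hypothetical, chosen only for illustration.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def calibration_gap(battles) -> float:
    """Mean predicted win rate minus mean actual win rate for Player1.

    `battles` is an iterable of (rating1, rating2, player1_won) tuples,
    a hypothetical shape for what an anonymized log could provide.
    A result near zero suggests the ratings predict outcomes well.
    """
    battles = list(battles)
    if not battles:
        raise ValueError("no battles to evaluate")
    predicted = sum(elo_expected(r1, r2) for r1, r2, _ in battles) / len(battles)
    actual = sum(1.0 for _, _, won in battles if won) / len(battles)
    return predicted - actual
```

Running `calibration_gap` separately on subsets of battles (for example, only battles between high-rated players) would be one way to look for groups that are systematically overrated or underrated.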
|
|
|
|
|
#10 | |||||
|
That's Dr. Antar to you
Super Moderator
Join Date: Feb 2010
Posts: 2,051
DC Metro Area
|
Quote:
Quote:
Quote:
Quote:
Quote:
Bottom line, though: is it really worth going to all this trouble to protect anonymity? Seriously, I'm asking. Because I see only a limited amount of mischief anyone could accomplish by analyzing all battles done by a certain player a month in the past, and if you make the process more difficult by introducing a unique ID, I can't imagine anyone going through the trouble. But, then again, I'm the trusting sort.
|||||
|
|
|
|
|
#11 |
|
Don't tell me what to do.
Super Moderator
Join Date: Aug 2007
Posts: 3,374
|
Doesn't obi's suggestion to just change every name to player 1 and player 2 (arbitrarily for each battle) and then remove comments remove any issues of anonymity we might have?
Btw, I definitely do think it is important to do as much as we can to protect player anonymity, because a large part of winning in fourth- and fifth-gen singles is stylistic surprise.
__________________
beast mode |
|
|
|
|
|
#12 | ||
|
That's Dr. Antar to you
Super Moderator
Join Date: Feb 2010
Posts: 2,051
DC Metro Area
|
Quote:
Quote:
||
|
|
|
|
|
#13 |
|
Hmmm... A name for the plan...
Join Date: Aug 2007
Posts: 6,946
Sea Forest
|
Why don't we only make battles from the ladder public? Then we can skip any issues with tournaments. We could just make the logs public 2 weeks after they're collected. For the most part, at that point, people will have changed their teams or their teams will have lost edge anyway.
|
|
|
|
|
|
#14 |
|
Fast-moving, smart, sexy and alarming.
Join Date: Aug 2005
Posts: 5,152
|
Well, that's where my suggestion of using AES comes in. You can encrypt the username and include that at the top of the battle in the form of Player1 = Aerw9sfdjlk45, Player2 = CXx09j345nsd. Then if you want to get all of the stats about a particular player, you can search for all battles in which that encrypted unique ID is present. But as I said, this has the drawback of creating a unique token for a player that allows all of their battles to be tracked. Without that token, you cannot learn very much more than you already know.
Pokemon Online does give a player ID, but every time you reconnect, you get the "next" player ID available (and when the server restarts, the player IDs go back to 1). This does not provide monthly tracking, unless there is yet another player ID hidden somewhere.
I'm not saying that we need to go one way or the other. I'm just laying out the way to do it with maximum anonymity (only Player1 and Player2) and the way to maximize anonymity while still allowing monthly tracking (the AES-encrypted user name, with a randomly generated token used as the key, where that key is not saved or given out). If we add player ratings to the logs, then we won't even need to track who is who to determine the '1337 stats', because the rating would be built right into the log.
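A minimal sketch of the per-data-set token idea, with one substitution stated plainly: it uses a keyed hash (HMAC-SHA256) instead of AES, because the Python standard library has no AES. Unlike encryption, a truncated keyed hash is not strictly collision-free, although collisions are very unlikely at this length. The class name and the sample usernames are made up for illustration.

```python
import hashlib
import hmac
import secrets


class Pseudonymizer:
    """Maps usernames to stable tokens within one data set.

    The key is generated randomly and is never stored or exposed, so once
    this object is discarded the mapping cannot be recomputed from a username.
    """

    def __init__(self, token_hex_chars: int = 16):
        self._key = secrets.token_bytes(32)  # random per-data-set key
        self._length = token_hex_chars

    def token(self, username: str) -> str:
        """Return the token for `username`; same name, same token."""
        digest = hmac.new(self._key, username.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[: self._length]


if __name__ == "__main__":
    november = Pseudonymizer()
    # The same player gets the same token throughout the month's logs.
    print(november.token("ExampleUser") == november.token("ExampleUser"))
    # A fresh key for the next month breaks cross-month linkage.
    december = Pseudonymizer()
    print(november.token("ExampleUser") == december.token("ExampleUser"))
```

As the post points out, any stable token still lets all of one player's battles within the data set be linked together; the random key only prevents someone from computing the token for a username they want to investigate.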
|
|
|
|
|
#15 |
|
goes to eleven
Join Date: Jul 2006
Posts: 3,570
Sitting on the edge of time
|
This is excellent. I was actually directed to ask you about getting exactly this data, without knowing you had already proposed uploading it!
Basically, I am attempting to create a Pokemon-playing robot that doesn't really have any intelligence but just uses data like this to make decisions. This is exactly what I need. No source code modification is necessary; all the anonymization is just a sed script away.
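To make the "Player1 and Player2" approach concrete, here is a rough sketch in Python rather than sed. The log layout it assumes (plain text, with chat lines of the form `Name: message`) is invented for illustration and is not the actual PO log format; the function name is hypothetical too. It does not touch Pokemon nicknames or spectator chat, and a trainer name that happens to appear inside another word would also be replaced.

```python
import re


def anonymize(log_text: str, player1: str, player2: str) -> str:
    """Drop the players' chat lines and replace their names with Player1/Player2.

    Assumes a made-up plain-text layout in which a chat line starts with
    "<name>: ". Real PO logs may look different.
    """
    names = {player1: "Player1", player2: "Player2"}
    # Longest name first, so a name that is a prefix of the other cannot match early.
    pattern = re.compile(
        "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    )
    chat_prefixes = tuple(f"{name}: " for name in names)

    kept = []
    for line in log_text.splitlines():
        if line.startswith(chat_prefixes):
            continue  # chat between the two players is removed entirely
        kept.append(pattern.sub(lambda m: names[m.group(0)], line))
    return "\n".join(kept)
```

Assigning Player1 and Player2 independently for each battle, as suggested earlier in the thread, means nothing in the output links one battle to another.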
__________________
This signature had to be removed :-( |
|
|
|
|
|
#16 | |||
|
That's Dr. Antar to you
Super Moderator
Join Date: Feb 2010
Posts: 2,051
DC Metro Area
|
Quote:
Quote:
Quote:
Thanks to your post I'll prioritize trying to figure out the anonymization stuff. I think I'm just going to make a ruling here and now: Fully anonymized battle logs ("Player 1 vs. Player 2") will be available one month after I post the usage stats (so, in other words, November's logs will be available January 1). I will also make non-anonymized logs available individually, upon request, and only to people who have been vetted (probably only Smogon staff). Not sure what month I'll start with. It depends on how busy I am in the next two weeks.
|||
|
|
|
|
|
#17 |
|
That's Dr. Antar to you
Super Moderator
Join Date: Feb 2010
Posts: 2,051
DC Metro Area
|
As much as I wanted to keep my post count at 666, I knew I couldn't do it forever.
I've written an anonymizer script. It's gonna need some testing, but so far it seems to successfully remove trainer names from the battle logs (but not Pokemon nicknames). Everything I've written so far is up on my shiny new GitHub repo! Feel free to take a look and, if you have anything to add, feel free to contribute!
|
|
|
|
|
#18 |
|
Fast-moving, smart, sexy and alarming.
Join Date: Aug 2005
Posts: 5,152
|
You could also release a torrent of these logs to minimize server strain now that Megaupload doesn't exist.
|
|
|
|
|
#19 |
|
That's Dr. Antar to you
Super Moderator
Join Date: Feb 2010
Posts: 2,051
DC Metro Area
|
Then I have to seed the torrent.
Anonymized logs have continued to be in the back of my mind, but seeing as how no one's been bugging me for them, I haven't made it a high priority.
|
|
|
|
|
#20 | |
|
Don't tell me what to do.
Super Moderator
Join Date: Aug 2007
Posts: 3,374
|
Quote:
|
|
|
|
|
|
#21 |
|
Fast-moving, smart, sexy and alarming.
Join Date: Aug 2005
Posts: 5,152
|
I guess I should bug you for them.
I decided that it would probably be better if I had the ability to analyze the logs myself rather than having to bug you to write a script every time I have a new idea for Technical Machine.
|
|
|
|
|
#22 |
|
pronounced "Honko"
Join Date: Dec 2009
Posts: 1,023
|
I bugged you for them several months ago via PM. Consider yourself re-bugged!
|
|
|
|
|
#23 |
|
That's Dr. Antar to you
Super Moderator
Join Date: Feb 2010
Posts: 2,051
DC Metro Area
|
If you bugged me several months ago, that would've been right in the middle of the PO<->PS transition, which is why I didn't register it.
Okay, so in this new PS era, we have some different issues:
So what should I be doing to anonymize PS logs? I'm thinking,
I could also quite easily remove the moveset data as well, but I'd prefer to give people who want it access to that data. Thoughts?
Last edited by Antar; Oct 20th, 2012 at 9:51:22 PM.
|
|
|
|
|
#24 |
|
Fast-moving, smart, sexy and alarming.
Join Date: Aug 2005
Posts: 5,152
|
The move set data is one of the most important things for me to have, so that I can generate move set stats similar to team mate stats (but far more in-depth).
If you torrent, it wouldn't just be you uploading, though. I would upload to at least a ratio of 1.5. At the very least, people who were downloading it would also be uploading at the same time. I don't know what the average ratio would be for seeding (excluding you from the stats).
As for compression, you may want to look into 7zip compression. My understanding is that it compresses text better than most other compression algorithms. http://www.codinghorror.com/blog/200...-core-era.html
Removing chat should also reduce the size of the logs a bit.
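One way to check the compression claim on real logs is to compare codecs directly. This sketch uses Python's standard `gzip`, `bz2`, and `lzma` modules. LZMA is the algorithm family behind 7-Zip's default format, though Python's `lzma` module writes `.xz` streams rather than `.7z` archives. The sample text is invented, and results on actual battle logs may differ.

```python
import bz2
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes for each codec at its highest level."""
    return {
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bz2": len(bz2.compress(data, compresslevel=9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }


if __name__ == "__main__":
    # Invented stand-in for a battle log; substitute the bytes of a real log file.
    sample = ("Player1 sent out Pikachu!\nPlayer2 sent out Eevee!\n" * 5000).encode("utf-8")
    for name, size in compressed_sizes(sample).items():
        print(f"{name}: {size} bytes ({size / len(sample):.2%} of original)")
```

Running this on one month's logs before and after chat removal would also show how much stripping chat actually saves.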
|
|
|
|
|
#25 | |||
|
That's Dr. Antar to you
Super Moderator
Join Date: Feb 2010
Posts: 2,051
DC Metro Area
|
Quote:
Re: Torrenting
I get the sense that there are very few people who would actually be interested in getting these logs. In that case, I think the best way to distribute them is via p2p transfer (ssh, skype, ftp) on a case-by-case basis. Basically, you ask me for the logs, I send them to you, and you agree to send them on to anyone else who wants them.
Quote:
Quote:
|||
|
|
|