Log of the #duraspace-ff channel on chat.freenode.net

Using timezone: Eastern Standard Time
* eddies leaves04:16
* eddies joins04:36
* eddies leaves
* eddies joins
* fasseg joins05:17
* JasonDGI joins07:56
<fasseg>Hey Jason..08:17
<eddies>jasondgi: fasseg and i were just talking about you ;-)08:18
<JasonDGI>uh oh
<fasseg>Could I ask you to rerun the jmeter-databank tests on your Ubuntu VM to see if the perf decay is related to my installation?
<eddies>heh, nothing bad
actually, i'll let frank catch you up…i need to finish drafting an email
talk to you guys on scrum
<fasseg>theoretically it just means fetching the EDRM dataset, unzipping it into the "data" directory and customizing the variables on the JMeter TestPlan called DATA_DIRECTORY, DATABANK_HOST and DATABANK_PORT08:20
EDRM File Formats Data Set
<JasonDGI>this is the same link as on the readme? or something new?08:22
just a direct dl link
I'm also on skype if you run into trouble with the test plan..08:23
* eddies leaves08:24
<JasonDGI>does jmeter require x11?08:29
<fasseg>hmm this is where it falls down of course
...nah you can also run it from a shell...08:30
but can't you run JMeter from your host system against the Databank instance?
<JasonDGI>im thinking thats probably easier
<fasseg>think so too but just in case: http://blogs.amd.com/developer/2009/03/31/using-apache-jmeter-in-non-gui-mode/08:32
there's also a maven plugin08:33
<JasonDGI>which parts of the test plan do i have to modify? just the params near the top?08:43
<fasseg>When you click on the testplan "DatabankMadness" they appear on the right side08:44
DATA_DIRECTORY should point to the directory where the EDRM dataset is extracted into08:45
DATABANK_HOST and _PORT are the connection details for RDFDatabank
DATABANK_PREFIX is the context prefix for the apache web server, which is not needed on my installation since databank is in Apache's DocumentRoot08:46
but if e.g. databank would be accessible at localhost:80/db-rest/* you should set the prefix to "db-rest/"
the paths in the index file "files.txt" are relative to DATA_DIRECTORY but you should not need to modify this08:47
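How these variables fit together can be sketched in Python. This is only an illustration of the description above, not the test plan's actual logic: the function names are hypothetical, and the assumption is that the prefix is appended directly after host and port and that each entry in files.txt is joined onto DATA_DIRECTORY.

```python
import posixpath


def databank_url(host, port, prefix=""):
    """Build the base URL; prefix is the optional context path, e.g. "db-rest/"."""
    base = f"http://{host}:{port}/"
    # Drop a leading slash so "/db-rest/" and "db-rest/" behave the same.
    return base + prefix.lstrip("/")


def resolve_test_file(data_directory, relative_path):
    """Join an entry from files.txt onto DATA_DIRECTORY (assumed behaviour)."""
    return posixpath.join(data_directory, relative_path)
```

With an empty prefix the URL points at the server root, which matches the case where databank sits in Apache's DocumentRoot.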
* eddies joins
* eddies leaves
* eddies joins
<fasseg>It's a pity I can't find a way to comment these variables in JMeter, that would make it much easier to use08:49
<JasonDGI>that would be easier
i keep getting HTTPSamplerProxy errors when i try to open the test plan08:52
im running it on the databank vm and x forwarding the gui08:53
<fasseg>hmm never had these08:55
<JasonDGI>i updated the data_directory, databank_host and in the authmanager set the user/password08:56
<fasseg>and what's the HTTPSamplerProxy error? do you get a message?08:57
<JasonDGI>the gui says "Error in TestPlan - see log file"08:58
<fasseg>can you enable the log viewer in "options"?08:59
<JasonDGI>the log says http://pastebin.com/8bDB71tM
<fasseg>no idea what this means ;)
<JasonDGI>ill keep hunting
<fasseg>oh this seems to be related to the Unix/Windows file format09:01
good old CR/LF
what should the line endings be? im getting just CR09:04
<fasseg>you're on windows right?
then I'll just send you a converted jmx file09:05
<fasseg>oh hmm well
<JasonDGI>should i try converting to CRLF?
<fasseg>worth a try, i'm guessing from what i read at:
it's the same error it seems
and maybe try the same with files.txt ;)09:07
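A line-ending conversion like the one being discussed could be done with a few lines of Python. This is a minimal sketch, assuming the files are small enough to read into memory; the function names are made up for illustration, and which target ending JMeter actually needs here is not established in the conversation.

```python
def normalize_line_endings(data: bytes, newline: bytes = b"\n") -> bytes:
    """Rewrite CR, LF and CRLF line endings to a single chosen ending."""
    # Collapse CRLF first so the lone-CR pass does not convert it twice.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", newline)


def convert_file(path, newline=b"\n"):
    """Convert one file in place, e.g. the .jmx test plan or files.txt."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(normalize_line_endings(data, newline))
```

Passing `newline=b"\r\n"` would produce CRLF instead of LF.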
<JasonDGI>ok im running jmeter on my mac now and its getting past that error09:21
<JasonDGI>Add Items from files.txt -> Internal Server Error -- did you get that?09:27
<fasseg>yes sometimes, check your apache log, it's probably a FileNotFound error or sth.09:29
do you have a folder "data/EDRM_Data-Set_File-Formats_1-0-1/data-set" with lots of subfolders of file formats in your jmx folder?09:31
<JasonDGI>looks like
INFO:PersistentState:No JSON information could be read from the persistence file - could be empty: /silos/jmeter-test/pairtree_root/ds/-1/obj/__manifest.json
<fasseg>because the files.txt point to such a directory
<JasonDGI>should i worry about it?
<fasseg>errrm dunno, never seen this09:32
i only get those in my apache logs:
<JasonDGI>im guessing i should, as its consistent for me
<fasseg>[Mon Jan 14 13:16:23 2013] [error] /usr/lib/python2.7/site-packages/SQLAlchemy-0.7.6-py2.7-linux-x86_64.egg/sqlalchemy/engine/default.py:461: SAWarning: Unicode type received non-unicode bind param value.
that time i forgot to fire up the DB instance e..g
<JasonDGI>would it help if i unpacked the data onto the databank server as well as my machine?09:33
<fasseg>hm no normally JMeter sends the file in a multipart POST
in "Add item from files.txt" there's a section "Send Files With the request"09:34
where the Path from the files.txt is concatenated with the DATA_DIRECTORY variable to create the path to a test file
hmm is your databank instance reachable from the web? then i could try to run the tests from here ;)09:35
I guess there are some issues with running it on macox maybe?09:36
*oops macos
<JasonDGI>maybe i have a variable/value problem, im gonna check that first
<fasseg>Oh and check the path of the result file in the "View results in table" this atm points to a unix folder on my local machine ;)09:37
and in this log file you can find the response data from the HTTP request...
<JasonDGI>it looks like it got farther, after i re-entered the "send files with request", but it still eventually fails09:40
<fasseg>I pushed an update with a new variable for the result file path, that can be set as the others09:41
<JasonDGI>im wondering if my databank is not entirely stable
What's the error message?09:42
<JasonDGI>jmeter isnt giving me an error message, did you want the apache error?09:43
I got lots of python exceptions in there until i got it up and running myself, so maybe I already saw it...
<JasonDGI>same error09:44
INFO:PersistentState:No JSON information could be read from the persistence file - could be empty: /silos/jmeter-test/pairtree_root/ds/-1/obj/__manifest.json
<fasseg>hmm but that's just an INFO no ERROR, SEVERE or FATAL
<JasonDGI>its completing a different number of files each time
<fasseg>oi that's bad i guess...09:45
<JasonDGI>it looks like its momentarily causing databank to choke09:47
can i tell it to pause between files?
<fasseg>i think so...09:48
you can add a "Constant Timer" to the While controller in Jmeter and set a pause there
<cbeer>fasseg: i've tried to consolidate the modeshape/fedora/databank testing into https://github.com/futures/ff-jmeter-madness09:50
<fasseg>I haven't got modeshape installed yet, but I'll try it out later..09:52
and I think we should discuss a bit further about Databank before making a large commitment there..
so not sure if I should go ahead and add the Databank tests to the project already...09:53
<cbeer>databank tests are in there already
<fasseg>oh ok cool, that was quick...
<cbeer>i had time on my hands :)
<fasseg>not sure what to do now though....I guess I'll have a break before the Scrum call, but I'll stay in here...09:55
* barmintor joins10:03
cbeer: idk about the scanned books, but I can ask around10:04
are we on the call right now?
<cbeer>barmintor: k. maybe not an issue if we go with the digitalcorpora data10:06
<barmintor>fire alarm, back in a while
<fasseg>call is in an hour, right?10:07
run ben run!
<JasonDGI>@fasseg Error processing Javascript: [EDRM_Data-Set_File-Formats_1-0-1/data-set/rnd/Pushpins.rnd!=<EOF>10:08
<cbeer>yes, 52 minutes from now
<fasseg>oh that's a bug from my side, just ignore that...10:09
ill fix it soon10:10
<fasseg>hoy is your result graph looking?
hoy is your result graph looking?*how
jesus "*how" i tried to type
<JasonDGI>its only been able to get upto about 20 points so far, so not very useful
<cbeer>using fasseg's tests?10:14
<fasseg>hrmph, then you run into the javascript issue?10:15
Ill fix this right now, and let you know in a minute10:16
<JasonDGI>the error i posted comes up a lot and doesnt seem to cause any problems
<fasseg>heh that was easy just replace the __javascript thingie in the While Controller "Condition" with this: ${TEST_ITEM_PATH}
the thread is set to stop at EOF in the CSV Controller anyways so it wont run into an endless loop10:18
<JasonDGI>is that pushed?
but pulling will overwrite your variables again ;)10:20
<cbeer>fasseg: i was looking at ways to externalize those host/port/path configurations, and it seemed like it was more trouble than it was worth10:22
<fasseg>indeed...i think there's an easy way to override them though using something like -Djmeter.DATA_DIRECTORY=...10:23
let me check
<cbeer>ah, good idea. is that the __properties stuff?10:24
<fasseg>from the web:
I suggest you just define all of these as properties:
jmeter -Jserver.name=si -Jserver.name.port=9778 etc
and refer to them using as ${__P(server.name)} etc.
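The property approach quoted above could be driven from a small Python helper that assembles the JMeter command line. This is a sketch under assumptions: the helper name is hypothetical, the property names simply reuse the test plan's variable names, and the plan itself would have to read them with something like `${__P(DATABANK_HOST)}`. The `-n` (non-GUI), `-t` (test plan), `-l` (results file) and `-J` (define property) options are standard JMeter command-line flags.

```python
def jmeter_command(test_plan, properties, results_file=None):
    """Return an argument list that runs a test plan with -J property overrides."""
    cmd = ["jmeter", "-n", "-t", test_plan]
    if results_file:
        cmd += ["-l", results_file]
    # Sort so the generated command is stable between runs.
    for name, value in sorted(properties.items()):
        cmd.append(f"-J{name}={value}")
    return cmd
```

The returned list can be handed to `subprocess.run`, which keeps host, port and path settings out of the .jmx file so that pulling a new version of the plan does not overwrite them.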
<cbeer>cool. i think i wasn't very good about that in ff-jmeter-madness10:25
<fasseg>there's also the possibility to add a user.properties file for such variables if i didnt misread this: http://jmeter.apache.org/usermanual/get-started.html#configuring_jmeter10:27
<cbeer>" The file will be automatically loaded if it is found in the current directory or if it is found in the JMeter bin directory."10:28
<fasseg>so you can set a path to "user.properties" in jmeter.properties and this will be interpreted as a key value pair for user defined vars...10:29
but i never tried this...
even better...
Okay Ill update my test as well then Jason can keep his vars when checking out the jmx file ;)10:30
<JasonDGI>i can fix my params pretty quickly if they get reverted, dont worry about that10:32
<fasseg>btw @cbeer just stumbled over this gem: Note that variables cannot currently be nested; i.e ${Var${N}} does not work. The __V (variable) function (versions after 2.2) can be used to do this: ${__V(Var${N})}. In earlier JMeter versions one can use ${__BeanShell(vars.get("Var${N}")}.
they added a __V function to allow some nesting of vars
@jason: okay..10:33
<cbeer>i should remember to note on the call:10:45
i added releases to our icebox to help keep it organized
<JasonDGI>just to check, if I put a constant timer inside the files.txt loop, will that affect the tests in any way?
<cbeer>it should pause the thread for the time specified
<JasonDGI>but jmeter won't be counting that delay will it?
<cbeer>not sure.
<fasseg>i dont think so since the whole thread is paused including the timer
<JasonDGI>cool - i think i have the test running after i put a delay in10:47
<JasonDGI>but it looks like i need the delay because databank can't keep up with the constant bombardment - isnt that kinda bad?
<fasseg>heh yeah kinda ;)10:48
I'd even say one can DoS it then ;)
but this might be a db thread pool config error or sth10:49
but how are the results looking? especially throughput?
<JasonDGI>its currently at 31.7/minute10:50
<fasseg>is it degrading over time?
<fasseg>does the graph look like this: https://docs.google.com/file/d/0B5nd_qlYdcqyS2haMlZLajRMc2M/edit?
<JasonDGI>can i export a graph after its been created?
<fasseg>i just screenshotted it ;)
hmm so it might not be my config....10:52
hmm would the simple past of "to screenshot" be "screenshot"? or would it turn to the "-ed" form because of the length of the word..?10:53
jesus ignore that question, that's hardly english at all ;)
<cbeer>i'm not convinced screenshot is actually a verb :P
<JasonDGI>we should convene on this topic further10:54
<cbeer>cancel the standup, we have important matters of conjugation to discuss!
<cbeer>the OED doesn't believe it's a verb
<JasonDGI>so i have the graph but i cant access my google docs10:56
does that work?
fasseg: oh, i remember the other thing i changed when porting it into ff-databank-jmeter:
<JasonDGI>i put a 750ms delay between objects
<cbeer>i think you were adding all the files into the same dataset, right?
<cbeer>that seemed to have different performance characteristics than creating a bunch of smaller datasets
* jonathangee joins11:00
<cbeer>i'm on.11:01
<jonathangee>frank are you around?11:03
<cbeer>and i'd note, that's the result of a test that adds files to a single dataset, which explains the difference between that and...
<JasonDGI>INFO:PersistentState:No JSON information could be read from the persistence file - could be empty: /silos/jmeter-test/pairtree_root/ds/-1/obj/__manifest.json11:06
<fasseg>i still vote for "to screenshot" as a verb ;)
@ben before i forget, i asked about the supervisor script and its out-of-the-box, not touched by anyone here11:10
<barmintor>ok, thanks Jason
<cbeer>comments: https://github.com/futures/RDFDatabank/commit/cf3ea7e4584af84e1d40416be3278eda5b8b002911:12
and if github wasn't slow..
here's my branch with development environment niceties: https://github.com/cbeer/RDFDatabank/commits/dev-environment11:15
barmintor: you dropped?
one sec
<cbeer>jonathangee: i missed your question
fasseg: work from https://github.com/futures/ff-jmeter-madness, i think.11:17
<barmintor>Should we close out that "Get Databank info for dev team" ticket?11:19
<cbeer>barmintor: here's the bad java stuff I was working on: https://github.com/futures/akubra-glacier/tree/serialization_stuff11:23
<barmintor>eddies++ I'm sympathetic to where they are, and I want them on board with our work going forward11:26
oh, well that's a dealbreaker
<cbeer>i'm planning on dropping off the call, unless anyone thinks i should stick around11:49
nuxeo task added.
dropping and getting around to go into the office, feel free to ping me if needed.11:51
<jonathangee>thanks chris
<barmintor>scanning a number of mailing list threads, people seem to find that amazon s3 works nicely with fuse, but that the retrieval lag from glacier makes it unworkable
if fuse really just mocks a file system, I'd think you could effectively "pretend" by suggesting that the "file" is locked, but overloading that way would probably make it difficult to build apps on top of such a system11:52
<eddies>fuse is now asynchronous
or there are a new set of patches since last month
i'd like to have the filesystem conversation on list/
good point
<cbeer>i think e.g. glacier and HSM systems make FUSE untenable, and i think there's a population strongly interested in good support for that type of system11:54
but maybe i'm biased from working at one of those institutions
<barmintor>I'm sympathetic to those concerns myself :)11:55
I think that's asynchrony for fuse internally to facilitate multiple requests, not asynchrony in the layer between fuse and the storage11:57
but I might be wrong
<fasseg>chris: i'll take over "Add Nuxeo demo to the performance test harness", if it's alright with you...12:10
<cbeer>fasseg: go for it12:13
<cbeer>to https://github.com/futures/ff-jmeter-madness, right?
<barmintor>kind of interesting: I dont see anyone doing Nuxeo-over-asynch or Modeshape-over-asynch12:15
<cbeer>and if you see any clean-up to do in there, please go for it. i saw you made some other changes to the databank jmeter, not sure if any of them are useful here or not
barmintor: i think nuxeo had an async api, didn't it?
<fasseg>mostly cosmetic though...
<barmintor>nor fuse modules over hsm/asynch
cbeer: if it does, I can't find it12:16
<cbeer>fuse + hsm just sounds like a bad idea to me
barmintor: hm, i'll look again. maybe i'm getting things confused
(and, now that i think about it, maybe it was escidoc that i'm thinking of..)
nuxeo has some good docs vs modeshape, although getting to the nuxeo docs is a lot harder
<fasseg>nah no async api there i think12:17
<barmintor>cbeer: it *does* have a notion of asynch jobs/processing already, which is a mark in their favor imo
<cbeer>fasseg: i was thinking about e.g. https://github.com/escidoc/escidoc-core/blob/master/fedora-service-client/src/main/java/org/escidoc/core/services/fedora/internal/FedoraServiceClientImpl.java#L114 in escidoc
maybe i was thinking about async in other places, but there's one place async occurs :P12:18
barmintor: nuxeo does?
<barmintor>cbeer: https://jira.nuxeo.com/browse/NXP-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:changehistory-tabpanel12:19
http://www.nuxeo.com/en/developers is a nice page, and makes me think nuxeo is more open than eddies believed
ooh, hm: http://funkload.nuxeo.org/12:20
<barmintor>cbeer: yeah, seeing some things that make me think the same thing
<cbeer>barmintor: i'm thinking about reworking your os x install directions to be "install directions for development purposes".. does that make sense to you? (and make it use my development.ini with sqlite, local data dir, etc)12:23
<barmintor>cbeer: yes
I don't think we need to consider OSX servers
<cbeer>k. i'll do that and have it ready for jonathangee and eddies by the end of the day, i hope.
gotta figure out branch management, and maybe ping oxford folk to make sure they agree with the development.ini12:24
<barmintor>eddies: is there a point where we just say "we're iterating on FCR3"?
<fasseg>im off guys see you tomm12:25
<eddies>barmintor: if the field is exhausted and the only other option on the table is build everything anew from the ground up, then yes, iterating on FC3 by necessity becomes a contender
bye frank
* fasseg leaves
<barmintor>on that note, sandwich12:26
<eddies>ok, well i'm going to just continue:
i think we could improve FC3 performance quite a bit, for example on ingest & update
(single node)12:27
i'm reasonably confident that once we try for transaction support, we'll slow back down and i'm doubtful we'd beat something like modeshape for performance12:28
* jonathangee leaves
<eddies>and i have no idea how we'll get multi-node from fc3 beyond sharding12:29
<cbeer>eddies: probably unrelated, but i've been playing with solr and the new solrcloud stuff in 4.0
and they do sharding/replication using zookeeper
i'm not sure how, but it seems like maybe it's a thin shim on top of solr?12:30
maybe worth looking into if we go that way.
* jonathangee joins12:45
<cbeer>barmintor, eddies: i'm going to get help setting up modeshape inside a "fedora" webapp today, hopefully.. but, then i was thinking.. should we start with nuxeo instead?13:19
* jonathangee leaves13:55
<eddies>i say start w/ modeshape
let's let frank at least get us some early numbers w/ nuxeo before we take that one further13:56
cbeer: you were recommending that i start from https://github.com/cbeer/RDFDatabank?13:59
<cbeer>eddies: i owe you better directions
<eddies>that would be much appreciated ;-)
hmm. how about i hold off on installing databank till my am?14:01
you think you could give me some instructions by end of your day?
<cbeer>yeah, i should have something for you by then
<barmintor>cbeer: would you rather have the java test for icemelt in a separate project w/ icemelt as a submodule (cleaner java project), or would you rather have some java in icemelt/spec and run command line procs (keeps tests with code)14:06
<cbeer>uh. what's the purpose? the icemelt project is already using a ruby aws client successfully14:07
<barmintor>I guess to use the amazon client code to test? That is what the ticket suggests, but I can just close it & move on14:08
<cbeer>barmintor: i'm thinking about creating a databank-jetty (fork of blacklight-jetty) that's all preconfigured and loaded as a submodule. thoughts?
<barmintor>cbeer: +1 to databank-jetty
<cbeer>barmintor: hm. i guess that might make sense. i'm using a third-party lib right now
<cbeer>so, sure. go for it.
in icemelt, i guess
<barmintor>okey doke
* jonathangee joins
<barmintor>re: databank-jetty- will it be loaded with solr, modeshape, and nuxeo? or different branches for the different candidates?14:10
<cbeer>just solr, for databank.
i guess we could make a ff-jetty
and different branches for things.
* barmintor shrugs
<cbeer>eh. i'll do databank-jetty first14:11
and then decide
barmintor: can the supervisor code be collapsed to:14:18
python message_workers/broker.py
python message_workers/solr_worker.py
<barmintor>cbeer: yes, I think all supervisor does is run those two programs as daemons
<cbeer>(with some arguments, it looks like..)
<barmintor>yes to arguments14:19
<cbeer>opposed to me scrapping the supervisor stuff to just launch the things manually? i'm way too lazy to do all those steps :P
(how do they know what configuration to use, anyway?)14:20
<barmintor>No: it will mean even less "link this into /usr/bin" stuff, which gets us closer to having an actual testing rig
<cbeer>oh, there's production.ini hard-coded
*grumble grumble grumble*
<barmintor>re: which configurations - they configure supervisord to load all configs in the directory
<cbeer>barmintor: sorry, i was talking about the rdfdatabank configs,14:21
<barmintor>oh. yeah.
<cbeer>which means they've never tried to use these not in production mode before
<barmintor>it could also mean they have tried development mode, and it didn't work :P14:22
<cbeer>ok, now i'm more confused than i was before14:23
<barmintor>amazon doesn't make it easy to test your notification listeners, do they? *grrr*
<cbeer>there must be some magic default behavior in here...
so, on that line, we supposedly load some config
and pluck values out of it on lines 90-94
<barmintor>from LogConfigParser14:25
<cbeer>ah, cool.
so.. they've wrapped ConfigParser for some reason
can you see any reason that kind of config shouldn't be in development.ini/production.ini?14:26
<barmintor>haven't looked
it looks like they wrapped ConfigParser just to get the default
& I guess nominally to validate, but that's an empty method14:27
I might have made that a static factory method, but w/e14:28
<cbeer>i wonder if supervisor is loading it too?
<barmintor>loading ConfigParser?
<cbeer>or that cfg file
<barmintor>I don't think so
supervisord runs solr_worker, which instantiates Config, which defaults to loading and parsing the logConfig file in that directory14:29
<cbeer>hm. and the workers themselves have a .conf that duplicates that same information.
<cbeer>i'll just change it and run the tests to see what breaks..14:30
oh, wait.
maybe it's just lunch time.
<barmintor>Really? That config has the same key-values as the .conf files?
<cbeer>loglines.cfg seems to have information that's also in the .conf files, or in the .ini files
e.g. numprocs14:31
or configuration (e.g. idletime) that is just ignored. gah.
<barmintor>i don't see numprocs referred to in the client code14:32
<cbeer>yeah, that's why i think it's a supervisor config
commit_time = datetime.now() + timedelta(hours=hours_before_commit)
i think they're doing it wrong.
hm, ok. so this .conf => .cfg => .ini thing is just a weird way of doing indirection14:33
i'm just going to pass the full path to the .ini file into the workers and call it good.14:34
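Passing the .ini path straight to a worker might look roughly like the following. This is a hypothetical sketch, not the RDFDatabank worker code: the function name and argument layout are invented, it uses Python 3's `configparser`, and the `here` default is an assumption that mirrors the `%(here)s` convention commonly seen in Paste-style .ini files.

```python
import argparse
import configparser
import os


def load_worker_config(argv):
    """Parse worker arguments and load the .ini file named on the command line."""
    parser = argparse.ArgumentParser(description="hypothetical worker launcher")
    parser.add_argument("ini_path", help="full path to development.ini or production.ini")
    parser.add_argument("worker_number", nargs="?", type=int, default=1)
    args = parser.parse_args(argv)

    ini_path = os.path.abspath(args.ini_path)
    # Provide "here" so values such as %(here)s/data can be interpolated.
    config = configparser.ConfigParser(defaults={"here": os.path.dirname(ini_path)})
    if not config.read(ini_path):
        raise FileNotFoundError(ini_path)
    return args.worker_number, config
```

A worker would call `load_worker_config(sys.argv[1:])` at startup, which removes the hard-coded production.ini and the .conf => .cfg => .ini indirection from the picture.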
i really wish anusha or benosteen would join us some time
<cbeer>ok, eddies. i give up. but the directions should get you to the point where you can play with databank, at least
<barmintor>it actually looks like solr_worker gets all its config from the locally parsed file
<cbeer>i still don't know what the solr index is for
barmintor: nope, it loads up the production.ini too14:38
<barmintor>the only cli args are for identifying worker number, I mean
<cbeer>oh, yes.
and it uses the loglines.cfg to find databank
and the production.ini
<barmintor>so the .conf files are all supervisord config for managing the processes
<cbeer>eddies: so, i think you should use this branch (and the configuration directions in doc/Databank_OSX_Installation.txt)14:39
eddies: https://github.com/cbeer/RDFDatabank/tree/launchd
eddies: the directions are misnumbered now, but you should be able to follow them until "Test your Pylons installation"
it's alive! (modeshape). guess that's a good time to take a break15:14
* JasonDGI leaves15:44
* jonathangee leaves16:48
<cbeer>barmintor: ping?18:05
* jonathangee joins18:54
barmintor: yeah, i can't fix up those workers as we discussed. the code is just too convoluted for me to fix up right.20:09
<jonathangee>woo. got databank up and running in debug mode on my laptop!21:49
all the tests fail, but i'm still excited
that was a bit of an odyssey21:50
anyone know if i have to create specific users for the tests?21:51
<cbeer>jonathangee: congrats. not sure, i haven't gotten to the tests yet21:59