Log of the #duraspace-ff channel on chat.freenode.net

Using timezone: Eastern Standard Time
* jonathangee leaves00:53
* jonathangee_ joins
* jonathangee_ leaves03:24
* jonathangee joins03:25
* eddies leaves04:23
* eddies joins04:40
* eddies leaves
* eddies joins
* fasseg joins05:15
I can't get my head around the nuxeo REST API(s), first of all there's more than one, and second of all there's no plain example of Document.Create that i can find...06:40
But the installation was done in 1minute so that's a plus06:41
* eddies leaves06:47
* jonathangee_ joins07:42
* jonathangee leaves07:44
<fasseg>wo just took me two hours to find out how to create an empty document in nuxeo...07:56
but that might be just me ;)
* eddies joins08:45
* eddies leaves
* eddies joins
installing databank has been so much fun :-P08:46
<eddies>i can only imagine how fun it was before
<fasseg>Im also struggling with nuxeo's JSON api
and im unable to fetch the fixtures from github it seems, gonna have to ask chris..08:47
<eddies>is this nuxeo's api? http://doc.nuxeo.com/display/NXDOC/REST+API08:49
<fasseg>Yes that's the one i use, although there seems to be a second RESTLet api at /nuxeo/restAPI08:50
and the nuxeo webapp uses some *.faces API which I haven't looked at yet, so there seem to be a lot of apis08:51
and also a lot of api methods actually
and there's this at a higher level: http://doc.nuxeo.com/display/NXDOC/Operations+Index
<fasseg>these are actually the operations from the automation REST api
so you do sth. like /nuxeo/server/automation/Document.Create e.g.08:55
where Document.Create is the operation
so these are the operations im currently using to create the jmeter test
oops phone brbr08:56
hmm it seems you have to do a Document.Fetch before a Document.Create can use it... So the API is quite complex I guess ;)09:06
and Blob.Attach too...09:07
<cbeer>fasseg: there's also a CMIS api too.09:45
<fasseg>and every one of them is hard to use in jmeter
it seems to be kind of a curse
<cbeer>fasseg: i tried to install it into a project using maven and didn't get very far either09:46
<fasseg>they use a content-type of multipart/related for Blob.Attach which i dont seem to get right in JMeter
<cbeer>at least with modeshape, i think i'm able to programmatically create objects, even if they aren't being persisted
<fasseg>well i can create documents now in nuxeo but no luck with actually ingesting binary content09:47
example 2 on this page: {"params":{"document":"/default-domain/workspaces/myws/file"},"context":{}}
on this page: http://doc.nuxeo.com/plugins/viewsource/viewpagesrc.action?pageId=11044418
oh christ: In that case you MUST use a multipart/related request that encapsulate as the root part your JSON request as an application/json+nxrequest content and the blob binary content in a related part.09:52
whats wrong with good old multipart/formdata?
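For readers trying to reproduce the request shape quoted above, here is a minimal sketch of a multipart/related body whose root part is the JSON request (as application/json+nxrequest) and whose related part is the raw blob. It only builds the headers and body and sends nothing. The function name, the boundary prefix and the `type` parameter on the outer Content-Type are assumptions for illustration, not taken from the Nuxeo documentation; the JSON payload is the one pasted in the log.

```python
import json
import uuid


def build_blob_attach_request(document_path, blob_bytes,
                              blob_type="application/octet-stream"):
    """Return (headers, body) for a multipart/related request.

    Hypothetical helper: the name and the boundary format are
    illustrative assumptions, not part of any Nuxeo client library.
    """
    boundary = "nx-" + uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")

    # Root part: the JSON request quoted in the log.
    request_json = json.dumps(
        {"params": {"document": document_path}, "context": {}}
    ).encode("utf-8")
    root_part = b"\r\n".join([
        delimiter,
        b"Content-Type: application/json+nxrequest",
        b"",
        request_json,
    ])

    # Related part: the blob bytes as-is, with no base64 encoding.
    blob_part = b"\r\n".join([
        delimiter,
        b"Content-Type: " + blob_type.encode("ascii"),
        b"",
        blob_bytes,
    ])

    # Closing delimiter ends the multipart body.
    body = b"\r\n".join([root_part, blob_part, delimiter + b"--", b""])
    headers = {
        "Content-Type": 'multipart/related; boundary="%s"; '
                        'type="application/json+nxrequest"' % boundary,
    }
    return headers, body


if __name__ == "__main__":
    headers, body = build_blob_attach_request(
        "/default-domain/workspaces/myws/file", b"hello blob")
    print(headers["Content-Type"])
    print(len(body), "bytes")
```

The difference from multipart/form-data is mostly in the outer content type and in the fact that the first part is the request itself rather than a named form field, which is likely why generic form-upload tooling does not produce it out of the box.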
<cbeer>i'm in favor of declaring nuxeo not a viable candidate for this group09:53
i'm willing to stay on the fence about modeshape09:54
<fasseg>heh atm im all in favour for kicking nuxeo in the bin...
<cbeer>and i think it'd be easier to rewrite databank than to try to work with it as-is. or working with it as-is will just be slowly and painfully rewriting it
<fasseg>you mean just remodel the API?09:55
on top of a self-written backend
<cbeer>yeah, i guess.
<fasseg>I like that, but I guess this is a No-Go for Eddie ;)09:56
<cbeer>eddies: did you get databank actually installed?09:59
hey, i just figured out why the code i was looking at before was really confusing. looking in the API docs, a POST to this API endpoint could mean 5 different things10:00
<fasseg>yes and it's not very "resty" either... e.g. a delete operation is a HTTP POST with Json content10:02
trying the restlet api now...
<cbeer>(i was actually talking about databank.. but not surprised it applies to lots of things too)10:03
<fasseg>heh, yes idd
hmm did anyone take a look at this: http://www.lilyproject.org/lily/about.html10:06
<- likes the hadoop
<cbeer>new to me.10:07
<fasseg>they claim: "rock-solid performance at scale"10:08
Id like to check this out when I get Eddies ok...
btw: do you want me to push the failed attempt for nuxeo to github?10:09
at least creating an empty document and deleting it works ;)
<cbeer>fasseg: go for it
<cbeer>fasseg: i don't know, what's the preservation characteristics of HDFS?
<fasseg>Depends on your installation, normally you replicate every block over a number of nodes10:11
which makes it quite robust
* Anusha joins
<fasseg>there has been a talk from a yahoo guy about their experiences with failures and performance with HDFS which were quite encouraging
lemme check..10:12
nice talk too: http://www.youtube.com/watch?v=zbycDpVWhp0
<eddies>oi. just finished my call w/ the cdc10:15
i see my name mentioned at various points, what did i miss? =)
<cbeer>did you get databank going,
<eddies>not exactly10:16
<cbeer>we don't like nuxeo
i'd rather rewrite databank than use it as-is
and fasseg found http://www.lilyproject.org/lily/about.html
<eddies>i got up to but didn't get through VII. Initialize databank and Create the main admin user to access Databank
* anusha_ joins
<cbeer>eddies: following my branch?10:17
* Anusha leaves
just going back through term history to find the errors
<cbeer>ran into an error?
i should probably delete my virtualenv and run through the directions again10:18
anusha_: there's nothing in https://github.com/dataflow/RDFDatabank/blob/master/message_workers/loglines.cfg that (is actually used that..) wouldn't be better in development.ini or production.ini10:20
<eddies>ok, here's the error i was getting: https://gist.github.com/453938510:21
<anusha_>Ah, I kept the config separate for the message workers,
<fasseg>eddie: lily seems to have a simple test install package where everything (hadoop, Solr, hbase) runs in one JVM for a single node system. Do you think it makes sense to do a jmeter test against this?
<eddies>i don't know what lily is
ah i see the link
<fasseg>a content repo on top of HDFS, HBase and Solr10:23
they claim: "rock-solid performance at scale"
<cbeer>anusha_: do they need to be? it seems like a pain for splitting dev + production environments
(e.g. change this, and then make sure you change it to the same value over there)
i was thinking about passing in a configuration file to the worker, would that make sense?
(and then, in a production scenario, you could just pass in a minimal config if you really wanted)10:24
<eddies>can cbeer and fasseg do a quick skype call to talk nuxeo/lily/etc right now (and anyone else who wants to join)
<cbeer>eddies: it's the same error if you run 'paster setup-app development.ini', i assume ?
<eddies>i have an SG call later so i won't have as much time to talk after scrum10:25
ok, i'll on the conf line
*i'll be on
<cbeer>barmintor, ping?
on the conf line or skype?
<eddies>conf line i guess
<fasseg>skype pls, conf line is off for me...
<eddies>oh ok
<anusha_>cbeer: what you say makes sense. It is very easy to add it to the production / development.ini file. Just need to make a small change to LogConfigParser.py
<fasseg>I'll invite, ok?
<cbeer>anusha_: great, i'll take a look at that, thanks.10:26
<anusha_>eddies: about the error, you need the recordsilo package. Let me have a look at requirements.txt10:28
<cbeer>anusha_: it should be in there. eddies, maybe try pip install -r requirements.txt and look for errors?10:29
<anusha_>it is in there.10:30
<barmintor>cbeer: did you have a question for me?
<cbeer>oh, i wondered if you wanted to talk nuxeo/lily with us10:33
<barmintor>looking at lily now
<cbeer>but i'm not sure we have anything else to say?
<barmintor>docs anyway. what did y'all conclude on skype?10:34
<eddies>still on skype. wanna join?
<jonathangee>hmm… missed that
Skype call over?10:35
<eddies>just impromptu call
to talk nuxeo and related
<barmintor>getting on skype
<eddies>just ping frank
we're not on the conf line
<cbeer>jonathangee: did you want on too?10:36
<cbeer>fasseg: can you add jonathangee too?10:37
<cbeer>"Unlike Content-Automation, Restlet in Nuxeo were never targeted at providing a uniform high level API, these are just helpful REST Apis exposed for"10:38
doesn't sound promising.
<cbeer>docs: http://docs.ngdata.com/lily-docs-trunk/ext/toc/10:41
i've put a ticket in pivotal for lily.10:47
<cbeer>oh, that's not how we were supposed to feel? huh.10:56
<cbeer>anusha_: do you know if there are any tests for the solr and redis workers? else i'll just do this and hope for the best11:01
very clever.
<barmintor>Amazon's multipart upload API kind of sucks.11:09
<cbeer>barmintor: the glacier api?11:10
<cbeer>hm. i kinda liked it. really low level, but works well enough
<barmintor>I mean the Java library11:11
your tests with Fog are pretty clean
<cbeer>oh, yeah. i didn't like the AWS SDK at all11:12
it's missing some obvious mid-level apis.11:13
<fasseg>cbeer: how can I use the JSON response in a Javascript expression? do you have a link?
<cbeer>"Hey AWS. here's a file, go upload it" (I think it does that)
"Hey AWS, here's an input stream, go upload it" (missing.)
<fasseg>oh this seems nice and easy too:11:14
<cbeer>fasseg: if you trust the json (and, i think we do..), i think you can just eval the response and get a JS array back
<cbeer>fasseg: i guess i was thinking you'd write some BSF (in the javascript flavor) to parse the response body and set some variables11:15
rather than doing it in __javaScript
<fasseg>that was my first idea since i didn't find anything else in JMeter...
but since "__javaScript()" doesn't require an additional component, i think ill go for your solution11:16
<barmintor>also: "changing back and forth between MultiPart and Multipart" --
<fasseg>and how do I eval the response? JSR-223 postprocessor?11:18
<cbeer>fasseg: no, BSF postprocessor. look at the ff-jmeter-madness project, i use it a lot there11:19
(because there's no way i'm writing beanshell!)11:20
<fasseg>ah ok i get it now, you write the var using the BSF processor and extract it using $__javascript()
i thought you were using a javascript postprocessor
<cbeer>the BSF postprocessor does javascript.11:21
(and, you write the variables back to jmeter, no $__javascript required)
<anusha_>cbeer: No.
<fasseg>oh i thought that was beanshell
<cbeer>nope, BSF lets you pick javascript, beanshell or something else11:22
(and i think jmeter lists the variables it provides to the javascript in there somewhere..)
not sure how you read the response body, but it should be easy enough
<fasseg>ah jesus no wonder i couldn't find it, I was looking at adam's FedoraMadness.jmx11:25
okay this was easy, just 2 lines...11:32
<cbeer>looks like ajs6f's change worked. no more errors in modeshape11:33
seemingly big hit to performance, but i'll run it again later and see11:36
<fasseg>off for a break...11:40
<cbeer>i guess it's not so bad. maybe twice as slow with the new locking model11:41
still faster than glacier!
<barmintor>so… 50% faster than fcrepo?11:42
<cbeer>yup, more or less11:43
i'm going to run the fedora test again just to be fair-ish.11:44
yup, about 2x as fast as fedora11:52
<eddies>modeshape is now 2x faster than fedora?12:33
so the thing i remember about that modeshape thread was that you didn't necessarily have to switch locking models. i seem to remember the point being made that the test needed to be rewritten somewhat to avoid all threads trying to start by creating the same container/endpoint or somesuch…i should probably just go back and re-read.12:35
anusha_ was just talking to wolfram, he said you might have some idea of what was causing performance problems in the databank test harness?12:37
<cbeer>eddies: me?
<eddies>last was for anusha
the modeshape stuff i guess was for you or fasseg12:38
oh no, you or adam
<cbeer>ok, i don't know about modeshape
and the awful performance people were observing with databank was using fasseg's original tests, where we were adding 300+ files to a single dataset12:39
multiple datasets didn't show that dramatic performance degradation
but were still slower than fedora
<eddies>ah ok
<cbeer>so i don't know which anusha_ can address
hm, eddies: are you sure you were using my launchd branch of databank? i'm not seeing the httplib2 problem there (but was definitely seeing it in the dataflow version)12:42
oh, but i am able to recreate your recordsilo error12:43
<eddies>$ git branch
* launchd
<eddies>i confess i had no idea what i was doing w/ virtualenv
so maybe i did something wrong there?12:44
'rvm use python@rdfdatabank --create'
<eddies>i added a bunch of stuff to my .bashrc for virtualenv that i googled
<cbeer>oh, look at that. i think i got requirements.txt and requirements-dev.txt out of sync12:45
<eddies>yeah, that's all it seemed like
<cbeer>eddies: try pip install -r requirements.txt
and then try paster
then i get into other dependency errors12:46
it would help if i changed the right files in the right directory :/12:47
<eddies>ah yes, coz this one wants mysql too12:49
<cbeer>ok, pull down the new launchd
i already miss bundler.12:50
<eddies>and use which requirements? the dev one?12:54
paster setup-app development.ini seems to work now
<eddies>so what should i set granary_uri_root to?
<cbeer>in development, you shouldn't need to touch it12:57
<eddies>the significance of the base URI for getting something just up and running is lost on me :-P
ah good
<cbeer>or, i didn't touch it and it seemed to work
is that in the directions?
<fasseg>btw: when using the CMIS api, one has to base64encode the binary data, but this should not really impact performance, since base64 can be streamed, am i right?
<fasseg>or will the whole binary data get loaded into mem first?
<eddies>depends on the impl =)
<fasseg>that'd be ugly
because it has to go into the atom xml?12:59
<fasseg>i sense a disturbance...
but it's not compressed, so in the best case it just takes a stream and en/decodes it..13:00
im not mistaken there
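The streaming question above can be sketched in a few lines: base64 maps every 3 input bytes to 4 output characters, so an encoder only has to hold back at most 2 bytes between reads and never needs the whole binary in memory. This is a minimal illustration with a hypothetical function name and chunk size; whether a given CMIS client or server actually streams this way depends on the implementation, as noted in the log.

```python
import base64
import io


def stream_base64(src, dst, chunk_size=3072):
    """Base64-encode src into dst chunk by chunk.

    Hypothetical helper for illustration. Only complete 3-byte groups
    are encoded on each pass, so the concatenated output is identical
    to encoding the whole input at once.
    """
    pending = b""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Encode the largest prefix whose length is a multiple of 3.
        usable = len(pending) - len(pending) % 3
        dst.write(base64.b64encode(pending[:usable]))
        pending = pending[usable:]
    # Encode the 0-2 leftover bytes; padding is added only here.
    dst.write(base64.b64encode(pending))


if __name__ == "__main__":
    source = io.BytesIO(b"some binary payload" * 1000)
    target = io.BytesIO()
    stream_base64(source, target)
    print(len(target.getvalue()), "base64 characters written")
```

Memory use stays bounded by the chunk size, so the cost of base64 in this scheme is the roughly 33% larger payload rather than buffering.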
i Could use the Restlet API, which does can handle a RESTful post request with a binary data body13:01
It would certainly take some plexity out of the tests...
what do you guys think?
my keyboard gives up his ghost it seems
although that doesn't account for the grammatical errors ;)13:03
<eddies>i would have aimed for the restlet api, except for the fact that it seemed…underdocumented
<cbeer>and advised against.
<fasseg>there's one example for file upload i actually found...
<eddies>oh is it actually advised against?
<cbeer>cbeer: "Unlike Content-Automation, Restlet in Nuxeo were never targeted at providing a uniform high level API, these are just helpful REST Apis exposed for"13:04
<fasseg>Well i guess the question is: Is it good enough for a perf test?
It's time to cast your votes ;)13:05
<eddies>i dunno, then i'd say stick in one idiom, i.e. just take the cmis interface as far as you can take it
<eddies>cbeer, is this step correct: sudo ./bin/paster serve development.ini
<cbeer>no sudo.
sorry, guess i didn't edit that step
kill all the stuff about apache13:07
kill all the sudo.
and the example of running on another port is silly
<eddies>that blew up
<fasseg>here's a CMIS example for nuxeo for the interested: http://www.nuxeo.com/blog/development/2010/01/trying-cmis/13:08
<cbeer>eddies: gist?
<eddies>actually, i think it boils down to ImportError: No module named sql
eep: ImportError: No module named sql
<eddies>cbeer: https://gist.github.com/454060313:11
<cbeer>and the pip install ran without errors?13:12
<eddies>but i wasn't clearing out the virtualenv or anything each time. so i just reran pip install a bunch of times
i don't know what dependencies got borked, half-baked etc along the way
<fasseg>There seems to be yet another way to upload binaries w/o base64 encoding using CMIS, but it involves two requests, one for creating metadata only and one for posting the binary...that's just strange :/13:19
<eddies>yeah, you should have an undocumented endpoint where you upload binaries and then reference it ;-)13:21
<fasseg>Oh so a third way?13:22
this gets better and better...
ill stick to base64 for now, one request seems the best to me for now...13:23
*to me
* anusha_ leaves13:30
<cbeer>ok, pushed the config-unifying code. to cbeer/rdfdatabank/launchd13:41
hopefully it works.
heading to the office.13:51
<barmintor>JobParameters params = new JobParameters().withType("inventory-retrieval");
<fasseg>and beanshell----------------------------
<cbeer>config.get() takes at least 2 arguments; config.has_key only takes 1.
<fasseg>good old maxim "always amaze the dev"14:39
<barmintor>cbeer: it's safe to poll icemelt until a job finishes in the tests, right?14:40
<cbeer>barmintor: yep14:41
eddies: hey, i get your sql error now too
<barmintor>what's the sql error?14:42
it's not about databankauthadmin vs databanksqladmin, is it?14:43
<cbeer>barmintor: no, missing repoze.what.plugins.sql from requirements.14:44
and then, version conflict. yay14:45
repoze.what-quickstart it is.14:46
<eddies>cbeer: yay for repeatability?14:56
<cbeer>yeah, my fault again, i think.
<eddies>i'm about to turn in for the night, though
<cbeer>k. i'm struggling to get the solr worker working.
who knows if i broke something or it's just broken.
<eddies>if you sort it out, shoot me an email later so i catch it tomorrow…err later today :-P14:57
<barmintor>cbeer: I'm remembering a weird dance where you were supposed to install a library with reposze dependencies, then delete them and install an older version of rpoze15:05
breaking: I cannot spell repoze
* barmintor looks at the old script15:06
ah, a *newer* version of repoze: https://github.com/futures/RDFDatabank/blob/master/docs/Databank_VM_Installation.txt#L13815:07
I guess it's possible that we're installing too-new a newer version
except that the right version is indicated in the requirements-dev.txt15:12
<fasseg>I'll do a commit and be off for today...15:15
see you guys tomorrow!
* fasseg leaves15:16
* fasseg joins15:20
ermm I get a 403 when trying to commit to github....did someone play with the permissions?
hmm only have readonly access, github tells me15:21
well then the commit will wait until I meet eddie again..see you guys tomorrow..15:22
* fasseg leaves
<barmintor>I could've sworn the AWS SDK included a json marshaller/unmarshaller...15:32
ah, there they are15:35
<cbeer>barmintor: you saw the broker.py and solr_worker.py scripts working, right?16:00
i've presumably just broken them somewhere along the line
<barmintor>I started them, but that's all I can testify to16:01
<barmintor>you lapped me over the weekend
<cbeer>@decide databank or modeshape16:04
bah. no zoia.
barmintor: i'm going to switch back to the akubra serialization stuff, unless you tell me not to.17:13
(because you've gouged your eyes out over my java code and in a fit of rage deleted it all)
<barmintor>cbeer: ok. I'm working on identifying the source of the AWS SDK test failures against icemelt
<cbeer>previously, i was thinking about tapping into the java serialization to do some work for us.. but now i'm thinking about just maintaining a log file and calling it good17:14
<barmintor>the java serialization of the inventory?17:15
<barmintor>AFAIK it just returns the JSON
<cbeer>AWS does; but we need to also track things in AWS that aren't in the inventory yet17:16
hm. but maybe we should only serialize out things that aren't in the latest manifest, i suppose17:17
<barmintor>so you're thinking about a log file that we roll over every 24 hours?
<cbeer>yeah, that was my thought. but maybe it's a bad one.
<barmintor>i think you have to do something like that: keep the last 24 hours of archive ids uploaded, and merge them with the inventory17:18
<cbeer>yeah, i have the merge code, i think... just not the tracking of the last 24 hours in a durable way
<barmintor>all I can think off the top of my head is timestamp subdirs, then clean out anything more than 24 hrs old whenever you do a new inventory17:19
or, alternately, keep it all, and allow the local cache to be validated against inventory calls (since they cost money, too)17:20
that's a more straightforward approach in some ways
and it saves money17:24
ie, always serve the cached inventory, but if you verify the inventory then check that discrepancies are < 24 hours old17:25
Raised Question: Are deletions also subject to the 24-hour lag, or only additions?
<cbeer>i believe the inventory is only updated every 24 hours17:28
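The merge-and-verify idea discussed above can be sketched as follows: keep a local log of uploaded archive ids with timestamps, serve the union of the last inventory and the uploads still inside the 24-hour lag window, and treat anything older that is still missing from the inventory as a real discrepancy. All names here are hypothetical and the sketch is in Python rather than the project's Java; the 24-hour figure is the one stated in the log, and deletions are not handled because the log leaves that question open.

```python
from datetime import datetime, timedelta

# Lag between an upload and its appearance in the inventory,
# as stated in the discussion (an assumption, not a measured value).
INVENTORY_LAG = timedelta(hours=24)


def effective_inventory(inventory_ids, upload_log, now):
    """Union of the cached inventory and uploads too recent to be in it.

    upload_log maps archive id -> upload time (datetime).
    """
    recent = {
        archive_id
        for archive_id, uploaded_at in upload_log.items()
        if now - uploaded_at < INVENTORY_LAG
    }
    return set(inventory_ids) | recent


def overdue_uploads(inventory_ids, upload_log, now):
    """Uploads older than the lag window that the inventory still lacks."""
    known = set(inventory_ids)
    return sorted(
        archive_id
        for archive_id, uploaded_at in upload_log.items()
        if archive_id not in known and now - uploaded_at >= INVENTORY_LAG
    )


if __name__ == "__main__":
    now = datetime.now()
    log = {
        "fresh": now - timedelta(hours=2),
        "stale": now - timedelta(hours=30),
    }
    print(effective_inventory({"old"}, log, now))
    print(overdue_uploads({"old"}, log, now))
```

Entries that show up in a later inventory can be dropped from the log, which keeps the durable state small without needing a strict 24-hour rollover.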
<barmintor>icemelt is really giving me a tour of the AWS SDK. That's good, right?17:35
<cbeer>it should be a complete reimplementation of glacier17:36
minus all the annoying bits
and is fast and doesn't cost money :P
it just occurred to me.. i was trying to write performant code to handle that inventory serialization17:38
i should just write it the lazy way.
it's still faster than glacier!
i'll figure out how to test it appropriately.. some time17:43
(and, i just realized, i'm writing terrible, bug-ridden code. oh well)17:48
why would anyone have more than one glacier instance going at a time, anyway.
(other than as a very back-door way to bill people for their usage, if they use ajs6f's plugin)17:49
<barmintor>apparently the SDK expects a JSON message back for some error it's not getting17:58
cbeer: does icemelt handle errors, or does it just raise exceptions?18:06
that is, does it do Amazon's serialization of errors to json18:07
<cbeer>barmintor: probably not.18:08
but it should return the right error codes at least
that's enough for me, but it's not enough for the Java client :)
<cbeer>hm. maybe i don't trigger any errors in the akubra-glacier tests18:09
maybe also why i didn't add anything
<barmintor>no, I don't think you do. Let me assure you that I did
I'll try to figure out how to do it when I have these last few tests working
<cbeer>k. let me know if you want help in icemelt, but it should be pretty straightforward18:10
<barmintor>there we go18:12
one junit-ism to track down, but the test suite is reproducible with the amazon java client18:13
<barmintor>I think more familiar users might have better ways to do some of that, but UGH18:31
<cbeer>hence why i just wrote some ruby tests for it :P18:32
thanks for doing that though. it's nice to know we're actually mostly compliant
<barmintor>I'll look into submoduling it into icemelt later. I need a beer and some food.
np- I think icemelt is actually a useful thing in and of itself, so tests with the "official" client are a no-brainer18:33
it'll give us something to talk about at code4lib :P
* barmintor leaves18:34
* jonathangee leaves20:10
* jonathangee joins
* eddies leaves21:30
* eddies joins21:31
* eddies leaves
* eddies joins