Log of the #fcrepo channel on chat.freenode.net

Using timezone: Eastern Standard Time
* peichman joins00:59
* peichman leaves01:05
* github-ff joins07:20
[fcrepo-import-export] escowles pushed 1 new commit to master: https://git.io/v1P9Z
fcrepo-import-export/master 202cd60 Esmé Cowles: Merge pull request #62 from fcrepo4-labs/FCREPO-2351...
* github-ff leaves
* github-ff joins
[fcrepo-import-export] escowles deleted FCREPO-2351 at b9c1855: https://git.io/v1P9n
* github-ff leaves
* coblej joins08:24
* dwilcox joins09:07
* bseeger joins09:12
* whikloj joins09:15
* github-ff joins09:18
[fcrepo-import-export] whikloj pushed 1 new commit to master: https://git.io/v1XeU
fcrepo-import-export/master 28a17e5 Jared Whiklo: Merge pull request #54 from fcrepo4-labs/upload-fixity...
* github-ff leaves
* dwilcox leaves09:24
* osmandin joins09:28
* osmandin leaves09:35
* westgard joins09:43
* dhlamb joins09:44
<escowles>[Import/Export standup]09:47
* yesterday: initial BagIt export implementation and wrote tickets for rest of BagIt work
* today: working through the BagIt tickets and reviewing PRs
* blockers: some dependencies between the BagIt tickets, but shouldn't be a problem
* github-ff joins09:52
[fcrepo-import-export] whikloj pushed 1 new commit to master: https://git.io/v1XTB
fcrepo-import-export/master 1105158 Jared Whiklo: Merge pull request #60 from fcrepo4-labs/export-bags-techmd...
* github-ff leaves
<westgard>[Import/Export Standup]09:56
Finished yesterday:
Only a few discussions; my day job reared its ugly head.
Working on today:09:57
Hopefully the external binaries and error handling ticket for the verification tool.
Blockers:
None.
* awead joins10:01
* dwilcox joins
<awead>[Import/Export Standup]10:02
Finished Monday:
Nothing, was in transit.
Working on today:
Hope to finish reviewing https://jira.duraspace.org/browse/FCREPO-2225
Blockers:
time.
<whikloj>[Import/Export Standup]10:03
Yesterday: Reviewing some tickets and IT for Bagit
Today: Thinking about Bagit Import (FCREPO-2350)
Blockers: None
<escowles>whikloj: i think basic import of bags just works — do you think we should do verification using the bag manifests? or should we leave that for external bag-validation tools?10:08
<whikloj>escowles: I was just reading your note there, I'm not sure. If we use the bagit manifests then we allow for importing stuff that wasn't necessarily exported using the same tool no?10:09
* coblej leaves
<whikloj>escowles: but adds more complexity
<escowles>whikloj: we probably shouldn't do anything to make that easier at this point, so we can assume all of the files are going to be the same for bag vs. non-bag10:10
* peichman joins
<escowles>but, in theory, it seems like a good idea to allow that in the future
<whikloj>escowles: Then we could also leave it as is and add that as a next phase goal.
<escowles>whikloj++
i was mostly thinking we could add an extra step — if we know there is a bag, then we have checksums for the RDF files and could validate those too (not just the binaries)10:11
<informatician>[Import/Export Standup]
Finished Yesterday:
- Successfully set up python 3.x on rhel 6.8 (prod data os) using "pyenv" and "pyenv-virtualenv".
- Successfully ran a new export of production data using latest version of export tool.
Working on Today:
- Setting up a python debugging environment.
- Troubleshooting http 401 errors (described below). Troubleshooting suggestions appreciated.
Blockers:
- Still hitting http 401 errors when attempting to run verify.py.
- When using valid credentials, receive (via stdout): urllib.error.HTTPError: HTTP Error 401: Unauthorized
- When using invalid credentials, receive (via log): Error communicating with repository. Response: 401 for node [url]
* bseeger leaves
<whikloj>escowles: that's not a bad idea10:12
* github-ff joins10:13
[fcrepo-import-export] escowles created bag-profile (+1 new commit): https://git.io/v1XLN
fcrepo-import-export/bag-profile 5073c41 Esmé Cowles: Initial Bag profile implementation
* github-ff leaves
* github-ff joins10:14
[fcrepo-import-export] escowles opened pull request #63: Initial Bag profile implementation (master...bag-profile) https://git.io/v1XLj
* github-ff leaves
* coblej joins10:15
<escowles>awead: more testing on https://jira.duraspace.org/browse/FCREPO-2225 is good, the next thing that could use some good testing is https://jira.duraspace.org/browse/FCREPO-2354 (profiles to change which manifests get used)10:17
<awead>escowles++10:18
escowles: yeah, I had the wrong link
I can’t manage my Jira tickets.
* coblej leaves10:20
<escowles>awead: i've got mixed feelings about Jira: i generally like github issues better, but Jira's got some nice features (I've seen some really good work with custom Jira workflows too)
<awead>I’m usually good with Jira, but it’s hard to keep track of things when it’s large, like Duraspace’s instance.10:21
escowles: I was looking for the ticket on corrupted binaries, but it looks like that got merged. Did you need me to do any more testing on that?10:23
<escowles>awead: which ticket is that?10:24
<awead>escowles: umm… yeah… good question :)
<escowles>awead: i think it was this pr: https://github.com/fcrepo4-labs/fcrepo-import-export/pull/54
looks like dbernstein tested it
* peichman leaves
<awead>yes.10:25
TFW you use Github to find your Jira tickets.
<escowles>awead: so i'd say that https://github.com/fcrepo4-labs/fcrepo-import-export/pull/63 is the highest priority
<awead>ok
<westgard>informatician: so is the issue that the error messages/response codes are the opposite of what they should be? The tool currently won't deal well with not being able to reach Fedora and will just break off. There's already a ticket to fix that.
<escowles>awead: the other thing is that the readme doesn't have any instructions on exporting/importing bags yet, so adding that would be good10:26
<westgard>If the response codes are reversed, that is very strange, but hopefully we can sort that out as well in the context of better error handling.
* acoburn joins
<escowles>basically, there's a --bag argument that can take the name of an existing profile (default, aptrust, metaarchive) or a filename if you create your own or customize
* coblej joins10:27
<awead>escowles++
escowles: are the profiles “built-in” or do I need to supply one?
* peichman joins
<awead>nm.
“existing profiles”
<escowles>the default, metaarchive and aptrust ones are built-in in the jar file
so you can just do "--bag default" and it'll load it from the jar file10:28
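The lookup escowles describes for the --bag argument can be sketched roughly as follows. This is an illustration in Python, not the tool's actual Java code; the `resolve_profile` helper and its return shape are hypothetical, and only the three profile names come from the discussion.

```python
# Profile names mentioned in the channel as being bundled in the jar file.
BUILT_IN_PROFILES = ("default", "aptrust", "metaarchive")


def resolve_profile(bag_argument):
    """Decide whether a --bag value names a bundled profile or a user file.

    Hypothetical helper: returns a (kind, value) tuple for illustration only.
    """
    if bag_argument in BUILT_IN_PROFILES:
        return ("built-in", bag_argument)
    # Anything else is treated as the path to a custom or customized profile.
    return ("file", bag_argument)
```

Under this sketch, `resolve_profile("default")` selects the bundled profile, while any other value is treated as a filename.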
<awead>escowles++10:29
I’m on it.
* manez joins
<awead>ruebot: amending my standup: Working on today:10:31
Hope to finish reviewing https://jira.duraspace.org/browse/FCREPO-2354
* github-ff joins
[fcrepo-camel-toolbox] acoburn opened pull request #124: Separate blueprint from java code (master...fcrepo-2339) https://git.io/v1Xmb
* github-ff leaves
* coblej leaves
* manez leaves
* dwilcox leaves10:32
* coblej joins10:34
<westgard>informatician: it's probably also worth noting that with the changes to the import-export tooling over the past week, not every case is currently being handled by the verification tool, so testing it against production data (i.e. data that's not as uniform/tightly controlled as some of the test datasets), while helpful in uncovering problems, will also be hard to troubleshoot since there are known bugs and other issues with the tool arising out of the recent changes to import/export.10:35
That said, it should work against data which includes or excludes binaries as long as they are in Fedora (not external), at least for exports (imports haven't been thoroughly tested as far as I know).10:38
* manez joins10:43
* manez leaves10:44
<informatician>westgard: It's not that they're switched or flipped but rather that the differing behavior is confirmation that the credentials are valid but that the script is still failing. It's possible that the script is unable to reach this particular fedora instance for some reason. I could look into the related ticket (I feel it may be at least closely related). Do you remember offhand which ticket that is?10:47
westgard: As a side note, I did have issues earlier with records where there were references to binaries that were not present locally, but I've resolved all those issues so the production data set doesn't have those obstacles at least.10:48
* informatician leaves10:51
* informatician joins10:52
* manez joins10:53
* bseeger joins11:04
* github-ff joins11:06
[fcrepo-camel-toolbox] acoburn closed pull request #123: Adds developer information (master...dev-update) https://git.io/v1iG9
* github-ff leaves
* dwilcox joins11:07
* github-ff joins11:16
[fcrepo-camel-toolbox] acoburn opened pull request #125: Update the ActiveMQ pooling defaults (master...fcrepo-2357) https://git.io/v1XCv
* github-ff leaves
<bseeger>informatician, westgard: did you figure out the auth issues? A 401 in either case implies that it reached the server. The first one looks like the code threw an exception. But if you got a 401 then you reached the fedora server.11:18
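bseeger's point can be illustrated in Python, the language verify.py is written in. This is a minimal sketch, assuming the tool's HTTP calls go through the standard library's urllib (as the traceback in the standup suggests): an HTTPError means the server answered with a status code, while a plain URLError means no HTTP response came back at all. The `describe_failure` helper is hypothetical and not part of verify.py.

```python
import urllib.error


def describe_failure(exc):
    """Classify a urllib exception by whether the server was reached."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 401:
            return "reached server; credentials missing or rejected (401)"
        return f"reached server; HTTP status {exc.code}"
    if isinstance(exc, urllib.error.URLError):
        # No HTTP response at all: DNS failure, refused connection, and so on.
        return f"could not reach server: {exc.reason}"
    return f"unexpected error: {exc!r}"
```

Both of the reported symptoms would fall into the first branch, which is consistent with the server being reachable in either case.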
* travis-ci joins11:19
fcrepo4-exts/fcrepo-camel-toolbox#318 (master - eeda117 : Aaron Coburn): The build passed.
Change view : https://github.com/fcrepo4-exts/fcrepo-camel-toolbox/compare/7f744c1a8183...eeda117c3f6e
Build details : https://travis-ci.org/fcrepo4-exts/fcrepo-camel-toolbox/builds/183636139
* travis-ci leaves
* thomz leaves
<westgard>bseeger, informatician: the ticket for better error handling is FCREPO-2329; it does seem like the credentials are not being passed through correctly with every request11:24
<f4jenkins>Project fcrepo-camel-tests build #101: UNSTABLE in 7 min 30 sec: http://jenkins.fcrepo.org/job/fcrepo-camel-tests/101/ 11:25
<bseeger>informatician: can you show us the command as you run it? (w/o credentials, of course)11:26
westgard: that would be odd - I know it needs better error handling, but that's odd about the credentials, though it could be the case. I'd be interested in finding out what part of the code threw the above exception.11:27
informatician: I mean, can you show us the command you ran?11:29
<westgard>bseeger: agreed -- the invalid credentials are erroring on the first is_binary check, which is where you had noted better error handling was needed. The valid credentials are getting past that point, but then hitting a urllib error, which made me think that one of the functions communicating with fedora might not be passing the auth info correctly.11:31
It would be great if we could pinpoint the location of the error, but at the same time I'm hoping to get to the refactoring we had discussed yesterday later this afternoon, so that will likely overlap with whatever fix we would need to implement for this error.11:33
<f4jenkins>Yippee, build fixed!11:36
Project fcrepo-camel-tests build #102: FIXED in 7 min 9 sec: http://jenkins.fcrepo.org/job/fcrepo-camel-tests/102/
* informatician leaves
<bseeger>westgard: good point - looking at it after the refactoring would be better, if it even still exists. It would be good to know what command informatician ran though.11:38
* informatician joins11:39
<bseeger>acoburn, ruebot, whikloj, anyone else who knows: the fcrepo-connector-file was removed in 4.7.0, right? So the tests on the 4.7.1 test page under "Sanity Builds" for fcrepo-connector-file should be removed, right?11:45
<acoburn>bseeger: yep
<bseeger>acoburn: but they are still listed on the 4.7.0 page and succeeding.11:46
https://wiki.duraspace.org/display/FF/Release+Testing+-+4.7.0
* apb18 joins
<acoburn>bseeger: you mean https://wiki.duraspace.org/display/FF/Release+Testing+-+4.7.1?11:47
<apb18>acoburn: Just FYI - I'm on the road and probably won't get a chance to look at your PR until tonight
* coblej leaves
<acoburn>apb18: no rush and travel safe11:48
<apb18>acoburn: .. but I do see them, and will get to them as soon as I can
<acoburn>apb18++
<apb18>Thanks!
* apb18 leaves
<bseeger>acoburn: I was looking to see what was on the 4.7.0 page - and it's tested there. Wondering why it's there if it was removed in that release.11:49
<acoburn>bseeger: it was part of the 4.7.0 release11:50
bseeger: with (I believe) a deprecation warning
<bseeger>acoburn: ah, that being the last one. Okay, I get it now.
<acoburn>bseeger: and since then no one has stepped up to maintain that project
* manez leaves11:53
<whikloj>acoburn++ # Turf it bseeger
* manez joins11:54
* manez leaves11:58
<whikloj>escowles: How is the bagit importing working? Based on the code it seems like it should fail to resolve the extra "data" directory layer?
* coblej joins
<westgard>bseeger informatician: I have a strong suspicion that this is the problem: https://github.com/fcrepo4-labs/fcrepo-import-export-verify/blob/master/verify.py#L206 11:59
<escowles>whikloj: i think it should add the "data" if you have the --bag option specified, so it uses the data dir as the dir to import from
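The behavior escowles describes (importing from the bag's "data" directory when --bag is given) can be sketched as below. This is a Python illustration of the idea only, not the tool's Java implementation; `import_root` is a hypothetical name.

```python
from pathlib import Path


def import_root(base_dir, bag=False):
    """Return the directory to import from.

    Hypothetical helper: with a bag, the payload lives under "data",
    so that extra layer is appended to the base directory.
    """
    base = Path(base_dir)
    return base / "data" if bag else base
```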
* dwilcox leaves
<whikloj>escowles++ # I totally missed that, you're right
<westgard>bseeger informatician: this line is using the rdflib.parse() method and does not pass any sort of credentials; we should be using requests.get() there instead and then parse the response with rdflib12:00
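westgard's suggestion is to fetch the resource with an authenticated HTTP call and only then hand the body to rdflib. Below is a minimal sketch of the fetch half. It uses the standard library's urllib in place of requests so that it is self-contained; `build_authenticated_request` and `fetch_rdf_text` are hypothetical names, and the actual fix in verify.py may look different.

```python
import base64
import urllib.request


def build_authenticated_request(url, username, password):
    """Build a GET request that carries HTTP Basic credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Basic {token}",
            # Ask for Turtle so the body can be parsed as RDF afterwards.
            "Accept": "text/turtle",
        },
    )


def fetch_rdf_text(url, username, password):
    """Return the response body as text; urlopen raises HTTPError on a 401."""
    request = build_authenticated_request(url, username, password)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

The returned text could then be parsed with rdflib, for example `Graph().parse(data=text, format="turtle")`, rather than letting rdflib open the URL itself without credentials.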
<whikloj>escowles: So we want to load the manifest and use it to set a Digest for each resource (rdf or non-rdf) as we put them into Fedora?12:01
<awead>escowles: ping
<escowles>whikloj: yeah, i think that's a good idea
awead: pong
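whikloj's idea of loading the manifest and sending a digest with each resource might look roughly like this. It is a sketch in Python rather than the tool's Java code, and it assumes that Fedora accepts a `Digest: sha1=<hex>` header on upload; whether that applies to RDF resources as well as binaries is not settled in this discussion. Both function names are hypothetical.

```python
def parse_manifest(text):
    """Map each path to its checksum from BagIt manifest text."""
    checksums = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<checksum> <path>". Split on the first run of
        # whitespace only, since the path itself may contain spaces.
        checksum, path = line.split(None, 1)
        checksums[path] = checksum.lower()
    return checksums


def digest_header(algorithm, checksum):
    """Build the header to send along with an uploaded resource (assumed format)."""
    return {"Digest": f"{algorithm}={checksum}"}
```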
<awead>escowles: is importing from bags a requirement of that PR?
<escowles>awead: no — there's another ticket for that — is there an issue with importing?12:02
<awead>escowles: yes, but I can document that elsewhere.
<escowles>awead: cool, the ticket for the import side is https://jira.duraspace.org/browse/FCREPO-2350
<bseeger>westgard: good find!
* coblej leaves12:03
<awead>escowles: exporting is working, but I need to verify the bags a bit more. Could you tell me what the included profiles are? right now I’m only using default, but is there aptrust or others?
<escowles>the others are aptrust and metaarchive
<awead>escowles++12:06
<informatician>bseeger: apologies for the delay, here's the command that I'd run - verify.py -u username:password -c summary.csv -l verify.log -v importexport.config12:08
config file is just -
mode: export
dir: /dlt/scholarsphere/ssrepo1stage/fedora/export
binaries: true
resource: http://localhost:8080/SSstagingFedora4/rest
* coblej joins12:10
<informatician>westgard: thanks for the tips there on the authentication issue. Going to look into those details and see what I can discover.12:11
<bseeger>informatician: thanks for the info.12:14
* coblej leaves12:15
<westgard>informatician++ I would start with line 206 and try changing that to use requests.get instead of parsing the rdf resource directly; I'm pretty sure that's the problem
* github-ff joins12:22
[fcrepo-import-export] escowles pushed 1 new commit to bag-profile: https://git.io/v1Xgl
fcrepo-import-export/bag-profile a3c841f Esmé Cowles: Updating profiles based on review of tickets, separating payload digest algorithms from tag digest algorithms
* github-ff leaves
* coblej joins12:42
* coblej leaves12:46
* awead leaves12:47
* dwilcox joins12:51
* coblej joins
* bseeger leaves12:54
* manez joins12:56
* bseeger joins12:59
* peichman leaves13:00
* peichman1 joins13:02
* westgard1 joins13:04
* awead joins13:05
* westgard leaves13:07
* coblej leaves13:09
* github-ff joins13:24
[fcrepo-import-export] escowles pushed 1 new commit to bag-profile: https://git.io/v1Xiu
fcrepo-import-export/bag-profile 067d7d2 Esmé Cowles: Use Hex.encodeHex for checksums with lowercase letters
* github-ff leaves
* bseeger leaves
* awead leaves13:43
* dbernstein joins13:46
[Import/Export Standup]13:48
Finished yesterday:
Nothing. I got pulled onto another project.
Working on today:
https://jira.duraspace.org/browse/FCREPO-2352 13:49
Add support for user-supplied Bag metadata
Blockers:
None
(sorry for the late report)
* awead joins13:51
* peichman1 leaves13:58
* peichman joins14:01
* dwilcox leaves14:03
* westgard1 leaves14:08
* westgard joins
* jjtuttle leaves14:15
* jjtuttle joins14:19
* dwilcox joins
* github-ff joins14:37
[fcrepo-import-export] whikloj pushed 1 new commit to master: https://git.io/v1XFs
fcrepo-import-export/master aa8f25c Jared Whiklo: Merge pull request #63 from fcrepo4-labs/bag-profile...
* github-ff leaves
<whikloj>escowles/awead: Are the profile tickets resolved too? Fcrepo-2222, Fcrepo-2223 & Fcrepo-2224 14:38
<awead>whikloj: don’t know.
<escowles>whikloj: awead: i want to leave those open and let ruebot and other stakeholders confirm before closing14:39
<whikloj>awead/escowles: Sounds good
<awead>escowles++
* dwilcox leaves15:08
* informatician leaves15:13
* Informatician joins15:37
* Informatician leaves15:43
* westgard leaves15:50
* acoburn leaves15:59
* acoburn joins16:01
* acoburn leaves16:02
* bseeger joins16:05
<whikloj>escowles: What are your thoughts on the BagVerifier? I was adding it to check for a valid bag both after export and before import. But in my first test of exporting the LUBM_02 set I get "Too many open files."16:15
<escowles>whikloj: i haven't really looked at BagVerifier — but iirc, the default macosx file limit is very low, so just raising it may help16:17
(are you on macosx?)
<whikloj>escowles: yes I am, so you think it is a good idea to have it run as a sanity check?16:18
<escowles>whikloj: maybe with an option to enable it? presumably it needs to re-read all the files to checksum them, so i'd expect it to take a long time for large bags16:19
<whikloj>escowles: ok, that sounds reasonable
* awead leaves16:34
<whikloj>escowles: I don't know about ExecutorServices, but does this appear to open as many threads as there are files in the manifest?16:42
https://github.com/LibraryOfCongress/bagit-java/blob/master/src/main/java/gov/loc/repository/bagit/verify/BagVerifier.java#L170-L172
<escowles>whikloj: i *think* that opens a finite-size pool of worker threads, not one for each file all at once16:43
whikloj: oh, i think it does open as many threads as it needs: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newCachedThreadPool-- 16:45
<whikloj>escowles: I did have a very low file limit (~ 256) but I went to 8192 and still it fails with Too many open files.16:47
* bseeger leaves
<escowles>it's newFixedThreadPool that has a fixed number of threads and reuses them to process all the workers
<whikloj>escowles: ahh okay
<escowles>whikloj: that sounds like a bug to me — there's no reason you should have 8192+ files open to verify a bag
and because it's going to be disk i/o bound anyway, i doubt having more than a handful of threads really speeds it up16:49
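The distinction escowles draws is between Java's newCachedThreadPool (creates threads as needed) and newFixedThreadPool (reuses a fixed number of them). A rough Python analogue of the fixed-pool approach is sketched below, using concurrent.futures instead of the Java Executors API; `sha1_of` and `checksum_all` are hypothetical names. Whether the thread pool is really what causes "Too many open files" in BagVerifier is not confirmed here.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha1_of(path, chunk_size=65536):
    """Hash one file in chunks; the with-block closes it when hashing ends."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_all(paths, workers=4):
    """Checksum many files with a bounded pool of worker threads."""
    paths = list(paths)
    # Each worker handles one file at a time, so at most `workers`
    # files are open at once regardless of how long the list is.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(sha1_of, paths)))
```

Since the work is mostly disk I/O, a small `workers` value is likely enough, in line with escowles's remark about a handful of threads.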
<whikloj>escowles: looks like the code on their Github master has changed, used to use an InputStream. So perhaps this is fixed in a newer version16:50
<escowles>whikloj: yes, we're using the last release published to maven, and it's way behind master16:56
i asked today, and they said a 5.0 final release is a few months off — i asked a followup question about pushing out a new beta release sooner but haven't heard back16:57
<whikloj>escowles++ # I'm going to try to build it locally and see if it works without changing my code
<escowles>whikloj: there are quite a few API changes, mostly using Path instead of File and so on16:58
<whikloj>escowles: hrm, well maybe I'll just not use their code to verify bags for now
<escowles>whikloj: that's fine too — i think people can use external tools to validate their bags anyway16:59
<whikloj>escowles: yeah, for export it was nice. But for import I thought it would be really useful to ensure the bag is ready before we start17:00
I'm out, talk to you all tomorrow17:01
* whikloj leaves17:02
<escowles>whikloj: i don't think it would be hard to do our own verification - just load the Bag, get the checksums from the manifests, and then compute our own checksums to verify they match
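The steps escowles lists (read the manifest, recompute the checksums, compare) can be sketched as follows. This is a Python illustration under stated assumptions, not the project's Java code: it assumes the payload manifest is named manifest-<algorithm>.txt at the top of the bag, and `verify_bag` is a hypothetical name. It does not check tag manifests or look for files under data/ that the manifest fails to list.

```python
import hashlib
from pathlib import Path


def verify_bag(bag_dir, algorithm="sha1"):
    """Return a list of (path, problem) tuples; an empty list means all matched."""
    bag_dir = Path(bag_dir)
    problems = []
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Manifest lines are "<checksum> <path>"; the path may contain spaces.
        expected, relative = line.strip().split(None, 1)
        target = bag_dir / relative
        if not target.is_file():
            problems.append((relative, "missing"))
            continue
        digest = hashlib.new(algorithm)
        with target.open("rb") as handle:
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            problems.append((relative, "checksum mismatch"))
    return problems
```

Because files are read one at a time, this version keeps only a single payload file open at any moment, at the cost of being slower on large bags.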
* dhlamb leaves18:15
* peichman leaves18:27
* peichman joins22:45
* peichman leaves22:50

Generated by Sualtam