forked from xrootd/xrootd-ceph
Buffered io merge into master #44
Open
Jo-stfc wants to merge 19 commits into master from bufferedIO
* Buffer implementation for XrdCeph
* Better error return code values
* Add timing into BufferIO
* Add timing into BufferSimple
* Utils code area
* Update raw data access and copy
* Adding Extents
* ReadV simple logic
* Add to own files the readV implementations
* Add to own files the readV implementations; cmake updated
* Logging improvements and write buffer updates
* Add IOadapter with blocking aio access
* Use IOadapter with blocking aio access
* Small logging update
* Reduce logging information; fix timing to ms
* Reduce logging information;
* Reduced logging, and better use of aggregated metrics
* comment clean and typo fixes
* Remove unnecessary file close
* Additional logging in case of problems
* Additional logging in case of problems
* allow option for buffering with IO or AIO buffer

Co-authored-by: james <[email protected]>
Co-authored-by: root <[email protected]>
* variable rpm name
* Update xrootd-ceph.spec.in
* Update makesrpm.sh
* Update makesrpm.sh
* Buffer implementation for XrdCeph
* Better error return code values
* Add timing into BufferIO
* Add timing into BufferSimple
* Utils code area
* Update raw data access and copy
* Adding Extents
* ReadV simple logic
* Add to own files the readV implementations
* Add to own files the readV implementations; cmake updated
* Logging improvements and write buffer updates
* Add IOadapter with blocking aio access
* Use IOadapter with blocking aio access
* Small logging update
* Reduce logging information; fix timing to ms
* Reduce logging information;
* Reduced logging, and better use of aggregated metrics
* comment clean and typo fixes
* Remove unnecessary file close
* Additional logging in case of problems
* Additional logging in case of problems
* allow option for buffering with IO or AIO buffer
* fix conflicts
* Allow for finite retries on EBUSY, else fail with EIO.
  It is possible for a read/write from the buffer to return EBUSY due to an underlying issue. In these cases, if the -EBUSY is returned out of XrdCeph, a large number of retries can originate. It is better at this point for the transfer to be flagged as failed, and retried properly. The code allows for 5 retries with a 1s sleep between them. If this doesn't work - which it might not - then an -EIO error is returned to xrootd. Other error messages are not affected.
* Better summary stats output for CephIOAdapterRaw
* Comment out a comment

Co-authored-by: james <[email protected]>
Co-authored-by: root <[email protected]>
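For reviewers, the EBUSY handling described in the commit above can be pictured roughly as follows. This is only a sketch of the stated policy (up to 5 retries, 1s sleep, then -EIO), not the code in this branch; `retry_on_ebusy` and its parameters are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <functional>
#include <thread>
#include <sys/types.h>

// Hypothetical helper (not taken from XrdCeph): re-run `op` while it
// returns -EBUSY. Once `max_retries` retries have also returned -EBUSY,
// the result is mapped to -EIO. Any other return value, success or a
// different error, is passed through unchanged.
ssize_t retry_on_ebusy(const std::function<ssize_t()>& op,
                       int max_retries = 5,
                       std::chrono::milliseconds pause = std::chrono::seconds(1)) {
  assert(max_retries >= 0);
  ssize_t rc = op();
  for (int attempt = 0; rc == -EBUSY && attempt < max_retries; ++attempt) {
    std::this_thread::sleep_for(pause);  // back off before the next try
    rc = op();
  }
  return rc == -EBUSY ? -EIO : rc;
}
```

Mapping the final -EBUSY to -EIO means the caller sees a hard failure and the transfer can be retried as a whole, instead of the -EBUSY propagating out and triggering further retries.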
* variable rpm name (#17)
* variable rpm name
* Update xrootd-ceph.spec.in
* Update makesrpm.sh
* Update makesrpm.sh
* Master cephnamelib (#16)
* Allow ceph.namelib to take params and apply translation to full path
* Reduce logging
  Remove extraneous logging messages
* simplify parsing of namelib and added a log line for any remapped file
  Co-authored-by: James <[email protected]>
* XRD-22 Fix ensuring the correct filename is passed to the CephFile instance. (#24)
  A regression in previous commit meant that the filename was not correctly passed to the CephFile instance. This fix ensures that the filename is set correctly.
  Co-authored-by: james <[email protected]>
* re-introduce variable names to spec input (#27)

Co-authored-by: Jo-stfc <[email protected]>
Co-authored-by: James <[email protected]>
Reduced printouts. Only summary stats now produced, rather than the logging per read. Co-authored-by: James Walder <[email protected]>
* XRD-12 Add timestamp information for ceph logging methods
  Update the logwrapper method to print out the current timestamp in the initial section of output.
* Return permission denied on write attempt on existing file with EXCL set (#31)
  Co-authored-by: James Walder <[email protected]>
* disable posc (#30)
  posc is disabled for proxies, but not for a unified setup. XrdCeph does not support the posc flag as it misinterprets objects as folders

Co-authored-by: James Walder <[email protected]>
Co-authored-by: Jo-stfc <[email protected]>
* Add multiple buffer support for reads in case of simultaneous threads reading the same file.
* Further refinements to the simultaneous file reads code
  - Ensure all relevant read / write methods will create a buffer if needed
  - Validity check on close that a buffer was actually created (or bypass code if not)
  - Bugfix in case of odd read sizes combined with multi/split buffer reads (critical)
  - Clean of comments included for development
* Enhanced logging for cluster metrics and readV layer improvements (#35)
  - dumpCLusterInfo to check on the rados connection info
  - extra logging in a delete to give info on delete times
  - update the readV basic alg to do a simple bulk request
  Co-authored-by: James Walder <[email protected]>
* Add time taken to unlink a file in the logging message
  - Logging an unlink now includes the time taken, in cases of (un)successful deletes
  - Remove some extraneous comments
* - Fix issue with buffer passthrough read
  - Add maximum number of simultaneous buffers for a given file
  Once a given number of opens have been made against the same file, don't create a large buffer, and only create a 1MiB buffer for each new file. This should avoid issues with small paged reads, but one would normally hope the passthrough mode would be triggered in each read.
* Additional statistics on buffered reading added.
  - Will report bytes read from ceph, bytes read but bypassed the cache, and the cache hit fraction

---------

Co-authored-by: James Walder <[email protected]>
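To make the reported statistics concrete, here is a small sketch of how such counters could be kept. The struct, its field names, and the definition of the hit fraction (reads served from the buffer divided by all buffered reads) are assumptions for illustration; the commit does not state the exact formula.

```cpp
#include <cstddef>

// Hypothetical counters for buffered reads; names are illustrative only.
struct BufferReadStats {
  unsigned long long bytesFromCeph = 0;  // bytes fetched from ceph to fill the buffer
  unsigned long long bytesBypassed = 0;  // bytes read directly, skipping the buffer
  unsigned long long cacheHits = 0;      // reads served from data already buffered
  unsigned long long cacheMisses = 0;    // reads that required a buffer refill

  // Record a read that went through the buffer.
  void recordBuffered(bool hit, std::size_t bytesFetched) {
    if (hit) {
      ++cacheHits;
    } else {
      ++cacheMisses;
    }
    bytesFromCeph += bytesFetched;
  }

  // Record a read that bypassed the buffer entirely.
  void recordBypass(std::size_t bytes) { bytesBypassed += bytes; }

  // Assumed definition: hits / (hits + misses); 0 when nothing was read yet,
  // which also avoids dividing by zero.
  double hitFraction() const {
    const unsigned long long total = cacheHits + cacheMisses;
    return total == 0 ? 0.0
                      : static_cast<double>(cacheHits) / static_cast<double>(total);
  }
};
```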
…40)

* Bug fix for writes with bufferedIO when extending over buffer range.
  - Fix for case where multiple writes to the buffer are needed for a given xrd write request
  - Previously threw an error; now will correctly perform the multiple writes as required.
  - Set the Simple Data buffer capacity to the input size, rather than the capacity of the vector, which could be larger.

---------

Co-authored-by: James Walder <[email protected]>
* test
* fix merge conflict
* extra bracket
* misplaced bracket
* StatLS only takes pool name from section of object path before first colon ':'
* Tidy reporting of pool name to ignore some extraneous characters
* Add XrdSys/XrdSysPlatform.h to get MAXPATHLEN
* Bug fix for writes with bufferedIO when extending over buffer range. (#40) (#41)
* Bug fix for writes with bufferedIO when extending over buffer range.
  - Fix for case where multiple writes to the buffer are needed for a given xrd write request
  - Previously threw an error; now will correctly perform the multiple writes as required.
  - Set the Simple Data buffer capacity to the input size, rather than the capacity of the vector, which could be larger.

---------

Co-authored-by: snafus <[email protected]>
Co-authored-by: James Walder <[email protected]>

---------

Co-authored-by: Ian Johnson <[email protected]>
Co-authored-by: snafus <[email protected]>
Co-authored-by: James Walder <[email protected]>
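The StatLS change above (pool name taken from the part of the path before the first ':') could look something like the sketch below. `poolNameFromPath` is a hypothetical helper, and stripping leading '/' characters is an assumption about what "extraneous characters" means; the commit does not spell that out.

```cpp
#include <string>

// Hypothetical helper: return the section of `path` before the first ':'.
// If there is no colon the whole string is used. Leading '/' characters are
// dropped (an assumed interpretation of "extraneous characters").
std::string poolNameFromPath(const std::string& path) {
  const std::string::size_type colon = path.find(':');
  const std::string pool = path.substr(0, colon);  // npos keeps the full string
  const std::string::size_type first = pool.find_first_not_of('/');
  return first == std::string::npos ? std::string() : pool.substr(first);
}
```

For example, under these assumptions `poolNameFromPath("/alice:x:y")` yields `alice`, since only the first colon is treated as the separator.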
* variable rpm name (#17)
* variable rpm name
* Update xrootd-ceph.spec.in
* Update makesrpm.sh
* Update makesrpm.sh
* Master cephnamelib (#16)
* Allow ceph.namelib to take params and apply translation to full path
* Reduce logging
  Remove extraneous logging messages
* simplify parsing of namelib and added a log line for any remapped file
  Co-authored-by: James <[email protected]>
* XRD-22 Fix ensuring the correct filename is passed to the CephFile instance. (#24)
  A regression in previous commit meant that the filename was not correctly passed to the CephFile instance. This fix ensures that the filename is set correctly.
  Co-authored-by: james <[email protected]>
* XRD-12 Add timestamp information for ceph logging methods
  Update the logwrapper method to print out the current timestamp in the initial section of output.
* re-introduce variable names to spec input (#27)
* Return permission denied on write attempt on existing file with EXCL set (#31)
  Co-authored-by: James Walder <[email protected]>
* disable posc (#30)
  posc is disabled for proxies, but not for a unified setup. XrdCeph does not support the posc flag as it misinterprets objects as folders
* Disk space reporting (#36)
* Provide XrdCephOss::StatLS and ceph_posix_stat_pool to enable disk space reporting. Responds to the 'xrdfs query space' command as requested by ALICE VO
* Remove ts() timestamp function and unnecessary #defines
* Read ceph.poolnames setting from XRootD config to specify reportable pools.
* Support 'xrdfs spaceinfo' via Stat() method returning XrdOssOK for stat'ing 'pool:'
* Tidy up tracing of Stat* calls
* Remove unwanted method isPathReportablePool
* Add comments for need to support stat-ing '/'
* Return -ENOMEM if malloc fails
* Return -ENOMEM if malloc fails
* Rename disk space reporting config item to ceph.reportingpools and log if the list of names is not present.
  Report if ceph_posix_stat_pool call to get the amount of used space fails
* Sanitize incoming pool name and allow for MonALISA format
* Optional tracing of Stat* incoming paths and response. Remove double logging of ceph.reporting pools.
* Check that sanitized pool name is not marked invalid
* Use ceph namelib translation at Oss level by copying translateFileName logic from Posix level. More error checking if stat can't find pool name.
* Remove superfluous comments
* Ensure tracing of path arguments to Stat() and StatLS(). Add Doxygen-style comments to changed methods
* Make source tarball only as minimum output
* Add make-src-tar.sh to additionally place required source tarball in '--output' destination
* Change back usedSpace to totalSpace in ceph_posix_statfs
* feat: improve (vector) read implementation (#37)
  Try to avoid usage of libradosstriper for readv operations since it may impact performance significantly. To do so we explicitly determine the objects that constitute a file and read from them using rados only. Reads are async. To do these async reads conveniently we introduce a class for handling multiple async read requests.
* Initial implementation of ReadV at the XrdOss level
* Correct the signature of ReadV to XrdCephOssFile
* feat: do not use libradosstriper for readv operation
* feat: use atomic operations for readv requests
  This should be the most efficient way of handling multiple read ops.
* feat: use nonstriper reads for pread requests
* feat: use nonstriper reads for read operations also
  To do so we do complete refactoring: bulkAioRead class moved to a separate file, and its features extended. Namely, it can do reads from files, not only objects, now.
* feat: print warning message if waiting for aio reads from ceph takes long
  This is useful for debugging the reasons of failures for read(v) requests.
* Added some comments
* fix: use size_t for start_block
  We can use "%zx" in sprintf, so let's unify the types of variables in the function.
  This will also allow us to extend limitations on the file size.
* feat: refactor BulkAioRead::read method, suggested during review
  1. Rename end_block to last_block
  2. Move variable definitions closer to their usage
  3. Use 'std::min' instead of 'if' for chunk_len determination
  4. Use more efficient chunk_start calculation
* feat: add options to allow one to switch to standard read mechanisms
  This may be useful for testing.
* feat: rename block_size to object_size in BulkAioRead
  New name better describes reality, since we are talking about the size of ceph objects.
* feat: rename wait_for_complete to submit_and_wait_for_complete
  New name describes this function better.
* feat: use more meaningful names for variables that loop over the operations map
  op_data should describe the contents of the variables better.
* feat: move type definitions into the class
* feat: added comments with method's description
* feat: remove unnecessary semicolons
* feat: convert wait_for_complete method from void to int
  This allows one to improve several things. Here we change the key to the operations and use the object number instead of its full name.
* fix: fixed comment
* fix: fixed comments
* feat: refactor bulkAioRead class
  Pointers were dropped from objectReadOperation and ceph_bufferlist objects. The objects are moved to appropriate classes to simplify memory management and usage.
* feat: take into account completion's return value
  We can retrieve return code from completion and get meaningful status of the whole operation with this value.
* feat: allow reading of sparse file
  Since we do not really expect sparse files, we use a fallback mechanism: if a read(v) failed with -ENOENT exit status, then just resubmit it using striper-based functions.
* lint: remove trailing whitespaces
* feat: use meaningful names for read(v) functions
  The name now indicates whether read(v)s are striper or non-striper ones.
* feat: fallback to striper-based read if number of stripes > 1
  Just in case, such files should not be present in our production setup
* feat: allow zero-sized reads
  In principle, this is a correct request, so we should support it.
* fix: make sure we do not delete completion objects until submitted operation is completed
  This is done to prevent some nasty side-effects, e.g. writing to a deleted buffer.
* fix: remove move constructor from bulkAioRead
  We do not use it.
* fix: handle failure to allocate completion
  Completion allocation can fail, we should take that into account.
* feat: use file reference to construct readOp objects
  There is no need to extract (and then copy) file name and object size from file reference to construct read object, we can use file reference directly.
* feat: replace conversion operator with explicit method
  Implicit conversion was making code less readable.
* feat: remove call to is_complete() in completion wrapper destructor
  There is no need to check for completion, we can call wait_for_complete multiple times.
* feat: put warning threshold to config file
  It is better to have this value as configurable instead of hardcoded.
* fix: initialize return code variable in ReadOpData
* Added comment
* feat: add comment for future optimization.
  We should use `aio_cancel` to cancel all pending read operations in future.
* fix: remove vim's swp file
  Committed by accident
* feat: improve logging
  Add file descriptor to sparse file's logging, fix typos.
* fix: minor fixes
  Remove unnecessary include, move variable declaration closer to the usage, fix spelling in the comment.
* feat: BulkAioRead::read method refactoring
  Refactoring was made to increase (hopefully) readability.
* fix: better wording for comment
* feat: BulkAioRead::read -- change loop exit condition
  We can exit when `to_read == 0`. This allows us to drop the `end_block` variable.
* fix: add call to `clear` after getting results
  This is to allow clients to use the same readOp object for future operations.

---------

Co-authored-by: Ian Johnson <[email protected]>
Co-authored-by: Alexander Rogovskiy <[email protected]>

* duplicate struct definition
* move struct definition to headers
* use bufferedIO version of path
* remove MAXPATHLEN redefinition

---------

Co-authored-by: snafus <[email protected]>
Co-authored-by: James <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Ian Johnson <[email protected]>
Co-authored-by: alex-rg <[email protected]>
Co-authored-by: Alexander Rogovskiy <[email protected]>
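The non-striper read path in the commit above relies on working out which ceph objects a byte range touches. The following is a rough sketch of that splitting step only, using the `object_size`, `chunk_start`, `chunk_len` and `to_read` names mentioned in the commit messages; `ObjectChunk` and `splitRead` are hypothetical, and the real BulkAioRead::read also submits the async rados reads, which is not shown. It assumes a single stripe, so that consecutive `object_size` byte ranges map to consecutive objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one per-object read.
struct ObjectChunk {
  std::size_t object_no;    // index of the ceph object within the file
  std::size_t chunk_start;  // offset inside that object
  std::size_t chunk_len;    // number of bytes to read from it
};

// Split the file range [offset, offset + to_read) into per-object chunks.
// The loop exits when to_read == 0, so a zero-sized read yields no chunks.
std::vector<ObjectChunk> splitRead(std::size_t offset, std::size_t to_read,
                                   std::size_t object_size) {
  assert(object_size > 0);
  std::vector<ObjectChunk> chunks;
  while (to_read > 0) {
    const std::size_t object_no = offset / object_size;
    const std::size_t chunk_start = offset % object_size;
    const std::size_t chunk_len = std::min(to_read, object_size - chunk_start);
    chunks.push_back({object_no, chunk_start, chunk_len});
    offset += chunk_len;
    to_read -= chunk_len;
  }
  return chunks;
}
```

With `object_size = 8`, a 10-byte read at offset 6 becomes two chunks: 2 bytes from the end of object 0 and 8 bytes from the start of object 1.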
* Add capability for buffer io raw to use striperless reads
* Add capability for buffer io raw to use striperless reads
* Add a maybe striper for reading in ceph posix
* Use striperless reads when bypassing the buffer
Remove verbose logging for case when cache is bypassed, as the read size is at least the size of the buffer.
* catch division by 0 in CephIOAdapterRaw.cc, increase granularity to nanoseconds
* long to unsigned long long explicit typecasting
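As an illustration of the division-by-zero guard with nanosecond timing, a summary-rate calculation could be written along these lines. This is a sketch, not the code in CephIOAdapterRaw.cc; `throughputMiBps` is a hypothetical name and returning 0 for a zero elapsed time is an assumed convention.

```cpp
#include <chrono>

// Hypothetical helper: average rate in MiB/s for `bytes` transferred over
// `elapsed`. A zero (or negative) elapsed time returns 0 instead of dividing.
double throughputMiBps(unsigned long long bytes, std::chrono::nanoseconds elapsed) {
  if (elapsed.count() <= 0) {
    return 0.0;  // guard: nothing measurable, avoid division by zero
  }
  const double seconds = std::chrono::duration<double>(elapsed).count();
  return (static_cast<double>(bytes) / (1024.0 * 1024.0)) / seconds;
}
```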
Return the read's return value when an error is triggered during a read.
* get stripeunit and object size from xattr of first stripe
  use striper.layout.object_size, not striper.size as that is the size of the whole object
  get the striper layout info on file open
  use min of return code of object striper layout metadata
* use striper.layout.object_size, not striper.size as that is the size of the whole object
* improvements from review

---------

Co-authored-by: root <[email protected]>
* clean garbage from rados read
* static alloc
* static alloc
* static alloc needs manual null
* comments and warning for nondefault params
* add filename in log
* add filename in log
* code review changes
* c++14 compatibility fixes

---------

Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
No description provided.