[comment {--- punk::docgen generated from inline doctools comments ---}]
[comment {--- punk::docgen DO NOT EDIT DOCS HERE UNLESS YOU REMOVE THESE COMMENT LINES ---}]
[comment {--- punk::docgen overwrites this file ---}]
[manpage_begin punkshell_module_punk::fileline 0 0.1.0]
[copyright "2024"]
[titledesc {file line-handling utilities}] [comment {-- Name section and table of contents description --}]
[moddesc {punk fileline}] [comment {-- Description at end of page heading --}]
[require punk::fileline]
[keywords module text parse file]
[description]
[para] -

[section Overview]
[para]Utilities for in-memory analysis of text file data, both as line data and as byte/char-counted data, whilst preserving the line-endings (even if mixed).
[para]This matters for certain kinds of text file where the exact number of chars/bytes is significant.
[para]For example, Windows .cmd/.bat files need some byte counting to determine whether labels lie on chunk boundaries and need to be moved.
[para]Despite the word 'file' in its name, the library doesn't read from or write to the filesystem. It operates on text-file-like data.

[subsection Concepts]
[para]A chunk of text file data (possibly representing a whole file, but usually at least a complete set of lines) is loaded into a punk::fileline::class::textinfo instance at object creation.
[example_begin]
package require punk::fileline
package require fileutil
set rawdata [lb]fileutil::cat -translation binary data.txt[rb]
punk::fileline::class::textinfo create obj_data $rawdata
puts stdout [lb]obj_data linecount[rb]
[example_end]

[subsection Notes]
[para]Line records are referred to by a zero-based index rather than the one-based index commonly used when displaying files.
[para]This is for programming consistency and convenience; the module user should do their own conversion to one-based indexing for line display or messaging if desired.
[para]Lone carriage-returns are not supported as line-endings.
[para]CR line-endings that are intended to be interpreted as such should be mapped to something else before the data is supplied to this module.

[subsection dependencies]
[para] packages needed by punk::fileline
[list_begin itemized]
[item] [package {Tcl 8.6}]
[list_end] [comment {- end dependencies list -}]

[subsection {optional dependencies}]
[para] packages that add functionality but aren't strictly required
[list_begin itemized]
[item] [package {punk::ansi}]
[para] - recommended for class::textinfo [method chunk_boundary_display]
[item] [package {punk::char}]
[para] - recommended for class::textinfo [method chunk_boundary_display]
[item] [package {overtype}]
[para] - recommended for class::textinfo [method chunk_boundary_display]
[list_end] [comment {- end optional dependencies list -}]

[section API]

[subsection {Namespace punk::fileline::class}]
[para] class definitions
[list_begin enumerated]
[enum] CLASS [class textinfo]
[list_begin definitions]
[para] [emph METHODS]
[call class::textinfo [method constructor] [arg datachunk] [opt {option value...}]]
[para] Constructor for a textinfo object, which represents a chunk or all of a file.
[para] For full functionality, datachunk should contain the file data with its line-endings as-is, i.e. load the data using something like:
[example_begin]
# from an open channel
fconfigure $fd -translation binary
set chunkdata [lb]read $fd[rb]
# or, using the tcllib fileutil package
set chunkdata [lb]fileutil::cat -translation binary $filename[rb]
[example_end]
[call class::textinfo [method chunk] [arg chunkstart] [arg chunkend]]
[para]Return a range of bytes from the underlying raw chunk data.
[para]For example, the following retrieves the entire chunk:
[para] objName chunk 0 end
[call class::textinfo [method chunklen]]
[para] Number of bytes/characters in the raw data of the file
[call class::textinfo [method chunk_boundary_display]]
[para]Returns a string displaying the boundaries at chunksize bytes between chunkstart and chunkend
[para]Defaults to using ANSI colour if the punk::ansi module is available. Use -ansi 0 to disable colour.
[call class::textinfo [method linecount]]
[para] Number of lines in the raw data of the file, counted as per the policy in effect
[call class::textinfo [method line] [arg lineindex]]
[para]Reconstructs and returns the raw line using the payload and the per-line stored line-ending metadata.
[para]A 'line' may be returned without a line-ending if the underlying chunk had trailing data without a line-ending (or the chunk was loaded under a non-standard -policy setting).
[para]Whilst such data may not conform to definitions (e.g. POSIX) of the terms 'textfile' and 'line', it is useful here to represent it as a line with the metadata le set to "none".
[para]To return just the data, which is more commonly needed when dealing with lines, use the [method linepayload] method, which returns the line data minus the line-ending.
[call class::textinfo [method linepayload_find_glob] [arg globsearch] [opt {option value...}]]
[para]Return a lineinfolist (see [method lineinfo] and [method lineinfolist]) of lines where the payload matches the [arg globsearch] string.
[para]To limit the returned results use the -limit n option, where -limit 0 means return all matches.
[para]For example: [method linepayload_find_glob] "*test*" -limit 1
[para]The result is always a list of lineinfo dictionaries, even if only one item is returned.
[para] -limitfrom can be start|end
[para]The order of results is always the order in which they occur in the data, even if -limitfrom end is specified.
[para]-limitfrom end means that only the last -limit items are returned.
[para]Note that as glob accepts [lb]chars[rb] to mean match any character in the set given by chars, searching for literal square brackets should be done by escaping the bracket with a backslash.
[para]This is true even if only a single square bracket is being searched for. e.g. {*[lb]file*} will not find a left square-bracket followed by the word file, even though the search didn't close the square brackets.
[para]In the above case, the literal search should be {*\[lb]file*}
[call class::textinfo [method linepayload] [arg lineindex]]
[para]Return the text of the line indicated by the zero-based lineindex.
[para]The line-ending is not returned in the data, but is still stored against this lineindex.
[para]Line metadata, such as the line-ending for a particular line and the byte/character range it occupies within the chunk, can be retrieved with the [method linemeta] method.
[para]To retrieve both the line text and metadata in a single call, the [method lineinfo] method can be used.
[para]To retrieve an entire line including its line-ending, use the [method line] method.
[call class::textinfo [method linepayloads] [arg startindex] [arg endindex]]
[para]Return a list of just the payloads in the specified lineindex range, with no metadata.
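[para]Following on from the note on escaping square brackets for [method linepayload_find_glob], a search for a literal substring can be built by backslash-escaping every glob metacharacter. This is a minimal sketch that assumes [method linepayload_find_glob] follows the matching rules of Tcl's string match command, as the note above implies. The proc name glob_contains_literal is hypothetical and is not part of punk::fileline.
[example_begin]
# Hypothetical helper (not part of punk::fileline):
# build a glob pattern matching any payload that contains $literal as a
# plain substring, by backslash-escaping the glob metacharacters \ * ? [lb] [rb]
proc glob_contains_literal {literal} {
    set escaped [lb]string map {\\ \\\\ * \\* ? \\? [lb] \\[lb] [rb] \\[rb]} $literal[rb]
    return "*$escaped*"
}
# glob_contains_literal {[lb]file} returns the pattern *\[lb]file*
[example_end]
[para]The resulting pattern could then be supplied as the [arg globsearch] argument, for example for the left square-bracket search discussed above.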
[call class::textinfo [method linemeta] [arg lineindex]]
[para]Return a dict of the metadata for the line indicated by the zero-based lineindex
[para]Keys returned include
[list_begin itemized]
[item] le
[para] A string representing the type of line-ending: crlf|lf|none
[item] linelen
[para] The number of characters/bytes in the whole line, including the line-ending if any
[item] payloadlen
[para] The number of characters/bytes in the line, excluding the line-ending
[item] start
[para] The zero-based index into the associated raw file data indicating at which byte/character index this line begins
[item] end
[para] The zero-based index into the associated raw file data indicating at which byte/character index this line ends
[para] This end-point corresponds to the last character of the line-ending if any, not necessarily the last character of the line's payload
[list_end]
[call class::textinfo [method lineinfo] [arg lineindex]]
[para]Return a dict of the metadata and text for the line indicated by the zero-based lineindex
[para]This returns the same info as the [method linemeta] method, with an added key of 'payload' which is the text of the line without its line-ending.
[para]The 'payload' value is the same as is returned from the [method linepayload] method.
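[para]To make the relationship between the [method linemeta] keys concrete, the following plain-Tcl sketch splits raw data into dicts with the same keys that [method lineinfo] returns. It is a hypothetical illustration (the proc name lineinfo_sketch is invented here), not the module's implementation. It assumes that only crlf and lf end a line, so a lone CR stays in the payload, and that trailing data without a line-ending gets le none.
[example_begin]
# Hypothetical sketch, not the punk::fileline implementation.
# Only crlf and lf are treated as line-endings; a lone CR stays in the payload.
proc lineinfo_sketch {data} {
    set result {}
    set start 0
    set datalen [lb]string length $data[rb]
    while {$start < $datalen} {
        set lfpos [lb]string first \n $data $start[rb]
        if {$lfpos < 0} {
            # trailing data without a line-ending
            set le none
            set end [lb]expr {$datalen - 1}[rb]
            set payload [lb]string range $data $start end[rb]
        } elseif {$lfpos > $start && [lb]string index $data $lfpos-1[rb] eq "\r"} {
            set le crlf
            set end $lfpos
            set payload [lb]string range $data $start $lfpos-2[rb]
        } else {
            set le lf
            set end $lfpos
            set payload [lb]string range $data $start $lfpos-1[rb]
        }
        # end is inclusive and points at the last char of the line-ending (if any)
        lappend result [lb]dict create le $le \
            linelen [lb]expr {$end - $start + 1}[rb] \
            payloadlen [lb]string length $payload[rb] \
            start $start end $end payload $payload[rb]
        set start [lb]expr {$end + 1}[rb]
    }
    return $result
}
# lineinfo_sketch "a\r\nbb\nc" gives three dicts with le crlf, lf and none
[example_end]
[para]In each dict, linelen minus payloadlen is the length of the line-ending (2, 1 or 0), and end equals start + linelen - 1.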
[call class::textinfo [method lineinfolist] [arg startidx] [arg endidx]]
[para]Returns a list of lineinfo dicts, one for each line in the line index range startidx to endidx
[call class::textinfo [method linerange_to_chunkrange] [arg startidx] [arg endidx]]
[call class::textinfo [method linerange_to_chunk] [arg startidx] [arg endidx]]
[call class::textinfo [method lines] [arg startidx] [arg endidx]]
[call class::textinfo [method chunkrange_to_linerange] [arg chunkstart] [arg chunkend]]
[call class::textinfo [method chunkrange_to_lineinfolist] [arg chunkstart] [arg chunkend] [opt {option value...}]]
[para]Return a list of dicts, each with a structure like the result of the [method lineinfo] method, but possibly with extra keys for truncation information if -show_truncated 1 is supplied.
[para]The truncation key in a lineinfo dict may be returned for the first and/or last line in the resulting list.
[para]truncation shows the shortened part (missing bytes on the left and/or right side) of the entire line, potentially including the line-ending or even a partial line-ending.
[para]Note that this truncation info is only in the return value of this method and will not be reflected in [method lineinfo] queries to the main chunk.
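[para]The overlap arithmetic involved in mapping a chunk range back to lines, as [method chunkrange_to_linerange] and [method chunkrange_to_lineinfolist] do, can be sketched using the start and end values from [method linemeta]. This is a hypothetical, self-contained illustration: the proc name overlapping_lines_sketch and its result keys (first, last, leftcut, rightcut) are invented here and do not describe the module's actual return values.
[example_begin]
# Hypothetical sketch, not part of punk::fileline.
# offsets: list of {start end} pairs per line (zero-based, inclusive, as in linemeta)
# Returns the first and last line indices overlapping chunkstart..chunkend, plus
# how many bytes are cut from the left of the first line and the right of the last.
proc overlapping_lines_sketch {offsets chunkstart chunkend} {
    set first -1
    set last -1
    set idx 0
    foreach pair $offsets {
        lassign $pair start end
        if {$end >= $chunkstart && $start <= $chunkend} {
            if {$first < 0} {set first $idx}
            set last $idx
        }
        incr idx
    }
    if {$first < 0} {
        return {}   ;# no line overlaps the range
    }
    set fstart [lb]lindex $offsets $first 0[rb]
    set lend [lb]lindex $offsets $last 1[rb]
    return [lb]dict create first $first last $last \
        leftcut [lb]expr {max(0, $chunkstart - $fstart)}[rb] \
        rightcut [lb]expr {max(0, $lend - $chunkend)}[rb][rb]
}
# With lines at {0 2} {3 5} {6 6}, the range 1..4 overlaps lines 0 and 1,
# cutting 1 byte from the left of line 0 and 1 byte from the right of line 1.
[example_end]
[para]A nonzero leftcut or rightcut corresponds roughly to the case where the first or last line of the range is truncated, as described for -show_truncated above.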
[call class::textinfo [method numeric_linerange] [arg startidx] [arg endidx]]
[para]A helper that converts any Tcl-style end or end-x values given for startidx or endidx to specific numeric values, based on the current state of the underlying line data
[para]This is used internally by API functions such as [method line] so that they can accept more expressive indices
[call class::textinfo [method numeric_chunkrange] [arg startidx] [arg endidx]]
[para]A helper that converts any Tcl-style end or end-x values given for startidx or endidx to specific numeric values, based on the current state of the underlying chunk data
[call class::textinfo [method normalize_indices] [arg startidx] [arg endidx] [arg max]]
[para]A utility to convert some of the Tcl-style list-index expressions, such as end, end-1 etc., to valid indices in the range 0 to the supplied max
[para]Basic addition and subtraction expressions such as 4-1 and 5+2 are accepted
[para]startidx higher than endidx is allowed
[para]Unlike Tcl's index expressions, an error is raised if the calculated index is outside the bounds 0 to max
[call class::textinfo [method regenerate_lines]]
[para]Generates a list of lines from the current state of the stored raw data chunk and keeps a map of line-endings indexed by lineindex
[para]This is called automatically by the constructor during object creation
[para]It is exposed in the API experimentally, as chunk and line manipulation functions are being considered.
[para]TODO - review whether such manual control will be necessary/desirable
[list_end]
[list_end] [comment {--- end class enumeration ---}]

[subsection {Namespace punk::fileline}]
[para] Core API functions for punk::fileline
[list_begin definitions]
[list_end] [comment {--- end definitions namespace punk::fileline ---}]

[subsection {Namespace punk::fileline::lib}]
[para] Secondary functions that are part of the API
[list_begin definitions]
[call [fun lib::range_spans_chunk_boundaries] [arg start] [arg end] [arg chunksize]]
[para]Takes a start and end offset, generally representing byte or character indices, and computes a list of the boundaries at multiples of the chunksize that are spanned by the start to end range.
[list_begin arguments]
[arg_def integer start]
[para] zero-based start index of range
[arg_def integer end]
[para] zero-based end index of range
[arg_def integer chunksize]
[para] Number of bytes/characters in a chunk; must be greater than 0
[list_end]
[para]Returns a dict with the keys is_span and boundaries
[para]is_span 0|1 indicates whether the specified range spans a boundary of chunksize
[para]boundaries contains a list of the spanned boundaries, which are always multiples of the chunksize (shifted by any -offset)
[para]For example:
[example_begin]
range_spans_chunk_boundaries 10 1750 512
is_span 1 boundaries {512 1024 1536}
[example_end]
[para]The -offset option shifts each boundary by the given amount:
[example_begin]
range_spans_chunk_boundaries 10 1750 512 -offset 2
is_span 1 boundaries {514 1026 1538}
[example_end]
[para] This function automatically uses lseq (if Tcl >= 8.7) when the number of boundaries spanned is greater than approximately 75
[list_end] [comment {--- end definitions namespace punk::fileline::lib ---}]

[section Internal]
[subsection {Namespace punk::fileline::system}]
[para] Internal functions that are not part of the API

[subsection {Namespace punk::fileline::ansi}]
[para]These are ansi functions imported from punk::ansi - or no-ops if that package is unavailable
[para]See [package punk::ansi] for
documentation [list_begin definitions] [call [fun ansi::a]] [call [fun ansi::a+]] [call [fun ansi::stripansi]] [list_end] [comment {--- end definitions namespace punk::fileline::ansi ---}] [manpage_end]