TL;DR: I have to:

a) get names of all the markdown files in a directory

b) sort them by date of creation (new to old)

c) select n entries from the sorted array

d) starting from position z

all without having to read the files into memory or perform complex, intensive operations. Is there any way to do so?

Yes, this is a TypeScript question, but since my codebase mixes JS and TS syntax, I’m okay with a JS answer for now; it should be almost the same, minus the type stuff. I’m trying to create a blog using mdsvex, but I do not want to use a database, at least for now: I cannot afford a paid hosting solution, and I don’t want to lose my writing either by the platform locking the database or by it being erased once the free quota is over. I also want to implement pagination here.

In SvelteKit, I’m making use of Vite’s import.meta.glob() to get all the MDX files, and then filter them based on the date of creation. The issue is that I’m forced to add my own createdAt entry to the YAML-like frontmatter that MDX uses at the top of its markdown syntax, because I’m not sure how to access the file’s metadata to get its birth (creation) date using TypeScript. It does not end there, either: since the result is an Object, I’m also forced to convert it to an array using Object.entries(), sort the files by createdAt, and then slice the array, all of which adds to the server overhead.
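
For reference, the approach described above can be sketched as a small pure function. This is only a sketch under assumptions: the input is taken to have the shape that import.meta.glob() returns with the eager option (path to module), each module is assumed to expose its frontmatter as metadata (as mdsvex does), and createdAt is assumed to be an ISO 8601 date string. The name sortByCreatedAt and the sample paths are made up for illustration.

```javascript
// Hypothetical helper: turns a glob-style object (path -> module) into an
// array of posts sorted by the frontmatter field `createdAt`, newest first.
function sortByCreatedAt(modules) {
  return Object.entries(modules)
    .map(([path, mod]) => ({ path, ...mod.metadata }))
    .sort((a, b) => new Date(b.createdAt) - new Date(a.createdAt));
}

// Stand-in for the object an eager import.meta.glob() call would return.
const sampleModules = {
  '/posts/first.md': { metadata: { title: 'First', createdAt: '2023-01-05' } },
  '/posts/third.md': { metadata: { title: 'Third', createdAt: '2023-03-20' } },
  '/posts/second.md': { metadata: { title: 'Second', createdAt: '2023-02-11' } },
};

const sortedPosts = sortByCreatedAt(sampleModules);
console.log(sortedPosts.map((post) => post.title));
```

The sort only touches a small array of plain objects, so for a personal blog its cost is likely to be negligible.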

Is it possible to list markdown files sorted by the file system’s metadata (something similar to the stat Linux command, but in TypeScript)? I’m interested in the file’s date of birth here, in descending order: new files at the top, old files at the bottom. I also want to be able to use slice() effectively to get n markdown files (n here is the pageSize), starting from position z (z = n * paginationCount), so that I can select the files without reading them.
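
The pagination arithmetic above (z = n * paginationCount) could look like the following sketch. getPage is a hypothetical name, and paginationCount is assumed to be zero-based, so page 0 starts at position 0.

```javascript
// Hypothetical helper: returns `pageSize` items starting at position
// z = pageSize * paginationCount (paginationCount is zero-based).
function getPage(sortedItems, pageSize, paginationCount) {
  const z = pageSize * paginationCount;
  return sortedItems.slice(z, z + pageSize);
}

// Example: seven file names, already sorted from newest to oldest.
const names = ['g.md', 'f.md', 'e.md', 'd.md', 'c.md', 'b.md', 'a.md'];

const firstPage = getPage(names, 3, 0);
const thirdPage = getPage(names, 3, 2);
console.log(firstPage, thirdPage);
```

slice() copes with a final page that is shorter than pageSize, and returns an empty array for a page past the end.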

Here’s an example tutorial code from @collinsworth@hachyderm.io.

    • velox_vulnus@lemmy.ml (OP) · 7 months ago

      Actually, I did ask Bard, and it told me to use fs.readdirSync(), with the warning that it is still going to be memory-intensive. Here’s the response:



      Yes, fs.readdirSync can be an efficient way to perform the tasks you described, but there are a few things to keep in mind:

      1. Efficiency Considerations:

      • File System Access: fs.readdirSync does involve accessing the file system, which can be slower than accessing data from memory. However, for a relatively small number of files, the overhead of file system access is likely to be negligible.

      • Memory Usage: fs.readdirSync reads the names of all entries in the directory into memory at once (not the files’ contents), which can consume a significant amount of memory if the directory contains a very large number of files. However, if you have sufficient memory available, this is unlikely to be a major concern.

      2. Optimization Techniques:

      • Sorting by Creation Time: You can sort the list of files by creation time using the fs.statSync function. It returns an fs.Stats object whose birthtime property holds the creation time (mtime holds the modification time), which you can use to compare two files.

      • Selecting Subset of Entries: You can select a subset of entries in the list of files by slicing the array. For example, if the list is sorted from newest to oldest, you can select the 10 most recent files with the following code:

      const selectedFiles = files.slice(0, 10);

      • Starting from Z Position: You can start from a specific position in the list of files by passing a start index to slice. For example, to select the 10 files starting from position 20, you can use the following code:

      const selectedFiles = files.slice(20, 30);

      3. Alternative Approaches:

      • Async Iteration: If you want to avoid reading the entire directory into memory at once, you can use an asynchronous iteration approach. This involves reading the directory contents in chunks and processing each chunk as it becomes available.

      • Recursive Processing: If you need to process files in multiple directories, you can use a recursive approach. This involves recursively processing each subdirectory until you have reached the desired depth.

      In general, fs.readdirSync is a simple and efficient way to get the names of all the files in a directory; combined with fs.statSync and slice, it lets you sort them by date of creation and select a subset of entries without reading the files’ contents or performing complex, intensive operations. However, if you have a very large number of files or if you need to perform more complex operations on the files, you may need to consider alternative approaches.
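
      The async iteration idea above might be sketched with fs.promises.opendir, which yields directory entries one at a time instead of returning the whole listing at once. This is only a sketch: listMarkdownFiles is a made-up name, the demo directory is a throwaway one, and the names still have to be collected in an array before they can be sorted.

      ```javascript
      const fs = require('fs');
      const os = require('os');
      const path = require('path');

      // Hypothetical helper: collects the names of the .md files in `dir`,
      // reading directory entries one by one. File contents are never read.
      async function listMarkdownFiles(dir) {
        const names = [];
        const handle = await fs.promises.opendir(dir);
        for await (const entry of handle) {
          if (entry.isFile() && entry.name.endsWith('.md')) {
            names.push(entry.name);
          }
        }
        return names;
      }

      // Demo on a temporary directory so the sketch can run anywhere.
      const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'posts-'));
      for (const name of ['a.md', 'b.md', 'notes.txt']) {
        fs.writeFileSync(path.join(demoDir, name), '');
      }
      const demoNames = listMarkdownFiles(demoDir);
      demoNames.then((names) => console.log(names.sort()));
      ```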


      I’ve also been thinking of an alternative approach: instead of using the title as the file name, use the date in ISO 8601 format, in UTC (generated using date -u -Is). But I feel like this is pretty redundant, as I should be using the metadata.
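
      A minimal sketch of that file-name idea, assuming every post is named with a UTC ISO 8601 timestamp of the same fixed width (the names below are invented). Because such timestamps sort chronologically as plain strings, the directory listing can be ordered and sliced without calling stat on, or opening, any file. newestFirst is a hypothetical helper name.

      ```javascript
      // Hypothetical helper: orders timestamp-named files from newest to oldest.
      // Works because fixed-width UTC ISO 8601 strings sort chronologically.
      function newestFirst(fileNames) {
        return fileNames
          .filter((name) => name.endsWith('.md'))
          .sort()
          .reverse();
      }

      // Invented names, standing in for what fs.readdirSync(dir) might return.
      const listing = [
        '2024-01-15T09:30:00+00:00.md',
        '2023-11-02T18:05:41+00:00.md',
        '2024-03-08T07:12:09+00:00.md',
        'README.txt',
      ];

      const pageSize = 2;
      const latest = newestFirst(listing).slice(0, pageSize);
      console.log(latest);
      ```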

      Right now, I’m thinking of moving to Prisma to avoid this mess.

      • ubermeisters@lemmy.world · 7 months ago

        I haven’t tried Bard for this, as I’ve had such success with Bing chat, but since they are built on similar tech, I expect they give similar answers. Might be worth trying Bing chat just to see if it has a slightly different answer though.

        I know people talk shit about LLMs and talk down to folk like me that use them as a modern-day search engine… but I swear they are super fucking useful in this kind of research.

      • ubermeisters@lemmy.world · 7 months ago

        I input your post text into Copilot and it returned:


        Sure, you can use the fs module in Node.js to read the directory and get the metadata of the files. Here’s a simple example in JavaScript:

        const fs = require('fs');
        const path = require('path');

        // Directory that holds the markdown files
        const dir = './your-directory';

        // Read the names of all entries in the directory
        fs.readdir(dir, (err, files) => {
          if (err) {
            console.error('Could not list the directory.', err);
            process.exit(1);
          }

          // `files` is an array of entry names; stat each one for its metadata
          files.forEach((file) => {
            fs.stat(path.join(dir, file), (err, stat) => {
              if (err) {
                console.error('Error stating file.', err);
                return;
              }

              if (stat.isFile()) {
                console.log("'%s' is a file.", file);
              } else if (stat.isDirectory()) {
                console.log("'%s' is a directory.", file);
              }

              // birthtime is the creation time reported by the file system
              console.log('created at %s', stat.birthtime);
            });
          });
        });

        This will give you the creation time of each file in the directory. You can then sort the files based on the creation time and slice the array as needed.
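
        The sorting and slicing step mentioned above might look like the following sketch, which uses the promise-based fs API. getPageByBirthtime is a hypothetical name, paginationCount is assumed to be zero-based, and the demo directory is a throwaway one. Keep in mind that birthtime reflects when the file was created on that particular disk, so it can change after a copy or a fresh git checkout, and some file systems do not record it.

        ```javascript
        const fs = require('fs');
        const os = require('os');
        const path = require('path');

        // Hypothetical helper: returns one page of .md file names from `dir`,
        // sorted by creation time (birthtime), newest first. Only metadata is
        // read via stat; file contents are never opened.
        async function getPageByBirthtime(dir, pageSize, paginationCount) {
          const names = (await fs.promises.readdir(dir)).filter((n) => n.endsWith('.md'));
          const entries = await Promise.all(
            names.map(async (name) => {
              const stat = await fs.promises.stat(path.join(dir, name));
              return { name, createdMs: stat.birthtimeMs };
            })
          );
          entries.sort((a, b) => b.createdMs - a.createdMs);
          const z = pageSize * paginationCount; // start position of the page
          return entries.slice(z, z + pageSize).map((entry) => entry.name);
        }

        // Demo on a temporary directory so the sketch can run anywhere.
        const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'blog-'));
        for (const name of ['one.md', 'two.md', 'three.md', 'image.png']) {
          fs.writeFileSync(path.join(demoDir, name), '');
        }
        const firstPage = getPageByBirthtime(demoDir, 2, 0);
        firstPage.then((page) => console.log(page));
        ```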

        Please note that this is a simple example and you might need to adjust it according to your needs. Also, keep in mind that this code will run asynchronously. If you need to run it synchronously, you can use the synchronous versions of these functions (readdirSync, statSync).

        In TypeScript, you would just need to add the appropriate types to your variables. The rest of the code would remain the same. For example, const dir: string = './your-directory'; and so on. Remember to install the Node.js types with npm install @types/node --save-dev if you haven’t already.

        I hope this helps! Let me know if you have any other questions. 😊